1
|
|
2
|
|
3
|
- Feedback cancellation technology at RANE
- http://www.aes.org/sections/pnw/
|
4
|
- I wanted to have you experiment with quality tables but didn’t find
anything sufficiently quantifiable to give…
|
5
|
|
6
|
- Moving Picture Experts Group
- First met in 1988
|
7
|
- MPEG-1: Audio/Video compression
- VCD, MP3
- Intended for intermediate data rates: ~1.5 Mb/s
- MPEG-2: Transport, A/V for broadcast quality
- Digital TV and DVD
- Intended for high data rates: > 10 Mb/s
|
8
|
- MPEG-3: Abandoned when MPEG-2 proved sufficient for HDTV
- MPEG-4: Intended for very low bitrate
- Adds objects, 3D scenegraph, DRM, and more efficient encoding
- Intended for very low rates < 64Kb/s
- But the CODEC proved useful for very high quality too
|
9
|
- MPEG-7: Multimedia description
- How to identify multimedia content
- MPEG-21: Multimedia Framework
|
10
|
- White Book standard
- Standard created in 1993
- (DVDs not released until 1997 in the US)
- Hugely popular in Asia
- Cheap to manufacture
- Easy to copy
- Works in high humidity (horrible for VHS)
|
11
|
- Red Book (1980): CD Digital Audio
- Yellow Book (1984): CD-ROM
- Orange Book: CD-R, CD-RW
- White Book: VCD
- Blue Book: Enhanced Music CD, CD+G, CD-Plus
- Beige Book: Photo CD
- Green Book: CD-I
|
12
|
- Interestingly, many new audio CDs violate the Red Book standard
- An attempt at copy prevention, and a violation of the trademark
- The once-familiar CDDA trademark is disappearing:
|
13
|
- ISO-9660 filesystems
- Bitrate: ≈1.13 Mbit/Second
- Over a megabit a second
- Sound like a lot?
- (I want you to be able to reason with sizes and rates)
|
14
|
- Ethernet: 10 Mb/s
- CD (1x) rate:
- CLV (constant linear velocity)
- Typical CD-R: 700 MB in 80 minutes
- So (within 1x CD-ROM rates):
700 MB × 8 bits/byte ÷ (80 min × 60 s/min) ≈ 1.17 Mb/s
|
15
|
- So how much video fits on a CD?
- Fully compliant CDs are 650 MB
- Longer recorded CD-Rs can’t be read on all machines
- 650 MB × 8 bits/byte ÷ 1.13 Mb/s ÷ 60 s/min ≈ 77 minutes (worked through in the sketch below)
- Pretty impressive; what is the catch?
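A quick back-of-the-envelope check of both calculations, as a minimal Python sketch; the 700 MB / 80-minute and 650 MB capacities are the nominal figures from these slides:

    # 1x CD rate from a typical 80-minute, 700 MB CD-R
    rate_mbps = 700 * 8 / (80 * 60)        # MB * bits/byte / seconds
    print(f"1x CD rate ~ {rate_mbps:.2f} Mb/s")         # ~1.17 Mb/s

    # Playing time of a fully compliant 650 MB disc at the ~1.13 Mb/s rate
    minutes = 650 * 8 / 1.13 / 60
    print(f"VCD playing time ~ {minutes:.0f} minutes")  # ~77 minutes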
|
16
|
- Resolution (SIF):
- NTSC: 352×240
- PAL: 352×288
- Frame rate:
- NTSC: 29.97 frames/second
- PAL: 25 frames/second
- Audio: MP2 (MPEG-1 Audio Layer II)
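For a sense of how much compression the ~1.13 Mb/s channel demands, here is a rough sketch of the raw data rate at NTSC VCD resolution; the 12 bits/pixel figure assumes MPEG-1's 4:2:0 chroma subsampling:

    width, height, fps = 352, 240, 29.97   # NTSC VCD parameters above
    bits_per_pixel = 12                    # assumes 4:2:0 YUV (Y + subsampled U, V)

    raw_mbps = width * height * fps * bits_per_pixel / 1e6
    print(f"Raw rate ~ {raw_mbps:.0f} Mb/s")                  # ~30 Mb/s
    print(f"Required compression ~ {raw_mbps / 1.13:.0f}:1")  # ~27:1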
|
17
|
- Approximates VHS Quality
- (VHS Quality is fairly poor)
- Many movies are longer than 77 minutes
- No Digital Rights Management (DRM)
- Very easy to pirate
- Studios reluctant to release in this format
- Most DVD players will play VCDs
|
18
|
- People claim an amazing 150:1 compression ratio
- But I don’t accept that value
|
19
|
- Assumes video is ≈ 20 MB/s
- D1 video is over 30 MB/s
- 640×480 pixels × 30 frames/s × 3 components × 10 bits/component ≈ 34.6 MB/s
- More importantly, assumes that MPEG-1 is inexact (lossy) but not noticeably different from the original
- This is very far from the case
- Maybe that much data reduction…
|
20
|
- Digital Component Video
- This is the digital representation of broadcast quality video
- 640x480 (square pixel)
- 30 Frames a second
- 10 bit R, G, B components
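A quick check that these parameters give the "over 30 MB/s" figure quoted on the previous slide:

    # 640x480 pixels, 30 frames/s, 3 components at 10 bits each
    mb_per_second = 640 * 480 * 30 * 3 * 10 / 8 / 1e6
    print(f"D1-style raw video ~ {mb_per_second:.1f} MB/s")   # ~34.6 MB/s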
|
21
|
- JPEG begins showing artifacts at compression ratios above 10 to 20:1
- How does MPEG provide 7 times the compression?
- (it doesn’t need to)
|
22
|
- Can cheat with video
- It turns out that JPEG-type artifacts are much more difficult to see in moving images than in still images
- (Try pausing a DVD and notice how the apparent image quality suddenly drops)
|
23
|
- This effectively means that JPEG compression can be pushed significantly higher for video
- Perhaps 50:1 ratio before artifacts become noticeable
- Still, how does MPEG achieve a 2-3x improvement over MJPEG?
- (Motion JPEG encodes video as a stream of JPEG frames)
|
24
|
- So far we have explored image compression algorithms that make use of spatial
coherence
- Pixels tend to resemble their neighbors
- MPEG exploits temporal coherence
- Pixels from adjacent frames tend to be similar
- Or
- Most objects tend not to move in a scene
|
25
|
- Consider a “talking head” (newscaster)
- How much really changes in 1/30th second?
- Mouth moves
- Hand Waves
- Background remains the same
|
26
|
- Similar to a lossless audio CODEC
- Record the difference between each pixel in the present frame and the pixel at the same location in the previous frame
- Due to temporal coherence
- Differences will often be small
- Can be encoded in fewer bits
|
27
|
- Compare each pixel in the present frame to the pixel at the same location in the previous frame
- Only record the difference if the delta exceeds some threshold (see the sketch below)
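A minimal NumPy sketch of this thresholded difference scheme (not the actual MPEG algorithm); the frame arrays and the threshold value are illustrative:

    import numpy as np

    def delta_encode(prev_frame, cur_frame, threshold=4):
        """Record only the pixels whose change exceeds the threshold."""
        diff = cur_frame.astype(np.int16) - prev_frame.astype(np.int16)
        mask = np.abs(diff) > threshold      # temporal coherence -> few True entries
        return mask, diff[mask]              # small deltas encode in few bits

    def delta_decode(prev_frame, mask, deltas):
        frame = prev_frame.astype(np.int16)
        frame[mask] += deltas                # sub-threshold changes are simply dropped
        return frame.astype(np.uint8)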
|
28
|
- Technical events eliminate coherence:
- Scene cut, wipe, fade, etc.
- Zoom
- Pan
- These all ‘break’ temporal coherence for every pixel
|
29
|
- Defined as anything unnatural
- Camera changes, Computer Graphics/Titles, Music
- Used to keep TV interesting
- These events catch our attention
- One every 10 seconds when Jerry Mander, a former advertising executive, wrote Four Arguments for the Elimination of Television in 1978
- Now MTV-inspired media include roughly one technical event per second
|
30
|
- MPEG uses a more sophisticated delta-encoding scheme
- MPEG divides frames into blocks, then determines whether a given block moves between frames
|
31
|
- Frame encoded independently of any other frame
- Labeled I
- A Keyframe
- JPEG-like DCT lossy encoding
- Encodes to approximately 2 bits per pixel (24:1 compression ratio)
|
32
|
- Inter frames are dependent on other (I or P) frames
- Two varieties
- Frame dependent on the previous frame
- Labeled P for predicted
- Frame dependent on previous and future frames
- Labeled B for bi-directional
|
33
|
- Inter frames encode much better than intra frames
- Why not only use inter frames?
- (only use frames based on other frames)
|
34
|
- If every frame is dependent on all that came before it:
- Lossy compression
- Distortion would increase w/o bound
- Lossless compression
- Fragile, damage from dropped bits would accumulate/be uncorrectable
|
35
|
- Now that we know
- B, P Inter-frames encode better than Intra (I) frames
- Can’t only use Inter-frames
- Why do B frames encode better than P frames?
|
36
|
- From the previous example:
- Consider when a coffee mug is picked up (after a long while)
- The information for what is behind the mug isn’t in the previous
frames
- But is available in the future frames
- P frames are used to limit how far a B frame needs to ‘look’ for its
base
- How can the encoder predict the future?
|
37
|
- MPEG encode/decode is highly asymmetric
- MPEG encoding was originally imagined as an expensive/slow operation
- However, we now need to encode MPEG in real time for many applications
- The encoder actually encodes frames from the near past while accessing frames from the near future
- The exact frame latency depends on the MPEG data stream
|
38
|
- No specification on order or quantity of I, B, or P frames
- Ex: Can be all I frames
- No inter-frame dependency
- Similar to MJPEG
- Typically a GOP (Group of Pictures) might contain:
- I B B B P B B B P B B B P…
- (next GOP would begin with an I)
|
39
|
- Optimally encoding video as MPEG requires many decisions to be made
- One decision is on the mix of I, B, and P frames
- A carefully hand-optimized MPEG stream will make sure that I frames correspond to key frames in the video
|
40
|
- Video playback can only ‘start’ with an I frame
- The frame mix can influence playback at different rates
- (if the playback device isn’t fast enough to decode all frames faster than real time)
- Operations like reverse are complicated
|
41
|
- Makes editing rather complicated
- Consider implications for:
- Changing a frame
- Cutting a frame
- Adding frames of MPEG video
|
42
|
- Due to the presence of P and B frames
- Reference frames from the future need to be transmitted and decoded before the B frames that rely on them:
- Image: Distronics
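A hedged sketch of that reordering, assuming each B frame depends on the nearest I or P (anchor) frame on either side, so the future anchor is sent first; real streams signal the exact order:

    def decode_order(display_order):
        """Reorder a display-order GOP so anchors precede the B frames that use them."""
        out, pending_b = [], []
        for frame in display_order:
            if frame.startswith("B"):
                pending_b.append(frame)      # held until its future anchor is emitted
            else:                            # I or P anchor frame
                out.append(frame)
                out.extend(pending_b)
                pending_b = []
        return out + pending_b

    print(decode_order(["I1", "B2", "B3", "P4", "B5", "B6", "P7"]))
    # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']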
|
43
|
- Some ratio of B to P is decided on
- Say BBPBBPBBP
- Or BBBPBBBPBBBP
- Then I frames are sprinkled in every nth second
|
44
|
- This information is encoded into the data stream
- Is straightforward/inexpensive to decode
- Unfortunately it is a very difficult problem to encode
- Hence MPEG is a highly asymmetric CODEC
- (Far more computationally expensive to encode than decode)
|
45
|
- Frame divided into 16×16-pixel macroblocks composed of:
- 4 8×8 blocks of luminance (Y)
- 1 8×8 block each of U and V
- Image: Distronics.co.uk
|
46
|
- Each of the macroblock’s 6 blocks is transformed with an 8×8 DCT (64 coefficients, scanned out as a 1×64 sequence)
- The results are then quantized (filtered) by a Q table
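A small sketch of that transform-and-quantize step for one 8×8 block, using SciPy's DCT and a flat placeholder Q table (real MPEG Q tables weight high frequencies more heavily):

    import numpy as np
    from scipy.fftpack import dct

    def encode_block(block, qtable):
        """8x8 block -> 64 quantized DCT coefficients."""
        coeffs = dct(dct(block.astype(float), axis=0, norm="ortho"),
                     axis=1, norm="ortho")                  # 2-D DCT
        return np.round(coeffs / qtable).astype(int)        # quantization drops detail

    qtable = np.full((8, 8), 16)                # placeholder, not a real MPEG table
    block = np.random.randint(0, 256, (8, 8))
    print(encode_block(block, qtable))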
|
47
|
- Pixel values are estimated by relocating a block of pixels from a
previous or future frame
- Motion is encoded using a 2D vector
- Motion vectors are:
- Straightforward to decode
- Complicated/expensive to encode
|
48
|
|
49
|
- For each block B, the previous frame is searched for a similar block C
- The encoder records a vector describing the change of position:
- (Cx-Bx, Cy-By) = (Δx, Δy)
- Called the motion vector
|
50
|
- Frame Segmentation
- Search Threshold
- Distortion Metric
- Block Search
|
51
|
- Frame divided into blocks
- Blocks are equally sized
- Blocks do not overlap
- Implication
- Larger blocks are harder to match
- Smaller blocks create more motion vectors
|
52
|
- If the search finds a block such that the difference (distortion metric) between the blocks is ≤ the threshold
- Then they are considered matching
- The motion vector is output
|
53
|
- The computation used to compare blocks
- Conflicting goals: accuracy of the match vs. computational cost
|
54
|
- A likely candidate: Mean Absolute Difference (MAD)
- Calculates the average of the (absolute) pixel differences in a pair of blocks
- For instance, comparing each pixel Bij in block B with the corresponding pixel Cij in candidate block C (b×b blocks):
- MAD(B, C) = (1/b²) Σi Σj |Bij − Cij|
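As a sketch, the metric above in a few lines of NumPy (b×b blocks assumed):

    import numpy as np

    def mad(B, C):
        """Mean Absolute Difference between block B and candidate block C."""
        b = B.shape[0]
        return np.abs(B.astype(int) - C.astype(int)).sum() / (b * b)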
|
55
|
- Per block comparison:
- b² subtractions and absolute-value ops
- b² additions
- 1 division
- Calculated (in the worst case) for every candidate block
- The worst case often occurs (no matching block)
- For instance, when a new object enters the field of view
|
56
|
- Don’t search all blocks for comparison
- Only search neighboring blocks (see the sketch below)
- Progressive search criteria
- Use an inexpensive metric to reduce the pool of potential matching candidates
- Then use a better metric to make the final selection
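A minimal sketch of a limited-window search built on the MAD metric above; the search range and threshold are illustrative, and real encoders prune the search further using the strategies on this and the following slides:

    import numpy as np

    def find_motion_vector(prev_frame, block, bx, by, search_range=8, threshold=8.0):
        """Search +/- search_range pixels around (bx, by) in the previous frame."""
        b = block.shape[0]
        h, w = prev_frame.shape
        best_vec, best_cost = None, float("inf")
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                x, y = bx + dx, by + dy
                if 0 <= x <= w - b and 0 <= y <= h - b:
                    candidate = prev_frame[y:y + b, x:x + b]
                    cost = np.abs(block.astype(int) - candidate.astype(int)).mean()
                    if cost < best_cost:
                        best_vec, best_cost = (dx, dy), cost
        return best_vec if best_cost <= threshold else None   # None -> encode the block itself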
|
57
|
- Distance dilute search
- We know faster-moving objects appear blurry
- (even if each frame is sharp)
- Consequently we can relax the matching threshold at greater distances
(representing faster movement)
|
58
|
- Locality-Based Search
- Assumes that once a good match is found, even better matches may be in the same area
- Use an inexpensive metric to find a match
- Then use a better metric, and perhaps denser blocks, to find the optimal match
|
59
|
- Quadrant Monotonic Search
- A refinement of Locality
- Use a quadtree like algorithm to find and successively refine areas of
match
- (begin with large blocks and subdivide based on quality of match)
|
60
|
- Dependent Algorithms
- Assumes movement is the result of object or camera motion
- So neighboring blocks should have similar motion vectors
- Begins search assuming these vectors
|
61
|
- Multidimensional Search Space
- When searching, look for matches that are:
- Rotations
- Zooms
- Changes of illumination
- (not just translations)
- VERY computationally expensive
|
62
|
- Perhaps up to ½ of the blocks may be converted to motion vectors
- Important to compress these well
- They are correlated and non-uniform
- Neighboring blocks will likely have similar motion vectors
- I.e., they all represent the same object moving in the same way between frames
- This leads to efficient coding (see the sketch below)
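A tiny sketch of differentially coding one row of motion vectors; because neighbors tend to move together, the deltas cluster near zero and entropy-code well (the vectors here are made up):

    def diff_encode(vectors):
        """Encode each motion vector as a delta from its left neighbor."""
        prev, deltas = (0, 0), []
        for vx, vy in vectors:
            deltas.append((vx - prev[0], vy - prev[1]))
            prev = (vx, vy)
        return deltas

    print(diff_encode([(5, 2), (5, 2), (6, 2), (5, 3)]))
    # [(5, 2), (0, 0), (1, 0), (-1, 1)]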
|