Notes
Slide Show
Outline
1
Graphics File Format and
Data Compression Techniques
(take 2)  Week 11
2
Administrivia
  • Attendance
3
AES Meeting
  • Feedback cancellation technology at RANE
  • http://www.aes.org/sections/pnw/
4
No Homework from lastweek
  • I wanted to have you experiment with quality tables but didn’t find anything sufficiently quantifiable to give…
5
Questions on JPEG?
  • MPEG builds on JPEG…
6
The wonderful world of MPEG
  • Moving Picture Experts Group
  • 1st me in 1988
7
MPEG-1 and -2 Standards
  • MPEG-1: Audio/Video compression.
    • VCD, MP3
    • Intended for intermediate data rates: 1.5Mb/s
  • MPEG-2: Transport, A/V for broadcast quality
    • Digital TV, and DVD
    • Intended for high data rates > 10 Mb/s
8
MPEG-3 and -4 standards
  • MPEG-3: Abandoned when MPEG-2 proved sufficient for HDTV
  • MPEG-4: Intended for very low bitrate
    • Adds objects, 3D scenegraph, DRM, and more efficient encoding
    • Intended for very low rates < 64Kb/s
    • But CODEC useful for very high quality too
9
Other MPEG standards
  • MPEG-7: Multimedia description
    • How to identify multimedia content
  • MPEG-21: Multimedia Framework
10
Video CD
  • Whitebook standard
    • Audio CD: Redbook
  • Standard created in 1993
    • (DVD’s not released until 1997 in US)
  • Hugely popular in Asia
    • Cheap to manufacture
    • Easy to copy
    • Works in high humidity (horrible for VHS)
11
Rainbow book standards
  • Redbook (1980): CD Digital Audio
  • Yellow Book (1984): CD-ROM
  • Orange Book: CD-R, CD-RW
  • White Book: VCD
  • Blue Book: Enhanced Music CD, CD+G, CD-Plus
  • Beige Book: Photo CD
  • Green Book: CD-I
12
Redbook violations
  • Interestingly many new audio CD’s violate Redbook standards
  • An attempt at copy prevention and violation of trademark
  • Once familiar CDDA trademark disappearing:
13
VCD Specifications
  • ISO-9660 Filesytems
  • Bitrate: ≈1.13 Mbit/Second


  • Over a megabit a second
    • Sound like a lot?
    • (I want you to be able to reason with sizes and rates)
14
VCD’s 1.13 Mb/s
  • Ethernet: 10Mb/s
  • CD (1x) Rate:
    • CLV (constant linear velocity)
    • Typical CDR
      • 700MB, 80minute
    • So (within 1x CD-ROM rates): 700MB*8bits/byte*1/80minutes*1minute/60seconds=1.16Mb/second
15
 VCD’s 1.13 Mb/s cont.
  • So how much video on a CD?


    • Fully compliant CD’s are 650MB
      • Longer recorded CDR’s can’t be read on all machines
    • 650MB*8bits/Byte*1s/1.13Mb*1min/60sec
      • ≈ 77 Minutes of Video

  • Pretty impressive, what is the catch?
16
VCD Specification
  • Resolution (QCIF):
    • NTSC: 352×240
    • PAL:    352×288
  • Frame Rate:
    • NTSC: 29.97 Frames/Second
    • PAL:    25 Frames/Second
  • Audio MP2 (MPEG Layer 2)
    • 224kb/s
17
VCD Perception
  • Approximates VHS Quality
    • (VHS Quality is fairly poor)
  • Many movies are longer than 77 minutes
  • No Digital Rights Management (DRM)
    • Very easy to pirate
    • Studios reluctant to release in this format
  • Most DVD players will play VCDs
18
MPEG-1 How much compression?
  • People claim an amazing 150:1
  • But I don’t accept that value
19
150:1 Compression Ratio
  • Assumes video is ≈ 20MB/s
    • D1 video is over 30MB/s
    • 640x480pixel * 30 Frames/s * 10 bits/RGB component
  • More importantly assumes that MPEG-1is inexact (lossy) but not noticeably different from the original
    • This is very far from the case
  • Maybe that much data reduction…
20
D1 NTSC Video
  • Digital Component Video
    • From my SGI days
  • This is the digital representation of broadcast quality video
    • 640x480 (square pixel)
    • 30 Frames a second
      • 29.97 if dropframe
    • 10 bit R, G, B components
      • Or YUV, YCbCr, etc.
21
JPEG compression comparison
  • JPEG begins showing artifacts at compression ratios above 10 to 20:1


  • How does MPEG provide 7 times the compression?
  • (it doesn’t need to)
22
Motion Artifacts
  • Can cheat with video
    • It turns out that JPEG-type artifacts are much more difficult to see on moving than on still images
    • (Try pausing a DVD and suddenly notice the apparent quality of image dramatically drop)
23
MJPEG Compression Ratios
  • This effectively means that JPEG compression may be significantly higher
  • Perhaps 50:1 ratio before artifacts become noticeable


  • Still, how does MPEG achieve a 2-3x improvement over MJPEG?
    • (Motion JPEG encodes video as a stream of JPEG frames)
24
Temporal Coherence
  • So far we have explored image compression algorithms that make use of spatial coherence
    • Pixels tend to resemble their neighbors
  • MPEG exploits temporal coherence
    • Pixels from adjacent frames tend to be similar
  • Or
    • Most objects tend not to move in a scene
25
Temporal Coherence
  • Consider a “talking head” (newscaster)
  • How much really changes in 1/30th second?
    • Mouth moves
    • Hand Waves
    • Background remains the same
26
Lossless Delta Coding
  • Similar to lossless audio CODEC
  • Record the difference of each pixel in present frame and the same pixel location in previous frame
  • Due to temporal coherence
    • Differences will often be small
    • Can be encoded in fewer bits
27
Lossy Delta Coding
  • Compare pixel in present frame to same pixel location in previous frame
  • Only record the difference if delta exceeds some value
28
Problems with Delta Coding Video
  • technical events eliminate coherence:
    • Scene cut, wipe, fade, etc.
    • Zoom
    • Pan
  • These all ‘break’ coherence between every pixel
29
Technical Events
  • Defined as anything unnatural
    • Camera changes, Computer Graphics/Titles, Music
  • Used to keep TV interesting
  • These events catch our attention
  • One every 10 seconds when Jerry Mander wrote Four Arguments for the Elimination of Television in 1978 (he’s an advertising executive)
  • Now the MTV inspired media includes one TE a second
30
Motion Estimation (compensation)
  • MPEG uses a more sophisticated delta encoding scheme
  • MPEG divides frames into blocks then determines if a given block moves between frames
31
MPEG Intra frame
  • Frame encoded independently from any other frame
    • Labeled I
    • A Keyframe
    • JPEG-like DCT lossy encoding
    • Encodes to approximately 2 bits per pixel (24:1 compression ratio)
32
MPEG Inter frame
  • Inter frames dependent on other (I or P) frames
  • Two varieties
    • Frame dependent on previous frame
      • Labeled P for predictive
    • Frame dependent on previous and future frames
      • Labeled B for bi-directional
33
Inter vs Intra
  • Inter frames encode much better than intra frames


  • Why not only use inter frames?
    • (only use frames based on other frames)
34
Errors Accumulate
  • If every frame is dependent on all that came before in
    • Lossy compression
      • Distortion would increase w/o bound
    • Lossless compression
      • Fragile, damage from dropped bits would accumulate/be uncorrectable
35
B versus P Inter-frames
  • Now that we know
    • B, P Inter-frames encode better than Intra (I) frames
    • Can’t only use Inter-frames

  • Why do B frames encode better than P frames?
36
B Versus P
  • From the previous example:
    • Consider when a coffee mug is picked up (after a long while)
      • The information for what is behind the mug isn’t in the previous frames
      • But is available in the future frames


    • P frames are used to limit how far a B frame needs to ‘look’ for its base


  • How can the encoder predict the future?
37
Encoder works in the near past
  • MPEG Encode/Decode is highly asymmetric
  • MPEG encode was originally imagined as an expensive/slow operation
    • However we now need to encode MPEG in realtime for many applications
      • Camcorder, PVR, etc.
  • MPEG actually encodes frames in the near past while accessing frames in the near future
  • Exact frame latency based on the MPEG Data Stream
38
MPEG GOP (Group of Pictures)
  • No specification on order or quantity of I, B, or P frames
    • Ex: Can be all I frames
      • No inter-frame dependency
      • Similar to MJPEG
  • Typically a GOP might contain:
    • I B B B P B B B P B B B P…
    • (next GOP would begin with an I)
39
MPEG Data Stream
  • Optimally encoding video as MPEG requires many decisions to be made
  • One decision is on the mix of I, B, and P frames
  • A carefully hand optimized MPEG stream will make sure that I frames correspond to key-frames in the video
40
GOP Mix Implications
  • Video playback can only ‘start’ with an I frame
  • Mix can influence playback at different rates
    • (if playback device isn’t fast enough to decode all frames faster than realtime)
  • Operations like reverse are complicated
41
GOP Mix Implications continued
  • Makes editing rather complicated
    • Consider implications for:
      • Changing a frame
      • Cutting a frame
      • Adding frames of MPEG video
42
Encode and Decode ordering
  • Due to the presence of P and B frames
  • Future frames need to be presented before the frames that rely on them:











  • Image: Distronics
43
Another way to consider GOP Ordering
  • Some ratio of B to P is decided on
    • Say BBPBBPBBP
    • Or BBBPBBBPBBBP
  • Then I frames are sprinkled in every nth second
44
Temporal Coherence
  • This information is encoded into the data stream
  • Is straightforward/inexpensive to decode


  • Unfortunately it is a very difficult problem to encode
    • Hence MPEG is a highly asymmetric CODEC
      • (Far more computationally expensive to encode than decode)
45
MPEG Encoding Details
  • Frame divided into 16x16 pixel macroblocks comprised of:
    • 4 blocks of Luminance
    • 1 Block each of U and V




  • Image: Distronics.co.uk
46
DCT Lossy Encoding
  • Each of Macroblock’s 6 blocks are encoded as a 1x64 DCT
  • Results filtered by QTable
    • (Very similar to JPEG)
47
Motion Estimation
  • Pixel values are estimated by relocating a block of pixels from a previous or future frame
  • Motion is encoded using a 2D vector
  • Motion vectors are:
    • straightforward to decode
    • Complicated/expensive to encode




48
"Image"











  • Image: Distronics.co.uk
49
Motion Vector
  • For each block B the previous frame is searched for a similar block
  • Encoder records vector describing the change of position:
    • (Cx-Bx, Cy-By) = (Δx, Δy)
  • Called the motion vector
50
Motion Compensation Components
  • Frame Segmentation
  • Search Threshold
  • Distortion Metric
  • Block Search
51
Frame Segmentation
  • Frame divided into blocks
  • Blocks are equally sized
  • Blocks do not overlap


  • Implication
    • Larger boxes harder to match
    • Smaller boxes create more motion vectors
52
Search Threshold
  • If search finds a block such as the difference (distortion metric) between the blocks is ≤ threshold
    • Then they considered matching
    • Vector is output
53
Distortion Metric
  • Computation used to compare blocks


  • Conflicting goals
    • Accurate
    • Fast
54
Mean Absolute Difference Metric
  • Likely candidate
  • Calculates average of the (absolute) pixel differences in a pair of blocks
  • For instance comparing pixel Bij in block B and corresponding pixel in candidate block C:
55
Mean Absolute Diff Expense
  • Per each block comparison
    • b2 subtractions, abs value ops
    • b2 additions
    • 1 division
  • Calculated (worst case) on each block comparison
    • The worst often occurs (no matching block)
    • For instance when a new object enters field of view
56
Search Strategies
  • Don’t search all blocks for comparison
    • Only search neighboring blocks
  • Progressive search criteria
    • Use an inexpensive metric to reduce the pool of potential matching candidates
    • Then use better metric to make selection
57
More Search Strategies
  • Distance dilute search
    • We know faster moving object appear blurry
      • (even if each frame is sharp)
    • Consequently we can relax the matching threshold at greater distances (representing faster movement)
58
More Search Strategies
  • Locality Based Search
    • Assumes that once a good match is found, even better matches may be in same area
    • Use inexpensive metric to find match
    • Then use better metric, and perhaps denser blocks, to find optimal match
59
More Searches
  • Quandrant Monotonic Search
    • A refinement of Locality
    • Use a quadtree like algorithm to find and successively refine areas of match
    • (begin with large blocks and subdivide based on quality of match)
60
More Searches
  • Dependent Algorithms
    • Assumes movement is the result of object or camera motion
    • So neighboring blocks should have similar motion vectors
    • Begins search assuming these vectors
61
Final Searches
  • Multidimensional Search Space
    • When searching, look for matches that are:
      • Rotations
      • Zooms
      • Changes of illumination
      • (not just translations)
    • VERY computationally expensive
      • (Not yet practical)
62
Motion Vector Compression
  • Perhaps up to ˝ of blocks may be converted to motion vectors
  • Important to compress these well
  • They are correlated, and non-uniform
    • Neighboring blocks will likely have similar motion vectors
    • IE they all represent the same object moving in the same way between frames
    • This leads to efficient coding