1
|
|
2
|
|
3
|
- Feedback cancellation technology at RANE
- http://www.aes.org/sections/pnw/
|
4
|
- I wanted to have you experiment with quality tables but didn’t find
anything sufficiently quantifiable to give…
|
5
|
|
6
|
- Moving Picture Experts Group
- First met in 1988
|
7
|
- MPEG-1: Audio/Video compression
- VCD, MP3
- Intended for intermediate data rates: ~1.5 Mb/s
- MPEG-2: Transport, A/V for broadcast quality
- Digital TV and DVD
- Intended for high data rates: > 10 Mb/s
|
8
|
- MPEG-3: Abandoned when MPEG-2 proved sufficient for HDTV
- MPEG-4: Intended for very low bitrate
- Adds objects, 3D scenegraph, DRM, and more efficient encoding
- Intended for very low rates < 64Kb/s
- But the CODEC proved useful for very high quality too
|
9
|
- MPEG-7: Multimedia description
- How to identify multimedia content
- MPEG-21: Multimedia Framework
|
10
|
- White Book standard
- Standard created in 1993
- (DVDs not released until 1997 in the US)
- Hugely popular in Asia
- Cheap to manufacture
- Easy to copy
- Works in high humidity (horrible for VHS)
|
11
|
- Red Book (1980): CD Digital Audio
- Yellow Book (1984): CD-ROM
- Orange Book: CD-R, CD-RW
- White Book: VCD
- Blue Book: Enhanced Music CD, CD+G, CD-Plus
- Beige Book: Photo CD
- Green Book: CD-I
|
12
|
- Interestingly, many new audio CDs violate the Red Book standard
- An attempt at copy prevention, and a violation of the trademark
- The once-familiar CDDA trademark is disappearing:
|
13
|
- ISO-9660 filesystems
- Bitrate: ≈1.13 Mbit/Second
- Over a megabit a second
- Sound like a lot?
- (I want you to be able to reason with sizes and rates)
|
14
|
- Ethernet: 10 Mb/s
- CD (1x) rate:
- CLV (constant linear velocity)
- Typical CD-R: 700 MB in 80 minutes
- So (within 1x CD-ROM rates):
700 MB × 8 bits/byte ÷ (80 min × 60 s/min) ≈ 1.17 Mb/s
|
15
|
- So how much video fits on a CD?
- Fully compliant CDs are 650 MB
- Longer recorded CD-Rs can’t be read on all machines
- 650 MB × 8 bits/byte ÷ 1.13 Mb/s ÷ 60 s/min ≈ 77 minutes (worked through in the sketch below)
- Pretty impressive; what is the catch?
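A quick back-of-the-envelope check of both calculations, as a minimal Python sketch; the 700 MB / 80-minute and 650 MB capacities are the nominal figures from these slides:

    # 1x CD rate from a typical 80-minute, 700 MB CD-R
    rate_mbps = 700 * 8 / (80 * 60)        # MB * bits/byte / seconds
    print(f"1x CD rate ~ {rate_mbps:.2f} Mb/s")         # ~1.17 Mb/s

    # Playing time of a fully compliant 650 MB disc at the ~1.13 Mb/s rate
    minutes = 650 * 8 / 1.13 / 60
    print(f"VCD playing time ~ {minutes:.0f} minutes")  # ~77 minutes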
|
16
|
- Resolution (SIF):
- NTSC: 352×240
- PAL: 352×288
- Frame rate:
- NTSC: 29.97 frames/second
- PAL: 25 frames/second
- Audio: MP2 (MPEG-1 Audio Layer II)
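For a sense of how much compression the ~1.13 Mb/s channel demands, here is a rough sketch of the raw data rate at NTSC VCD resolution; the 12 bits/pixel figure assumes MPEG-1's 4:2:0 chroma subsampling:

    width, height, fps = 352, 240, 29.97   # NTSC VCD parameters above
    bits_per_pixel = 12                    # assumes 4:2:0 YUV (Y + subsampled U, V)

    raw_mbps = width * height * fps * bits_per_pixel / 1e6
    print(f"Raw rate ~ {raw_mbps:.0f} Mb/s")                  # ~30 Mb/s
    print(f"Required compression ~ {raw_mbps / 1.13:.0f}:1")  # ~27:1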
|
17
|
- Approximates VHS Quality
- (VHS Quality is fairly poor)
- Many movies are longer than 77 minutes
- No Digital Rights Management (DRM)
- Very easy to pirate
- Studios reluctant to release in this format
- Most DVD players will play VCDs
|
18
|
- People claim an amazing 150:1 compression ratio
- But I don’t accept that value
|
19
|
- Assumes video is ≈ 20 MB/s
- D1 video is over 30 MB/s
- 640×480 pixels × 30 frames/s × 3 components × 10 bits/component ≈ 34.6 MB/s
- More importantly, assumes that MPEG-1 is inexact (lossy) but not noticeably different from the original
- This is very far from the case
- Maybe that much data reduction…
|
20
|
- Digital Component Video
- This is the digital representation of broadcast quality video
- 640x480 (square pixel)
- 30 Frames a second
- 10 bit R, G, B components
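A quick check that these parameters give the "over 30 MB/s" figure quoted on the previous slide:

    # 640x480 pixels, 30 frames/s, 3 components at 10 bits each
    mb_per_second = 640 * 480 * 30 * 3 * 10 / 8 / 1e6
    print(f"D1-style raw video ~ {mb_per_second:.1f} MB/s")   # ~34.6 MB/s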
|
21
|
- JPEG begins showing artifacts at compression ratios above 10 to 20:1
- How does MPEG provide 7 times the compression?
- (it doesn’t need to)
|
22
|
- Can cheat with video
- It turns out that JPEG-type artifacts are much more difficult to see in moving images than in still images
- (Try pausing a DVD and notice how the apparent image quality suddenly drops)
|
23
|
- This effectively means that JPEG compression can be pushed significantly higher for video
- Perhaps 50:1 ratio before artifacts become noticeable
- Still, how does MPEG achieve a 2-3x improvement over MJPEG?
- (Motion JPEG encodes video as a stream of JPEG frames)
|
24
|
- So far we have explored image compression algorithms that make use of spatial
coherence
- Pixels tend to resemble their neighbors
- MPEG exploits temporal coherence
- Pixels from adjacent frames tend to be similar
- Or
- Most objects tend not to move in a scene
|
25
|
- Consider a “talking head” (newscaster)
- How much really changes in 1/30th second?
- Mouth moves
- Hand Waves
- Background remains the same
|
26
|
- Similar to a lossless audio CODEC
- Record the difference between each pixel in the present frame and the pixel at the same location in the previous frame
- Due to temporal coherence
- Differences will often be small
- Can be encoded in fewer bits
|
27
|
- Compare each pixel in the present frame to the pixel at the same location in the previous frame
- Only record the difference if the delta exceeds some threshold (see the sketch below)
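A minimal NumPy sketch of this thresholded difference scheme (not the actual MPEG algorithm); the frame arrays and the threshold value are illustrative:

    import numpy as np

    def delta_encode(prev_frame, cur_frame, threshold=4):
        """Record only the pixels whose change exceeds the threshold."""
        diff = cur_frame.astype(np.int16) - prev_frame.astype(np.int16)
        mask = np.abs(diff) > threshold      # temporal coherence -> few True entries
        return mask, diff[mask]              # small deltas encode in few bits

    def delta_decode(prev_frame, mask, deltas):
        frame = prev_frame.astype(np.int16)
        frame[mask] += deltas                # sub-threshold changes are simply dropped
        return frame.astype(np.uint8)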
|
28
|
- Technical events eliminate coherence:
- Scene cut, wipe, fade, etc.
- Zoom
- Pan
- These all ‘break’ temporal coherence for every pixel
|
29
|
- Defined as anything unnatural
- Camera changes, Computer Graphics/Titles, Music
- Used to keep TV interesting
- These events catch our attention
- One every 10 seconds when Jerry Mander, a former advertising executive, wrote Four Arguments for the Elimination of Television in 1978
- Now MTV-inspired media include roughly one technical event per second
|
30
|
- MPEG uses a more sophisticated delta-encoding scheme
- MPEG divides frames into blocks, then determines whether a given block moves between frames
|
31
|
- Frame encoded independently of any other frame
- Labeled I
- A Keyframe
- JPEG-like DCT lossy encoding
- Encodes to approximately 2 bits per pixel (24:1 compression ratio)
|
32
|
- Inter frames are dependent on other (I or P) frames
- Two varieties
- Frame dependent on the previous frame
- Labeled P for predicted
- Frame dependent on previous and future frames
- Labeled B for bi-directional
|
33
|
- Inter frames encode much better than intra frames
- Why not only use inter frames?
- (only use frames based on other frames)
|
34
|
- If every frame is dependent on all that came before it:
- Lossy compression
- Distortion would increase w/o bound
- Lossless compression
- Fragile, damage from dropped bits would accumulate/be uncorrectable
|
35
|
- Now that we know
- B, P Inter-frames encode better than Intra (I) frames
- Can’t only use Inter-frames
- Why do B frames encode better than P frames?
|
36
|
- From the previous example:
- Consider when a coffee mug is picked up (after a long while)
- The information for what is behind the mug isn’t in the previous
frames
- But is available in the future frames
- P frames are used to limit how far a B frame needs to ‘look’ for its
base
- How can the encoder predict the future?
|
37
|
- MPEG encode/decode is highly asymmetric
- MPEG encoding was originally imagined as an expensive/slow operation
- However, we now need to encode MPEG in real time for many applications
- The encoder actually encodes frames from the near past while accessing frames from the near future
- The exact frame latency depends on the MPEG data stream
|
38
|
- No specification on order or quantity of I, B, or P frames
- Ex: Can be all I frames
- No inter-frame dependency
- Similar to MJPEG
- Typically a GOP (Group of Pictures) might contain:
- I B B B P B B B P B B B P…
- (next GOP would begin with an I)
|
39
|
- Optimally encoding video as MPEG requires many decisions to be made
- One decision is on the mix of I, B, and P frames
- A carefully hand-optimized MPEG stream will make sure that I frames correspond to key frames in the video
|
40
|
- Video playback can only ‘start’ with an I frame
- The frame mix can influence playback at different rates
- (if the playback device isn’t fast enough to decode all frames faster than real time)
- Operations like reverse are complicated
|
41
|
- Makes editing rather complicated
- Consider implications for:
- Changing a frame
- Cutting a frame
- Adding frames of MPEG video
|
42
|
- Due to the presence of P and B frames
- Reference frames from the future need to be transmitted and decoded before the B frames that rely on them:
- Image: Distronics
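A hedged sketch of that reordering, assuming each B frame depends on the nearest I or P (anchor) frame on either side, so the future anchor is sent first; real streams signal the exact order:

    def decode_order(display_order):
        """Reorder a display-order GOP so anchors precede the B frames that use them."""
        out, pending_b = [], []
        for frame in display_order:
            if frame.startswith("B"):
                pending_b.append(frame)      # held until its future anchor is emitted
            else:                            # I or P anchor frame
                out.append(frame)
                out.extend(pending_b)
                pending_b = []
        return out + pending_b

    print(decode_order(["I1", "B2", "B3", "P4", "B5", "B6", "P7"]))
    # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']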
|
43
|
- Some ratio of B to P is decided on
- Say BBPBBPBBP
- Or BBBPBBBPBBBP
- Then I frames are sprinkled in every nth second
|
44
|
- This information is encoded into the data stream
- Is straightforward/inexpensive to decode
- Unfortunately it is a very difficult problem to encode
- Hence MPEG is a highly asymmetric CODEC
- (Far more computationally expensive to encode than decode)
|
45
|
- Frame divided into 16×16-pixel macroblocks composed of:
- 4 8×8 blocks of luminance (Y)
- 1 8×8 block each of U and V
- Image: Distronics.co.uk
|
46
|
- Each of the macroblock’s 6 blocks is transformed with an 8×8 DCT (64 coefficients, scanned out as a 1×64 sequence)
- The results are then quantized (filtered) by a Q table
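A small sketch of that transform-and-quantize step for one 8×8 block, using SciPy's DCT and a flat placeholder Q table (real MPEG Q tables weight high frequencies more heavily):

    import numpy as np
    from scipy.fftpack import dct

    def encode_block(block, qtable):
        """8x8 block -> 64 quantized DCT coefficients."""
        coeffs = dct(dct(block.astype(float), axis=0, norm="ortho"),
                     axis=1, norm="ortho")                  # 2-D DCT
        return np.round(coeffs / qtable).astype(int)        # quantization drops detail

    qtable = np.full((8, 8), 16)                # placeholder, not a real MPEG table
    block = np.random.randint(0, 256, (8, 8))
    print(encode_block(block, qtable))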
|
47
|
- Pixel values are estimated by relocating a block of pixels from a
previous or future frame
- Motion is encoded using a 2D vector
- Motion vectors are:
- Straightforward to decode
- Complicated/expensive to encode
|
48
|
|
49
|
- For each block B, the previous frame is searched for a similar block C
- The encoder records a vector describing the change of position:
- (Cx-Bx, Cy-By) = (Δx, Δy)
- Called the motion vector
|
50
|
- Frame Segmentation
- Search Threshold
- Distortion Metric
- Block Search
|
51
|
- Frame divided into blocks
- Blocks are equally sized
- Blocks do not overlap
- Implication
- Larger blocks are harder to match
- Smaller blocks create more motion vectors
|
52
|
- If the search finds a block such that the difference (distortion metric) between the blocks is ≤ the threshold
- Then they are considered matching
- The motion vector is output
|
53
|
- The computation used to compare blocks
- Conflicting goals: accuracy of the match vs. computational cost
|
54
|
- A likely candidate: Mean Absolute Difference (MAD)
- Calculates the average of the (absolute) pixel differences in a pair of blocks
- For instance, comparing each pixel Bij in block B with the corresponding pixel Cij in candidate block C (b×b blocks):
- MAD(B, C) = (1/b²) Σi Σj |Bij − Cij|
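As a sketch, the metric above in a few lines of NumPy (b×b blocks assumed):

    import numpy as np

    def mad(B, C):
        """Mean Absolute Difference between block B and candidate block C."""
        b = B.shape[0]
        return np.abs(B.astype(int) - C.astype(int)).sum() / (b * b)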
|
55
|
- Per block comparison:
- b² subtractions and absolute-value ops
- b² additions
- 1 division
- Calculated (in the worst case) for every candidate block
- The worst case often occurs (no matching block)
- For instance, when a new object enters the field of view
|
56
|
- Don’t search all blocks for comparison
- Only search neighboring blocks (see the sketch below)
- Progressive search criteria
- Use an inexpensive metric to reduce the pool of potential matching candidates
- Then use a better metric to make the final selection
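A minimal sketch of a limited-window search built on the MAD metric above; the search range and threshold are illustrative, and real encoders prune the search further using the strategies on this and the following slides:

    import numpy as np

    def find_motion_vector(prev_frame, block, bx, by, search_range=8, threshold=8.0):
        """Search +/- search_range pixels around (bx, by) in the previous frame."""
        b = block.shape[0]
        h, w = prev_frame.shape
        best_vec, best_cost = None, float("inf")
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                x, y = bx + dx, by + dy
                if 0 <= x <= w - b and 0 <= y <= h - b:
                    candidate = prev_frame[y:y + b, x:x + b]
                    cost = np.abs(block.astype(int) - candidate.astype(int)).mean()
                    if cost < best_cost:
                        best_vec, best_cost = (dx, dy), cost
        return best_vec if best_cost <= threshold else None   # None -> encode the block itself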
|
57
|
- Distance dilute search
- We know faster-moving objects appear blurry
- (even if each frame is sharp)
- Consequently we can relax the matching threshold at greater distances
(representing faster movement)
|
58
|
- Locality-Based Search
- Assumes that once a good match is found, even better matches may be in the same area
- Use an inexpensive metric to find a match
- Then use a better metric, and perhaps denser blocks, to find the optimal match
|
59
|
- Quadrant Monotonic Search
- A refinement of Locality
- Use a quadtree like algorithm to find and successively refine areas of
match
- (begin with large blocks and subdivide based on quality of match)
|
60
|
- Dependent Algorithms
- Assumes movement is the result of object or camera motion
- So neighboring blocks should have similar motion vectors
- Begins search assuming these vectors
|
61
|
- Multidimensional Search Space
- When searching, look for matches that are:
- Rotations
- Zooms
- Changes of illumination
- (not just translations)
- VERY computationally expensive
|
62
|
- Perhaps up to ½ of the blocks may be converted to motion vectors
- Important to compress these well
- They are correlated and non-uniform
- Neighboring blocks will likely have similar motion vectors
- I.e., they all represent the same object moving in the same way between frames
- This leads to efficient coding (see the sketch below)
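A tiny sketch of differentially coding one row of motion vectors; because neighbors tend to move together, the deltas cluster near zero and entropy-code well (the vectors here are made up):

    def diff_encode(vectors):
        """Encode each motion vector as a delta from its left neighbor."""
        prev, deltas = (0, 0), []
        for vx, vy in vectors:
            deltas.append((vx - prev[0], vy - prev[1]))
            prev = (vx, vy)
        return deltas

    print(diff_encode([(5, 2), (5, 2), (6, 2), (5, 3)]))
    # [(5, 2), (0, 0), (1, 0), (-1, 1)]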
|