1	Graphics File Format and Data Compression Techniques (take 2) Week 12
2	Administrivia Attendance
3	Penultimate Class: December 6th is our last class; finals are the week of December 17th; final exam time to be announced.
4	Optical Flow Mouse update: 1995 mouse prototype used linear arrays; 1999 Agilent ADNK-2610 optical mouse sensor: 2D image sensor; 1500 frames/s; 18x18 sensor; 400 CPI (counts per inch); speeds up to 12 IPS. http://cp.literature.agilent.com/litweb/pdf/5988-9774EN.pdf Used in most optical mice, including Microsoft’s and Apple’s.
5	Variable Compression: As we have seen, many variables affect compression ratios: nature of the input signal; ratio of B to P to I frames in the GOP; efficiency of estimators; quality table selection. The encoder needs to create a conforming stream at a given data rate.
6	Encoder rate control: Somehow the encoder must dynamically make tradeoffs to create a conforming MPEG stream. The stream must average a certain data rate and can’t diverge from this rate for too long. Some players (like my Sony S530D) have a special mode that shows the instantaneous data rate and provides statistics.
7	Rate Control: An encoder must watch the dynamic data rate and tune compression for future frames to keep within limits. This is a difficult control problem. It is another place where human operators control very sophisticated MPEG encoders for higher-quality results. At SGI we were creating software encoders, at a time when HW encoders had entered the market, to provide more control to humans.
8	Rate Control: Many hardware encoders suffer from poor rate control. They are intended to be fast (real-time); why else purchase HW? Consequently they never backtrack (re-encode). If a particularly difficult frame/scene appears they will give it far too many bits, and then the following scenes/frames suffer.
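The feedback idea behind one-pass rate control, and the way a hard frame starves the frames after it, can be sketched with a toy model. This is not any real encoder's algorithm: the function name `simulate_rate_control`, the assumption that a frame costs complexity / quantizer-scale bits, and all the numbers are hypothetical.

```python
def simulate_rate_control(complexities, target_bits_per_frame, gain=0.5):
    """Toy one-pass (no backtracking) rate controller.

    Hypothetical model: a frame of complexity c coded with quantizer
    scale q costs c / q bits. Real encoders use far richer models.
    """
    q = 1.0             # current quantizer scale (larger = coarser)
    buffer_error = 0.0  # bits spent above (+) or below (-) the target so far
    history = []        # (q used, bits spent) per frame
    for c in complexities:
        bits = c / q
        buffer_error += bits - target_bits_per_frame
        history.append((q, bits))
        # Feedback: coarsen quantization when over budget, refine when under.
        q *= 1.0 + gain * buffer_error / target_bits_per_frame
        q = min(max(q, 0.1), 100.0)  # keep q in a sane range
    return history, buffer_error


# An easy frame, one hard frame, then another easy frame.
history, error = simulate_rate_control([1000, 4000, 1000], 1000)
for q, bits in history:
    print(f"q={q:.2f} bits={bits:.0f}")
```

In this run the hard second frame is coded before the controller can react, so it takes 4000 bits; the third frame, identical in complexity to the first, is then coded with a coarser quantizer and gets only 400 bits.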
9	What MPEG doesn’t specify: MPEG defines the data stream and the decoder. It doesn’t specify how encoding is to be performed. The hard and undefined part of MPEG is the encoder. The optimal encoder is still a research problem!
10	What MPEG does specify: Session Layer; Video Layer; Audio Layer
11	MPEG Audio: MPEG specifies various ‘layers’ of audio, each with more refinement/complexity (MP3 is MPEG Layer 3 Audio). MPEG Audio is a lossy perceptual coder. Sony’s MiniDisc player was one of the 1st commercial applications of this technology. It took many years for these algorithms to catch on.
12	MPEG Audio CODEC Asymmetry: Perceptual models are used by the encoder. The decoder is designed to be very efficient, to run on inexpensive “low power” platforms.
13	Fraunhofer: 1987 began work on EUREKA perceptual audio codec; now standardized as ISO-MPEG Audio Layer 3
14	Audio Bitrates: Raw CD-quality PCM ≈ 1.4 Mb/s. MP3 provides 12:1 compression without perceptible loss; 24:1 and greater still provide better quality than reducing the sampling rate or quantization. Achieved by using models of human audio perception.
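The ≈ 1.4 Mb/s figure and the bitrates implied by the 12:1 and 24:1 ratios follow from simple arithmetic. The sketch below uses the standard CD parameters (44.1 kHz, 16 bits, 2 channels); the helper name `compressed_kbps` is only illustrative.

```python
SAMPLE_RATE_HZ = 44_100  # CD sampling rate
BITS_PER_SAMPLE = 16     # CD sample resolution
CHANNELS = 2             # stereo

# Raw PCM data rate in bits per second.
pcm_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def compressed_kbps(ratio):
    """Bitrate in kb/s after compressing CD PCM by the given ratio."""
    return pcm_bps / ratio / 1000


print(pcm_bps)              # 1411200 bits/s, i.e. about 1.4 Mb/s
print(compressed_kbps(12))  # about 117.6 kb/s
print(compressed_kbps(24))  # about 58.8 kb/s
```

The 12:1 result lands close to the 128-112 kbps range commonly quoted for Layer III.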
15	MPEG Audio layers: Three layers (differ in complexity and compression ability). Layer I: 384 kbps (stereo signal). Layer II: 256-192 kbps. Layer III: 128-112 kbps; popularized as MP3; very complicated; very high quality results.
16	Perceptual Audio CODEC on general CPU: I first began working with perceptual CODECs while at SGI in the early 1990s. Using the best MIPS processors (much higher floating-point performance than Intel), MPEG Audio encode took much longer than realtime, and MPEG Audio decode took almost 100% of the CPU.
17	MPEG Layer 3 Performance
18	MP3 Block Diagram Fraunhofer
19	MPEG Layer 1: 32 subbands; floating point. Used for: Philips Digital Compact Cassette (DCC). Data rate: 384 kb/s
20	MPEG Layer 2: Adds complexity; improves coding
21	MPEG Layer 3: Much more complicated; greatly improved coding
22	New: MP3 Surround: 5.1 channel surround; data rate comparable to stereo MP3
23	MP3 Concepts: (1) Minimal Audition Threshold; (2) Masking Effect; (3) Byte Reservoir; (4) Joint Stereo Coding; (5) Quantization; (6) Entropy Coding
24	1 Minimal Audition Threshold: The ear has a non-linear frequency response (Fletcher/Munson equal-loudness curves). Sounds at different frequencies must have different amplitudes to be perceived as having the same loudness. Consequently, sounds under the threshold do not have to be coded.
25	Absolute Threshold Curve Audio Anecdotes
26	Absolute Threshold Curve Model Curve can be approximated by this eq:
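The equation from the original slide did not survive in this text. A widely used approximation of the threshold in quiet, usually attributed to Terhardt, is assumed below; it may not be the exact form the slide showed. The function name `absolute_threshold_db_spl` is illustrative.

```python
import math


def absolute_threshold_db_spl(freq_hz):
    """Approximate threshold in quiet, in dB SPL, at freq_hz.

    Assumed form (Terhardt's approximation), with f in kHz:
        3.64 f^-0.8 - 6.5 exp(-0.6 (f - 3.3)^2) + 0.001 f^4
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


for f in (100, 1000, 3300, 15000):
    print(f, round(absolute_threshold_db_spl(f), 1))
```

The curve is high at low frequencies, dips to its minimum in the 3-4 kHz region where the ear is most sensitive, and rises steeply again toward the top of the audible range.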
27	2 Masking Effect: Strong sounds ‘mask’ the perception of sounds that are softer and slightly higher in frequency.
28	Masking in Vision: A bird flies in front of the sun. You don’t see the bird; it is masked by the much brighter sun.
29	Perceptual model of audio masking: Model described in Audio Anecdotes V1, “Auditory Masking in Audio Compression”
30	Critical Bands (Bark Scale) Audio Anecdotes V1
31	Critical Bands: Important information per band: absolute threshold for perception; relative threshold (how great a change is needed to be perceived).
32	Example: 1kHz Sinewave Audio Anecdotes V1
33	Example: 1kHz, 1.5kHz, 3kHz waves Audio Anecdotes V1
34	Example: Notice the Bark band excitation: unchanged by the 50 dB 1.5 kHz addition; changed by the 50 dB 3 kHz addition (the 1.5 kHz signal was masked by the 1 kHz signal).
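To get a rough feel for why the 1.5 kHz tone sits closer to the 1 kHz masker than the 3 kHz tone does, frequencies can be mapped onto the Bark scale. The sketch below assumes the Zwicker and Terhardt analytic approximation; the slides do not say which mapping the figures used, and distance in Bark is only part of the story, since the levels of the tones matter too.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Zwicker/Terhardt formula)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


masker = hz_to_bark(1000)
for f in (1000, 1500, 3000):
    z = hz_to_bark(f)
    print(f"{f} Hz -> {z:.1f} Bark ({z - masker:.1f} Bark from the 1 kHz tone)")
```

With this mapping the 1.5 kHz tone lies under 3 Bark from the masker, while the 3 kHz tone lies roughly 7 Bark away.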
35	Example 2: Typical Music Spectrum Audio Anecdotes V1
36	Example 2: Typical Music Spectrum: Block of samples from pop music; spectrum derived by a length-4096 FFT; corresponding masking thresholds (dark line). Notice: ≈ 1/2 of the frequency components fall below the threshold. Those don’t have to be coded.
37	Temporal masking considerations: An otherwise masked signal (like the 1.5 kHz signal from our example) might be audible if it extends in time beyond the masking signal by more than about 5-10 ms before or 20-80 ms after.
38	Masking as Compression: Masking typically eliminates ½ to 2/3 of coefficients. This step alone results in 2:1 to 3:1 compression.
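The step from "fraction of coefficients eliminated" to "compression ratio" is just 1 / (1 - fraction). A minimal check, assuming every coefficient costs the same number of bits and ignoring any side information needed to signal which coefficients were dropped:

```python
from fractions import Fraction


def ratio_from_dropped_fraction(dropped):
    """Compression ratio when a fraction `dropped` of coefficients is removed."""
    return 1 / (1 - dropped)


print(ratio_from_dropped_fraction(Fraction(1, 2)))  # 2, i.e. 2:1
print(ratio_from_dropped_fraction(Fraction(2, 3)))  # 3, i.e. 3:1
```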
39	Codecs that use Masking: MP3 (MPEG); WMA (Microsoft); AC-3 (Dolby); ATRAC (Sony MiniDisc); AAC (MPEG-4, used by Apple)
40	3 Byte Reservoir: MP3 is a fixed-rate codec. Some passages (of music) might not encode well at that rate (lose quality). The byte reservoir (buffer) allows these passages to be encoded at a higher bit rate. The reservoir is ‘refilled’ by encoding other, easier content at a correspondingly lower bit rate (that is, compressing it more).
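The reservoir can be pictured as a small savings account of bits. The sketch below is a simplified illustration, not the MP3 bitstream mechanism itself: the function name `allocate_with_reservoir`, the per-frame "demand" figures, and the reservoir cap are all hypothetical.

```python
def allocate_with_reservoir(demands, frame_budget, reservoir_cap):
    """Return the bits granted to each frame under a fixed average rate.

    Hypothetical model: frame i 'wants' demands[i] bits. Easy frames that
    want less than frame_budget bank the surplus (up to reservoir_cap);
    hard frames may draw on whatever has been banked so far.
    """
    reservoir = 0
    granted = []
    for want in demands:
        available = frame_budget + reservoir
        give = min(want, available)
        granted.append(give)
        # Unused bits are saved for later frames, up to the cap.
        reservoir = min(available - give, reservoir_cap)
    return granted


demands = [600, 600, 1600, 1000]
print(allocate_with_reservoir(demands, 1000, 800))  # [600, 600, 1600, 1000]
print(allocate_with_reservoir(demands, 1000, 0))    # [600, 600, 1000, 1000]
```

With the reservoir, the hard third frame gets the 1600 bits it wants while the total stays within the fixed budget of 4 x 1000 bits; without it, that frame is capped at 1000 bits.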
41	MP3 refinements: MP3 uses a Modified Discrete Cosine Transform (MDCT): better frequency resolution; poorer time resolution; errors spread over a longer time (leads to the pre-echo phenomenon). The encoder identifies and better codes the conditions that lead to pre-echo, temporarily increasing the number of quantization levels.
42	4 Joint Stereo Coding: People cannot accurately spatialize very low (or high) frequencies. Consider the subwoofer in a 5.1 surround sound system: if, say, an explosion comes from behind, the low-frequency boom appears to come from the rear speakers (and not the subwoofer).
43	Joint Stereo Coding cont.: Some sounds that are present in both channels are encoded monophonically.
44	Mid/Side (M/S) Stereo: Often in music the stereo channels are very similar: the producer will pan the guitar a little to one channel and the vocals a little to the other. So encode a middle channel (L+R) and a side channel (L-R). The more similar the channels, the fewer bits are needed to encode the side channel.
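The mid/side transform and its inverse can be sketched on integer samples. This is a minimal illustration using the plain sum and difference named on the slide; real codecs also scale the two channels, which is omitted here, and the sample values are made up.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (L+R) and side (L-R) channels."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right: L = (M+S)/2, R = (M-S)/2 (exact for integers)."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right


# Two nearly identical channels (hypothetical sample values).
left = [100, -50, 30, 7]
right = [98, -51, 33, 7]
mid, side = ms_encode(left, right)
print(mid)   # [198, -101, 63, 14]
print(side)  # [2, 1, -3, 0]
print(ms_decode(mid, side) == (left, right))  # True
```

Because the channels are nearly the same, the side channel holds only small values, which is what makes it cheap to code.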
45	5-6 Quantization, Entropy Coding: Quantization: frequency coefficients are divided by perceptually inspired values. Entropy coding: squeezes a little more compression out. An interesting complement to perceptual coding: in polyphony, much masking, little entropy; in pure tones, little masking, much entropy.
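Both steps can be sketched in a few lines: divide each coefficient by a step size and round, then measure how many bits per symbol an ideal entropy coder would need for the result. The coefficient values and step sizes below are made up for illustration; in a real encoder the steps come from the perceptual model.

```python
import math
from collections import Counter


def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to an integer."""
    return [round(c / s) for c, s in zip(coeffs, steps)]


def entropy_bits_per_symbol(symbols):
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


coeffs = [40.0, 12.0, 3.0, 1.0, -0.4, 0.2, 0.1, -0.1]  # hypothetical
steps = [4, 4, 4, 4, 8, 8, 8, 8]                       # coarser at the end
q = quantize(coeffs, steps)
print(q)  # [10, 3, 1, 0, 0, 0, 0, 0]
print(round(entropy_bits_per_symbol(q), 2))
```

The run of zeros produced by the coarse steps is what lets the entropy coder get below the 3 bits per symbol that eight distinct values would need.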
46	Other Audio Codecs: CELP; Voxware; MPEG-4 Structured Audio
47	Voice CODECs: The human voice has properties conducive to compression. Sound is created by passing air over the vocal cords (high pressure, high volume). Pitch is controlled by the shape of the vocal tract, mostly by tongue and lip placement.
48	Things change slowly: Humans are slow compared to computers or audio sample rates. Humans don’t change speech parameters (vocal tract, lung pressure) quickly. Consequently the sound (pitch, volume) changes slowly.
49	Vocal cords have a fundamental frequency: Consequently speech tends to have a fundamental frequency F₀
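One simple way to see a fundamental frequency in a signal is to look for the lag at which the signal best matches a shifted copy of itself. The sketch below uses plain autocorrelation on a synthetic, voice-like test tone; the function name `estimate_f0`, the search range, and the test signal are illustrative assumptions, and real speech needs more robust methods.

```python
import math


def estimate_f0(samples, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) by autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the signal with itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic tone: 200 Hz fundamental plus a weaker second harmonic.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * i / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 400 * i / sample_rate)
          for i in range(800)]
print(estimate_f0(signal, sample_rate))  # 200.0
```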
50	CELP Text Codec: Code Excited Linear Prediction. Developed by Bell Labs. Low delay: < 5 ms. 1992: standardized as G.728; uses a 5-sample frame; < 2 ms latency. Very low bitrate: 4.8 kb/s
51	CELP cont.: Analysis-by-synthesis time-domain algorithm. Models the vocal tract by a linear prediction filter. Uses an excitation signal as input to the filter.
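The three ideas on this slide (a linear prediction filter, an excitation signal fed into it, and analysis by synthesis) can be sketched together. This is a bare illustration, not G.728 or any standardized coder: gain terms and perceptual weighting are left out, and the function names, filter coefficient, and tiny codebook are hypothetical.

```python
def lp_synthesize(excitation, coeffs):
    """All-pole filter: s[n] = e[n] + sum_k coeffs[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                s += a * out[n - 1 - k]
        out.append(s)
    return out


def best_codebook_entry(target, codebook, coeffs):
    """Analysis by synthesis: synthesize each candidate excitation and
    return the index of the one closest to the target (squared error)."""
    def error(candidate):
        synth = lp_synthesize(candidate, coeffs)
        return sum((t - s) ** 2 for t, s in zip(target, synth))
    return min(range(len(codebook)), key=lambda i: error(codebook[i]))


coeffs = [0.5]  # hypothetical one-tap "vocal tract" filter
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1]]
print(lp_synthesize(codebook[0], coeffs))  # [1, 0.5, 0.25, 0.125]

target = lp_synthesize(codebook[2], coeffs)
print(best_codebook_entry(target, codebook, coeffs))  # 2
```

Only the codebook index (plus the filter parameters) would need to be transmitted, which is where the low bitrate comes from.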
52	MIDI as Music Compression: .MID files. Realtime, low latency. Records note-on and note-off messages (events). Most messages are very small: 4-bit channel number; 4-bit opcode; 1-2 data bytes.
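The message layout described above packs the opcode into the high four bits of a status byte and the channel into the low four bits, followed by the data bytes. A minimal sketch for note-on and note-off messages; the helper names `channel_message` and `parse_status` are illustrative.

```python
NOTE_OFF = 0x8  # opcode for a note-off message
NOTE_ON = 0x9   # opcode for a note-on message


def channel_message(opcode, channel, *data):
    """Build a MIDI channel message: one status byte plus 1-2 data bytes."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must fit in 4 bits")
    if any(not 0 <= d <= 127 for d in data):
        raise ValueError("data bytes must be 0-127")
    return bytes([(opcode << 4) | channel, *data])


def parse_status(status):
    """Split a status byte into (opcode, channel)."""
    return status >> 4, status & 0x0F


# Middle C (note 60) at velocity 100 on the first channel, then released.
on = channel_message(NOTE_ON, 0, 60, 100)
off = channel_message(NOTE_OFF, 0, 60, 0)
print(on.hex(), off.hex())  # 903c64 803c00
print(parse_status(on[0]))  # (9, 0)
```

A played note therefore costs six bytes in total, however long it sounds.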
53	MIDI Playback: MIDI messages are interpreted by a synthesizer. General MIDI specifies the instruments a synthesizer must emulate. MIDI messages are presented at the correct time by a sequencer.
54	Incredible compression: MIDI can encode a long musical performance as a relatively tiny number of messages (Mozart’s Eine kleine Nachtmusik in 34kb), compared to the huge size of PCM audio samples encoding the same performance. MIDI synthesizers can be arbitrarily high quality, i.e. synthesize instruments at high sample rates and quantization.
55	MIDI issues: MIDI messages (note on, note off) best encode keyboard performance. It is difficult to encode ‘articulation’.
56	MIDI Encoder: Mainly: Author Only. Generally MIDI is created directly by the composer using a sequencer application (entered as a musical score or directly as MIDI events), or captured from a performance on a MIDI-instrumented instrument (MIDI keyboard; string instrument with MIDI).
57	PCM to MIDI encoder: It is a very difficult research problem to take an arbitrary PCM recording of polyphonic music, determine the instruments played, determine the gestures on each instrument (used to create the music), and output MIDI. Monophonic encoders work pretty well. Polyphonic encoders exist but leave a lot to be desired.
58	Audio Scene Analysis: Field created by Al Bregman. Described in Audio Anecdotes. Deals with analyzing arbitrary sounds and determining how they were created.