1
Graphics File Format and
Data Compression Techniques
(take 2) Week 12
2
Administrivia
  • Attendance
3
Penultimate Class
  • December 6th is our last class
  • Finals week of December 17th
    • Final Exam time to be announced
4
Optical Flow
  • Mouse update:
    • 1995 mouse prototype used linear arrays
    • 1999 Agilent ADNK-2610 optical mouse sensor
      • 2D image sensor
        • 1500 frames/s, 18x18 sensor
        • 400 CPI (counts per inch)
        • Speeds up to 12 IPS (inches per second)
        • http://cp.literature.agilent.com/litweb/pdf/5988-9774EN.pdf
      • Used in most optical mice including Microsoft’s and Apple’s
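The slides do not say how the sensor turns successive frames into motion counts. A common approach, assumed here rather than taken from the Agilent datasheet, is block matching: try every small shift of the new frame against the previous one and keep the shift with the smallest mean absolute difference. A minimal NumPy sketch on 18x18 frames (the function name `estimate_shift` and the ±3 pixel search range are illustrative choices):

```python
import numpy as np

def estimate_shift(prev, curr, max_shift=3):
    """Return (dy, dx) such that curr[y, x] best matches prev[y - dy, x - dx]."""
    h, w = prev.shape
    best_shift, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Compare only the region where the two frames overlap after shifting
            c = curr[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
            p = prev[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
            err = np.mean(np.abs(c - p))
            if err < best_err:
                best_shift, best_err = (dy, dx), err
    return best_shift

rng = np.random.default_rng(1)
prev = rng.random((18, 18))                        # stand-in for surface texture
curr = np.roll(prev, shift=(1, -2), axis=(0, 1))   # image content shifted 1 pixel down, 2 left
print(estimate_shift(prev, curr))                  # (1, -2)
```

At 1500 frames/s, even 12 inches per second is only 0.008 inch of travel per frame, which is why a small search range can be enough.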
5
Variable Compression
  • As we have seen, many variables affect compression ratios
    • Nature of input signal
    • Ratio of B to P to I frames in the GOP
    • Efficiency of motion estimators
    • Quality table selection

  • The encoder needs to create a conforming stream of a given data rate
6
Encoder rate control
  • Somehow the encoder must dynamically make tradeoffs to create a conforming MPEG stream
    • Stream must average a certain data rate
    • Can’t diverge from this rate for too long


  • Some players (like my Sony S530D) have a special mode that shows the instantaneous data rate and provides statistics
7
Rate Control
  • An encoder must watch the dynamic data rate and tune compression for future frames to keep within limits
  • This is a difficult control problem
  • This is another area where human operators tune very sophisticated MPEG encoders for higher-quality results
  • At SGI we built software encoders, at a time when hardware encoders were already on the market, to give human operators more control
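One way to picture the feedback loop is a toy simulation. The sketch below is loosely modeled on the virtual-buffer idea in MPEG-2 Test Model 5, heavily simplified: the quantizer scale rises as a virtual buffer fills. The cost model `bits = complexity / q` and the function name are hypothetical; real encoders measure actual coded sizes.

```python
def simulate_rate_control(complexities, target_bits, q_min=1.0, q_max=31.0):
    """Toy one-pass rate control: quantizer scale follows virtual-buffer fullness.

    Hypothetical cost model: a frame of the given complexity coded with
    quantizer scale q produces complexity / q bits.
    """
    reaction = 2.0 * target_bits      # buffer level that maps to q_max
    fullness = target_bits            # start with one frame's worth buffered
    qs, sizes = [], []
    for c in complexities:
        # Fuller buffer -> coarser quantizer -> fewer bits for this frame
        q = min(q_max, max(q_min, q_max * fullness / reaction))
        bits = c / q
        # Coded bits enter the buffer; the channel drains target_bits per frame
        fullness = max(0.0, fullness + bits - target_bits)
        qs.append(q)
        sizes.append(bits)
    return qs, sizes

# Easy content for 50 frames, then content three times harder
qs, sizes = simulate_rate_control([1e6] * 50 + [3e6] * 50, target_bits=1e5)
print(round(qs[49], 1), round(qs[-1], 1))   # quantizer coarsens after the scene gets harder
print(round(sizes[-1]))                     # frame size settles back near the target
```

Right after the change, one frame overshoots badly before the buffer pushes the quantizer up; that lag is part of what makes rate control a hard control problem.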
8
Rate Control
  • Many hardware encoders suffer from poor rate control
    • They are intended to be fast (real-time)
      • Why else purchase HW?
    • Consequently they never backtrack (re-encode)
    • If a particularly difficult frame/scene appears they will give it far too many bits
    • Then the following scenes/frames suffer
9
What MPEG doesn’t specify
  • MPEG defines the data stream and the decoder
  • Doesn’t specify how encoding is to be performed
  • The hard, unspecified part of MPEG is the encoder
    • Optimal encoder is still a research problem!
10
What MPEG does specify
  • Systems Layer
  • Video Layer
  • Audio Layer
11
MPEG Audio
  • MPEG specifies various ‘layers’ of audio
    • Each with more refinement/complexity
    • (MP3 is MPEG Layer 3 Audio)
  • MPEG Audio is a lossy perceptual coder
  • Sony’s MiniDisc player was one of the first commercial applications of this technology
  • It took many years for these algorithms to catch on
12
MPEG Audio CODEC Asymmetry
  • Perceptual models are used only by the encoder
  • Decoder is designed to be very efficient
    • Can run on inexpensive “low power” platforms
13
Fraunhofer
  • 1987: began work on a perceptual audio codec under the EUREKA program
  • Now standardized as ISO-MPEG Audio Layer 3
14
Audio Bitrates
  • Raw CD Quality PCM ≈ 1.4Mb/s
  • MP3 provides:
    • 12:1 compression without perceptible loss
    • 24:1 and higher still sound better than simply reducing the sampling rate or quantization
  • Achieved by using models of human audio perception
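The 1.4Mb/s figure follows from CD audio's parameters (44.1kHz sampling, 16 bits per sample, two channels); the same arithmetic gives the bit rates implied by the compression ratios above:

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd_rate = pcm_bitrate(44_100, 16, 2)
print(cd_rate)                       # 1411200
for ratio in (12, 24):
    print(f"{ratio}:1 -> {cd_rate / ratio / 1000:.1f} kb/s")
# 12:1 -> 117.6 kb/s
# 24:1 -> 58.8 kb/s
```

So 12:1 corresponds to roughly 118kb/s, close to the common 128kb/s MP3 rate.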
15
MPEG Audio layers
  • Three layers (differ in complexity and compression ability)
    • Layer I: 384kbps (stereo signal)
    • Layer II: 256-192kbps
    • Layer III: 128-112kbps
      • Popularized as MP3
      • Very complicated
      • Very high quality results
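Dividing the CD PCM rate by each layer's typical stereo bit rate shows how much more compression the later layers deliver: roughly 3.7:1 for Layer I, about 5.5:1 to 7.4:1 for Layer II, and about 11:1 to 12.6:1 for Layer III. A short sketch of the arithmetic:

```python
CD_PCM_BPS = 44_100 * 16 * 2          # 1,411,200 b/s for stereo CD audio

def compression_ratio(coded_kbps):
    """Ratio of the raw CD PCM rate to a coded bit rate given in kb/s."""
    return CD_PCM_BPS / (coded_kbps * 1000)

# Typical stereo rates from the slide
typical_rates_kbps = {"Layer I": (384,), "Layer II": (256, 192), "Layer III": (128, 112)}
for layer, rates in typical_rates_kbps.items():
    print(layer, [round(compression_ratio(r), 2) for r in rates])
```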
16
Perceptual Audio CODEC on general CPU
  • I first began working with perceptual CODECs while at SGI in the early 1990s
  • Using the best MIPS processors (much higher floating-point performance than Intel's)
    • MPEG Audio encoding took much longer than real time
    • MPEG Audio decoding took almost 100% of the CPU
17
MPEG Layer 3 Performance
18
MP3 Block Diagram
  • [Figure: MP3 block diagram; source: Fraunhofer]
19
MPEG Layer 1
  • 32 Subbands
  • Floating point
  • Used for:
    • Philips Digital Compact Cassette (DCC)
    • Data rate: 384kb/s
20
MPEG Layer 2
  • Adds complexity
  • Improves Coding
21
MPEG Layer 3
  • Much more complicated
  • Greatly improved Coding
22
New: MP3 Surround
  • 5.1 Channel Surround
  • Data rate comparable to stereo MP3
23
MP3 Concepts
  • Minimal Audition Threshold
  • Masking Effect
  • Byte Reservoir
  • Joint Stereo Coding
  • Quantization
  • Entropy Coding
24
1 Minimal Audition Threshold
  • Ear has non-linear frequency response
    • Fletcher-Munson equal-loudness curves
    • Sounds at different frequencies must have different amplitudes to be perceived as equally loud
  • Consequently:
    • Sounds below the threshold do not have to be coded
25
Absolute Threshold Curve
  • [Figure: absolute threshold curve; source: Audio Anecdotes]
26
Absolute Threshold Curve Model
  • Curve can be approximated by this eq:
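The equation on the original slide did not survive in these notes. A widely used approximation of the threshold in quiet, due to Terhardt and common in the perceptual-coding literature, is sketched below; it may differ from the exact form on the slide. Here f is in Hz and the result is in dB SPL:

```python
import math

def absolute_threshold_db(f_hz):
    """Terhardt's approximation of the threshold in quiet (dB SPL), f_hz in Hz.

    T(f) = 3.64 (f/1000)^-0.8 - 6.5 exp(-0.6 (f/1000 - 3.3)^2) + 1e-3 (f/1000)^4
    """
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4

for f in (100, 1000, 3300, 16000):
    print(f, round(absolute_threshold_db(f), 1))
```

It gives roughly 23dB at 100Hz, about 3.4dB at 1kHz, a dip to about -5dB near 3.3kHz, and a steep rise (about 66dB) by 16kHz, matching the general shape of the absolute threshold curve.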
27
2 Masking Effect
  • Strong sounds ‘mask’ the perception of sounds that are
    • Softer
    • Slightly higher in frequency
28
Masking in Vision
  • Bird flies in front of the sun
    • You don’t see the bird
    • It is masked by the much brighter sun
29
Perceptual model of audio masking
  • Model described in Audio Anecdotes V1: Auditory Masking in Audio Compression
30
Critical Bands (Bark Scale)
  • [Figure: critical bands (Bark scale); source: Audio Anecdotes V1]
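Zwicker and Terhardt's closed-form approximations map frequency in Hz to critical-band rate in Bark and give the approximate width of the critical band around a frequency. A sketch (function names are illustrative):

```python
import math

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Zwicker & Terhardt approximation of the critical bandwidth around f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

for f in (100, 1000, 15500):
    print(f, round(hz_to_bark(f), 2), round(critical_bandwidth_hz(f)))
```

This gives about 8.5 Bark at 1kHz and roughly 24 Bark at 15.5kHz, consistent with the usual count of about 24 critical bands; the band around 1kHz comes out roughly 160Hz wide.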
31
Critical Bands
  • Important information per band:
    • Absolute threshold for perception
    • Relative threshold
      • How large a change must be to be perceived


32
Example: 1kHz Sinewave
  • [Figure: 1kHz sine wave example; source: Audio Anecdotes V1]
33
Example: 1kHz, 1.5kHz, 3kHz waves
  • [Figure: 1kHz, 1.5kHz, and 3kHz waves; source: Audio Anecdotes V1]
34
Example
  • Notice the Bark band excitation:
    • Unchanged when a 50dB 1.5kHz tone is added
    • Changed when a 50dB 3kHz tone is added
  • (The 1.5kHz tone was masked by the 1kHz tone)
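Why the 1.5kHz tone disappears while the 3kHz tone does not can be sketched with a spreading function over Bark distance. The sketch below assumes the Schroeder et al. spreading function and the Zwicker-Terhardt Bark formula; it is not necessarily the model behind the Audio Anecdotes figures, and it ignores the extra offsets real coders subtract:

```python
import math

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder et al. spreading function in dB; dz = maskee Bark - masker Bark."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

masker = hz_to_bark(1000)
for f in (1500, 3000):
    # How far below the masker's level its spread excitation falls at frequency f
    print(f, round(spreading_db(hz_to_bark(f) - masker), 1))
```

The masker's spread excitation is only about 18.5dB down at 1.5kHz but about 61dB down at 3kHz, so a tone at 1.5kHz is far easier to hide. The function is also shallower for positive `dz`, matching the observation that masking reaches further toward higher frequencies.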
35
Example 2: Typical Music Spectrum
  • [Figure: typical music spectrum; source: Audio Anecdotes V1]
36
Example 2: Typical Music Spectrum
  • Block of samples from pop music
  • Spectrum derived by a 4096-point FFT
  • Corresponding masking thresholds (dark line)
  • Notice:
    • ≈ 1/2 of the frequency components fall below the threshold
    • Those don’t have to be coded
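Once a masking threshold is available per frequency bin, deciding which components need coding is a comparison. A NumPy sketch; computing the threshold itself (the hard part) is not shown, and the toy arrays below are made up for illustration:

```python
import numpy as np

def spectrum_db(block):
    """Uncalibrated magnitude spectrum in dB of one Hann-windowed block."""
    windowed = block * np.hanning(len(block))
    return 20.0 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)

def audible_components(spec_db, threshold_db):
    """True where a component reaches the masking threshold; also returns the kept fraction."""
    keep = spec_db >= threshold_db
    return keep, float(keep.mean())

# A 4096-sample block gives 2049 non-negative-frequency bins
spec = spectrum_db(np.random.default_rng(0).standard_normal(4096))
print(spec.shape)  # (2049,)

# Toy spectrum and threshold (dB), made up for illustration
keep, frac = audible_components(np.array([10.0, 0.0, 20.0, -5.0]), np.full(4, 5.0))
print(frac)        # 0.5: only bins 0 and 2 need coding
```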
37
Temporal masking considerations
  • An otherwise masked signal
    • (like the 1.5kHz signal from our example)
  • Might be audible if
    • The signal extends in time beyond the masking signal by more than about
      • 5-10ms before it
      • 20-80ms after it
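The timing rule above can be written as a simple check. This toy sketch treats masking as an on/off window and uses the short ends of the ranges on the slide (5ms before, 20ms after) as conservative defaults; real temporal masking decays gradually, and the function name is hypothetical:

```python
def may_be_audible(sig_start_ms, sig_end_ms, mask_start_ms, mask_end_ms,
                   pre_ms=5.0, post_ms=20.0):
    """True if the signal sticks out of the masker's time window extended by pre/post masking."""
    window_start = mask_start_ms - pre_ms    # pre-masking reaches a little before the masker
    window_end = mask_end_ms + post_ms       # post-masking lasts longer after it ends
    return sig_start_ms < window_start or sig_end_ms > window_end

print(may_be_audible(8, 115, 10, 100))    # False: within 5ms before and 20ms after
print(may_be_audible(10, 130, 10, 100))   # True: extends 30ms past the masker
```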
38
Masking as Compression
  • Masking typically eliminates
    • ½ - 2/3 of coefficients
  • This step alone results in
    • 2:1 to 3:1 compression
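The link between the two ranges is simple arithmetic, assuming the surviving coefficients cost as many bits as before and ignoring side information. If a fraction p of the N coefficients is dropped, the compression ratio is:

```latex
R = \frac{N}{(1 - p)\,N} = \frac{1}{1 - p},
\qquad p = \tfrac{1}{2} \Rightarrow R = 2,
\qquad p = \tfrac{2}{3} \Rightarrow R = 3
```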
39
Codecs that use Masking
  • MP3 (MPEG)
  • WMA (Microsoft)
  • AC-3 (Dolby)
  • ATRAC (Sony MiniDisc)
  • AAC (MPEG-4 used by Apple)
40
3 Byte Reservoir
  • MP3 is a fixed rate codec
  • Some passages (of music) might not encode well at that rate (they would lose quality)
  • The byte reservoir (a buffer) allows these passages to be encoded at a higher bit rate
    • The reservoir is ‘refilled’ by coding other, easier content at a correspondingly lower bit rate
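A toy model of the reservoir, in bytes. Following how MP3 uses it, bytes a frame leaves unused can be spent by later frames, up to a cap (511 bytes is the limit usually cited for MPEG-1 Layer III). The per-frame demands in the example are made up:

```python
def reservoir_allocate(demands, budget_per_frame, max_reservoir):
    """Grant each frame min(demand, budget + saved bytes); unused bytes carry forward up to the cap."""
    reservoir = 0
    granted = []
    for demand in demands:
        available = budget_per_frame + reservoir
        used = min(demand, available)
        granted.append(used)
        reservoir = min(max_reservoir, available - used)
    return granted

demands = [300, 300, 600, 300]                 # hypothetical bytes wanted per frame
print(reservoir_allocate(demands, 400, 511))   # [300, 300, 600, 300]
print(reservoir_allocate(demands, 400, 0))     # [300, 300, 400, 300]: without a reservoir the hard frame is starved
```

The stream still averages 400 bytes per frame; the reservoir only moves bytes from easy frames to the hard one that follows them.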
41
MP3 refinements
  • MP3 uses a Modified Discrete Cosine Transform (MDCT)
    • Better frequency resolution
    • Poorer time resolution
    • Errors spread over longer time
      • (Leads to pre-echo phenomena)
  • The encoder identifies conditions that lead to pre-echo and codes them better
    • Temporarily increases number of quantization levels
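A direct (slow, O(N²)) MDCT with a sine window shows why the transform works: consecutive blocks overlap by half, and overlap-adding their inverses cancels the time-domain aliasing exactly. It also shows where pre-echo comes from, since each coefficient spans all 2N samples of its block, so quantization error in it spreads across that whole span. In MP3 the MDCT is applied to the outputs of the 32-band polyphase filter bank and switches between long and short blocks; this standalone sketch omits both, and N=18 merely echoes MP3's long-block size:

```python
import numpy as np

def sine_window(N):
    # Sine window over 2N samples; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))

def mdct_kernel(N):
    # C[k, n] = cos(pi/N * (n + 0.5 + N/2) * (k + 0.5)), shape (N, 2N)
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(block, N):
    # 2N windowed time samples in, N frequency coefficients out
    return mdct_kernel(N) @ (sine_window(N) * block)

def imdct(coeffs, N):
    # N coefficients in, 2N time-aliased samples out; the 2/N scale pairs with the window above
    return sine_window(N) * (mdct_kernel(N).T @ coeffs) * (2.0 / N)

def roundtrip(x, N):
    # 50%-overlapped blocks; overlap-add cancels the aliasing (TDAC)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N + (-len(x)) % N)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        out[start:start + 2 * N] += imdct(mdct(padded[start:start + 2 * N], N), N)
    return out[N:N + len(x)]

x = np.random.default_rng(0).standard_normal(1000)
print(np.allclose(roundtrip(x, 18), x))   # True: perfect reconstruction without quantization
```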
42
4 Joint Stereo Coding
  • People cannot accurately localize very low (or very high) frequencies
    • Consider the subwoofer in a 5.1 surround sound system
    • If, say, an explosion comes from behind
    • Low frequency boom appears to come from the rear speakers (and not the subwoofer)
43
Joint Stereo Coding cont.
  • Some sounds that are present in both channels
  • Are encoded monophonically
44
Mid/Side (M/S) Stereo
  • Often in music stereo channels are very similar
    • Producer will pan guitar a little to one channel
    • And vocals a little to the other
  • The encoder then codes
    • A middle channel: (L+R)
    • And a side channel: (L-R)

  • The more similar the channels,
    • The fewer bits are needed to encode the side channel
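A NumPy sketch of the transform as defined on the slide (M = L+R, S = L-R) and its inverse. The two "instruments" are synthetic sine waves panned slightly in opposite directions, mimicking the guitar and vocals example; function names are illustrative:

```python
import numpy as np

def ms_encode(left, right):
    return left + right, left - right          # M = L + R, S = L - R

def ms_decode(mid, side):
    return (mid + side) / 2, (mid - side) / 2  # L = (M + S) / 2, R = (M - S) / 2

fs = 44_100
t = np.arange(4410) / fs
guitar = np.sin(2 * np.pi * 440 * t)
vocals = np.sin(2 * np.pi * 660 * t)
left = 0.6 * guitar + 0.4 * vocals      # guitar panned a little left
right = 0.4 * guitar + 0.6 * vocals     # vocals panned a little right

mid, side = ms_encode(left, right)
print(np.sum(side ** 2) / np.sum(mid ** 2))   # small: the side channel carries little energy
```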
45
5-6 Quantization, Entropy Coding
  • Quantization
    • Frequency coefficients are divided by perceptually inspired values
  • Entropy Coding
    • To squeeze a little more compression out
    • An interesting complement to perceptual coding:
      • In polyphonic music: much masking, little gain from entropy coding
      • In pure tones: little masking, much gain from entropy coding
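A small sketch of both steps. The step sizes stand in for perceptually derived values (here simply constant), and zeroth-order entropy is used as a rough lower bound on what a simple entropy coder could achieve; MP3's actual non-uniform quantizer and Huffman tables are not reproduced. The sparse spectrum of a pure tone quantizes to mostly zeros and has low entropy, while a busier spectrum does not:

```python
import math
from collections import Counter

def quantize(coeffs, step_sizes):
    """Divide each coefficient by its step size and round to an integer index."""
    return [round(c / s) for c, s in zip(coeffs, step_sizes)]

def entropy_bits(symbols):
    """Zeroth-order Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(k / n * math.log2(k / n) for k in counts.values())

steps = [4.0] * 32                          # hypothetical constant step sizes
tone = [0.0] * 32
tone[5] = 100.0                             # one strong spectral line
busy = [4.0 * (i % 8) for i in range(32)]   # energy spread over many bins

print(entropy_bits(quantize(tone, steps)))  # about 0.2 bits/coefficient
print(entropy_bits(quantize(busy, steps)))  # 3.0 bits/coefficient
```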
46
Other Audio Codecs
  • CELP
  • Voxware
  • MPEG-4 Structured Audio
47
Voice CODECS
  • Human voice has properties conducive to compression:
    • Sound created by
      • Passing air over the vocal cords
        • High pressure, high volume
      • Timbre (formants) shaped by the vocal tract
        • Mostly by tongue and lip placement
48
Things change slowly
  • Humans are slow compared to computers
    • Or to audio sample rates
  • Humans don’t change speech parameters quickly
    • Vocal tract
    • Lung pressure
  • Consequently sound changes slowly
    • Pitch
    • Volume
49
Vocal cords have a fundamental frequency
  • Consequently speech tends to have a fundamental frequency F0
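Because voiced speech is nearly periodic at F0, a classic way to estimate F0 is to find the lag at which the signal best matches a delayed copy of itself (autocorrelation). A minimal sketch on a synthetic 200Hz "voiced" signal; the 60-400Hz search range is an assumed typical speaking range, and real pitch trackers add normalization and voicing decisions:

```python
import numpy as np

def estimate_f0(x, fs, f0_min=60.0, f0_max=400.0):
    """Pick the lag with the largest autocorrelation within a plausible pitch range."""
    min_lag = int(fs / f0_max)
    max_lag = int(fs / f0_min)
    lags = np.arange(min_lag, max_lag + 1)
    r = np.array([np.dot(x[:-lag], x[lag:]) for lag in lags])
    return fs / lags[np.argmax(r)]

fs = 8000
t = np.arange(2000) / fs
# Synthetic voiced signal: 200Hz fundamental plus two weaker harmonics
x = sum(a * np.sin(2 * np.pi * h * 200 * t) for h, a in ((1, 1.0), (2, 0.5), (3, 0.25)))
print(estimate_f0(x, fs))   # 200.0
```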
50
CELP Speech Codec
  • Code Excited Linear Prediction
  • Developed by Bell Labs
  • Low delay: < 5ms
  • 1992: standardized as ITU-T G.728 (LD-CELP, 16kb/s)
    • Uses 5-sample frames
    • < 2ms latency
  • Very low bitrate: 4.8kb/s
51
CELP cont.
  • Analysis by synthesis time domain algorithm
  • Models vocal tract by a linear prediction filter
  • Uses excitation signal as input to filter
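The linear-prediction part of CELP can be sketched with the Levinson-Durbin recursion, which fits the coefficients of A(z) from the autocorrelation. Running the input through A(z) gives the residual (excitation); running that excitation through the all-pole filter 1/A(z) reconstructs the signal. CELP's defining step, searching a codebook for the excitation whose synthesized output best matches the input, is omitted, and the test signal is a synthetic second-order autoregressive process rather than speech:

```python
import numpy as np

def lpc(x, order):
    """Levinson-Durbin: coefficients a (a[0] == 1) of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                    # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                # remaining prediction-error energy
    return a

def residual(x, a):
    """Inverse filter A(z): e[n] = x[n] + sum_k a[k] x[n-k]."""
    e = np.zeros_like(x)
    for n in range(len(x)):
        e[n] = sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
    return e

def synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] y[n-k]."""
    y = np.zeros_like(excitation)
    for n in range(len(excitation)):
        y[n] = excitation[n] - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
    return y

a_true = np.array([1.0, -1.3, 0.6])       # hypothetical stable 2nd-order "vocal tract"
rng = np.random.default_rng(0)
x = synthesize(rng.standard_normal(20_000), a_true)   # white-noise excitation through 1/A(z)

a_est = lpc(x, 2)
print(np.round(a_est, 2))                 # close to [1, -1.3, 0.6]
e = residual(x, a_est)                    # excitation recovered by the inverse filter
print(np.allclose(synthesize(e, a_est), x))   # True: 1/A(z) undoes A(z)
```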
52
MIDI as Music Compression
  • .MID files
  • Realtime, low latency
  • Records note-on note-off messages (events)
  • Most messages are very small
    • 4 bit channel number
    • 4 bit opcode
    • 1-2 data bytes
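Packing a channel message takes a few bit operations. The status byte carries the opcode in its high nibble and the channel in its low nibble; the top bit of a status byte is always set, which is how receivers tell status bytes from 7-bit data bytes. Note Off (0x8n) and Note On (0x9n) are standard MIDI opcodes; the helper names are illustrative:

```python
NOTE_OFF = 0x8   # status high nibble
NOTE_ON = 0x9

def channel_message(opcode, channel, *data):
    """Status byte = opcode nibble | 4-bit channel, followed by 7-bit data bytes."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must fit in 4 bits")
    if any(not 0 <= d <= 127 for d in data):
        raise ValueError("data bytes are 7-bit")
    return bytes([(opcode << 4) | channel, *data])

def parse_status(status):
    """Split a status byte into (opcode, channel)."""
    return status >> 4, status & 0x0F

msg = channel_message(NOTE_ON, 0, 60, 100)   # middle C (note 60), velocity 100, first channel
print(msg.hex())                             # 903c64
```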
53
MIDI Playback
  • MIDI messages interpreted by a synthesizer
  • General MIDI specifies the instruments a synthesizer must emulate
  • MIDI messages presented at correct time by sequencer
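In a Standard MIDI File, each event is preceded by a delta time telling the sequencer how long to wait before presenting it. Delta times are stored as variable-length quantities: 7 bits per byte, with the top bit set on every byte except the last. A sketch of encoding and decoding (function names are illustrative):

```python
def encode_vlq(value):
    """Encode a non-negative int as a MIDI-file variable-length quantity."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("delta times are limited to 28 bits")
    out = [value & 0x7F]                     # last byte: continuation bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)    # earlier bytes: continuation bit set
        value >>= 7
    return bytes(reversed(out))

def decode_vlq(data):
    """Return (value, number of bytes consumed)."""
    value = 0
    for i, byte in enumerate(data):
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated variable-length quantity")

print(encode_vlq(0x80).hex())   # 8100
```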
54
Incredible compression
  • MIDI can encode a long musical performance as a relatively tiny number of messages
    • Mozart’s Eine kleine Nachtmusik in 34kb
    • When compared to the huge size of PCM audio samples encoding the same performance
  • MIDI synthesizers can be arbitrarily high quality
    • i.e., they can synthesize instruments at high:
      • Sample rates
      • Quantization (bit depth)
55
MIDI issues
  • MIDI messages
    • Note on
    • Note off
    • Best suited to encoding keyboard performances
  • Difficult to encode ‘articulation’
56
MIDI Encoder
  • Mainly authored, not encoded from audio
    • Generally MIDI is created
      • Directly by the composer
        • Using a sequencer application
          • Entered as a musical score or directly as MIDI events
      • Captured from a performance
        • On a MIDI-equipped instrument
          • MIDI keyboard
          • String instrument with MIDI output
57
PCM to MIDI encoder
  • It is a very difficult research problem to take an arbitrary PCM recording of polyphonic music and
    • Determine instruments played
    • Determine gestures on each instrument (used to create the music)
    • Output MIDI
  • Monophonic encoders work pretty well
  • Polyphonic encoders exist but leave a lot to be desired
58
Auditory Scene Analysis
  • Field created by Al Bregman
  • Described in Audio Anecdotes
  • Deals with analyzing arbitrary sounds and determining how they were created