Notes
Slide Show
Outline
1
Graphics File Format and
Data Compression Techniques
(take 2)  Week 5
2
Business
  • Attendance (please sign in)
  • Break(s)
  • Show’n’Tell
  • Textbooks
3
Breaks
  • I will try my best to have two 10 minute breaks
  • 1st around 7:00 and the 2nd about an hour later


  • Please be sure to make it back and be settled within the 10 minutes
4
SIGGRAPH 2004 Proceedings
  • SIGGRAPH = Sexiest computer graphics, animation, art and multimedia research
  • Seattle SIGGRAPH Chapter
    • Signup for announcements:
    • Website: http://seattle.siggraph.org
  • Special DigiPen Video Review meeting
  • Looking for student volunteers (contact keng@sworks.com)
5
Audio Anecdotes Volume2
  • Advanced copy arrived after class last week


  • Will be sharing copies of articles that pertain to class
6
Shown’n’tell
  • Please take a look at the books over break
7
Textbooks
  • Borders group order?
    • Only one response so far?
  • 20% discount on one book (other is full price)
8
Homework 4
  • What did you think of the HW?
9
HW4 Media Types
  • Compression ratios
    • .txt
    • .c
    • .wave
    • .mp3
    • .bmp
10
HW4 gzip worst case
  • How did you construct the file
  • What is going on?
  • Is there a max expansion?
11
HW4 Classifier
  • Anybody get it to work?


  • What went wrong?


  • How should I have specified the comparison?
12
Fragility of bitstreams
  • What a difference a bit can make


  • Flipped or dropped bit
    • Turn one pixel dark
    • OR
    • Change a $0 balance to 4 $294,967,296
    • OR
    • Render the data unrecoverable
13
Effect of Compression on Robustness
  • Compression by removing redundancy in data simultaneously reduces robustness


  • If a file could be reconstructed w/o a given bit then that bit is unnecessary
14
Bit errors in compressed formats
  • Consider implication of single bit error in increasingly more efficient schemes:
    • RLE
    • Huffman Encoding
    • LZW w/Dynamically constructed Codebooks
15
Sources of bit errors
  • Bit errors are caused by physical realities:
    • High energy Gamma rays or alpha particles
      • Can change the charge on the capacitors that store bits in DRAM
    • Scratch/smudge on a CD
    • CD pressing is imperfect
    • Flakey hunk of Ferris oxide (rust) on tape/disk
    • Noise on phone line disrupt MODEM
    • Noise on digital bus
16
Statistical noise and information theory
  • Noise is inescapable
  • Noise may be modeled statistically as a Gaussian normal probability density
  • Relative loudness of signal and noise define how much information may be communicated over any channel
17
Shannon Limit
  • Theoretically states max information capacity for any channel:
    • C <= W log2(1+S/N)


      • C: channel capacity (bits per second)
      • W: raw bandwidth in Hz
      • S: Signal strength (raw power not dB)
      • N: Noise strength (raw power not dB)
18
Shannon Limit and intuition
  • A channel with:
    • infinite bandwidth should be able to carry infinite information


  • Less clear is how the relationship of signal to noise affects capacity
19
Effect of noise on a channel
  • If the Signal to Noise ratio is too low
    • You can’t determine the signal from the noise
    • Ex: My speaking very softly


  • OK, but why would a channel with:
    • LIMITED bandwidth but
    • No noise
  • Yield a channel with infinite carrying capacity?
20
0 Noise = Infinite Capacity
  • Consider quantization (from last lecture)
  • Theoretically:
    • W/O noise
    • There are infinite levels of amplitude
    • Between 0 and the stated signal strength


  • This also explains why infinite signal strength yields infinite capacity
    • Infinite levels above noise floor
21
Fundamentals
  • Capacity is large when:
    • Bandwidth is high
    • Signal is strong
    • Noise is low
22
Error Detection vs Correction
  • Noise is inevitable
    • (consider the ‘tails’ on the Gaussian curve)
  • Bits will get flipped


  • What do we do?
23
Simple strategies
  • Shout!
  • Repeat everything 3 times
    • (and use best out of 3)
24
Best out of Three
  • Not bad…
    • Will correct single errors
    • Can detect up to two errors


  • Just not very efficient
25
Error Correcting Codes
  • Parity
  • CRC’s (cyclic redundancy checks)
  • Crypto Digests (SHA1, MD5)
  • Hamming Code
  • Reed-Solomon Codes
  • Turbo Codes
26
Parity
  • Used in memory systems


  • Count number of 1’s in a word
  • Add a 0 or 1 parity bit to make the count
    • Even (even parity)
    • Odd (odd parity)
27
Parity Calculation
  • Even Parity
    • XOR all the data bits
  • Odd Parity
    • Negate the XOR of all the bits
28
Even Parity Example
  • Data: 0 0 1 0 0 0 1 1
  • Parity = 0 ^ 0 ^ 1 ^ 0 ^ 0 ^ 0 ^ 1 ^ 1
    • Parity = 1


  • Change any bit?
  • Change 2 bits?
29
Parity
  • Detects single bit errors
  • (an even number of incorrect bits is undetectable)
  • Can not determine which bit is incorrect
    • ie can’t reconstruct the correct word


  • Practically
    • The computer or program will crash instead of providing bad results
30
General ECC
  • “Shout 3 times and vote” seems better than parity
  • Provides:
    • 1 bit error correction
    • 2 bit error detection
  • But
    • Requires sending 3x the bits

  • Fortunately there are better error correcting codes
31
Hamming Codes
  • Developed in 1950’s by a Bell Labs researcher
  • To combat errors in punch card readers


  • Consider:
    • In 7 bit message
    • 7 possible 1 bit errors
  • So (potentially):
    • 3 bits could identify and correct error
32
Hamming Rule
  • d + p + 1 <= 2p


    • d  number of data bits
    • p number of parity bits
    • c = d+p size of code word
    • (c,d) set describing code
33
Hamming (7,4)
  • Most used code
  • 3 control bits added to every 4 data bits
  • Can detect 2 bit errors and correct 1 bit errors
  • Is considered lossless since a channel with 2 out of 4 bit error rate wouldn’t be used
34
Reed-Solomon Codes
  • Data sent as encoded block
  • Encodes message as pts in a polynomial
  • Coefficients of polynomial are the data
  • Plot over determines the coefficients
  • Missing coefs may be recovered from subsets of plotted pts.
    • (consider correcting a curve by interpolating over a gap)
  • Best when errors come in bursts
    • Multiple bit errors in the same value count as a single error
35
Reed-Solomon Code usage
  • Invented in 1960
    • Too expensive to then implement
  • Used to send images back from Voyager
    • Launched 1977
  • Used on CD’s
    • 1980
  • Most advanced versions within 1.5 dB of Shannon Limit
36
Turbo codes
  • Introduced in a 1993 paper (but no one believed them)
  • Remarkably close to achieving Shannon Limit!
  • Presently used in deep space probes
  • Issues:
    • Relatively large latency
    • (hence its use in probes)
37
Latency
  • We discuss latency in my Audio Class
  • Also covered in Audio Anecdotes V1
  • Discussed it briefly in lecture
38
Now for something different
39
Binary File Format
  • What factors complicate binary file formats?
40
File Format Challenges
  • How to:
    • Formally describe the format
    • Be sure you have implemented the format correctly
    • Understand the implications of platform architecture
    • Write code that will work across platforms
41
ASCII Impunity
  • ASCII representations can be passed from machine to machine with impunity
  • Binary representations have complications
42
Architecture issues
  • Word size
  • Padding
  • Alignment
  • Signed number representation
  • Floating point representation
  • ‘endianess’
43
Endianess
  • Big-endian
    • Store information big end first
      • MSB in lower address
      • LSB in higher address
  • Historically most machines big-endian
    • IBM 370, Motorola CPU’s (68000)
    • Most RISC architectures
      • Some (like MIPS) are ‘bi-endian’
  • Internet protocols are big-endian
44
Little Endian
  • Send the little end first
  • Machines:
    • DEC PDP-11 and VAX
    • Intel Microprocessors (hence WINTEL PCs)
  • RS-232 serial protocols
    • Send least significant bit first
45
What does this print?
  • short *ptr;
  • char array[2];
  • ptr = (short *)array;
  • *ptr = 0x1234;


  • printf(“1st value: %02x\n, array[0]);
  • printf(“2nd value: %02x\n, array[1]);
46
Depends
  • My little endian WINTEL laptop:
    • 1st value: 34
    • 2nd value: 12
  • My big-endian MIPS R4000 SGI:
    • 1st value: 12
    • 2nd value: 34
47
Inter-computer Problem
  • Problem whenever one computer reads binary information another writes:
    • Tapes, disks, files
    • Network
48
Compilers add problems
  • Padding
  • Packing
  • Alignment
49
Solution
  • Carefully specify formats/protocols
  • Write code that runs consistently (has same semantics) on any architecture/compiler
    • NOT like the previous example!
50
Standardization
  • Can now safely assume:
    • 2’s complement signed integers
    • IEEE floating point
    • Word sizes of 1,2,4 and now 8
      • 8 bit bytes
51
Now for something different
52
What is color?
  • I believe to understand and utilize phenomena one needs to study
    • The physics of the phenomenon
    • The related human perception
      •  psychology & physiology
53
Color
  • By  understanding light & vision
  • We can make intelligent decisions as to what to:
    • Capture
    • Store
    • Re-synthesize (render)

  • (Will get us ready for lossy algorithms)
54
Color
  • What phenomena do we perceive as color?


  • How does out color perception work?


  • How do we capture & store this information?
55
The eye
56
Visual Angle
  • Angle subtended by a structure
  • When seen from nodal pt inside the eye
57
Diopter
  • Measure of the power of a lens
  • Reciprocal of the focal length of lens in meters
    • Ex: lens with focal length of .1m
    • Has power of 10 diopters
58
Cornea
  • Human cornea has optical power of 40 diopters
    • Due to curvature and refraction
  • Strongest focusing element in the eye
59
Lens
  • Surrounded by ciliary body
    • Pushes at sides of lens
    • When relaxed
      • Lens is stretched radially
      • Flat
      • Reduced optical power
      • Light focused as far from lens as possible
60
Lens
  • When ciliary muscles tensed
    • Push on sides of lens
    • Thickens lens
    • Optical power increases
    • Focal pt moves closer to lens
61
Lens summary
  • When relaxed
    • Longest focal length
    • (focus at infinity?)
  • When tensed
    • Shortest focal length
    • (focused on close objects)
  • Accommodation: Ability to stretch
    • Elasticity reduces with age
    • Young children lens accomodates from 10 to 30 diopters
    • After 45: lens largely inelastic
62
Retina
  • Light focused by the lens falls on the retina
  • Covers 200° of the back of the eye
  • Contains photoreceptors
    • Rods
    • Cones
63
Photoreceptors
  • Cones
    • Responsible for color perception
  • Rods
    • Intensity perception
    • 10x more sensitive to light
    • Smaller than cones
      • (Can be packed tighter == higher res)
64
Fovea
  • Most of retina is photosensitive
  • Fovea
    • Small region at center of visual axis
    • Subtends less than 2° of visual angle
    • Contains only cones
      • (densest concentration of cones on retina)
    • ≈147,000 Cones per mm2
      • Contrast to hawks with ≈1,000,000 receptors per mm2
65
Photoreceptor distribution
  • Increasingly higher concentration of rods as distance from fovea increases
  • Retina contains approximately
    • 120 million rods
    •    6 million cones
66
Optic Nerve
  • Optic nerve contains ≈1 million fibers
    • Hence much processing takes place in eye
  • The blind spot
    • Where optic nerve meets retina
    • No receptors there at all