|
Back to : Text, Bytes and Videotape Main Page
Text, Bytes and Videotape
Go to Huffman Coding
Go to Task 2
Go to Entropy
Go to Task 3
Go to Real-life examples
Codings
If you use the '20 Questions' game to identify a particular person, then that
person has a sequence of yes/no answers associated with them. Alternatively,
we could express this as a sequence of zeros and ones, giving a coding for that
person. Coding, used in this sense, has nothing to do with spies and secret
messages - that is the topic known as cryptography. Here we are talking about
codings which are used to label something, and often involve just the two symbols
0 and 1. Examples are bar codes, the digits which identify a particular credit
card, and so on; further examples are given below.
Example 1: ASCII
This is a coding of the alphabet, used on computer keyboards. It includes all
the letters of the alphabet, both lower and upper case, plus the other symbols
used on a keyboard. ASCII codes 256 symbols, so it needs 8 digits for each symbol:
1 digit can code 2 symbols (0 or 1); 2 digits can code 4 symbols (00, 10, 01,
11); 3 digits can code 8 symbols (000, 100, 010, 001, 110, 101, 011, 111). If
you have n digits available to you, the number of symbols that can
be coded is equal to 2n. 256 = 28 so 8 digits
can code 256 symbols.
Each individual digit (ie. each individual 0 or 1) is a bit of information,
and a sequence of 8 bits is known as 1 byte of information. A sequence of 0's
and 1's is known as a binary number, and it corresponds to one of the decimal
numbers we generally use. The table below gives a few examples:
Coding number (from 1 to 256) |
Equivalent binary number |
Keyboard symbol encoded by this binary number |
63 |
00111111 |
? |
64 |
01000000 |
@ |
65 |
01000001 |
A |
66 |
01000010 |
B |
Morse Code
Here is a copy of the Morse Code, which uses the binary system of short and
long sounds or light beams.
A
 |
B
 |
C
|
D
 |
E  |
F 
 |
G
 |
H
|
I
 |
J
|
K
|
L
|
M
|
N
 |
O
|
P
 |
Q
|
R
|
S    |
T  |
U 
|
V  
|
W
|
X
|
Y
|
Z
 |
|
|
Discussion
Have a look at the Morse Code:
- How many letters use just one digit (ie.
or
) ?
- Which use the largest number of digits?
- Are all the possible 2, 3 and 4 digit combinations used in the Morse Code?
- Why do you think these particular combinations were chosen?
Go to top
Huffman Coding
Suppose event A occurs with probability 1/2, and events B, C,
D and E with probabilities 1/8 each.
One very simple way to do a coding for these is to pick five sequences of three
bits each, eg. A = 000, B = 001, C = 010, D
= 011, E = 100. The average letter then takes 3 bits to describe it.
Huffman Coding can beat this. It's a bit like the '20 Questions' game, where
we tried to cut down the possible options by a half at each stage. David
Huffman invented his code in 1954, while he was still a graduate student
at MIT. His code is now used in computers, television, modems, and so on. The
story goes that Huffman had chosen the option of writing a paper on a problem
given to the students by their professor, instead of doing a final exam. Huffman
had worked on his paper for months, but failed to come up with a way of showing
that his method of coding was the most efficient method available. He was about
to give up, and revise for the final exam instead, when he hit on the solution:
"It was the most singular moment of my life. There was the absolute lightning
of sudden realization." Apparently when the professor saw Huffman's solution
he said "Is that all there is to it!" Like the '20 questions' game,
the idea is to cut down the options by half at each stage.
To create the Huffman coding for the example above [p(A) = 1/2,
p(B) = 1/8, p(C) = 1/8, p(D) = 1/8, p(E) =
1/8] proceed as follows:
| Stage 1: |
Take two letters which have
the smallest probabilities of occurring, eg. D and E,
and bracket them together, (with the letter with the least probability
of occurring on the left): |
|
| Stage 2: |
Repeat the above Stage, putting together the two letters
or groups of letters with the smallest probabilities of occurring, (smallest
probability on the left): |
|
| Stage 3: |
Repeat Stage 2 until all letters have been grouped together: |
|
| |
|
|
This isn't the only way you could have done this, of course, and you will get
the opportunity to try a different way during Task 2.
We can represent these stages in a tree, as we did in the '20 Questions' game:

This gives us the Huffman tree:

We can create codes for each letter from this tree by putting
a 0 for each left hand edge of a pair, and a 1 for each right hand edge: A
= 0; B = 100; C = 101; D = 110; E = 111.
Of course, this coding is not unique: if we had started with C
and D, say, rather than D and E, we would end
up with a different coding, but this is not important. What is important is
the length of the sequence of symbols, or 'word', for each letter, rather than
the actual sequence of 0's and 1's. If we now calculate the average length of
a word, it is
which is less than the 3 we obtained by just assigning symbols randomly to each
letter.
Now have a go at Task
2.
Answers to Task 2
Go to top
Decipherability
Using the tree structure which Hoffman coding provides ensures
that a message is decipherable, eg. that message 0110101100011 can
be broken apart to give BCAAEB as we saw at the beginning of Task 2.
Huffman is the best coding
There is no coding from A, B, C, D, E, to a collection
of 0's and 1's which is both
(a) decipherable
(b)of shorter average length
with these probabilities. Whatever the collection of letters, and whatever their
probabilties, the Huffman coding is always best.
Entropy
Entropy gives us a way of talking about the degree of disorder in a system,
or the degree of uncertainty about the outcomes - it measures how random something
is, or, equivalently, how surprised we will be when we hear the answer. A physical
example of the presence of entropy is the limit on how much the molecules in
a volume of air can be compressed - they can't be compressed into zero volume,
because they possess a degree of disorder or uncertainty about their position
which can't be removed. In information theory, entropy manifests itself as the
limit on how much data can be compressed. Data which is more random can be compressed
less than data which has a degree or order to it. If you toss a fair coin, you
expect the probability of getting a head to be 1/2, and the probability of getting
a tail to be 1/2 also. The entropy of this coin toss is 1, which you will be
able to verify in the next task. On the other hand, if you toss a biased coin,
with a 90% probability of giving you a head, the entropy is only 0.47. You are
much less surprised at the result, and the outcome is much less random.
Shannon found a formula for entropy. If a series of events, labelled 1, 2,
3, ..., r, have a probability of p(1), p(2), p(3), ..., ,p(r) of
occurring, then the entropy of this set of events is given by:

Task 3 will give you an opportunity
to find out what this means, and to do some calculations of entropy. Have a
go at this now.
Answers to Task 3
Additional notes for Task 3
The entropy formula involves logs to the base of 2, because these are the inverse
of the operation of raising 2 to some power. In the Huffman coding and the '20
questions' game, where we want to cut the possibilities by half at each stage,
the number of 'questions' asked or stages in the tree, doubles at each stage.
This is equivalent to raising 2 to a power, since increasing the power by 1
doubles the previous answer.
Real-life examples
The symbols in the messages described so far are independent - a symbol is
not affected by what went before, in the same way that the number you get when
you throw a dice is not affeted by what went before. Many real-life messages
won't be independent, however. For instance, on a video, one black pixel is
more likely to be surrounded by black pixels than by white ones, and it is likely
to be black still in the next frame. This is exploited by DVDs and MP3s.
Loss
So far, we have considered making a perfect copy of a message, or sequence
of symbols or pixels. If we accept something that, is, say, 99% accurate, we
can do better. For instance, MP3s cut out frequencies beyond the range of human
hearing. Given a loud sound at one pitch, they remove quiet sounds at other
pitches in the sound. This doesn't give a perfect copy, but if it downloads
ten times faster, you may not care!
Go to top
top
of page
Back to : Text, Bytes and Videotape Main Page
contact
| accessibility
© 2002 Millennium Mathematics Project,
University of Cambridge
|