Skip to page content. |Motivate home.  
Google



Back to : Text, Bytes and Videotape Main Page

Display maths using:

fonts images
Need help?

Text, Bytes and Videotape

Go to Huffman Coding
Go to Task 2
Go to Entropy
Go to Task 3
Go to Real-life examples

Codings

If you use the '20 Questions' game to identify a particular person, then that person has a sequence of yes/no answers associated with them. Alternatively, we could express this as a sequence of zeros and ones, giving a coding for that person. Coding, used in this sense, has nothing to do with spies and secret messages - that is the topic known as cryptography. Here we are talking about codings which are used to label something, and often involve just the two symbols 0 and 1. Examples are bar codes, the digits which identify a particular credit card, and so on; further examples are given below.

Example 1: ASCII

This is a coding of the alphabet, used on computer keyboards. It includes all the letters of the alphabet, both lower and upper case, plus the other symbols used on a keyboard. ASCII codes 256 symbols, so it needs 8 digits for each symbol: 1 digit can code 2 symbols (0 or 1); 2 digits can code 4 symbols (00, 10, 01, 11); 3 digits can code 8 symbols (000, 100, 010, 001, 110, 101, 011, 111). If you have n digits available to you, the number of symbols that can be coded is equal to 2n. 256 = 28 so 8 digits can code 256 symbols.

Each individual digit (ie. each individual 0 or 1) is a bit of information, and a sequence of 8 bits is known as 1 byte of information. A sequence of 0's and 1's is known as a binary number, and it corresponds to one of the decimal numbers we generally use. The table below gives a few examples:

Coding number (from 1 to 256)
Equivalent binary number
Keyboard symbol encoded by this binary number
63
00111111
?
64
01000000
@
65
01000001
A
66
01000010
B

Morse Code

Here is a copy of the Morse Code, which uses the binary system of short and long sounds or light beams.

A a dot a dash B a dash a dot a dot a dot a dot C a dash a dot a dash a dot D a dash a dot a dot
E a dot F a dota dot a dash a dot G a dash a dash a dot H a dot a dot a dot a dot
I a dot a dot J a dot a dash a dash a dash K a dash a dot a dash L a dot a dash a dot a dot
M a dash a dash N a dash a dot O a dash a dash a dash P a dot a dash a dash a dot
Q a dash a dash a dot a dash R a dot a dash a dot S a dota dota dot T a dash
U a dota dot a dash V a dota dota dot a dash W a dot a dash a dash X a dash a dot a dot a dash
Y a dash a dot a dash a dash Z a dash a dash a dot a dot    

Discussion

Have a look at the Morse Code:
  • How many letters use just one digit (ie. a dot or a dash ) ?
  • Which use the largest number of digits?
  • Are all the possible 2, 3 and 4 digit combinations used in the Morse Code?
  • Why do you think these particular combinations were chosen?

Go to top

Huffman Coding

Suppose event A occurs with probability 1/2, and events B, C, D and E with probabilities 1/8 each.
One very simple way to do a coding for these is to pick five sequences of three bits each, eg. A = 000, B = 001, C = 010, D = 011, E = 100. The average letter then takes 3 bits to describe it.

Huffman Coding can beat this. It's a bit like the '20 Questions' game, where we tried to cut down the possible options by a half at each stage. David Huffman invented his code in 1954, while he was still a graduate student at MIT. His code is now used in computers, television, modems, and so on. The story goes that Huffman had chosen the option of writing a paper on a problem given to the students by their professor, instead of doing a final exam. Huffman had worked on his paper for months, but failed to come up with a way of showing that his method of coding was the most efficient method available. He was about to give up, and revise for the final exam instead, when he hit on the solution: "It was the most singular moment of my life. There was the absolute lightning of sudden realization." Apparently when the professor saw Huffman's solution he said "Is that all there is to it!" Like the '20 questions' game, the idea is to cut down the options by half at each stage.

To create the Huffman coding for the example above [p(A) = 1/2, p(B) = 1/8, p(C) = 1/8, p(D) = 1/8, p(E) = 1/8] proceed as follows:

Stage 1:
Take two letters which have the smallest probabilities of occurring, eg. D and E, and bracket them together, (with the letter with the least probability of occurring on the left):
A B C DE
1/2 1/8 1/8 1/4
Stage 2: Repeat the above Stage, putting together the two letters or groups of letters with the smallest probabilities of occurring, (smallest probability on the left):
A BC DE
1/2 1/4 1/4
Stage 3: Repeat Stage 2 until all letters have been grouped together:
A BCDE
1/2 1/2
   
ABCDE
1

This isn't the only way you could have done this, of course, and you will get the opportunity to try a different way during Task 2.

We can represent these stages in a tree, as we did in the '20 Questions' game:

diagram showing stages in working out the Huffman coding for this example

This gives us the Huffman tree:

Huffman tree for this example

We can create codes for each letter from this tree by putting a 0 for each left hand edge of a pair, and a 1 for each right hand edge: A = 0; B = 100; C = 101; D = 110; E = 111.

Of course, this coding is not unique: if we had started with C and D, say, rather than D and E, we would end up with a different coding, but this is not important. What is important is the length of the sequence of symbols, or 'word', for each letter, rather than the actual sequence of 0's and 1's. If we now calculate the average length of a word, it is

\begin{displaymath}mean = \Sigma p(X) X = 1/2 x 1 + 4 x 1/8 x 3) = 2 \end{displaymath}

which is less than the 3 we obtained by just assigning symbols randomly to each letter.

Now have a go at Task 2.
Answers to Task 2
Go to top


Decipherability

Using the tree structure which Hoffman coding provides ensures that a message is decipherable, eg. that message 0110101100011 can be broken apart to give BCAAEB as we saw at the beginning of Task 2.

Huffman is the best coding

There is no coding from A, B, C, D, E, to a collection of 0's and 1's which is both
(a) decipherable
(b)of shorter average length
with these probabilities. Whatever the collection of letters, and whatever their probabilties, the Huffman coding is always best.

Entropy

Entropy gives us a way of talking about the degree of disorder in a system, or the degree of uncertainty about the outcomes - it measures how random something is, or, equivalently, how surprised we will be when we hear the answer. A physical example of the presence of entropy is the limit on how much the molecules in a volume of air can be compressed - they can't be compressed into zero volume, because they possess a degree of disorder or uncertainty about their position which can't be removed. In information theory, entropy manifests itself as the limit on how much data can be compressed. Data which is more random can be compressed less than data which has a degree or order to it. If you toss a fair coin, you expect the probability of getting a head to be 1/2, and the probability of getting a tail to be 1/2 also. The entropy of this coin toss is 1, which you will be able to verify in the next task. On the other hand, if you toss a biased coin, with a 90% probability of giving you a head, the entropy is only 0.47. You are much less surprised at the result, and the outcome is much less random.

Shannon found a formula for entropy. If a series of events, labelled 1, 2, 3, ..., r, have a probability of p(1), p(2), p(3), ..., ,p(r) of occurring, then the entropy of this set of events is given by:

formula for entropy, reads entropy equals the sum from s=1 to s=r of minus p(s) times the log to base 2 of p(s)

 

Task 3 will give you an opportunity to find out what this means, and to do some calculations of entropy. Have a go at this now.

Answers to Task 3
Additional notes for Task 3

The entropy formula involves logs to the base of 2, because these are the inverse of the operation of raising 2 to some power. In the Huffman coding and the '20 questions' game, where we want to cut the possibilities by half at each stage, the number of 'questions' asked or stages in the tree, doubles at each stage. This is equivalent to raising 2 to a power, since increasing the power by 1 doubles the previous answer.

Real-life examples

The symbols in the messages described so far are independent - a symbol is not affected by what went before, in the same way that the number you get when you throw a dice is not affeted by what went before. Many real-life messages won't be independent, however. For instance, on a video, one black pixel is more likely to be surrounded by black pixels than by white ones, and it is likely to be black still in the next frame. This is exploited by DVDs and MP3s.

Loss

So far, we have considered making a perfect copy of a message, or sequence of symbols or pixels. If we accept something that, is, say, 99% accurate, we can do better. For instance, MP3s cut out frequencies beyond the range of human hearing. Given a loud sound at one pitch, they remove quiet sounds at other pitches in the sound. This doesn't give a perfect copy, but if it downloads ten times faster, you may not care!

Go to top

 

 top of page

Back to : Text, Bytes and Videotape Main Page
contact | accessibility
© 2002 Millennium Mathematics Project, University of Cambridge