|
|
|
|
|
Back to : Text, Bytes and Videotape Main Page
Text, Bytes and VideotapeTask 2: AnswersUsing A = 1, B = 011, C = 010, D = 001, E = 000 (which is another Huffman coding for the example in Olly's talk) to work out what 01101011000011 means (starting from the left), we might start by assuming that the words come in blocks of three as far as possible, since the only word with just one bit in it is A (1), giving:011 / 010 / 11000011 At this point we hit a snag, because there isn't a word 110 in our code, so the first 1 at least must be an A. The second 1 must also be an A as we don't have 100 in our code either. So altogether we have:
In each case the average word length is the same, ie. 2. Using the same algorithm to find a Huffman coding for these probabilities: p(a) = 0.34; p(b) = 0.12; p(c) = 0.05; p(d) = 0.17; p(e) = 0.10; p(f) = 0.16; p(g) = 0.06, one way to start is to put the letters into order according to their probability, with the least probable on the left, then to put the first two together. On the next row, reorder the letters, with the new unit put into its correct place, according to its probability. Repeat this until all the letters are accounted for. At each stage, the letters which have been bracketed together are shown in bold type.
This gives the following Huffman tree:
From this we can read off the codeword for each letter:
The bottom row of this table gives the probability of each letter occurring,
for comparison. We can see that the Huffman coding gives the letters with the
highest probability of occurring the shortest codewords (a and d
have length 2), and the letters with the smallest probability of occurring have
the longest codewords (c and g have length 4). The average
length for a codeword in this coding is:
Calculating the mean length shows why it is important to have the shortest codewords associated with the letters which occur most frequently, and vice versa. Here is an alternative coding made up to see if it is better than the Huffman coding:
The lengths of the codewords were chosen to minimise the mean length of a letter, which works out at 1.2, much less than that of the Huffman coding. This would be a better coding than the Huffman coding, if it is decipherable. This is a random selection of letters from {a, b, c, d, e, f, g} (generated using the random integer function of a graphic calculator) coded using both codes: Alternative coding: 00011000110010100110011110110010000100101 Huffman coding 1011000110100001010011011000011000011101000011001010111011000100 The Huffman coding is clearly longer, which coresponds to the average word length being longer, so on that basis the Alternative coding would be preferable. What happens when we try to decipher it?
0 / 0 / 0 / 1 / 1 / 0 / 0 / 0 / 110010100110011110110010000100101 = a a
a d d a a a ... Compare this with the Huffman coding:
101 / 100 / 0110 / 100 / 00 / 101 / 00 / 11 / 0110 / 00 / 0110 / 00 / 0111 / 010 / 00 / 0110 / 010 / 101 / 11 / 0110 / 00 / 100 = f b c b d f d a c d c d g e d c e f a c d b This is precisely the original message, and at no point was there the option of decoding it differently (try it out for yourself).
Return to Task 2
Back to : Text, Bytes and Videotape Main Page contact | accessibility © 2002 Millennium Mathematics Project, University of Cambridge
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||