Huffman coding

Huffman coding is one of the oldest methods of compressing data and is based on creating a Huffman tree, which is used to both encode and decode data. Huffman coding can represent data content in a more compact form by exploiting the fact that some data (for instance certain characters of the alphabet) appears more frequently in a data stream. By using encodings of different length (shorter for the most frequent characters and longer for the least frequent ones), the data consumes less space.

Now, let's learn a few terminologies related to Huffman coding:

Coding: Coding in the context of data represents the method of representing data from one form to another. We would like the resultant form to be concise.
Codeword: A particular character in encoded form is called a codeword.
Fixed-length coding: This is when each encoded character, that is, codeword, uses the same number of bits.
Variable-length coding: Codewords can use a different number of bits.
Evaluation of code: This is the expected number of bits per codeword.
Prefix free codes: This means that no codeword is a prefix of any other codeword.
Decoding: This means that a variable-length code must be free from any prefix.

For understanding the last two terms, you need to have a look at this table first:

character	frequency	fixed length code	variable length code
L	.45	000	0
M	.13	001	101
N	.12	010	100
X	.16	011	111
Y	.09	100	1101
Z	.05	101	1100

Now, we can infer the following:

Fixed length code: The fixed-length code for this table is 3.
Variable length code: The variable-length code for this table is 45(1) + .13(3) + .12(3) + .16(3) + .09(4) + .05(4) = 2.24.

The following diagram shows the Huffman tree created from the preceding example:

Note that Huffman encoding is about converting data into a Huffman tree that enables compression. Decoding or decompression brings the data back to the original format.

Table of Contents for Huffman coding

Create new playlist

Sign In

Sign Up

Table of Contents for
Huffman coding