Last modified: April 17, 1997
A Huffman tree is a binary tree which minimizes the sum of f(i)D(i) over all leaves i, where
f(i) is the frequency or weight of leaf i and D(i) is the length of the path from the root to leaf i.
In each of the applications, f(i) has a different physical meaning.
See topic 21 on coding and compression.
Huffman trees have the following properties:
Let us assume an alphabet containing the letters "a", "b", "c", "d" and "e" which are the names of leaves shown in Table 1 with their corresponding frequencies.
According to Table 1, the leaves "c" and "b" are siblings and lie furthest from the root. The two leaves with the smallest frequencies should be siblings, since this keeps the sum f(i)D(i) at its minimum value. To see why, suppose the two leaves with the smallest frequencies were not siblings; then swapping the deeper of the two with the leaf currently sitting next to the other would decrease the sum f(i)D(i), so the original tree could not have been at its minimum possible value and would fail the definition of the Huffman tree. A similar exchange argument shows that every internal node of a Huffman tree must have exactly 2 children. Let us consider what would happen if we removed the leaf "a" in the tree below.

Removing the leaf would result in a tree where the sum of f(i)D(i) is no longer at its minimum value: the internal node left with a single child adds an unnecessary level of depth below it. Therefore, we need to reconstruct the tree by taking a leaf from that node's subtree and making it the node's second child. The resulting tree is shown below.
After two siblings are identified, we may remove them from the list of leaves and replace them by a single internal node whose frequency is the sum of theirs. This way, the problem is reduced in size by one. (This works because the Huffman tree minimizes the sum of f(i)D(i), and not, say, the sum of f(i)D(i)^5.)
It should be noted that Huffman trees are not unique. That is, you can have different trees for the same set of leaves with the same frequencies. For example, if two leaves have the same frequency, their positions with respect to each other are not important: exchanging them gives a different tree with the same cost.
We develop an efficient O(n log n) algorithm for the construction of a Huffman tree, due to Hu and Tucker. An array of nodes, each with a frequency and left and right pointers, is used:



In the top half of the table, entries 1 to n, we have the leaves of the Huffman tree. They have given frequencies and no left or right children, hence the 0's in those columns.
In the bottom half of the table, entries n+1 to 2n-1, we have the internal nodes, including the root at 2n-1.
Their frequencies can be calculated from those of their left and right children. As we shall see in the
following section, the worst case complexity is O(n log n).
The alphabet is arranged in a heap, which by definition has the smallest element at the top. The key used to order the heap is the frequency of each character. The heap initially contains the n pairs (i, f(i)) for the leaves i. At each iteration, the two elements of minimal frequency are deleted and an internal node is created. The frequency of this new internal node is the sum of the frequencies of its children; it represents a new "supercharacter" whose probability is that of finding either of its children. This node is in turn inserted into the heap and the process is repeated.
O(n)        Make heap using (1,f(1)), ..., (n,f(n))
O(n log n)  FOR i = n+1 TO 2n-1 DO
                left(i)  <- deletemin          (node of smallest frequency)
                right(i) <- deletemin          (node of next smallest frequency)
                f(i) <- f(left(i)) + f(right(i))
                insert (i, f(i)) into the heap
            RETURN
The FOR loop, which is executed n-1 times for an alphabet of size n, repeatedly extracts the two nodes with the lowest frequencies and replaces them with a node whose frequency is the sum of their frequencies. This new node has the two extracted nodes as its left and right children. Each heap operation takes O(log n) time, so the worst case is O(n log n).
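As a sketch of the algorithm above, here is a heap-based construction in Python. The array layout with frequency, left, and right entries mirrors the table above; the variable names are my own.

```python
import heapq

def build_huffman_tree(freqs):
    """Build a Huffman tree from a list of n leaf frequencies.

    Returns (f, left, right): 1-based arrays where entries 1..n are
    the leaves and entry 2n-1 is the root.  Leaves have
    left(i) = right(i) = 0, as in the table above.
    """
    n = len(freqs)
    f = [0] * (2 * n)          # index 0 unused; entries 1 .. 2n-1
    left = [0] * (2 * n)
    right = [0] * (2 * n)
    heap = []
    for i, fi in enumerate(freqs, start=1):
        f[i] = fi
        heapq.heappush(heap, (fi, i))   # heap ordered by frequency
    for i in range(n + 1, 2 * n):       # create internal nodes n+1 .. 2n-1
        f1, a = heapq.heappop(heap)     # smallest frequency
        f2, b = heapq.heappop(heap)     # second smallest
        left[i], right[i] = a, b
        f[i] = f1 + f2
        heapq.heappush(heap, (f[i], i))
    return f, left, right

f, left, right = build_huffman_tree([0.49, 0.20, 0.12, 0.11, 0.08])
# f[9], the root frequency, is the sum of all leaf frequencies (1.0).
```

Since the heap holds at most n entries and each of the n-1 iterations performs a constant number of O(log n) heap operations, the total time is O(n log n), as claimed.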
A language is made up of symbols or characters. The set of symbols of a language is called its alphabet. A codeword is a way of representing a symbol using other symbols. Each symbol of an alphabet has its own unique codeword. The collection of codewords is known as a code.
Codes are used in many computer applications. One such application is the compression and decompression of data. Another common application is the transmission of data. Data is sent over phone lines or data cables as electrical signals. Since these electrical signals are a form of binary language (having an alphabet of only "0" and "1", or "off" and "on"), the data must be encoded. To make the transmission as quick as possible, we would like an efficient code with the shortest possible average codeword length. However, the average codeword length is NOT simply the mean of the codeword lengths, but rather the average of each codeword's length weighted by the symbol's probability p(i), as indicated by the following formula:

    average codeword length = sum over all symbols i of p(i) * L(i)

where p(i) is the probability of symbol i and L(i) is the length of its codeword. The probability of a symbol is the probability of finding it in a text. As always, the sum of the probabilities p(i) is equal to 1.
A prefix code is a code where no codeword is also a prefix of some other codeword. For example, you cannot have one codeword be "110" and another "1101", since the first is a prefix of the second.
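A quick way to check the prefix property is to sort the codewords: if one codeword is a prefix of another, it sorts immediately before one of its extensions. This small helper is illustrative and not from the original notes.

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of another."""
    words = sorted(codewords)   # a prefix sorts immediately before its extensions
    return all(not words[j + 1].startswith(words[j])
               for j in range(len(words) - 1))

is_prefix_code(["0", "10", "110", "111"])   # True: a valid prefix code
is_prefix_code(["110", "1101"])             # False: "110" is a prefix of "1101"
```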
If we take our example of the five letter alphabet from Table 1, we can generate a Huffman tree for it. Note: The frequency of the characters can be given beforehand from statistical tables or by processing the file once. The frequency can be given in the form of probabilities, percentages or any other numeric way.

The prefix code can be visualized by a trie with leaves corresponding to symbols. The average codeword length is then the sum over all leaves i of the probability of the symbol times the distance of the leaf from the root. If we treat the Huffman tree as a trie, the codeword for each character of the alphabet is generated by following the path from the root to the character, where each left branch is a 0 and each right branch is a 1. A Huffman code is obtained by using the codewords that result from constructing the Huffman tree for the given probabilities. If the probabilities are exact, then the Huffman code gives the optimal average number of bits per input symbol among all prefix codes, but not necessarily among all compression methods (see also Lempel-Ziv Compression)! This is a tree optimization problem of the same form as that discussed in the definition of Huffman trees, but here, probability p(i) replaces frequency f(i).

Using our example, we can calculate the average codeword length:
0.49*1 + 0.20*2 + 0.12*3 + 0.11*4 + 0.08*4 = 2.01
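The path-following rule can be sketched in Python: build the Huffman tree by repeatedly merging the two least probable subtrees, then read each codeword off the root-to-leaf path. The assignment of letters to probabilities here is illustrative.

```python
import heapq

def huffman_code(probs):
    """Return {symbol: codeword} for a dict of symbol probabilities.
    Left branches contribute '0', right branches '1'."""
    # Heap entries: (probability, tiebreak, tree); a tree is either a
    # bare symbol (a leaf) or a (left, right) pair (an internal node).
    heap = [(p, j, sym) for j, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)     # least probable subtree
        p2, _, t2 = heapq.heappop(heap)     # second least probable
        heapq.heappush(heap, (p1 + p2, count, (t1, t2)))
        count += 1
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):         # internal node
            walk(tree[0], prefix + "0")     # left branch -> 0
            walk(tree[1], prefix + "1")     # right branch -> 1
        else:                               # leaf: record its codeword
            code[tree] = prefix or "0"
    walk(heap[0][2], "")
    return code

probs = {"a": 0.49, "b": 0.20, "c": 0.12, "d": 0.11, "e": 0.08}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)   # average codeword length
```

For these probabilities the codeword lengths come out as 1, 2, 3, 4, 4, matching the calculation above.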

We would like to merge n sorted files of sizes s(1), . . . , s(n) into one long sorted file with as few comparisons as possible. The number of comparisons required to merge two sorted files of sizes m and n is m + n in the worst case (actually m + n - 1 comparisons, but we will ignore the "-1" for now). We can only merge two files at a time, so the order of merging will be important.
For example, take five files of size 200, 150, 810, 300 and 40. Let us try to merge them in that order. First, we merge the files of size 200 and 150 to obtain a file of size 350. Next, we merge the file of size 350 with the file of size 810 to give a file of size 1160. Next, we merge the file of size 1160 with the file of size 300 to give a file of size 1460. And finally, we merge the files of size 1460 and 40 to give a file of size 1500. We can visualize this merging process with the help of the following binary tree:
Each file is a leaf in the tree. Two siblings merge into a larger parent file.
This might seem okay, but it is in fact quite wasteful. Because the earlier files are merged again and again as parts of the bigger files, the number of comparisons required to merge the files in that particular order is 200 + 150 + 350 + 810 + 1160 + 300 + 1460 + 40 = 4470. Some repeated merging is unavoidable, but there is a more efficient order that results in fewer comparisons.
If we were to look at this from a tree perspective, the total number of comparisons would be the sum of the sizes of the files s(i) multiplied by their distances D(i) from the root.
Again, we have the same optimization: we wish to minimize the sum of s(i)D(i) over all leaves i. Note that s(i) now plays the role of f(i). The optimal method to merge these files would be to go in increasing order of size: merge the files of size 40 and 150 to give a file of size 190, merge the files of size 190 and 200 to give a file of size 390, merge the files of size 390 and 300 to give a file of size 690, merge the files of size 690 and 810 to give a final file of size 1500. This would give a total number of comparisons of: 40 + 150 + 190 + 200 + 390 + 300 + 690 + 810 = 2770 (compared to 4470 comparisons using the other order). This way, the smallest files are repeatedly merged the most. This is analogous to the character with the lowest frequency having the longest codeword in Huffman coding.
Of all trees, the Huffman tree is optimal. One of the fundamental properties of Huffman trees is that the smallest
frequencies are furthest away from the root, and siblings merge into their parent. Given this fact, we do not really need the tree itself. The only thing we
need to note is that the two smallest files must be merged first. This step can then be repeated over and over.
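The repeated merge-the-two-smallest rule is exactly the Huffman heap loop. A sketch in Python, using the file sizes from the example above:

```python
import heapq

def optimal_merge_cost(sizes):
    """Total comparisons (counting m + n per merge) when we always
    merge the two smallest files first."""
    heap = list(sizes)
    heapq.heapify(heap)           # min-heap of file sizes
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)   # smallest file
        b = heapq.heappop(heap)   # second smallest file
        total += a + b            # cost of merging them
        heapq.heappush(heap, a + b)
    return total

optimal_merge_cost([200, 150, 810, 300, 40])   # 2770, vs 4470 for the first order tried
```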
Our next application is to generate a random integer with given probabilities.
As a case in point, take an imperfect die whose six faces come up with unequal probabilities. The method used is based upon a binary tree in which each leaf corresponds to an integer. The idea is to take a uniform [0,1] random number U, available in most software libraries and environments, and to travel from the root to a leaf so that each leaf is reached with the correct probability.
Our example assumes the probabilities p(i) for each outcome i of the die, as shown below. For example, the probability of rolling a 1 is 0.15, whereas the probability of rolling a 2 is 0.20. We want to construct a tree which reaches each of the six leaves with the correct probability.
The algorithm for traversing the tree associates with each internal node i a threshold value t(i): the sum of the probabilities of all leaves to its left. To find the t(i) values, you traverse the tree with the following recursive routine, keeping a running sum S of leaf probabilities (initially 0):
Preorder(i)
    if i is not NIL
        then if i is a leaf
                 then S <- S + p(i)
                 else Preorder(left(i))
                      t(i) <- S
                      Preorder(right(i))
Once this is set up, each internal node has a threshold value which will allow us to move down the tree and reach the leaves with correct probability. The tree below shows the threshold values at each internal node. We will compare these values with the given random number.
The algorithm below allows us to move down the tree and reach one of the six leaves with the correct probability.
Generate U uniformly on [0,1]
i <- root of the tree
while i is not a leaf do
    if U < t(i)
        then i <- left(i)
        else i <- right(i)
return index[i] (where index[i] is the number associated with leaf i).
Now, according to the above, we would need 3 time units to reach the leaf "6" and 2 time units to
reach the leaf "2". The expected time is then the sum over all leaves of p(i)D(i), where the p(i)'s are the given probabilities and D(i) is the depth of leaf i.
Once again this is in the format seen earlier for Huffman trees. The tree that optimizes the expected time is the Huffman
tree. So, one should construct the Huffman tree (as shown above).
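Putting the pieces together, here is a sketch in Python: build the Huffman tree for the die probabilities, compute the thresholds t(i) with the traversal above, then walk down guided by U. Only p(1)=0.15 and p(2)=0.20 are given in the text; the remaining four probabilities below are illustrative.

```python
import heapq, random

class Node:
    def __init__(self, index=None, left=None, right=None):
        self.index, self.left, self.right = index, left, right
        self.t = None                      # threshold t(i), set on internal nodes

def build_tree(probs):
    """Huffman tree whose leaf for outcome i has probability probs[i-1]."""
    heap = [(p, j, Node(index=j + 1)) for j, p in enumerate(probs)]
    heapq.heapify(heap)
    count = len(heap)                      # tiebreak so Nodes are never compared
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, count, Node(left=a, right=b)))
        count += 1
    return heap[0][2]

def set_thresholds(root, probs):
    """t(i) = sum of probabilities of all leaves to the left of node i."""
    s = 0.0                                # running sum S from the traversal above
    def visit(node):
        nonlocal s
        if node.index is not None:         # leaf
            s += probs[node.index - 1]
        else:                              # internal: left subtree, record S, right subtree
            visit(node.left)
            node.t = s
            visit(node.right)
    visit(root)

def draw(root, u):
    """Walk from the root to a leaf, guided by u in [0,1)."""
    node = root
    while node.index is None:
        node = node.left if u < node.t else node.right
    return node.index

# Illustrative probabilities; only p(1)=0.15 and p(2)=0.20 come from the text.
probs = [0.15, 0.20, 0.10, 0.25, 0.05, 0.25]
root = build_tree(probs)
set_thresholds(root, probs)
outcome = draw(root, random.random())      # outcome i occurs with probability probs[i-1]
```

The thresholds partition [0,1) into one interval per leaf, of width equal to that leaf's probability, so each leaf is reached with exactly the right probability while the Huffman shape minimizes the expected number of comparisons.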