Huffman coding
From ScienceZero
A simple way of reducing the size of a block of data is to replace the most common words with shorter ones that are not already used in the data. It is possible to reconstruct the original data using a dictionary that lists the short words and the longer words they replace.
The problem is to decide which codes to use, David A. Huffman found the optimal way of generating codes that guarantees the shortest possible output.
Count the number of times each word occurs in the data, this will be a histogram do Find the two active nodes with the lowest count, these are the parents Create a new child node and link it to each of the two parent nodes, let it contain the sum of the two parents Mark the two parent nodes as as inactive loop until only one active node remains
We will now have a binary tree structure.
Example
"Hello_World" The occurrence of each letter sorted in ascending order from left to right: H e W r d _ o l 1 1 1 1 1 1 2 3 \ / \ / \ / | | 2 2 2 | | \ / \ / | 4 4 | \ \ / \ 7 \ / \ / 11