Difference between revisions of "Huffman coding"

From ScienceZero
Jump to: navigation, search
Line 3: Line 3:
 
The problem is to decide which codes to use, David A. Huffman found the optimal way of generating codes that guarantees the shortest possible output.
 
The problem is to decide which codes to use, David A. Huffman found the optimal way of generating codes that guarantees the shortest possible output.
  
Count the number of times each word occurs in the data, this will be a histogram
+
Count the number of times each word occurs in the data, this will be a histogram
 
+
 
  do
 
  do
 
   Find the two smallest numbers in the histogram
 
   Find the two smallest numbers in the histogram

Revision as of 03:03, 25 January 2011

A simple way of reducing the size of a block of data is to replace the most common words with shorter ones that are not already used in the data. It is possible to reconstruct the original data using a dictionary that lists the short words and the longer words they replace.

The problem is to decide which codes to use, David A. Huffman found the optimal way of generating codes that guarantees the shortest possible output.

Count the number of times each word occurs in the data, this will be a histogram

do
 Find the two smallest numbers in the histogram
 Create a new node with the sum of the two just found and a link to the two
 Mark the two originals as spent
loop until only one node remains