A Fast Symbol Coding Scheme with Specific Application in Bulk Compression
A. Palau and G. Mirchandani
1998 Data Compression Conference, Snowbird, Utah
Given a source alphabet of M symbols and associated
probabilities or their estimates, we find a sub-optimal
set of codewords using a simple prefix property type of
iterative algorithm to generate codewords lengths and
a look-up table based mapping algorithm for assigning
codewords. The expected codeword length L_f is slightly
longer than that obtained for a Huffman code, but may
equal it. When equal, the algorithm generates a larger
set of applicable codewords. The time complexity for
generating lengths and the associated codewords is less than
that with Huffman, where these tasks have a complexity of
O(M log M), while they are of order O(M) in the new
algorithm. For bulk compression, where it is necessary to
compress a large number of small files, the algorithm
typically shows greater compression efficiency that that
obtained with other standard UNIX based compressors.