A Fast Symbol Coding Scheme with Specific Application in Bulk Compression
A. Palau and G. Mirchandani
1998 Data Compression Conference, Snowbird, Utah


Given a source alphabet of M symbols and associated probabilities or their estimates, we find a sub-optimal set of codewords using a simple prefix property type of iterative algorithm to generate codewords lengths and a look-up table based mapping algorithm for assigning codewords. The expected codeword length L_f is slightly longer than that obtained for a Huffman code, but may equal it. When equal, the algorithm generates a larger set of applicable codewords. The time complexity for generating lengths and the associated codewords is less than that with Huffman, where these tasks have a complexity of O(M log M), while they are of order O(M) in the new algorithm. For bulk compression, where it is necessary to compress a large number of small files, the algorithm typically shows greater compression efficiency that that obtained with other standard UNIX based compressors.

Publications     Home