NetNews Usenet Archive 1992 #27

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #27 / NN_1992_27.iso / spool / comp / lang / c / 16561 < prev next >

Wrap

Text File | 1992-11-15 | 3.9 KB | 78 lines

Newsgroups: comp.lang.c Path: sparky!uunet!utcsri!newsflash.concordia.ca!sifon!thunder.mcrcim.mcgill.edu!mouse From: mouse@thunder.mcrcim.mcgill.edu (der Mouse) Subject: Re: LZW Compression algorithm Message-ID: <1992Nov15.200257.20378@thunder.mcrcim.mcgill.edu> Organization: McGill Research Centre for Intelligent Machines References: <BxMAwz.D6C@willamette.edu> Date: Sun, 15 Nov 92 20:02:57 GMT Lines: 68 In article <BxMAwz.D6C@willamette.edu>, bbrown@willamette.edu (Brian Brown) writes: > Does anybody have any information on how to implement LZW > compression/decompression? It's fairly simple. I will outline a 16-bit compress compressing a stream of 8-bit bytes. This is for simplicity, so that I don't have to keep using roundabout language to describe sizes - it's fairly clear how to alter the sizes anyway. I'll describe the essential core algorithm first, then some refinements. You have a table of 65536 (2^16) entries. Each table entry conceptually contains a string of bytes, or is empty. Initially, 256 of the entries contain the 256 possible single-byte strings, with the rest empty. Initialize the algorithm by reading the first byte of input into a working variable which conceptually contains a string of bytes. While the current working string of bytes is listed in the table, read another byte. As soon as the string is not listed in the table, output the table index corresponding to the last string that was in the table, add the working string to the table, and reset the working string so it contains just the last byte. Then go back to the beginning of this paragraph. Dealing with EOF on input complicates this slightly, but only slightly. Dealing with a full table is fairly easy: just skip the "add the working string to the table" step. Now for the refinements I mentioned above. Note that the decoder must keep a similar table, and must use the same algorithm for choosing unused slots that the encoder uses (so that table indices match). One of the simplest ways of doing this is to make the table an array, with a single variable marking the point up to which the table is filled, with the preloaded entries at the bottom. When this is done, there is an additional advantage: the first 256 output codes will all be 511 or less, and both the encoder and decoder know this, so only nine bits per code need to actually be emitted. Similarly, the next 512 will need only ten bits each, etc. At least one implementation - compress(1) - has an additional feature: in addition to the 256 preloaded entries I described above, there is a 257th code that is reserved. Once the table is full, the program continually monitors the compression ratio it's achieving, and if it drops too low, it emits this code and clears its table back to the 257 preloaded entries. (The decoder program must of course take similar action upon seeing this code in the compressed stream.) An additional idea I have experimented with is to replace the compress technique for dealing with a full table with another one: when the table fills up, scan all table entries, and free up any whose strings are such that when any character is appended to them, the result is not in the table. (That is, the "leaves" of the tree. Of course, the predefined entries can never be freed.) In my (minimal) tests, this had comparable effect to compress(1)'s strategy. If you wish to be compatible with any existing implementation (eg, compress(1), or the compression used by GIFs), you will have to be careful to make sure that your table sizes and code choice algorithms and so on are all identical to those used by what you're trying to be compatible with. The best way to do this is probably to find source code that implements it. In the case of GIFs, this is easy; pbmplus contains a ppmtogif filter. I think some version of compress is available free, but I'm not sure. der Mouse mouse@larry.mcrcim.mcgill.edu