home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.lang.c
- Path: sparky!uunet!utcsri!newsflash.concordia.ca!sifon!thunder.mcrcim.mcgill.edu!mouse
- From: mouse@thunder.mcrcim.mcgill.edu (der Mouse)
- Subject: Re: LZW Compression algorithm
- Message-ID: <1992Nov15.200257.20378@thunder.mcrcim.mcgill.edu>
- Organization: McGill Research Centre for Intelligent Machines
- References: <BxMAwz.D6C@willamette.edu>
- Date: Sun, 15 Nov 92 20:02:57 GMT
- Lines: 68
-
- In article <BxMAwz.D6C@willamette.edu>, bbrown@willamette.edu (Brian Brown) writes:
-
- > Does anybody have any information on how to implement LZW
- > compression/decompression?
-
- It's fairly simple. I will outline a 16-bit compress compressing a
- stream of 8-bit bytes. This is for simplicity, so that I don't have to
- keep using roundabout language to describe sizes - it's fairly clear
- how to alter the sizes anyway. I'll describe the essential core
- algorithm first, then some refinements.
-
- You have a table of 65536 (2^16) entries. Each table entry
- conceptually contains a string of bytes, or is empty. Initially, 256
- of the entries contain the 256 possible single-byte strings, with the
- rest empty. Initialize the algorithm by reading the first byte of
- input into a working variable which conceptually contains a string of
- bytes.
-
- While the current working string of bytes is listed in the table, read
- another byte. As soon as the string is not listed in the table, output
- the table index corresponding to the last string that was in the table,
- add the working string to the table, and reset the working string so it
- contains just the last byte. Then go back to the beginning of this
- paragraph. Dealing with EOF on input complicates this slightly, but
- only slightly. Dealing with a full table is fairly easy: just skip the
- "add the working string to the table" step.
-
- Now for the refinements I mentioned above.
-
- Note that the decoder must keep a similar table, and must use the same
- algorithm for choosing unused slots that the encoder uses (so that
- table indices match). One of the simplest ways of doing this is to
- make the table an array, with a single variable marking the point up to
- which the table is filled, with the preloaded entries at the bottom.
- When this is done, there is an additional advantage: the first 256
- output codes will all be 511 or less, and both the encoder and decoder
- know this, so only nine bits per code need to actually be emitted.
- Similarly, the next 512 will need only ten bits each, etc.
-
- At least one implementation - compress(1) - has an additional feature:
- in addition to the 256 preloaded entries I described above, there is a
- 257th code that is reserved. Once the table is full, the program
- continually monitors the compression ratio it's achieving, and if it
- drops too low, it emits this code and clears its table back to the 257
- preloaded entries. (The decoder program must of course take similar
- action upon seeing this code in the compressed stream.)
-
- An additional idea I have experimented with is to replace the compress
- technique for dealing with a full table with another one: when the
- table fills up, scan all table entries, and free up any whose strings
- are such that when any character is appended to them, the result is not
- in the table. (That is, the "leaves" of the tree. Of course, the
- predefined entries can never be freed.) In my (minimal) tests, this
- had comparable effect to compress(1)'s strategy.
-
- If you wish to be compatible with any existing implementation (eg,
- compress(1), or the compression used by GIFs), you will have to be
- careful to make sure that your table sizes and code choice algorithms
- and so on are all identical to those used by what you're trying to be
- compatible with. The best way to do this is probably to find source
- code that implements it. In the case of GIFs, this is easy; pbmplus
- contains a ppmtogif filter. I think some version of compress is
- available free, but I'm not sure.
-
- der Mouse
-
- mouse@larry.mcrcim.mcgill.edu
-