home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!spool.mu.edu!olivea!charnel!sifon!thunder.mcrcim.mcgill.edu!mouse
- From: mouse@thunder.mcrcim.mcgill.edu (der Mouse)
- Newsgroups: comp.lang.c
- Subject: Re: LZW Compression algorithm
- Message-ID: <1992Nov17.124947.7065@thunder.mcrcim.mcgill.edu>
- Date: 17 Nov 92 12:49:47 GMT
- References: <BxMAwz.D6C@willamette.edu> <Bxs4oA.9s6.2@cs.cmu.edu>
- Organization: McGill Research Centre for Intelligent Machines
- Lines: 48
-
- In article <Bxs4oA.9s6.2@cs.cmu.edu>, kosak+@cs.cmu.edu (Corey Kosak) writes:
- > In article <1992Nov15.200257.20378@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes:
- >> In article <BxMAwz.D6C@willamette.edu>, bbrown@willamette.edu (Brian Brown) writes:
- >>> Does anybody have any information on how to implement LZW
- >>> compression/decompression?
- >> [description of compression algorithm]
-
- > Mr./Ms. Mouse leaves off the description of the decoding process.
-
- > The description that Herr/Frau Mouse gives for the encoding process
- > is almost correct. The problem is that in the given algorithm, the
- > encoder's table updates are one step ahead of the decoder's; this
- > usually doesn't matter, but will cause incorrect behavior for some
- > input.
-
- True enough. This is a relatively subtle point I probably should have
- mentioned: as you point out, when the input being compressed is of the
- form xfooxfoox for some character x and string foo, the compression
- algorithm will generate a code that the decompressor does not yet
- recognize at the time it gets it.
-
- The compress(1) source I have access to contains the following snippet:
-
- /*
- * Special case for KwKwK string.
- */
- if ( code >= free_ent ) {
- *stackp++ = finchar;
- code = oldcode;
- }
-
- When compressing KwKwK (w is a string, K is a single character) and Kw
- is present in the table but KwK is not, the encoder will emit the code
- for Kw and enter KwK, and the next code it emits will be the
- newly-entered one for KwK, which a naive decoder won't yet know how to
- recognize. Consequently, the decoder recognizes input codes it doesn't
- yet have in its table as indicating this has occurred, and compensates.
- (The decoder does have enough information to decode this correctly,
- since the previous code will always be the one for Kw, from which KwK
- can trivially be deduced.)
-
- Fortunately, this is the only circumstance in which a correct
- compresser will generate a code the decoder doesn't already know...
-
- der Mouse
-
- mouse@larry.mcrcim.mcgill.edu
-