NetNews Usenet Archive 1992 #27

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #27 / NN_1992_27.iso / spool / comp / lang / c / 16662 < prev next >

Wrap

Internet Message Format | 1992-11-17 | 2.5 KB

Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!spool.mu.edu!olivea!charnel!sifon!thunder.mcrcim.mcgill.edu!mouse From: mouse@thunder.mcrcim.mcgill.edu (der Mouse) Newsgroups: comp.lang.c Subject: Re: LZW Compression algorithm Message-ID: <1992Nov17.124947.7065@thunder.mcrcim.mcgill.edu> Date: 17 Nov 92 12:49:47 GMT References: <BxMAwz.D6C@willamette.edu> <Bxs4oA.9s6.2@cs.cmu.edu> Organization: McGill Research Centre for Intelligent Machines Lines: 48 In article <Bxs4oA.9s6.2@cs.cmu.edu>, kosak+@cs.cmu.edu (Corey Kosak) writes: > In article <1992Nov15.200257.20378@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes: >> In article <BxMAwz.D6C@willamette.edu>, bbrown@willamette.edu (Brian Brown) writes: >>> Does anybody have any information on how to implement LZW >>> compression/decompression? >> [description of compression algorithm] > Mr./Ms. Mouse leaves off the description of the decoding process. > The description that Herr/Frau Mouse gives for the encoding process > is almost correct. The problem is that in the given algorithm, the > encoder's table updates are one step ahead of the decoder's; this > usually doesn't matter, but will cause incorrect behavior for some > input. True enough. This is a relatively subtle point I probably should have mentioned: as you point out, when the input being compressed is of the form xfooxfoox for some character x and string foo, the compression algorithm will generate a code that the decompressor does not yet recognize at the time it gets it. The compress(1) source I have access to contains the following snippet: /* * Special case for KwKwK string. */ if ( code >= free_ent ) { *stackp++ = finchar; code = oldcode; } When compressing KwKwK (w is a string, K is a single character) and Kw is present in the table but KwK is not, the encoder will emit the code for Kw and enter KwK, and the next code it emits will be the newly-entered one for KwK, which a naive decoder won't yet know how to recognize. Consequently, the decoder recognizes input codes it doesn't yet have in its table as indicating this has occurred, and compensates. (The decoder does have enough information to decode this correctly, since the previous code will always be the one for Kw, from which KwK can trivially be deduced.) Fortunately, this is the only circumstance in which a correct compresser will generate a code the decoder doesn't already know... der Mouse mouse@larry.mcrcim.mcgill.edu