NetNews Usenet Archive 1992 #27

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #27 / NN_1992_27.iso / spool / comp / lang / c / 16569 < prev next >

Wrap

Internet Message Format | 1992-11-15 | 4.8 KB

Path: sparky!uunet!ogicse!das-news.harvard.edu!cantaloupe.srv.cs.cmu.edu!kosak From: kosak+@cs.cmu.edu (Corey Kosak) Newsgroups: comp.lang.c Subject: Re: LZW Compression algorithm Message-ID: <Bxs4oA.9s6.2@cs.cmu.edu> Date: 15 Nov 92 22:48:09 GMT Article-I.D.: cs.Bxs4oA.9s6.2 References: <BxMAwz.D6C@willamette.edu> <1992Nov15.200257.20378@thunder.mcrcim.mcgill.edu> Sender: Corey Kosak <kosak+@cmu.edu> Organization: School of Computer Science, Carnegie Mellon Lines: 96 Nntp-Posting-Host: ibis.warp.cs.cmu.edu In article <1992Nov15.200257.20378@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes: >In article <BxMAwz.D6C@willamette.edu>, bbrown@willamette.edu (Brian Brown) writes: > >> Does anybody have any information on how to implement LZW >> compression/decompression? > >You have a table of 65536 (2^16) entries. Each table entry >conceptually contains a string of bytes, or is empty. Initially, 256 >of the entries contain the 256 possible single-byte strings, with the >rest empty. Initialize the algorithm by reading the first byte of >input into a working variable which conceptually contains a string of >bytes. > >While the current working string of bytes is listed in the table, read >another byte. As soon as the string is not listed in the table, output >the table index corresponding to the last string that was in the table, >add the working string to the table, and reset the working string so it >contains just the last byte. Then go back to the beginning of this >paragraph. Dealing with EOF on input complicates this slightly, but >only slightly. Dealing with a full table is fairly easy: just skip the >"add the working string to the table" step. Mr./Ms. Mouse leaves off the description of the decoding process. However, it is fairly straightforward. Input a table index, decode it by retrieving the associated string from your table, and then add a new entry to your table. This new entry is the concatenation of the previous string found and the first character of the current string. (Note that for the first time through the loop, there is no "previous string", so you do not do a table update) The description that Herr/Frau Mouse gives for the encoding process is almost correct. The problem is that in the given algorithm, the encoder's table updates are one step ahead of the decoder's; this usually doesn't matter, but will cause incorrect behavior for some input. Consider an input that begins with "aaaa", and trace the algorithm. 1. Initialize the table 2. read a byte. (the first 'a'. Our working string is "a". This is in the table, so we loop) 3. read another byte. (the second 'a'. Now our working string is "aa", which is not in the table. Following the instructions, we: - output the index for the last string that was in the table OUTPUT (INDEX ("a")) -- assume for the sake of argument that this index is 97 - add the working string to the table TABLE [256] = "aa" - reset the working string so it contains the last byte WORKING_STRING = "a" - loop 4. read another byte. (the third 'a'. Now our working string is "aa". This is in the table, so we loop) 5. read another byte. (the fourth 'a'. Now our working string is "aaa", which is not in the table. Following the instructions we: - output the index for the last string that was in the table OUTPUT (INDEX ("aa")) -- this is 256 - add the working string to the table TABLE [257] = "aaa" - reset the working string so it contains the last byte WORKING_STRING = "a" - loop Let's stop here and look at what happens when we try to decompress this data. So far, we've emitted two codes: 97 and 256. Decoding 97 tells us that the first byte is 'a'. But when we encounter the second code, 256, we have no way of knowing what the sender has in slot 256 of its array. Maybe "aa". Maybe not. As I mentioned above, this problem is caused by the encoder being one step ahead of the decoder with respect to table updates. The way to fix this is to delay sender table updates by one iteration. So the revised instructions might read: >While the current working string of bytes is listed in the table, read >another byte. As soon as the string is not listed in the table, output >the table index corresponding to the last string that was in the table, ***add LAST_UPDATE to the table, set LAST_UPDATE to the current working string,*** >and reset the working string so it >contains just the last byte. Then go back to the beginning of this >paragraph. Dealing with EOF on input complicates this slightly, but >only slightly. Dealing with a full table is fairly easy: just skip the >"add the working string to the table" step. (The same note applies to the compressor: for the first time through the loop, LAST_UPDATE is not valid, so you do not add anything to the table.) -Corey Kosak