home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.compression
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!moe.ksu.ksu.edu!hobbes.physics.uiowa.edu!news.uiowa.edu!news
- From: jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879)
- Subject: Re: Fixed point of the lzw compression alg.
- Sender: news@news.uiowa.edu (News)
- Message-ID: <1992Dec15.172309.3667@news.uiowa.edu>
- Date: Tue, 15 Dec 1992 17:23:09 GMT
- References: <1992Dec15.165902.3227@news.uiowa.edu>
- Nntp-Posting-Host: pyrite.cs.uiowa.edu
- Organization: University of Iowa, Iowa City, IA, USA
- Lines: 58
-
- From article <1992Dec15.165902.3227@news.uiowa.edu>,
- by williams@herky.cs.uiowa.edu (Kent Williams):
-
- > From ren@function.mps.ohio-state.edu (Liming Ren):
- >
- >> Is there a fixed point for lzw?
-
- A really interesting question, but in a realm akin to "pure math" in that
- I can't see a single reason that the answer could be useful to anyone.
- (And I say this as someone who's looked seriously at fixed points of
- macro assembly languages and linking loaders!)
-
- > Taking the Unix Compress. You start out with ...
- >
- > <NO PREDECESSOR>N
- >
- > for N in 0..256. The code for <NO PREDECESSOR>N is N, but the code
- > will be output at nine bits of precision. ...
-
- This gives us the key we need to begin constructing a fixed point for
- LZW, but keep in mind that UNIX compress puts a three byte magic cookie
- on the front of the compressed file. Ignoring this magic cookie, we can
- make the following observation:
-
- Input XXXXXXXX < a single byte input file
-
- Output XXXXXXXX < the LZW compressed result
- 0
-
- Input XXXXXXXX < a two byte input file
- YYYYYYYY
-
- Output XXXXXXXX < the LZW compressed output
- YYYYYYY0
- 0Y
-
- From this, we conclude that if there is a fixed point, the second
- character must have an even code in binary (the LSB must be zero).
-
- Furthermore, each new byte that occurs (before a string is found
- that is actually compressible) adds one bit to the output, so there
- must eventually be a compressible string in the sequence to take away
- a bit.
-
- Conclusion. 7 new characters add 7 bits to the string. If the 8th
- and 9th characters are a repeat of a previous digraph in the string,
- they will add a single 9 bit lump for two characters, thus making the
- total input string length equal to the total output string length.
-
- Therefore, the shortest possible fixed point for LZW (ignoring end
- of file coding) will be something like:
-
- ABCDEFGAB
-
- with a letter substitution that is very carefully chosen.
-
- Doug Jones
- jones@cs.uiowa.edu
-