Newsgroups: comp.compression
Path: sparky!uunet!destroyer!news.iastate.edu!hobbes.physics.uiowa.edu!news.uiowa.edu!news
From: jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879)
Subject: Re: How well do random files crunch?
Sender: news@news.uiowa.edu (News)
Message-ID: <1992Nov13.151607.23707@news.uiowa.edu>
Date: Fri, 13 Nov 1992 15:16:07 GMT
References: <1992Nov13.135437.19044@decuac.dec.com>
Nntp-Posting-Host: pyrite.cs.uiowa.edu
Organization: University of Iowa, Iowa City, IA, USA
Lines: 29

From article <1992Nov13.135437.19044@decuac.dec.com>,
by bell@ufp.enet.dec.com ():
>
> So the probability for any given character to occur is 1/256, no matter
> the file length (because it's an even distribution).

Don't expect any compression. In fact, expect some expansion
with any compression algorithm that's not theoretically optimal.
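
A quick way to see this is to feed uniformly random bytes to a real
compressor. Here is a minimal sketch, assuming Python and its standard
zlib module (the sizes are illustrative, not from the original post):

    import os
    import zlib

    data = os.urandom(65536)           # uniformly random bytes
    packed = zlib.compress(data, 9)    # DEFLATE at maximum effort
    print(len(data), len(packed))      # packed comes out slightly larger

DEFLATE typically falls back to "stored" blocks when it finds nothing
to exploit, so the output is the input plus a few bytes of framing
overhead, which is exactly the small expansion described above.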
>
> How about for a parabolic distribution? I mean (since I'm not that good
> with statistics) that it's more likely to get a high or low number than
> one close to 0.5. Do you think that would make any difference?

If different characters have significantly different
probabilities, expect some compression with just about any
decent compression algorithm. If the likelihood of different
characters depends in any way on the context in which that
character is found, expect to gain from higher-order source
models.
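
The gain available to an order-0 coder (plain Huffman or arithmetic
coding) is bounded by the zeroth-order entropy of the byte counts.
A small sketch of that calculation, again assuming Python (the
function name is mine, not from the post):

    import math
    from collections import Counter

    def order0_entropy(data: bytes) -> float:
        """Shannon entropy in bits per byte under an order-0 model."""
        n = len(data)
        return -sum(c / n * math.log2(c / n)
                    for c in Counter(data).values())

A flat 1/256 distribution comes out at exactly 8 bits per byte, i.e.
no room for compression; the more the counts are skewed toward some
values, the further below 8 this falls, and that is roughly the rate
an order-0 coder approaches.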

For example, you can construct strings where all letters are
equally likely, but where the letters A-M are always followed
by the letters N-Z and vice versa. In such a "language",
a simple Huffman code won't manage to compress anything, but
both LZW and context-dependent Huffman coding will compress
the files fairly well.
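
Here is a sketch of that construction, again in Python, comparing the
per-letter entropy an order-0 model sees against the conditional
(order-1) entropy a context-dependent model can exploit (the string
length and names are my own choices):

    import math
    import random
    from collections import Counter

    low, high = "ABCDEFGHIJKLM", "NOPQRSTUVWXYZ"
    # Alternate the two halves of the alphabet: every letter is
    # equally likely overall, but A-M is always followed by N-Z
    # and vice versa.
    text = "".join(random.choice(low if i % 2 == 0 else high)
                   for i in range(100000))

    def entropy(counts):
        n = sum(counts.values())
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    h0 = entropy(Counter(text))                      # ~log2(26) = 4.70
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    h1 = entropy(bigrams) - h0                       # ~log2(13) = 3.70
    print(h0, h1)

An order-0 code is stuck near 4.7 bits per letter because the letter
frequencies are flat, while a model that conditions on the previous
letter needs only about 3.7, which is the gain the LZW and
context-dependent Huffman coders pick up.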

Doug Jones
jones@cs.uiowa.edu