Newsgroups: comp.compression
Path: sparky!uunet!destroyer!news.iastate.edu!hobbes.physics.uiowa.edu!news.uiowa.edu!news
From: jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879)
Subject: Re: How well do random files crunch?
Sender: news@news.uiowa.edu (News)
Message-ID: <1992Nov13.151607.23707@news.uiowa.edu>
Date: Fri, 13 Nov 1992 15:16:07 GMT
References: <1992Nov13.135437.19044@decuac.dec.com>
Nntp-Posting-Host: pyrite.cs.uiowa.edu
Organization: University of Iowa, Iowa City, IA, USA
Lines: 29

From article <1992Nov13.135437.19044@decuac.dec.com>,
by bell@ufp.enet.dec.com ():
>
> So the probability for any given character to occur is 1/256, no matter
> the file length (because it's an even distribution).

Don't expect any compression. In fact, expect some expansion
with any compression algorithm that's not theoretically optimal.
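
A quick way to see this is to feed uniformly random bytes to a real
compressor. Here is a minimal sketch, assuming Python and its standard
zlib module (the sizes are illustrative, not from the original post):

    import os
    import zlib

    data = os.urandom(65536)           # uniformly random bytes
    packed = zlib.compress(data, 9)    # DEFLATE at maximum effort
    print(len(data), len(packed))      # packed comes out slightly larger

DEFLATE typically falls back to "stored" blocks when it finds nothing
to exploit, so the output is the input plus a few bytes of framing
overhead, which is exactly the small expansion described above.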
>
> How about for a parabolic distribution? I mean (since I'm not that good
> with statistics) that it's more likely to get a high or low number than
> one close to 0.5. Do you think that would make any difference?

If different characters have significantly different
probabilities, expect some compression with just about any
decent compression algorithm. If the likelihood of different
characters depends in any way on the context in which that
character is found, expect to gain from higher-order source
models.
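
The gain available to an order-0 coder (plain Huffman or arithmetic
coding) is bounded by the zeroth-order entropy of the byte counts.
A small sketch of that calculation, again assuming Python (the
function name is mine, not from the post):

    import math
    from collections import Counter

    def order0_entropy(data: bytes) -> float:
        """Shannon entropy in bits per byte under an order-0 model."""
        n = len(data)
        return -sum(c / n * math.log2(c / n)
                    for c in Counter(data).values())

A flat 1/256 distribution comes out at exactly 8 bits per byte, i.e.
no room for compression; the more the counts are skewed toward some
values, the further below 8 this falls, and that is roughly the rate
an order-0 coder approaches.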

For example, you can construct strings where all letters are
equally likely, but where the letters A-M are always followed
by the letters N-Z and vice versa. In such a "language",
a simple Huffman code won't manage to compress anything, but
both LZW and context-dependent Huffman coding will compress
the files fairly well.
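
Here is a sketch of that construction, again in Python, comparing the
per-letter entropy an order-0 model sees against the conditional
(order-1) entropy a context-dependent model can exploit (the string
length and names are my own choices):

    import math
    import random
    from collections import Counter

    low, high = "ABCDEFGHIJKLM", "NOPQRSTUVWXYZ"
    # Alternate the two halves of the alphabet: every letter is
    # equally likely overall, but A-M is always followed by N-Z
    # and vice versa.
    text = "".join(random.choice(low if i % 2 == 0 else high)
                   for i in range(100000))

    def entropy(counts):
        n = sum(counts.values())
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    h0 = entropy(Counter(text))                      # ~log2(26) = 4.70
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    h1 = entropy(bigrams) - h0                       # ~log2(13) = 3.70
    print(h0, h1)

An order-0 code is stuck near 4.7 bits per letter because the letter
frequencies are flat, while a model that conditions on the previous
letter needs only about 3.7, which is the gain the LZW and
context-dependent Huffman coders pick up.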

Doug Jones
jones@cs.uiowa.edu