home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!mcsun!sun4nl!cwi.nl!dik
- From: dik@cwi.nl (Dik T. Winter)
- Newsgroups: comp.compression
- Subject: Re: more entropy
- Message-ID: <6775@charon.cwi.nl>
- Date: 24 Jul 92 22:31:02 GMT
- References: <1992Jul23.174740.14559@usenet.ins.cwru.edu> <1992Jul24.003709.21603@s1.gov>
- Sender: news@cwi.nl
- Organization: CWI, Amsterdam
- Lines: 41
-
- In article <1992Jul24.003709.21603@s1.gov> lip@s1.gov (Loren I. Petrich) writes:
- > In article <1992Jul23.174740.14559@usenet.ins.cwru.edu> daf10@po.CWRU.Edu (David A. Ferrance) writes:
- >
- > >for (i=0;i<256;i++) for (j=0;j<256;j++) {
- > > freq = count[i][j] / total;
- > > ent += freq * log10(1/freq) / 0.30103;
- > > }
- >
- > I presume that the code also included:
- >
- > ent = 0;
- > total = 0;
- > for (i=0;i<256;i++) for (j=0;j<256;j++)
- > total += count[i][j];
- >
- > >I get values > 8.
- > The theoretical maximum value is log2(256*256) = 16.
- >
- Right. And indeed the entropy can very well be larger than 8. What creates
- the confusion is that you calculate the entropy of pairs of characters, and
- in that case an entropy of 8 means you can compress by at most 50% by encoding
- pairs of characters. What you want is single character entropy with memory
- of one previous character. I am not an expert in information theory etc.
- (and indeed never have read very much about it), but my offhand suggestion
- is: add an array count1 of 256 elements where you count the occurrences of
- individual characters. Next change the statement:
- freq = count[i][j] / total;
- ent += freq * log10(1/freq) / 0.30103;
- to:
- freq = (double)count[i][j] / (double)count1[i];
- ent += (double)count1[i] / total * freq * log10(1/freq) / 0.30103;
- (perhaps including some of the other suggestions, like using log2, -log2(freq)
- etc.). This way you calculate for each character the entropy when it is
- following a specific character and you weigh that by the relative frequency
- of that preceding character.
-
- dik
- --
- dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland
- home: bovenover 215, 1025 jn amsterdam, nederland
- dik@cwi.nl
-