- Newsgroups: comp.compression
- Path: sparky!uunet!noc.near.net!gateway!pictel!garys
- From: garys@pictel.com (Gary Sullivan)
- Subject: Re: more entropy
- Message-ID: <1992Jul24.225603.3832@pictel.com>
- Organization: PictureTel Corporation
- References: <1992Jul23.174740.14559@usenet.ins.cwru.edu>
- Date: Fri, 24 Jul 1992 22:56:03 GMT
- Lines: 32
-
- In article <1992Jul23.174740.14559@usenet.ins.cwru.edu> daf10@po.CWRU.Edu (David A. Ferrance) writes:
- >
- >If I have an unsigned int count[256][256], what is wrong with
- >calculating entropy like this:
- >
- >for (i=0;i<256;i++) for (j=0;j<256;j++) {
- > freq = count[i][j] / total;
- > ent += freq * log10(1/freq) / 0.30103;
- > }
- >Where total and ent are doubles, total is the # of bytes total, ent
- >starts off as 0, and the values of the array are the # of occurrences of
- >each 2 letter combination?
- >
- >I get values > 8.
-
- "total" shouldn't be the total number of "bytes", it should be the total
- of the counts. I assume that's what you meant.
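- One way to get "total" right regardless of how you paired the bytes is
- just to sum the table, something like this (my sketch, reusing your
- names):
- 
- for(i=0, total=0.0; i<256; i++)
-     for(j=0; j<256; j++)
-         total += (double)count[i][j];
- 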
- The other things I'd fix: a pair that never occurs gives freq = 0, and
- log10(1/0) blows up, so you have to skip zero counts; and the loop does
- more floating-point work than it needs to.  I'd do:
-
- for(i=0, tmp=0.0; i<256; i++)
-     for(j=0; j<256; j++) {
-         freq = (double)count[i][j];
-         if (freq > 0.0)               /* skip zeros: 0*log(0) is NaN */
-             tmp += freq * log10(freq);
-     }
- /* entropy = log2(total) - (sum of c*log2(c))/total; 0.30103 = log10(2) */
- entropy = (log10((double)total) - tmp / total) / 0.30103;
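- 
- If you want to check it end to end, here's a throwaway driver (the I/O
- scaffolding is just my sketch; it counts non-overlapping byte pairs, so
- a trailing odd byte gets dropped):
- 
- #include <stdio.h>
- #include <math.h>
- 
- static unsigned long count[256][256];
- 
- int main(int argc, char **argv)
- {
-     FILE *fp;
-     int a, b, i, j;
-     double freq, tmp, total, entropy;
- 
-     if (argc != 2 || (fp = fopen(argv[1], "rb")) == NULL) {
-         fprintf(stderr, "usage: pairent file\n");
-         return 1;
-     }
-     while ((a = getc(fp)) != EOF && (b = getc(fp)) != EOF)
-         count[a][b]++;               /* non-overlapping byte pairs */
-     fclose(fp);
- 
-     for (i = 0, total = 0.0; i < 256; i++)   /* total of the counts */
-         for (j = 0; j < 256; j++)
-             total += (double)count[i][j];
-     if (total == 0.0) {
-         fprintf(stderr, "no pairs counted\n");
-         return 1;
-     }
- 
-     for (i = 0, tmp = 0.0; i < 256; i++)
-         for (j = 0; j < 256; j++) {
-             freq = (double)count[i][j];
-             if (freq > 0.0)          /* 0*log(0) would be NaN */
-                 tmp += freq * log10(freq);
-         }
-     entropy = (log10(total) - tmp / total) / 0.30103;
- 
-     printf("entropy = %.4f bits per pair\n", entropy);
-     return 0;
- }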
-
- As someone else replied, you'd only need to worry if this came out above
- 16: with 256*256 = 65536 possible pairs, the maximum entropy is
- log2(65536) = 16 bits per pair, so 8 or so is perfectly reasonable.
-
- ---------------------------------
- Gary Sullivan (garys@pictel.com)
-