Path: sparky!uunet!zaphod.mps.ohio-state.edu!sdd.hp.com!usc!hacgate!rigel!abirenbo
From: abirenbo@rigel.cel.scg.hac.com (Aaron Birenboim)
Newsgroups: comp.compression
Subject: Re: entropy
Message-ID: <22552@hacgate.SCG.HAC.COM>
Date: 24 Jul 92 18:49:11 GMT
References: <1992Jul22.144104.27576@usenet.ins.cwru.edu>
Sender: news@hacgate.SCG.HAC.COM
Organization: Hughes Aircraft Colorado Engineering Labs
Lines: 45

In article <1992Jul22.144104.27576@usenet.ins.cwru.edu> daf10@po.CWRU.Edu (David A. Ferrance) writes:
>
>Right now I compute the entropy of a file by summing the probability of
>each character occurring times the log (base 2) of the reciprocal of that
>probability. This gives me the theoretical minimum number of bits
>needed to represent each character, on average (from what I understand).
>What I would like to know is, does this number only work for first order
>(1-ary?) compression techniques

Two points for David; good intuition. I only stumbled through my
info-theory class, but that seems to be the case to me: the figure you
get is the minimum for a coder that treats each character on its own.
As soon as you start using some kind of memory, a Markov chain, or a
dictionary, the rules of the game change.

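For concreteness, here is a quick sketch of the sum David describes
(Python, treating each byte of the file as one character; the script
name and details are just my illustration, not anything from his code):

# first_order_entropy.py -- sketch of the first-order sum:
#   H = sum over characters c of  p(c) * log2(1 / p(c))
import math
import sys
from collections import Counter

def first_order_entropy(data: bytes) -> float:
    """Bits per character for a coder with no memory between characters."""
    counts = Counter(data)             # how often each byte value occurs
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

if __name__ == "__main__":
    data = open(sys.argv[1], "rb").read()
    print("%.3f bits/character (first-order minimum)" % first_order_entropy(data))
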
> If it does not serve for the more complex
>techniques (and I suspect it doesn't), is there any way to determine
>entropies for different orders?

I don't know; I'd like to hear some discussion on that, though. I'd
imagine there are papers out there covering this. Anybody care to
enlighten us?

My guess as to how this works is as follows:

Let's talk about entropy as bits/SYMBOL. If your symbols are
all single characters, you have the usual case. However, if, for
example, you go to a DICTIONARY technique, you start getting
SYMBOLS which are STRINGS of CHARACTERS. Now the usual
(first-order) entropy law still holds: you can compute
a theoretical minimum number of bits per SYMBOL, but your
symbol table is much more complex than before. This is why
most good compression techniques can beat single-character entropy.

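A crude way to watch this happen is to redo the same sum with
fixed-length blocks of characters as the SYMBOLS (just a stand-in for
a real dictionary of variable-length strings; the block lengths and
the sample text below are made up for illustration):

# block_entropy.py -- the same entropy sum, but over k-character blocks
import math
from collections import Counter

def block_entropy(data: bytes, k: int) -> float:
    """Bits per SYMBOL when each non-overlapping k-byte block is one symbol."""
    blocks = [data[i:i + k] for i in range(0, len(data) - k + 1, k)]
    counts = Counter(blocks)
    total = len(blocks)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# On repetitive input the bits/character figure drops as the blocks grow,
# because recurring strings start being counted as single symbols.
sample = b"the quick brown fox jumps over the lazy dog " * 200
for k in (1, 2, 4):
    print(k, block_entropy(sample, k) / k, "bits/character")
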
Now I'd like to see some kind of Markov chain applied
to strings of SYMBOLS. Would this be somehow analogous to,
say, encoding the error from a predictor of order X
(like in DPCM, with the prediction based on the last X symbols)?
That idea maps well onto signals, but less well onto language,
where DPCM makes less sense (though it is conceivable).

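The Markov-chain direction can at least be measured directly:
condition each character on the one before it and compute the
conditional entropy, which is the best a model with one character of
memory could hope to achieve. Again only a rough sketch, with a
made-up sample text:

# conditional_entropy.py -- bits per character given the previous character,
# i.e. the target for a first-order Markov (one-character-context) model
import math
from collections import Counter

def conditional_entropy(data: bytes) -> float:
    """H(next | previous) = sum p(prev, next) * log2(1 / p(next | prev))."""
    pair_counts = Counter(zip(data, data[1:]))   # (previous, next) pairs
    prev_counts = Counter(data[:-1])             # previous character alone
    total = len(data) - 1
    h = 0.0
    for (prev, nxt), n in pair_counts.items():
        p_pair = n / total
        p_next_given_prev = n / prev_counts[prev]
        h += p_pair * math.log2(1.0 / p_next_given_prev)
    return h

sample = b"the quick brown fox jumps over the lazy dog " * 200
print("%.3f bits/character with one character of context" % conditional_entropy(sample))

Repeating this with the last k characters as the context would, I
believe, give the "entropies for different orders" David is asking about.
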
Will the info theory gurus please stand up! ;-)
--
Aaron Birenboim                                |I am just too tired to think
abirenbo%rigel.cel.scg.hac.com@hac2arpa.hac.com|of a good quote.
abirenbo@isis.cs.du.edu                        | - ChickenHead
w (303) 344-6486                               |