- Path: sparky!uunet!news.uiowa.edu!ns-mx!pyrite.cs.uiowa.edu
- From: jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879)
- Newsgroups: comp.compression
- Subject: Re: entropy
- Message-ID: <13282@ns-mx.uiowa.edu>
- Date: 22 Jul 92 15:46:14 GMT
- References: <1992Jul22.144104.27576@usenet.ins.cwru.edu>
- Sender: news@ns-mx.uiowa.edu
- Lines: 27
-
- From article <1992Jul22.144104.27576@usenet.ins.cwru.edu>, by daf10@po.CWRU.Edu (David A. Ferrance):
- >
- > Right now I compute the entropy of a file by summing the probability of
- > each character occurring times the log (base2) of the reciprocal of that
- > probability. This gives me the theoretical minimum number of bits
- > needed to represent each character, on average (from what I understand).
- > What I would like to know is, does this number only work for first order
- > (1-ary?) compression techniques or does it serve also as a limit for
- > more complex techniques?
-
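- In code, the calculation you describe comes down to something like the
- following (a rough Python sketch; the function name and details are my
- own, not anything standard):
-
-     import math
-     from collections import Counter
-
-     def first_order_entropy(data):
-         # Count how many times each symbol (byte or character) occurs.
-         counts = Counter(data)
-         total = len(data)
-         # Sum p * log2(1/p) over every distinct symbol: the average number
-         # of bits per symbol under a zero-context (first order) model.
-         return sum((n / total) * math.log2(total / n) for n in counts.values())
-
-     # For a file read as raw bytes:
-     # with open("somefile", "rb") as f:
-     #     print(first_order_entropy(f.read()))
-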
- This is a universal way to compute entropy, but the answer you get depends
- on how you compute the probability of a character occurring. The first
- order entropy is obtained by measuring probabilities in isolation -- the
- probability of a letter is taken out of context. To get the second order
- entropy, you look at the probability of each letter in the context of the
- immediately preceding letter. For the third order, you look at each letter
- in the context of each pair of preceding letters, and so on.
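-
- To estimate the higher order figures in the same spirit, you condition each
- letter on the letters before it. A rough Python sketch (again, the names
- and details are my own, not taken from any particular package):
-
-     import math
-     from collections import Counter
-
-     def order_k_entropy(data, k):
-         # Count each (context, symbol) pair, where the context is the k
-         # symbols immediately preceding the symbol.  k = 0 gives the first
-         # order figure, k = 1 the second order one, and so on.
-         pairs = Counter((data[i - k:i], data[i]) for i in range(k, len(data)))
-         contexts = Counter(data[i - k:i] for i in range(k, len(data)))
-         total = len(data) - k
-         bits = 0.0
-         for (ctx, sym), n in pairs.items():
-             p_pair = n / total          # probability of this context-symbol pair
-             p_cond = n / contexts[ctx]  # probability of the symbol given its context
-             bits += p_pair * math.log2(1.0 / p_cond)
-         return bits  # average bits per symbol under a k-th order model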
-
- The absolute entropy of a piece of text can only be determined if you use an
- infinite order model, taking into account every bit of context information
- available. Shannon did a very elegant experiment to measure this by having
- human subjects read passages of text up to random points and then guess the
- next letter. This is how he arrived at the estimate that the entropy of
- English text is a bit over one bit per character.
-
- Doug Jones
- jones@cs.uiowa.edu
-