- Newsgroups: comp.compression
- Path: sparky!uunet!spool.mu.edu!yale.edu!ira.uka.de!rz.uni-karlsruhe.de!stepsun.uni-kl.de!sun.rhrk.uni-kl.de!sun.rhrk.uni-kl.de!marpia
- From: marpia@sun.rhrk.uni-kl.de (David Powers [Informatik])
- Subject: Compressing English text to 1.75bits or better (80%)
- Message-ID: <1992Sep12.103552.24873@rhrk.uni-kl.de>
- Summary: Entropy of English is 1.75 bits or less (put in FAQ)
- Keywords: FAQ
- Sender: news@rhrk.uni-kl.de
- Organization: University of Kaiserslautern, Germany
- X-Newsreader: Tin 1.1 PL4
- Date: Sat, 12 Sep 1992 10:35:52 GMT
- Lines: 18
-
- I want to draw attention to a journal article that might not otherwise
- come to the attention of this group:
-
- An Estimate of an Upper Bound for the Entropy of English
- Peter F Brown, SA & VJ Della Pietra, JC Lai & Robert L Mercer,
- Computational Linguistics V18#1, pp31-40, March 1992, MIT Press/ACL.
-
- They used cross-entropy techniques to estimate an UPPER bound of
- 1.75 bits per character for the entropy of English, measured on the
- Brown corpus (about 1 million words, or roughly 6 million characters).
- Their model is based on a couple of dictionaries, plus lists of names,
- addresses, places, etc., and TRIGRAM prediction. The article is
- intended "as a gauntlet thrown down", challenging others to better
- this result, which is already equivalent to around 80% compression
- relative to 8-bit characters.
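
To make the arithmetic concrete, here is a minimal sketch of how a cross-entropy figure in bits per character turns into a compression percentage. It is not the authors' method: the article uses a word trigram model, whereas this toy uses a character trigram with add-one smoothing, and the names train_char_trigram, cross_entropy_bits_per_char and compression_saving are hypothetical, chosen only for illustration. The 8-bit baseline is likewise an assumption.

```python
import math
from collections import Counter, defaultdict


def train_char_trigram(text):
    """Count which character follows each two-character context."""
    counts = defaultdict(Counter)
    for i in range(2, len(text)):
        counts[text[i - 2:i]][text[i]] += 1
    return counts


def cross_entropy_bits_per_char(counts, text, alphabet_size):
    """Average -log2 p(char | previous two chars) over the test text.

    Add-one smoothing keeps every probability nonzero, so an unseen
    context falls back to a uniform guess over the alphabet.
    """
    total_bits = 0.0
    predicted = 0
    for i in range(2, len(text)):
        followers = counts.get(text[i - 2:i], Counter())
        p = (followers[text[i]] + 1) / (sum(followers.values()) + alphabet_size)
        total_bits -= math.log2(p)
        predicted += 1
    if predicted == 0:
        raise ValueError("test text must contain at least three characters")
    return total_bits / predicted


def compression_saving(bits_per_char, raw_bits_per_char=8.0):
    """Fraction of space saved relative to a fixed-width encoding."""
    return 1.0 - bits_per_char / raw_bits_per_char


if __name__ == "__main__":
    sample = "ab" * 50
    model = train_char_trigram(sample)
    h = cross_entropy_bits_per_char(model, sample, alphabet_size=2)
    print(f"toy cross-entropy: {h:.3f} bits/char")
    print(f"saving at 1.75 bits/char: {compression_saving(1.75):.1%}")
```

Because the cross-entropy of any model against real text is at least the true entropy of that text, a better model can only lower the figure, which is why the result is an upper bound. Note that the toy run above scores the model on its own training text, so it flatters the model; a meaningful estimate needs held-out test data.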
-
- Perhaps this information should go into the FAQ question [73] on
- the theoretical limits of compression.
-
- David Powers
-