- Newsgroups: comp.sys.mac.programmer
- Path: sparky!uunet!sun-barr!ames!data.nas.nasa.gov!taligent!keith@taligent.com
- From: keith@taligent.com (Keith Rollin)
- Subject: Re: Best text compression?
- Message-ID: <Bu1L4L.1ps@taligent.com>
- Sender: usenet@taligent.com (More Bytes Than You Can Read)
- Organization: Taligent
- References: <1992Sep3.214518.9599@mnemosyne.cs.du.edu> <71986@apple.Apple.COM>
- Date: Fri, 4 Sep 1992 07:07:32 GMT
- Lines: 24
-
- In article <71986@apple.Apple.COM>, anderson@Apple.COM (Clark Anderson) writes:
- > agoates@nyx.cs.du.edu (Alan Goates) writes:
- >
- > >Second, does anyone know what the best published (Public Domain) algorithm is
- > >for compressing text. And does anyone know where I could get my hands on
- > >examplesource code for said algorithm (The only one I've seen is Lempel-Ziv).
- >
- > I have gotten pretty good compression on text using
- > a Huffman algorithm. It's pretty easy to implement,
- > any standard book on compression schemes should
- > have it.
-
- I've never tried this myself, but someone once told me that you can get really
- good compression if you use a Huffman algorithm applied at the word level.
- Instead of counting and ranking all of the letters in the document, count the
- distinct words and treat each one like a single letter. This means that oft-used
- words like "the" or "MicroSoftWord5.0Sucks" will compress down to 3 or 4 bits
- for the entire word.
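To make the idea concrete, here is a minimal sketch of word-level Huffman coding in Python. All the names (`build_codes`, `encode`, `decode`) and the whitespace tokenization are my own choices, not from the post; a real compressor would also have to store the code table alongside the bits and preserve the original spacing and punctuation.

```python
import heapq
from collections import Counter

def build_codes(tokens):
    """Build a Huffman code table mapping each distinct word to a bit string.

    Frequent words end up near the root of the tree and thus get short codes,
    which is exactly the effect described above: a common word like "the" can
    shrink to just a few bits.
    """
    freq = Counter(tokens)
    if len(freq) == 1:                      # degenerate case: one distinct word
        return {tok: "0" for tok in freq}
    # Heap entries: (frequency, tiebreak id, {word: code-suffix-so-far}).
    # The tiebreak id keeps heapq from ever comparing the dicts.
    heap = [(n, i, {tok: ""}) for i, (tok, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)   # two least-frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(tokens, codes):
    """Concatenate the code for each word into one bit string."""
    return "".join(codes[t] for t in tokens)

def decode(bits, codes):
    """Walk the bit string, emitting a word whenever a full code is matched.

    Works because Huffman codes are prefix-free: no code is a prefix of another.
    """
    rev = {c: t for t, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in rev:
            out.append(rev[cur])
            cur = ""
    return out

text = "the cat sat on the mat and the cat slept"
tokens = text.split()
codes = build_codes(tokens)
bits = encode(tokens, codes)
```

Note that "the" (the most frequent word here) is guaranteed a code no longer than any other word's, and the whole bit string decodes back to the original word sequence.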
-
- --
- Keith Rollin
- Phantom Programmer
- Taligent, Inc.