NetNews Usenet Archive 1992 #19

Text File | 1992-09-03 | 1.4 KB | 36 lines

Newsgroups: comp.sys.mac.programmer
Path: sparky!uunet!sun-barr!ames!data.nas.nasa.gov!taligent!keith@taligent.com
From: keith@taligent.com (Keith Rollin)
Subject: Re: Best text compression?
Message-ID: <Bu1L4L.1ps@taligent.com>
Sender: usenet@taligent.com (More Bytes Than You Can Read)
Organization: Taligent
References: <1992Sep3.214518.9599@mnemosyne.cs.du.edu> <71986@apple.Apple.COM>
Date: Fri, 4 Sep 1992 07:07:32 GMT
Lines: 24

In article <71986@apple.Apple.COM>, anderson@Apple.COM (Clark Anderson) writes:
> agoates@nyx.cs.du.edu (Alan Goates) writes:
>
> >Second, does anyone know what the best published (Public Domain) algorithm is
> >for compressing text. And does anyone know where I could get my hands on
> >example source code for said algorithm (The only one I've seen is Lempel-Ziv).
>
> I have gotten pretty good compression on text using
> a Huffman algorithm. It's pretty easy to implement,
> any standard book on compression schemes should
> have it.

I've never tried this myself, but someone once told me that you can get
really good compression if you use a Huffman algorithm applied at the word
level. Instead of counting and ranking all of the letters in the document,
count the distinct words and treat each one like a single letter. This means
that oft used words like "the" or "MicroSoftWord5.0Sucks" will compress down
to 3 or 4 bits for the entire word.

--
Keith Rollin
Phantom Programmer
Taligent, Inc.
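
The word-level Huffman idea described above can be sketched roughly as
follows. This is only an illustration under some assumptions, not code from
the post or from any particular library: the function names (tokenize,
build_codes, encode, decode) are made up for this example, runs of spaces and
punctuation are treated as symbols alongside words so the text can be rebuilt
exactly, and the cost of storing the word table with the compressed data is
ignored.

```python
import heapq
import re
from collections import Counter
from itertools import count


def tokenize(text):
    # Assumption: split into runs of word characters and runs of everything
    # else, so that joining the tokens reproduces the text exactly.
    return re.findall(r"\w+|\W+", text)


def build_codes(tokens):
    """Return a dict mapping each distinct token to a Huffman bit string."""
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a lone symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(weight, next(tiebreak), {token: ""})
            for token, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(tokens, codes):
    # Bits are kept as a string of "0"/"1" characters for readability.
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    # Huffman codes are prefix-free, so the first match is always correct.
    inverse = {c: t for t, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "the cat and the dog and the bird saw the cat"
    tokens = tokenize(text)
    codes = build_codes(tokens)
    bits = encode(tokens, codes)
    assert decode(bits, codes) == text
    print(len(bits), "bits versus", 8 * len(text), "bits uncompressed")
```

Frequent words end up with codes no longer than those of rare words, which is
where the savings come from. A real compressor would also have to store or
transmit the word table, and for short documents that overhead can outweigh
the gain.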