- Newsgroups: comp.compression
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!uwm.edu!linac!att!att!dptg!ulysses!allegra!princeton!tyrolia!jmd
- From: jmd@tyrolia.Princeton.EDU (John M. Danskin)
- Subject: Re: request for compression algorithm for small text buffers
- Message-ID: <1993Jan20.143605.4160@Princeton.EDU>
- Originator: news@nimaster
- Sender: news@Princeton.EDU (USENET News System)
- Nntp-Posting-Host: tyrolia.princeton.edu
- Reply-To: jmd@tyrolia.Princeton.EDU (John M. Danskin)
- Organization: Dept. of Computer Science, Princeton University
- References: <23244@venera.isi.edu>
- Date: Wed, 20 Jan 1993 14:36:05 GMT
- Lines: 25
-
- In article <23244@venera.isi.edu>, sondeen@isi.edu (Jeff Sondeen) writes:
- |> I'm looking for a compression routine that I can use for small text
- |> buffers accessed separately within huge text files. For example, a
- |> dictionary whose word entries are hashed (ala gdbm) to their
- |> definition text -- but I want the definition text strings to be
- |> compressed with minimal (space) overhead. Would the separate "table"
- |> overheads of LZ type compressions be a disadvantage compared to a
- |> huffman/arithmetic compression based on a fixed table? Thanks for any
- |> tips.
- |>
- |> /jeff
-
- It seems to me that you could improve on a fixed Huffman table by
- generating a standard LZW string table from a small sample of your dictionary
- definitions. When you want to encode a new definition, always start from this
- sample table. Sizing the sample table is an interesting trade-off between
- initial code size and coding power. In practice, once you had a system
- running, it would be easy to try sample tables that yield several different
- code sizes, and just use the best one.
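- 
- A minimal sketch of the idea (in Python; the helper names such as
- prime_table are illustrative only, and bit-packing of the output codes is
- omitted):
- 
-     def prime_table(sample):
-         """Run LZW's table-building pass over a sample and return the table."""
-         table = {bytes([i]): i for i in range(256)}
-         w = b""
-         for c in sample:
-             wc = w + bytes([c])
-             if wc in table:
-                 w = wc
-             else:
-                 table[wc] = len(table)
-                 w = bytes([c])
-         return table
- 
-     def lzw_encode(data, base_table):
-         """Encode one definition, starting from a copy of the primed table."""
-         table = dict(base_table)      # every definition starts from the same state
-         codes, w = [], b""
-         for c in data:
-             wc = w + bytes([c])
-             if wc in table:
-                 w = wc
-             else:
-                 codes.append(table[w])
-                 table[wc] = len(table)
-                 w = bytes([c])
-         if w:
-             codes.append(table[w])
-         return codes
- 
-     def lzw_decode(codes, base_table):
-         """Invert lzw_encode, given the same primed table."""
-         if not codes:
-             return b""
-         inv = {v: k for k, v in base_table.items()}
-         next_code = len(inv)
-         w = inv[codes[0]]
-         out = [w]
-         for k in codes[1:]:
-             if k in inv:
-                 entry = inv[k]
-             elif k == next_code:      # the usual LZW corner case (cScSc)
-                 entry = w + w[:1]
-             else:
-                 raise ValueError("bad LZW code")
-             out.append(entry)
-             inv[next_code] = w + entry[:1]
-             next_code += 1
-             w = entry
-         return b"".join(out)
- 
-     # Example: prime the table once, then encode/decode individual definitions.
-     table = prime_table(b"sample text drawn from the dictionary definitions")
-     codes = lzw_encode(b"definition text to be stored compressed", table)
-     assert lzw_decode(codes, table) == b"definition text to be stored compressed"
- 
- Only the primed table (or the sample it was built from) has to be stored
- once for the whole file; each definition carries no per-entry table
- overhead.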
-
-
- John Danskin |
- (609) 258-5386 | Gradual student
- (609) 258-1771 fax | Graphics systems
- jmd@cs.princeton.edu | (efficient low bandwidth graphics?)
-