NetNews Usenet Archive 1992 #27

Internet Message Format | 1992-11-24 | 3.5 KB

Path: sparky!uunet!spool.mu.edu!agate!darkstar.UCSC.EDU!jade!paula
From: paula@jade.ucsc.edu (Paul J. Ausbeck Jr. )
Newsgroups: comp.compression
Subject: Re: Need a compressor for sparse bit datastream
Date: 24 Nov 1992 02:53:39 GMT
Organization: UC Santa Cruz CIS/CE
Lines: 67
Distribution: world
Message-ID: <1es5fjINNou1@darkstar.UCSC.EDU>
References: <1992Nov13.120505.29654@spectrum.xerox.com> <1992Nov16.232015.15970@coe.montana.edu>
NNTP-Posting-Host: jade.ucsc.edu
Keywords: compression technique

In article <1992Nov16.232015.15970@coe.montana.edu>, bithead@cs.montana.edu (Bob Wall) writes:
|> In article <1992Nov13.120505.29654@spectrum.xerox.com> landells.sbd-e@rx.xerox.com writes:
|> >I have an application that generates binary output. The output is relatively random, but there are approximately twice as many off bits as on bits. My objective is to compress this as much as possible.
|> >
|> >I have tried several 'standard' compressors, arj 2.2, lharc, pkzip 1.1, and have only managed to achieve very minimal compression in the order of 4% at best (on a 40K file). Now I know that a truly random binary datastream cannot be compressed, but I was kind of hoping for better than 4%. Am I missing something fundamental, or is this really the best that can be achieved?
|> >
|> >If there is a technique to compress this type of data, I would appreciate some pointers to some source code that implements it.
|> >
|> >
|> >Richard Landells (landells.sbd-e@rx.xerox.com)
|> >Rank Xerox System Centre
|>
|> You might try a variation of arithmetic coding referred to as the BAC
|> (Binary Arithmetic Code). It is described in the paper
|>
|> Glen G. Langdon, Jr. and Jorma Rissanen, "A Simple General Binary Source
|> Code," _IEEE Transactions on Information Theory_, Vol. IT-28, No. 5,
|> Sept. 1982, pp. 800-803.
|>
|> I implemented this once, and I remember that it did OK on binary data
|> streams where the frequency of one bit was substantially higher than the
|> frequency of the other bit (and I would classify 2 to 1 as substantially
|> higher) but I don't remember just what compression ratio OK would be.
|>
|> It's a pretty simple algorithm, although the article is written in
|> math, instead of English B^(. A more straightforward explanation (in
|> near-English) is given in
|>
|> Glen G. Langdon, Jr., "An Introduction to Arithmetic Coding," _IBM
|> Journal of Research and Development_, Vol. 28, No. 2, March 1984,
|> pp. 135-149.
|>
|> This article also has some other references that might help, although I
|> haven't tracked them down.
|>
|>
|> Hope this helps,
|> Bob
|>
|>
|> --
|> =============================================================================
|> bithead@fubar.cs.montana.edu (Bob Wall, CS grad student)
|> "Software means never having to say you're finished."
|> --J. D. Hildebrand in UNIX REVIEW

For stationary data, information theory predicts compression as follows:

Pm = Probability of occurrence of the most probable symbol
Pl = Probability of occurrence of the least probable symbol (1 - Pm in this case)
lg is the base 2 logarithm

compressed data size (as a fraction of the original) = -lg(Pm)*Pm - lg(Pl)*Pl

in this case:

-lg(2/3)*2/3 - lg(1/3)*1/3 = .92

so approximately 8 percent compression is obtainable on stationary data.

If the data is non-stationary, adaptive schemes such as the Q-coder
(Langdon, Rissanen et al.) can be used for better compression.

I hope I didn't foul up the formula. In any case the 8% compression
sounds about right for this data set.

Paul Ausbeck
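
The entropy figure quoted above can be checked numerically. The following is a minimal Python sketch, not part of the original post; the function name binary_entropy is hypothetical, and it assumes a stationary, memoryless binary source in which the more probable bit value occurs with probability 2/3.

```python
import math


def binary_entropy(p):
    """Entropy, in bits per input bit, of a memoryless binary source
    whose more probable symbol occurs with probability p."""
    if p <= 0.0 or p >= 1.0:
        # A source that always emits the same bit carries no information.
        return 0.0
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)


# Assumed source statistics from the thread: twice as many off bits as on bits.
h = binary_entropy(2.0 / 3.0)
saving = 1.0 - h

print(f"{h:.4f} bits per input bit, about {100 * saving:.1f}% saving")
# prints "0.9183 bits per input bit, about 8.2% saving"
```

Under these assumptions the bound works out to roughly 8 percent, in line with the estimate above. It applies only to coders that model the bits as independent; if the real data has structure between bits, a better model could in principle do better.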