- Xref: sparky comp.lang.perl:5700 comp.compression:3172 comp.lang.postscript:4560
- Path: sparky!uunet!gumby!wupost!zaphod.mps.ohio-state.edu!not-for-mail
- From: parker@shape.mps.ohio-state.edu (Steve Parker)
- Newsgroups: comp.lang.perl,comp.compression,comp.lang.postscript
- Subject: postscript compression script in perl.
- Date: 3 Sep 1992 12:52:31 -0400
- Organization: Department of Mathematics, The Ohio State University
- Lines: 2268
- Sender: parker@mps.ohio-state.edu
- Distribution: world
- Message-ID: <185fsfINNcbs@shape.mps.ohio-state.edu>
- NNTP-Posting-Host: shape.mps.ohio-state.edu
- Keywords: compression compress postscript perl
-
- To Whom It May Concern,
-
- A few months ago I posted a request to the net regarding a postscript
- compression routine. I have since found out that PostScript Level 2 is to
- have its own standard way to compress images; nevertheless, I have written
- a postscript image compressor.
-
- The postscript compressor I have written is in perl, but the form of
- compression is run-length encoding, so it could be written in other languages,
- such as C, sed, awk, nawk, etc. (I am sure that Larry could probably even
- write the thing in roff--the man is a mutant! caveat: I am forever in his
- debt for writing perl.)
-
- The most common response that I received when telling my peers of my plans
- was "Why write the thing in postscript?" or "Why write the thing at all?"
-
- Everyone knows that the way screen dump/image capture routines output their
- results on various architectures is straightforward but wasteful of disk
- space. Computer screen images are not like images from other sources in that
- they almost always have huge amounts of repetition--large areas all the same
- pattern--and are therefore very susceptible to run-length-encoding-type
- compression schemes.
-
- The reasons I think that a compression routine written in postscript is
- valuable are that the resulting compressed image is still a valid postscript
- program, which can be sent via E-mail without fear of corruption to anyone
- with a postscript printer regardless of the machine to which it is connected,
- and furthermore can be subsequently compressed with your favorite routine:
- compress, pack, zip, zoo, arc, etc.
-
- Finally, I realize that this is not the most efficient method, but before I work
- on enhancing it, I thought that I'd post it. I would like your input on how to
- improve it and I thought that many of you could use it 'as is' to help with
- disk space problems (a never-ending problem at our site at least). I have found
- that users mind this form of compression less since it requires no additional
- actions to print the file later.
-
- If you use this package, I would appreciate timing results/comments/suggestions.
- Please E-mail me any posts that you might make concerning this post so that
- I can post a summary at a later date:
-
- Steve Parker parker@mps.ohio-state.edu
- Dept of Chemistry 201 McPherson Labs
- Ohio State University 614-292-5042
-
- I have thought of many ways to improve this script:
-
- 1) Record which and how many of the decompression routines were used in a given
- image, and insert only those that were used, in order of the frequency
- with which they were used.
- 2) Search the unique runs for patterns of 2, 4, or 8 characters.
- 3) Make a new decompression routine called I, for insert, which would be used
- for inserting a unique string into a long run of repeating characters.
-
- NOTE: The above ideas would most easily be accomplished by saving the compressed
- image in memory or a temp file and/or making multiple passes at the image
- data.
-
- Any other suggestions?
-
- Here are timing results for postscript images created on various machines,
- compressed on a Sparc ELC and printed on an Apple LaserWriter II:
-
-
- Test cases:
-
- (Apple LaserWriter II)
- Filename          size in chars   bits in image   time to compress   approx. time to print
- ------------------------------------------------------------------------------------------
- snapshot.cmp.ps           63861             ---             67.0 s                  100 s
- snapshot.ps              262906         1024000                 --                  245 s
- stripes.cmp.ps             2241             ---             31.0 s                   30 s
- stripes.ps               133403         1036800                 --                  130 s
- iris.cmp.ps               73384             ---             68.5 s                  100 s
- iris.ps                  261385          524288                 --                  250 s
- stellar.cmp.ps           129140             ---           1027.3 s                  425 s
- stellar.ps              1968436         1966728                 --                 1740 s
-
- I am presently getting results for NeXT printers, and some others.
-
- These files are available by E-mail on request to the above address.
-
- Here is my description of the two pieces necessary for
- compression/decompression (I originally had two files but now use the <DATA>
- file handle of perl):
-
- decomp.header is the postscript decompression header that will be used in
- place of
- "/picstr 1024 string def
- { currentfile /picstr readhexstring pop }"
- which is often used as the proc for the image function,
- i.e. "width height bitspersample proc image"
-
- pscmp is the perl script that compresses the hex digit pair
- format often used to encode a bitmap in postscript; it also
- inserts the decompression header file in a clever way.
- Since the last thing on the stack before the image command
- is called is the procedure that image will use to obtain the
- image, pscmp looks for the image command and inserts
- pop { decompress }
- before it. The 'pop' command removes whatever procedure was
- on the stack and then '{ decompress }' (my command) is pushed
- on the stack in its place.
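-
- For example (the name picstr and the dimensions here are only illustrative,
- not taken from one of the test files), a typical uncompressed dump contains
- something like:
-
-     /picstr 1024 string def
-     512 512 1 [512 0 0 -512 0 512]
-     { currentfile picstr readhexstring pop } image
-
- pscmp inserts its text just before the 'image' command, so the last line
- becomes
-
-     { currentfile picstr readhexstring pop } pop { decompress } image
-
- (for the first image the inserted text is the whole decompression header,
- which itself begins with 'pop' and ends with '{ decompress }'), and the hex
- data that follows is replaced by the encoded stream.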
-
- It does compression with the following four "codes":
- u - one character follows, whose ascii value determines
- how many "unique" hex pairs follow. 1-256 pairs.
- U - two characters follow, whose ascii values determine
- how many "unique" hex pairs follow. 257-65535 pairs.
- r - one character follows, whose ascii value determines
- how many times to "repeat" the hex pair that follows.
- R - two characters follow, whose ascii values determine
- how many times to "repeat" the hex pair that follows.
- NOTES:
- * ranges for R and U could not be made to be 257-65792
- without splitting the runs into multiple strings,
- since the largest postscript string is 65535 bytes.
- * I attempted two ways of storing the length of unique and
- repeating runs.
- The first, and most straightforward to interpret in
- postscript, was to store them as one or two characters whose
- ascii value was then interpreted as an integer by using the
- 'currentfile read pop' sequence.
- The second used a two or four digit hex number to represent
- the length of the run, and used the postscript command
- sequence:
-
- /charx2 2 string def
- /charx4 4 string def
- /hexnum2 5 string def
- /hexnum4 7 string def
- /hexnum2 (16#00) def
- /hexnum4 (16#0000) def
- /getcount { hexnum2 3 currentfile charx2 readstring pop
- putinterval hexnum2 cvi } def
- /getbigcount { hexnum4 3 currentfile charx4 readstring pop
- putinterval hexnum4 cvi } def
-
- which works by putting the hex number, i.e. 'fd', in a string
- like '16#00', thus giving the string '16#fd', which the command
- 'cvi' interprets as 0xfd, or 253.
-
- The latter method was necessary because characters representing
- serial port I/O controls, i.e. '^D' and '^S/^Q', were intercepted
- by the printer's I/O control and not passed to the postscript
- interpreter.
- The former method did work, however, with Sun's postscript
- previewer "pageview version 3".
- * pscmp removes the comments and unnecessary white space (used
- for readability) from decomp.header as it inserts it into the
- postscript.
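-
- As a quick sanity check of this format, here is a minimal perl sketch of
- a decoder for the r/u/R/U stream (assuming the surrounding postscript
- wrapper has been stripped away so that only the encoded data is fed in;
- the real decompression is of course done in the printer by the header
- below). It regenerates the original hex pair data, e.g. for diffing
- against the uncompressed file:
-
- #!/usr/local/bin/perl
- # decode an r/u/R/U stream back to the original hex pairs (sketch only)
- undef $/; # slurp the whole stream
- $in = <>;
- $in =~ s/\n//g; # newlines are skipped, as the header's 'skip' proc does
- while (length($in)) {
- $code = substr($in,0,1);
- $w = ($code eq 'r' || $code eq 'u') ? 2 : 4; # hex digits in count field
- $n = hex(substr($in,1,$w)); # count field holds (pairs - 1)
- if ($code eq 'r' || $code eq 'R') { # repeat run: one pair follows
- print substr($in,1+$w,2) x ($n+1);
- $in = substr($in,1+$w+2);
- } elsif ($code eq 'u' || $code eq 'U') { # unique run: n+1 pairs follow
- print substr($in,1+$w,2*($n+1));
- $in = substr($in,1+$w+2*($n+1));
- } else {
- die "unexpected code '$code'\n"; # 'a' is never emitted by this script
- }
- }
-
- (The compressor itself is run as 'pscmp file.ps > file.cmp.ps'; the
- statistics go to STDERR.)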
-
- *******************************************************************************
- Here is the script:
-
- #!/usr/local/bin/perl
- # A perl script to compress postscript images.
-
- # Written by:
- # Steve Parker parker@mps.ohio-state.edu
- # Dept of Chemistry 201 McPherson Labs
- # Ohio State University 614-292-5042
- # Columbus OH, 43210
- #
- # codes: u - small count run of unique hex pairs
- # U - big count run of unique hex pairs
- # r - small count+1 repeated hex pair
- # R - big count+1 repeated hex pair
- # a - repeat last r or R. NOT SUPPORTED IN THIS PERL SCRIPT.
- #
- # formats: u cc 'hphp...'
- # U CC CC 'hphp...'
- # r cc 'hp'
- # R CC CC 'hp'
- #
- # where: 1) spaces are not output
- # 2) uUrR are output literally
- # 3) cc is a 2 digit hex number (0-255) and represents a range (1-256)
- # 4) CCCC is a 4 digit hex number (0-65535) for a range (257-65535)
- # if not for the max size of a postscript string it would be (257-65792)
- # 5) 'hp' is a hex digit pair from 'image' data.
-
- $name = $0;
- $name =~ s'.*/''; # remove path--like basename
- $usage = "usage:\n$name [postscript_file_with_IMAGE_data]";
-
- select(STDOUT); $|=1;
-
- $biggest=65534;
- $last="";
- while (<>) {
- if ( /([^A-Fa-f\d\n])/ ) {
- # print "'$1' ->$_";
- if ($_ =~ /showpage/ || $_ =~ /grestore/ ) {
- #
- # FOUND a showpage or grestore so write out last repeating pair or unique run.
- #
- if ($repeating) {
- # we didn't record the first pair in $repeating
- # so we needn't subtract 1.
- #$num=$repeating-1;
- $num=$repeating;
- if ( $num <= 255 ) {
- # case 2 small count repeat unit 2 hex digits.
- printf("r%02X%2s\n",$num,$last);
- $r++;
- } else {
- # case 3 big count repeat unit 2 hex digits.
- printf("R%02X%02X%2s\n",int($num/256),($num%256),$last);
- $R++;
- }
- } else {
- $unique_str.=$last;
- # we didn't yet record this last pair in $unique_run
- # so we needn't subtract 1.
- $num=$unique_run;
- if ( $num <= 255 ) {
- # case 0 small count unique string of hex digit pairs.
- printf("u%02X%s",$num,$unique_str);
- $u++;
- } else {
- # case 1 big count unique string of hex digit pairs.
- printf("\nU%02X%02X%s",int($num/256),($num%256),$unique_str);
- $U++;
- }
- }
- &end;
- }
- # add the postscript decompression header
- # between the original proc called by the 'image' command
- # and the 'image' command itself
- if ( $_ =~ /^(image\s?.*)$|^([^%]*)?(\simage\s?.*)$/ ) {
- # save the pieces now: the s/// on the header lines below clobbers $1..$3
- $pre = $2; $img = $1 ? $1 : $3;
- print $pre if ($pre); # anything on the line before ' image'
- if (!$headerin) {
- # first image: insert the decompression header, which now lives
- # after __END__ (decomp.header used to be a separate file)
- while (<DATA>) { s/(\s)\s+/$1/g; print if !(/^%/); }
- $headerin++;
- close(DATA);
- } else {
- # header is already in; just replace the proc on the stack
- print " pop { decompress } ";
- }
- print "$img\n"; # now the image command itself
- next;
- }
- print;
- next;
- } # else { print "\n" if ($unique_run || $repeating); }
- #
- #-------------------- HEX PAIR HANDLING LOOP --------------------------
- #
- while (s/([A-Fa-f0-9][A-Fa-f0-9])//) {
- if ($repeating) {
- if ($1 eq $last) {
- #-debug print STDERR "rs"; # repeating; same
- $repeating++; # found another one.
- # check to see if we have filled the biggest postscript string
- # this will keep the decompress in postscript simple and fast.
- if ($repeating == $biggest) {
- printf("Rfffe%2s",$last);
- # set to start over fresh
- $repeating=0;
- # $unique_str should be set to null and $unique_run set to 0
- }
- } else {
- #-debug print STDERR "rd"; # repeating; different
- #
- # FOUND a unique hex pair so repeating unit has ended, write it out.
- #
- #$num=$repeating-1;
- $num=$repeating;
- if ( $repeating <= 255 ) {
- # case 2 small count repeat unit 2 hex digits.
- # -line- $line+=6; if ( $line > 80) { $line=6; print "\n"; }
- #-debug printf STDERR ">2,%2X,%2s ",$num,$last;
- printf("r%02X%2s",$num,$last);
- $r++;
- } else {
- # case 3 big count repeat unit 2 hex digits.
- # -line- $line+=8; if ( $line > 80) { $line=8; print "\n"; }
- #-debug printf(">3,%2X,%2X,%2s ",int($num/256),($num%256),$last);
- printf("R%02X%02X%2s",int($num/256),($num%256),$last);
- $R++;
- }
- $repeating=0;
- $last=$1;
- }
-
- } else { # must be unique'ing
-
- if ($1 eq $last) {
- #-debug print "us"; # uniquing; same
- #
- # FOUND a repeating hex pair so might have a unique run
- # which has ended, if so write it out.
- #
- if ($unique_str) {
- $num=$unique_run-1;
- if ( $num <= 255 ) {
- # case 0 small count unique string of hex digit pairs.
- # -line- $line+=(4+$unique_run); if ( $line > 80) { $line=4+$unique_run; print "\n"; }
- #-debug printf("\n>0,%2X,'%s' ",$num,$unique_str);
- printf("\nu%02X%s",$num,$unique_str);
- $u++;
- } else {
- # case 1 big count unique string of hex digit pairs.
- # -line- $line+=(6+$unique_run); if ( $line > 80) { $line=6+$unique_run; print "\n"; }
- #-debug printf("\n>1,%2X,%2X,'%s' ",int($num/256),($num%256),
- printf("\nU%02X%02X%s",int($num/256),($num%256),$unique_str);
- $U++;
- }
- }
- # start counting repeating pairs, reset unique_run count
- # and remember last.
- $repeating++;
- $unique_str='';$unique_run=0;
- $last=$1;
- } else { # continue uniquing
- #-debug print "ud"; # uniquing; different
- # skip the empty initial $last so the count matches the
- # pairs actually saved in $unique_str.
- if ($last ne "") {
- $unique_str.=$last;
- # $unique_run+=2; # use this if using $line to limit to 80 chars/line.
- # but REMEMBER to divide by two when outputting!
- $unique_run++;
- }
- # check to see if we have filled the biggest postscript string
- # this will keep the decompress in postscript simple and fast.
- if ($unique_run == $biggest) {
- printf("Ufffe%s",$unique_str);
- # set to start over fresh
- $unique_str='';$unique_run=0;
- # $repeating should be set to 0
- }
- $last=$1;
- }
- }
- }
- }
- &end;
- sub end {
- printf STDERR "Statistics:\n" ;
- printf STDERR "r's:%5d\n",$r ;
- printf STDERR "R's:%5d\n",$R ;
- printf STDERR "u's:%5d\n",$u ;
- printf STDERR "U's:%5d\n",$U ;
- ($user,$system,$cuser,$csystem)=times;
- printf STDERR "Times:\tuser,\tsystem,\tcuser,\tcsystem\n";
- printf STDERR "Times:\t%5f,\t%5f,\t%5f,\t%5f\n",
- $user,$system,$cuser,$csystem;
- exit;
- }
- __END__
- %-------------------------------------------------------------------------------
- %
- % header to define 'decompress' which will replace the
- % { currentfile string readhexstring pop } proc commonly used with 'image'
- %
- % to be placed just before the 'image' command
- % the 'pop' on the following line is to remove bogus 'proc' (as above)
- pop
- /repeater 1 string def
- /char 1 string def
- /charx2 2 string def
- /charx4 4 string def
- /hexnum2 5 string def
- /hexnum4 7 string def
- /debug 30 string def
- /big 65535 string def
- /hexnum2 (16#00) def
- /hexnum4 (16#0000) def
- /gethexpair { currentfile char readhexstring pop } def
- /getcount { hexnum2 3
- currentfile charx2 readstring pop
- putinterval hexnum2 cvi } def
- /getbigcount { hexnum4 3
- currentfile charx4 readstring pop
- putinterval hexnum4 cvi } def
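- % codeu/codeU read cnt+1 unique hex pairs into 'big' and leave that
- % substring on the stack for 'image'; coder/codeR instead fill it with
- % cnt+1 copies of a single repeated pair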
- /codeu { pop /cnt getcount def
- big 0 1 cnt { gethexpair putinterval big } for
- 0 cnt 1 add getinterval
- } def
- /codeU { pop /cnt getbigcount def
- big 0 1 cnt { gethexpair putinterval big } for
- 0 cnt 1 add getinterval
- } def
- /coder { pop /cnt getcount def
- /repeater gethexpair def % get repeater unit
- big 0 1 cnt {repeater putinterval big} for
- 0 cnt 1 add getinterval
- } def
- /codeR { pop /cnt getbigcount def
- /repeater gethexpair def % get repeater unit
- big 0 1 cnt {repeater putinterval big} for
- 0 cnt 1 add getinterval
- } def
- /codeX { pop big 0 cnt 1 add getinterval } def
- /done { currentfile debug readstring pstack exit } def
- /skip { pop decompress } def
- %
- % the following order of r,u,R,U was chosen by noting the frequency
- % of occurrence from a small number of examples but can easily be changed.
- /others0 { dup (u) eq { codeu } { others1 } ifelse } def
- /others1 { dup (R) eq { codeR } { others2 } ifelse } def
- /others2 { dup (U) eq { codeU } { others3 } ifelse } def
- /others3 { dup (a) eq { codeX } { others4 } ifelse } def
- /others4 { dup (\n) eq { skip } { done } ifelse } def
- /decompress { currentfile char readstring pop
- dup (r) eq { coder } { others0 } ifelse } def
- { decompress }
- %-----------------------------------------------------------------------------
-
- Newsgroups: comp.compression
- Subject: Re: comp.compression Frequently Asked Questions (part 1/2)
- References: <compr1_11aug92@chorus.fr>
- Organization: Department of Mathematics, The Ohio State University
- Keywords: data compression, FAQ
-
- In article <compr1_11aug92@chorus.fr> jloup@chorus.fr writes:
- >Archive-name: compression-faq/part1
- >Last-modified: Aug 11th, 1992
- >
- >This file is part 1 of a set of Frequently Asked Questions (FAQ) for
- >the groups comp.compression and comp.compression.research. Certain
- >questions get asked time and again, and this is an attempt to reduce
- >the bandwidth taken up by these posts and their associated replies.
- >If you have a question, *please* check this file before you post. It
- >may save a lot of people's time.
- >
- >If you have not already read the overall Usenet introductory material
- >posted to "news.announce.newusers", please do.
- >
- >If you don't want to see this FAQ regularly, please add the
- >subject line to your kill file. If you have corrections or suggestions
- >for this FAQ, send them to Jean-loup Gailly <jloup@chorus.fr>.
- >Thank you.
- >
- >Part 1 is oriented towards practical usage of compression programs.
- >Part 2 is more intended for people who want to know how compression works.
- >
- >Main changes relative to the previous version:
- >
- >- renumbered some questions to group together image compression subjects
- >- included comp.compression.research (item 1)
- >- corrected pointer to stuffit (item 2)
- >- corrected pointer to Calgary text compression corpus (item 5)
- >- added a few patent references -- to be checked for Fiala (item 8)
- >- ccitt standards are no longer on ftp.uu.net (item 11)
- >- added a pointer to more arithmetic coding sources (item 13)
- >- added some references for fractal compression (item 17)
- >- added pointer to H.261 spec (item 20)
- >- added item 25: Fast DCT (Discrete Cosine Transform) algorithms
- >- added pointers to a compression program for Sun audio files
- > and for a fast ADPCM algorithm (item 26)
- >- discrete wavelet transforms are not slower than DCT (item 72)
- >
- >
- >Contents
- >========
- >
- >General questions:
- >
- >[1] What are these newsgroups about?
- >[2] What is this .xxx file type?
- > Where can I find the corresponding compression program?
- >[4] What is an archiver?
- >[5] What is the best general purpose compression program?
- >[7] Which books should I read?
- >[8] What about patents on data compression algorithms?
- >[9] The WEB 16:1 compressor.
- >[11] What is the V.42bis standard?
- >[12] I need source for the winners of the Dr Dobbs compression contest
- >[13] I need source for arithmetic coding
- >
- >Image and audio compression:
- >
- >[15] Where can I get image compression programs?
- >[16] What is the state of the art in lossless image compression?
- >[17] What is the state of fractal compression?
- >[18] I need specs and source for TIFF and CCITT group 4 Fax.
- >[19] What is JPEG?
- >[20] I am looking for source of an H.261 codec.
- >[25] Fast DCT (Discrete Cosine Transform) algorithms
- >[26] Are there algorithms and standards for audio compression?
- >
- >Common problems:
- >
- >[30] My archive is corrupted!
- >[31] pkunzip reports a CRC error!
- >[32] VMS zip is not compatible with pkzip!
- >
- >Questions which do not really belong to comp.compression:
- >
- >[50] What is this 'tar' compression program?
- >[51] I need a CRC algorithm
- >[52] What about those people who continue to ask frequently asked questions?
- >[53] Where are FAQ lists archived?
- >[54] I need specs for graphics formats
- >[55] Where can I find Lenna and other images?
- >
- >(Long) introductions to data compression techniques (in part 2)
- >
- >[70] Introduction to data compression (long)
- > Huffman and Related Compression Techniques
- > Arithmetic Coding
- > Substitutional Compressors
- > The LZ78 family of compressors
- > The LZ77 family of compressors
- >
- >[71] Introduction to MPEG (long)
- > What is MPEG?
- > Does it have anything to do with JPEG?
- > Then what's JBIG and MHEG?
- > What has MPEG accomplished?
- > So how does MPEG I work?
- > What about the audio compression?
- > So how much does it compress?
- > What's phase II?
- > When will all this be finished?
- > How do I join MPEG?
- > How do I get the documents, like the MPEG I draft?
- >
- >[72] What is wavelet theory?
- >[73] What is the theoretical compression limit?
- >[74] Introduction to JBIG
- >
- >[99] Acknowledgments
- >
- >Search for "Subject: [#]" to get to question number # quickly. Some news
- >readers can also take advantage of the message digest format used here.
- >
- >If you know very little about data compression, read question 70 in
- >part 2 first.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [1] What are these newsgroups about?
- >
- >
- >comp.compression is the place to discuss data compression, both
- >lossless (for text or data) and lossy (for images, sound, etc..).
- >comp.compression.research was created later to provide a forum for
- >current research on data compression and data compression algorithms.
- >If you are not experienced in data compression, please post in
- >comp.compression only.
- >
- >If you only want to find a particular compression program for a
- >particular operating system, please read first this FAQ and the
- >article "How to find sources" which is regularly posted in
- >news.answers.
- >
- >If you can't resist posting such a request, other groups are probably
- >more appropriate (comp.binaries.ibm.pc.wanted, comp.sources.wanted,
- >comp.sys.mac.wanted, alt.graphics.pixutils). Please post your request
- >in comp.compression only as a last resort.
- >
- >Please do not post any program in binary form to comp.compression.
- >Very short sources can be posted, but long sources should be posted
- >to the specialized source groups, such as comp.sources.* or alt.sources.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [2] What is this .xxx file type?
- > Where can I find the corresponding compression program?
- >
- >
- >For most programs, one US and one European ftp site are given.
- >(wuarchive.wustl.edu: 128.152.135.4, garbo.uwasa.fi: 128.214.87.1)
- >Many other sites (in particular wsmr-simtel20.army.mil: 192.88.110.2)
- >have the same programs.
- >
- >To keep this list to a reasonable size, many programs are not
- >mentioned here. Additional information can be found in the file
- >ux1.cso.uiuc.edu:/doc/pcnet/compression [128.174.5.59] maintained by
- >David Lemson (lemson@uiuc.edu). When several programs can handle
- >the same archive format, only one of them is given.
- >
- >For Macintosh programs, look on sumex-aim.stanford.edu:/info-mac [36.44.0.6].
- >For VM/CMS, look on vmd.cso.uiuc.edu:/public.477 [128.174.5.98].
- >For Atari, look on terminator.cc.umich.edu:/atari/archivers [141.211.164.8]
- >For Amiga, look on ab20.larc.nasa.gov:/amiga/utils/archivers [128.155.23.64]
- >
- >If you don't know how to use ftp or don't have ftp access, read the
- >article "How to find sources" which is regularly posted in news.answers.
- >
- >If you can't find a program given below, it is likely that a newer
- >version exists in the same directory. (Tell me <jloup@chorus.fr>)
- >
- >ext: produced by or read by
- >
- >.arc: arc, pkarc (MSDOS)
- > wuarchive.wustl.edu:/mirrors/msdos/starter/pk361.exe
- > garbo.uwasa.fi:/pc/arcers/pk361.exe
- >
- > arc (Unix)
- > wuarchive.wustl.edu:/mirrors/misc/unix/arc521e.tar-z
- > garbo.uwasa.fi:/unix/arcers/arc.tar.Z
- > Contact: Howard Chu <hyc@umix.cc.umich.edu>
- >
- > arc (VMS)
- > wuarchive.wustl.edu:/packages/compression/vax-vms/arc.exe
- >
- > arcmac (Mac)
- > mac.archive.umich.edu:/mac/utilities/compressionapps/arcmac.hqx
- >
- >.arj: arj (MSDOS)
- > wuarchive.wustl.edu:/mirrors/msdos/arc-lbr/arj230.exe
- > garbo.uwasa.fi:/pc/arcers/arj230ng.exe
- >
- > unarj (Unix). There is *no* arj for Unix. Don't post a request.
- > wuarchive.wustl.edu:/mirrors/misc/unix/unarj230.tar-z
- > garbo.uwasa.fi:/unix/arcers/unarj221.tar.Z
- > Contact: Robert K Jung <robjung@world.std.com>
- >
- >.cpt: Compact Pro (Macintosh)
- > sumex-aim.stanford.edu:/info-mac/util/compact-pro-132.hqx [36.44.0.6]
- >
- >.gif: gif files are images compressed with the LZW algorithm. See the
- > comp.graphics FAQ list for programs manipulating .gif files.
- >
- >.hqx: Macintosh BinHex format. (binhex is *not* a compression program, BTW.)
- > for Mac:
- > mac.archive.umich.edu:/mac/utilities/compressionapps/binhex4.0.bin
- >
- > for Unix:
- > sumex-aim.stanford.edu:/info-mac/unix/mcvert-165.shar [36.44.0.6]
- >
- >.lzh: lha (MSDOS)
- > wuarchive.wustl.edu:/mirrors/msdos/arc-lbr/lha213.exe
- > garbo.uwasa.fi:/pc/arcers/lha213.exe
- >
- > lharc (Unix). Warning: lharc can extract .lzh files created by
- > lharc 1.xx but not those created by lha. See lha for Unix below.
- > wuarchive.wustl.edu:/mirrors/misc/unix/lharc102a.tar-z
- > garbo.uwasa.fi:/unix/arcers/lharcsrc.zoo
- >
- > lha (Unix) Warning: all doc is in Japanese.
- > garbo.uwasa.fi:/unix/arcers/lha-1.00.tar.Z
- > ftp.kuis.kyoto-u.ac.jp:/utils/lha-1.00.tar.Z
- > Contact: lha-admin@oki.co.jp
- >
- > lha (Mac)
- > mac.archive.umich.edu:/mac/utilities/compressionapps/maclha2.0.cpt.hqx
- >
- > lharc (VMS). Same warning as for Unix lharc.
- > wuarchive.wustl.edu:/packages/compression/vax-vms/lharc.exe
- >
- >.pak: pak (MSDOS)
- > wuarchive.wustl.edu:/mirrors/msdos/arc-lbr/pak251.exe
- > garbo.uwasa.fi:/pc/arcers/pak251.exe
- >
- >.pit: PackIt (Macintosh)
- > for Mac:
- > sumex-aim.stanford.edu:/info-mac/util/stuffit-151.hqx [36.44.0.6]
- >
- > for Unix:
- > sumex-aim.stanford.edu:/info-mac/unix/mcvert-165.shar [36.44.0.6]
- >
- >.sea: self-extracting archive (Macintosh)
- > Run the file to extract it. You can also extract it with:
- > mac.archive.umich.edu:/mac/utilities/compressionapps/desea1.11.cpt.hqx
- >
- >.sit: Stuffit (Macintosh)
- > for Mac:
- > sumex-aim.stanford.edu:/info-mac/util/stuffit-lite-30.hqx [36.44.0.6]
- >
- > for Unix:
- > sumex-aim.stanford.edu:/info-mac/unix/unsit-15.shar [36.44.0.6]
- >
- >.tar: tar is *not* a compression program. However, to be kind to you:
- > for MSDOS
- > wuarchive.wustl.edu:/mirrors/msdos/starter/tarread.exe
- > garbo.uwasa.fi:/pc/unix/tar4dos.zoo
- >
- > for Unix
- > tar (you have it already. To extract: tar xvf file.tar)
- >
- > for VMS
- > wuarchive.wustl.edu:/packages/compression/vax-vms/tar.exe
- >
- > for Macintosh
- > sumex-aim.stanford.edu:/info-mac/util/tar-30.hqx
- >
- >.tar.Z, .tar-z, .taz: tar + compress
- > For Unix: zcat file.tar.Z | tar xvf -
- > Other OS: first uncompress (see .Z below) then untar (see .tar above)
- >
- >.zip: pkzip 1.10 (MSDOS).
- > From the PKWARE BBS (June 92): "PKZIP 2.0 is expected to be released
- > sometime in the next few months, as soon as possible".
- > Contact: pkware.inc@mixcom.com
- >
- > wuarchive.wustl.edu:/mirrors/msdos/zip/pkz110eu.exe.
- > garbo.uwasa.fi:/pc/arcers/pkz110eu.exe.
- > Note: pkz110eu.exe is an 'export' version without encryption.
- > ux1.cso.uiuc.edu:/pc/exec-pc/pkz193a.exe [128.174.5.59]
- > Note: pkzip 1.93a is an alpha version.
- > pkz201.exe is a hacked (illegal) copy of pkz193a.exe
- > pkz220.exe is also a hack and contains a dangerous virus.
- >
- > zip 1.0 and unzip 4.2 (Unix, MSDOS, VMS, OS/2)
- > wuarchive.wustl.edu:/mirrors/misc/unix/zip10ex.zip
- > wuarchive.wustl.edu:/mirrors/misc/unix/unzip42.tar-z
- > wuarchive.wustl.edu:/mirrors3/garbo.uwasa.fi/arcutil/zcrypt10.zip
- > Non US residents must get the encryption code from garbo (see below)
- > garbo.uwasa.fi:/unix/arcers/zip10ex.zip
- > garbo.uwasa.fi:/unix/arcers/unzip42.tar.Z.
- > garbo.uwasa.fi:/pc/arcutil/zcrypt10.zip (encryption code)
- > Contact: zip-bugs@cs.ucla.edu
- >
- > zip 1.0 and unzip 4.2 (Mac)
- > valeria.cs.ucla.edu:/info-zip/Mac/zip_uzip.hqx
- > sumex-aim.stanford.edu:/info-mac/util/unzip-42.hqx
- >
- >.zoo: zoo 2.10 (MSDOS)
- > wuarchive.wustl.edu:/mirrors/msdos/zoo/zoo210.exe
- > garbo.uwasa.fi:/pc/arcers/zoo210.exe
- >
- > zoo 2.10 (Unix, VMS)
- > wsmr-simtel20.army.mil:pd8:<misc.unix>zoo210.tar-z [192.88.110.2]
- > garbo.uwasa.fi:/unix/arcers/zoo210.tar.Z
- >
- > zoo (Mac)
- > mac.archive.umich.edu:/mac/utilities/compressionapps/maczoo.sit.hqx
- >
- > Contact: Rahul Dhesi <dhesi@cirrus.com>
- >
- >.F: freeze (Unix)
- > wuarchive.wustl.edu:/usenet/comp.sources.misc/volume25/freeze/part0[1-2].Z
- > ftp.inria.fr:/system/arch-compr/freeze-2.3.2.tar.Z
- > Contact: Leonid A. Broukhis <leo@s514.ipmce.su>
- >
- >.Y: yabba (Unix, VMS, ...)
- > wuarchive.wustl.edu:/usenet/comp.sources.unix/volume24/yabbawhap/part0[1-4].Z
- > ftp.inria.fr:/system/arch-compr/yabba.tar.Z
- > Contact: Dan Bernstein <brnstnd@nyu.edu>
- >
- >.Z: compress (Unix)
- > It is likely that your Unix system has 'compress' already. Otherwise:
- > wuarchive.wustl.edu:/packages/compression/compress-4.1.tar
- > (not in .Z format to avoid chicken and egg problem)
- >
- > compress (MSDOS)
- > wuarchive.wustl.edu:/mirrors/msdos/compress/comp430[ds].zip
- > garbo.uwasa.fi:/pc/unix/comp430d.zip
- >
- > compress (Macintosh)
- > sumex-aim.stanford.edu:/info-mac/util/maccompress-32.hqx
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [4] What is an archiver?
- >
- >
- >There is a distinction between archivers and other compression
- >programs:
- >
- >- an archiver takes several input files, compresses them and produces
- > a single archive file. Examples are arc, arj, lha, zip, zoo.
- >
- >- other compression programs create one compressed file for each
- > input file. Examples are freeze, yabba, compress. Such programs
- > are often combined with tar to create compressed archives (see
- > question 50: "What is this tar compression program?").
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [5] What is the best general purpose compression program?
- >
- >
- >The answer is: it depends. (You did not expect a definitive answer,
- >did you?)
- >
- >It depends whether you favor speed, compression ratio, a standard and
- >widely used archive format, the number of features, etc... Just as
- >for text editors, personal taste plays an important role. compress has
- >4 options, arj 2.30 has about 130 options; different people like
- >different programs. *Please* do not start or continue flame wars on
- >such matters of taste.
- >
- >The only objective comparisons are speed and compression ratio. Here
- >is a short table comparing various programs on a 33MHz Compaq 386.
- >All programs have been run on Unix SVR4, except pkzip and arj which
- >only run on MSDOS. Detailed benchmarks have been posted in
- >comp.compression by Peter Gutmann <pgut1@cs.aukuni.ac.nz>.
- >
- >*Please* do not post your own benchmarks made on your own files that
- >nobody else can access. If you think that you must absolutely post yet
- >another benchmark, make sure that your test files are available by
- >anonymous ftp.
- >
- >The programs compared here were chosen because they are the most popular
- >or because they run on Unix and source is available. For ftp
- >information, see above. Two programs (hpack and comp-2) have been added
- >because they achieve better compression (at the expense of speed)
- >and one program (lzrw3-a) has been added because it favors speed at
- >the expense of compression:
- >- comp-2 is in wuarchive.wustl.edu:/mirrors/msdos/ddjmag/ddj9102.zip
- > (inner zip file nelson.zip),
- >- hpack is in wuarchive.wustl.edu:/mirrors/misc/unix/hpack75a.tar-z
- > and garbo.uwasa.fi:/unix/arcers/hpack75a.tar.Z
- >- sirius.ucs.adelaide.edu.au:/pub/compression/lzrw3-a.c [129.127.40.3]
- >
- >The 14 files used in the comparison are from the standard Calgary
- >Text Compression Corpus, available by ftp on fsa.cpsc.ucalgary.ca [136.159.2.1]
- >in /pub/text.compression.corpus/text.compression.corpus.tar.Z.
- >
- >The whole corpus includes 18 files, but the 4 files paper[3-6] are
- >generally omitted in benchmarks. It contains several kinds of file
- >(ascii, binary, image, etc...) but has a bias towards large files.
- >You may well get different ratings on the typical mix of files that
- >you use daily, so keep in mind that the comparisons given below are
- >only indicative.
- >
- >The programs are ordered by decreasing total compressed size. For a
- >fair comparison between archivers and other programs, this size is
- >only the size of the compressed data, not the archive size.
- >
- >The programs were run on an idle machine, so the elapsed time
- >is significant and can be used to compare Unix and MSDOS programs.
- >
- >[Note: I still have to add all decompression times.]
- >
- > size lzrw3a compress lharc yabba pkzip freeze
- >version: 4.0 1.02 1.0 1.10 2.3.2
- >options: -m300000
- > ------ ----- ------ ------ ------ ------ ------
- >bib 111261 49040 46528 46502 40456 41354 41515
- >book1 768771 416131 332056 369479 306813 350560 344793
- >book2 610856 274371 250759 252540 229851 232589 230861
- >geo 102400 84214 77777 70955 76695 76172 68626
- >news 377109 191291 182121 166048 168287 157326 155783
- >obj1 21504 12647 14048 10748 13859 10546 10453
- >obj2 246814 108040 128659 90848 114323 90130 85500
- >paper1 53161 24522 25077 21748 22453 20041 20021
- >paper2 82199 39479 36161 35275 32733 32867 32693
- >pic 513216 111000 62215 61394 65377 63805 53291
- >progc 39611 17919 19143 15399 17064 14164 14143
- >progl 71646 24358 27148 18760 23512 17255 17064
- >progp 49379 16801 19209 12792 16617 11877 11686
- >trans 93695 30292 38240 28092 31300 23135 22861
- > 3,141,622 1,400,105 1,259,141 1,200,580 1,159,340 1,141,821 1,109,290
- >real 0m35s 0m59s 5m03s 2m40s 5m09s
- >user 0m25s 0m29s 4m29s 1m46s 4m04s
- >sys 0m05s 0m10s 0m07s 0m18s 0m11s
- >MSDOS: 1m39s
- >
- > zip zoo lha arj pkzip hpack comp-2
- > 1.0 2.10 1.0(Unix) 2.30 1.93a 0.75a
- > -9 ah 2.13(MSDOS) -jm -ex
- > ------ ------ ------ ------ ------ ------ ------
- >bib 40717 40742 40740 36090 35186 35619 29840
- >book1 339932 339076 339074 318382 313566 306876 237380
- >book2 229419 228444 228442 210521 207204 208486 174085
- >geo 69837 68576 68574 69209 68698 58976 64590
- >news 154865 155086 155084 146855 144954 141608 128047
- >obj1 10522 10312 10310 10333 10307 10572 10819
- >obj2 86661 84983 84981 82052 81213 80806 85465
- >paper1 19761 19678 19676 18710 18519 18607 16895
- >paper2 32296 32098 32096 30034 29566 29825 25453
- >pic 56828 52223 52221 53578 52777 51778 55461
- >progc 13955 13943 13941 13408 13363 13475 12896
- >progl 16954 16916 16914 16408 16148 16586 17354
- >progp 11558 11509 11507 11308 11214 11647 11668
- >trans 22737 22580 22578 20046 19808 20506 21023
- > 1,106,013 1,096,166 1,096,138 1,036,934 1,022,523 1,005,367 890,976
- >real 3m28s 4m07s 6m03s 1h22m17s 27m05s
- >user 1m45s 3m47s 4m23s 1h20m46s 19m27s
- >sys 0m11s 0m04s 0m08s 0m12s 2m03s
- >MSDOS: 1m49s 2m41s 1m55s
- >
- >Notes:
- >- zip 2.0 is not included in this comparison since it will be released
- > only when pkzip 2.0 is released. (Compression is comparable to that of
- > pkzip 1.93a.)
- >
- >- the compressed data for 'zoo ah' is always two bytes longer than for
- > lha. This is simply because both programs are derived from the same
- > source (ar002, written by Haruhiko Okumura).
- >
- >- hpack 0.75a gives slightly different results on SunOS (nondeterministic
- > behaviour still under investigation).
- >
- >- the MSDOS versions are all optimized with assembler code and were run
- > on a RAM disk. So it is not surprising that they go faster than their
- > Unix equivalent.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [7] Which books should I read?
- >
- >
- >[BWC 1989] Bell, T.C., Witten, I.H., and Cleary, J.G. "Text Compression",
- > Prentice-Hall 1989. ISBN: 0-13-911991-4. Price: approx. US$40
- > The reference on text data compression.
- >
- >[Nel 1991] Mark Nelson, "The Data Compression Book"
- > M&T Books, Redwood City, CA, 1991. ISBN 1-55851-216-0.
- > Price $36.95 including two 5" PC-compatible disks bearing
- > all the source code printed in the book.
- > A practical introduction to data compression.
- > The book is targeted at a person who is comfortable reading C code but
- > doesn't know anything about data compression. Its stated goal is to get
- > you up to the point where you are competent to program standard
- > compression algorithms.
- >
- >[Will 1990] Williams, R. "Adaptive Data Compression", Kluwer Books, 1990.
- > ISBN: 0-7923-9085-7. Price: US$75.
- > Reviews the field of text data compression and then addresses the
- > problem of compressing rapidly changing data streams.
- >
- >[Stor 1988] Storer, J.A. "Data Compression: Methods and Theory", Computer
- > Science Press, Rockville, MD. ISBN: 0-88175-161-8.
- > A survey of various compression techniques, mainly statistical
- > non-arithmetic compression and LZSS compression. Includes complete Pascal
- > code for a series of LZ78 variants.
- >
- >[ACG 1991] Advances in Speech Coding, edited by Atal, Cuperman, and Gersho,
- > Kluwer Academic Press, 1991.
- >
- >[GG 1991] Vector Quantization and Signal Compression, by Gersho and Gray,
- > Kluwer Acad. Press, 1991
- >
- >[CT 1991] Elements of Information Theory, by T.M.Cover and J.A.Thomas
- > John Wiley & Sons, 1991.
- >
- >Review papers:
- >
- >[BWC 1989] Bell, T.C., Witten, I.H., and Cleary, J.G. "Modeling for Text
- > Compression", ACM Computing Surveys, Vol.21, No.4 (December 1989), p.557
- > A good general overview of compression techniques (as well as modeling for
- > text compression); the condensed version of "Text Compression".
- >
- >[Lele 1987] Lelewer, D.A., and Hirschberg, D.S. "Data Compression", ACM
- > Computing Surveys, Vol.19, No.3 (September 1987), p.261.
- > A survey of data compression techniques which concentrates on Huffman
- > compression and makes only passing mention of other techniques.
- >
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [8] What about patents on data compression algorithms?
- >
- >
- >[Note: the appropriate group for discussing software patents is
- >comp.patents (or misc.legal.computing), not comp.compression.]
- >
- >All patents mentioned here are US patents, and thus probably
- >not applicable outside the US.
- >
- >- The Gibson & Graybill patent 5,049,881 is the most general.
- > Claims 4 and 12 cover about any LZ algorithm using hashing, and could
- > even be interpreted as applying to the LZ78 family. (See below,
- > "Introduction to data compression" for the meaning of 'LZ').
- >
- > 4. A compression method for compressing a stream of input data into
- > a compressed stream of output data based on a minimum number of
- > characters in each input data string to be compressed, said
- > compression method comprising the creation of a hash table, hashing
- > each occurrence of a string of input data and subsequently searching
- > for identical strings of input data and if such an identical string
- > of input data is located whose string size is at least equal to the
- > minimum compression size selected, compressing the second and all
- > subsequent occurrences of such identical string of data, if a string
- > of data is located which does not match to a previously compressed
- > string of data, storing such data as uncompressed data, and for each
- > input strings after each hash is used to find a possible previous
- > match location of the string, the location of the string is stored
- > in the hash table, thereby using the previously processed data to
- > act as a compression dictionary.
- >
- > Claim 12 is identical, with 'method' replaced with 'apparatus'.
- > Since the 'minimal compression size' can be as small as 2, the claim
- > covers any dictionary technique of the LZ family.
- >
- >- Phil Katz, author of pkzip, also has a patent on LZ77 (5,051,745)
- > but the claims only apply to sorted hash tables, and when the hash
- > table is substantially smaller than the window size.
- >
- >- The LZW algorithm used in 'compress' is patented by IBM (4,814,746)
- > and Unisys (4,558,302). Unisys has licensed it for use in the V.42bis
- > compression standard. (See question 11 on V.42bis below.)
- >
- >- AP coding is patented by Storer (4,876,541). (Get the yabba package
- > for source code, see question 2 above, file type .Y)
- >
- >- Fiala and Greene have a patent (4,906,991?) on the algorithms they
- > published in Comm.ACM, April 89. One of their algorithms is used in lha
- > and zoo, and was used in zip 0.8.
- >
- >- IBM patented (5,001,478) the idea of combining a history buffer (the
- > LZ77 technique) and a lexicon (as in LZ78).
- >
- >- IBM holds a patent on the Q-coder implementation of arithmetic
- > coding. The arithmetic coding option of the JPEG standard requires
- > use of the patented algorithm. (See the JPEG FAQ for details.)
- >
- >
- >Here are some references on data compression patents, taken from the
- >list maintained by Michael Ernst <mernst@theory.lcs.mit.edu> in
- >mintaka.lcs.mit.edu:/mitlpf/ai/patent-list (or patent-list.Z).
- >
- >4,464,650
- >Apparatus and method for compressing data signals and restoring the
- >compressed data signals
- >inventors Lempel, Ziv, Cohn, Eastman
- >assignees Sperry Corporation and AT&T Bell Laboratories
- >filed 8/10/81, granted 8/7/84
- >
- >4,558,302
- >High speed data compression and decompression apparatus and method
- >inventor Welch
- >assignee Sperry Corporation (now Unisys)
- >filed 6/20/83, granted 12/10/85
- >The text for this patent can be ftped from ftp.uu.net as pub/lzw-patent.Z
- >
- >4,814,746
- >Data compression method
- >inventors Victor S. Miller, Mark N. Wegman
- >assignee IBM
- >filed 8/11/86, granted 3/21/89
- >
- >4,876,541
- >Stem [sic] for dynamically compressing and decompressing electronic data
- >inventor James A. Storer
- >assignee Data Compression Corporation
- >filed 10/15/87, granted 10/24/89
- >
- >4,955,066
- >Compressing and Decompressing Text Files
- >inventor Notenboom, L.A.
- >assignee Microsoft
- >filed 10/13/89, granted 09/04/90
- >
- >5,001,478
- >Method of Encoding Compressed Data
- >filed 1989-12-28
- >granted 1991-03-19
- >inventor Michael E. Nagy
- >assignee IBM
- >
- >5,049,881
- >Apparatus and method for very high data rate-compression incorporating
- >lossless data compression and expansion utilizing a hashing technique
- >inventors Dean K. Gibson, Mark D. Graybill
- >assignee Intersecting Concepts, Inc.
- >filed 6/18/90, granted 9/17/91
- >cites McIntosh 3,914,586, Johannesson 4,087,788, Eastman 4,464,650,
- > Finn 4,560,976, Tsukiyama 4,586,027, 4,758,899 and 4,872,009,
- > Kunishi 4,677,649, Mathes 4,682,150, Shimoni 4,809,359, Miller 4,814,746,
- > Mukherjee 4,853,696, Fiala 4,906,991.
- >
- >5,051,745
- >String searcher, and compressor using same
- >inventor Phillip W. Katz (author of pkzip)
- >filed 8/21/90, granted 9/24/91
- >cites MacCrisken 4,730,348 and Hong 4,961,139
- >
- >Data Compression with Finite Windows, Comm.ACM, 32,4 (1989) 490-595.
- >inventors Fiala,E.R., and Greene,D.H.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [9] The WEB 16:1 compressor.
- >
- >
- >[WARNING: this topic has generated the greatest volume of news in the
- >history of comp.compression. Read this before posting on this subject.]
- >
- >
- >(a) What the press says
- >
- >April 20, 1992 Byte Week Vol 4. No. 25:
- >
- > "In an announcement that has generated high interest - and more than a
- > bit of skepticism - WEB Technologies (Smyrna, GA) says it has
- > developed a utility that will compress files of greater than 64KB in
- > size to about 1/16th their original length. Furthermore, WEB says its
- > DataFiles/16 program can shrink files it has already compressed."
- > [...]
- > "A week after our preliminary test, WEB showed us the program successfully
- > compressing a file without losing any data. But we have not been able
- > to test this latest beta release ourselves."
- > [...]
- > "WEB, in fact, says that virtually any amount of data can be squeezed
- > to under 1024 bytes by using DataFiles/16 to compress its own output
- > multiple times."
- >
- >June 1992 Byte, Vol 17 No 6:
- >
- > [...] According to Earl Bradley, WEB Technologies' vice president of
- > sales and marketing, the compression algorithm used by DataFiles/16
- > is not subject to the laws of information theory. [...]
- >
- >
- >(b) First details, by John Wallace <buckeye@spf.trw.com>:
- >
- >I called WEB at (404)514-8000 and they sent me some product
- >literature as well as chatting for a few minutes with me on the phone.
- >Their product is called DataFiles/16, and their claims for it are
- >roughly those heard on the net.
- >
- >According to their flier:
- >
- >"DataFiles/16 will compress all types of binary files to approximately
- >one-sixteenth of their original size ... regardless of the type of
- >file (word processing document, spreadsheet file, image file,
- >executable file, etc.), NO DATA WILL BE LOST by DataFiles/16."
- >(Their capitalizations; 16:1 compression only promised for files >64K
- >bytes in length.)
- >
- >"Performed on a 386/25 machine, the program can complete a
- >compression/decompression cycle on one megabyte of data in less than
- >thirty seconds"
- >
- >"The compressed output file created by DataFiles/16 can be used as the
- >input file to subsequent executions of the program. This feature of
- >the utility is known as recursive or iterative compression, and will
- >enable you to compress your data files to a tiny fraction of the
- >original size. In fact, virtually any amount of computer data can
- >be compressed to under 1024 bytes using DataFiles/16 to compress its
- >own output files multiple times. Then, by repeating in reverse the
- >steps taken to perform the recursive compression, all original data
- >can be decompressed to its original form without the loss of a single
- >bit."
- >
- >They had a table that showed the expected size of resulting files,
- >with the warning that "Your actual compression results may vary
- >slightly from the figures shown". Here is my abridged version of their
- >table.
- >           ------------- Iteration -------------
- >INPUT          1        2        3        4     Ratio
- >1K           630                                1.6:1
- >16K         1.5K      812                         20:1
- >64K           4K       1K      644               101:1
- >512K         30K       2K      938               558:1
- >8M          490K      14K     1.5K      798    10512:1
- >64M           3M      40K       3K      994    67513:1
- >
- >Their flier also claims:
- >
- >"Constant levels of compression across ALL TYPES of FILES"
- >"Convenient, single floppy DATA TRANSPORTATION"
- >
- >From my telephone conversation, I was assured that this is an
- >actual compression program. Decompression is done by using only the
- >data in the compressed file; there are no hidden or extra files.
- >
- >
- >(c) More information, by Rafael Ramirez <rafael.ramirez@channel1.com>:
- >
- > Today (Tuesday, 28th) I got a call from Earl Bradley of Web
- >who now says that they have put off releasing a software version of
- >the algorithm because they are close to signing a major contract with
- >a big company to put the algorithm in silicon. He said he could not
- >name the company due to non-disclosure agreements, but that they had
- >run extensive independent tests of their own and verified that the
- >algorithm works. [...]
- >
- > Mr. Bradley went on to say that Web will not be sending out any
- >more copies to magazines and that they had even recalled the copy they
- >had sent to Byte. He claimed that he told the guy at Byte that the
- >version they had at the time had just been translated to assembler and
- >had some bugs, but that the guy at Byte kept insisting that they send
- >a copy anyway, and so of course the version Byte had didn't work.
- >
- >[Correction from Russ Schnapp <rschnapp@metaflow.com>:
- >Contrary to WEB's claim, I did NOT insist that WEB send me a
- >non-functional copy of the program. I didn't even ask for the latest
- >version.]
- >
- >He said the algorithm is so simple that he doesn't want anybody
- >getting their hands on it and copying it even though he said they
- >have filed a patent on it. [...] Mr. Bradley said the silicon version
- >would hold up much better to patent enforcement and be harder to copy.
- >
- > He claimed that the algorithm takes up about 4K of code, uses only
- >integer math, and the current software implementation only uses a 65K
- >buffer. He said the silicon version would likely use a parallel
- >version and work in real-time.
- >
- > He said they will be sending out copies to about seven companies
- >that want to license the technology for various applications but he
- >could not give out any names due to non-disclosure agreements. He
- >hoped that in about two weeks he will be able to make an announcement
- >and said he would call me when he could provide independent
- >verification that the algorithm works. For now, we can only wait.
- >
- > He also said they have not as yet sold anything, but that he's
- >been traveling constantly for the last three weeks (presumably to
- >the seven companies he mentioned) and that he hasn't slept very
- >well lately just thinking about all the applications the algorithm
- >could be applied to.
- >
- > And he confirmed that each pass will get 16:1 compression as
- >long as the data is >64K, and that regardless, any file should be
- >able to be compressed to less than 1024 bytes after enough passes.
- >I asked if he is claiming that they can compress ANY data including
- >data that is already compressed, and he just answered by saying
- >that they had tested the program by compressing PKZIP files, and
- >other files and that they all compressed as claimed (which of course
- >still doesn't answer the question).
- >
- >
- >(d) The interpretation of the claims
- >
- >The biggest controversy is over the claim to compress "all types of
- >files". As noted above by Rafael Ramirez, we do not know with
- >certainty if WEB claims to compress *all* files greater than 64K
- >bytes, or just *most* files. The WEB flier only says all *types* of
- >files, not *all* files. Keep this in mind when reading the
- >impossibility proof given below.
- >
- >
- >(e) The impossibility proofs.
- >
- >It is impossible for a given program to compress without loss *all*
- >files greater than a certain size by at least one bit. This can be
- >proven by a simple counting argument. (Many other proofs have been
- >posted on comp.compression, *please* do not post yet another one.)
- >
- >Assume that the program can compress without loss all files of size >= N
- >bits. Compress with this program all the 2^N files which have
- >exactly N bits. All compressed files have at most N-1 bits, so there
- >are at most 2^(N-1) different compressed files. So at least two
- >different input files must compress to the same output file. (Actually
- >at least half of them, but two suffice for the proof.) Hence the
- >compression program cannot be lossless.
- >
- >This argument applies of course to WEB's case (take N = 64K*8 bits).
- >Note that no assumption is made about the compression algorithm.
- >The proof applies to *any* algorithm, including those using an
- >external dictionary, or repeated application of another algorithm,
- >or combination of different algorithms, or representation of the
- >data as formulas, etc... All schemes are subject to the counting argument.
- >There is no need to use information theory to provide a proof, just
- >basic mathematics.
- >
- >This assumes of course that the information available to the decompressor
- >is only the bit sequence of the compressed data. If external information
- >such as a file name or a number of iterations is necessary to decompress
- >the data, the bits providing the extra information must be included in
- >the bit count of the compressed data. (Otherwise, it would be sufficient
- >to consider any input data as a number, use this as the iteration
- >count or file name, and pretend that the compressed size is zero.)
- >
- >
- >(f) Late news: no software version
- >
- >Appeared on BIX, reposted by Bruce Hoult <Bruce.Hoult@actrix.gen.nz>:
- >
- >tojerry/chaos #673, from abailey, 562 chars, Tue Jun 16 20:40:34 1992
- >Comment(s).
- >----------
- >TITLE: WEB Technology
- >I promised everyone a report when I finally got the poop on WEB's
- >16:1 data compression. After talking back and forth for a year
- >and being put off for the past month by un-returned phone calls,
- >I finally got hold of Marc Spindler who is their sales manager.
- >
- >_No_ software product is forthcoming, period!
- >
- >He began talking about hardware they are designing for delivery
- >at the end of the year. [...]
- >
- >
- >(g) Conclusion
- >
- >Most readers of comp.compression are tired of this thread. Please,
- >please, do not post another article "I know this has been beaten to
- >death, but...".
- >
- >The consensus is that we have to wait until WEB delivers a real product
- >which can be independently tested. Only then will it be possible
- >to know exactly what the product can and cannot do.
- >
- >[See also question 73 "What is the theoretical compression limit?" in
- >part 2 of this FAQ.]
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [11] What is the V.42bis standard?
- >
- >
- >from Alejo Hausner <hausner@qucis.queensu.ca>:
- >
- >The V.42bis Compression Standard was proposed by the International
- >Consultative Committee on Telephony and Telegraphy (CCITT) as an
- >addition to the v.42 error-correction protocol for modems. Its purpose
- >is to increase data throughput, and uses a variant of the
- >Lempel-Ziv-Welch (LZW) compression method. It is meant to be
- >implemented in the modem hardware, but can also be built into the
- >software that interfaces to an ordinary non-compressing modem.
- >
- >V.42bis can send data compressed or not, depending on the
- >data. There are some types of data that cannot be
- >compressed. For example, if a file was compressed first,
- >and then sent through a V.42bis modem, the modem would not
- >likely reduce the number of bits sent. Indeed it is likely
- >that the amount of data would increase somewhat.
- >
- >To avoid this problem, the algorithm constantly monitors the
- >compressibility of the data, and if it finds fewer bits
- >would be necessary to send it uncompressed, it switches to
- >transparent mode. The sender informs the receiver of this
- >transition through a reserved escape code. Henceforth the
- >data is passed as plain bytes.
- >
- >The choice of escape code is clever. Initially, it is a
- >zero byte. Any occurrence of the escape code is replaced,
- >as is customary, by two escape codes. In order to prevent a
- >string of escape codes from temporarily cutting throughput
- >in half, the escape code is redefined by adding 51 mod 256
- >each time it is used.
- >
- >While transmitting in transparent mode, the sender maintains
- >the LZW trees of strings, and expects the receiver to do
- >likewise. If it finds an advantage in returning to
- >compressed mode, it will do so, first informing the receiver
- >by a special control code. Thus the method allows the
- >hardware to adapt to the compressibility of the data.
- >
- >
- >The CCITT standards documents are available by ftp on src.doc.ic.ac.uk,
- >in directory doc/ccitt-standards/ccitt. The v42bis standard is in
- >/doc/ccitt-standards/ccitt/1992/v/v42bis.asc.Z.
- >(ccitt standards used to be on ftp.uu.net in /doc/standards/ccitt
- >but this directory is now empty -- Aug 11th.)
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [12] I need source for the winners of the Dr Dobbs compression contest
- >
- >
- >The source of the top 6 programs of the Feb 91 Dr Dobbs data compression
- >contest are available by ftp on
- > wsmr-simtel20.army.mil in pd1:<msdos.compress>ddjcompr.zip. [192.88.110.2]
- > garbo.uwasa.fi:/pc/source/ddjcompr.zip [128.214.87.1]
- >
- >The sources are in MSDOS end-of-line format, one directory per program.
- >Unix or VMS users, use "unzip -ad ddjcompr" to get correct end-of-lines
- >and recreate the directory structure. Five of the 6 programs are not
- >portable and only run on MSDOS.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [13] I need source for arithmetic coding
- >
- >
- >(See question 70 for an introduction to arithmetic coding.)
- >
- >Kris Popat <popat@image.mit.edu> has worked on "Scalar Quantization
- >with Arithmetic Coding." It describes an arithmetic coding technique
- >which is quite general and computationally inexpensive. The
- >documentation and example C code are available via anonymous ftp from
- >media-lab.media.mit.edu (18.85.0.2), in /pub/k-arith-code.
- >
- >See also the directory /pub/arithmetic.coding on fsa.cpsc.ucalgary.ca
- >[136.159.2.1].
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [15] Where can I get image compression programs?
- >
- >
- >JPEG:
- > Source code for most any machine:
- > ftp.uu.net:/graphics/jpeg/jpegsrc.v3.tar.Z [137.39.1.9]
- > nic.funet.fi:/pub/graphics/programs/jpeg/jpegsrc.v3.tar.Z [128.214.6.100]
- > Contact: jpeg-info@uunet.uu.net (Independent JPEG Group)
- >
- > xv, an image viewer which can read JPEG pictures, is available in
- > ftp.cicb.fr:/pub/X11R5/contrib/xv-2.20.tar.Z [129.20.128.2]
- >
- >epic:
- > whitechapel.media.mit.edu:/pub/epic.tar.Z [18.85.0.125]
- > The "Lenna" test image is available as part of the EPIC package,
- > where it is named "test_image".
- >
- >compfits:
- > uwila.cfht.hawaii.edu:/pub/compfits/compfits.tar.Z [128.171.80.50]
- > Contact: Jim Wright <jwright@cfht.hawaii.edu>
- >
- >fitspress:
- > 128.103.40.79:/pub/fitspress08.tar.Z
- >
- >tiff:
- > For source and sample images, see question 18 below.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [16] What is the state of the art in lossless image compression?
- >
- >
- >The current state-of-the-art is the JBIG algorithm. For an
- >introduction to JBIG, see question 54 in part 2.
- >
- >JBIG works best on bi-level images (like faxes) and also works well on
- >Gray-coded grey scale images up to about six or so bits per pixel. You
- >just apply JBIG to the bit planes individually. For more bits/pixel,
- >lossless JPEG provides better performance, sometimes. (For JPEG, see
- >question 19 below.)
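- >
- >As a sketch of that bit-plane step in perl (the routine name and the
- >one-byte-per-bit output format are mine, not part of the standard):
- >
- >    # Gray-code each 8-bit sample, then peel off the eight bit planes,
- >    # each of which can be handed to a bi-level coder such as JBIG.
- >    sub bit_planes {
- >        local($pixels) = @_;            # string of raw 8-bit samples
- >        local(@planes, $p, $g, $b);
- >        foreach $p (unpack("C*", $pixels)) {
- >            $g = $p ^ ($p >> 1);        # binary -> Gray code
- >            for ($b = 0; $b < 8; $b++) {
- >                $planes[$b] .= ($g >> $b) & 1;  # one '0'/'1' per pixel
- >            }
- >        }
- >        @planes;                        # plane 0 = least significant bits
- >    }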
- >
- >You can find a description of JBIG in ISO/IEC CD 11544, contained in
- >document ISO/IEC JTC1/SC2/N2285. The only way to get it is to ask
- >your National Standards Body for a copy.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [17] What is the state of fractal compression?
- >
- >
- >from Tal Kubo <kubo@zariski.harvard.edu>:
- >
- >According to Barnsley's book 'Fractals Everywhere', this method is
- >based on a measure of deviation between a given image and its
- >approximation by an IFS code. The Collage Theorem states that there is
- >a convergent process to minimize this deviation. Unfortunately,
- >according to an article Barnsley wrote for BYTE a few years ago, this
- >convergence was rather slow, about 100 hours on a Cray, unless assisted by
- >a person.
- >
- >Barnsley et al are not divulging any technical information beyond the
- >meager bit in 'Fractals Everywhere'. The book explains the idea of IFS
- >codes at length, but is vague about the application of the Collage theorem
- >to specific compression problems.
- >
- >There is reason to believe that Barnsley's company has
- >*no algorithm* which takes a given reasonable image and achieves
- >the compression ratios initially claimed for their fractal methods.
- >The 1000-to-1 compression advertised was achieved only for a 'rigged'
- >class of images, with human assistance. The best unaided
- >performance I've heard of is good lossy compression of about 80-1.
- >
- >Steve Tate <srt@duke.cs.duke.edu> confirms:
- >
- >Compression ratios (unzoomed) seem to range from 20:1 to 60:1... The
- >quality is considerably worse than wavelets or JPEG on most of the
- >non-contrived images I have seen.
- >
- >
- >There is a fractal image compression demo program available via anonymous
- >ftp in lyapunov.ucsd.edu:/pub/fractal_image_processing/all.tar.Z.
- >There are executables and sample images in the same directory.
- >
- >References:
- > M. Barnsley, L. Anson, "Graphics Compression Technology", SunWorld,
- > October 1991, pp. 42-52.
- > M.F. Barnsley, A. Jacquin, F. Malassenet, L. Reuter & A.D. Sloan,
- > 'Harnessing chaos for image synthesis', Computer Graphics,
- > vol 22 no 4 pp 131-140, 1988.
- > M.F. Barnsley, A.E. Jacquin, 'Application of recurrent iterated
- > function systems to images', Visual Comm. and Image Processing,
- > vol SPIE-1001, 1988.
- > A. Jacquin, "Image Coding Based on a Fractal Theory of Iterated
- >   Contractive Image Transformations", IEEE Trans. on Image Processing,
- >   Vol 1 Issue 1, p. 18, January 1992.
- > A. Jacquin, A Fractal Theory of Iterated Markov Operators with
- > Applications to Digital Image Coding, PhD Thesis, Georgia Tech, 1989.
- > A.E. Jacquin, 'A novel fractal block-coding technique for digital
- > images', Proc. ICASSP 1990.
- > A. Jacquin, 'Fractal image coding based on a theory of iterated
- > contractive image transformations', Visual Comm. and Image
- > Processing, vol SPIE-1360, 1990.
- > G.E. Oien, S. Lepsoy & T.A. Ramstad, 'An inner product space
- > approach to image coding by contractive transformations',
- > Proc. ICASSP 1991, pp 2773-2776.
- > D.S. Mazel, Fractal Modeling of Time-Series Data, PhD Thesis,
- > Georgia Tech, 1991. (One dimensional, not pictures)
- > S. A. Hollatz, "Digital image compression with two-dimensional affine
- > fractal interpolation functions", Department of Mathematics and
- > Statistics, University of Minnesota-Duluth, Technical Report 91-2.
- > (a nuts-and-bolts how-to-do-it paper on the technique)
- > Stark, J., "Iterated function systems as neural networks",
- > Neural Networks, Vol 4, pp 679-690, Pergamon Press, 1991.
- > Monro D M and Dudbridge F, "Fractal block coding of images",
- > Electronics Letters 28(11):1053-1054 (1992)
- > Beaumont J M, "Image data compression using fractal techniques",
- > British Telecom Technological Journal 9(4):93-108 (1991)
- > Fisher Y, "Fractal image compression", Siggraph 92
- > Graf S, "Barnsley's Scheme for the Fractal Encoding of Images",
- > Journal Of Complexity, V8, 72-78 (1992).
- >
- >
- >Barnsley's company is:
- >
- >Iterated Systems, Inc.
- >5550A Peachtree Parkway, Suite 545
- >Norcross, GA 30092
- >404-840-0728
- >404-840-0029 (fax)
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [18] I need specs and source for TIFF and CCITT group 4 Fax
- >
- >
- >Specs for Group 3 and 4 image coding (group 3 is very similar to group 4)
- >are in CCITT (1988) volume VII fascicle VII.3. They are recommendations
- >T.4 and T.6 respectively. There is also an updated spec contained in 1992
- >recommendations T.1 to T.6.
- >
- >CCITT specs are available by anonymous ftp (see above answer on V.42bis).
- >The T.4 spec is in ccitt/1988/ascii/7_3_01.txt.Z, the T.6 spec
- >is in 7_3_02.txt.Z.
- >
- >Source code can be obtained as part of a TIFF toolkit - TIFF image
- >compression techniques for binary images include CCITT T.4 and T.6:
- >
- > sgi.com:/graphics/tiff/v3.0beta.tar.Z [192.48.153.1]
- > Contact: sam@sgi.com
- >
- >There is also a companion compressed tar file (v3.0pics.tar.Z) that
- >has sample TIFF image files. A draft of TIFF 6.0 is in TIFF6.ps.Z.
- >
- >See also question 54 below.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [19] What is JPEG?
- >
- >
- >JPEG (pronounced "jay-peg") is a standardized image compression mechanism.
- >JPEG stands for Joint Photographic Experts Group, the original name of the
- >committee that wrote the standard. JPEG is designed for compressing either
- >full-color or gray-scale digital images of "natural" (real-world) scenes.
- >JPEG does not handle black-and-white (1-bit-per-pixel) images, nor does it
- >handle motion picture compression. (Standards for compressing those types
- >of images are being worked on by other committees, named JBIG and MPEG
- >respectively.)
- >
- >A good introduction to JPEG is posted regularly in news.answers by
- >Tom Lane <tgl+@cs.cmu.edu>. (See question 53 "Where are FAQ lists archived"
- >if this posting has expired at your site.)
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [20] I am looking for source of an H.261 codec.
- >
- >
- >The H.261 spec is available on src.doc.ic.ac.uk in
- >/doc/ccitt-standards/ccitt/1992/h/h261.doc.Z (or h261.rtf.Z)
- >
- >
- >from Thierry TURLETTI <turletti@sophia.inria.fr>:
- >
- >We have implemented a software version of an H.261 codec.
- >It runs on top of UNIX and X-Windows. The coder uses the simple video capture
- >board "VideoPix" provided by SUN for the SparcStation. The output is directed
- >towards a standard TCP connection, instead of the leased lines or switched
- >circuits for which regular H.261 codecs are designed. This enables us to test
- >video conferences over regular internet connections.
- >We have to polish it a bit, but the first release is now available by anonymous
- >ftp from avahi.inria.fr, in "/pub/h261.tar.Z".
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [25] Fast DCT (Discrete Cosine Transform) algorithms
- >
- >
- >Here are some references provided by Tom Lane <tgl+@cs.cmu.edu>:
- >
- >Polynomial Transform Computation of the 2-D DCT, Duhamel & Guillemot,
- > ICASSP '90 p. 1515.
- >A Forward-Mapping Realization of the Inverse DCT, McMillan & Westover,
- > DCC '92 p. 219.
- >A Fast Algorithm for 2-D DCT, Cho, Yun & Lee, ICASSP '91 p. 2197.
- >Fast Algorithm and Implementation of 2-D DCT, Cho & Lee, Tr. CAS v38 p. 297.
- >A DCT Chip based on a new Structured and Computationally Efficient DCT
- > Algorithm, Duhamel, Guillemot & Carlach, ICCAS '90 p. 77.
- >Trade-offs in the Computation of Mono- and Multi-dimensional DCTs,
- > Vetterli, Duhamel & Guillemot, ICASSP '89 p. 999.
- >Practical Fast 1-D DCT Algorithms with 11 Multiplications,
- > Loeffler, Ligtenberg & Moschytz, ICASSP '89 p. 988.
- >New Scaled DCT Algorithms for Fused Multiply/Add Architectures,
- > Linzer & Feig, ICASSP '91 p. 2201.
- >Fast Algorithms for the 2-D Discrete Cosine Transform, Kamangar & Rao,
- > IEEE Tr. Computers, v C-31 p. 899.
- >Fast 2-D Discrete Cosine Transform, Vetterli, ICASSP '85 p. 1538.
- >A Two-Dimensional Fast Cosine Transform, Haque, Tr. ASSP v ASSP-33 p. 1532.
- >Real-Time Parallel and Fully Pipelined 2-D DCT Lattice Structures with
- > Application to HDTV Systems, Chiu & Liu, Tr. CAS for Video Tech, v 2 p. 25.
- >
- >The Rao & Yip book cited in the jpeg v3 DCT code (see item 15 above) is a
- >good overall introduction, with an extensive (though now dated) bibliography.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [26] Are there algorithms and standards for audio compression?
- >
- >
- >Yes. See the introduction to MPEG given in part 2 of this FAQ.
- >
- >A compression program for Sun audio files (.au) is available by anonymous
- >ftp as svr-ftp.eng.cam.ac.uk:/pub/misc/shorten-0.2.shar.
- >
- >
- >Copied from the comp.dsp FAQ posted by guido@cwi.nl (Guido van Rossum):
- >
- >Strange though it seems, audio data is remarkably hard to compress
- >effectively. For 8-bit data, a Huffman encoding of the deltas between
- >successive samples is relatively successful. For 16-bit data,
- >companies like Sony and Philips have spent millions to develop
- >proprietary schemes.
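- >
- >The delta front end is only a few lines of perl (a sketch; the
- >wrap-around-modulo-256 representation is an assumption, and the
- >Huffman stage itself is omitted):
- >
- >    # Replace each 8-bit sample by its difference from the previous one,
- >    # modulo 256. The deltas cluster around zero, which is what makes
- >    # the subsequent Huffman encoding effective.
- >    sub deltas {
- >        local($samples) = @_;
- >        local(@out, $prev, $s);
- >        $prev = 0;
- >        foreach $s (unpack("C*", $samples)) {
- >            push(@out, ($s - $prev) & 0xff);
- >            $prev = $s;
- >        }
- >        @out;
- >    }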
- >
- >Public standards for voice compression are slowly gaining popularity,
- >e.g. CCITT G.721 and G.723 (ADPCM at 32 and 24 kbits/sec). (ADPCM ==
- >Adaptive Differential Pulse Code Modulation.) Free source code for a *fast*
- >32 kbits/sec ADPCM algorithm is available by ftp from ftp.cwi.nl as
- >/pub/adpcm.shar.
- >
- >
- >There are also two US federal standards, 1016 (Code excited linear
- >prediction (CELP), 4800 bits/s) and 1015 (LPC-10E, 2400 bits/s). See
- >also the appendix for 1016.
- >
- >(Note that U-LAW and silence detection can also be considered
- >compression schemes.)
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [30] My archive is corrupted!
- >
- >
- >The two most common reasons for this are
- >
- >(1) failing to use the magic word "tenex" (when connected to SIMTEL20 and
- > other TOPS20 systems) or "binary" (when connected to UNIX systems) when
- > transferring the file from an ftp site to your host machine. The
- > reasons for this are technical and boring. A synonym for "tenex" is
- > "type L 8", in case your ftp doesn't know what "tenex" means.
- >
- >(2) failing to use an eight-bit binary transfer protocol when transferring
- > the file from the host to your PC. Make sure to set the transfer type
- > to "binary" on both your host machine and your PC.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [31] pkunzip reports a CRC error!
- >
- >
- >The portable zip contains many workarounds for undocumented restrictions
- >in pkunzip. Compatibility is ensured for pkunzip 1.10 only. All previous
- >versions (pkunzip 1.0x) have too many bugs and cannot be supported. This
- >includes Borland unzip.
- >
- >So if your pkunzip reports a CRC error, check that you are not using
- >an obsolete version. Get either pkzip 1.10 or unzip 4.2 (see question
- >2 above for ftp sites).
- >
- >Immediately after zip 1.0 was released, a new undocumented feature
- >of pkunzip was discovered, which causes CRC errors even with pkunzip 1.10
- >on rare occasions. A patch is available on valeria.cs.ucla.edu in
- >/pub/zip10.patch.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [32] VMS zip is not compatible with pkzip!
- >
- >
- >The problem is most likely in the file transfer program.
- >
- >Many use kermit to transfer zipped files between PC and VMS VAX. The
- >following VMS kermit settings make VMS-ZIP compatible with PKZIP:
- >
- >                                              VMS kermit        PC kermit
- >                                              ---------------   --------------
- >
- >Uploading PKZIPped file to be UNZIPped:       set fi ty fixed   set fi ty bi
- >Downloading ZIPped file to be PKUNZIPped:     set fi ty block   set fi ty bi
- >
- >If you are not using kermit, transfer a file created by pkzip on MSDOS
- >to VMS, transfer it back to your PC and check that pkunzip can extract it.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [50] What is this 'tar' compression program?
- >
- >
- >tar is not a compression program. It just combines several files
- >into one, without compressing them. tar files are often compressed with
- >'compress', resulting in a .tar.Z file. See question 2, file type .tar.Z.
- >(However, some versions of tar have the capability to compress files
- >as well.)
- >
- >When you have to archive a lot of very small files, it is often
- >preferable to create a single .tar file and compress it than to
- >compress the individual files separately. The compression program can
- >thus take advantage of redundancy between separate files. The
- >disadvantage is that you must uncompress the whole .tar file to
- >extract any member.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [51] I need a CRC algorithm
- >
- >
- >As its name implies, a CRC (Cyclic Redundancy Check) adds redundancy,
- >whereas the topic of this group is removing it. But since this
- >question comes up often, here is some code (by Rob Warnock <rpw3@sgi.com>).
- >
- >The following C code does CRC-32 in BigEndian/BigEndian byte/bit order.
- >That is, the data is sent most significant byte first, and each of the bits
- >within a byte is sent most significant bit first, as in FDDI. You will need
- >to twiddle with it to do Ethernet CRC, i.e., BigEndian/LittleEndian byte/bit
- >order. [Left as an exercise for the reader.]
- >
- >The CRCs this code generates agree with the vendor-supplied Verilog models
- >of several of the popular FDDI "MAC" chips.
- >
- >#include <sys/types.h>  /* for u_char, u_long (assumed to be 32 bits) */
- >
- >u_long crc32_table[256];
- >/* Initialized first time "crc32()" is called. If you prefer, you can
- > * statically initialize it at compile time. [Another exercise.]
- > */
- >void init_crc32(void);   /* table builder, defined below */
- >
- >u_long crc32(u_char *buf, int len)
- >{
- > u_char *p;
- > u_long crc;
- >
- > if (!crc32_table[1]) /* if not already done, */
- > init_crc32(); /* build table */
- > crc = 0xffffffff; /* preload shift register, per CRC-32 spec */
- > for (p = buf; len > 0; ++p, --len)
- > crc = (crc << 8) ^ crc32_table[(crc >> 24) ^ *p];
- > return ~crc; /* transmit complement, per CRC-32 spec */
- >}
- >
- >/*
- > * Build auxiliary table for parallel byte-at-a-time CRC-32.
- > */
- >#define CRC32_POLY 0x04c11db7 /* AUTODIN II, Ethernet, & FDDI */
- >
- >void init_crc32(void)
- >{
- > int i, j;
- > u_long c;
- >
- > for (i = 0; i < 256; ++i) {
- > for (c = i << 24, j = 8; j > 0; --j)
- > c = c & 0x80000000 ? (c << 1) ^ CRC32_POLY : (c << 1);
- > crc32_table[i] = c;
- > }
- >}
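- >
- >For the perl-inclined, a rough transliteration of the same table-driven
- >CRC-32 (the 0xffffffff masks are an assumption, to keep the arithmetic
- >within 32 bits on machines whose integers are wider):
- >
- >@crc32_table = ();
- >
- >sub init_crc32 {
- >    local($i, $j, $c);
- >    for ($i = 0; $i < 256; $i++) {
- >        for ($c = $i << 24, $j = 8; $j > 0; $j--) {
- >            $c = ($c & 0x80000000) ? ((($c << 1) ^ 0x04c11db7) & 0xffffffff)
- >                                   : (($c << 1) & 0xffffffff);
- >        }
- >        $crc32_table[$i] = $c;
- >    }
- >}
- >
- >sub crc32 {
- >    local($buf) = @_;
- >    local($crc, $byte);
- >    &init_crc32 unless @crc32_table;    # build table on first use
- >    $crc = 0xffffffff;                  # preload shift register
- >    foreach $byte (unpack("C*", $buf)) {
- >        $crc = (($crc << 8) & 0xffffffff)
- >               ^ $crc32_table[(($crc >> 24) & 0xff) ^ $byte];
- >    }
- >    ~$crc & 0xffffffff;                 # transmit complement
- >}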
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [52] What about those people who continue to ask frequently asked
- > questions in spite of the frequently asked questions document?
- >
- >
- >Just send them a polite mail message, referring them to this document.
- >There is no need to flame them on comp.compression. That would just
- >add more noise to this group. Posted answers that are in the FAQ are
- >just as annoying as posted questions that are in the FAQ.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [53] Where are FAQ lists archived?
- >
- >
- >Many are crossposted to news.answers. That newsgroup should have a
- >long expiry time at your site; if not, talk to your sysadmin.
- >
- >FAQ lists are available by anonymous FTP from pit-manager.mit.edu
- >(18.72.1.58) and by email from mail-server@pit-manager.mit.edu (send
- >a message containing "help" for instructions about the mail server).
- >
- >This posting is /pub/usenet/news.answers/compression-faq/part1.
- >Part 2 is in (guess?) compression-faq/part2.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [54] I need specs for graphics formats
- >
- >
- >Have a look in directory /pub/graphics.formats on zamenhof.cs.rice.edu.
- >It contains descriptions of gif, tiff, fits, etc...
- >
- >See also the FAQ list for comp.graphics.
- >
- >------------------------------------------------------------------------------
- >
- >Subject: [55] Where can I find Lenna and other images?
- >
- >
- >A bunch of standard images (lenna, baboon, cameraman, crowd, moon,
- >etc.) were on the ftp site gauss.eedsp.gatech.edu (130.207.226.2) in
- >directory /database/images. On Apr 1st, the system manager said the
- >site had had some hardware problems and would have the image database
- >back online as soon as the problems were corrected. However the
- >images are still not there (June 4th).
- >
- >The site ftp.ipl.rpi.edu also has standard images, in two directories:
- > ftp.ipl.rpi.edu:/pub/image/still/usc
- > ftp.ipl.rpi.edu:/pub/image/still/canon
- >
- >In each of those directories are the following directories:
- > bgr - 24 bit blue, green, red
- > color - 24 bit red, green, blue
- > gray - 8 bit grayscale uniform weighted
- > gray601 - 8 bit grayscale CCIR-601 weighted
- >
- >And in these directories are the actual images.
- >
- >For example, the popular lena image is in
- > ftp.ipl.rpi.edu:/pub/image/still/usc/color/lena # 24 bit RGB
- > ftp.ipl.rpi.edu:/pub/image/still/usc/bgr/lena # 24 bit BGR
- > ftp.ipl.rpi.edu:/pub/image/still/usc/gray/lena # 8 bit gray
- >
- >All of the images are in Sun rasterfile format. You can use the pbm
- >utilities to convert them to whatever format is most convenient.
- >[pbm is available in ftp.ee.lbl.gov:/pbmplus*.tar.Z].
- >Questions about the ipl archive should be sent to rodney@ipl.rpi.edu.
- >
- >The archive maintainer at ftp.ipl.rpi.edu is interested in some method
- >of establishing a canonical ftp database of images and could volunteer
- >the ipl to be an ftp site for that database. Send suggestions to
- >rodney@ipl.rpi.edu.
- >
- >
- >Beware: the same image often comes in many different forms, at
- >different resolutions, etc... The original lenna image is 512 wide,
- >512 high, 8 bits per pel, red, green and blue fields. Gray-scale
- >versions of Lenna have been obtained in two different ways from the
- >original:
- > (1) Using the green field as a gray-scale image, and
- > (2) Doing an RGB->YUV transformation and saving the Y component.
- >Method (1) makes it easier to compare different people's results since
- >everyone's version should be the same using that method. Method (2)
- >produces a more correct image.
- >
- >For the curious: 'lena' or 'lenna' is a digitized Playboy centerfold,
- >from November 1972. (Lenna is the spelling in Playboy, Lena is the
- >Swedish spelling of the name.) Lena Soderberg (née Sjooblom) was last
- >reported living in her native Sweden, happily married with three kids
- >and a job with the state liquor monopoly. In 1988, she was
- >interviewed by a Swedish computer-related publication, and she was
- >pleasantly amused by what had happened to her picture. That was the
- >first she knew of the use of that picture in the computer business.
- >
- >The editorial in the January 1992 issue of Optical Engineering (v. 31
- >no. 1) details how Playboy has finally caught on to the fact that
- >their copyright on Lenna Sjooblom's photo is being widely infringed.
- >It sounds as if you will have to get permission from Playboy to
- >publish it in the future.
- >
- > End of part 1 of the comp.compression faq.
-
- Here are timing results for postscript images created on various machines,
- compressed on a Sparc ELC, and printed on an Apple LaserWriter II:
-
-
- Test cases:
-
- (Apple LaserWriter II)
- Filename          size in chars   bits in image   time to      approx. time
-                                                   compress     to print
- -------------------------------------------------------------------------
- snapshot.cmp.ps        63861            ---         67.0 s        100 s
- snapshot.ps           262906         1024000          --          245 s
- stripes.cmp.ps          2241            ---         31.0 s         30 s
- stripes.ps            133403         1036800          --          130 s
- iris.cmp.ps            73384            ---         68.5 s        100 s
- iris.ps               261385          524288          --          250 s
- stellar.cmp.ps        129140            ---       1027.3 s        425 s
- stellar.ps           1968436         1966728          --         1740 s
-
- I am presently getting results for NeXT printers and some others.
-
- These files are available by E-mail on request to the above address.
-
- Here is my description of the two pieces necessary for
- compression/decompression (I originally had two files but now use the <DATA>
- file handle of perl):
-
- decomp.header is the postscript decompression header that will be used in
-      place of
-           "/picstr 1024 string def
-            { currentfile /picstr readhexstring pop }"
-      which is often used as the proc for the image operator,
-      i.e. "width height bits/sample matrix proc image"
-
- pscmp is the perl script that compresses the hex-digit-pair
-      format often used to encode a bitmap in postscript; it also
-      inserts the decompression header file in a clever way.
-      Since the last thing on the stack before the image command
-      is called is the procedure that image will use to obtain the
-      image data, pscmp looks for the image command and inserts
-           pop { decompress }
-      before it. The 'pop' command removes whatever procedure was
-      on the stack, and then '{ decompress }' (my command) is pushed
-      on the stack in its place.
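-
-      For instance, given a (hypothetical) fragment like
-           256 256 8 [256 0 0 -256 0 256]
-           { currentfile /picstr readhexstring pop } image
-      the first image in a file gets the whole decompression header
-      (which itself begins with 'pop' and ends with '{ decompress }')
-      spliced in just before 'image'; any later one simply becomes
-           256 256 8 [256 0 0 -256 0 256]
-           { currentfile /picstr readhexstring pop } pop { decompress } image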
-
- It does compression with the following four "codes":
-      u - one count character follows, whose ascii value determines
-          how many "unique" hex pairs follow: 1-256 pairs.
-      U - two count characters follow, whose ascii values determine
-          how many "unique" hex pairs follow: 257-65535 pairs.
-      r - one count character follows, whose ascii value determines
-          how many times to "repeat" the hex pair that follows: 1-256
-          repeats.
-      R - two count characters follow, whose ascii values determine
-          how many times to "repeat" the hex pair that follows:
-          257-65535 repeats.
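-
- For example, with the hex-digit counts that the perl script below
- actually emits (spaces added for readability), the five hex pairs
-      FF FF FF FF 00
- compress to
-      r03FF u0000
- r03FF repeats 'FF' 3+1 = 4 times, and u0000 announces a "unique" run
- of 0+1 = 1 pair, namely '00'.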
- NOTES:
-      * ranges for R and U could not be made to be 257-65792
-        without splitting the runs into multiple strings,
-        since the largest postscript string is 65535 bytes.
-      * I attempted two ways of storing the length of unique and
-        repeating runs.
-        The first, and the most straightforward to interpret in
-        postscript, was to store them as one or two characters whose
-        ascii value was then interpreted as an integer by using the
-        'currentfile read pop' sequence.
-        The second used a two or four digit hex number to represent
-        the length of the run, and used the postscript command
-        sequence:
-
-           /charx2 2 string def
-           /charx4 4 string def
-           /hexnum2 5 string def
-           /hexnum4 7 string def
-           /hexnum2 (16#00) def
-           /hexnum4 (16#0000) def
-           /getcount { hexnum2 3 currentfile charx2 readstring pop
-                       putinterval hexnum2 cvi } def
-           /getbigcount { hexnum4 3 currentfile charx4 readstring pop
-                          putinterval hexnum4 cvi } def
-
-        which works by putting the hex number, e.g. 'fd', into a string
-        like '16#00', giving the string '16#fd', which the command
-        'cvi' interprets as 0xfd, or 253.
-
-        The latter method was necessary because characters representing
-        serial port I/O controls, e.g. '^D' or '^S/^Q', were intercepted
-        by the printer's I/O handling and never passed to the postscript
-        interpreter.
-        The former method did work, however, with Sun's postscript
-        previewer "pageview version 3".
-      * pscmp removes the comments and unnecessary white space (used
-        for readability) from decomp.header as it inserts it into the
-        postscript.
-
- *******************************************************************************
- Here is the script:
-
- #!/usr/local/bin/perl
- # A perl script to compress postscript images.
-
- # Written by:
- # Steve Parker parker@mps.ohio-state.edu
- # Dept of Chemistry 201 McPherson Labs
- # Ohio State University 614-292-5042
- # Columbus OH, 43210
- #
- # codes: u - small count run of unique hex pairs
- # U - big count run of unique hex pairs
- # r - small count+1 repeated hex pair
- # R - big count+1 repeated hex pair
- # a repeat last r or R. NOT SUPPORTED IN THIS PERL SCRIPT.
- #
- # formats: u cc 'hphp...'
- # U CC CC 'hphp...'
- # r cc 'hp'
- # R CC CC 'hp'
- #
- # where: 1) spaces are not output
- # 2) uUrR are output literally
- # 3) cc is a 2 digit hex number (0-255) and represents range (1-256)
- # 4) CCCC is a 4 digit hex number (0-65535) for a range (257-65535)
- #    (were it not for the max size of a postscript string it would be 257-65792)
- # 5) 'hp' is a hex digit pair from 'image' data.
-
- $name = $0;
- $name =~ s'.*/''; # remove path--like basename
- $usage = "usage:\n$name [postscript_file_with_IMAGE_data]";
-
- select(STDOUT); $|=1;
-
- $biggest=65534;
- $last="";
- while (<>) {
- if ( /([^A-Fa-f\d\n])/ ) {
- # print "'$1' ->$_";
- if ($_ =~ /showpage/ || $_ =~ /grestore/ ) {
- #
- # FOUND a showpage or grestore so write out last repeating pair or unique run.
- #
- if ($repeating) {
- # we didn't record the first pair in $repeating
- # so we needn't subtract 1.
- #$num=$repeating-1;
- $num=$repeating;
- if ( $num <= 255 ) {
- # case 2 small count repeat unit 2 hex digits.
- printf("r%02X%2s\n",$num,$last);
- $r++;
- } else {
- # case 3 big count repeat unit 2 hex digits.
- printf("R%02X%02X%2s\n",int($num/256),($num%256),$last);
- $R++;
- }
- } else {
- $unique_str.=$last;
- # we didn't yet record this last pair in $unique_run
- # so we needn't subtract 1.
- $num=$unique_run;
- if ( $num <= 255 ) {
- # case 0 small count unique string of hex digit pairs.
- printf("u%02X%s",$num,$unique_str);
- $u++;
- } else {
- # case 1 big count unique string of hex digit pairs.
- printf("\nU%02X%02X%s",int($num/256),($num%256),$unique_str);
- $U++;
- }
- }
- print; # echo the showpage/grestore line itself before quitting
- &end;
- }
- # add the postscript decompression header
- # in between the original proc called by the 'image' command
- # and the 'image' command itself
- if ( $_ =~ /^(image\s?.*)$|^([^%]*)?(\simage\s?.*)$/ ) {
- $image = $1 ? $1 : $3; # the 'image' command itself
- print "$2\n" if ($2); # whatever preceded it (usually the old proc)
- if (!$headerin) {
- # first image: insert the decompression header, which now lives
- # after __END__ and is read through perl's <DATA> file handle
- # $file="/home/sysadmin/postscript/compress/decomp.header";
- # open(HEADER,"$file") || die("$name: Cannot open $file: '$!'\n");
- while (<DATA>) { s/(\s)\s+/\1/g; print if !(/^%/); }
- $headerin++;
- close(DATA);
- } else {
- # header already in: just push the decompression proc
- print " pop { decompress } ";
- }
- print "$image";
- next;
- }
- print;
- next;
- } # else { print "\n" if ($unique_run || $repeating); }
- #
- #-------------------- HEX PAIR HANDLING LOOP --------------------------
- #
- while (s?([A-F0-9a-f][A-F0-9a-f])??) {
- if ($repeating) {
- if ($1 eq $last) {
- #-debug print STDERR "rs"; # repeating; same
- $repeating++; # found another one.
- # check to see if we have filled the biggest postscript string
- # this will keep the decompress in postscript simple and fast.
- if ($repeating == $biggest) {
- printf("Rfffe%2s",$last);
- # set to start over fresh; every copy seen so far (including
- # $last's first one) has now been written out, so forget $last
- $repeating=0;
- $last='';
- # $unique_str should be set to null and $unique_run set to 0
- }
- } else {
- #-debug print STDERR "rd"; # repeating; different
- #
- # FOUND a unique hex pair so repeating unit has ended, write it out.
- #
- #$num=$repeating-1;
- $num=$repeating;
- if ( $repeating <= 255 ) {
- # case 2 small count repeat unit 2 hex digits.
- # -line- $line+=6; if ( $line > 80) { $line=6; print "\n"; }
- #-debug printf STDERR ">2,%2X,%2s ",$num,$last;
- printf("r%02X%2s",$num,$last);
- $r++;
- } else {
- # case 3 big count repeat unit 2 hex digits.
- # -line- $line+=8; if ( $line > 80) { $line=8; print "\n"; }
- #-debug printf(">3,%2X,%2X,%2s ",int($num/256),($num%256),$last);
- printf("R%02X%02X%2s",int($num/256),($num%256),$last);
- $R++;
- }
- $repeating=0;
- $last=$1;
- }
-
- } else { # must be unique'ing
-
- if ($1 eq $last) {
- #-debug print "us"; # uniquing; same
- #
- # FOUND a repeating hex pair so might have a unique run
- # which has ended, if so write it out.
- #
- if ($unique_str) {
- $num=$unique_run-1;
- if ( $num <= 255 ) {
- # case 0 small count unique string of hex digit pairs.
- # -line- $line+=(4+$unique_run)); if ( $line > 80) { $line=4+$unique_run; print "\n"; }
- #-debug printf("\n>0,%2X,'%s' ",$num,$unique_str);
- printf("\nu%02X%s",$num,$unique_str);
- $u++;
- } else {
- # case 1 big count unique string of hex digit pairs.
- # -line- $line+=(6+$unique_run); if ( $line > 80) { $line=6+$unique_run; print "\n"; }
- #-debug printf("\n>1,%2X,%2X,'%s' ",int($num/256),($num%256),
- printf("\nU%02X%02X%s",int($num/256),($num%256),$unique_str);
- $U++;
- }
- }
- # start counting repeating pairs, reset unique_run count
- # and remember last.
- $repeating++;
- $unique_str='';$unique_run=0;
- $last=$1;
- } else { # continue uniquing
- #-debug print "ud"; # uniquing; different
- if ($last ne '') { # the initial (or just-flushed) empty $last is not a pair
- $unique_str.=$last;
- # $unique_run+=2; # use this if using $line to limit to 80 chars/line.
- # but REMEMBER to divide by two when outputting!
- $unique_run++;
- }
- # check to see if we have filled the biggest postscript string
- # this will keep the decompress in postscript simple and fast.
- if ($unique_run == $biggest) {
- # the decoder always reads count+1 pairs, so emit count-1
- printf("U%04X%s",$unique_run-1,$unique_str);
- # set to start over fresh
- $unique_str='';$unique_run=0;
- # $repeating should be set to 0
- }
- $last=$1;
- }
- }
- }
- }
- &end;
- sub end {
- printf STDERR "Statistics:\n" ;
- printf STDERR "r's:%5d\n",$r ;
- printf STDERR "R's:%5d\n",$R ;
- printf STDERR "u's:%5d\n",$u ;
- printf STDERR "U's:%5d\n",$U ;
- ($user,$system,$cuser,$csystem)=times;
- printf STDERR "Times:\tuser,\tsystem,\tcuser,\tcsystem\n";
- printf STDERR "Times:\t%5f,\t%5f,\t%5f,\t%5f\n",
- $user,$system,$cuser,$csystem;
- exit;
- }
- __END__
- %-------------------------------------------------------------------------------
- %
- % header to define 'decompress' which will replace the
- % { currentfile string readhexstring pop } proc commonly used with 'image'
- %
- % to be placed just before the 'image' command
- % the 'pop' on the following line is to remove bogus 'proc' (as above)
- pop
- /repeater 1 string def
- /char 1 string def
- /charx2 2 string def
- /charx4 4 string def
- /hexnum2 5 string def
- /hexnum4 7 string def
- /debug 30 string def
- /big 65535 string def
- /hexnum2 (16#00) def
- /hexnum4 (16#0000) def
- /gethexpair { currentfile char readhexstring pop } def
- /getcount { hexnum2 3
- currentfile charx2 readstring pop
- putinterval hexnum2 cvi } def
- /getbigcount { hexnum4 3
- currentfile charx4 readstring pop
- putinterval hexnum4 cvi } def
- /codeu { pop /cnt getcount def
- big 0 1 cnt { gethexpair putinterval big } for
- 0 cnt 1 add getinterval
- } def
- /codeU { pop /cnt getbigcount def
- big 0 1 cnt { gethexpair putinterval big } for
- 0 cnt 1 add getinterval
- } def
- /coder { pop /cnt getcount def
- /repeater gethexpair def % get repeater unit
- big 0 1 cnt {repeater putinterval big} for
- 0 cnt 1 add getinterval
- } def
- /codeR { pop /cnt getbigcount def
- /repeater gethexpair def % get repeater unit
- big 0 1 cnt {repeater putinterval big} for
- 0 cnt 1 add getinterval
- } def
- /codeX { pop big 0 cnt 1 add getinterval } def
- /done { currentfile debug readstring pstack exit } def
- /skip { pop decompress } def
- %
- % the following order of r,u,R,U was chosen by noting the frequency
- % of occurrence from a small number of examples but can easily be changed.
- /others0 { dup (u) eq { codeu } { others1 } ifelse } def
- /others1 { dup (R) eq { codeR } { others2 } ifelse } def
- /others2 { dup (U) eq { codeU } { others3 } ifelse } def
- /others3 { dup (a) eq { codeX } { others4 } ifelse } def
- /others4 { dup (\n) eq { skip } { done } ifelse } def
- /decompress { currentfile char readstring pop
- dup (r) eq { coder } { others0 } ifelse } def
- { decompress }
- %-----------------------------------------------------------------------------
-