- Xref: sparky comp.lang.perl:5779 comp.lang.postscript:4628 comp.compression:3225
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!not-for-mail
- From: parker@shape.mps.ohio-state.edu (Steve Parker)
- Newsgroups: comp.lang.perl,comp.lang.postscript,comp.compression
- Subject: new postscript compression script in perl
- Date: 8 Sep 1992 16:55:08 -0400
- Organization: Department of Mathematics, The Ohio State University
- Lines: 409
- Distribution: world
- Message-ID: <18j3vcINN8vu@shape.mps.ohio-state.edu>
- NNTP-Posting-Host: shape.mps.ohio-state.edu
- Keywords: postscript,compress,perl
-
- To Whom It May Concern,
-
- NOTE: I recently posted this message with an old revision of the compression
- script that had errors and didn't work. This version works!
- My thanks to Mats Lidell for finding some of the errors.
-
- A few months ago I posted a request to the net regarding a postscript
- compression routine. I have since found out that Postscript 2 is to have
- its own standard way to compress images; nevertheless, I have written a
- postscript image compressor.
-
- The postscript compressor I have written is in perl, but the form of compression
- is run-length encoding, which could be written in other languages, such as C, sed,
- awk, nawk, etc. (I am sure that Larry could probably even write the thing in
- roff--the man is mutant! caveat: I am forever in his debt for writing perl.)
-
- The most common response that I received when telling my peers of my plans was
- "Why write the thing in postscript?" or "Why write the thing at all?"
-
- Everyone knows that the way screen dump/image capture routines output their
- results on various architectures is straightforward but wasteful of disk space.
- Computer screen images are unlike images from other sources in that they almost
- always have huge amounts of repetition: large areas all of the same pattern. They
- are therefore very susceptible to run-length-encoding-type compression schemes.
-
- The reasons I think that a compression routine written in postscript is valuable
- are that the resulting compressed image is still a valid postscript program,
- which can be sent via E-mail without fear of corruption to anyone with a
- postscript printer regardless of the machine to which it is connected, and
- furthermore can be subsequently compressed with your favorite routine:
- compress, pack, zip, zoo, arc, etc.
-
- Finally I realize that this is not the most efficient method, but before I work
- on enhancing it, I thought that I'd post it. I would like your input on how to
- improve it and I thought that many of you could use it 'as is' to help with
- disk space problems (a never-ending problem at our site at least). I have found
- that users mind this form of compression less since it requires no additional
- actions to print the file later.
-
- If you use this package, I would appreciate timing results/comments/suggestions.
- Please E-mail me any posts that you might make concerning this post so that
- I can post a summary at a later date:
-
- Steve Parker parker@mps.ohio-state.edu
- Dept of Chemistry 201 McPherson Labs
- Ohio State University 614-292-5042
-
- I have thought of many ways to improve this script:
-
- 1) Record which and how many of the decompression routines were used in a given
- image, and insert only those that were used, in the order of the frequency
- with which they were used.
- 2) Search the unique runs for patterns of 2, 4, or 8 characters.
- 3) Make a new decompression routine called I (for insert), which would be used for
- inserting a unique string into a long run of repeating characters.
-
- NOTE: The above ideas would most easily be accomplished by saving the compressed
- image in memory or a temp file and/or making multiple passes at the image
- data.
-
- Any other suggestions?
-
- Here are timing results for postscript images created on various machines,
- compressed on a Sparc ELC and printed on an Apple LaserWriter II:
-
-
- Test cases:
-
- Filename           size in    bits in    time to     approximate time to print
-                    chars      image      compress    (Apple LaserWriter II)
- -------------------------------------------------------------------------
- snapshot.cmp.ps      63861        ---      67.0 s      100 s
- snapshot.ps         262906    1024000        --        245 s
- stripes.cmp.ps        2241        ---      31.0 s       30 s
- stripes.ps          133403    1036800        --        130 s
- iris.cmp.ps          73384        ---      68.5 s      100 s
- iris.ps             261385     524288        --        250 s
- stellar.cmp.ps      129140        ---    1027.3 s      425 s
- stellar.ps         1968436    1966728        --       1740 s
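
To put the file sizes in the table in proportion, here is a small sketch (in Python, not perl) that computes how large each compressed file is relative to its original. The sizes are copied from the table; the names `sizes` and `ratio` are only illustrative and are not part of pscmp.

```python
# (compressed size, original size) in characters, copied from the table.
sizes = {
    "snapshot": (63861, 262906),
    "stripes": (2241, 133403),
    "iris": (73384, 261385),
    "stellar": (129140, 1968436),
}


def ratio(name):
    """Return compressed size divided by original size for one test case."""
    compressed, original = sizes[name]
    return compressed / original


for name in sizes:
    print(f"{name:10s} {100 * ratio(name):5.1f}% of original size")
```

The highly regular stripes image shrinks the most, which is what one would expect from a run-length scheme.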
-
- I am presently getting results for NeXT printers, and some others.
-
- These files are available by E-mail on request at the above address.
-
- Here is my description of the two pieces necessary for
- compression/decompression (I originally had two files but now use the <DATA>
- file handle of perl):
-
- decomp.header is the postscript decompression header that will be used in
- place of
- "/picstr 1024 string def
- { currentfile picstr readhexstring pop }"
- which is often used as the proc for the image function,
- i.e. "width height bits/sample matrix proc image"
-
- pscmp is the perl script that compresses the hex digit pair
- format often used to encode a bitmap in postscript. It also
- inserts the decompression header file in a clever way.
- Since the last thing on the stack before the image command
- is called is the procedure that image will use to obtain the
- image, pscmp looks for the image command and inserts
- pop { decompress }
- before it. The 'pop' command removes whatever procedure was
- on the stack and then '{ decompress }' (my command) is pushed
- on the stack in its place.
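
The insertion step can be sketched roughly as follows. This is a simplified illustration in Python, not the perl that pscmp uses: it assumes 'image' appears as a whitespace-separated token on a non-comment line, and the function name `insert_decompress` and its arguments are hypothetical.

```python
def insert_decompress(lines, header_lines):
    """Put the header (once) and 'pop { decompress }' before each 'image'."""
    out = []
    header_done = False
    for line in lines:
        tokens = line.split()
        if "image" in tokens and not line.lstrip().startswith("%"):
            at = tokens.index("image")
            if at > 0:
                # Anything before 'image' on the same line stays in front.
                out.append(" ".join(tokens[:at]))
            if not header_done:
                # The decompression header is only needed once per file.
                out.extend(header_lines)
                header_done = True
            # Discard the original proc and push ours in its place.
            out.append(" pop { decompress }")
            out.append(" ".join(tokens[at:]))
        else:
            out.append(line)
    return out


page = ["{ currentfile picstr readhexstring pop }", "image", "FFFF", "showpage"]
for line in insert_decompress(page, ["/decompress { } def"]):
    print(line)
```

The original proc is left in the file untouched; it is simply popped off the stack at run time, so the script never has to recognize or parse it.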
-
- It does compression with the following four "codes":
- u - one character follows, whose ascii value will determine
- how many "unique" hex pairs follow. 1-256 pairs.
- U - two characters follow, whose ascii values will determine
- how many "unique" hex pairs follow. 257-65535 pairs.
- r - one character follows, whose ascii value will determine
- how many times to "repeat" the hex pair that follows.
- R - two characters follow, whose ascii values will determine
- how many times to "repeat" the hex pair that follows.
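
As a rough illustration of the four codes, here is a minimal encoder sketch in Python (not the perl script itself). It assumes the count is written as 2 or 4 hex digits holding the run length minus one, as in the script's format comments, and it uses a simple greedy rule: any run of two or more equal bytes becomes an r/R code. The name `encode` and the `max_run` parameter are hypothetical.

```python
def encode(data: bytes, max_run: int = 65535) -> str:
    """Greedy run-length encoder emitting u/U/r/R codes with hex counts."""
    out = []
    literals = bytearray()

    def flush_literals():
        # Emit any pending "unique" bytes as a single u or U code.
        if literals:
            n = len(literals)
            code = f"u{n - 1:02X}" if n <= 256 else f"U{n - 1:04X}"
            out.append(code + literals.hex().upper())
            literals.clear()

    i = 0
    while i < len(data):
        # Measure the run of identical bytes starting at position i.
        j = i
        while j < len(data) and data[j] == data[i] and j - i < max_run:
            j += 1
        n = j - i
        if n >= 2:
            flush_literals()
            code = f"r{n - 1:02X}" if n <= 256 else f"R{n - 1:04X}"
            out.append(code + f"{data[i]:02X}")
        else:
            literals.append(data[i])
            if len(literals) == max_run:
                flush_literals()
        i = j
    flush_literals()
    return "".join(out)


print(encode(b"\xff" * 5 + b"\x01\x02\x03"))
```

A real encoder might prefer to leave very short repeats inside a unique run, since "r01FF" takes more characters than the two pairs it replaces.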
- NOTES:
- * ranges for R and U could not be made to be 257-65792,
- without splitting the runs into multiple strings,
- since the largest string is 65535.
- * I attempted two ways of storing the length of unique and
- repeating runs.
- The first, and the most straightforward to interpret in
- postscript, was to store them as one or two characters whose
- ascii value was then interpreted as an integer by using the
- 'currentfile read pop' sequence.
- The second used a two or four digit hex number to represent
- the length of the run, and used the postscript command
- sequence:
-
- /charx2 2 string def
- /charx4 4 string def
- /hexnum2 5 string def
- /hexnum4 7 string def
- /hexnum2 (16#00) def
- /hexnum4 (16#0000) def
- /getcount { hexnum2 3 currentfile charx2 readstring pop
- putinterval hexnum2 cvi } def
- /getbigcount { hexnum4 3 currentfile charx4 readstring pop
- putinterval hexnum4 cvi } def
-
- which works by putting the hex number, e.g. 'fd', in a string
- like '16#00', thus giving the string '16#fd', which the command
- 'cvi' interprets as 0xfd, or 253.
-
- The latter method was necessary because characters representing
- serial port I/O controls, i.e. '^D', '^S/^Q', were interpreted
- by the printer's I/O control and not passed to the postscript
- interpreter.
- The former method did work, however, with Sun's Postscript
- previewer "pageview version 3".
- * pscmp removes the comments and unnecessary white space (used
- for readability) from decomp.header as it inserts it into the
- postscript.
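
The two ways of storing a count can be contrasted with a short sketch, written in Python purely for illustration. `get_count` mimics the 'putinterval' plus 'cvi' trick on the '16#00' template, and `raw_count_hazard` shows which single-character counts would collide with the serial control characters mentioned above. Both function names and the `CONTROL` table are hypothetical.

```python
# Serial-line control characters named in the note, by byte value.
CONTROL = {0x04: "^D", 0x11: "^Q", 0x13: "^S"}


def get_count(two_hex_digits):
    """Mimic: overwrite the digits of '16#00', then convert like 'cvi'."""
    template = list("16#00")
    template[3:5] = two_hex_digits          # like 'putinterval' at index 3
    base, digits = "".join(template).split("#")
    return int(digits, int(base))           # radix number, e.g. 16#fd


def raw_count_hazard(count):
    """Return the control character a raw one-byte count would equal, if any."""
    return CONTROL.get(count)


print(get_count("fd"))
print(raw_count_hazard(4), raw_count_hazard(253))
```

With the hex form, a count of 4 travels as the two printable characters '04' and never appears on the wire as a raw ^D.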
-
- *******************************************************************************
- Here is the script:
- #!/usr/local/bin/perl
- # A perl script to compress postscript images.
- #
- # codes: u - small count run of unique hex pairs
- # U - big count run of unique hex pairs
- # r - small count+1 repeated hex pair
- # R - big count+1 repeated hex pair
- # a - repeat last r or R. NOT SUPPORTED IN THIS PERL SCRIPT.
- #
- # formats: u cc 'hphp...'
- # U CC CC 'hphp...'
- # r cc 'hp'
- # R CC CC 'hp'
- #
- # where: 1) spaces are not output
- # 2) uUrR are output literally
- # 3) cc is a 2 digit hex number (0-255) and represents range (1-256)
- # 4) CCCC is a 4 digit hex number (0-65535) for a range (257-65535)
- # if not for max size on postscript string would be (257-65792)
- # 5) 'hp' is a hex digit pair from 'image' data.
-
- $name = $0;
- $name =~ s'.*/''; # remove path--like basename
- $usage = "usage:\n$name [postscript_file_with_IMAGE_data]";
-
- select(STDOUT); $|=1;
-
- $biggest=65534;
- $last="";
- while (<>) {
- if ( /([^A-Fa-f\d\n])/ ) {
- # print "'$1' ->$_";
- if ($_ =~ /showpage/ || $_ =~ /grestore/ ) {
- #
- # FOUND a showpage or grestore so write out last repeating pair or unique run.
- #
- if ($repeating) {
- # we didn't record the first pair in $repeating
- # so we needn't subtract 1.
- #$num=$repeating-1;
- $num=$repeating;
- if ( $num <= 255 ) {
- # case 2 small count repeat unit 2 hex digits.
- printf("r%02X%2s\n",$num,$last);
- $r++;
- } else {
- # case 3 big count repeat unit 2 hex digits.
- printf("R%02X%02X%2s\n",int($num/256),($num%256),$last);
- $R++;
- }
- } else {
- $unique_str.=$last;
- # we didn't yet record this last pair in $unique_run
- # so we needn't subtract 1.
- $num=$unique_run;
- if ( $num <= 255 ) {
- # case 0 small count unique string of hex digit pairs.
- printf("u%02X%s",$num,$unique_str);
- $u++;
- } else {
- # case 1 big count unique string of hex digit pairs.
- printf("\nU%02X%02X%s",int($num/256),($num%256),$unique_str);
- $U++;
- }
- }
- print;
- & end;
- }
- # add the postscript decompression header
- # in between the original proc called by the 'image' command
- # and the 'image' command itself
- if ( $_ =~ /^(image\s?.*)$|^([^%]*)?(\simage\s?.*)$/ ) {
- # save the captures now; the header loop below runs its own matches.
- # $alone = 'image' at start of line, $before/$after = text around ' image'.
- ($alone, $before, $after) = ($1, $2, $3);
- print "$before\n" if ($before);
- if (! $headerin) {
- # $file="/home/sysadmin/postscript/compress/decomp.header";
- # open(HEADER,"$file") || die("$name: Cannot open $file: '$!'\n");
- while (<DATA>) { s/(\s)\s+/$1/g; print if !(/^%/); }
- $headerin++;
- close(DATA);
- }
- print " pop { decompress }\n";
- if ($after) {
- print "$after\n";
- } else {
- print "$alone\n";
- }
- next;
- }
- print;
- next;
- } # else { print "\n" if ($unique_run || $repeating); }
- #
- #-------------------- HEX PAIR HANDLING LOOP --------------------------
- #
- while (s?([A-F0-9a-f][A-F0-9a-f])??) {
- if ($repeating) {
- if ($1 eq $last) {
- #-debug print STDERR "rs"; # repeating; same
- $repeating++; # found another one.
- # check to see if we have filled biggest postscript string
- # this will keep the decompress in postscript simple and fast.
- if ($repeating == $biggest) {
- printf("Rfffe%2s",$last);
- # set to start over fresh
- $repeating=0;
- # $unique_str should be set to null and $unique_run set to 0
- }
- } else {
- #-debug print STDERR "rd"; # repeating; different
- #
- # FOUND a unique hex pair so repeating unit has ended, write it out.
- #
- #$num=$repeating-1;
- $num=$repeating;
- if ( $repeating <= 255 ) {
- # case 2 small count repeat unit 2 hex digits.
- # -line- $line+=6; if ( $line > 80) { $line=6; print "\n"; }
- #-debug printf STDERR ">2,%2X,%2s ",$num,$last;
- printf("r%02X%2s",$num,$last);
- $r++;
- } else {
- # case 3 big count repeat unit 2 hex digits.
- # -line- $line+=8; if ( $line > 80) { $line=8; print "\n"; }
- #-debug printf(">3,%2X,%2X,%2s ",int($num/256),($num%256),$last);
- printf("R%02X%02X%2s",int($num/256),($num%256),$last);
- $R++;
- }
- $repeating=0;
- $last=$1;
- }
-
- } else { # must be unique'ing
-
- if ($1 eq $last) {
- #-debug print "us"; # uniquing; same
- #
- # FOUND a repeating hex pair so might have a unique run
- # which has ended, if so write it out.
- #
- if ($unique_str) {
- $num=$unique_run-1;
- if ( $num <= 255 ) {
- # case 0 small count unique string of hex digit pairs.
- # -line- $line+=(4+$unique_run)); if ( $line > 80) { $line=4+$unique_run; print "\n"; }
- #-debug printf("\n>0,%2X,'%s' ",$num,$unique_str);
- printf("\nu%02X%s",$num,$unique_str);
- $u++;
- } else {
- # case 1 big count unique string of hex digit pairs.
- # -line- $line+=(6+$unique_run); if ( $line > 80) { $line=6+$unique_run; print "\n"; }
- #-debug printf("\n>1,%2X,%2X,'%s' ",int($num/256),($num%256),
- printf("\nU%02X%02X%s",int($num/256),($num%256),$unique_str);
- $U++;
- }
- }
- # start counting repeating pairs, reset unique_run count
- # and remember last.
- $repeating++;
- $unique_str='';$unique_run=0;
- $last=$1;
- } else { # continue uniquing
- #-debug print "ud"; # uniquing; different
- $unique_str.=$last;
- # $unique_run+=2; # use this if using $line to limit to 80 chars/line.
- # but REMEMBER to divide by two when outputting!
- $unique_run++;
- # check to see if we have filled biggest postscript string
- # this will keep the decompress in postscript simple and fast.
- if ($unique_run == $biggest) {
- printf("Ufffe%s",$unique_str);
- # set to start over fresh
- $unique_str='';$unique_run=0;
- $last=$1;
- # $repeating should be set to 0
- }
- $last=$1;
- }
- }
- }
- }
- &end;
- sub end {
- printf STDERR "Statistics:\n" ;
- printf STDERR "r's:%5d\n",$r ;
- printf STDERR "R's:%5d\n",$R ;
- printf STDERR "u's:%5d\n",$u ;
- printf STDERR "U's:%5d\n",$U ;
- ($user,$system,$cuser,$csystem)=times;
- printf STDERR "Times:\tuser,\tsystem,\tcuser,\tcsystem\n";
- printf STDERR "Times:\t%5f,\t%5f,\t%5f,\t%5f\n",
- $user,$system,$cuser,$csystem;
- exit;
- }
- __END__
- %-------------------------------------------------------------------------------
- %
- % header to define 'decompress' which will replace the
- % { currentfile string readhexstring pop } proc commonly used with 'image'
- %
- % to be placed just before the 'image' command
- % the 'pop' on the line inserted above is to remove bogus 'proc' (as above)
- /repeater 1 string def
- /char 1 string def
- /charx2 2 string def
- /charx4 4 string def
- /hexnum2 5 string def
- /hexnum4 7 string def
- /debug 30 string def
- /big 65535 string def
- /hexnum2 (16#00) def
- /hexnum4 (16#0000) def
- /gethexpair { currentfile char readhexstring pop } def
- /getcount { hexnum2 3
- currentfile charx2 readstring pop
- putinterval hexnum2 cvi } def
- /getbigcount { hexnum4 3
- currentfile charx4 readstring pop
- putinterval hexnum4 cvi } def
- /codeu { pop /cnt getcount def
- big 0 1 cnt { gethexpair putinterval big } for
- 0 cnt 1 add getinterval
- } def
- /codeU { pop /cnt getbigcount def
- big 0 1 cnt { gethexpair putinterval big } for
- 0 cnt 1 add getinterval
- } def
- /coder { pop /cnt getcount def
- /repeater gethexpair def % get repeater unit
- big 0 1 cnt {repeater putinterval big} for
- 0 cnt 1 add getinterval
- } def
- /codeR { pop /cnt getbigcount def
- /repeater gethexpair def % get repeater unit
- big 0 1 cnt {repeater putinterval big} for
- 0 cnt 1 add getinterval
- } def
- /codeX { pop big 0 cnt 1 add getinterval } def
- /done { currentfile debug readstring pstack exit } def
- /skip { pop decompress } def
- %
- % the following order of r,u,R,U was chosen by noting the frequency
- % of occurrence from a small number of examples but can easily be changed.
- /others0 { dup (u) eq { codeu } { others1 } ifelse } def
- /others1 { dup (R) eq { codeR } { others2 } ifelse } def
- /others2 { dup (U) eq { codeU } { others3 } ifelse } def
- /others3 { dup (a) eq { codeX } { others4 } ifelse } def
- /others4 { dup (\n) eq { skip } { done } ifelse } def
- /decompress { currentfile char readstring pop
- dup (r) eq { coder } { others0 } ifelse } def
- %-----------------------------------------------------------------------------
-
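
For checking compressed output without a printer, the dispatch performed by 'decompress' can be mirrored in a few lines. This is a sketch in Python, not a translation of the header: it assumes hex counts that hold the run length minus one, assumes no white space inside a code's hex data, and returns the whole image at once instead of one string per call as the 'image' proc does. The name `decode` is hypothetical.

```python
def decode(stream: str) -> bytes:
    """Expand a u/U/r/R code stream back into the raw image bytes."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        code = stream[pos]
        pos += 1
        if code == "\n":
            continue                      # newlines between codes are skipped
        if code not in "uUrR":
            break                         # anything else ends the image data
        width = 2 if code in "ur" else 4  # small or big count
        count = int(stream[pos:pos + width], 16) + 1
        pos += width
        if code in "uU":
            # 'count' unique hex pairs follow.
            out += bytes.fromhex(stream[pos:pos + 2 * count])
            pos += 2 * count
        else:
            # One hex pair follows, to be repeated 'count' times.
            out += bytes.fromhex(stream[pos:pos + 2]) * count
            pos += 2
    return bytes(out)


print(decode("r04FF\nu02010203").hex())
```

Comparing the decoded bytes with the original hex data is a quick way to test a compressed file before sending it to a printer.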