NetNews Usenet Archive 1992 #20

Internet Message Format | 1992-09-08 | 15.9 KB

Xref: sparky comp.lang.perl:5779 comp.lang.postscript:4628 comp.compression:3225
Path: sparky!uunet!zaphod.mps.ohio-state.edu!not-for-mail
From: parker@shape.mps.ohio-state.edu (Steve Parker)
Newsgroups: comp.lang.perl,comp.lang.postscript,comp.compression
Subject: new postscript compression script in perl
Date: 8 Sep 1992 16:55:08 -0400
Organization: Department of Mathematics, The Ohio State University
Lines: 409
Distribution: world
Message-ID: <18j3vcINN8vu@shape.mps.ohio-state.edu>
NNTP-Posting-Host: shape.mps.ohio-state.edu
Keywords: postscript,compress,perl

To Whom It May Concern,

NOTE: I recently posted this message with an old revision of the
compression script that had errors and didn't work. This version works!
My thanks to Mats Lidell for finding some of the errors.

A few months ago I posted a request to the net regarding a postscript
compression routine. I have since found out that Postscript 2 is to have
its own standard way to compress images; nevertheless, I have written a
postscript image compressor.

The postscript compressor I have written is in perl, but the form of
compression is run-length encoding, and it could be written in other
languages, such as C, sed, awk, nawk, etc. (I am sure that Larry could
probably even write the thing in roff--the man is mutant! Caveat: I am
forever in his debt for writing perl.)

The most common response that I received when telling my peers of my
plans was "Why write the thing in postscript?" or "Why write the thing
at all?"

Everyone knows that the way screen dump/image capture routines output
their results on various architectures is straightforward but wasteful
of disk space. Computer screen images are unlike images from other
sources in that they almost always have huge amounts of repetition:
large areas that are all the same pattern. They are therefore very
susceptible to run-length-encoding-type compression schemes.
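To make the point about repetition concrete, here is a minimal sketch (not part of the posted script, and written for a current Perl 5 rather than the perl of this post) that groups a string of hex digit pairs into runs. The helper name runs_of_pairs and the sample scan line are made up for illustration.

```perl
use strict;
use warnings;

# Split a string of hex digit pairs into [pair, run_length] entries.
# (Hypothetical helper, for illustration only.)
sub runs_of_pairs {
    my ($hex) = @_;
    my @runs;
    for my $pair ($hex =~ /([0-9A-Fa-f]{2})/g) {
        if (@runs && $runs[-1][0] eq $pair) {
            $runs[-1][1]++;            # same pair again: extend the current run
        } else {
            push @runs, [$pair, 1];    # different pair: start a new run
        }
    }
    return @runs;
}

# Invented scan line: 12 white bytes, two odd bytes, then 6 black bytes.
my $line = ('ff' x 12) . '3c7e' . ('00' x 6);
my @runs = runs_of_pairs($line);
printf "%s x %d\n", @$_ for @runs;
printf "%d pairs collapse to %d runs\n", length($line) / 2, scalar @runs;
```

For this sample the 20 pairs fall into 4 runs, which is the kind of redundancy a run-length scheme exploits.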
The reasons I think that a compression routine written in postscript is
valuable are that the resulting compressed image is still a valid
postscript program, which can be sent via E-mail without fear of
corruption to anyone with a postscript printer, regardless of the
machine to which it is connected, and furthermore can be subsequently
compressed with your favorite routine: compress, pack, zip, zoo, arc,
etc.

Finally, I realize that this is not the most efficient method, but
before I work on enhancing it, I thought that I'd post it. I would like
your input on how to improve it, and I thought that many of you could
use it 'as is' to help with disk space problems (a never-ending problem
at our site at least). I have found that users mind this form of
compression less, since it requires no additional actions to print the
file later.

If you use this package, I would appreciate timing
results/comments/suggestions. Please E-mail me any posts that you might
make concerning this post so that I can post a summary at a later date:

Steve Parker                    parker@mps.ohio-state.edu
Dept of Chemistry               201 McPherson Labs
Ohio State University           614-292-5042

I have thought of many ways to improve this script:

1) Record which and how many of the decompression routines were used in
   a given image, and insert only those that were used, in the order of
   the frequency with which they were used.
2) Search the unique runs for patterns of 2, 4 or 8 characters.
3) Make a new decompression routine called I (for insert), which would
   be used for inserting a unique string into a long run of repeating
   characters.

NOTE: The above ideas would most easily be accomplished by saving the
compressed image in memory or a temp file and/or making multiple passes
at the image data.

Any other suggestions?
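Idea 1 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not something the posted script does: it assumes the compressed image data is already held in memory as one string, that it uses the four code letters u, U, r and R each followed by hex digits, and the helper names code_frequencies and dispatch_order are invented here. It is written for a current Perl 5.

```perl
use strict;
use warnings;

# Count how often each code letter occurs in a block of compressed image data.
# Counting letters is enough because u, U, r and R are not hex digits, so they
# cannot appear inside a count or a hex pair.
sub code_frequencies {
    my ($data) = @_;
    my %count;
    $count{$_}++ for $data =~ /([uUrR])/g;
    return %count;
}

# Order the codes most-frequent first; ties are broken by character order.
# Codes that never occur are absent, so their routines could be left out.
sub dispatch_order {
    my (%count) = @_;
    return sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
}

# Invented sample data: two r codes, one u code and one R code.
my $data  = "r0Bff" . "u013c7e" . "r0500" . "R012Cff";
my %count = code_frequencies($data);
print join(' ', dispatch_order(%count)), "\n";
```

The resulting order is the order in which the decompression routines would be tested, so the most common code is matched first.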
Here are timing results for postscript images created on various
machines, compressed on a Sparc ELC and printed on an Apple LaserWriter
II:

Test cases: (Apple LaserWriter II)

Filename          size in chars  bits in    time to    approximate time
                                 image      compress   to print
-------------------------------------------------------------------------
snapshot.cmp.ps           63861  ---          67.0 s    100 s
snapshot.ps              262906  1024000      --        245 s
stripes.cmp.ps             2241  ---          31.0 s     30 s
stripes.ps               133403  1036800      --        130 s
iris.cmp.ps               73384  ---          68.5 s    100 s
iris.ps                  261385  524288       --        250 s
stellar.cmp.ps           129140  ---        1027.3 s    425 s
stellar.ps              1968436  1966728      --       1740 s

I am presently getting results for NeXT printers and some others.
These files are available by E-mail on request to the above address.

Here is my description of the two pieces necessary for
compression/decompression (I originally had two files but now use the
<DATA> file handle of perl):

decomp.header is the postscript decompression header that will be used
in place of
    "/picstr 1024 string def { currentfile /picstr readhexstring pop }"
which is often used as the proc for the image function, i.e.
    "width height bitpersample proc image"

pscmp is the perl script that compresses the hex digit pair format
often used to encode a bitmap in postscript. It also inserts the
decompression header file in a clever way. Since the last thing on the
stack before the image command is called is the procedure that image
will use to obtain the image, pscmp looks for the image command and
inserts
    pop { decompress }
before it. The 'pop' command removes whatever procedure was on the
stack, and then '{ decompress }' (my command) is pushed on the stack in
its place.

It does compression with the following four "codes":

u - one character follows, whose ascii value will determine how many
    "unique" hex pairs follow. 1-256 pairs.
U - two characters follow, whose ascii values will determine how many
    "unique" hex pairs follow. 257-65535 pairs.
r - one character follows, whose ascii value will determine how many
    times to "repeat" the hex pair that follows.
R - two characters follow, whose ascii values will determine how many
    times to "repeat" the hex pair that follows.

NOTES:

* The ranges for R and U could not be made to be 257-65792 without
  splitting the runs into multiple strings, since the largest string is
  65535.

* I attempted two ways of storing the length of unique and repeating
  runs. The first, and the most straightforward to interpret in
  postscript, was to store them as one or two characters whose ascii
  value was then interpreted as an integer by using the
  'currentfile read pop' sequence. The second used a two or four digit
  hex number to represent the length of the run, and used the postscript
  command sequence:

    /charx2 2 string def
    /charx4 4 string def
    /hexnum2 5 string def
    /hexnum4 7 string def
    /hexnum2 (16#00) def
    /hexnum4 (16#0000) def
    /getcount { hexnum2 3 currentfile charx2 readstring pop putinterval hexnum2 cvi } def
    /getbigcount { hexnum4 3 currentfile charx4 readstring pop putinterval hexnum4 cvi } def

  which works by putting the hex number, e.g. 'fd', in a string like
  '16#00', thus giving the string '16#fd', which the command 'cvi'
  interprets as 0xfd, or 253. The latter method was necessary because
  characters representing serial port I/O controls, e.g. '^D' and
  '^S/^Q', were interpreted by the printer's I/O control and not passed
  to the postscript interpreter. The former method did work, however,
  with Sun's Postscript previewer "pageview version 3".

* pscmp removes the comments and unnecessary white space (used for
  readability) from decomp.header as it inserts it into the postscript.

*******************************************************************************
Here is the script:

#!/usr/local/bin/perl
# A perl script to compress postscript images.
#
# codes:   u - small count run of unique hex pairs
#          U - big count run of unique hex pairs
#          r - small count+1 repeated hex pair
#          R - big count+1 repeated hex pair
#          a   repeat last r or R. NOT SUPPORTED IN THIS PERL SCRIPT.
#
# formats: u cc 'hphp...'
#          U CC CC 'hphp...'
#          r cc 'hp'
#          R CC CC 'hp'
#
# where:   1) spaces are not output
#          2) uUrR are output literally
#          3) cc is a 2 digit hex number (0-255) and represents range (1-256)
#          4) CCCC is a 4 digit hex number (0-65535) for a range (257-65535)
#             if not for max size on postscript string would be (257-65792)
#          5) 'hp' is a hex digit pair from 'image' data.

$name = $0;
$name =~ s'.*/'';    # remove path--like basename
$usage = "usage:\n$name [postscript_file_with_IMAGE_data]";

select(STDOUT); $|=1;

$biggest=65534;
$last="";

while (<>) {
    if ( /([^A-Fa-f\d\n])/ ) {
        # print "'$1' ->$_";
        if ($_ =~ /showpage/ || $_ =~ /grestore/ ) {
            #
            # FOUND a showpage or grestore so write out last repeating pair
            # or unique run.
            #
            if ($repeating) {
                # we didn't record the first pair in $repeating
                # so we needn't subtract 1.
                #$num=$repeating-1;
                $num=$repeating;
                if ( $num <= 255 ) {
                    # case 2 small count repeat unit 2 hex digits.
                    printf("r%02X%2s\n",$num,$last); $r++;
                } else {
                    # case 3 big count repeat unit 2 hex digits.
                    printf("R%02X%02X%2s\n",int($num/256),($num%256),$last); $R++;
                }
            } else {
                $unique_str.=$last;
                # we didn't yet record this last pair in $unique_run
                # so we needn't subtract 1.
                $num=$unique_run;
                if ( $num <= 255 ) {
                    # case 0 small count unique string of hex digit pairs.
                    printf("u%02X%s",$num,$unique_str); $u++;
                } else {
                    # case 1 big count unique string of hex digit pairs.
                    printf("\nU%02X%02X%s",int($num/256),($num%256),$unique_str); $U++;
                }
            }
            print;
            &end;
        }
        # add the postscript decompression header
        # in between the original proc called by the 'image' command
        # and the 'image' command itself
        if ( $_ =~ /^(image\s?.*)$|^([^%]*)?(\simage\s?.*)$/ ) {
            print "$1\n" if ($2);
            if (! $headerin) {
                # $file="/home/sysadmin/postscript/compress/decomp.header";
                # open(HEADER,"$file") || die("$name: Cannot open $file: '$!'\n");
                while (<DATA>) {
                    s/(\s)\s+/\1/g;
                    print if !(/^%/);
                }
                $headerin++;
                close(DATA);
                print " pop { decompress }\n";
            } else {
                print " pop { decompress }\n";
            }
            if ($2) { print "$2\n"; } else { print "$1\n"; }
            next;
        }
        print;
        next;
    }
    # else { print "\n" if ($unique_run || $repeating); }
    #
    #-------------------- HEX PAIR HANDLING LOOP --------------------------
    #
    while (s?([A-F0-9a-f][A-F0-9a-f])??) {
        if ($repeating) {
            if ($1 eq $last) {
                #-debug print STDERR "rs";    # repeating; same
                $repeating++;    # found another one.
                # check to see if we have filled biggest postscript string
                # this will keep the decompress in postscript simple and fast.
                if ($repeating eq $biggest) {
                    printf("Rfffe%2s",$last);
                    # set to start over fresh
                    $repeating=0;
                    # $unique_str should be set to null and $unique_run set to 0
                }
            } else {
                #-debug print STDERR "rd";    # repeating; different
                #
                # FOUND a unique hex pair so repeating unit has ended,
                # write it out.
                #
                #$num=$repeating-1;
                $num=$repeating;
                if ( $repeating <= 255 ) {
                    # case 2 small count repeat unit 2 hex digits.
                    # -line- $line+=6; if ( $line > 80) { $line=6; print "\n"; }
                    #-debug printf STDERR ">2,%2X,%2s ",$num,$last;
                    printf("r%02X%2s",$num,$last); $r++;
                } else {
                    # case 3 big count repeat unit 2 hex digits.
                    # -line- $line+=8; if ( $line > 80) { $line=8; print "\n"; }
                    #-debug printf(">3,%2X,%2X,%2s ",int($num/256),($num%256),$last);
                    printf("R%02X%02X%2s",int($num/256),($num%256),$last); $R++;
                }
                $repeating=0;
                $last=$1;
            }
        } else {    # must be unique'ing
            if ($1 eq $last) {
                #-debug print "us";    # uniquing; same
                #
                # FOUND a repeating hex pair so might have a unique run
                # which has ended, if so write it out.
                #
                if ($unique_str) {
                    $num=$unique_run-1;
                    if ( $num <= 255 ) {
                        # case 0 small count unique string of hex digit pairs.
                        # -line- $line+=(4+$unique_run); if ( $line > 80) { $line=4+$unique_run; print "\n"; }
                        #-debug printf("\n>0,%2X,'%s' ",$num,$unique_str);
                        printf("\nu%02X%s",$num,$unique_str); $u++;
                    } else {
                        # case 1 big count unique string of hex digit pairs.
                        # -line- $line+=(6+$unique_run); if ( $line > 80) { $line=6+$unique_run; print "\n"; }
                        #-debug printf("\n>1,%2X,%2X,'%s' ",int($num/256),($num%256),$unique_str);
                        printf("\nU%02X%02X%s",int($num/256),($num%256),$unique_str); $U++;
                    }
                }
                # start counting repeating pairs, reset unique_run count
                # and remember last.
                $repeating++;
                $unique_str='';$unique_run=0;
                $last=$1;
            } else {    # continue uniquing
                #-debug print "ud";    # uniquing; different
                $unique_str.=$last;
                # $unique_run+=2;  # use this if using $line to limit to 80 chars/line.
                #                  # but REMEMBER to divide by two when outputting!
                $unique_run++;
                # check to see if we have filled biggest postscript string
                # this will keep the decompress in postscript simple and fast.
                if ($unique_run eq $biggest) {
                    printf("Ufffe%s",$unique_str);
                    # set to start over fresh
                    $unique_str='';$unique_run=0;
                    $last=$1;
                    # $repeating should be set to 0
                }
                $last=$1;
            }
        }
    }
}

&end;

sub end {
    printf STDERR "Statistics:\n" ;
    printf STDERR "r's:%5d\n",$r ;
    printf STDERR "R's:%5d\n",$R ;
    printf STDERR "u's:%5d\n",$u ;
    printf STDERR "U's:%5d\n",$U ;
    ($user,$system,$cuser,$csystem)=times;
    printf STDERR "Times:\tuser,\tsystem,\tcuser,\tcsystem\n";
    printf STDERR "Times:\t%5f,\t%5f,\t%5f,\t%5f\n",
        $user,$system,$cuser,$csystem;
    exit;
}
__END__
%-------------------------------------------------------------------------------
%
% header to define 'decompress' which will replace the
% { currentfile string readhexstring pop } proc commonly used with 'image'
%
% to be placed just before the 'image' command
% the 'pop' on the line inserted above is to remove bogus 'proc' (as above)
/repeater 1 string def
/char 1 string def
/charx2 2 string def
/charx4 4 string def
/hexnum2 5 string def
/hexnum4 7 string def
/debug 30 string def
/big 65535 string def
/hexnum2 (16#00) def
/hexnum4 (16#0000) def
/gethexpair { currentfile char readhexstring pop } def
/getcount { hexnum2 3 currentfile charx2 readstring pop putinterval hexnum2 cvi } def
/getbigcount { hexnum4 3 currentfile charx4 readstring pop putinterval hexnum4 cvi } def
/codeu { pop /cnt getcount def
    big 0 1 cnt { gethexpair putinterval big } for
    0 cnt 1 add getinterval } def
/codeU { pop /cnt getbigcount def
    big 0 1 cnt { gethexpair putinterval big } for
    0 cnt 1 add getinterval } def
/coder { pop /cnt getcount def
    /repeater gethexpair def    % get repeater unit
    big 0 1 cnt {repeater putinterval big} for
    0 cnt 1 add getinterval } def
/codeR { pop /cnt getbigcount def
    /repeater gethexpair def    % get repeater unit
    big 0 1 cnt {repeater putinterval big} for
    0 cnt 1 add getinterval } def
/codeX { pop big 0 cnt 1 add getinterval } def
/done { currentfile debug readstring pstack exit } def
/skip { pop decompress } def
%
% the following order of r,u,R,U was chosen by noting the frequency
% of occurrence from a small number of examples but can easily be changed.
/others0 { dup (u) eq { codeu } { others1 } ifelse } def
/others1 { dup (R) eq { codeR } { others2 } ifelse } def
/others2 { dup (U) eq { codeU } { others3 } ifelse } def
/others3 { dup (a) eq { codeX } { others4 } ifelse } def
/others4 { dup (\n) eq { skip } { done } ifelse } def
/decompress { currentfile char readstring pop
    dup (r) eq { coder } { others0 } ifelse } def
%-----------------------------------------------------------------------------
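For checking compressed output without a printer, the same decoding can be sketched in perl. This is not part of the posted package: the function name expand is invented, the code is written for a current Perl 5, and it assumes the input holds only the compressed image data (no surrounding postscript). It follows the header above in treating each stored count as one less than the run length (the "cnt 1 add"), and it does not handle the unsupported 'a' code.

```perl
use strict;
use warnings;

# Expand a string of u/U/r/R codes back into plain hex digit pairs.
# r and u carry a 2 digit hex count, R and U a 4 digit one; the stored
# count is (number of pairs - 1).
sub expand {
    my ($in) = @_;
    my $out = '';
    while (length $in) {
        if ($in =~ s/^\s+//) {
            next;                                   # like /skip: ignore newlines
        }
        elsif ($in =~ s/^r([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})//
            || $in =~ s/^R([0-9A-Fa-f]{4})([0-9A-Fa-f]{2})//) {
            $out .= $2 x (hex($1) + 1);             # repeat one pair count+1 times
        }
        elsif ($in =~ s/^u([0-9A-Fa-f]{2})//
            || $in =~ s/^U([0-9A-Fa-f]{4})//) {
            my $pairs = hex($1) + 1;                # count+1 literal pairs follow
            for (1 .. $pairs) {
                $in =~ s/^\s*([0-9A-Fa-f]{2})//
                    or die "unique run ended early\n";
                $out .= $1;
            }
        }
        else {
            die "unrecognized code near '" . substr($in, 0, 10) . "'\n";
        }
    }
    return $out;
}

# Invented sample: 12 x ff, then the literal pairs 3c 7e, then 6 x 00.
my $plain = expand("r0Bff" . "u013c7e" . "\n" . "r0500");
print length($plain) / 2, " pairs recovered\n";
```

Comparing the expanded string with the hex data of the original image gives a quick round-trip check of a compressed file.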