Archive Magazine 1996

Text File | 1988-09-28 | 6KB | 136 lines

->docs.compress

Name:    compress
Purpose: Data compression
Usage:   compress [-dfvcV] [-b maxbits] [file ...]

  -V => print Version
  -e => erase old file
  -d => uncompress
  -v => verbose
  -f => force overwrite of output file
  -n => no header: useful to uncompress old files
  -b maxbits => maximum number of bits per code. If -b is specified,
                then maxbits MUST be given also.
  -c => cat all output to stdout
  -C => generate output compatible with compress 2.0

Overview
========

Compress is a UNIX program for squashing files so that they take up
less room, either for transfer or for storage. Compressed files can be
identified by the fact that they usually have a '.Z' at the end of
their filename and are indecipherable by conventional means.

Basically, compress works by being fed an ordinary file, which it
compresses and writes back to disc with a '.Z' (or, in the case of the
Archimedes, a '_Z') tacked onto the end of the old file name. File
attributes, load and exec addresses etc. are preserved on the new file.
When you actually want to use the file, you feed the _Z form into
either uncompress or compress with the -d flag (these are identical
programs) and the original file without the '_Z' is recreated.

Compress uses the LZW technique, like arc (elsewhere on this disc). A
crucial parameter in this is the 'number of bits' used for codes.
Whereas arc uses a fixed number, 13, compress can use a variable
number. This has advantages because, for long files, the more bits are
used, the more efficient the compression is. However, more bits means
more memory, and some machines do not have as much memory as one might
guess. All this implies that either you or someone else may end up
with a '.Z' file that can't be expanded on a given machine.

This version of compress needs 500K of memory to run, over and above
the program memory, and will cope with compressed files encoded using
up to and including 16 bits.
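
To see why more bits means more memory, note that an n-bit code can
name at most 2^n different strings, so the table the program has to
hold grows with every extra bit. The short Python sketch below only
counts codes; the function name code_count is made up for this
illustration, and how many bytes each table entry takes is a matter for
the implementation and is not stated in this document.

```python
def code_count(bits: int) -> int:
    """Number of distinct codes that fit in a code word of `bits` bits."""
    return 1 << bits


# 12 bits is the "safe" setting, 13 is what arc is said to use,
# and 16 is the most this version of compress will handle.
for bits in (12, 13, 16):
    print(f"{bits} bits: up to {code_count(bits)} codes")
```

Going from 12 to 16 bits multiplies the number of possible table
entries by sixteen, which is why a file squashed with 16 bits may not
expand on a smaller machine.
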
If you want to be on the safe side, encode your files using 12 bits,
which is something of a standard, being the best that PDP11 owners can
manage. It is not, it should be said, too much of a standard, in case
you were thinking things were getting simple at this point. There have
been several older formats for .Z files, which compress can cope with
if the correct flags are set, so look out for these.

Description
===========

Compress reduces the size of the named files using adaptive Lempel-Ziv
coding. Whenever possible, each file is replaced by one with the
extension _Z while keeping the same attributes. If no files are
specified, the standard input is compressed to the standard output.
Compressed files can be restored to their original form using
uncompress or compress -d.

Compress uses the modified Lempel-Ziv algorithm popularized in "A
Technique for High Performance Data Compression", Terry A. Welch, IEEE
Computer, vol. 17, no. 6 (June 1984), pp. 8-19. Common substrings in
the file are first replaced by 9-bit codes 257 and up. When code 512 is
reached, the algorithm switches to 10-bit codes and continues to use
more bits until the limit specified by the -b flag is reached (default
16). Maxbits must be between 9 and 16.

After the maxbits limit is attained, compress periodically checks the
compression ratio. If it is increasing, compress continues to use the
existing code dictionary. However, if the compression ratio decreases,
compress discards the table of substrings and rebuilds it from scratch.
This allows the algorithm to adapt to the next "block" of the file.

Note that the -b flag is omitted for uncompress, since the bits
parameter specified during compression is encoded within the output,
along with a magic number to ensure that neither decompression of
random data nor recompression of compressed data is attempted.

The amount of compression obtained depends on the size of the input,
the number of bits per code, and the distribution of common substrings.
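
The growth of the code width described above can be sketched in a few
lines of Python. This is only an illustration under simplifying
assumptions, not the code of compress itself: the function name
lzw_codes is invented here, the codes are returned as (code, width)
pairs rather than packed into bytes, no header is written, and the
table is never discarded and rebuilt. Code 256 is simply left unused,
since the text says new codes start at 257.

```python
def lzw_codes(data: bytes, maxbits: int = 16):
    """Return a list of (code, width_in_bits) pairs for `data`.

    Simplified sketch: no bit packing, no header, no table reset.
    """
    if not 9 <= maxbits <= 16:
        raise ValueError("maxbits must be between 9 and 16")

    # Codes 0-255 stand for the single bytes themselves.
    table = {bytes([i]): i for i in range(256)}
    next_code = 257           # 256 is skipped (assumption: reserved)
    width = 9                 # codes start out 9 bits wide
    limit = 1 << maxbits      # no code may reach this value

    out = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the match while the table knows the string.
            current = candidate
            continue
        # Longest known string found: emit its code at the current width.
        out.append((table[current], width))
        if next_code < limit:
            table[candidate] = next_code
            next_code += 1
            # Once code 2**width has been handed out, widen by one bit.
            if next_code > (1 << width) and width < maxbits:
                width += 1
        current = bytes([byte])
    if current:
        out.append((table[current], width))
    return out
```

Calling lzw_codes(data, maxbits=12) gives the "safe side" behaviour
mentioned in the Overview: the width stops growing at 12 bits and no
code of 4096 or above is ever produced.
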
Typically, text such as source code or English is reduced by 50-60%.
Compression is generally much better than that achieved by Huffman
coding or adaptive Huffman coding, and takes less time to compute.

Under the -v option, a message is printed giving the percentage of
reduction for each file compressed. If the -V option is specified, the
current version and compile options are printed on stderr. The -f flag
causes existing files on disc to be overwritten, whilst the -e flag
makes compress delete the source file after using it.

Diagnostics
===========

Usage: compress [-dfvcV] [-b maxbits] [file ...]
        Invalid options were specified on the command line.

Missing maxbits
        Maxbits must follow -b.

file :not in compressed format
        The file specified to uncompress has not been compressed.

file :compressed with xx bits, can only handle yy bits.
        File was compressed by a program that could deal with more bits
        than the compress code on this machine. Recompress the file
        with smaller bits.

file :already has _Z suffix -- no change
        The file is assumed to be already compressed. Rename the file
        and try again.

file :filename too long to tack on _Z
        The file cannot be compressed because its name is longer than
        10 characters. Rename and try again.

file already exists; do you wish to overwrite (y or n)?
        Respond "y" if you want the output file to be replaced; "n" if
        not.

uncompress: corrupt input
        A memory access violation was detected, which usually means
        that the input file has been corrupted.

Compression: "xx.xx%"
        Percentage of the input saved by compression. (Relevant only
        for -v.)

-- not a regular file: unchanged
        When the input file is not a regular file (e.g. a directory),
        it is left unaltered.

-- file unchanged
        No saving is achieved by compression. The input is left
        unchanged.

Notes
=====

Release 1.00 October 1988.
Archimedes implementation (c) David Pilling 1988.
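
The "not in compressed format" and "compressed with xx bits" messages
in the Diagnostics come from looking at the start of the file. The
Python sketch below shows one way such a check might go. It assumes the
header layout of standard UNIX compress 4.0 (two magic bytes 1F 9D hex,
then a byte whose low five bits hold maxbits and whose top bit marks
block mode); this document does not spell out the layout, so whether
the Archimedes version matches it exactly is an assumption. The name
check_header and its parameters are invented for the example, and the
older headerless formats handled by -n are not covered.

```python
MAGIC = b"\x1f\x9d"   # assumed two-byte magic number (UNIX compress)
BITS_MASK = 0x1F      # low five bits of the third byte: maxbits
BLOCK_MODE = 0x80     # top bit of the third byte: table may be rebuilt


def check_header(data: bytes, supported_bits: int = 16, name: str = "file"):
    """Return (maxbits, block_mode) for a compressed file's first bytes.

    Raises ValueError with a message modelled on the Diagnostics.
    """
    if len(data) < 3 or data[:2] != MAGIC:
        raise ValueError(f"{name}: not in compressed format")
    maxbits = data[2] & BITS_MASK
    if maxbits > supported_bits:
        raise ValueError(
            f"{name}: compressed with {maxbits} bits, "
            f"can only handle {supported_bits} bits"
        )
    return maxbits, bool(data[2] & BLOCK_MODE)
```

Because maxbits travels in the header like this, uncompress needs no
-b flag: it reads the value from the file and refuses only if the value
is more than it can manage.
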