Compress reduces the size of files using adaptive Lempel-Ziv coding. The following is quoted from the UNIX man page for compress:
Compress uses the modified Lempel-Ziv algorithm popularized
in "A Technique for High Performance Data Compression",
Terry A. Welch, IEEE Computer, vol. 17, no. 6 (June 1984),
pp. 8-19. Common substrings in the file are first replaced
by 9-bit codes 257 and up. When code 512 is reached, the
algorithm switches to 10-bit codes and continues to use more
bits until the limit (default is 16 bits) is reached.
After the bits limit is attained, compress periodically
checks the compression ratio. If it is increasing, compress
continues to use the existing code dictionary. However, if
the compression ratio decreases, compress discards the table
of substrings and rebuilds it from scratch. This allows the
algorithm to adapt to the next "block" of the file.
The amount of compression obtained depends on the size of
the input, the number of bits per code, and the distribution
of common substrings. Typically, text such as source code
or English is reduced by 50-60%. Compression is generally
much better than that achieved by Huffman coding (as used in
pack), or adaptive Huffman coding (compact), and takes less
time to compute.
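The coding scheme the quoted passage describes can be sketched in a few lines. What follows is a minimal, hypothetical LZW pair, not the compress implementation itself: it keeps the convention of literal codes 0-255 with new substring codes handed out from 257 upward (256 is the slot compress reserves for its "clear" marker), but it emits plain integers instead of variable-width 9- to 16-bit codes, and it never resets the table.

```python
# Minimal LZW sketch (not the real compress(1) code): literals are 0-255,
# 256 is left reserved, and new substring codes start at 257, as in the text.

def lzw_compress(data: bytes) -> list[int]:
    table = {bytes([i]): i for i in range(256)}
    next_code = 257                      # 256 reserved for the clear code
    out, w = [], b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                       # extend the current match
        else:
            out.append(table[w])         # emit code for the longest match
            table[wc] = next_code        # learn the new substring
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes: list[int]) -> bytes:
    table = {i: bytes([i]) for i in range(256)}
    next_code = 257
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                            # code issued but not yet received
            entry = prev + prev[:1]
        out.append(entry)
        table[next_code] = prev + entry[:1]
        next_code += 1
        prev = entry
    return b"".join(out)

text = b"the quick brown fox " * 8
codes = lzw_compress(text)
assert lzw_decompress(codes) == text
print(len(text), "bytes ->", len(codes), "codes")
```

Because repeated substrings collapse to single codes, the 160-byte input above comes out as far fewer codes; real compress additionally packs each code into only as many bits as the current table size requires.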
Tar, on the other hand, saves multiple files into a single archive file and restores them from it. Coupled with compress, the two form a powerful utility for archiving files. A compressed archive may be stored on floppy disk for backup purposes or transmitted over a modem.
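The tar-plus-compress pairing above is a simple pipeline. A sketch follows; since the classic compress(1) tool is often absent on modern systems, gzip stands in here, and swapping "gzip"/"gunzip -c" for "compress"/"uncompress -c" (and .gz for .Z) reproduces the workflow from the text.

```shell
# Pack a directory into one archive, then squeeze the archive stream.
# gzip is used as a stand-in for compress(1), which may not be installed.
set -e
mkdir -p demo restore
echo "hello from the archive" > demo/a.txt

tar cf - demo | gzip > demo.tar.gz           # pack many files, compress the stream
gunzip -c demo.tar.gz | tar xf - -C restore  # decompress, then unpack

cat restore/demo/a.txt
```

Running compression on the tar stream, rather than on each file separately, lets the coder exploit redundancy shared across files in the archive.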
In general, small files do not compress very well, since an adaptive coder has little input from which to build its substring dictionary before the file ends. For that reason the above text has been repeated several times below to make a better example file.
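The point about small files is easy to demonstrate. The quick sketch below uses Python's zlib module (DEFLATE, chosen as a readily available stand-in for compress) to show the compressed-to-original size ratio falling as the same text is repeated:

```python
import zlib

# One sentence from this document, repeated to simulate the file's own trick.
base = b"Compress reduces the size of files using adaptive coding. "

for copies in (1, 4, 16):
    data = base * copies
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{copies:2d} copies: ratio {ratio:.2f}")
```

A single copy barely shrinks (fixed header overhead can even make it larger), while sixteen copies compress dramatically, which is exactly why this example file repeats itself.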
Compress reduces the size of files using adaptive Lempel-Ziv coding. The following is quoted from the UNIX man page for compress:
Compress uses the modified Lempel-Ziv algorithm popularized
in "A Technique for High Performance Data Compression",
Terry A. Welch, IEEE Computer, vol. 17, no. 6 (June 1984),
pp. 8-19. Common substrings in the file are first replaced
by 9-bit codes 257 and up. When code 512 is reached, the
algorithm switches to 10-bit codes and continues to use more
bits until the limit (default is 16 bits) is reached.
After the bits limit is attained, compress periodically
checks the compression ratio. If it is increasing, compress
continues to use the existing code dictionary. However, if
the compression ratio decreases, compress discards the table
of substrings and rebuilds it from scratch. This allows the
algorithm to adapt to the next "block" of the file.
The amount of compression obtained depends on the size of
the input, the number of bits per code, and the distribution
of common substrings. Typically, text such as source code
or English is reduced by 50-60%. Compression is generally
much better than that achieved by Huffman coding (as used in
pack), or adaptive Huffman coding (compact), and takes less
time to compute.
Tar, on the other hand, saves multiple files into a single archive file and restores them from it. Coupled with compress, the two form a powerful utility for archiving files. A compressed archive may be stored on floppy disk for backup purposes or transmitted over a modem.
Compress reduces the size of files using adaptive Lempel-Ziv coding. The following is quoted from the UNIX man page for compress:
Compress uses the modified Lempel-Ziv algorithm popularized
in "A Technique for High Performance Data Compression",
Terry A. Welch, IEEE Computer, vol. 17, no. 6 (June 1984),
pp. 8-19. Common substrings in the file are first replaced
by 9-bit codes 257 and up. When code 512 is reached, the
algorithm switches to 10-bit codes and continues to use more
bits until the limit (default is 16 bits) is reached.
After the bits limit is attained, compress periodically
checks the compression ratio. If it is increasing, compress
continues to use the existing code dictionary. However, if
the compression ratio decreases, compress discards the table
of substrings and rebuilds it from scratch. This allows the
algorithm to adapt to the next "block" of the file.
The amount of compression obtained depends on the size of
the input, the number of bits per code, and the distribution
of common substrings. Typically, text such as source code
or English is reduced by 50-60%. Compression is generally
much better than that achieved by Huffman coding (as used in
pack), or adaptive Huffman coding (compact), and takes less
time to compute.
Tar, on the other hand, saves multiple files into a single archive file and restores them from it. Coupled with compress, the two form a powerful utility for archiving files. A compressed archive may be stored on floppy disk for backup purposes or transmitted over a modem.