home *** CD-ROM | disk | FTP | other *** search
- From router!tut!draken!kth!mcvax!uunet!cs.utexas.edu!rutgers!iuvax!bsu-cs!ibmbin Wed Apr 12 08:50:48 EET DST 1989
- Article 244 of comp.binaries.ibm.pc:
- Path: chyde!router!tut!draken!kth!mcvax!uunet!cs.utexas.edu!rutgers!iuvax!bsu-cs!ibmbin
- >From: cgh!paul@cbmvax.cbm.commodore.com (Paul Homchick)
- Newsgroups: comp.binaries.ibm.pc
- Subject: v02i052: lz-comp, lzari, lzss, lzhuf compression package (part 01/05)
- Summary: lz-comp, lzari, lzss, lzhuf compression package
- Message-ID: <6637@bsu-cs.bsu.edu>
- Date: 8 Apr 89 19:41:05 GMT
- Sender: ibmbin@bsu-cs.bsu.edu
- Followup-To: comp.binaries.ibm.pc.d
- Lines: 38
- Approved: dhesi@bsu-cs.bsu.edu
- X-Submissions-to: ibmpc-binaries@bsu-cs.bsu.edu
- X-Questions-to: ibmpc-binaries-request@bsu-cs.bsu.edu
- Checksum: 1838218560 (Verify with "brik -cv")
- Posting-number: Volume 02, Issue 052
- Originally-from: Haruhiko Okumura via Kenjirou Okubo <no email address>
- Submitted-by: Paul Homchick <cgh!paul@cbmvax.cbm.commodore.com>
- Archive-name: lz-comp/part01
-
- This package of files was uploaded to GEnie by Kenjirou Okubo, and
- submitted for comp.binaries.ibm.pc distribution by Paul Homchick, one
- of the Sysops of the IBM PC RoundTable there (thanks, Paul). Since
- this is a pure text package, I am posting it as text files. These
- files were collected together by Haruhiko Okumura, who has included a
- paper reviewing several data compression methods and explaining the
- algorithms used in the Japanese archiving programs Lharc and Larc.
- There are five files in this package:
-
- readme package summary
- compress.txt the compression paper by Haruhiko Okumura
- lzari.c C source for lzari compression, by Haruhiko Okumura
- lzhuf.c C source for lzhuf compression, by Haruyasu Yoshizaki,
- comments translated into English by Haruhiko Okumura
- lzss.c C source for lzss compression by Haruhiko Okumura
-
- Each is being posted as a separate text-only article. For
- redistribution, please put all files in an archive named lz-comp.*
- (e.g. lz-comp.arc). To get each original file, cut between the
- "--cut here..." and "--end--" lines.
-
- -- R.D.
- ]
-
- --cut here for "readme"--
- Compress.txt and the associated C source files were uploaded
- to GEnie from Japan by Kenjirou Okubo. He provided this short
- introduction:
-
- This article by H.Okumura explains various algorithms
- of Data Compression. The article, originally uploaded
- in his workshop, is posted here with his permission. Also
- includes three C programs illustrating lzari, lzss and
- lzhuf methods uploaded with permission of their authors.
- These are the compression schemes currently being investigated
- by Japanese hobbiest programmers.
- -Kenjirou Okubo
- --end--
-
-
- From router!tut!draken!kth!mcvax!uunet!lll-winken!csd4.milw.wisc.edu!uxc!iuvax!bsu-cs!ibmbin Wed Apr 12 08:51:07 EET DST 1989
- Article 245 of comp.binaries.ibm.pc:
- Path: chyde!router!tut!draken!kth!mcvax!uunet!lll-winken!csd4.milw.wisc.edu!uxc!iuvax!bsu-cs!ibmbin
- >From: cgh!paul@cbmvax.cbm.commodore.com (Paul Homchick)
- Newsgroups: comp.binaries.ibm.pc
- Subject: v02i053: lz-comp, lzari, lzss, lzhuf compression package (part 02/05)
- Summary: lz-comp, lzari, lzss, lzhuf compression package
- Message-ID: <6639@bsu-cs.bsu.edu>
- Date: 8 Apr 89 21:01:05 GMT
- Sender: ibmbin@bsu-cs.bsu.edu
- Followup-To: comp.binaries.ibm.pc.d
- Lines: 280
- Approved: dhesi@bsu-cs.bsu.edu
- X-Submissions-to: ibmpc-binaries@bsu-cs.bsu.edu
- X-Questions-to: ibmpc-binaries-request@bsu-cs.bsu.edu
- Checksum: 649522192 (Verify with "brik -cv")
- Posting-number: Volume 02, Issue 053
- Originally-from: Haruhiko Okumura via Kenjirou Okubo <no email address>
- Submitted-by: Paul Homchick <cgh!paul@cbmvax.cbm.commodore.com>
- Archive-name: lz-comp/part02
-
- --cut here for "compress.txt"--
- Data Compression Algorithms of LARC and LHarc
- Haruhiko Okumura*
- *The author is the sysop of the Science SIG of PV-VAN. His
- address is: 12-2-404 Green Heights, 580 Nagasawa, Yokosuka 239
- Japan
- ---------------------------------------------------------------
-
- 1. Introduction
-
- In the spring of 1988, I wrote a very simple data compression program
- named LZSS in C language, and uploaded it to the Science SIG (forum) of
- PC-VAN, Japan's biggest personal computer network.
-
- That program was based on Storer and Szymanski's slightly modified
- version of one of Lempel and Ziv's algorithms. Despite its simplicity,
- for most files its compression outperformed the archivers then widely
- used.
-
- Kazuhiko Miki rewrote my LZSS in Turbo Pascal and assembly language, and
- soon made it evolve into a complete archiver, which he named LARC.
-
- The first versions of LZSS and LARC were rather slow. So I rewrote my
- LZSS using a binary tree, and so did Miki. Although LARC's encoding was
- slower than the fastest archiver available, its decoding was quite fast,
- and its algorithm was so simple that even self-extracting files
- (compressed files plus decoder) it created were usually smaller than
- non-self-extracting files from other archivers.
-
- Soon many hobby programmers joined the archiver project at the forum.
- Very many suggestions were made, and LARC was revised again and again.
- By the summer of 1988, LARC's speed and compression have improved so
- much that LARC-compressed programs were beginning to be uploaded in many
- forums of PC-VAN and other networks.
-
- In that summer I wrote another program, LZARI, which combined the LZSS
- algorithm with adaptive arithmetic compression. Although it was slower
- than LZSS, its compression performance was amazing.
-
- Miki, the author of LARC, uploaded LZARI to NIFTY-Serve, another big
- information network in Japan. In NIFTY-Serve, Haruyasu Yoshizaki
- replaced LZARI's adaptive arithmetic coding with a version of adaptive
- Huffman coding to increase speed. Based on this algorithm, which he
- called LZHUF, he developed yet another archiver, LHarc.
-
- In what follows, I will review several of these algorithms and supply
- simplified codes in C language.
-
- 2. Simple coding methods
-
- Replacing several (usually 8 or 4) "space" characters by one "tab"
- character is a very primitive method for data compression. Another
- simple method is run-length coding, which encodes the message
- "AAABBBBAACCCC" into "3A4B2A4C", for example.
-
- 3. LZSS coding
-
- This scheme is initiated by Ziv and Lempel [1]. A slightly modified
- version is described by Storer and Szymanski [2]. An implementation
- using a binary tree is proposed by Bell [3]. The algorithm is quite
- simple: Keep a ring buffer, which initially contains "space" characters
- only. Read several letters from the file to the buffer. Then search
- the buffer for the longest string that matches the letters just read,
- and send its length and position in the buffer.
-
- If the buffer size is 4096 bytes, the position can be encoded in 12
- bits. If we represent the match length in four bits, the <position,
- length> pair is two bytes long. If the longest match is no more than
- two characters, then we send just one character without encoding, and
- restart the process with the next letter. We must send one extra bit
- each time to tell the decoder whether we are sending a <position,
- length> pair or an unencoded character.
-
- The accompanying file LZSS.C is a version of this algorithm. This
- implementation uses multiple binary trees to speed up the search for the
- longest match. All the programs in this article are written in
- draft-proposed ANSI C. I tested them with Turbo C 2.0.
-
- 4. LZW coding
-
- This scheme was devised by Ziv and Lempel [4], and modified by Welch
- [5].
-
- The LZW coding has been adopted by most of the existing archivers, such
- as ARC and PKZIP. The algorithm can be made relatively fast, and is
- suitable for hardware implementation as well.
-
- The algorithm can be outlined as follows: Prepare a table that can
- contain several thousand items. Initially register in its 0th through
- 255th positions the usual 256 characters. Read several letters from the
- file to be encoded, and search the table for the longest match. Suppose
- the longest match is given by the string "ABC". Send the position of
- "ABC" in the table. Read the next character from the file. If it is
- "D", then register a new string "ABCD" in the table, and restart the
- process with the letter "D". If the table becomes full, discard the
- oldest item or, preferably, the least used.
-
- A Pascal program for this algorithm is given in Storer's book [6].
-
- 5. Huffman coding
-
- Classical Huffman coding is invented by Huffman [7]. A fairly readable
- accound is given in Sedgewick [8].
-
- Suppose the text to be encoded is "ABABACA", with four A's, two B's, and
- a C. We represent this situation as follows:
-
- 4 2 1
- | | |
- A B C
-
- Combine the least frequent two characters into one, resulting in the new
- frequency 2 + 1 = 3:
-
- 4 3
- | / \
- A B C
-
- Repeat the above step until the whole characters combine into a tree:
-
- 7
- / \
- / 3
- / / \
- A B C
-
- Start at the top ("root") of this encoding tree, and travel to the
- character you want to encode. If you go left, send a "0"; otherwise
- send a "1". Thus, "A" is encoded by "0", "B" by "10", "C" by "11".
- Algotether, "ABABACA" will be encoded into ten bits, "0100100110".
-
- To decode this code, the decoder must know the encoding tree, which must
- be sent separately.
-
- A modification to this classical Huffman coding is the adaptive, or
- dynamic, Huffman coding. See, e.g., Gallager [9]. In this method, the
- encoder and the decoder processes the first letter of the text as if the
- frequency of each character in the file were one, say. After the first
- letter has been processed, both parties increment the frequency of that
- character by one. For example, if the first letter is 'C', then
- freq['C'] becomes two, whereas every other frequencies are still one.
- Then the both parties modify the encoding tree accordingly. Then the
- second letter will be encoded and decoded, and so on.
-
- 6. Arithmetic coding
-
- The original concept of arithmetic coding is proposed by P. Elias. An
- implementation in C language is described by Witten and others [10].
-
- Although the Huffman coding is optimal if each character must be encoded
- into a fixed (integer) number of bits, arithmetic coding wins if no such
- restriction is made.
-
- As an example we shall encode "AABA" using arithmetic coding. For
- simplicity suppose we know beforehand that the probabilities for "A" and
- "B" to appear in the text are 3/4 and 1/4, respectively.
-
- Initially, consider an interval:
-
- 0 <= x < 1.
-
- Since the first character is "A" whose probability is 3/4, we shrink the
- interval to the lower 3/4:
-
- 0 <= x < 3/4.
-
- The next character is "A" again, so we take the lower 3/4:
-
- 0 <= x < 9/16.
-
- Next comes "B" whose probability is 1/4, so we take the upper 1/4:
-
- 27/64 <= x < 9/16,
-
- because "B" is the second element in our alphabet, {A, B}. The last
- character is "A" and the interval is
-
- 27/64 <= x < 135/256,
-
- which can be written in binary notation
-
- 0.011011 <= x < 0.10000111.
-
- Choose from this interval any number that can be represented in fewest
- bits, say 0.1, and send the bits to the right of "0."; in this case we
- send only one bit, "1". Thus we have encoded four letters into one bit!
- With the Huffman coding, four letters could not be encoded into less
- than four bits.
-
- To decode the code "1", we just reverse the process: First, we supply
- the "0." to the right of the received code "1", resulting in "0.1" in
- binary notation, or 1/2. Since this number is in the first 3/4 of the
- initial interval 0 <= x < 1, the first character must be "A". Shrink
- the interval into the lower 3/4. In this new interval, the number 1/2
- lies in the lower 3/4 part, so the second character is again "A", and so
- on. The number of letters in the original file must be sent separately
- (or a special 'EOF' character must be appended at the end of the file).
-
- The algorithm described above requires that both the sender and receiver
- know the probability distribution for the characters. The adaptive
- version of the algorithm removes this restriction by first supposing
- uniform or any agreed-upon distribution of characters that approximates
- the true distribution, and then updating the distribution after each
- character is sent and received.
-
- 7. LZARI
-
- In each step the LZSS algorithm sends either a character or a <position,
- length> pair. Among these, perhaps character "e" appears more
- frequently than "x", and a <position, length> pair of length 3 might be
- commoner than one of length 18, say. Thus, if we encode the more
- frequent in fewer bits and the less frequent in more bits, the total
- length of the encoded text will be diminished. This consideration
- suggests that we use Huffman or arithmetic coding, preferably of
- adaptive kind, along with LZSS.
-
- This is easier said than done, because there are many possible
- <position, length> combinations. Adaptive compression must keep running
- statistics of frequency distribution. Too many items make statistics
- unreliable.
-
- What follows is not even an approximate solution to the problem posed
- above, but anyway this was what I did in the summer of 1988.
-
- I extended the character set from 256 to three-hundred or so in size,
- and let characters 0 through 255 be the usual 8-bit characters, whereas
- characters 253 + n represent that what follows is a position of string
- of length n, where n = 3, 4 , .... These extended set of characters
- will be encoded with adaptive arithmetic compression.
-
- I also observed that longest-match strings tend to be the ones that were
- read relatively recently. Therefore, recent positions should be encoded
- into fewer bits. Since 4096 positions are too many to encode
- adaptively, I fixed the probability distribution of the positions "by
- hand." The distribution function given in the accompanying LZARI.C is
- rather tentative; it is not based on thorough experimentation. In
- retrospect, I could encode adaptively the most significant 6 bits, say,
- or perhaps by some more ingenious method adapt the parameters of the
- distribution function to the running statistics.
-
- At any rate, the present version of LZARI treats the positions rather
- separately, so that the overall compression is by no means optimal.
- Furthermore, the string length threshold above which strings are coded
- into <position, length> pairs is fixed, but logically its value must
- change according to the length of the <position, length> pair we would
- get.
-
- 8. LZHUF
-
- LZHUF, the algorithm of Haruyasu Yoshizaki's archiver LHarc, replaces
- LZARI's adaptive arithmetic coding with adaptive Huffman. LZHUF encodes
- the most significant 6 bits of the position in its 4096-byte buffer by
- table lookup. More recent, and hence more probable, positions are coded
- in less bits. On the other hand, the remaining 6 bits are sent
- verbatim. Because Huffman coding encodes each letter into a fixed
- number of bits, table lookup can be easily implemented.
-
- Though theoretically Huffman cannot exceed arithmetic compression, the
- difference is very slight, and LZHUF is fairly fast.
-
- The accompanying file LZHUF.C was written by Yoshizaki. I translated
- the comments into English and made a few trivial changes to make it
- conform to the ANSI C standard.
-
- References
- [1] J. Ziv and A. Lempel, IEEE Trans. IT-23, 337-343 (1977).
- [2] J. A. Storer and T. G. Szymanski, J. ACM, 29, 928-951
- (1982).
- [3] T. C. Bell, IEEE Trans. COM-34, 1176-1182 (1986).
- [4] J. Ziv and A. Lempel, IEEE Trans. IT-24, 530-536 (1978).
- [5] T. A. Welch, Computer, 17, No.6, 8-19 (1984).
- [6] J. A. Storer, Data Compression: Methods and Theory
- (Computer Science Press, 1988).
- [7] D. A. Huffman, Proc IRE 40, 1098-1101 (1952).
- [8] R. Sedgewick, Algorithms, 2nd ed. (Addison-Wesley, 1988).
- [9] R. G. Gallager, IEEE Trans. IT-24, 668-674 (1978).
- [10] I. E. Witten, R. M. Neal, and J. G. Cleary, Commun. ACM
- 30, 520-540 (1987).
-
- --end--
-
-
- From router!tut!draken!kth!mcvax!uunet!lll-winken!csd4.milw.wisc.edu!uxc!iuvax!bsu-cs!ibmbin Wed Apr 12 08:51:15 EET DST 1989
- Article 246 of comp.binaries.ibm.pc:
- Path: chyde!router!tut!draken!kth!mcvax!uunet!lll-winken!csd4.milw.wisc.edu!uxc!iuvax!bsu-cs!ibmbin
- >From: cgh!paul@cbmvax.cbm.commodore.com (Paul Homchick)
- Newsgroups: comp.binaries.ibm.pc
- Subject: v02i054: lz-comp, lzari, lzss, lzhuf compression package (part 03/05)
- Summary: lz-comp, lzari, lzss, lzhuf compression package
- Message-ID: <6646@bsu-cs.bsu.edu>
- Date: 9 Apr 89 01:01:06 GMT
- Sender: ibmbin@bsu-cs.bsu.edu
- Followup-To: comp.binaries.ibm.pc.d
- Lines: 460
- Approved: dhesi@bsu-cs.bsu.edu
- X-Submissions-to: ibmpc-binaries@bsu-cs.bsu.edu
- X-Questions-to: ibmpc-binaries-request@bsu-cs.bsu.edu
- Checksum: 488003064 (Verify with "brik -cv")
- Posting-number: Volume 02, Issue 054
- Originally-from: Haruhiko Okumura via Kenjirou Okubo <no email address>
- Submitted-by: Paul Homchick <cgh!paul@cbmvax.cbm.commodore.com>
- Archive-name: lz-comp/part03
-
- --cut here for "lzari.c"--
- /**************************************************************
- LZARI.C -- A Data Compression Program
- (tab = 4 spaces)
- ***************************************************************
- 4/7/1989 Haruhiko Okumura
- Use, distribute, and modify this program freely.
- Please send me your improved versions.
- PC-VAN SCIENCE
- NIFTY-Serve PAF01022
- CompuServe 74050,1022
- **************************************************************/
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
- #include <ctype.h>
-
- /********** Bit I/O **********/
-
- FILE *infile, *outfile;
- unsigned long int textsize = 0, codesize = 0, printcount = 0;
-
- void Error(char *message)
- {
- printf("\n%s\n", message);
- exit(EXIT_FAILURE);
- }
-
- void PutBit(int bit) /* Output one bit (bit = 0,1) */
- {
- static unsigned int buffer = 0, mask = 128;
-
- if (bit) buffer |= mask;
- if ((mask >>= 1) == 0) {
- if (putc(buffer, outfile) == EOF) Error("Write Error");
- buffer = 0; mask = 128; codesize++;
- }
- }
-
- void FlushBitBuffer(void) /* Send remaining bits */
- {
- int i;
-
- for (i = 0; i < 7; i++) PutBit(0);
- }
-
- int GetBit(void) /* Get one bit (0 or 1) */
- {
- static unsigned int buffer, mask = 0;
-
- if ((mask >>= 1) == 0) {
- buffer = getc(infile); mask = 128;
- }
- return ((buffer & mask) != 0);
- }
-
- /********** LZSS with multiple binary trees **********/
-
- #define N 4096 /* size of ring buffer */
- #define F 60 /* upper limit for match_length */
- #define THRESHOLD 2 /* encode string into position and length
- if match_length is greater than this */
- #define NIL N /* index for root of binary search trees */
-
- unsigned char text_buf[N + F - 1]; /* ring buffer of size N,
- with extra F-1 bytes to facilitate string comparison */
- int match_position, match_length, /* of longest match. These are
- set by the InsertNode() procedure. */
- lson[N + 1], rson[N + 257], dad[N + 1]; /* left & right children &
- parents -- These constitute binary search trees. */
-
- void InitTree(void) /* Initialize trees */
- {
- int i;
-
- /* For i = 0 to N - 1, rson[i] and lson[i] will be the right and
- left children of node i. These nodes need not be initialized.
- Also, dad[i] is the parent of node i. These are initialized to
- NIL (= N), which stands for 'not used.'
- For i = 0 to 255, rson[N + i + 1] is the root of the tree
- for strings that begin with character i. These are initialized
- to NIL. Note there are 256 trees. */
-
- for (i = N + 1; i <= N + 256; i++) rson[i] = NIL; /* root */
- for (i = 0; i < N; i++) dad[i] = NIL; /* node */
- }
-
- void InsertNode(int r)
- /* Inserts string of length F, text_buf[r..r+F-1], into one of the
- trees (text_buf[r]'th tree) and returns the longest-match position
- and length via the global variables match_position and match_length.
- If match_length = F, then removes the old node in favor of the new
- one, because the old one will be deleted sooner.
- Note r plays double role, as tree node and position in buffer. */
- {
- int i, p, cmp, temp;
- unsigned char *key;
-
- cmp = 1; key = &text_buf[r]; p = N + 1 + key[0];
- rson[r] = lson[r] = NIL; match_length = 0;
- for ( ; ; ) {
- if (cmp >= 0) {
- if (rson[p] != NIL) p = rson[p];
- else { rson[p] = r; dad[r] = p; return; }
- } else {
- if (lson[p] != NIL) p = lson[p];
- else { lson[p] = r; dad[r] = p; return; }
- }
- for (i = 1; i < F; i++)
- if ((cmp = key[i] - text_buf[p + i]) != 0) break;
- if (i > THRESHOLD) {
- if (i > match_length) {
- match_position = (r - p) & (N - 1);
- if ((match_length = i) >= F) break;
- } else if (i == match_length) {
- if ((temp = (r - p) & (N - 1)) < match_position)
- match_position = temp;
- }
- }
- }
- dad[r] = dad[p]; lson[r] = lson[p]; rson[r] = rson[p];
- dad[lson[p]] = r; dad[rson[p]] = r;
- if (rson[dad[p]] == p) rson[dad[p]] = r;
- else lson[dad[p]] = r;
- dad[p] = NIL; /* remove p */
- }
-
- void DeleteNode(int p) /* Delete node p from tree */
- {
- int q;
-
- if (dad[p] == NIL) return; /* not in tree */
- if (rson[p] == NIL) q = lson[p];
- else if (lson[p] == NIL) q = rson[p];
- else {
- q = lson[p];
- if (rson[q] != NIL) {
- do { q = rson[q]; } while (rson[q] != NIL);
- rson[dad[q]] = lson[q]; dad[lson[q]] = dad[q];
- lson[q] = lson[p]; dad[lson[p]] = q;
- }
- rson[q] = rson[p]; dad[rson[p]] = q;
- }
- dad[q] = dad[p];
- if (rson[dad[p]] == p) rson[dad[p]] = q;
- else lson[dad[p]] = q;
- dad[p] = NIL;
- }
-
- /********** Arithmetic Compression **********/
-
- /* If you are not familiar with arithmetic compression, you should read
- I. E. Witten, R. M. Neal, and J. G. Cleary,
- Communications of the ACM, Vol. 30, pp. 520-540 (1987),
- from which much have been borrowed. */
-
- #define M 15
-
- /* Q1 (= 2 to the M) must be sufficiently large, but not so
- large as the unsigned long 4 * Q1 * (Q1 - 1) overflows. */
-
- #define Q1 (1UL << M)
- #define Q2 (2 * Q1)
- #define Q3 (3 * Q1)
- #define Q4 (4 * Q1)
- #define MAX_CUM (Q1 - 1)
-
- #define N_CHAR (256 - THRESHOLD + F)
- /* character code = 0, 1, ..., N_CHAR - 1 */
-
- unsigned long int low = 0, high = Q4, value = 0;
- int shifts = 0; /* counts for magnifying low and high around Q2 */
- int char_to_sym[N_CHAR], sym_to_char[N_CHAR + 1];
- unsigned int
- sym_freq[N_CHAR + 1], /* frequency for symbols */
- sym_cum[N_CHAR + 1], /* cumulative freq for symbols */
- position_cum[N + 1]; /* cumulative freq for positions */
-
- void StartModel(void) /* Initialize model */
- {
- int ch, sym, i;
-
- sym_cum[N_CHAR] = 0;
- for (sym = N_CHAR; sym >= 1; sym--) {
- ch = sym - 1;
- char_to_sym[ch] = sym; sym_to_char[sym] = ch;
- sym_freq[sym] = 1;
- sym_cum[sym - 1] = sym_cum[sym] + sym_freq[sym];
- }
- sym_freq[0] = 0; /* sentinel (!= sym_freq[1]) */
- position_cum[N] = 0;
- for (i = N; i >= 1; i--)
- position_cum[i - 1] = position_cum[i] + 10000 / (i + 200);
- /* empirical distribution function (quite tentative) */
- /* Please devise a better mechanism! */
- }
-
- void UpdateModel(int sym)
- {
- int i, c, ch_i, ch_sym;
-
- if (sym_cum[0] >= MAX_CUM) {
- c = 0;
- for (i = N_CHAR; i > 0; i--) {
- sym_cum[i] = c;
- c += (sym_freq[i] = (sym_freq[i] + 1) >> 1);
- }
- sym_cum[0] = c;
- }
- for (i = sym; sym_freq[i] == sym_freq[i - 1]; i--) ;
- if (i < sym) {
- ch_i = sym_to_char[i]; ch_sym = sym_to_char[sym];
- sym_to_char[i] = ch_sym; sym_to_char[sym] = ch_i;
- char_to_sym[ch_i] = sym; char_to_sym[ch_sym] = i;
- }
- sym_freq[i]++;
- while (--i >= 0) sym_cum[i]++;
- }
-
- static void Output(int bit) /* Output 1 bit, followed by its complements */
- {
- PutBit(bit);
- for ( ; shifts > 0; shifts--) PutBit(! bit);
- }
-
- void EncodeChar(int ch)
- {
- int sym;
- unsigned long int range;
-
- sym = char_to_sym[ch];
- range = high - low;
- high = low + (range * sym_cum[sym - 1]) / sym_cum[0];
- low += (range * sym_cum[sym ]) / sym_cum[0];
- for ( ; ; ) {
- if (high <= Q2) Output(0);
- else if (low >= Q2) {
- Output(1); low -= Q2; high -= Q2;
- } else if (low >= Q1 && high <= Q3) {
- shifts++; low -= Q1; high -= Q1;
- } else break;
- low += low; high += high;
- }
- UpdateModel(sym);
- }
-
- void EncodePosition(int position)
- {
- unsigned long int range;
-
- range = high - low;
- high = low + (range * position_cum[position ]) / position_cum[0];
- low += (range * position_cum[position + 1]) / position_cum[0];
- for ( ; ; ) {
- if (high <= Q2) Output(0);
- else if (low >= Q2) {
- Output(1); low -= Q2; high -= Q2;
- } else if (low >= Q1 && high <= Q3) {
- shifts++; low -= Q1; high -= Q1;
- } else break;
- low += low; high += high;
- }
- }
-
- void EncodeEnd(void)
- {
- shifts++;
- if (low < Q1) Output(0); else Output(1);
- FlushBitBuffer(); /* flush bits remaining in buffer */
- }
-
- int BinarySearchSym(unsigned int x)
- /* 1 if x >= sym_cum[1],
- N_CHAR if sym_cum[N_CHAR] > x,
- i such that sym_cum[i - 1] > x >= sym_cum[i] otherwise */
- {
- int i, j, k;
-
- i = 1; j = N_CHAR;
- while (i < j) {
- k = (i + j) / 2;
- if (sym_cum[k] > x) i = k + 1; else j = k;
- }
- return i;
- }
-
- int BinarySearchPos(unsigned int x)
- /* 0 if x >= position_cum[1],
- N - 1 if position_cum[N] > x,
- i such that position_cum[i] > x >= position_cum[i + 1] otherwise */
- {
- int i, j, k;
-
- i = 1; j = N;
- while (i < j) {
- k = (i + j) / 2;
- if (position_cum[k] > x) i = k + 1; else j = k;
- }
- return i - 1;
- }
-
- void StartDecode(void)
- {
- int i;
-
- for (i = 0; i < M + 2; i++)
- value = 2 * value + GetBit();
- }
-
- int DecodeChar(void)
- {
- int sym, ch;
- unsigned long int range;
-
- range = high - low;
- sym = BinarySearchSym((unsigned int)
- (((value - low + 1) * sym_cum[0] - 1) / range));
- high = low + (range * sym_cum[sym - 1]) / sym_cum[0];
- low += (range * sym_cum[sym ]) / sym_cum[0];
- for ( ; ; ) {
- if (low >= Q2) {
- value -= Q2; low -= Q2; high -= Q2;
- } else if (low >= Q1 && high <= Q3) {
- value -= Q1; low -= Q1; high -= Q1;
- } else if (high > Q2) break;
- low += low; high += high;
- value = 2 * value + GetBit();
- }
- ch = sym_to_char[sym];
- UpdateModel(sym);
- return ch;
- }
-
- int DecodePosition(void)
- {
- int position;
- unsigned long int range;
-
- range = high - low;
- position = BinarySearchPos((unsigned int)
- (((value - low + 1) * position_cum[0] - 1) / range));
- high = low + (range * position_cum[position ]) / position_cum[0];
- low += (range * position_cum[position + 1]) / position_cum[0];
- for ( ; ; ) {
- if (low >= Q2) {
- value -= Q2; low -= Q2; high -= Q2;
- } else if (low >= Q1 && high <= Q3) {
- value -= Q1; low -= Q1; high -= Q1;
- } else if (high > Q2) break;
- low += low; high += high;
- value = 2 * value + GetBit();
- }
- return position;
- }
-
- /********** Encode and Decode **********/
-
- void Encode(void)
- {
- int i, c, len, r, s, last_match_length;
-
- fseek(infile, 0L, SEEK_END);
- textsize = ftell(infile);
- if (fwrite(&textsize, sizeof textsize, 1, outfile) < 1)
- Error("Write Error"); /* output size of text */
- codesize += sizeof textsize;
- if (textsize == 0) return;
- rewind(infile); textsize = 0;
- StartModel(); InitTree();
- s = 0; r = N - F;
- for (i = s; i < r; i++) text_buf[i] = ' ';
- for (len = 0; len < F && (c = getc(infile)) != EOF; len++)
- text_buf[r + len] = c;
- textsize = len;
- for (i = 1; i <= F; i++) InsertNode(r - i);
- InsertNode(r);
- do {
- if (match_length > len) match_length = len;
- if (match_length <= THRESHOLD) {
- match_length = 1; EncodeChar(text_buf[r]);
- } else {
- EncodeChar(255 - THRESHOLD + match_length);
- EncodePosition(match_position - 1);
- }
- last_match_length = match_length;
- for (i = 0; i < last_match_length &&
- (c = getc(infile)) != EOF; i++) {
- DeleteNode(s); text_buf[s] = c;
- if (s < F - 1) text_buf[s + N] = c;
- s = (s + 1) & (N - 1);
- r = (r + 1) & (N - 1);
- InsertNode(r);
- }
- if ((textsize += i) > printcount) {
- printf("%12ld\r", textsize); printcount += 1024;
- }
- while (i++ < last_match_length) {
- DeleteNode(s);
- s = (s + 1) & (N - 1);
- r = (r + 1) & (N - 1);
- if (--len) InsertNode(r);
- }
- } while (len > 0);
- EncodeEnd();
- printf("In : %lu bytes\n", textsize);
- printf("Out: %lu bytes\n", codesize);
- printf("Out/In: %.3f\n", (double)codesize / textsize);
- }
-
- void Decode(void)
- {
- int i, j, k, r, c;
- unsigned long int count;
-
- if (fread(&textsize, sizeof textsize, 1, infile) < 1)
- Error("Read Error"); /* read size of text */
- if (textsize == 0) return;
- StartDecode(); StartModel();
- for (i = 0; i < N - F; i++) text_buf[i] = ' ';
- r = N - F;
- for (count = 0; count < textsize; ) {
- c = DecodeChar();
- if (c < 256) {
- putc(c, outfile); text_buf[r++] = c;
- r &= (N - 1); count++;
- } else {
- i = (r - DecodePosition() - 1) & (N - 1);
- j = c - 255 + THRESHOLD;
- for (k = 0; k < j; k++) {
- c = text_buf[(i + k) & (N - 1)];
- putc(c, outfile); text_buf[r++] = c;
- r &= (N - 1); count++;
- }
- }
- if (count > printcount) {
- printf("%12lu\r", count); printcount += 1024;
- }
- }
- printf("%12lu\n", count);
- }
-
- int main(int argc, char *argv[])
- {
- char *s;
-
- if (argc != 4) {
- printf("'lzari e file1 file2' encodes file1 into file2.\n"
- "'lzari d file2 file1' decodes file2 into file1.\n");
- return EXIT_FAILURE;
- }
- if ((s = argv[1], s[1] || strpbrk(s, "DEde") == NULL)
- || (s = argv[2], (infile = fopen(s, "rb")) == NULL)
- || (s = argv[3], (outfile = fopen(s, "wb")) == NULL)) {
- printf("??? %s\n", s); return EXIT_FAILURE;
- }
- if (toupper(*argv[1]) == 'E') Encode(); else Decode();
- fclose(infile); fclose(outfile);
- return EXIT_SUCCESS;
- }
- --end--
-
-
- From router!tut!draken!kth!mcvax!uunet!lll-winken!xanth!ukma!rutgers!iuvax!bsu-cs!ibmbin Wed Apr 12 08:57:28 EET DST 1989
- Article 248 of comp.binaries.ibm.pc:
- Path: chyde!router!tut!draken!kth!mcvax!uunet!lll-winken!xanth!ukma!rutgers!iuvax!bsu-cs!ibmbin
- >From: cgh!paul@cbmvax.cbm.commodore.com (Paul Homchick)
- Newsgroups: comp.binaries.ibm.pc
- Subject: v02i055: lz-comp, lzari, lzss, lzhuf compression package (part 04/05)
- Summary: lz-comp, lzari, lzss, lzhuf compression package
- Message-ID: <6667@bsu-cs.bsu.edu>
- Date: 9 Apr 89 11:01:04 GMT
- Sender: ibmbin@bsu-cs.bsu.edu
- Followup-To: comp.binaries.ibm.pc.d
- Lines: 624
- Approved: dhesi@bsu-cs.bsu.edu
- X-Submissions-to: ibmpc-binaries@bsu-cs.bsu.edu
- X-Questions-to: ibmpc-binaries-request@bsu-cs.bsu.edu
- Checksum: 2497375951 (Verify with "brik -cv")
- Posting-number: Volume 02, Issue 055
- Originally-from: Haruhiko Okumura via Kenjirou Okubo <no email address>
- Submitted-by: Paul Homchick <cgh!paul@cbmvax.cbm.commodore.com>
- Archive-name: lz-comp/part04
-
- [ Part 3 accidentally got posted twice. I have sent out a cancellation
- for the second posting. -- R.D. ]
-
- --cut here for "lzhuf.c"--
- /**************************************************************
- lzhuf.c
- written by Haruyasu Yoshizaki 11/20/1988
- some minor changes 4/6/1989
- comments translated by Haruhiko Okumura 4/7/1989
- **************************************************************/
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
- #include <ctype.h>
-
- FILE *infile, *outfile;
- unsigned long int textsize = 0, codesize = 0, printcount = 0;
-
- char wterr[] = "Can't write.";
-
- void Error(char *message)
- {
- printf("\n%s\n", message);
- exit(EXIT_FAILURE);
- }
-
- /********** LZSS compression **********/
-
- #define N 4096 /* buffer size */
- #define F 60 /* lookahead buffer size */
- #define THRESHOLD 2
- #define NIL N /* leaf of tree */
-
- unsigned char
- text_buf[N + F - 1];
- int match_position, match_length,
- lson[N + 1], rson[N + 257], dad[N + 1];
-
- void InitTree(void) /* initialize trees */
- {
- int i;
-
- for (i = N + 1; i <= N + 256; i++)
- rson[i] = NIL; /* root */
- for (i = 0; i < N; i++)
- dad[i] = NIL; /* node */
- }
-
- void InsertNode(int r) /* insert to tree */
- {
- int i, p, cmp;
- unsigned char *key;
- unsigned c;
-
- cmp = 1;
- key = &text_buf[r];
- p = N + 1 + key[0];
- rson[r] = lson[r] = NIL;
- match_length = 0;
- for ( ; ; ) {
- if (cmp >= 0) {
- if (rson[p] != NIL)
- p = rson[p];
- else {
- rson[p] = r;
- dad[r] = p;
- return;
- }
- } else {
- if (lson[p] != NIL)
- p = lson[p];
- else {
- lson[p] = r;
- dad[r] = p;
- return;
- }
- }
- for (i = 1; i < F; i++)
- if ((cmp = key[i] - text_buf[p + i]) != 0)
- break;
- if (i > THRESHOLD) {
- if (i > match_length) {
- match_position = ((r - p) & (N - 1)) - 1;
- if ((match_length = i) >= F)
- break;
- }
- if (i == match_length) {
- if ((c = ((r - p) & (N - 1)) - 1) < match_position) {
- match_position = c;
- }
- }
- }
- }
- dad[r] = dad[p];
- lson[r] = lson[p];
- rson[r] = rson[p];
- dad[lson[p]] = r;
- dad[rson[p]] = r;
- if (rson[dad[p]] == p)
- rson[dad[p]] = r;
- else
- lson[dad[p]] = r;
- dad[p] = NIL; /* remove p */
- }
-
- void DeleteNode(int p) /* remove from tree */
- {
- int q;
-
- if (dad[p] == NIL)
- return; /* not registered */
- if (rson[p] == NIL)
- q = lson[p];
- else
- if (lson[p] == NIL)
- q = rson[p];
- else {
- q = lson[p];
- if (rson[q] != NIL) {
- do {
- q = rson[q];
- } while (rson[q] != NIL);
- rson[dad[q]] = lson[q];
- dad[lson[q]] = dad[q];
- lson[q] = lson[p];
- dad[lson[p]] = q;
- }
- rson[q] = rson[p];
- dad[rson[p]] = q;
- }
- dad[q] = dad[p];
- if (rson[dad[p]] == p)
- rson[dad[p]] = q;
- else
- lson[dad[p]] = q;
- dad[p] = NIL;
- }
-
- /* Huffman coding */
-
- #define N_CHAR (256 - THRESHOLD + F)
- /* kinds of characters (character code = 0..N_CHAR-1) */
- #define T (N_CHAR * 2 - 1) /* size of table */
- #define R (T - 1) /* position of root */
- #define MAX_FREQ 0x8000 /* updates tree when the */
- /* root frequency comes to this value. */
- typedef unsigned char uchar;
-
-
- /* table for encoding and decoding the upper 6 bits of position */
-
- /* for encoding */
- uchar p_len[64] = {
- 0x03, 0x04, 0x04, 0x04, 0x05, 0x05, 0x05, 0x05,
- 0x05, 0x05, 0x05, 0x05, 0x06, 0x06, 0x06, 0x06,
- 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
- 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08
- };
-
- uchar p_code[64] = {
- 0x00, 0x20, 0x30, 0x40, 0x50, 0x58, 0x60, 0x68,
- 0x70, 0x78, 0x80, 0x88, 0x90, 0x94, 0x98, 0x9C,
- 0xA0, 0xA4, 0xA8, 0xAC, 0xB0, 0xB4, 0xB8, 0xBC,
- 0xC0, 0xC2, 0xC4, 0xC6, 0xC8, 0xCA, 0xCC, 0xCE,
- 0xD0, 0xD2, 0xD4, 0xD6, 0xD8, 0xDA, 0xDC, 0xDE,
- 0xE0, 0xE2, 0xE4, 0xE6, 0xE8, 0xEA, 0xEC, 0xEE,
- 0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
- 0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
- };
-
- /* for decoding */
- uchar d_code[256] = {
- 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
- 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
- 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
- 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
- 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
- 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
- 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02,
- 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02,
- 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
- 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
- 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
- 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
- 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
- 0x09, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09,
- 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A,
- 0x0B, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B,
- 0x0C, 0x0C, 0x0C, 0x0C, 0x0D, 0x0D, 0x0D, 0x0D,
- 0x0E, 0x0E, 0x0E, 0x0E, 0x0F, 0x0F, 0x0F, 0x0F,
- 0x10, 0x10, 0x10, 0x10, 0x11, 0x11, 0x11, 0x11,
- 0x12, 0x12, 0x12, 0x12, 0x13, 0x13, 0x13, 0x13,
- 0x14, 0x14, 0x14, 0x14, 0x15, 0x15, 0x15, 0x15,
- 0x16, 0x16, 0x16, 0x16, 0x17, 0x17, 0x17, 0x17,
- 0x18, 0x18, 0x19, 0x19, 0x1A, 0x1A, 0x1B, 0x1B,
- 0x1C, 0x1C, 0x1D, 0x1D, 0x1E, 0x1E, 0x1F, 0x1F,
- 0x20, 0x20, 0x21, 0x21, 0x22, 0x22, 0x23, 0x23,
- 0x24, 0x24, 0x25, 0x25, 0x26, 0x26, 0x27, 0x27,
- 0x28, 0x28, 0x29, 0x29, 0x2A, 0x2A, 0x2B, 0x2B,
- 0x2C, 0x2C, 0x2D, 0x2D, 0x2E, 0x2E, 0x2F, 0x2F,
- 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
- 0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
- };
-
- uchar d_len[256] = {
- 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
- 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
- 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
- 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
- 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
- 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
- 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
- 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
- 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
- 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
- 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
- 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
- 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
- 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
- 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
- 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
- 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
- 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
- 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
- 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
- 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
- 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
- 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
- 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
- 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
- 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
- };
-
- unsigned freq[T + 1]; /* frequency table */
-
- int prnt[T + N_CHAR]; /* pointers to parent nodes, except for the */
- /* elements [T..T + N_CHAR - 1] which are used to get */
- /* the positions of leaves corresponding to the codes. */
-
- int son[T]; /* pointers to child nodes (son[], son[] + 1) */
-
- unsigned getbuf = 0;
- uchar getlen = 0;
-
- int GetBit(void) /* get one bit */
- {
- int i;
-
- while (getlen <= 8) {
- if ((i = getc(infile)) < 0) i = 0;
- getbuf |= i << (8 - getlen);
- getlen += 8;
- }
- i = getbuf;
- getbuf <<= 1;
- getlen--;
- return (i < 0);
- }
-
- int GetByte(void) /* get one byte */
- {
- unsigned i;
-
- while (getlen <= 8) {
- if ((i = getc(infile)) < 0) i = 0;
- getbuf |= i << (8 - getlen);
- getlen += 8;
- }
- i = getbuf;
- getbuf <<= 8;
- getlen -= 8;
- return i >> 8;
- }
-
- unsigned putbuf = 0;
- uchar putlen = 0;
-
- void Putcode(int l, unsigned c) /* output c bits of code */
- {
- putbuf |= c >> putlen;
- if ((putlen += l) >= 8) {
- if (putc(putbuf >> 8, outfile) == EOF) {
- Error(wterr);
- }
- if ((putlen -= 8) >= 8) {
- if (putc(putbuf, outfile) == EOF) {
- Error(wterr);
- }
- codesize += 2;
- putlen -= 8;
- putbuf = c << (l - putlen);
- } else {
- putbuf <<= 8;
- codesize++;
- }
- }
- }
-
-
- /* initialization of tree */
-
- void StartHuff(void)
- {
- int i, j;
-
- for (i = 0; i < N_CHAR; i++) {
- freq[i] = 1;
- son[i] = i + T;
- prnt[i + T] = i;
- }
- i = 0; j = N_CHAR;
- while (j <= R) {
- freq[j] = freq[i] + freq[i + 1];
- son[j] = i;
- prnt[i] = prnt[i + 1] = j;
- i += 2; j++;
- }
- freq[T] = 0xffff;
- prnt[R] = 0;
- }
-
-
- /* reconstruction of tree */
-
- void reconst(void)
- {
- int i, j, k;
- unsigned f, l;
-
- /* collect leaf nodes in the first half of the table */
- /* and replace the freq by (freq + 1) / 2. */
- j = 0;
- for (i = 0; i < T; i++) {
- if (son[i] >= T) {
- freq[j] = (freq[i] + 1) / 2;
- son[j] = son[i];
- j++;
- }
- }
- /* begin constructing tree by connecting sons */
- for (i = 0, j = N_CHAR; j < T; i += 2, j++) {
- k = i + 1;
- f = freq[j] = freq[i] + freq[k];
- for (k = j - 1; f < freq[k]; k--);
- k++;
- l = (j - k) * 2;
- memmove(&freq[k + 1], &freq[k], l);
- freq[k] = f;
- memmove(&son[k + 1], &son[k], l);
- son[k] = i;
- }
- /* connect prnt */
- for (i = 0; i < T; i++) {
- if ((k = son[i]) >= T) {
- prnt[k] = i;
- } else {
- prnt[k] = prnt[k + 1] = i;
- }
- }
- }
-
-
- /* increment frequency of given code by one, and update tree */
-
- void update(int c)
- {
- int i, j, k, l;
-
- if (freq[R] == MAX_FREQ) {
- reconst();
- }
- c = prnt[c + T];
- do {
- k = ++freq[c];
-
- /* if the order is disturbed, exchange nodes */
- if (k > freq[l = c + 1]) {
- while (k > freq[++l]);
- l--;
- freq[c] = freq[l];
- freq[l] = k;
-
- i = son[c];
- prnt[i] = l;
- if (i < T) prnt[i + 1] = l;
-
- j = son[l];
- son[l] = i;
-
- prnt[j] = c;
- if (j < T) prnt[j + 1] = c;
- son[c] = j;
-
- c = l;
- }
- } while ((c = prnt[c]) != 0); /* repeat up to root */
- }
-
- unsigned code, len;
-
- void EncodeChar(unsigned c)
- {
- unsigned i;
- int j, k;
-
- i = 0;
- j = 0;
- k = prnt[c + T];
-
- /* travel from leaf to root */
- do {
- i >>= 1;
-
- /* if node's address is odd-numbered, choose bigger brother node */
- if (k & 1) i += 0x8000;
-
- j++;
- } while ((k = prnt[k]) != R);
- Putcode(j, i);
- code = i;
- len = j;
- update(c);
- }
-
- void EncodePosition(unsigned c)
- {
- unsigned i;
-
- /* output upper 6 bits by table lookup */
- i = c >> 6;
- Putcode(p_len[i], (unsigned)p_code[i] << 8);
-
- /* output lower 6 bits verbatim */
- Putcode(6, (c & 0x3f) << 10);
- }
-
- void EncodeEnd(void)
- {
- if (putlen) {
- if (putc(putbuf >> 8, outfile) == EOF) {
- Error(wterr);
- }
- codesize++;
- }
- }
-
- int DecodeChar(void)
- {
- unsigned c;
-
- c = son[R];
-
- /* travel from root to leaf, */
- /* choosing the smaller child node (son[]) if the read bit is 0, */
- /* the bigger (son[]+1} if 1 */
- while (c < T) {
- c += GetBit();
- c = son[c];
- }
- c -= T;
- update(c);
- return c;
- }
-
- int DecodePosition(void)
- {
- unsigned i, j, c;
-
- /* recover upper 6 bits from table */
- i = GetByte();
- c = (unsigned)d_code[i] << 6;
- j = d_len[i];
-
- /* read lower 6 bits verbatim */
- j -= 2;
- while (j--) {
- i = (i << 1) + GetBit();
- }
- return c | (i & 0x3f);
- }
-
- /* compression */
-
- void Encode(void) /* compression */
- {
- int i, c, len, r, s, last_match_length;
-
- fseek(infile, 0L, 2);
- textsize = ftell(infile);
- if (fwrite(&textsize, sizeof textsize, 1, outfile) < 1)
- Error(wterr); /* output size of text */
- if (textsize == 0)
- return;
- rewind(infile);
- textsize = 0; /* rewind and re-read */
- StartHuff();
- InitTree();
- s = 0;
- r = N - F;
- for (i = s; i < r; i++)
- text_buf[i] = ' ';
- for (len = 0; len < F && (c = getc(infile)) != EOF; len++)
- text_buf[r + len] = c;
- textsize = len;
- for (i = 1; i <= F; i++)
- InsertNode(r - i);
- InsertNode(r);
- do {
- if (match_length > len)
- match_length = len;
- if (match_length <= THRESHOLD) {
- match_length = 1;
- EncodeChar(text_buf[r]);
- } else {
- EncodeChar(255 - THRESHOLD + match_length);
- EncodePosition(match_position);
- }
- last_match_length = match_length;
- for (i = 0; i < last_match_length &&
- (c = getc(infile)) != EOF; i++) {
- DeleteNode(s);
- text_buf[s] = c;
- if (s < F - 1)
- text_buf[s + N] = c;
- s = (s + 1) & (N - 1);
- r = (r + 1) & (N - 1);
- InsertNode(r);
- }
- if ((textsize += i) > printcount) {
- printf("%12ld\r", textsize);
- printcount += 1024;
- }
- while (i++ < last_match_length) {
- DeleteNode(s);
- s = (s + 1) & (N - 1);
- r = (r + 1) & (N - 1);
- if (--len) InsertNode(r);
- }
- } while (len > 0);
- EncodeEnd();
- printf("In : %ld bytes\n", textsize);
- printf("Out: %ld bytes\n", codesize);
- printf("Out/In: %.3f\n", (double)codesize / textsize);
- }
-
- void Decode(void) /* recover */
- {
- int i, j, k, r, c;
- unsigned long int count;
-
- if (fread(&textsize, sizeof textsize, 1, infile) < 1)
- Error("Can't read"); /* read size of text */
- if (textsize == 0)
- return;
- StartHuff();
- for (i = 0; i < N - F; i++)
- text_buf[i] = ' ';
- r = N - F;
- for (count = 0; count < textsize; ) {
- c = DecodeChar();
- if (c < 256) {
- if (putc(c, outfile) == EOF) {
- Error(wterr);
- }
- text_buf[r++] = c;
- r &= (N - 1);
- count++;
- } else {
- i = (r - DecodePosition() - 1) & (N - 1);
- j = c - 255 + THRESHOLD;
- for (k = 0; k < j; k++) {
- c = text_buf[(i + k) & (N - 1)];
- if (putc(c, outfile) == EOF) {
- Error(wterr);
- }
- text_buf[r++] = c;
- r &= (N - 1);
- count++;
- }
- }
- if (count > printcount) {
- printf("%12ld\r", count);
- printcount += 1024;
- }
- }
- printf("%12ld\n", count);
- }
-
- int main(int argc, char *argv[])
- {
- char *s;
-
- if (argc != 4) {
- printf("'lzhuf e file1 file2' encodes file1 into file2.\n"
- "'lzhuf d file2 file1' decodes file2 into file1.\n");
- return EXIT_FAILURE;
- }
- if ((s = argv[1], s[1] || strpbrk(s, "DEde") == NULL)
- || (s = argv[2], (infile = fopen(s, "rb")) == NULL)
- || (s = argv[3], (outfile = fopen(s, "wb")) == NULL)) {
- printf("??? %s\n", s);
- return EXIT_FAILURE;
- }
- if (toupper(*argv[1]) == 'E')
- Encode();
- else
- Decode();
- fclose(infile);
- fclose(outfile);
- return EXIT_SUCCESS;
- }
- --end--
-
-
- From router!tut!draken!kth!mcvax!uunet!husc6!purdue!iuvax!bsu-cs!ibmbin Wed Apr 12 08:57:41 EET DST 1989
- Article 249 of comp.binaries.ibm.pc:
- Path: chyde!router!tut!draken!kth!mcvax!uunet!husc6!purdue!iuvax!bsu-cs!ibmbin
- >From: cgh!paul@cbmvax.cbm.commodore.com (Paul Homchick)
- Newsgroups: comp.binaries.ibm.pc
- Subject: v02i056: lz-comp, lzari, lzss, lzhuf compression package (part 05/05)
- Summary: lz-comp, lzari, lzss, lzhuf compression package
- Message-ID: <6668@bsu-cs.bsu.edu>
- Date: 9 Apr 89 17:00:10 GMT
- Sender: ibmbin@bsu-cs.bsu.edu
- Followup-To: comp.binaries.ibm.pc.d
- Lines: 231
- Approved: dhesi@bsu-cs.bsu.edu
- Checksum: 1602543876 (Verify with "brik -cv")
- Posting-number: Volume 02, Issue 056
- Originally-from: Haruhiko Okumura via Kenjirou Okubo <no email address>
- Submitted-by: Paul Homchick <cgh!paul@cbmvax.cbm.commodore.com>
- Archive-name: lz-comp/part05
-
- --cut here for "lzss.c"--
- /**************************************************************
- LZSS.C -- A Data Compression Program
- (tab = 4 spaces)
- ***************************************************************
- 4/6/1989 Haruhiko Okumura
- Use, distribute, and modify this program freely.
- Please send me your improved versions.
- PC-VAN SCIENCE
- NIFTY-Serve PAF01022
- CompuServe 74050,1022
- **************************************************************/
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
- #include <ctype.h>
-
- #define N 4096 /* size of ring buffer */
- #define F 18 /* upper limit for match_length */
- #define THRESHOLD 2 /* encode string into position and length
- if match_length is greater than this */
- #define NIL N /* index for root of binary search trees */
-
- unsigned long int
- textsize = 0, /* text size counter */
- codesize = 0, /* code size counter */
- printcount = 0; /* counter for reporting progress every 1K bytes */
- unsigned char
- text_buf[N + F - 1]; /* ring buffer of size N,
- with extra F-1 bytes to facilitate string comparison */
- int match_position, match_length, /* of longest match. These are
- set by the InsertNode() procedure. */
- lson[N + 1], rson[N + 257], dad[N + 1]; /* left & right children &
- parents -- These constitute binary search trees. */
- FILE *infile, *outfile; /* input & output files */
-
- void InitTree(void) /* initialize trees */
- {
- int i;
-
- /* For i = 0 to N - 1, rson[i] and lson[i] will be the right and
- left children of node i. These nodes need not be initialized.
- Also, dad[i] is the parent of node i. These are initialized to
- NIL (= N), which stands for 'not used.'
- For i = 0 to 255, rson[N + i + 1] is the root of the tree
- for strings that begin with character i. These are initialized
- to NIL. Note there are 256 trees. */
-
- for (i = N + 1; i <= N + 256; i++) rson[i] = NIL;
- for (i = 0; i < N; i++) dad[i] = NIL;
- }
-
- void InsertNode(int r)
- /* Inserts string of length F, text_buf[r..r+F-1], into one of the
- trees (text_buf[r]'th tree) and returns the longest-match position
- and length via the global variables match_position and match_length.
- If match_length = F, then removes the old node in favor of the new
- one, because the old one will be deleted sooner.
- Note r plays double role, as tree node and position in buffer. */
- {
- int i, p, cmp;
- unsigned char *key;
-
- cmp = 1; key = &text_buf[r]; p = N + 1 + key[0];
- rson[r] = lson[r] = NIL; match_length = 0;
- for ( ; ; ) {
- if (cmp >= 0) {
- if (rson[p] != NIL) p = rson[p];
- else { rson[p] = r; dad[r] = p; return; }
- } else {
- if (lson[p] != NIL) p = lson[p];
- else { lson[p] = r; dad[r] = p; return; }
- }
- for (i = 1; i < F; i++)
- if ((cmp = key[i] - text_buf[p + i]) != 0) break;
- if (i > match_length) {
- match_position = p;
- if ((match_length = i) >= F) break;
- }
- }
- dad[r] = dad[p]; lson[r] = lson[p]; rson[r] = rson[p];
- dad[lson[p]] = r; dad[rson[p]] = r;
- if (rson[dad[p]] == p) rson[dad[p]] = r;
- else lson[dad[p]] = r;
- dad[p] = NIL; /* remove p */
- }
-
- void DeleteNode(int p) /* deletes node p from tree */
- {
- int q;
-
- if (dad[p] == NIL) return; /* not in tree */
- if (rson[p] == NIL) q = lson[p];
- else if (lson[p] == NIL) q = rson[p];
- else {
- q = lson[p];
- if (rson[q] != NIL) {
- do { q = rson[q]; } while (rson[q] != NIL);
- rson[dad[q]] = lson[q]; dad[lson[q]] = dad[q];
- lson[q] = lson[p]; dad[lson[p]] = q;
- }
- rson[q] = rson[p]; dad[rson[p]] = q;
- }
- dad[q] = dad[p];
- if (rson[dad[p]] == p) rson[dad[p]] = q; else lson[dad[p]] = q;
- dad[p] = NIL;
- }
-
- void Encode(void)
- {
- int i, c, len, r, s, last_match_length, code_buf_ptr;
- unsigned char code_buf[17], mask;
-
- InitTree(); /* initialize trees */
- code_buf[0] = 0; /* code_buf[1..16] saves eight units of code, and
- code_buf[0] works as eight flags, "1" representing that the unit
- is an unencoded letter (1 byte), "0" a position-and-length pair
- (2 bytes). Thus, eight units require at most 16 bytes of code. */
- code_buf_ptr = mask = 1;
- s = 0; r = N - F;
- for (i = s; i < r; i++) text_buf[i] = ' '; /* Clear the buffer with
- any character that will appear often. */
- for (len = 0; len < F && (c = getc(infile)) != EOF; len++)
- text_buf[r + len] = c; /* Read F bytes into the last F bytes of
- the buffer */
- if ((textsize = len) == 0) return; /* text of size zero */
- for (i = 1; i <= F; i++) InsertNode(r - i); /* Insert the F strings,
- each of which begins with one or more 'space' characters. Note
- the order in which these strings are inserted. This way,
- degenerate trees will be less likely to occur. */
- InsertNode(r); /* Finally, insert the whole string just read. The
- global variables match_length and match_position are set. */
- do {
- if (match_length > len) match_length = len; /* match_length
- may be spuriously long near the end of text. */
- if (match_length <= THRESHOLD) {
- match_length = 1; /* Not long enough match. Send one byte. */
- code_buf[0] |= mask; /* 'send one byte' flag */
- code_buf[code_buf_ptr++] = text_buf[r]; /* Send uncoded. */
- } else {
- code_buf[code_buf_ptr++] = (unsigned char) match_position;
- code_buf[code_buf_ptr++] = (unsigned char)
- (((match_position >> 4) & 0xf0)
- | (match_length - (THRESHOLD + 1))); /* Send position and
- length pair. Note match_length > THRESHOLD. */
- }
- if ((mask <<= 1) == 0) { /* Shift mask left one bit. */
- for (i = 0; i < code_buf_ptr; i++) /* Send at most 8 units of */
- putc(code_buf[i], outfile); /* code together */
- codesize += code_buf_ptr;
- code_buf[0] = 0; code_buf_ptr = mask = 1;
- }
- last_match_length = match_length;
- for (i = 0; i < last_match_length &&
- (c = getc(infile)) != EOF; i++) {
- DeleteNode(s); /* Delete old strings and */
- text_buf[s] = c; /* read new bytes */
- if (s < F - 1) text_buf[s + N] = c; /* If the position is
- near the end of buffer, extend the buffer to make
- string comparison easier. */
- s = (s + 1) & (N - 1); r = (r + 1) & (N - 1);
- /* Since this is a ring buffer, increment the position
- modulo N. */
- InsertNode(r); /* Register the string in text_buf[r..r+F-1] */
- }
- if ((textsize += i) > printcount) {
- printf("%12ld\r", textsize); printcount += 1024;
- /* Reports progress each time the textsize exceeds
- multiples of 1024. */
- }
- while (i++ < last_match_length) { /* After the end of text, */
- DeleteNode(s); /* no need to read, but */
- s = (s + 1) & (N - 1); r = (r + 1) & (N - 1);
- if (--len) InsertNode(r); /* buffer may not be empty. */
- }
- } while (len > 0); /* until length of string to be processed is zero */
- if (code_buf_ptr > 1) { /* Send remaining code. */
- for (i = 0; i < code_buf_ptr; i++) putc(code_buf[i], outfile);
- codesize += code_buf_ptr;
- }
- printf("In : %ld bytes\n", textsize); /* Encoding is done. */
- printf("Out: %ld bytes\n", codesize);
- printf("Out/In: %.3f\n", (double)codesize / textsize);
- }
-
- void Decode(void) /* Just the reverse of Encode(). */
- {
- int i, j, k, r, c;
- unsigned int flags;
-
- for (i = 0; i < N - F; i++) text_buf[i] = ' ';
- r = N - F; flags = 0;
- for ( ; ; ) {
- if (((flags >>= 1) & 256) == 0) {
- if ((c = getc(infile)) == EOF) break;
- flags = c | 0xff00; /* uses higher byte cleverly */
- } /* to count eight */
- if (flags & 1) {
- if ((c = getc(infile)) == EOF) break;
- putc(c, outfile); text_buf[r++] = c; r &= (N - 1);
- } else {
- if ((i = getc(infile)) == EOF) break;
- if ((j = getc(infile)) == EOF) break;
- i |= ((j & 0xf0) << 4); j = (j & 0x0f) + THRESHOLD;
- for (k = 0; k <= j; k++) {
- c = text_buf[(i + k) & (N - 1)];
- putc(c, outfile); text_buf[r++] = c; r &= (N - 1);
- }
- }
- }
- }
-
- int main(int argc, char *argv[])
- {
- char *s;
-
- if (argc != 4) {
- printf("'lzss e file1 file2' encodes file1 into file2.\n"
- "'lzss d file2 file1' decodes file2 into file1.\n");
- return EXIT_FAILURE;
- }
- if ((s = argv[1], s[1] || strpbrk(s, "DEde") == NULL)
- || (s = argv[2], (infile = fopen(s, "rb")) == NULL)
- || (s = argv[3], (outfile = fopen(s, "wb")) == NULL)) {
- printf("??? %s\n", s); return EXIT_FAILURE;
- }
- if (toupper(*argv[1]) == 'E') Encode(); else Decode();
- fclose(infile); fclose(outfile);
- return EXIT_SUCCESS;
- }
- --end--
-
-
-