home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
The C Users' Group Library 1994 August
/
wc-cdrom-cusersgrouplibrary-1994-08.iso
/
vol_100
/
103_01
/
pkunpk.doc
< prev
next >
Wrap
Text File
|
1985-03-10
|
7KB
|
187 lines
20 MAY 81 by Kathy Bacon who is going to spend the rest of her
life documenting her (and others) programs.....
DOCUMENTATION FOR PACK - UNPACK SET OF PROGRAMS USING THE
HOFFMAN MINIMUM REDUNDANCY ALGORITHM
The program PACK can be used to "pack" or encode a file into
a form that requires less storage; the original file can then be
gotten back by running it through UNPACK.
Put quite simply, PACK reads through the file to be packed
and counts the number of times each character in the file occurs.
Using this information, it constructs a tree structure, which it then
uses to generate a bit code for every character which appears in the
file, in such a way that the characters which appear the most have
the codes with the fewest bits, and those that appear less often have
longer codes (e.g. the character which appears the most often will
have the bit code (0) or (1); the next group will have the codes
(00),(01), (10),(11), etc.). It then writes out the file using these
codes instead of the characters themselves; the character occurrence
infor- mation is stored in the first four sectors of the new file.
UNPACK reads the occurrence information from a packed file,
reconstructs the tree, regenerates the codes from the tree, and
finally translates the bit codes in the packed file back to their
character equivilents.
Both PACK and UNPACK can accept more than one command line
argument at a time to be packed or unpacked. Also, a destination file
can be specified like so:
A>pack source>destin
There should be NO spaces between the file names and the output
specifier '>'. If no destination file is specified, the output file
from PACK will be source.PAK ; source.UNP from UNPACK.
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Note: any type of file can be packed. The more random a file
is the less the savings will be. Best savings occur on text files. C
com files offer less and assembly language com files even less. Even
previously packed files can be packed and sometimes there can be some
savings (not all the time though). Of course you would have to unpack
it twice to get back the original. Work is in progress on shortening
overhead of the character occurrence information. We have packed and
unpacked many files and the programs seem to be fairly robust. Try
unpacking files that are not packed. (GIGO). Savings for text files
are typically on the order of 30%, for com files more like 20%. Have
fun and save space.
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
The "tree" is actually an array of elements (512 in all -
twice the possible number of character types), each of which has four
parts:
1) the character occurrence value for that element
2) a pointer to a son (called "son_zero")
3) a pointer to a second son ("son_one")
4) a pointer to the element's "daddy"
The first 256 array elements are the characters - each index into the
array is the numerical equivelent of some character. Initially, all
of the pointers are set to NIL (-1 in this case); all occurrence
fields are set to zero. Then the occurrence information is filled in
for the first 256 elements (in PACK this is done by actully going
through the file; in UNPACK the information is read from the first
four sectors of the file ).
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
The same function is used to generate the tree in both PACK and
UNPACK. A pointer to the tree structure is passed to it; it returns
the number of the "head" of the tree, when it's done:
{
daddy = NIL;
nextpapa = number of char types (256)
while(1)
{
find the two elements in the tree with the lowest occur.
values. Label one ONE, the other ZERO.
if (both ONE and ZERO are NIL)
return the current daddy
if (ONE == NIL)
return the value of ONE
daddy = nextpapa
nextpapa = nextpapa + 1
set the pointer "son_zero" at element "daddy" to ZERO;
set the pointer "son_one" at element "daddy" to ONE
set the pointers to "dad" at elements ONE and ZERO to
"daddy"
the occurrence value at element "daddy" is the sum of
the occur. values at elements ONE and ZERO
}
}
This will generate a tree in which all the "leaves" are characters
(elements 0-255 in the tree array); no character will have a son_one
or a son_zero.
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Once the tree has been generated, and the occurrence information
written out to the output file, PACK uses the occurrence field at
each element on the tree to tell whether that element in a "son_one"
or "son_zero" of its "dad". This simplifies the encoding, which is
done as follows:
{
while (more characters in input)
{
get a character
starting at the character's element in the tree, read off
the element's occur. value into a stack; then go to
it's "dad" and do likewise. Continue until you find
an element whose "dad" is NIL. This is the head of
the tree.
read the bits back off the stack (reverse of the way they
went in) one at a time, shoving them into an output
byte. When there are eight in the byte, put it into
the output buffer and start over with a new byte. If
the output buffer is full, write it out to the file.
} end while (more input characters)
if (there's part of an output byte left)
{
put it in the buffer
write out the buffer to the file
}
}
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
To pack a file, we start at the "leaves" of the tree (at a character)
and follow the "dad"s up to the "big_daddy" of the tree to generate
the bit codes; to unpack, we do the reverse: we start at the head of
the tree and follow the bit code down the pointers (following a
son_one if the bit is a one, a son_zer if it's a zero) until we reach
a character:
{
calculate the number of characters that were in the unpacked file
while (no. decoded chars < no. in original file)
{
get a byte from the packed file
for (i=1 to 8)
{ get a bit
if (bit==0)
go to current element's son_zero
else go to son_one
if (new element is a character)
{ put character in output buffer
write out buffer if it's full
reset element to "big_daddy"
}
}
}
if (part of a buffer done)
write out buffer to file
}
tree