home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
ftp.barnyard.co.uk
/
2015.02.ftp.barnyard.co.uk.tar
/
ftp.barnyard.co.uk
/
cpm
/
walnut-creek-CDROM
/
CPM
/
TURBOPAS
/
SQZTURBO.LBR
/
TURBOSQZ.DQC
/
TURBOSQZ.DOC
Wrap
Text File
|
2000-06-30
|
12KB
|
274 lines
Turbo Pascal File Squeezer / UnSqueezer for CP/M and MsDos/PcDos
Version 2.0 04/17/85 Bob Berry
-----------
I originally downloaded these programs from the CompuServe Borland SIG.
They were (rather brazenly, I thought) identified as being "CP/M compatible".
There were, however, the following problems with CP/M compatibility:
- Wouldn't compile. First the text was too large to allow
compilation in 64k: Compile time error 99 - Compiler Overflow.
Easily overcome by setting up $Include files.
- Syntax error. The original version included "LongFileSize", which is
strictly an MsDos turbo function. Replaced with FileSize for CP/M.
- The input and output files were "File of Char;", which works fine
with Dos. Since the MsDos directory identifies the size of a file in
bytes, Dos Turbo does not need to do anything special to keep track
of record count, etc. In CP/M, turbo writes two integers at the
beginning of each "typed" file ("File of ...;") to identify the
logical record length and record count. So using "File of Char" on
input leads turbo to expect the first four bytes of the file to
contain the two integers. Naturally this will not be true (unless
the file was actually created BY A TURBO PROGRAM as a "File of
..."). For CP/M, the input and output files must be "untyped" files
("File;"), read and written with BlockRead and BlockWrite.
- Recursive procedures and functions require the "A-" compiler
directive to work under CP/M turbo.
With these compatibility problems fixed, one other program bug appeared.
At end-of-file, the original program immediately sent the SpEof character and
terminated. Any characters which were repeated at end of file would be
truncated. A CP/M file which ended with a string of fifty ^Z's (ASCII end-of
file) would be unsqueezed to a single ^Z. A Dos file which ended with a string
of eighty *'s would unsqueeze to a single *. (Dos files don't need a ^Z for
EOF, as the directory keeps track of the length of the file in bytes.)
Actually the Dos error would be more obvious than the CP/M error, as the
"physical" CP/M file is almost always larger than the "logical" file (the tail
end of the last sector is normally "garbage".)
The original programs stored (upon squeezing) the file checksum, but
(intentionally?) avoided comparing the output file to the checksum on
unsqueezing. The above "end-of-file repeats" problem caused a checksum error,
which I first discovered when testing these for "compatibility" with public
domain (CP/M) SQ/USQ.
The current version of the program(s) cause no checksum errors (USQZ does
check!), and the programs are compatible with SQ/USQ. Also the "pre-squeeze"
and "post-unsqueeze" files match exactly (a minimum requirement!) This has
been verified for CP/M with CRC.COM and for DOS with COMP.COM.
The Huffman algorithm is apparently developed differently from the
(standard?) public domain SQ/USQ programs (the CRC's of the squeezed files
don't match.) However, this is of no significance, as a file squeezed with the
public domain program can be unsqueezed with the turbo program, and vice versa
with no checksum errors, and with the output file exactly matching the
original.
Finally, with the programs working correctly, I set about to make them a
little less cumbersome. The original programs would be loaded and then ask for
the file name (no provision to pass the file name from the command line, as
"A>SQZ SAMPLE.DOC".) The program would then squeeze or unsqueeze the file and
boot you back to the system. To do another file, you would start from the "A>"
prompt. I added:
- Provision to specify the first file name on the command line.
- Optional specification of a different output drive.
- CP/M Re-Log to prevent BDOS R/O errors on the output drive. (You can
swap disks!)
- Verification of the file checksum on unsqueezing.
- Statistics showing filesize and percentages of output to input files.
- Repeat until a "blank" file name is entered (just <cr>).
- Separate (TESTED and WORKING) routines to accommodate the differences
between CP/M and MsDos.
Version 2.1 01/29/86 Bob Berry
-----------
- Added WildCard file specifications.
- Added Printer Echo.
- Version 2.1 was prompted primarily by the following:
For years, CP/M (you remember CP/M) has had public domain file
compression programs available. Commonly called "SQUEEZE" and
"UNSQUEEZE" these programs conformed to the following "squeezed"
file definition:
Byte 00 - 01: an integer, FF76H, the squeezed file ID,
(stored LSB, MSB or 76 in 00, FF in 01)
Byte 02 - 03: an integer, the file CheckSum, the integer total
of all characters in the original file.
Byte 04 - ??: the original file name with null (00) terminator
(since the squeezed file has a "Q" as the
second character of the file extension.)
Followed By: an integer, the number of nodes in the tree
built when the file was squeezed.
Followed By: 2*Nodes integers: the tree.
Followed By: The Squeezed file.
Recently at least one new format of squeezed file has surfaced.
Apparently in an attempt to preserve the Date and Time of last
write which MsDos (PcDos) stores as part of a directory entry,
a new format was devised:
Byte 00 - 01: an integer FFFAH, the new squeezed file ID
Byte 02 - ??: the original file name, null terminated.
Followed By: a null terminated ASCII string with the file date.
Followed By: a null (perhaps a null terminated string?)
Followed By: an ASCII End-of-file character (^Z = 1AH)
Followed By: the file CheckSum
Followed By: an integer, the MsDos (FCB format) file date.
Followed By: an integer, the MsDos (FCB format) file time.
Followed By: an integer, the number of nodes.
Followed By: 2*Nodes integers: the "tree".
Followed By: The Squeezed File.
This program will UnSqueeze files of either format, doing
complete CheckSum verification, and will do so on either CP/M
or DOS computers (by selecting the appropriate .INC module).
Don't be surprised, however, if you discover yet another format
which this program doesn't recognize. It seems there are new
Squeeze/UnSqueeze programs popping up all the time, with
authors who may not be too concerned about the compatibility of
their programs with the rest of the universe.
Bob Berry
CompuServe 76555,167
The following documentation is included from the "original" programs.
=======================================================================
From SQZ.PAS:
{ CP/M compatible file squeezer utility.
This translation uses the Huffman algorithm to develop a binary tree
representing the decoding information for a variable length bit string code
for each input value. Each string's length is in inverse proportion to its
frequency of appearance in the incoming data stream. The encoding table is
derived from the decoding table.
The range of valid values into the Huffman algorithm are the values of a byte
stored in an integer plus the special endfile value chosen to be an adjacent
value. Overall, 0-SPEOF.
The algorithm develops the single element trees into a single binary tree by
forming subtrees rooted in interior nodes having weights equal to the sum of
weights of all their descendents and having depth counts indicating the
depth of their longest paths.
When all trees have been formed into a single tree satisfying the heap
property (on weight, with depth as a tie breaker) then the binary code
assigned to a leaf (value to be encoded) is then the series of left (0) and
right (1) paths leading from the root to the leaf. Note that trees are
removed from the heaped list by moving the last element over the top element
and reheaping the shorter list.
To further compress the output code stream, all bytes pass directly through
except for:
1) DLE is encoded as (DLE, zero).
2) repeated byte values (count >= 3) are encoded as (value, DLE, count).
In the original design it was believed that a Huffman code would fit in the
same number of bits that will hold the sum of all the counts. That was
disproven by a user's file and was a rare but infamous bug. This version
attempts to choose among equally weighted subtrees according to their maximum
depths to avoid unnecessarily long codes. In case that is not sufficient to
guarantee codes <= 16 bits long, we initially scale the counts so the total
fits in an unsigned integer, but if codes longer than 16 bits are generated
the counts are rescaled to a lower ceiling and code generation is retried.
The "node" array of structures contains the nodes of the binary tree. The
first NUMVALS nodes are the leaves of the tree and represent the values of
the data bytes being encoded and the special endfile, SPEOF. The remaining
nodes become the internal nodes of the tree.
Program states:
NoHist don't consider previous input
SentChar lastchar set, no lookahead yet
SendNewC newchar set, previous sequence done
SendCnt newchar set, DLE sent, send count next
}
(* NOTE: UnReferenced (debug) procedures and functions have been moved here
from SQZ.PAS.
Bob Berry *)
(*
Procedure PrintBits(len, number: integer);
var i, j: integer;
begin
Write(' code ');
For i:=len-1 DownTo 0 do
begin j:=(number shr i) and $0001; write(j:1); end;
writeln;
end;
*)
(* Local to InitializeHuffman
Procedure PrintFrequency;
Var i, j: integer;
begin
j := 0;
For i:=0 to pred(NumVals do
If Node[i].Weight>0 then
begin
j:=succ(j);
Writeln(Lst,'Node ',i:3,' Weight is ',Node[i].Weight:4:0);
end;
WriteLn(Lst); WriteLn(Lst,'Total Node count is ',j);
end;
Procedure PrintList;
Var i: integer;
str: string[10];
begin
WriteLn(', waiting'); ReadLn(str);
For i:=0 to pred(NumVals) do
begin
Write('Number ',i:3,' length ',CodeLen[i]:2,
' weight ',node[i].Weight:4:0);
If CodeLen[i]>0 then PrintBits(CodeLen[i],Code[i])
else WriteLn;
end;
end;
*)
FROM USQZ.PAS:
{
This program unsqueezes a file which has been squeezed or compressed to
reduce the space required to store it on disk. The program was converted
from the original version written for CP/M in the C language. This program
can be used to unsqueeze files which have been downloaded from RCP/M systems
where almost all files are saved in this squeezed format.
The technique used is the Huffman encoding technique which converts the most
common characters in the input file to a compressed bit stream of data. This
program unsqueezes such a Huffman encoded file.
PUBLIC DOMAIN - Feel free to distribute this program. Do not distribute it by
commercial means or make any charge for this pgm.
Version 1.0 - 09/05/82 Scott Loftesness
Version 1.1 - 01/06/83 Added capability to strip off parity bit if
output file is text. Ernie LeMay 71435,730
Version 1.2 - 07/20/84 converted to Turbo Pascal. Steve Freeman
}