07-Nov-1987
by: Alan L. Beard
BIX: abeard
Compuserve: 71370,2723
TI-99/GENEVE Archiver Format
----------------------------
1 General
This is a text file describing the archiver format popularized by Barry
Traver. That format has since been extended with Huffman encoding
techniques into a squeezing archiver.
An archiver is a program that reads a collection of (usually related)
files and combines them into a single file called an archive.
Sometime later, the files can be recovered from this single file in
their original file formats.
The advantage of an archiver is that only a single file need be posted
on a network instead of multiple files. A user need only be concerned
that the single file was downloaded, and be assured that this single
file contains all necessary elements to run a program (e.g. source,
objects, executables, documentation files, readme files, etc.).
A second advantage of an archiver is that the archiver can be made
"smart", i.e. it can take advantage of repetition of characters in
files to "squeeze" or "crunch" the files into an archive, making
the resulting archive usually some 30 to 60% smaller than the
original files. This is done without loss of information. The
"squeezed" archive is faster to upload/download, saving the user 30
to 60% on his downloading fees. Also, moderators on networks and
bulletin board systems encourage file compression, as 30 to 60%
more mass storage immediately becomes available, with no increase
in costs!
A squeezing archiver also makes sense to a user interested in
archiving a number of infrequently used files (such as the source
for a program after it has been compiled).
As a matter of fact, the only disadvantage to archiving is that
the archiving/dearchiving process tends to be time-consuming (2 to
15 minutes to unpack an archive). But, since this is done offline,
on an essentially free system (your own), this usually isn't very
significant.
2 The TRAVER Algorithm/Format
Barry Traver introduced the most popular archiver for the TI-99
and Geneve. It is written in Extended BASIC, is quite a professional
program, and demonstrates the capabilities of Extended BASIC. A
single assembly language subroutine was used, which allowed sector
reads and writes by accessing a Device Service Routine (DSR) in
the TI Disk controller card.
The TRAVER archiver is FAIRWARE; for those people who subscribe
to the TRAVelER, the program is free.
Barry picked a simple approach to the archiving algorithm, one
that shows an in-depth knowledge of the internal TI-99 disk file
structure, and uses that structure to the archiver's advantage.
Files are packed in an archive very nearly the same way they
are stored on a TI-99 disk. The archive itself is a fixed/
display/128 file (probably because BASIC can't open a fixed/
display/256 byte file).
2.1 File Packing Algorithm
The basic algorithm for the archiver works something like this:
a. The file descriptor index record is read; it points to
all of the one-sector file header records.
b. The user is prompted for which files on the disk are to
be compressed.
c. Each desired file header record is read, and the unused bytes
are stripped (including the reserved expansion bytes and the
data chain pointer blocks), leaving a "mini" file descriptor
record consisting of the following:
o 10 character file name
o File Status Flags
o Maximum Number Records/Sector or AU
o Total Number of Sectors Used
o End of File Offset
o Logical Record Length
o Number of Fixed Records or Number Sectors Used
These 18 byte mini file headers are packed fourteen to each pair
of 128 byte records (for a total of 252 bytes). The remaining
four unused bytes per pair contain either zeroes (meaning
this is NOT the last header sector) or the characters END!
(meaning this is the last header sector).
d. Following the header section of the archive is the data
section. Each 256 byte sector of the file is packed as
two 128 byte records.
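To make the packing concrete, here is a minimal sketch of the
header packing step in Python (the real archiver is Extended
BASIC; the function and field names are mine, and the big-endian
byte order is an assumption based on the 9900 architecture):

    import struct

    # An 18 byte "mini" file descriptor, per the field list above:
    # name, status flags, records/AU, sectors used, EOF offset,
    # logical record length, record count.
    MINI_FMT = ">10sBBHBBH"     # 18 bytes; big-endian is assumed

    def pack_mini_header(name, flags, recs_au, sectors,
                         eof_off, rec_len, num_recs):
        return struct.pack(MINI_FMT,
                           name.ljust(10).encode("ascii"), flags,
                           recs_au, sectors, eof_off, rec_len,
                           num_recs)

    def pack_header_pair(minis, last):
        # Fourteen 18 byte headers fill 252 bytes of a pair of
        # 128 byte records; the final four bytes are "END!" on
        # the last pair and zeroes otherwise.
        assert len(minis) <= 14
        body = b"".join(minis).ljust(252, b"\x00")
        return body + (b"END!" if last else b"\x00" * 4)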
2.2 File Unpacking Algorithm
The file unpacking algorithm is quite straightforward. The user
is asked for the file or files to be "unpacked". The requested
files are located via the archive file headers, and:
a. The data section for each file is located using the
"total number of sectors used" field in the mini-file
descriptor records.
b. The file is read from the archive, and written to the disk
as a DISPLAY/FIXED/128 file.
c. After the file has been successfully written to the disk,
the disk is searched for the file header, and the file
header is overwritten with the information contained in the
mini-file descriptor record.
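As a sketch of step a, assuming the mini headers have already been
parsed into dictionaries (the field names are placeholders of mine):

    def data_offsets(minis, num_header_records):
        # The data section begins right after the header records;
        # each sector of an unsqueezed file occupies two 128 byte
        # archive records (256 bytes).
        offset = num_header_records * 128
        table = {}
        for h in minis:
            table[h["name"]] = offset
            offset += h["sectors_used"] * 256
        return table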
3 The BEARD Huffman Squeezing FORMAT
I released an extension to the TRAVER archiver format in
September 1987. This extension provides the capability to squeeze
the archive using HUFFMAN encoding techniques originally developed
by R. Greenlaw under CP/M. The method was ported from CP/M
to the IBM PC, then to the Amiga, and finally to the TI-99.
Software sources were obtained in the public domain from BYTE
INFORMATION EXCHANGE.
Thanks should be offered at this point for the inputs provided
by Dr. Jerry Coffey, who helped me work out the final bugs in
the squeezing archiver, especially for helping me make it backwards
compatible with the Traver Archive.
The program is written in FORTRAN, so to use it you need to either
own a copy of FORTRAN, or get a copy of the public domain module
99STAND.ARC (for version 2.0) or FORTSA (for version 3.1 and above).
The program maintains backwards compatibility with the TRAVER
format: it can unpack squeezed archives, unsqueezed archives, or a
combination of the two. The basic format of the archive is the same,
a header section consisting of "Traver-like" mini file headers,
followed by a data section of squeezed and (possibly) unsqueezed
data files.
3.1 File Analysis
Before a file can be squeezed, it must be analyzed. This is performed
automatically by the squeezing archiver as a first pass on the file.
The analysis portion basically consists of reading every sector in
the file to be compressed, and counting the occurrences of each of
the 256 possible data byte values. Once all of the file has been
read, and all of the occurrence counts have been collected, the
counts are sorted and a binary tree is created containing the
optimal Huffman codes for the file.
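The analysis pass is classic Huffman coding. A compact sketch in
Python (the original is FORTRAN; this shows the idea, not the
program's actual data structures):

    import heapq
    from collections import Counter

    def build_tree(data):
        # Pass 1: count the occurrences of each byte value.
        counts = Counter(data)
        # Repeatedly join the two least frequent nodes; 'uid'
        # breaks ties so the heap never compares tree nodes.
        heap = [(n, uid, byte)
                for uid, (byte, n) in enumerate(counts.items())]
        heapq.heapify(heap)
        uid = len(heap)
        while len(heap) > 1:
            n1, _, a = heapq.heappop(heap)
            n2, _, b = heapq.heappop(heap)
            heapq.heappush(heap, (n1 + n2, uid, (a, b)))
            uid += 1
        return heap[0][2]   # leaves are ints, inner nodes are pairs

    def code_table(tree, prefix=""):
        # Optimal codes: the path from the root, 0 = left, 1 = right.
        if not isinstance(tree, tuple):
            return {tree: prefix or "0"}
        a, b = tree
        return {**code_table(a, prefix + "0"),
                **code_table(b, prefix + "1")}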
3.2 File Squeezing Algorithm
Once the file has been analyzed, and a binary tree created, the
squeezing part of the archiver is executed. This portion
first writes a prologue to the data file consisting of the
archival type, the sector length of the original file, the
number of binary tree nodes, and the binary tree, followed
by variable length bit strings representing the Huffman encoded
data.
Following the squeeze operation, the reduction in the file size
is tested. If the resulting file is larger than the original
file, then the file is "re-packed", using the normal unsqueezed
TRAVER format.
If a reduction in the file size is detected, then the file header
is modified. First, the file status flag is OR'd with '20'X,
indicating a squeezed file. The Total Number of Sectors Used
field is changed to the Total Number of 128 byte records used.
The actual data file contains the following information:
o One word representing the archiving method. This is currently
set to a 1, since this is the first squeezing method to be
implemented. Future methods should set this to a different
unique number representing the different squeezing technique.
o One word representing the original value for the Total
Number of Sectors. Note that this was changed in the
mini-file header to be the number of 128 byte records.
o One word representing the number of nodes in the binary
Huffman tree (this word starts the particular encoding
scheme, whereas the first two words must be understood
by all archivers).
o The nodes of the binary tree, first the left child, and then
the right child, repeated NUMNODES times.
o The squeezed archive data, terminated by a special "end-of-file"
character.
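A sketch of the prologue layout in Python; the two byte word size
is taken from the list above, but the big-endian order and the
Greenlaw-style convention of marking leaf children with negative
values are assumptions, not statements from this document:

    import struct

    def write_prologue(out, total_sectors, nodes):
        # nodes: a list of (left, right) child indices, one pair
        # per binary tree node.
        out.write(struct.pack(">H", 1))              # archiving method
        out.write(struct.pack(">H", total_sectors))  # original sectors
        out.write(struct.pack(">H", len(nodes)))     # NUMNODES
        for left, right in nodes:         # left child, then right
            out.write(struct.pack(">hh", left, right))
        # ...followed by the variable length Huffman bit strings.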
3.3 File Unsqueezing Algorithm
The file unsqueezing algorithm is similar to the "non-squeezing"
file unpacking algorithm. The key is to recognize that this
is a squeezed file (by the special '20'X file status bit),
and to recognize that the total sectors used field is actually the
total records used if this is a squeezed file.
If this is not a squeezed file, then the unpacking algorithm is
identical to the TRAVER format.
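A sketch of both tests and of the decode loop, reusing the tuple
tree from the analysis sketch above (the bit-level framing and the
end-of-file handling are simplified):

    SQUEEZED = 0x20          # the '20'X status flag bit

    def is_squeezed(flags):
        return bool(flags & SQUEEZED)

    def unsqueeze(bits, tree):
        # Walk the tree one bit at a time, 0 = left and 1 = right;
        # reaching a leaf emits one decoded byte.
        out, node = bytearray(), tree
        for bit in bits:     # bits: any iterable of 0s and 1s
            node = node[bit]
            if not isinstance(node, tuple):
                out.append(node)
                node = tree
        return bytes(out)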
One small difference can occur between an unpacked file (via the
Traver method) and a squeezed file (via the Beard method). The
final sector of a file is usually only partially used, and the
squeezing archiver takes advantage of this fact by saving only the
in-use portion of the final sector. The archiver "zero-fills" the
remainder of the sector. The inconsistency is that the final
sector can differ (in its unused portion) from the original file.
This should not cause any problems in using the file (unless some
tricks have been played with the DSR), but it would show as a
difference on a sector by sector compare.
4 Improvements
Many improvements to the implemented squeezing archiver are possible
and encouraged. The following is a list of possible improvement
areas:
a. The TI-99 disk format is quite wasteful in that it does not
split a record (either fixed or variable) over a sector
boundary. Therefore, a DISPLAY/FIXED/129 file will have 127
bytes/sector of unused space! Variable files are similar: if
the next record to be stored cannot fit in the current sector,
the current sector is terminated and the record is started in
the next sector. Since the squeezing archive is actually a
stream of bits, this unused space could easily be skipped in
the archive.
b. The bit manipulation routines in this program are in FORTRAN,
and are therefore slower than what is achievable in assembly
language. A good deal of the lower level code could be
streamlined.
c. It would be useful if the archiver would recognize that a
file could not be squeezed at the "analysis" stage rather
than after it has been squeezed and written.
d. Different, and better, squeezing algorithms are available.
It might be possible to borrow from the IBM PC archiver:
do the analysis in various formats, and pick the
compression method that yields the best result.
e. There is currently no support for hard disks, or for reading/
writing sectors to/from a RAM disk (the reason being
that I don't have a hard disk, and I haven't figured out
the RAM disk yet).
5 Archive FORMAT Recap
a. Archive consists of two sections packed into a 128 byte fixed
length record file:
o Header Section
o Data Section
b. The header section consists of "mini-file headers" packed
fourteen to each pair of 128 byte records. Each "mini-file
header" is a stripped version of a standard TI-99 file header,
and contains the following:
byte nos Contents
0-9 10 Character (MAX) file name. Unused characters
are space characters.
10 File Status Flags:
Bit No   ON=1            OFF=0
0        Program File    Data File
1        Internal        Display
2        Reserved
3        Write Prot      No Write Prot
4        Reserved
5*       Squeezed        Unsqueezed
6        Reserved
7        Variable Len    Fixed Len
*Bit 5 has a non-standard meaning here.
Common flag byte values: >00 Dis/Fix, >01 Program,
>02 Int/Fix, >80 Dis/Var, >82 Int/Var
11 Maximum Number of Records/Sector or AU
12-13 Total Number of Sectors Used (unsqueezed archive) or
Total Number of 128 byte Records Used (squeezed archive)
14 End of File Offset
15 Logical Record Length
16-17 Number of Fixed Length Records or Number of Sectors
Used by Variable Length Records
The last 128 byte header record contains the characters "END!" in
the last four bytes. Unused header record slots (to fill out a
128 byte record) contain zeroes.
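Read back as code, the byte table above might be parsed like this
(Python for clarity; the big-endian word order is an assumption,
and the dictionary keys are mine):

    import struct

    def parse_mini_header(rec):
        # rec: one 18 byte slice of a 128 byte header record
        name, flags, recs_au, count, eof_off, rec_len, num_recs = \
            struct.unpack(">10sBBHBBH", rec)
        return {
            "name":         name.decode("ascii").rstrip(),
            "program":      bool(flags & 0x01),  # bit 0
            "internal":     bool(flags & 0x02),  # bit 1
            "protected":    bool(flags & 0x08),  # bit 3
            "squeezed":     bool(flags & 0x20),  # bit 5, non-standard
            "variable":     bool(flags & 0x80),  # bit 7
            "recs_per_au":  recs_au,
            "sectors_used": count,  # 128 byte records if squeezed
            "eof_offset":   eof_off,
            "record_len":   rec_len,
            "num_records":  num_recs,
        }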
c. The data section follows the header section, and contains a number
of data files pointed to by the header section. There is no special
separation between the data files in the data section itself; it
is merely a collection of 128 byte records.
The unsqueezed data section contains an even number of 128 byte
records for each sector in the input file.
The squeezed data section has a more complex format, but still
consists of 128 byte records. Actually, the data section can
be thought of as a continuous "stream" of data, organized as
follows:
word 0 - Set to a "1", to indicate the compression method.
word 1 - The original value for the Total Number of Sectors
word 2 - The number of nodes in the binary tree
words 3 to numnodes*2+2 - The binary tree
remaining - Variable bit data for the Huffman Encoding.