Some Thoughts on Archiving
Jerry Coffey - Dec 1987
Al Beard did the TI community a service not only by writing a combined
archiving/Huffman-squeezing program, but by documenting and extending the
conventions used in Barry Traver's ARCHIVER program. Barry Boone has
provided a striking demonstration of the powerful Lempel-Ziv-Welch
algorithm as well as his consummate skill at crafting fast, efficient
assembly code. Dave Ramsey has just written a thoughtful piece on some
of the possibilities and needs to be met by library and compression
utilities.
In this remarkable atmosphere of creativity, it is difficult to keep
track of the exchange of ideas. I would like to summarize some of these
exchanges to give everyone the flavor of what is happening and to
emphasize some of the issues and questions. Barry Traver put his finger
on some of the concerns that need to be addressed in a series of comments
that are excerpted below:
"A multiplicity of ARCHIVERS with no set standard or
reasonable compatibility can indeed be a nightmare for the
average user (not to mention a real headache to Sysops!), as
owners of other computers well know (with any on-line time saved
by file compression sometimes outweighed by off-line time lost
trying to find which of a dozen different archivers must be used
to unpack the compressed file!).
I have no axe to grind for a particular method (I've
encouraged both Al Beard and Barry Boone in their efforts), but
I believe we need to work out something of benefit of the
average TI user and Sysops for the good of the entire TI
community. More than others, TI'ers are dependent upon major
services and local BBS's for software (they certainly won't find
much in their local computer store!).
"Two thoughts about the need for standardization:
(1) although TI'ers are as a whole more technically
knowledgeable than owners of other computers (they have to be to
stay alive!), many are still just plain "users" and need to have
some user-friendliness built into archiving and unarchiving, so
that they won't waste time on downloading files and never be
able to figure out how to unpack them.
(2) I've heard reported complaints from local IBM Sysops about
people uploading mammoth packed files without giving any
indication of what particular method to use in unpacking them,
which cause a lot of trouble for Sysops. If we can avoid this
sort of thing in the TI community, we should.
"One thought about one possible step for a solution to help
the confused user: have on a disk a collection of archivers with
a load program. When the load program is run, it could ask for
the name of the file to be unpacked, and - after checking it out
- could then load the appropriate archiver (DCOPY, ARCHIVER,
SQUARC, ARCHIVER II 2.3, ARCHIVER III, etc.) for unpacking that
particular file (assuming that there are enough "flags" -
intentional and/or accidental - to distinguish the packed
files). Just a thought on one thing that could be done if a
standard format _isn't_ worked out (although some
standardization would certainly be preferable, in my opinion).
WHY STANDARDIZATION?
In CP/M and MS-DOS, filename extensions play a critical role in
identifying file type for the operating system (e.g., COM, EXE, and BAT
files). This same device has been used by agreed convention to indicate
collections of files combined in a "library" or "archive". The extension
LBR or ARC conveys information about the structure of the combined file
to special utility programs, just as COM, EXE, and BAT convey information
to the operating system. The new extensions and their meaning were
established not by the creators of the operating system, but by users
(and vendors) working together to improve performance of the system.
In the TI system, the role of filename extensions is played by the file
type byte in the file header. Since this convention was established by
TI and has served the community very well, any elaboration of the file
type concept must be consistent with the TI protocol and be recognised
and accepted by programmers to be useful.
Al Beard proposed setting a single bit in the file type to flag
squeezed files. A recent conference on Delphi explored some of the
possibilities here. TI only used 10 out of the 256 bit combinations
available in the file type byte. Setting other bits in the file type has
virtually no effect in existing programs (thus avoiding compatibility
problems). Most important, this byte is preserved in the Paul Charlton
XMODEM implementation that has become the de facto standard for file
transfers. It is also preserved in Traver and Beard archives and will
probably be preserved in the full implementation of Barry Boone's program
(ARC III). So here we have a common point of reference with the approach
used in CP/M and MS-DOS.
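Beard's single-bit proposal can be sketched in a few lines. The bit
position chosen below is purely illustrative -- no standard position had
been agreed on -- but the point is that legacy software can mask the flag
off and still see the original TI file type:

```python
# Sketch of a reserved-bit "squeezed" flag in the TI file type byte.
# The bit position (0x20) is an assumption for illustration only.
SQUEEZED_FLAG = 0x20

def mark_squeezed(file_type: int) -> int:
    """Set the reserved bit to flag a squeezed file."""
    return file_type | SQUEEZED_FLAG

def is_squeezed(file_type: int) -> bool:
    """Test the reserved bit."""
    return bool(file_type & SQUEEZED_FLAG)

def base_type(file_type: int) -> int:
    """Mask the flag off so older programs see the original TI type."""
    return file_type & ~SQUEEZED_FLAG
```

Because existing programs ignore the unused bits, a flagged file remains
readable everywhere; only flag-aware utilities behave differently.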
There are several possible uses that could be made of the unused bits
in the file type byte. Dave Ramsey has pointed out the usefulness of
considering library and squeezing functions separately. In the case of a
library or archive, the DF128 byte in the header that identifies the
combined file could have a bit set to distinguish an archive from
normally structured DF128 files. (The Display/Fixed 128 format for
archived files has also become a de facto standard, in part to
distinguish them from the radically different concept <TI-DOS
independence> used in TI's unique "DCOPY" utility.) For each file within
an archive, a "mini" header is written to identify the type and location
of the data records that constitute the packed file. These mini-headers
preserve the file type bytes for the component files and are a natural
location for embedding flags to indicate that a file has been squeezed,
for example. Within any file type byte there are 15 variants that could
be distinguished by setting the "reserved bits". But if Dave Ramsey is
right, this modest approach might come back to haunt us by restraining
further development.
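To make the mini-header idea concrete, here is one possible layout as a
fixed-size record: a filename, the preserved file type byte, a flags
byte for the reserved-bit variants, a record count, and an offset into
the archive. The field sizes are invented for illustration and do not
match the actual ARCHIVER format:

```python
import struct

# Hypothetical 16-byte mini-header: 10-char name, file type byte,
# flags byte, record count, offset (big-endian, like the 9900 CPU).
MINI_HDR = struct.Struct(">10sBBHH")

def pack_entry(name: str, ftype: int, flags: int,
               records: int, offset: int) -> bytes:
    """Build one mini-header record for a component file."""
    return MINI_HDR.pack(name.ljust(10)[:10].encode("ascii"),
                         ftype, flags, records, offset)

def unpack_entry(raw: bytes):
    """Recover the fields of a mini-header record."""
    name, ftype, flags, records, offset = MINI_HDR.unpack(raw)
    return name.decode("ascii").rstrip(), ftype, flags, records, offset
```

A flags byte alongside the preserved type byte is what would give each
file type its fifteen distinguishable variants.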
That is the significance of any consensus on standards we can reach
now. It will affect not only the ease of the transition to more powerful
file-handling utilities, but also the course of future development.
WHAT DEVELOPMENTS?
What is at stake? Consider a few of the possibilities. Even the
simplest Traver archive already has the potential for providing
subdirectories on floppy disks. When you catalog the contents of an
archive, you are already looking at a subdirectory. The files that are
pointed to by this directory are all intact and waiting to be used, but
the handles (file headers) needed by the operating system have been
condensed. What is missing are the utility routines to read or print
text directly from the archive or to execute the programs. How easy or
difficult it will be to write these routines may well depend on the form
of the mini-header.
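Seen this way, cataloging an archive is nothing more than walking its
mini-headers and printing them as a directory listing. A sketch, with an
entry layout invented for illustration:

```python
def catalog(entries):
    """Render a list of (name, file_type, record_count) mini-header
    tuples as a subdirectory listing. Field layout is hypothetical."""
    width = max(len(name) for name, _, _ in entries)
    return "\n".join(
        f"{name:<{width}}  type={ftype:#04x}  {records} records"
        for name, ftype, records in entries)
```

The missing pieces the article describes -- reading, printing, or
executing straight from the archive -- would start from exactly this
walk over the condensed headers.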
The 9640 (having a real-time clock) uses some of the bytes reserved by
TI in the file header for two time/date stamps, a use that TI probably
anticipated. Since archives are very useful for long-term storage of
files, a date stamp can be very helpful in distinguishing similar files
of different vintages. If this potential is to be preserved, the
mini-header must also preserve some of these reserved bytes along with
the bare necessities retained in the current format. Note that these
bytes in the main archive header are not preserved in an XMODEM transfer
(FTG applies its own current date stamp when it writes a file on the user
disk), but date stamps in the mini headers within a transmitted archive
could be preserved.
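A date stamp fits comfortably in two reserved header bytes. The packing
below is the MS-DOS-style 16-bit scheme, shown only to illustrate the
idea; it is not the 9640's actual layout:

```python
def pack_date(year: int, month: int, day: int) -> int:
    """Pack a date into 16 bits: 7-bit year offset, 4-bit month,
    5-bit day (MS-DOS-style; illustrative, not the 9640 format)."""
    return ((year - 1980) << 9) | (month << 5) | day

def unpack_date(word: int):
    """Recover (year, month, day) from the packed 16-bit form."""
    return (word >> 9) + 1980, (word >> 5) & 0x0F, word & 0x1F
```

Carrying such a word in each mini-header would let per-file stamps
survive an XMODEM transfer even when the main header's bytes do not.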
Another possibility might be the use of bits or bytes within the mini-
header to indicate more elaborate processing the files have been put
through, for example Huffman or LZW encoding. This would permit files
compressed by different methods to inhabit the same archive and each file
could be correctly restored by selecting a routine based on the header
flags. Al Beard is using a similar idea employing the first few words of
the encoded file, but consolidating this information in the mini-header
may be more efficient and adaptable.
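A method flag in the mini-header would select the restoring routine per
file. The LZW pair below is the textbook byte-oriented form of the
algorithm the article credits to Barry Boone's demonstration -- not his
TI implementation -- and the method codes are invented for illustration:

```python
def lzw_encode(data: bytes) -> list:
    """Textbook LZW: grow a dictionary of seen strings, emit codes."""
    table = {bytes([i]): i for i in range(256)}
    out, w = [], b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes: list) -> bytes:
    """Rebuild the dictionary on the fly and expand each code."""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):          # the classic KwKwK case
            entry = w + w[:1]
        else:
            raise ValueError("bad LZW code")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)

# Hypothetical method codes a mini-header flag byte might carry.
METHOD_STORED, METHOD_LZW = 0, 2

def restore(method: int, payload):
    """Dispatch on the mini-header's method flag."""
    if method == METHOD_STORED:
        return payload
    if method == METHOD_LZW:
        return lzw_decode(payload)
    raise ValueError("unknown compression method")
```

This is the payoff of putting the flag in the mini-header: one archive
can mix stored, Huffman, and LZW members, and the unpacker never has to
guess.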
More ambitious possibilities include loading a file directly from an
archive on disk, unsqueezing if necessary, and then executing it
(program) or reading, printing, editing (text). How about a directory
routine for TI-Writer (or FUNL or BA-) or MY-WORD that reads a
subdirectory, marks a file, and loads the file into the editor
(unsqueezing if necessary) -- and another that repacks the edited file
(with a revised date stamp) back into the same archive? All of this is
possible with the right information in the mini-header and an appropriate
structure for the subdirectory.
CONSEQUENCES OF STANDARDS
Since TI has left the field, the future is in the hands of the users.
We must accept responsibility for any effort to assure orderly
development of new software. If standards don't permit enough room for
innovation, creative people will be frustrated and the user will suffer
with friendly but mundane software. On the other hand, a change to less
restrictive concepts requires a lot of things to make the new standard
work for the average user -- such things as conversion programs to update
existing files for compatibility. This is not the sort of thing that can
be taken as a steady diet. So it is important to agree on something we
can live with for a while.