File: UNARC.INF
Subject: Technical Information for UNARC Program
Version: 1.2
Date: June 24, 1986
------------------------------------------------------------------------------
UNARC
CP/M Archive File Extraction Utility
Copyright (C) 1986 by Robert A. Freed
All Rights Reserved
This file provides supplementary technical information of interest to
programmers and advanced users, particularly RCPM (Remote CP/M) system
operators, relating to UNARC version 1.2, which was released June 24, 1986.
Primary user documentation is provided by the associated file UNARC.DOC.
Refer to the notice in that file regarding rights of use and distribution of
this program. The file UNARC12.MSG contains a list of all files distributed
with the current UNARC release.
MEMORY REQUIREMENTS
-------------------
The UNARC.COM file (Z80 version) occupies 4K bytes of disk storage. A minimum
TPA size of slightly more than 4K bytes is needed to obtain the directory
listing of an archive file. File extraction requires additional TPA space,
with the exact amount dependent upon the version of the compression method
used to store an extracted file (reported by the "Ver" column of a directory
listing), as shown in the following table:
Version   Stowage          Min. TPA Size   Min. CP/M Size
-------   --------------   -------------   --------------
1,2       Unpacked         6K              13K
3         Packed           6K              13K
4         Squeezed         7K              14K
5,6,7     Crunched (old)   26K             33K
8         Crunched (new)   18K             25K
The alternate UNARCA.COM file (8080 version) occupies 5K bytes of disk storage
and requires approximately 1K bytes more TPA space for file extraction than is
shown above for the standard version.
Any additional TPA space is used for buffering the output file, which provides
better performance in systems with larger memories. By default UNARC does not
utilize the upper 2K bytes of TPA space (which it assumes is occupied by the
CCP), and it returns directly to the CCP instead of forcing a warm boot after
program execution. The last column in the above table, which specifies the
minimum required standard CP/M 2.2 system size, reflects this fact. For very
small systems, a patch may be made which allows UNARC to overlay the CCP (and
hence reduces the minimum required system size by 2K), at the expense of a
warm boot following each use of the program (see next section).
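The memory table above can be expressed as a small lookup. The following Python sketch is illustrative only and is not part of UNARC. The function names are hypothetical, the 7K difference between the two size columns is simply read off the table, and the 8080 adjustment uses the "approximately 1K" figure quoted above.

```python
# Minimum TPA size in K bytes for the Z80 version, keyed by the "Ver" column.
MIN_TPA_K = {1: 6, 2: 6, 3: 6, 4: 7, 5: 26, 6: 26, 7: 26, 8: 18}

# Difference between the "Min. CP/M Size" and "Min. TPA Size" columns
# (read off the table; the document does not state it as a rule).
SYSTEM_OVERHEAD_K = 7


def min_tpa_k(version, is_8080=False):
    """Minimum TPA (K bytes) needed to extract an entry of this version."""
    base = MIN_TPA_K[version]
    # UNARCA.COM (8080 version) needs approximately 1K more TPA space.
    return base + 1 if is_8080 else base


def min_cpm_size_k(version, is_8080=False, overlay_ccp=False):
    """Minimum standard CP/M 2.2 system size (K bytes) for extraction."""
    size = min_tpa_k(version, is_8080) + SYSTEM_OVERHEAD_K
    # Patching CCPSV to zero lets UNARC overlay the CCP, saving 2K
    # at the cost of a warm boot after each run.
    return size - 2 if overlay_ccp else size
```

For example, min_cpm_size_k(5) gives 33, matching the worst-case row of the table.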
Note that the worst-case memory requirement is for crunched files created by
older releases of the MS-DOS ARC program (prior to program version 5.0, dated
January 1986). The newer method of generating crunched files not only reduces
UNARC's memory requirements by 8K bytes, but offers significant speed and
compression improvements. As older ARC files are converted to the newer
compression method, the likelihood of encountering such files should diminish.
UNARC always checks the amount of available memory during execution, and it
will abort gracefully with a "Not enough memory" message if necessary.
OPTIONAL PATCHES
----------------
Several options are provided, primarily to allow safe use of UNARC on RCPM
systems. These are specified by data variables at the beginning of the
program, which simplifies patching with DDT or ZSID (or whatever) without
requiring re-assembly of the source. The following table shows the location,
symbolic name, and default value of each of these options, which are discussed
in greater detail below:
Address   Name    Default   Description
-------   -----   -------   -----------
103H      CCPSV   08H       No. of high memory pages to preserve (8 = 2K)
104H      BLKSZ   00H       Default disk allocation block size / 1K
105H      HIDRV   10H       Highest input file drive no. (1=A, 2=B, .., 10H=P)
106H      HODRV   10H       Highest output file drive no. (0 = no extracts)
107H      TYFLG   FFH       Typeout flag (0 = no typeout allowed)
108H      TYPGS   00H       No. buffer pages for file typeout (0 = maximum)
109H      TYLIM   00H       Line limit for file typeout (0 = none)
10AH      WHEEL   0106H     Address of "wheel" byte
10CH      NOTYP   'COM'     Table of typeout-excluded filetypes...
10FH              'CMD'
112H              'EXE'
115H              'OBJ'
118H              'OV?'
11BH              'REL'
11EH              '?RL'
121H              'INT'
124H              'SYS'
127H              'BAD'
12AH              'LBR'
12DH              'ARC'
130H              '?Q?'
133H              0,0,0     Room for additional filetypes...
136H              0,0,0
139H              0,0,0
13CH              0,0,0
13FH              0,0,0
142H              0,0,0
145H              0,0,0
148H              0         (End of table marker)
149H      USAGE             Start of help message
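For those who prefer to patch a copy of the COM file on another machine rather than with DDT or ZSID, the addresses in the table above translate to file offsets by subtracting 0100H, the address at which CP/M loads a COM file. The following Python sketch assumes the file image is already in memory as a bytearray. The helper names are hypothetical, and nothing here is taken from the UNARC source.

```python
LOAD_ADDRESS = 0x100  # CP/M loads .COM files at 0100H

# Single-byte options from the table above (address in the loaded program).
PATCH_ADDRESSES = {
    "CCPSV": 0x103, "BLKSZ": 0x104, "HIDRV": 0x105, "HODRV": 0x106,
    "TYFLG": 0x107, "TYPGS": 0x108, "TYLIM": 0x109,
}
WHEEL_ADDRESS = 0x10A  # two-byte word


def patch_byte(image, name, value):
    """Set a one-byte option in a COM file image (a bytearray)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("option value must fit in one byte")
    image[PATCH_ADDRESSES[name] - LOAD_ADDRESS] = value


def patch_wheel(image, address):
    """Set the WHEEL word, stored low byte first as is usual for the Z80."""
    offset = WHEEL_ADDRESS - LOAD_ADDRESS
    image[offset:offset + 2] = address.to_bytes(2, "little")
```

A typical use would read a copy of UNARC.COM into a bytearray, call patch_byte(image, "HODRV", 0) to disallow extraction, and write the result to a new file, leaving the original untouched.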
CCPSV: This value specifies the number of (256-byte) pages to reserve at the
top of the TPA. The default value (8) corresponds to a reserved area
of 2K bytes, which is appropriate for the CCP size in standard CP/M
2.2 systems. Setting this value to zero yields an additional 2K bytes
of buffer space for file extraction, but forces a warm boot system
return following program execution, and need be used only with very
small systems. (Setting this byte to zero may also be desirable with
CP/M 3.0 or any non-standard system with a permanently-resident CCP.)
BLKSZ: This value provides the default disk allocation block size (as a
multiple of 1K bytes) to use in calculating disk space requirements
for archive directory listings ("Disk" column), when no output drive
is specified. A zero value (the default) indicates that the block
size of the default (CCP) disk drive is to be used for this purpose,
unless the wheel byte is zero, in which case a value of 1 is used.
(Since 1K is the minimum disk block size in any CP/M system, this
provides the widest applicability for all remote users of an RCPM,
independent of the system's actual disk block size.)
HIDRV: Specifies the highest allowable drive no. for archive files, where
drive A is 1, drive B is 2, etc. Setting this to zero restricts input
to the default drive (i.e. disallows drive names in the archive file
parameter). Most RCPM systems need not alter this byte, since invalid
drive accesses are normally intercepted elsewhere. If this is not the
case, this value should be set to the number of available drives
(assuming all sequential drives are available, starting with drive A).
HODRV: Specifies the highest allowable output drive no. for file extraction
operations (specified as for HIDRV above). Setting this to zero will
disallow any file extraction (which is the obvious setting of
importance on RCPM systems). However, this is necessary only if a
wheel byte is not implemented. If the wheel byte is zero, a zero
value for HODRV is automatically assumed.
TYFLG: The default value of this flag (0FFH) enables typeout of a single
(unambiguous) file in an archive, if no drive name is specified (and
the filetype is not excluded by the NOTYP table). RCPM systems may
set this to zero to disallow file typeout operations. (However,
typeout is always permitted if the wheel byte is non-zero.)
TYPGS: This value specifies the number of (256-byte) pages to buffer an
extracted file during typeout operations. The default zero value
provides the maximum possible buffering (the entire TPA space), but
may cause a long delay at the start (and in the middle) of typeout of
very large files. Setting this value to 1 will minimize viewing
waits, but may cause excessive stop/start of floppy disk drive motors
on some systems (e.g. Kaypro). For RCPM systems, if this value is
zero and the wheel byte is zero, a value of 1 is assumed (which
minimizes delays for remote users).
TYLIM: This value may be set non-zero to enforce a limit on the number of
text lines (up to 255) which may be viewed for file typeout. This may
be desired by some RCPM systems, to discourage excess "browsing" in
favor of downloading of files by callers. If the wheel byte is non-
zero, no limit is enforced.
WHEEL: This word specifies the address of a "wheel" byte in memory. Most
RCPM systems implement such a byte to protect privileged functions
from remote callers and to provide greater flexibility for the sysop.
If the wheel byte is zero, no file extraction is allowed (as if HODRV
value = 0), and the BLKSZ and/or TYPGS values are assumed to be 1 (if
these are not changed from their default zero values). If the wheel
byte is non-zero, the TYFLG and TYLIM values are not enforced (i.e.
unlimited file typeout is allowed). Use of this address to specify a
wheel byte external to UNARC permits a single copy of the program to
be used by both the sysop (wheel byte non-zero) and remote users
(wheel byte zero). (ZCPR3 users should set this to the address of
their Z3WHL byte, as determined by running SHOW.COM.) If no such
wheel byte is implemented, a secure version of UNARC may still be
created by leaving WHEEL at its default value and setting the HODRV
byte to zero. (In this case, the sysop will require a second version
of the program for unrestricted use.)
NOTYP: This is a table of filetypes (3-byte strings) which are disallowed for
typeout. Storage is provided for 20 such filetypes, with 13 types
predefined. Additional filetypes may be added at the end of the
table, or existing types may be replaced. (To remove one of the predefined
types without replacing it, simply set the most significant bit of any of its bytes.)
The table must terminate with a zero byte.
USAGE: This is the on-line help message which is displayed when UNARC is run
with an empty command line. No changes should be made to this text.
In particular, the author requests that his copyright notice be
preserved (despite the ease with which it may be patched out). Note
that the help information relating to file extraction and/or typeout
is automatically removed if the corresponding operations are
disallowed (i.e. by zero values of HODRV, TYFLG, and/or wheel byte).
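The interaction between the wheel byte and the other options, as described above, can be summarized in a short Python sketch. This is a model of the documented behavior only, not a transcription of the program. The Options class and the effective function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Options:
    """Patchable option bytes, initialized to the documented defaults."""
    blksz: int = 0x00
    hodrv: int = 0x10
    tyflg: int = 0xFF
    typgs: int = 0x00
    tylim: int = 0x00


def effective(opts, wheel):
    """Return the option values actually in force for a given wheel byte."""
    privileged = wheel != 0
    return Options(
        # Wheel zero: BLKSZ and TYPGS are taken as 1 if left at zero.
        blksz=opts.blksz if (privileged or opts.blksz) else 1,
        typgs=opts.typgs if (privileged or opts.typgs) else 1,
        # Wheel zero: no extraction, as if HODRV were zero.
        hodrv=opts.hodrv if privileged else 0,
        # Wheel non-zero: TYFLG and TYLIM are not enforced.
        tyflg=0xFF if privileged else opts.tyflg,
        tylim=0 if privileged else opts.tylim,
    )
```

With the default options, a remote caller (wheel byte zero) therefore sees hodrv = 0, blksz = 1 and typgs = 1, while the sysop sees the patched values unchanged apart from unrestricted typeout.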
RE-RUNNING UNARC
----------------
The UNARC program is completely self-initializing. Once loaded, it may be
re-executed multiple times, e.g. by a zero-length COM file or with the "GO"
command in ZCPR systems.
ARCHIVE FILE FORMAT
-------------------
Component files are stored sequentially within an archive. Each entry is
preceded by a 29-byte header, which contains the directory information. There
is no wasted space between entries. (This is in contrast to the centralized
directory used by Novosielski libraries. Although random access to subfiles
within an archive can be noticeably slower than with libraries, archives do
have the advantage of not requiring pre-allocation of directory space.)
Note that UNARC uses CP/M random disk reads (BDOS function 33) to rapidly skip
through the entries of an archive. For this reason, the program requires CP/M
version 2.0 or higher and will reject attempted execution under earlier
versions of CP/M. Archive entries are normally maintained in sorted name
order (although UNARC does not utilize this fact to any advantage).
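The arithmetic behind skipping from one entry to the next can be sketched as follows, in Python. This illustrates only the offset calculation implied by the header layout and by CP/M's 128-byte records. It makes no claim about how UNARC itself organizes the computation, and the function names are hypothetical.

```python
RECORD_SIZE = 128  # CP/M logical record size in bytes


def next_header_offset(header_offset, version, compressed_size):
    """Byte offset of the header that follows the given entry."""
    # Version 1 entries have a 25-byte header; all others have 29 bytes.
    header_len = 25 if version == 1 else 29
    return header_offset + header_len + compressed_size


def record_position(offset):
    """Split a byte offset into (random record number, byte within record)."""
    return divmod(offset, RECORD_SIZE)
```

For instance, an entry at offset 0 with version 8 and a compressed size of 1000 bytes puts the next header at offset 1029, which is byte 5 of record 8.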
The format of the 29-byte archive header is as follows:
Byte 1: 1A Hex.
This marks the start of an archive header. If this byte is not found
when expected, UNARC will scan forward in the file (up to 64K bytes)
in an attempt to find it (followed by a valid compression version).
If a valid header is found in this manner, a warning message is
issued and archive file processing continues. Otherwise, the file is
assumed to be an invalid archive and processing is aborted. (This is
compatible with MS-DOS ARC version 5.12.) Note that a special
exception is made at the beginning of an archive file, to accommodate
"self-unpacking" archives (see below).
Byte 2: Compression version, as follows:
    0 = end of file marker (remaining bytes not present)
    1 = unpacked (obsolete)
    2 = unpacked
    3 = packed
    4 = squeezed (after packing)
    5 = crunched (obsolete)
    6 = crunched (after packing) (obsolete)
    7 = crunched (after packing, using faster hash algorithm) (obsolete)
    8 = crunched (after packing, using dynamic LZW variations)
Bytes 3-15: ASCII file name, nul-terminated.
(All of the following numeric values are stored low-byte first.)
Bytes 16-19: Compressed file size in bytes.
Bytes 20-21: File date, in 16-bit MS-DOS format:
    Bits 15:9 = year - 1980
    Bits 8:5  = month of year
    Bits 4:0  = day of month
    (All zero means no date.)
Bytes 22-23: File time, in 16-bit MS-DOS format:
    Bits 15:11 = hour (24-hour clock)
    Bits 10:5  = minute
    Bits 4:0   = second/2 (not displayed by UNARC)
Bytes 24-25: Cyclic redundancy check (CRC) value (see below).
Bytes 26-29: Original (uncompressed) file length in bytes.
(This field is not present for version 1 entries, byte 2 = 1.
I.e., in this case the header is only 25 bytes long. Because
version 1 files are uncompressed, the value normally found in
this field may be obtained from bytes 16-19.)
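The header layout above can be decoded with a few lines of Python. This is a minimal sketch that follows the field positions given in this file. The function names and the dictionary keys are hypothetical, and no attempt is made to reproduce UNARC's forward scan for a misplaced header.

```python
import struct


def decode_date(word):
    """Decode a 16-bit MS-DOS date word; all zero means no date."""
    if word == 0:
        return None
    return ((word >> 9) + 1980, (word >> 5) & 0x0F, word & 0x1F)


def decode_time(word):
    """Decode a 16-bit MS-DOS time word as (hour, minute, second)."""
    return (word >> 11, (word >> 5) & 0x3F, (word & 0x1F) * 2)


def parse_header(buf, pos=0):
    """Parse the archive header at buf[pos]; return None at the end marker."""
    if buf[pos] != 0x1A:
        raise ValueError("header marker 1AH not found")
    version = buf[pos + 1]
    if version == 0:
        return None  # end of file marker; remaining bytes not present
    # Bytes 3-15: nul-terminated ASCII name.
    name = buf[pos + 2:pos + 15].split(b"\x00", 1)[0].decode("ascii")
    # Bytes 16-25: size, date, time, CRC (all low byte first).
    size, date, time, crc = struct.unpack_from("<IHHH", buf, pos + 15)
    if version == 1:
        # 25-byte header: no length field, file is stored uncompressed.
        header_len, length = 25, size
    else:
        header_len = 29
        (length,) = struct.unpack_from("<I", buf, pos + 25)
    return {
        "version": version, "name": name, "compressed_size": size,
        "date": decode_date(date), "time": decode_time(time),
        "crc": crc, "original_size": length, "header_len": header_len,
    }
```

The compressed data for the entry begins header_len bytes after pos and runs for compressed_size bytes.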
SELF-UNPACKING ARCHIVES
-----------------------
A "self-unpacking" archive is one which can be renamed to a .COM file and
executed as a program. An example of such a file is the MS-DOS program
ARC512.COM, which is a standard archive file preceded by a three-byte jump
instruction. The first entry in this file is a simple "bootstrap" program in
uncompressed form, which loads the subfile ARC.EXE (also uncompressed) into
memory and passes control to it. In anticipation of a similar scheme for
future distribution of UNARC, the program permits up to three bytes to precede
the first header in an archive file (with no error message).
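The allowance for up to three leading bytes might be modeled as shown below, in Python. This is a sketch under stated assumptions: the function name is hypothetical, and the test for a "valid compression version" (any value from 0 through 8) is an assumption based on the version list given earlier, not a statement of what UNARC actually checks.

```python
def find_first_header(buf, max_prefix=3):
    """Return the offset of the first header, or None if not found.

    Up to max_prefix bytes (e.g. a three-byte jump instruction) may
    precede the header at the start of the file.
    """
    for pos in range(max_prefix + 1):
        if pos + 1 >= len(buf):
            break
        # Header marker followed by a plausible compression version
        # (the accepted range 0-8 is an assumption).
        if buf[pos] == 0x1A and buf[pos + 1] <= 8:
            return pos
    return None
```

An ordinary archive yields offset 0, while a file beginning with a three-byte jump yields offset 3.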
CRC COMPUTATION
---------------
Archive files use a 16-bit cyclic redundancy check (CRC) for error control.
The particular CRC polynomial used is x^16 + x^15 + x^2 + 1, which is commonly
known as "CRC-16" and is used in many data transmission protocols (e.g. DEC
DDCMP and IBM BSC), as well as by most floppy disk controllers. Note that
this differs from the CCITT polynomial (x^16 + x^12 + x^5 + 1), which is used
by the XMODEM-CRC protocol and the public domain CHEK program (although these
do not adhere strictly to the CCITT standard). The MS-DOS ARC program does
perform a mathematically sound and accurate CRC calculation. (We mention this
because it contrasts with some unfortunately popular public domain programs we
have witnessed, which from time immemorial have based their calculation on an
obscure magazine article which contained a typographical error!)
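For reference, a bit-at-a-time calculation using the CRC-16 polynomial named above can be sketched in Python as follows. The polynomial comes from this file. The bit ordering (least significant bit first, hence the reversed constant A001H) and the zero initial value are not stated here and are assumed, following the variant commonly catalogued as CRC-16/ARC. The function name is hypothetical.

```python
def crc16_arc(data, crc=0):
    """CRC-16 (x^16 + x^15 + x^2 + 1), processed least significant bit first.

    Pass the previous result as crc to continue over several buffers.
    """
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                # 0xA001 is the polynomial 0x8005 with its bits reversed.
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc
```

A table-driven version would be considerably faster, but the loop above keeps the relationship to the polynomial visible.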
Additional note (while we are on the subject of CRC's): The validity of using
a 16-bit CRC for checking an entire file is somewhat questionable. Many
people quote the statistics related to these functions (e.g. "all two-bit
errors, all single burst errors of 16 or fewer bits, 99.997% of all single
17-bit burst errors, etc."), without realizing that these claims are valid
only if the total number of bits checked is less than 32767 (which is why they
are used in small-packet data transmission protocols). I.e., for file sizes
in excess of about 4K bytes, a 16-bit CRC is not really as good as what is
often claimed. This is not to say that it is bad, but there are more reliable
methods available (e.g. the 32-bit AUTODIN-II polynomial). (End of lecture!)
LIMITATIONS
-----------
The current release of UNARC contains several minor limitations, which are
described below. These suggest obvious areas for improvement, which may be
addressed in future releases of the program.
(1) ZCPR-style drive/user specifications (du:) are not recognized. All file
operations are performed in the current user area.
(2) No internal support is provided for printer output to the CP/M list
device. However, all console output is generated via BDOS function 2
calls, so directory listings may be printed by typing CTRL-P at the CCP
command level before executing UNARC. But this will in general not be
suitable for file typeout, as printer control characters such as BS and
FF are filtered by the program. The best way to obtain a printer listing
of a file in an archive is to first extract the file to disk and then use
PIP (or your favorite printer utility) to generate the listing.
(3) The file typeout facility is admittedly primitive. This is intentional,
as the typeout operation was provided primarily to allow quick previewing
of files. However, if the program achieves any measure of popularity on
RCPM systems, a paged mode of typeout (a la the public domain TYPEL
program) will most certainly be added.
(4) UNARC will not properly handle archive file sizes in excess of 23 bits
(8 Megabytes). Since this is the limit for file sizes under CP/M 2.2, I
do not envision this as being a significant concern for the majority of
users. (Although, if you are a CP/M 3.0 user with a file that large, and
you have enough disk space to store it, I suppose you have a real need
for UNARC! Sorry.)
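The 8 Megabyte figure in item (4) follows from CP/M 2.2's 16-bit random record numbers and 128-byte records. The short Python sketch below works through the arithmetic. The names are hypothetical, and treating exactly 8 Megabytes as still within the limit is an assumption.

```python
RECORD_SIZE = 128      # bytes per CP/M logical record
MAX_RECORDS = 1 << 16  # record numbers 0..65535 under CP/M 2.2

# 65536 records of 128 bytes each: 2**23 bytes, i.e. 8 Megabytes.
MAX_FILE_BYTES = RECORD_SIZE * MAX_RECORDS


def exceeds_limit(archive_size):
    """True if the archive is larger than the 8 Megabyte limit."""
    return archive_size > MAX_FILE_BYTES
```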
SOURCE CODE (AND AUTHOR'S RAMBLINGS)
------------------------------------
The source program file, UNARC.Z80, uses Zilog mnemonics (my preference) and
may be assembled with either the M80 (Microsoft) or Z80ASM (SLR Systems) macro
assemblers. (Relocatable code features have been avoided, so conversion to
other assembler formats should be straightforward.)
I own a Z80-based machine and prefer not to limit its capabilities or restrict
my programming options. However, in an attempt to offer the program to users
of 8080-based systems (a vocal minority in the world of CP/M public domain
software), I have modified the source so that an alternate non-Z80 version
(UNARCA.COM) may be generated by conditional assembly. This is accomplished
through use of macro definitions which emulate the extended Z80 instruction
set on 8080/8085 machines. No attempt has been made to optimize this
emulation for either size or speed, but it does work without impacting the
efficiency of the standard Z80 version.
I distribute the source code because I am a firm believer in disseminating
information to those interested enough to inspect it (and patient enough to
download it). I hate to use a publicly-distributed program which I cannot
personally verify, and it pains me to see authors withhold their source code.
(For example, I could not have written this program without the availability
of the C language source for the MS-DOS ARC program.) However, I have seen
too many high-quality programs "hacked" to death by untalented (though perhaps
well-meaning) amateurs, and I do not wish the same fate for my work. It is
for this reason that I have placed certain caveats on the distribution of
modified versions of UNARC (see notice in UNARC.DOC). This is not meant to
discourage sincere implementors: Just drop me a line or give me a call, and I
will be happy to discuss and sanction any useful program modifications!
Bob Freed
62 Miller Road
Newton Centre, MA 02159
Telephone (617) 332-3533