home *** CD-ROM | disk | FTP | other *** search
- File: UNARC.DOC
- Subject: User Documentation for UNARC Program
- Version: 1.4
- Date: November 21, 1986
- ------------------------------------------------------------------------------
-
-
- UNARC
-
- CP/M Archive File Extraction Utility
-
-
- Copyright (C) 1986 by Robert A. Freed
- All Rights Reserved
-
-
-
- This file provides user-level documentation and operating instructions for
- UNARC version 1.4, released November 21, 1986. Refer to the notice at the end
- of this file regarding rights of use and distribution of this program.
-
- The release message file, UNARC.MSG, provides a list of all additional files
- distributed with the current UNARC release and describes the program changes
- from the previous version 1.2 release.
-
-
-
- ABSTRACT
- --------
-
- UNARC is a utility program for CP/M systems which allows the listing, typeout,
- and extraction of subfiles contained in "archive" library (*.ARC) files.
- These are commonly used for compressed file storage on remote access bulletin
- board systems which cater to users of 16-bit computers running the MS-DOS (or
- PC-DOS) operating system (e.g. IBM-PC and compatibles). UNARC provides the
- CP/M user the ability to process such files after downloading them via modem
- from these remote systems.
-
-
-
- REQUIREMENTS
- ------------
-
- UNARC requires CP/M version 2.0 or higher. The program is offered in two
- versions. The standard version, UNARC.COM, requires a Z80 (or compatible
- equivalent) processor. An alternate version, UNARCA.COM, is provided for
- older systems with 8080 or 8085 processors. Identical capabilities are
- provided by the two program versions.
-
- NOTE
-
- Although UNARCA.COM can execute on ANY system capable of
- supporting CP/M, it is larger and significantly slower than
- UNARC.COM and should be avoided by users of Z80-based systems.
-
- UNARC is written in Z80 assembly language and requires only 4K bytes of disk
- storage (5K for UNARCA). As distributed, the program requires at least a 33K
- CP/M 2.2 system size for full use of all functions (34K size for UNARCA).
-
-
-
-
-
-
-
-
- File: UNARC.DOC Page 2 of 10
- ------------------------------------------------------------------------------
-
-
- ABOUT ARC FILES
- ---------------
-
- The files which UNARC processes are the product of a utility program, ARC (a
- "freeware" product of System Enhancement Associates of Wayne, New Jersey),
- which executes on 16-bit computers running the MS-DOS (or PC-DOS) operating
- system. This program has achieved widespread popularity since it was first
- introduced in March 1985. It has become the de facto standard for file
- storage on remote access systems catering to 16-bit computer users.
-
- An archive is a group of files collected together into a single file in such a
- way that the individual files may be recovered intact. In this respect,
- archives are similar in function to libraries (*.LBR files), which have been
- commonplace on CP/M systems since 1982, when the original LU library utility
- program was introduced by Gary P. Novosielski. (However, the two file formats
- are not compatible.)
-
- The distinguishing characteristic of an ARC archive is that its component
- files are automatically compressed when they are added to the archive, so that
- the resulting file occupies a minimum amount of disk space. Of course, file
- compression techniques have also been commonplace in the CP/M world since
- 1981, when the public domain SQ and USQ "squeeze and unsqueeze" programs were
- introduced by Richard Greenlaw.
-
- The SQ/USQ programs and their numerous popular descendants utilize a well-
- known general-purpose form of data compression (Huffman coding). This
- technique, which is also utilized by the ARC program, performs well for many
- text files but often produces poor compression of binary files (e.g. object
- program .COM files). The ARC program also uses an advanced method of data
- compression, which it terms "crunching." This method (which is based on the
- Lempel/Ziv/Welch or "LZW" algorithm), performs better than "squeezing" in many
- (but not all) cases, often achieving 50% or better compression of ASCII text
- files and 15-40% compression of binary object files.
-
- ARC actually employs four different methods for storing files in an archive,
- and always chooses the one which results in the best compression for a
- particular file:
-
- (1) No compression ("unpacked"). The file is stored in its original form.
-
- (2) Run-length encoding ("packed"). Repeated sequences of 3-255 identical
- bytes are compressed into a three-byte sequence.
-
- (3) Huffman coding ("squeezed"). Each 8-bit byte (after run-length encoding)
- is encoded by a variable number of (up to 16) bits, with the bit length
- (approximately) inversely proportional to the frequency of occurence of
- the corresponding byte.
-
- (4) LZW compression ("crunched"). Variable-length strings of bytes (in
- theory, up to nearly 4000 bytes in length) are represented by a single
- 12-bit code.
-
- Note that since one of the four methods involves no compression at all, the
- resulting archive entry will never be larger than the original file.
-
-
-
-
-
-
-
-
- File: UNARC.DOC Page 3 of 10
- ------------------------------------------------------------------------------
-
-
- During its brief lifetime, the ARC program has undergone numerous revisions
- which have employed different variations on some of the above methods,
- particularly LZW compression. (The latest crunching method, introduced with
- version 5.0 of the ARC program, is superior to earlier methods, particularly
- for very short or very long files; and it almost always generates the best
- compression of all four methods for any type of file.) In order to retain
- compatibility with archives created by earlier program revisions, ARC stores a
- "version" indicator with each file in an archive. Based on this indicator,
- the latest release of the ARC program can always extract files created by
- older releases (although it will only use the latest data compression versions
- when adding new files to an archive).
-
- NOTE
-
- The current release of UNARC supports archive file versions
- generated by all releases of MS-DOS ARC through (at least) program
- version 5.12, dated February 7, 1986. It also supports archives
- generated by all releases of the following ARC-compatible MS-DOS
- programs through (at least) the indicated program versions:
-
- ARCA version 1.22, by Wayne Chin and Vernon Buerg
- PKARC version 1.2, by Phil Katz
-
- (UNARC does not recognize, but is unaffected by, the non-standard
- file commenting feature of PKARC.)
-
- Although the above discussion has emphasized the origin of archive files for
- the MS-DOS operating system, their use has recently spread to many other
- systems. Programs compatible with MS-DOS ARC have appeared for UNIX, Atari
- 68000, VAX/VMS, and TOPS-20 systems. A CP/M utility for building archive
- files will also be available in the near future.
-
- For additional information about archive files and the MS-DOS ARC utility,
- refer to the excellent documentation file, ARC.DOC, which is available from
- most remote access systems which utilize archive files. For additional
- information about the LZW algorithm (and data compression methods in general),
- refer to the article "A Technique for High-Performance Data Compression", by
- Terry A. Welch, in IEEE Computer, Vol. 17, No. 6, June 1984.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- File: UNARC.DOC Page 4 of 10
- ------------------------------------------------------------------------------
-
-
- USING UNARC
- -----------
-
- The UNARC program provides a brief on-line help message, which is invoked by
- running the program with an empty command line:
-
- A>UNARC
-
- UNARC 1.4 21 Nov 86
- CP/M Archive File Extractor
-
- Usage: UNARC arcfile [d:][afn] [N]
-
- Examples:
-
- B>UNARC A:SAVE.ARC *.* ; List all files in archive SAVE on drive A
- A>UNARC SAVE ; Same as above
- A>UNARC SAVE *.DOC N ; List just .DOC files (no pauses)
- A>UNARC SAVE READ.ME ; Typeout the file READ.ME
- A>UNARC SAVE A: ; Extract all files to drive A
- A>UNARC SAVE B:*.DOC ; Extract .DOC files to drive B
- A>UNARC SAVE C:READ.ME ; Extract file READ.ME to drive C
-
- As shown by this help display, the UNARC utility provides three capabilities:
-
- (1) Listing the directory of an archive
- (2) Extracting component files from an archive
- (3) Typing the contents of a component file at the console
-
- The particular operation to be performed is determined by the form of the file
- parameter(s) in the command line, as described separately in the sections
- which follow. The following characteristics apply to all operations:
-
- The standard CP/M terminal control characters, CTRL-S (to pause console
- output) and CTRL-C (to abort the program), may be used at any time. CTRL-K
- may also be used as an alternate for CTRL-C. Printer output to the CP/M list
- device may be obtained by typing CTRL-P at CCP command level before executing
- UNARC.
-
- In addition, by default UNARC will pause after every 23 lines of console
- output. At this time, the message "[more]" will appear at the bottom of the
- console screen. The listing may be resumed by typing any key (other than
- CTRL-S, CTRL-C, or CTRL-K, which will function as described above). If the
- space bar is used, one more line of console output will be displayed (over-
- writing the "[more]" message) and the program will again pause. If any other
- key is typed (e.g. RETURN), another 23 lines of output will be allowed to
- scroll onto the screen before the next pause. (LINE FEED may be used to
- prevent overprinting of the "[more]" line, e.g. for hard-copy terminals.)
-
- If continuous display is desired, this automatic pause feature may be disabled
- by specifying "N" at the end of the command line. The "N" must be the last
-
-
-
-
-
-
-
-
-
-
-
- File: UNARC.DOC Page 5 of 10
- ------------------------------------------------------------------------------
-
-
- command line character, and it must be preceded by a space. Also, there must
- be two file parameters on the command line. E.g., note the difference between
- the following commands:
-
- A>UNARC SAVE N ; Typeout the file N. in archive SAVE.ARC
- A>UNARC SAVE *.* N ; List all files in SAVE.ARC with no pauses
-
- The first command line parameter must specify the name of an archive file. A
- drive name and filetype are optional. The filetype, if omitted, defaults to
- "ARC" or, if no such file exists, the alternate default "ARK" is used.
-
-
-
- LISTING AN ARCHIVE DIRECTORY
- ----------------------------
-
- By default, UNARC produces a detailed console listing of the component files
- in an archive. (In fact, there is no way to suppress this listing; it is
- generated during file extraction and typeout operations as well.) If only the
- archive file name appears on the command line, UNARC will generate a complete
- directory of all component files in the specified archive file. Otherwise,
- the second command line parameter may be used to select a particular file to
- be listed (or group of files, if it contains the ambiguous file specification
- characters "*" or "?"). If no disk drive name is provided for the second
- parameter, and this parameter specifies a group of files, the directory
- listing is the only output generated by the program.
-
- A sample directory listing is illustrated here:
-
- A>UNARC CODES
-
- Archive File = CODES.ARC
-
- Name Length Disk Stowage Ver Stored Saved Date Time CRC
- ============ ======= ==== ======== === ======= ===== ========= ====== ====
- ABLE .DOC 24320 24k Crunched 8 11777 52% 30 Apr 86 10:50a 42C0
- BRAVO .COM 17152 17k Squeezed 4 14750 14% 2 May 86 4:11p 8CBD
- CHARLIE .TXT 234 1k Packed 3 99 58% 2 May 86 4:11p 8927
- ==== ======= ==== ======= ===
- Total 3 41706 42k 26626 36%
-
- This listing is equivalent to the "verbose" listing of the MS-DOS ARC program
- (with the addition of the "Disk" and "Ver" fields, which are unique to UNARC).
- The listing requires a 78-column terminal width; there is currently no "short"
- listing format.
-
- "Name" is the file name which will be generated if the file is extracted by
- UNARC on a CP/M system. (This is not necessarily the same as the name
- recorded in the archive file. Although CP/M and MS-DOS file naming
- conventions are identical, two conversions are made to guarantee file name
- validity under CP/M: Lower-case letters are converted to upper-case, and
- non-printing characters are converted to dollar signs, "$".) Archive entries
- are usually maintained (and hence listed) in alphabetic name order.
-
-
-
-
-
-
-
-
-
- File: UNARC.DOC Page 6 of 10
- ------------------------------------------------------------------------------
-
-
- "Length" is the uncompressed file length, i.e. the number of bytes the file
- will occupy if extracted to disk, exclusive of any additional length imposed
- by the CP/M file system. Note that MS-DOS permits files of arbitrary lengths
- (unlike CP/M which restricts all files to a multiple of 128 bytes).
-
- "Disk" is the actual amount of disk space required to extract the file to a
- CP/M disk, expressed as a multiple of 1K (1024) bytes. Note that this number
- is dependent on the disk data allocation block size. (CP/M permits various
- block sizes, ranging from 1K to 16K bytes. Typical sizes are 1K for single-
- density floppy disks, 2K for double-density floppies, and 4K for hard disks,
- although these values are quite system-dependent.) In the absence of an
- explicit output drive name, UNARC uses the block size of the default
- (currently "logged") disk drive (i.e. the drive which appears in the CCP
- prompt).
-
- "Stowage" is the compression method used, specified as "Unpacked", "Packed",
- "Squeezed", "Crunched", or "Unknown!". If the stowage type "Unknown!"
- appears, it most likely indicates (if not a faulty archive file) a newer
- release of the MS-DOS ARC program that supports a new compression method (or a
- new variation of an existing method). In this case, a corresponding new
- release of UNARC will be required to extract the file.
-
- "Ver" further identifies the version of compression used. Currently, UNARC
- supports versions 1-8: unpacked files can have versions 1 or 2; packed files,
- version 3; squeezed files, version 4; and crunched files, versions 5-8. The
- highest version number associated with each compression method is the one
- generated by the most recent release of the MS-DOS ARC program.
-
- "Stored" is the compressed file length, i.e. the number of bytes occupied by
- the file in the archive. (This does not include the overhead associated with
- the directory information itself, which adds an additional 29 bytes to the
- size of each component file.)
-
- "Saved" is the percentage of the original file length which was saved by
- compression; i.e., higher values indicate better compression. (The MS-DOS ARC
- documentation refers to this as the "stowage factor.") The value shown on the
- totals line applies to the archive as a whole, not including the directory
- overhead.
-
- "Date" and "Time" refer to the last file modification, as of the time it was
- added to the archive. (Date and time stamping is, of course, one of the nice
- features of MS-DOS which is lacking in standard CP/M 2.2.)
-
- "CRC" is an internal 16-bit cyclic redundancy check value which the MS-DOS ARC
- program computes when it adds a file to an archive (expressed in hexadecimal).
- As a test of file validity, UNARC re-computes this value when it extracts a
- file (see below). Note that this value is calculated by a different method
- than that used by either of the two popular public domain programs, CRCK and
- CHEK. (It is however mathematically valid as a quite reliable error-detection
- mechanism.) This value is shown in the listing for completeness only.
-
- The "Total" line is displayed only if multiple files appear in the listing.
-
-
-
-
-
-
-
-
-
-
- File: UNARC.DOC Page 7 of 10
- ------------------------------------------------------------------------------
-
-
- EXTRACTING FILES FROM AN ARCHIVE
- --------------------------------
-
- If the second command line parameter contains a disk drive name, UNARC will
- extract the selected file(s) from the archive to CP/M file(s) on the indicated
- disk drive. If only a drive name appears, all component files of the archive
- will be extracted. The following illustrates a sample archive directory
- listing as generated during a file extraction operation:
-
-
- A>UNARC CODES B:
-
- Archive File = CODES.ARC
- Output Drive = B:
-
- Name Length Disk Stowage Ver Stored Saved Date Time CRC
- ============ ======= ==== ======== === ======= ===== ========= ====== ====
- ABLE .DOC 24320 24k Crunched 8 11777 52% 30 Apr 86 10:50a 42C0
- Replace existing output file (y/n)? Y
- BRAVO .COM 17152 18k Squeezed 4 14740 14% 2 May 86 4:11p 8CBD
- Warning: Extracted file has incorrect CRC
- Warning: Extracted file has incorrect length
- Warning: Bad archive file header, bytes skipped = 10
- CHARLIE .TXT 234 2k Packed 3 99 58% 2 May 86 4:11p 8927
- ==== ======= ==== ======= ===
- Total 3 41706 44k 26616 36%
-
-
- The above listing also illustrates several warning messages which may occur
- when extracting files from an archive.
-
- The message "Replace existing output file (y/n)?" appears if a file of the
- same name already exists on the output drive. The user must answer "Y" (or
- "y") to allow the extraction to proceed (in which case, the existing file is
- unceremoniously deleted). Any other response will cause UNARC to preserve the
- existing file, bypass the extraction operation for the current file, and
- (except for a CTRL-C response) skip to the next file to be extracted (if any).
-
- The first two warning messages illustrated above are provided as a check on
- the validity of the extracted file. These indicate that either the cyclic
- redundancy check (CRC) value computed by UNARC, or the resulting extracted
- file length, does not match the corresponding value recorded in the archive
- when the original file was added to it. The final warning message occurs if
- UNARC fails to detect the proper format for the start of a new subfile, but
- can recover by skipping a certain number of bytes in the archive file. (If
- the recovery attempt fails, UNARC aborts with the message "Invalid archive
- file format.") The appearance of any of these messages most likely indicates
- that the file data has been corrupted in some way (e.g. during modem
- transmission from a remote system).
-
- Note that if the original (i.e. MS-DOS) file length was not an exact multiple
- of 128 bytes (as required by CP/M), UNARC will pad the final record of the
- extracted file with hex "1A" (ASCII CTRL-Z) bytes. This provides the correct
- end-of-file termination for text files, according to CP/M conventions.
-
-
-
-
-
-
-
-
- File: UNARC.DOC Page 8 of 10
- ------------------------------------------------------------------------------
-
-
- Also, the disk space shown in the archive directory listing will be correct
- for the specified disk drive. (In the above examples, drive A: has a 1K data
- allocation block size while drive B: has a 2K block size, which accounts for
- the differences in the two listings.) In order to determine the exact disk
- space requirements in advance of a file extraction operation, the user may
- first "log into" the desired output drive (i.e. select it as the default
- drive), and run UNARC to obtain a directory listing only. (This is a
- consideration only on systems with mixed disk drive types.)
-
- A file extraction operation may be aborted at any time by entering CTRL-C from
- the console. In this case, any partial output file will remain on disk and
- should be deleted manually following the program abort. (Any existing file of
- the same name will have already been deleted, however.)
-
-
-
- TYPING OUT A FILE IN AN ARCHIVE
- -------------------------------
-
- A console typeout of the contents of a single component file in an archive may
- be requested by specifying a non-ambiguous file name (and no disk drive name)
- in the second command line parameter. For example:
-
-
- A>UNARC CODES ABLE.DOC
-
- Archive File = CODES.ARC
-
- Name Length Disk Stowage Ver Stored Saved Date Time CRC
- ============ ======= ==== ======== === ======= ===== ========= ====== ====
- ABLE .DOC 24320 24k Crunched 8 11777 52% 30 Apr 86 10:50a 42C0
- -------------------------------------------------------------------------------
- This is file ABLE.DOC, contained within the archive CODES.ARC. Typeout will
- proceed until the end of this file or may be aborted by CTRL-C.....
-
-
- The specified file is assumed to contain valid ASCII text data. In
- particular, all bytes are masked to seven bits, and all ASCII control
- characters are ignored except for HT (horizontal tab, which is expanded to
- blanks with assumed tab stops at every eighth column), LF, VT or FF (line
- feed, vertical tab or form feed, which generate a new typeout line), and SUB
- (CTRL-Z, which by CP/M convention indicates end-of-file and terminates the
- typeout). Note that BS (backspace) and CR (carriage return) are ignored, so
- that text will not be obscured within files which utilize these for over-
- printing (i.e. when directed to a printer).
-
- The following filetypes, which are usually associated with binary (non-text)
- data, are specifically excluded from typeout operations: COM, EXE, OBJ, OV?,
- REL, ?RL, INT, SYS, BAD, LBR, ARC, ARK, ?Q?, and ?Z?. If one of these types
- is specified, only the directory information for the requested file is listed.
-
-
-
-
-
-
-
-
-
-
-
-
- File: UNARC.DOC Page 9 of 10
- ------------------------------------------------------------------------------
-
-
- PROGRAM OPTIONS
- ---------------
-
- UNARC provides several options which may be used to tailor the program for
- specific non-universal requirements. Many of these are intended for RCP/M
- (Remote CP/M) system operators, to allow generation of a secure version of
- UNARC which can be used by remote callers for purposes of archive directory
- listing and/or file typeout only (but not file extraction). Others are
- provided for specialized non-standard CP/M systems and need not concern the
- majority of users running CP/M 2.2, CP/M 3.0 (CP/M Plus), or ZCPR3/ZRDOS
- systems. Additional options provide user preference features (such as the
- number of screen lines between console output pauses, or the list of filetypes
- excluded from typeout operations).
-
- All of these options are described in UNARCOVL.ASM, an assembly language
- source file that can be edited and assembled to generate a HEX-format overlay
- for easy patching of the UNARC.COM or UNARCA.COM program files. Complete
- details are provided for technically-oriented users in UNARCOVL.ASM. However,
- the default options in the distributed program files are suitable for the
- majority of users with standard CP/M operating systems.
-
-
-
- AUTHOR'S NOTE
- -------------
-
- I undertook writing the UNARC program to satisfy my curiosity about software
- developments in the MS-DOS/PC-DOS world. At the time I began work on UNARC,
- the MS-DOS ARC program had been in existence for over a year and had achieved
- widespread popularity and acceptance in the 16-bit community. Unfortunately,
- the lack of a compatible equivalent for CP/M systems rendered a large amount
- of public domain software inaccessible to 8-bit users such as myself. (Note
- that 16-bit software can indeed be of interest to users of 8-bit systems, e.g.
- Pascal and C language programs.)
-
- Also, an increasing number of RCP/M systems now cater to both 8-bit and 16-bit
- users. Since the release of UNARC 1.0 (May 3, 1986), I have been encouraged
- to see that the program has found a welcome home on many such systems.
- Special thanks are due to Irv Hoff and Norman Beeler for providing archive
- file support in the KMD20 and LUX52 series of programs, respectively. With
- the increasing popularity of .ARC files on many different computer systems, I
- believe that continued such support of this compression format is both
- desirable and inevitable for CP/M systems. At the time of this writing I am
- near completion of NOAH, a companion program to UNARC which will allow CP/M
- users to generate ARC-compatible files.
-
- Bob Freed
- November 21, 1986
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- File: UNARC.DOC Page 10 of 10
- ------------------------------------------------------------------------------
-
-
-
- NOTICE
-
- The UNARC program and its associated documentation is the copy-
- righted property of its author -- it is NOT in the public domain.
- HOWEVER... Free use, distribution, and modification of this
- program is permitted (and encouraged), subject to the following
- conditions:
-
- (1) Such use or distribution must be for non-profit purposes only.
- (2) The author's copyright notice may not be altered or removed.
- (3) Modifications to this program or its documentation files may
- not be distributed without notification of and approval by
- the author.
-
- No fee is requested or expected for the use and distribution of
- this program subject to the above conditions. The author reserves
- the right to modify these conditions for any future revisions of
- this program. Questions, comments, suggestions, commercial
- inquiries, and bug reports or fixes are welcomed by the author:
-
- Robert A. Freed
- 62 Miller Road
- Newton Centre, MA 02159
- Telephone (617) 332-3533
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-