- This file contains a write-up of the more technical aspects of
- uuencoding and uudecoding. First read the file UUSER.TXT, then read
- this for more details.
-
- Documentation for UUENCODE/DECODE 95 (v36)
-
- UU-encoding is a way to code a file which may contain any characters
- into a standard character set that can be reliably sent over diverse
- networks.
-
-
- THE CHARACTER ENCODING:
-
- The basic scheme is to break groups of 3 eight bit characters (24 bits)
- into 4 six bit characters and then add 32 (a space) to each six bit
- character which maps it into a readily transmittable character. Another
- way of phrasing this is to say that the encoded 6 bit characters are
- mapped into the set:
- `!"#$%&'()*+,-./0123456789:;<=>?@ABC...XYZ[\]^_
- for transmission over communications lines.
-
- As some transmission mechanisms compress or remove spaces, spaces are
- changed into back-quote characters (a 96). (A better scheme might be to
- use a bias of 33 so the space is not created, but this is not done.)
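-
- As an illustration only, the mapping just described can be sketched in
- a few lines of Python. The function names are made up for this example
- and are not taken from the programs documented here.
-
```python
def uu_char(value):
    """Map a 6-bit value (0..63) to its UU character; space becomes back-quote."""
    return "`" if value == 0 else chr(value + 32)

def uu_encode_group(b0, b1, b2):
    """Split three 8-bit values (24 bits) into four 6-bit values and map them."""
    bits = (b0 << 16) | (b1 << 8) | b2
    sixes = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(uu_char(v) for v in sixes)

print(uu_encode_group(ord("C"), ord("a"), ord("t")))  # prints 0V%T
```
-
- The three characters "Cat" give the 6 bit values 16, 54, 5 and 52;
- adding 32 to each gives the characters 0, V, % and T.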
-
- A newer, less popular, encoding method, called XX-encoding uses the set:
- +-01..89ABC...XYZabc...xyz
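-
- A rough Python sketch of the two sets follows. It assumes the XX set
- is ordered exactly as written above (plus, minus, digits, upper case,
- lower case); the names UU_SET, XX_SET and uu_to_xx are invented for the
- example. It only re-maps the characters of encoded lines and knows
- nothing about "begin" or "end" lines.
-
```python
import string

# UU set: back-quote stands in for space, then the 63 characters '!' through '_'.
UU_SET = "`" + "".join(chr(c) for c in range(33, 96))

# XX set: plus, minus, digits, upper case, lower case (64 characters in all).
XX_SET = "+-" + string.digits + string.ascii_uppercase + string.ascii_lowercase

def uu_to_xx(text):
    """Re-map the characters of one UU-encoded line to the XX set, value for value."""
    return text.translate(str.maketrans(UU_SET, XX_SET))

print(uu_to_xx("0V%T"))  # prints Eq3o
```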
-
- In my opinion, XX-encoding is superior to UU-encoding because it uses
- more "normal" characters that are less likely to get corrupted. In fact
- several of the special characters in the UU set do not get through an
- EBCDIC to ASCII translation correctly. Conversely, an advantage of the
- UU set is that it does not use lower case characters. Nowadays both
- upper and lower case are sent with no problems; maybe in the
- communications dark ages, there was a problem with lower case.
-
- This "UU" encode/decode pair can handle either XX or UU encoding. The
- encode program defaults to creating a UU encoded file; but can be run
- with a "-x" option to create an XX encoding.
-
- The decode program defaults to autodetect. However the program can get
- confused by comment lines preceding the actual encoded data. The decode
- mode can be forced to UU or XX with the "-u" or "-x" parameter.
-
- Another option is for the character mapping table to be inserted at the
- front of the file. The format for this is discussed later. The table
- parameters are detected and used by this decode program. (A table will
- override the "-x" or "-u" parameters.) The encode program can be run
- with a "-t" option which tells it to put the table into the encoded
- file.
-
- A third encode mapping is the one used by Brad Templeton's ABE program.
- This is not handled by these programs as the check and control
- information surrounding the actual encoded data is in a different form.
-
- From a theoretical view, this encoding breaks 24 bits down into four
- base 64 digits. Note that 64**4 = 2**24. The result is 24 bits in for
- 32 bits out, a 33% size increase. Note that 85**5 > 2**32. Also note
- that there are 94 transmittable ASCII characters (from 0x21 through
- 0x7e). Thus modulo 85 encoding (the atob encoder) transforms 32 bits
- to 5 ASCII chars or 40 bits for a 25% size increase.
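-
- The arithmetic can be checked directly. The short Python fragment
- below is only a worked calculation; the variable names are arbitrary.
-
```python
# Base 64 style (UU, XX): 3 input bytes become 4 output characters.
assert 64 ** 4 == 2 ** 24
uu_overhead = (4 * 8) / (3 * 8) - 1      # 32 bits out for 24 bits in

# Base 85 style (atob): 4 input bytes become 5 output characters.
assert 85 ** 5 > 2 ** 32                 # five base 85 digits are enough
assert 85 ** 4 < 2 ** 32                 # four would not be
b85_overhead = (5 * 8) / (4 * 8) - 1     # 40 bits out for 32 bits in

printable = 0x7E - 0x21 + 1              # ASCII characters, space excluded

print(f"{uu_overhead:.0%} {b85_overhead:.0%} {printable}")  # prints 33% 25% 94
```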
-
- The trade off in the modulo 85 encoding is that many communications
- systems do not reliably transmit 85 ASCII characters. The tilde, caret,
- brackets, and sometimes upper or lower case frequently get corrupted.
-
- There are two other popular encoding techniques. One is BinHex used on
- Apple Computers. The current version is BinHex 4.0. BinHex uses
- another mapping into 64 characters. The first encoded line in a BinHex
- file is an encoded structure that contains the file name, size,
- checksum, date and time. The remaining lines are encoded data.
-
- The other technique that I have seen is BinMail used on Unisys A-series.
-
-
- COMPOSING A LINE OF ENCODED CHARACTERS:
-
- A small number of eight bit characters are encoded into a single line
- and a count of those characters is put at the start of the line. (Most
- lines in an encoded file carry 45 input characters. When you look at a
- UU-encoded file note that most lines start with the letter "M". "M" is
- decimal 77 which, minus the 32 bias, is 45.)
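-
- A minimal Python sketch of building one such line is given below. It
- assumes the usual practice of padding the last group with zero bytes;
- the names uu_char and uu_encode_line are hypothetical and the code is
- not the Turbo Pascal source of these programs.
-
```python
def uu_char(value):
    """Map a 6-bit value to its UU character; space becomes back-quote."""
    return "`" if value == 0 else chr(value + 32)

def uu_encode_line(chunk):
    """Encode up to 45 bytes as one line: a count character, then the data."""
    if len(chunk) > 45:
        raise ValueError("a UU line carries at most 45 input bytes")
    padded = chunk + b"\0" * (-len(chunk) % 3)   # pad to a multiple of 3 (assumption)
    out = [uu_char(len(chunk))]                  # count of input bytes, biased by 32
    for i in range(0, len(padded), 3):
        bits = int.from_bytes(padded[i:i + 3], "big")
        out.extend(uu_char((bits >> s) & 0x3F) for s in (18, 12, 6, 0))
    return "".join(out)

print(uu_encode_line(b"Cat"))        # prints #0V%T
print(uu_encode_line(bytes(45))[0])  # prints M
```
-
- A full line of 45 input characters is therefore 61 characters long:
- the "M" plus 60 encoded characters.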
-
- BinHex does not use this count character; every encoded line contains
- 64 characters, except the last, which is limited by the size obtained
- from the first line.
-
- This encode program optionally puts a check character at the end of each
- line. The check is the sum of all the encoded characters, before adding
- the mapping, modulo 64.
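-
- The check described above might be computed as in the Python sketch
- below. Two points are assumptions and are not stated in this document:
- that the count character is left out of the sum, and that the check
- value is mapped to a character the same way as the data. The function
- names are invented for the example.
-
```python
def uu_value(ch):
    """Undo the UU mapping; back-quote and space both give 0."""
    return (ord(ch) - 32) & 0x3F

def line_check_char(encoded_data):
    """Sum the 6-bit values of the data characters, modulo 64, and map the result."""
    total = sum(uu_value(ch) for ch in encoded_data) % 64
    return "`" if total == 0 else chr(total + 32)   # assumed: same mapping as data

# "0V%T" holds the values 16, 54, 5 and 52; their sum is 127, and 127 mod 64 is 63.
print(line_check_char("0V%T"))  # prints _
```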
-
- Note: Horton 9/1/87 UUENCODE has a bug in the line check algorithm; it
- uses the sum of the original, not the encoded characters. This decode
- program accepts either form of line check character.
-
- In previous versions (4.13 and lower) the line check character was
- generated by default by this encode program and was suppressed with the
- "-L" option. One reason to suppress them is if they will be decoded by
- one of the old Horton decoders. Most decoders either accept this form
- of check or simply stop looking after the line length is exhausted. My
- feelings are mixed about the line checksums because errors of this type
- essentially never occur.
-
- Given modern, error-free communications systems and the CRC checks on
- the entire file (see below) I have made the default for uuencoding to
- have NO line level check characters effective version 4.21. The "-L"
- option on uuencode turns on generation of line checksums. If you have a
- really bad communications system and you want to isolate a problem, turn
- them on.
-
- Uudecode automatically checks for the presence of line checksums, so
- the default for uudecode is to leave line level checks on. If there are
- problems, the "-L" option for uudecode turns them off. Sometimes there
- is junk at the end of the line which causes spurious line checksum
- errors.
-
- I have encountered various other ways that encoders end lines. One
- encoder put an "M" at both the start and end of the line. Another used
- a line count character. This decode program checks all of these. I
- would not be surprised if some encoder out there ends lines with
- sequential astrological symbols. If you encounter some other weird form
- of encoded file, let me know. (The -L option turns line level checking
- off.)
-
-
- PACKAGING THE LINES INTO FILES:
-
- The lines of encoded data can be preceded by comments and by network
- addressing information. The encoded data is directly preceded by a line
- containing:
-
- begin <file-mode> <file-name>
-
- This line is created by the encoding program. The decode program scans
- the file looking for "begin" in column 1. The encoded data starts on
- the following line.
-
- Some encoders put file time and date information on the begin line:
-
- begin <file-mode> <file-name> <date> <time>
-
- My UUdecoder will accept this form of begin line, but does not use the
- time and date information.
-
- The final end of encoded data is an encoded line with zero encoded
- characters (a back-quote), followed by a line containing "end".
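-
- To make the packaging concrete, here is a simplified Python sketch of
- a decoder for one "begin" ... "end" block. It is far less careful than
- the decode program described in this document: it does no checksum or
- section handling, and it treats everything after the file mode as the
- file name, so a begin line carrying date and time would need further
- splitting. All names in it are hypothetical.
-
```python
def uu_value(ch):
    """Undo the UU mapping; back-quote and space both give 0."""
    return (ord(ch) - 32) & 0x3F

def uu_decode_line(line):
    """Decode one encoded line, using the count character to trim padding."""
    count = uu_value(line[0])
    needed = ((count + 2) // 3) * 4
    data = line[1:].ljust(needed)        # restore trailing blanks that were stripped
    out = bytearray()
    for i in range(0, needed, 4):        # anything past 'needed' is ignored
        v = [uu_value(c) for c in data[i:i + 4]]
        bits = (v[0] << 18) | (v[1] << 12) | (v[2] << 6) | v[3]
        out += bits.to_bytes(3, "big")
    return bytes(out[:count])

def uu_decode_text(text):
    """Return (mode, name, data) for the first begin...end block in text."""
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("begin "):    # "begin" in column 1
            _, mode, name = line.split(" ", 2)
            break
    else:
        raise ValueError("no begin line found")
    data = bytearray()
    for line in lines:
        if line == "end":
            return mode, name, bytes(data)
        if line:
            data += uu_decode_line(line)
    raise ValueError("no end line found")

sample = "a comment line\nbegin 644 cat.txt\n#0V%T\n`\nend\n"
print(uu_decode_text(sample))  # prints ('644', 'cat.txt', b'Cat')
```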
-
- For integrity checking, some encode programs insert checksums for the
- entire file. This decode program tries to check for all known types of
- file checksums. This is discussed in more detail below.
-
- This encode program puts a header line, containing the section number
- and file name, in front of every section:
-
- "section <number> of uuencode of file <file name>"
-
- At the end of a section the encode program inserts a line containing
- checksum and file size information. This can be suppressed with the
- "-c" option.
-
- Other encoders use a variety of section lines:
-
- The format of the Archive-name line is:
- "Archive-name: <name>/part<number>"
- for example:
- Archive-name: diskutl/part02
-
- The format of the part line is:
- <name> part<number>/<max-number>
- for example:
- diskutl part02/03
-
- WinCode uses:
- [ section: 1/2 file: diskutl.exe . . . .
-
- enuu uses:
- section 001/002 diskutl.exe . . . .
-
- This program checks for consistency of these names and numbers as of
- release 5.0. The problem is distinguishing random text from valid
- lines.
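-
- As a rough illustration of telling section lines from random text, the
- Python sketch below recognizes only the Archive-name and part line
- formats shown above and applies one simple consistency test. The
- patterns and the function name are assumptions made for this example;
- the tests actually made by this program are not reproduced here.
-
```python
import re

ARCHIVE_NAME = re.compile(r"^Archive-name:\s*(?P<name>\S+)/part(?P<part>\d+)\s*$")
PART_LINE = re.compile(r"^(?P<name>\S+)\s+part(?P<part>\d+)/(?P<total>\d+)\s*$")

def parse_section_line(line):
    """Return (name, part, total or None), or None if the line is not accepted."""
    m = ARCHIVE_NAME.match(line)
    if m:
        return m["name"], int(m["part"]), None
    m = PART_LINE.match(line)
    if m:
        part, total = int(m["part"]), int(m["total"])
        if 1 <= part <= total:           # reject inconsistent numbers
            return m["name"], part, total
    return None

print(parse_section_line("Archive-name: diskutl/part02"))  # prints ('diskutl', 2, None)
print(parse_section_line("diskutl part02/03"))             # prints ('diskutl', 2, 3)
```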
-
- For each line that uudecode thinks is a "section" line, tests are made
- to validate the current section number, the maximum section number and
- the file name. The program is conservative and may sometimes
- erroneously give an "invalid section line" type of error. Inspect the
- file; if uudecode made a mistake, edit or delete the indicated line; and
- continue. If the problem appears to be a uudecode problem, not just
- some random comment lines that caused a one time problem, please contact
- me.
-
- All the "integrity fields" (the checksum, the line check, and the
- section header line) are inserted in a way that they will be ignored by
- other UUDECODE programs that cannot handle them. This decode program
- does not require any of these fields; if not present, integrity checking
- is not done. This program pair is 100% downward compatible!
-
-
- FILE NAMES:
-
- See UUSER.TXT for a discussion of file naming conventions.
-
-
-
- DECODE and VALID LINES:
-
- The information below is to help you solve infrequent problems when
- decoding files. Normally you do not have to be concerned with any of
- this stuff.
-
- UUdecode sometimes gets confused and thinks header lines are encoded
- data. Sometimes this is because the separator line between sections
- (the "cut" line) is indistinguishable from valid decodable data. (An
- example is the line "---" used as a cut line on several DOS BBS
- systems.) You can tell UUdecode that a specific line is a cut line and
- not a decodable line with the -Z option:
-
- uudecode -Z "---" myfile
-
- Other times there is not a cut line between file sections or there is
- some other problem with the file. If so, edit the file and try again.
-
- When decode encounters a premature end-of-file or some data which is not
- decodable, it assumes the end of a file section. Decode is conservative
- when it encounters data it cannot decode (better an error than a bad
- file).
-
- Usually this undecodable data is valid "trailer" data put at the end of
- file for data transmission purposes. However the file may be bad, so
- decode continues to scan the file. If decode then encounters a line
- which is decodable, it assumes the file is bad.
-
- When decode encounters a valid end of file section it must get the next
- file in sequence. If the file name ends with a number, decode tries the
- next file name in numeric sequence. Otherwise decode asks for a file
- name. If this file does not contain decodable data, decode asks for
- another file to try.
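-
- Stepping to the next file name in numeric sequence could look like the
- Python sketch below. Keeping the zero padding of the number is an
- assumption made for the example, and next_file_name is not a routine
- from this program.
-
```python
import os
import re

def next_file_name(name):
    """PRG1.UUE -> PRG2.UUE; returns None when the name does not end in a number."""
    stem, ext = os.path.splitext(name)
    m = re.search(r"(\d+)$", stem)
    if not m:
        return None                      # the caller would have to ask for a name
    digits = m.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))   # keep zero padding (assumption)
    return stem[:m.start()] + bumped + ext

print(next_file_name("PRG1.UUE"))    # prints PRG2.UUE
print(next_file_name("PART09.UUE"))  # prints PART10.UUE
```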
-
- If multiple sections are saved in a single file, each section must have
- some type of section line for validation. Decode builds a table of
- section information so it can go back and reread if sections are not
- saved in order.
-
- The "SECTION" line inserted by the UUENCODE program is used for validity
- checking only. If not present, decode will accept any file containing
- encoded lines.
-
-
- OTHER FILE FORMS:
-
- Sometimes files are wrapped in shell archives that automatically check
- sequencing and call uudecode for you on Unix systems. If you prefer to
- download the raw files to MS-DOS, uudecode 5.33 will filter simple shell
- scripts, that use the Unix 'sed' command, and decode the data
- automatically.
-
- There is one more rarely used feature of ENCODE: many input files can be
- encoded into one large encode file. (I have never seen this used.) The
- end of an input file is a zero length encoded line, followed by another
- "begin" line instead of by an "end" line. This decode program will
- decode this sort of file; but the encode will only handle a single input
- file.
-
-
- FILE LEVEL CHECKSUMS:
-
- There are three types of file checksums found in encoded files:
-
- UUENCODE 2.14 and below inserted lines that gave the section
- size and the original input file size. This is supplanted
- by a better technique in 3.07; but 3.07 UUDECODE still checks
- and validates the old form.
-
- UUENCODE 3.07 and Rahul Dhesi's encode scripts compute a Unix
- "sum -r" on the encoded sections and on the original input file.
- A difference is that UUENCODE 3.07 puts the expected "sum -r"
- and size at the end of a file while Rahul's scripts put them at
- the beginning. This UUDECODE analyzes either.
-
- The third form of checksum is a full 32 bit CRC that Rahul's
- script inserts. My code does not handle this. Rahul has written
- the BRIK program to check them. If there is a "sum -r" failure,
- BRIK analysis should be considered.
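-
- For reference, the 16 bit rotating checksum printed by the BSD-style
- "sum -r" can be sketched in Python as follows. Exactly which bytes the
- encoders feed into it (for example, how line ends are counted) is not
- covered by this sketch, and the function name is made up.
-
```python
def sum_r(data):
    """16-bit rotating checksum as computed by the BSD-style 'sum -r'."""
    checksum = 0
    for byte in data:
        checksum = (checksum >> 1) | ((checksum & 1) << 15)   # rotate right one bit
        checksum = (checksum + byte) & 0xFFFF                  # add byte, keep 16 bits
    return checksum

print(sum_r(b"a"))   # prints 97
print(sum_r(b"ab"))  # prints 32914
```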
-
- Several encoders put in a line containing just the original file size.
- My uudecode checks these.
-
-
- TABLE LINES:
-
- Some encoded files put the mapping used at the front of the encoded
- file, just in front of the "begin" line. The format for this is:
-
- table
- first 32 characters
- second 32 characters
-
- All this starts in column 1.
-
- If decode encounters a table specification, it uses it and overrides any
- command line parameters. Encode will create the table lines if run with
- the "-t" parameter.
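-
- A small Python sketch of reading such a table is shown below. It
- assumes the word "table" stands alone on its line and that the two
- following lines each supply exactly 32 characters; parse_table is a
- hypothetical name.
-
```python
def parse_table(lines):
    """Find a 'table' line; return {character: 6-bit value} from the next two lines."""
    for i, line in enumerate(lines):
        if line.rstrip() == "table":
            chars = lines[i + 1][:32] + lines[i + 2][:32]
            if len(chars) != 64 or len(set(chars)) != 64:
                raise ValueError("table must list 64 distinct characters")
            return {ch: value for value, ch in enumerate(chars)}
    return None                          # no table: fall back to UU or XX

xx_table = [
    "table",
    "+-0123456789ABCDEFGHIJKLMNOPQRST",
    "UVWXYZabcdefghijklmnopqrstuvwxyz",
    "begin 644 cat.txt",
]
mapping = parse_table(xx_table)
print(mapping["+"], mapping["E"], mapping["z"])  # prints 0 16 63
```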
-
-
- COMPLETION CODES:
-
- On successful completion, UUDECODE sets ERRORLEVEL to 0. If there are
- any problems, ERRORLEVEL is set to non-zero.
-
- The purpose of "-e" is to automatically run an un-archiver (like PKZIP
- or ARJ) when UUDECODE successfully completes. If the "-e" option is
- given, UUDECODE calls BAT file UNARCUUE on successful completion;
- UNARCUUE is passed five parameters:
-
- the filename decoded (with path but no extension),
- the file extension,
- the input file name (with path but no extension),
- the input file extension that is used,
- the number of sections.
-
- Normally the file extension tells which un-archiver to call. The
- UNARCUUE BAT file can test these parameters and call the necessary
- un-archiver. If UNARCUUE is called, the return code from UUDECODE is
- the return code passed back from UNARCUUE. Note: one user had a problem
- in that the routines called by UNARCUUE set the errorlevel to 1 which
- was passed back to be the return code from UUDECODE.
-
- The "-E" (upper case) option is like "-e" but you can supply the name of
- the file to execute.
-
- If you are running an automated system, the -y or -n options, which
- force a "Y" or "N" response to messages, may be helpful.
-
- BUGS and PROBLEMS:
-
- I try to make this program as good as possible. If you find a problem,
- please send me a diskette. You can mail a 3 1/2" diskette in a regular
- envelope, with no special protection, with a single stamp.
-
-
- CONCLUSION:
-
- This works well for me. On UNIX I find a program I want in 3 sections:
- PRG1, PRG2, PRG3.
- I copy the three files down to my PC as PRG1.UUE, PRG2.UUE, and PRG3.UUE. I
- then just enter UUDECODE PRG and the thing decodes.
-
-
- Done privately and not for profit (freeware). Suggestions appreciated.
- The programs are written in Turbo Pascal 5.5 with about 5% TASM for speed.
- The source is not public domain. If included in your for profit product,
- please contact me.
-
-
- Richard Marks
- 931 Sulgrave Lane
- Bryn Mawr, PA 19010
-
- Copyright (c) Richard E. Marks, Bryn Mawr, PA, 1992, 1993, 1994, 1995
-
- Also Copyright program name of
- UUENCODE 95, UUDECODE 95, UUCODE 95, UUWIN 95,
- UUENCODE 96, UUDECODE 96, UUCODE 96, UUWIN 96,
- UUENCODE 97, UUDECODE 97, UUCODE 97, UUWIN 97,
- UUENCODE 98, UUDECODE 98, UUCODE 98, UUWIN 98,
- UUENCODE 99, UUDECODE 99, UUCODE 99, UUWIN 99
- Copyright (c) by Richard E. Marks, Bryn Mawr, PA 1995
-
- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
-
- Change log (started with 5.13):
- 95 (v36)
- Changed name to reflect year.
- -N option - always reply "NO" for automated bat files.
- handled nucode headers (not checksums)
- Fix problems involving leading blanks and GEnie (GEnie randomly inserts
- leading blanks on lines, sometimes leading blanks are valid, this requires
- a sophisticated analysis).
- 5.33
- Fix autodetect of xx encode format introduced in 5.32
- Fix problem with part line at very end of file
- 5.32
- Analysis of size lines used by other encoders. Analysis of uupost's
- END lines.
- Better 'sed' handling to work with Sun posting routines
- Validation of file name on 'begin' line
-
- 5.28
- split to .001, .002.
- Better analysis of begin line - line following must be valid encoded data.
- Improved (again) part line analysis.
- Better file name parameter analysis.
-
- 5.25
- Fix memory overflow bug in 5.24.
-
- 5.24
- Ignore pad characters inserted by some comm systems.
-
- 5.22 & .23
- Improve analysis of "part" lines to accept form used in bin.pictures
- group and by mail server on Garbo.
- Fixes problem with encoded files that use blanks rather than backquotes.
-
- 5.20
- Z command line option to specify a "cut" line between multiple sections.
- Needed if cut line is a valid decodable line (of low probability) which
- the user chooses to be interpreted as a "cut" line. Plus other
- improvements in detecting end of section.
-
- 5.16
- Encode will split to a minimum of 75 (was 150) lines.
- Passes more info to UNARCUUE
-
- 5.15
- Fixes a problem with trailing blanks on lines.
-
- 5.14:
- Fixes a minor bug in which a redundant error message was produced when
- decoding single section files.
-
-
- 5.13 VERSUS 5.10:
-
- 5.13 decode has a command line option that disables all interactive
- responses to make it more useable from some BBS systems. Examine the
- "y" and "Y" options.
-
- 5.13 can increment the number on files up to five digits. The prior
- limit was two digits. You can now save files with names based on news
- article numbers.
-
- 5.13 can decode files encoded into 100 or more parts. A restriction is
- that if there are more than 100 parts, the sections MUST be in order.