THE PURPOSE AND USE OF ENCODED BINARY (BOO) FILES
Notes by R.N. Folsom 17 October 1993
[Including material taken from Columbia University Kermit Documentation]
Department of Economics
San Jose State University
San Jose, California 95192-0114
Voice: (408)649-6383, 924-5418 or -5400
Bitnet: Folsom@SJSUvm1.bitnet Internet: Folsom@SJSUvm1.SJSU.edu
Binary files may include control codes and eight bit characters. Binary
files include not only .EXE and .COM executable program files and supporting
"overlay" program files, and files compressed using ARC, PAK, ZOO, ZIP, ARJ,
LHA, or similar utilities, but also wordprocessing, spreadsheet, and database
files that are formatted using control codes and other special characters. To
transmit such binary files electronically over a network usually requires some
special network facility, such as the File Transfer Protocol (ftp). But
typically not everyone on the network, and indeed not every portion of the
network, has access to every (or the same) file transmission facility.
As an alternative, it is tempting to send a binary file as part of an
ordinary electronic mail ("E-mail") message. But unfortunately, most electronic
mail systems cannot transmit binary files. Electronic mail typically transmits
only ordinary "printable" characters: alphabetic, numeric, and punctuation
characters --- no control codes, and only ASCII characters (each described with
only seven bits). A file containing only such ordinary "printable" characters is
often called a text file.
(Control characters have a numeric decimal value less than 32, and the delete
character has the value 127; special eight-bit characters have a numeric decimal
value greater than 127. Thus a text file's printable characters all have a
numeric decimal value between 32 and 126, inclusive.)
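As a rough illustration of the distinction just described, the following Python
sketch classifies byte values and checks whether a file's contents would pass
through a text-only mail system. The function names are hypothetical. Two
details are assumptions of the sketch, not statements from the Kermit
documentation: the delete character (127) is treated as a control code, and tab,
line feed, and carriage return are accepted because they delimit lines in
ordinary text files.

```python
# Minimal sketch; the names below are illustrative, not from any Kermit tool.

LINE_CONTROLS = {9, 10, 13}  # assumption: tab, LF, and CR are acceptable in text


def classify(byte: int) -> str:
    """Return 'control', 'printable', or 'eight-bit' for one byte value."""
    if byte > 127:
        return "eight-bit"
    if byte < 32 or byte == 127:  # assumption: DEL (127) counts as a control
        return "control"
    return "printable"


def is_mailable_text(data: bytes) -> bool:
    """True if every byte is printable ASCII or a line-delimiting control."""
    return all(classify(b) == "printable" or b in LINE_CONTROLS for b in data)


if __name__ == "__main__":
    print(is_mailable_text(b"Hello, world!\r\n"))  # ordinary text
    print(is_mailable_text(b"MZ\x90\x00\x03"))     # bytes such as an .EXE holds
```

A file for which is_mailable_text returns False is a candidate for encoding
before it is mailed.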
However, it is possible to electronically mail a binary file that has
been "encoded" into a text file. The encoded file is temporary: after the
transmission is complete, the encoded file is restored to its original binary
form.
For example, Columbia University routinely distributes binary files ---
particularly executable versions of its Kermit communication program --- over
electronic mail networks (and using tape formats that do not allow binary files)
by encoding the binary file into a ".boo" file, where boo is short for
bootstrap. A .boo file is a text file (no control codes, and only seven-bit
characters) which *is* suitable for electronic transmission. Although
Columbia's documentation does not say so explicitly, the .boo file technique can
transmit as a text file not only Kermit programs but *any* binary file.
The procedure is as follows: The sender encodes the binary file into
a .boo file using the program MSBMKB.EXE, with the syntax
        MSBMKB filename.ext filename.boo,
where the first filename is the input binary file and the second filename is the
output .boo file which is suitable for electronic mail. (Typically, the input
binary file extension .ext will be .COM or .EXE if it is an executable file, or
ARC, PAK, ZOO, ZIP, ARJ, LHA, or so forth if it is a compressed file.) The
sender then transmits the .boo file to its destination. At the destination, the
recipient decodes the .boo file into its original binary form using the program
MSBPCT.EXE, with the syntax
        MSBPCT filename.boo
which recreates the original file filename.ext. That's all there is to it.
Of course, the sender needs MSBMKB.EXE, and the recipient needs
MSBPCT.EXE --- or some substitute. A substitute for MSBPCT.EXE is MSBPCT.BAS,
an uncompiled "BASIC" program which (because it is uncompiled) is *not* a
binary file and *can* be transmitted as is without any special treatment using
text-only electronic mail. MSBPCT.BAS can be run by anyone with access to an
MS-BASIC interpreter: for example, assuming that MSBASIC.COM and MSBPCT.BAS
both are available, the syntax
        MSBASIC MSBPCT,
with appropriate responses to the prompts that result, will decode the binary
file encoded as filename.boo into the original filename.ext.
Thus if the recipient does not have MSBPCT.EXE, the solution is to
electronically mail him not only filename.boo but also MSBPCT.BAS, with
which he can decode filename.boo into the original filename.ext.
However, because MSBPCT.BAS is not compiled, it will decode *very*
slowly. If several encoded binary files will be electronically mailed to
this destination, the destination would undoubtedly appreciate having
MSBPCT.EXE. And if the destination will be mailing other binary files back,
it will need MSBMKB.EXE. Although these .EXE files could be sent on a floppy
diskette using ordinary mail, a faster solution may be to encode MSBPCT.EXE into
MSBPCT.BOO, and MSBMKB.EXE into MSBMKB.BOO, and electronically mail these
encoded files, along with MSBPCT.BAS (to decode MSBPCT.BOO and MSBMKB.BOO), to
whoever doesn't have them.
But an implication of the MSBMKB.BWR file, enclosed, is that MSBPCT.BAS
will include extra characters in any .EXE file it creates; thus the best
procedure is to use MSBPCT.BAS to create a preliminary MSBPCT.EXE, rename it
to, say, MSBPCT01.EXE, use MSBPCT01.EXE on MSBPCT.BOO to create a final
MSBPCT.EXE, and use this final MSBPCT.EXE on MSBMKB.BOO to create MSBMKB.EXE
and on filename.boo to create filename.ext.
MINIMIZING FILE SIZE. The transmission will be more efficient the
smaller the file, so the sender may wish first to compress the original (for
example, using LHA or some similar program) before encoding it into a .boo file.
After the recipient has decoded the .boo file into the binary compressed file,
he will need the appropriate utility to uncompress the binary file, unless it is
self extracting.
This "compress first" procedure is likely to be particularly useful if the
original binary file is an executable .COM or .EXE file, a program overlay file,
or a wordprocessing or spreadsheet or database file containing control codes or
special characters. But there is no point in compressing an already compressed
file, unless the initial compression was inefficient --- and in that case, it
makes more sense to undo the original compression and then do it over again with
a more efficient compression routine. (For example, if the original file were a
Zip file, it could be UNZipped and then recompressed using LHA.)
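The effect of compressing first can be estimated with a short Python sketch.
Here zlib stands in for LHA or a similar utility, which is an assumption of the
sketch, as are the function name and the sample data. The size estimate uses the
four-characters-per-three-bytes expansion that the Columbia documentation quoted
later in this file describes, and it ignores line breaks and null compression.

```python
import zlib


def four_for_three_size(n_bytes: int) -> int:
    """Characters needed to encode n_bytes at 4 characters per 3 bytes,
    ignoring line breaks and null compression."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    # Hypothetical stand-in for a repetitive spreadsheet or database file.
    original = b"cell value 0012.50;" * 1000
    packed = zlib.compress(original, 9)      # compress first ...
    repacked = zlib.compress(packed, 9)      # ... compressing again gains little

    print("original bytes:      ", len(original))
    print("encoded, uncompressed:", four_for_three_size(len(original)))
    print("compressed bytes:    ", len(packed))
    print("encoded, compressed: ", four_for_three_size(len(packed)))
    print("compressed twice:    ", len(repacked))
```

The printed figures depend on the data, so they are best read as a comparison
for one's own files, not as typical results.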
COLUMBIA UNIVERSITY BOO FILE DOCUMENTATION
For those interested in the rationale and technique underlying .boo
files, the following edited explanation comes from Columbia's KERMIT USER
GUIDE, eighth edition [for MS-DOS Kermit version 2.32/A], c1981-1989,
Christine Gianone, editor: Chapter 5, MS-DOS Kermit, section 5.14,
"Installation of Kermit-MS", pages 135-136, distributed here under the
permission granted by the title page notice. This material seems to *not* be
in Ms. Gianone's more recent book, USING MS-DOS KERMIT, Digital Press, either
the first (1990) or second (1992) edition.
"Binary files are generally not compatible with the common labeled tape
formats (e.g. ANSI D), electronic mail, or raw downloading. . . . A common
[workaround] practice is to encode .EXE and other binary files into printable
characters, such as hexadecimal digits, for transportability. A simple 'hex'
[hexadecimal] encoding results in two characters per 8-bit binary byte, plus
CRLFs [carriage return (^M) followed by line feed (^J)], added every 80 (or
less) hex characters to allow the file to pass through card-oriented links.
[Old `IBM' cards used to contain 80 characters.] A hex file is therefore
more than twice as large as the original binary file.
"A .BOO [*not* .B00] file is a more compact, but somewhat more
complicated, encoding. Every three binary bytes (24 bits) are split up into
four 6-bit bytes with 48 (ASCII character '0') added to each, resulting in
four ASCII characters ranging from '0' (ASCII 48) to 'o' (ASCII 111), with
CRLFs added at or near 'column 76'. The resulting file size would therefore
be about 4/3 the .EXE file size. This is still quite large, so .BOO files
also compress consecutive null (zero) bytes. Up to 78 consecutive nulls are
compressed into two characters. Tilde ('~') is the null-compression lead-in,
and the following character indicates how many nulls are represented
(subtract 48 from this character's ASCII value). For instance, '~A' means 17
consecutive nulls; '~~' means 78 of them. Repeated nulls are very common in
.EXE files.
"4-for-3 encoding combined with null compression reduces the size of the
encoded file to approximately the same size as the original .EXE file, and
sometimes even smaller. The first line of a .BOO file is the name (in plain
text) of the original file. Here's what the first few lines of a typical
.BOO file look like:
MSVIBM.EXE
CEYP0Id05@0P~3oomo2Y01FWeP8@007P000040HB4001'W~28bL005\W~2JBP00722V0ZHPYP:
\8:H2]R2V0['PYP:68>H2S23V0YHPiP:Xg800;Qd~2UWD006Yg~20gl009]o~2L8000;20~~~~
~~~~~~~:R2H008TV?P761T410<H6@P40j4l6RRH0083l17@PP?'1M@?YSP20o0Ee0nUD0h3l
1WD3j0@3]0VjW03=8L?X4'N0o01h1\H6~20l>0i7n0o1]e7[@2\P0=8LH60@00Raj>04^97Xh0
. . . . "
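The encoding described in the quoted passage can be sketched in Python as
follows. This is an illustration written from that description only, not a
translation of MSBMKB or MSBPCT, and the function names are hypothetical. The
sketch makes several assumptions that the passage does not settle: a run of
nulls is compressed only when it holds two or more bytes and begins where a new
three-byte group would begin; a final group shorter than three bytes is padded
with zero bytes, so the decoded output can carry up to two extra trailing nulls;
and lines are broken at or before column 76 without splitting a group.

```python
# Sketch of the 4-for-3 encoding with null compression described above.
# Names and edge-case choices are assumptions, not taken from the Kermit source.


def boo_encode(name: str, data: bytes, width: int = 76) -> str:
    """Encode data as .BOO-style text whose first line is the file name."""
    tokens = []
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 0
            while i + run < len(data) and data[i + run] == 0 and run < 78:
                run += 1
            if run >= 2:  # assumption: shorter runs are not worth a lead-in
                tokens.append("~" + chr(run + 48))
                i += run
                continue
        group = data[i:i + 3].ljust(3, b"\0")  # assumption: zero-pad the tail
        value = int.from_bytes(group, "big")   # 24 bits
        tokens.append("".join(chr(((value >> shift) & 0x3F) + 48)
                              for shift in (18, 12, 6, 0)))
        i += 3

    lines, current = [], ""
    for token in tokens:
        if current and len(current) + len(token) > width:
            lines.append(current)
            current = ""
        current += token
    if current:
        lines.append(current)
    return "\r\n".join([name] + lines) + "\r\n"


def boo_decode(text: str) -> tuple:
    """Return (file name, decoded bytes) from text made by boo_encode."""
    lines = text.splitlines()
    name, body = lines[0], "".join(lines[1:])
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "~":                     # null-compression lead-in
            out += bytes(ord(body[i + 1]) - 48)
            i += 2
        else:                                  # four characters, three bytes
            value = 0
            for ch in body[i:i + 4]:
                value = (value << 6) | (ord(ch) - 48)
            out += value.to_bytes(3, "big")
            i += 4
    return name, bytes(out)


if __name__ == "__main__":
    sample = b"MZ\x90" + bytes(17) + b"\x01\x02\x03"
    encoded = boo_encode("SAMPLE.EXE", sample)
    print(encoded)
    print(boo_decode(encoded) == ("SAMPLE.EXE", sample))
```

Because the encoded groups use only the characters '0' through 'o', a tilde in
the body can only be a null-compression lead-in, which is what lets the decoder
tell the two kinds of group apart.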
According to the MSKermit documentation above, the MSBMKB.EXE
included here is written in assembly, and the MSBPCT.EXE included here is
written in C. But for both files, source code is available in a variety of
languages: Assembly, C, Turbo Pascal, and Fortran. Thus instead of the
BASIC program MSBPCT.BAS, any of the available source code MSBPCT.* programs
could be electronically mailed to the destination, compiled there, and used
to recreate the original filename.ext.
Columbia also distributes the MSBHEX.C program, written in C, to
produce and decode straight hex ASCII (seven bit character) files, which (as
suggested above) although large, can be transmitted via electronic mail.
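For comparison, a straight hex encoding of the kind the Columbia documentation
describes (two hex characters per byte, with a CRLF after at most 80 characters)
can be sketched in a few lines of Python. The function names are hypothetical,
and the sketch is not claimed to match the exact file layout that MSBHEX.C
produces.

```python
# Sketch of a plain hex encoding; not claimed to match MSBHEX.C's output format.


def hex_encode(data: bytes, width: int = 80) -> str:
    """Two hex digits per byte, with a CRLF after every `width` digits."""
    digits = data.hex().upper()
    lines = [digits[i:i + width] for i in range(0, len(digits), width)]
    return "\r\n".join(lines) + "\r\n"


def hex_decode(text: str) -> bytes:
    """Strip line breaks and other whitespace, then convert digits to bytes."""
    return bytes.fromhex("".join(text.split()))


if __name__ == "__main__":
    sample = bytes(range(256))
    encoded = hex_encode(sample)
    print(len(sample), len(encoded))          # encoded is more than twice as long
    print(hex_decode(encoded) == sample)
```

The line breaks are what push the encoded size past exactly twice the original.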
For further information, contact
Kermit Distribution, Department OP
Columbia University Center for Computing Activities
612 West 115th Street
New York City, New York 10025
(212)854-3703
End of MSBOOFLS.DOC.