- [File; TRANSGID.TXT Revision date; April 23, 1990]
-
- A SHORT GUIDE TO NETWORKING AND FILE TRANSMISSION
- Erich Neuwirth
- Institute of Statistics and Computer Science
- University of Vienna
- Austria
- (A4422DAB@AWIUNI11.BITNET)
-
-
- GENERAL PRINCIPLES OF SENDING FILES IN ELECTRONIC NETWORKS
-
- Networking is mainly used in 2 ways:
-
- Electronic mail
- Sending (binary) files
-
- This paper tries to explain what some of the differences are and how
- one of the two transmission methods sometimes can be (mis)used for
- tasks which seem to belong to the other method.
-
- Electronic Mail
-
- Electronic mail means you are sending text from one computer site to
- another site. Letters of text are coded as numbers internally within
- computers. Problems arise from the fact that the same letter may be
- represented by different numbers on different computer systems and vice
- versa the same number may yield a different letter on different computer
- systems. Mostly we are concerned with two such representation systems
- for letters by numbers.
-
- ASCII (which is used on IBM-compatible PCs and on most non-IBM
- mainframe computers)
- EBCDIC (which is used on IBM (and compatible) mainframe computers)
-
- When you are sending text from one computer to another computer the
- computers "think" they only are sending numbers. People reading or
- writing text, on the other hand, expect characters, so some
- interpretation of the numbers producing the text must take place. Simply
- transferring the text file as a sequence of numbers (which is what it
- looks like to the computers involved) would result in an unreadable file
- on the receiving computer system. Therefore when using computers with
- different character representation systems the transmission usually
- involves a "translation process" which has the net effect of yielding a
- different "sequence of numbers" (= file) on the receiving machine, but
- this file usually gives the same letters when read as a text file.
- Usually these translation processes work quite well for letters
- (lowercase and uppercase) and digits. Quite often you will encounter
- problems with special characters like parentheses, brackets, tildes,
- carets and so on. If you are interested in merely transferring texts
- this is not much of a problem, because even if some special characters
- get scrambled it is usually not too hard to reconstruct the original
- text by normal editing. If you are setting up a new communications link
- it is a good idea to send a file containing all printable characters
- with descriptions and to test if they arrive at the other end as they
- should. At the end of this paper you will find an example of how such a
- test file could look. Of course such a file should be sent from both
- ends of the line because the scrambling process in many cases is
- asymmetrical, so different transpositions happen in the two different
- communication directions. Closely inspecting the file you receive will
- show you which characters are changed during the transmission process.
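-
- As a small illustration of the translation problem, here is a Python
- sketch. It uses two EBCDIC tables that ship with Python (cp037 and
- cp500); choosing these two tables is an assumption made for the
- example, and the tables on a real link may well differ.
-
```python
# Sketch: the same letters are stored as different numbers in ASCII and
# in EBCDIC. cp037 and cp500 are two EBCDIC variants known to Python;
# picking them here is an assumption for illustration only.
text = "A[1]"

ascii_numbers = list(text.encode("ascii"))    # numbers on an ASCII machine
ebcdic_numbers = list(text.encode("cp037"))   # numbers on an EBCDIC machine

# A matching translation restores the same letters from other numbers.
restored = bytes(ebcdic_numbers).decode("cp037")

# A translation using a slightly different table keeps letters and
# digits but changes the brackets.
scrambled = text.encode("cp037").decode("cp500")

print(ascii_numbers, ebcdic_numbers)
print(restored, scrambled)
```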
-
- Now three different events can happen:
-
- 1) You receive all the characters as they should be:
-
- Action: Don't worry, be happy
-
- 2) Some characters are not what they should be, but different characters
- still are different (even when not identical with their original)
-
- Action: Do worry, but not too much. In this case you can use the FIND
- and REPLACE function of your text editing program to restore the
- original meaning of the file. You even could program a macro in your
- text editor (if you don't know what that means just ignore this
- sentence) which automatically performs the "retranslation" process.
-
- 3) Some characters are scrambled and different characters in the source
- text file come out as identical characters at the receiving end.
-
- Action: Do worry, because this is the worst possible situation. It is
- not possible to construct an automatic "retranslation" process. As
- long as you are only concerned with text you will not have too many
- problems, because letters, digits, commas and periods usually are not
- scrambled when sent between different computer systems. If these
- characters also are scrambled the transmission process does not
- deserve the name "communication process" any more and you should talk
- to the technical people in charge of the transmission channel to take
- care of these problems.
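-
- The three cases can also be told apart mechanically. The following
- Python sketch compares a test string as it was sent with the same
- string as it arrived. The function name and the sample strings are
- made up for this example, and the result is only trustworthy if the
- test string contains every character you intend to use.
-
```python
def analyse(sent, received):
    """Compare a test string as sent with the same string as received.

    Returns (case, table): case is 1, 2 or 3 as in the list above, and
    table is a "retranslation" table for str.translate in case 2.
    """
    mapping = {}
    for original, arrived in zip(sent, received):
        mapping.setdefault(original, arrived)
    if all(original == arrived for original, arrived in mapping.items()):
        return 1, None                    # case 1: nothing was changed
    if len(set(mapping.values())) == len(mapping):
        # case 2: still one-to-one, so the scrambling can be reversed
        reverse = str.maketrans({a: o for o, a in mapping.items()})
        return 2, reverse
    return 3, None                        # case 3: characters collapsed


case, table = analyse("ab[]", "ab{}")     # hypothetical test strings
print(case, "x{1}".translate(table))
```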
-
- Things become more difficult when you want to send data files or program
- source files. Files of this kind usually contain special characters
- like parentheses and to reconstruct the original text of the file you
- usually have to edit the file you received by hand and to infer from the
- context the original meaning of a recognizably incorrect character.
-
- The automatic file transfer usually takes place between mainframe
- computers. So the simplest situation with text file transfer is that
- you use the editor on your mainframe computer to create your text and
- then you use the mailing program on the mainframe to send the text file
- (sometimes called e-mail or note) to its destination. At the
- destination site the receiver then can receive the file and read it with
- the help of the text editor program on the receiving mainframe computer.
-
- Sometimes the situation is more difficult. The file you want to send
- may exist on your PC, but not yet on the mainframe which is your
- entrance to the international computer networks.
-
- There is an important detail you have to take care of here.
- Usually you can write texts on a PC using two different kinds of
- programs to write with:
- Text editor programs or
- word processing programs
-
- Text files produced by text editing programs usually give no problems
- when you try to send them over a network. With most word processor files
- you will experience difficulties. But most word processing programs have
- a special way of saving your text as a "plain ASCII file". Remember to
- save your texts with this option if you intend to send them over
- networks. And if you are still considering which word processing program
- you should select for your personal use, only select a program which
- offers this option. If you do not know yourself how to verify the
- existence of such an option ask somebody more experienced than you to
- help you to find out.
-
- Now you have to find a way to transfer the file from your PC to your
- mainframe computer. For this purpose you need a file transfer program on
- the PC and on the mainframe. Different varieties of programs of this
- kind exist, but the prevalent program in an academic environment at the
- moment is KERMIT. To use KERMIT to transfer files you need the version of
- KERMIT for your PC and an installed version of KERMIT on the mainframe.
- The mainframe KERMIT is not your responsibility; you just have to
- find out from the staff of your computing center if they already have
- installed this program. If they have not done so yet you should tell them
- to do so because KERMIT is one of the very few hardware independent
- standards and it should be supported. Additionally, all KERMIT versions
- are in the Public Domain, so they do NOT COST MONEY. Your local
- computing center also should help you to find the version of KERMIT you
- need for your PC.
-
- KERMIT is a program used for 2 purposes; namely for using your PC as a
- terminal to your mainframe computer and for transferring files between
- these two systems.
-
- Now things start to be complicated (even more complicated? I hear you
- complain!).
-
- In this paper we will not deal with using KERMIT as a terminal emulator.
- There are many ways to do this and it mainly depends on which kind of
- mainframe you are using. You should try to get some help from the people
- from your local computing center who can show you exactly how to use
- KERMIT for this purpose.
-
- An additional remark: If you only want to use KERMIT as a "terminal
- emulator", which means using your PC as a terminal, you do not need
- KERMIT on the mainframe computer you are connecting to. The mainframe
- version is only needed for file transfer between the mainframe and your
- PC.
-
- Now things become really complicated! The PC KERMIT has only one way of
- transferring files. But the mainframe version usually has two ways
- (called "modes" by computer scientists). One way is text mode, the other
- way is binary mode. Text mode is used to transfer text files. E-mail
- consists of text files so it is this mode you need for downloading
- e-mail from your mainframe to your PC. Usually you need not care too
- much because practically all mainframe versions of KERMIT use text mode
- for file transfer if not told otherwise explicitly.
-
- So simply transferring a text file from your PC to the PC of somebody
- else you want to send it to can be done using the following steps:
-
- 1) Upload the text file from your PC to your mainframe with KERMIT in
- text mode
-
- 2) Use the mail facilities of your mainframe to send the text file as
- mail to the intended receiver
-
- 3) The receiver finally has to download this mail file (it still is
- text) with KERMIT in text mode to his/her PC
-
- In most cases the received file is identical with the original file.
- Letters and digits arrive as they should.
-
- The idea behind text mode of KERMIT is that the meaning of characters is
- preserved, so when transferring in text mode KERMIT automatically
- adjusts for different systems of character representations on the
- mainframe and on the PC.
-
- You might find that some of the special characters do not arrive as they
- should, but this usually is no problem when the text is only intended
- for reading and not as input to some computer program.
-
- Later we will see what you can do if you have to send a text file
- containing special characters and want to make sure that these
- characters arrive unchanged.
-
-
-
- TRANSFERRING NON-TEXT FILES
-
- It is becoming even more difficult in this section, but if you want to
- send programs and data files usable on other machines it is important
- that you understand this section.
-
- Networks can also be used to send PC programs over the network. If you
- want to send a program to somebody with the same kind of PC you have, the
- basic procedure is very much like the procedure for transferring text
- files from your PC via the network to somebody else's PC.
-
- The steps involved are:
- Uploading to a mainframe
- Using the sending facilities of the network
- Downloading from the target mainframe to the target PC
-
- The difficulties arising with program files are that programs contain
- more different symbols than text files. They especially contain lots of
- so called "nonprintable" characters. You can see this if you try to
- look at your program file with a text editor program or a word
- processing program.
-
- The simplest solution to transferring program files and like things
- (called binary files in computer terminology) is to use the binary
- transfer mode of your mainframe KERMIT to upload the program to your
- mainframe. Binary mode means that no translation whatsoever takes place
- while sending the file (remember, sending text files often involves a
- translation process). Now you can use the facilities of your mainframe
- for sending files over the network. Sending a file is not the same as
- sending a text as mail. Mailing implies that your text is put into the
- electronic equivalent of an envelope. Sending a file does not add the
- envelope, so the file being sent is (almost) identical with what you
- have on your PC. The receiver then can download the file to his/her PC
- also using the binary transfer mode of his/her mainframe KERMIT and the
- PC version of KERMIT.
-
- This file transfer quite often does not work. Some reasons may be: the
- two mainframes involved come from different manufacturers, some
- intermediate mainframe causes problems or the file is passing through
- different networks. One situation where it makes sense to try this way
- of sending binaries is when both mainframes are members of the EARN,
- BITNET or NETNORTH networks. It usually does not work when the
- mainframes belong to different networks like EARN and JANET.
-
- Now what can we do when we want to send a program or a data file from
- an EARN site to a JANET site?
-
- The main idea is translating your binary file (the one you cannot read
- because it contains nonprintable characters) into a file consisting only
- of printable characters. The most popular scheme for doing such a
- translation is the UUENCODE/UUDECODE process. It involves 2 programs,
- one usually called UUENCODE and the other one UUDECODE. UUENCODE takes
- a binary file and converts it into a file consisting only of printable
- characters. UUDECODE reverses this process and restores the original
- binary files from the encoded file. So what do you need these programs
- for?
-
- You UUENCODE the binary file and upload it to your mainframe (using the
- text mode of your mainframe KERMIT). Since it consists of printable
- characters only, you can incorporate it into a mail file you send. This
- mail file hopefully arrives at its destination and the receiver can
- download the mail from his/her mainframe to the local PC. Then it is
- mandatory to remove the "electronic envelope" from the mail file. An
- example below shows how an UUENCODEd file looks and how to recognize
- the parts forming the "envelope". Then the UUDECODE program can be used
- to translate the UUENCODEd version of the file back into its binary
- version.
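-
- For readers who have Python at hand, the round trip can be tried out
- with the standard binascii module. This is only a sketch: the function
- names are made up for the example, and real UUENCODE/UUDECODE programs
- differ in details such as the line written before "end".
-
```python
import binascii


def uuencode(data, name, mode="644"):
    """Return the text of an encoded file for the bytes in data."""
    lines = [f"begin {mode} {name}"]
    for i in range(0, len(data), 45):                 # 45 bytes per line
        line = binascii.b2a_uu(data[i:i + 45]).decode("ascii")
        lines.append(line.rstrip("\n"))
    lines.append("`")                                 # line with zero bytes
    lines.append("end")
    return "\n".join(lines) + "\n"


def uudecode(text):
    """Return (file name, original bytes) from the text of an encoded file."""
    lines = text.splitlines()
    start = next(i for i, l in enumerate(lines) if l.startswith("begin "))
    name = lines[start].split(" ", 2)[2]
    data = bytearray()
    for line in lines[start + 1:]:
        if line == "end":
            break
        if not line:                                  # skip empty lines
            continue
        data += binascii.a2b_uu(line.encode("ascii"))
    return name, bytes(data)


original = bytes(range(256))                          # made-up binary data
name, restored = uudecode(uuencode(original, "erich.com"))
print(name, restored == original)
```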
-
- If you want to use this process you have to get hold of a copy of the
- UUENCODE and UUDECODE programs. It is not possible (at least not in an
- easy way) to send these programs over networks if you have no experience
- with encoding and decoding binary files. These programs are binary files
- themselves and we cannot send unencoded binary files. So we would need
- the binary files already to translate the encoded versions into the
- binary version. It is a "who is first, the hen or the egg" kind of
- situation. There are ways of solving these problems, but the solutions
- involve a nontrivial amount of technical knowledge and also depend very
- much on the circumstances of the PCs and mainframes involved.
-
- (For the more technically inclined: we could send the source files of
- the translation programs as text files, but then we have to be sure that
- the recipient has a compiler for the programming language we are using.)
-
- So quite often the easiest way of setting up an environment where file
- transfer is possible involves sending a disk with the UUENCODE/UUDECODE
- programs to the sites involved. Once the programs are available file
- transfer can start.
-
- Now let us look at what an UUENCODEd file looks like:
-
- ------- the file starts directly below this line ------------
- begin 644 erich.com
- MZV>0("`@("`@("`@("`@("`@4V\@>6]U(&UA;F%G960@=&\@555$14-/1$4@
- M=&AE('1E<W0@9FEL92X-"B`@("`@("`@("`@("`@("`@("`@("`@0V]N9W)A
- ;='5L871I;VYS(2$-"B0:N@,!M`G-(;@`3,TA
- `
- end
- ------ the UUENCODED file ended just above this line -------
-
- The first line always contains the word 'begin' starting in the first
- column. The next item is a number which you can safely ignore and the
- last item is the name of the original (unencoded) file. The last line of the
- encoded file consists of the word "end" starting in the first column and
- nothing else. Some encoding programs add a line containing size
- information about the encoded file, but this is not really necessary.
- If you use the UUENCODing program on your PC the encoded version of the
- file usually has the same first part of the file name as the file being
- encoded and the file extension .UUE. So encoding a program ERICH.COM
- would produce a file ERICH.UUE . This file ERICH.UUE is the one that
- should be uploaded and sent using the mail facilities of the network.
- At the receiving site the mail file sent can be downloaded to the PC.
-
- The downloaded file usually looks similar to the following example:
-
- ---------------- this line is not part of the file -----------
- Date: Sat 14 Jan 89 06:51:59-EST
- From: John R. Somebody <SOMEBODY@SOMESITE>
- Subject: File transfer demonstration
- To: The catcher in the rye <CATCHRYE@MYSITE>
-
- begin 644 erich.com
- MZV>0("`@("`@("`@("`@("`@4V\@>6]U(&UA;F%G960@=&\@555$14-/1$4@
- M=&AE('1E<W0@9FEL92X-"B`@("`@("`@("`@("`@("`@("`@("`@0V]N9W)A
- ;='5L871I;VYS(2$-"B0:N@,!M`G-(;@`3,TA
- `
- end
-
- John R. Somebody 1/14/89
- SOMEBODY@SOMESITE CATCHRYE@MYSITE 1/14/89 file transfer demonstration
-
- --------- this line does not belong to the file any more ---
-
- From this example it should be easy to see what the next step is: Every
- line above the "begin" line and every line below the "end" line has to be
- removed. The remaining file then can be decoded using UUDECODE. If no
- additional problems occurred the decoded program is identical with the
- binary program the sender wanted to send. Now for possible
- difficulties: UUENCODEd files contain special characters like brackets.
- Now when you are reading a text file you usually can recognize the
- intended special character even if it has been changed in a file
- transfer process. But it is not possible to recognize changed
- characters in an UUENCODEd file. So you have to find out if all the
- characters arrived unchanged. For this you can use the method described
- at the beginning of this paper, namely sending a file with all
- characters together with a verbal description of the characters. All
- remarks from the earlier part of the paper apply. Inspecting such a file
- closely might help you to find out which characters were changed and
- into what and with luck you can reverse this exchange process. The main
- problem with the UU scheme is that the set of characters being used
- contains special characters. So a variant of this method has been
- devised. It is called the XXENCODE/XXDECODE process. Essentially it
- functions like UUENCODE/UUDECODE, but the encoded file only contains
- letters, digits, and the plus and the minus sign. The advantage is that
- these characters usually are not changed when passed through different
- computers, so chances are higher that such a file will arrive unchanged.
- As with UUENCODE/UUDECODE you need the programs before you can start
- transmission of binary files. The XX scheme is relatively new, so
- usually it is easier to find programs for the UU scheme than for the XX
- scheme.
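-
- Removing the "envelope" as described above can also be automated. The
- following Python sketch keeps only the lines from "begin" to "end". It
- assumes that the mail text itself contains no other line starting with
- "begin "; the function name and the sample mail are made up.
-
```python
def strip_envelope(mail_text):
    """Return only the part of a mail file from 'begin' to 'end'."""
    lines = mail_text.splitlines()
    start = next(i for i, l in enumerate(lines) if l.startswith("begin "))
    end = next(i for i in range(start + 1, len(lines)) if lines[i] == "end")
    return "\n".join(lines[start:end + 1]) + "\n"


mail = (
    "Date: Sat 14 Jan 89 06:51:59-EST\n"
    "Subject: File transfer demonstration\n"
    "\n"
    "begin 644 demo.bin\n"
    "#0V%T\n"
    "`\n"
    "end\n"
    "\n"
    "John R. Somebody\n"
)
print(strip_envelope(mail))
```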
-
- It is important to be aware of the fact that UUENCODEd and XXENCODEd
- files are roughly 35 to 40 percent larger than the original file. This is
- the price we have to pay for better transportability.
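-
- The growth can be estimated with a little arithmetic: every 3 bytes
- become 4 characters, and every line of up to 45 bytes gets one count
- character and a line end. The Python sketch below does this sum; the
- function name is made up and the "begin" and "end" lines are ignored.
-
```python
def uu_body_size(n_bytes, newline_len=1):
    """Number of characters in the encoded lines for n_bytes of input."""
    total = 0
    remaining = n_bytes
    while remaining > 0:
        chunk = min(45, remaining)        # at most 45 bytes per line
        groups = -(-chunk // 3)           # groups of 3 bytes, rounded up
        total += 1 + 4 * groups + newline_len
        remaining -= chunk
    return total


size = uu_body_size(4500)
print(size, size / 4500)
```
-
- With a line end of 2 characters (as on a PC) pass newline_len=2.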
-
- There is one more important concept you should be aware of when
- transferring more than one file at a time and/or transferring big files.
- It is the concept of an archive. An archive essentially is one file
- created by pasting together and compressing one or more files. Usually
- when transferring a few files you use an archiving program which creates
- just one file out of a few files. This archived file also is smaller
- than all the "source" files together. In the archiving process you need
- two programs: the archiving program creating the archive and the
- dearchiving program reconstructing the original files. The advantages of
- using archives are:
-
- 1) It is impossible to forget a file belonging to a set of files when
- transferring copies of an archive
-
- 2) The amount of data to be transferred is smaller and therefore uses
- less disk space and less connect time for transferring them
- electronically.
-
- So if you want to send a few files belonging together it is quite common
- to create an archive, then to send the archive and then have the
- recipient reconstruct the original files by dearchiving. When you receive
- a file with file name extension ARC it is highly probable that it is an
- archive file. In this case the extension ARC denotes a special
- archiving (= pasting together and compressing) scheme. There is a new
- scheme around now which usually can be recognized by the file name
- extension ZIP. The 2 programs needed to be able to work with the ZIP
- scheme are PKZIP and PKUNZIP.
-
- Let us look at an example of how to use this set of programs.
- Let us assume we want to send 3 files named FILEA.TXT, FILEB.DTA and
- FILEC.COM.
-
- If we execute the command line
-
- PKZIP ARCHIVE FILEA.TXT FILEB.DTA FILEC.COM
-
- PKZIP will create a file ARCHIVE.ZIP. This file is our archive and
- contains all 3 "source" files in a condensed form.
-
- To reconstruct the original files we execute the command line
-
- PKUNZIP ARCHIVE
-
- which will create the 3 original files FILEA.TXT, FILEB.DTA and
- FILEC.COM.
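-
- The same two steps can be tried out in Python, whose standard zipfile
- module reads and writes the ZIP scheme. This is a stand-in for PKZIP
- and PKUNZIP, not a description of them, and the file contents below
- are placeholders invented for the example.
-
```python
import os
import tempfile
import zipfile

names = ["FILEA.TXT", "FILEB.DTA", "FILEC.COM"]

with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "src")
    out = os.path.join(work, "out")
    os.mkdir(src)
    for name in names:                    # create three placeholder files
        with open(os.path.join(src, name), "wb") as f:
            f.write(name.encode("ascii") * 100)

    # archiving: three files go into one compressed file
    archive = os.path.join(work, "ARCHIVE.ZIP")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as z:
        for name in names:
            z.write(os.path.join(src, name), arcname=name)

    # dearchiving: the original files are reconstructed
    with zipfile.ZipFile(archive) as z:
        z.extractall(out)

    extracted = sorted(os.listdir(out))
    archive_size = os.path.getsize(archive)
    source_size = sum(os.path.getsize(os.path.join(src, n)) for n in names)

print(extracted, source_size, archive_size)
```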
-
- There are different programs around for the ARC variant of the process.
- ARC and ARCX are a pair performing essentially the same function as
- PKZIP and PKUNZIP, PKARC and PKXARC are another pair. There also is a
- program called LHARC which performs archiving and dearchiving functions
- with just one program. The difference is that PKZIP and PKUNZIP use the
- ZIP scheme whereas ARC, ARCX, PKARC and PKXARC use the ARC scheme and
- LHARC uses the LZH scheme. All these different schemes are
- incompatible.
-
- If you want to create an LZH-archive similar to the ZIP archive of
- the previous example you can do so with the following command:
-
- LHARC A ARCHIVE FILEA.TXT FILEB.DTA FILEC.COM
-
- This will create a file ARCHIVE.LZH.
-
- Extracting the files from the archive is done with the following
- command:
-
- LHARC E ARCHIVE
-
-
- There is a special variant of archive files, so-called self extracting
- archives. In this special case the archive and the dearchiving program
- are pasted together. The result is an executable file (usually with
- extension EXE) which, when executed, reconstructs the original files
- contained in the archive. It is not possible to recognize
- self-extracting archives from the file name extension, so you have to be
- told that a certain file is a self-extracting archive.
-
- So we have met two important concepts:
- Encoding for creating "mailable" files
- Archiving for creating smaller files
-
- It is quite common to combine these 2 processes. So if we want to send
- a set of files, first we create an archive containing all the files and
- then encode this archive. This hybrid product is sent via E-mail. The
- recipient first decodes the mail file into the archive file and then
- dearchives the archive into the original files. In this way we combine
- the advantages of compressing for reducing costs and of encoding to
- allow better transportability.
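-
- The combined process can be sketched in Python with the standard
- zipfile and binascii modules, working entirely in memory. The file
- names and contents are invented for the example, and the "begin" and
- "end" lines are left out to keep it short.
-
```python
import binascii
import io
import zipfile

files = {"FILEA.TXT": b"some text\r\n", "FILEB.DTA": bytes(range(256))}

# sender, step 1: create an archive containing all the files
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as z:
    for name, content in files.items():
        z.writestr(name, content)
archive = buffer.getvalue()

# sender, step 2: encode the archive into printable lines
encoded = [binascii.b2a_uu(archive[i:i + 45])
           for i in range(0, len(archive), 45)]

# recipient, step 1: decode the lines back into the archive
decoded = b"".join(binascii.a2b_uu(line) for line in encoded)

# recipient, step 2: dearchive into the original files
with zipfile.ZipFile(io.BytesIO(decoded)) as z:
    restored = {name: z.read(name) for name in z.namelist()}

print(restored == files)
```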
-
-
-
-
- APPENDIX A: CHARACTER TABLE
-
- Next is a list of all printable characters together with
- descriptions:
-
- Characters of the ASCII table
-
- blank
- ! exclamation mark
- " double quote
- # number sign
- $ dollar sign
- % percent sign
- & ampersand
- ' (closing) single quote
- ( left parenthesis
- ) right parenthesis
- * star
- + plus
- , comma
- - minus
- . period
- / slash
-
- digits
-
- 0123456789
-
- : colon
- ; semicolon
- < less
- = equal
- > greater
- ? question mark
- @ at-sign
-
- uppercase letters
-
- ABCDEFGHIJKLMNOPQRSTUVWXYZ
-
- [ left bracket
- \ backslash
- ] right bracket
- ^ caret
- _ underscore
- ` left single quote
-
- lowercase letters
-
- abcdefghijklmnopqrstuvwxyz
-
- { left curly brace
- | vertical bar
- } right curly brace
- ~ tilde
-
- ASCII 127 is nonprintable
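-
- A test file of this kind can also be generated by a program. The
- Python sketch below lists every printable ASCII character with its
- number and a description. The descriptions come from Python's
- unicodedata module, so some of them differ from the table above (for
- example "asterisk" instead of "star"); the function name is made up.
-
```python
import unicodedata


def printable_table():
    """Return one line per printable ASCII character (codes 32 to 126)."""
    lines = []
    for code in range(32, 127):
        ch = chr(code)
        description = unicodedata.name(ch).lower()
        lines.append(f"{code:3d}  {ch}  {description}")
    return lines


for line in printable_table():
    print(line)
```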
-
-
-
- APPENDIX B: TECHNICAL DETAILS OF ENCODING AND DECODING
-
- The rest of the paper is very technical, so you should read it only if
- you have some knowledge of the mathematics underlying the functioning of
- computers.
-
- How do UUENCODE and UUDECODE work?
-
- For UUENCODing, the bytes forming the file are grouped in groups of
- three. Every byte is an 8-bit binary number, so every group of three
- bytes is a 24-bit binary number. This number then is split into four
- groups of 6 bits each, i.e. into 4 6-bit binary numbers. The 6-bit binary
- numbers give all decimal numbers from 0 to 63. To every such 6-bit
- number 32 (decimal) is added, giving numbers in the range from 32 to 95.
- Every number then is replaced by the ASCII character associated with
- this value. (32 becomes (a blank), 33 becomes !,... 95 becomes _ (an
- underscore)). So the translation process converts each group of 3 bytes
- into 4 printable characters.
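-
- The conversion of one group can be written down in a few lines of
- Python. This is a sketch of the classic scheme as described above
- (zero becomes a blank); the function name is made up. The bytes of
- the text "So " give the group 4V\@, which also appears in the first
- encoded line of the erich.com example.
-
```python
def encode_group(three_bytes):
    """Turn exactly 3 bytes into 4 printable characters."""
    b0, b1, b2 = three_bytes
    number = (b0 << 16) | (b1 << 8) | b2          # one 24-bit number
    sextets = [(number >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(chr(32 + s) for s in sextets)  # add 32 to each number


print(encode_group(b"So "))   # prints 4V\@
```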
-
- Additionally every group of 45 bytes (giving 60 characters) is grouped
- into a line in the file to be sent. Then a leading character is added to
- this line. The leading character is calculated by applying the encoding
- scheme we just discussed to the number of bytes represented by the
- line. (45+32=77, so for a line representing 45 bytes the leading
- character is M (M is ASCII character 77)). Usually the last line is
- shorter and therefore the leading character of the last line also is
- different from M. Finally a first line containing "begin", a 3 digit
- number (giving access privileges on UNIX systems and meaningless on
- other systems) and the name of the original file and a last line
- containing the word "end" is added.
-
- The decoding program then mainly has to convert each group of 4
- characters back into a group of three bytes (using the byte count given
- by the first character of each line for consistency checks).
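-
- Encoding and decoding of a whole line, including the leading count
- character and a simple consistency check, might look as follows in
- Python. The function names are made up, and the sketch follows the
- classic scheme described above rather than any particular program.
-
```python
def encode_line(chunk):
    """Encode 1 to 45 bytes as one line: count character plus groups."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a line holds between 1 and 45 bytes")
    padded = chunk + b"\x00" * (-len(chunk) % 3)   # fill the last group
    body = ""
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")
        body += "".join(chr(32 + ((n >> s) & 0x3F)) for s in (18, 12, 6, 0))
    return chr(32 + len(chunk)) + body


def decode_line(line):
    """Decode one line, checking it against its leading byte count."""
    count = (ord(line[0]) - 32) & 0x3F
    body = line[1:]
    if len(body) % 4:
        raise ValueError("line does not consist of groups of 4 characters")
    out = bytearray()
    for i in range(0, len(body), 4):
        n = 0
        for ch in body[i:i + 4]:
            n = (n << 6) | ((ord(ch) - 32) & 0x3F)
        out += n.to_bytes(3, "big")
    if len(out) < count:
        raise ValueError("line is shorter than its byte count says")
    return bytes(out[:count])


print(encode_line(b"Cat"))        # prints #0V%T
print(decode_line("#0V%T"))
```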
-
- There are some problems with this scheme. We already discussed the
- possibility of special characters being scrambled. Additionally some
- "smart" mailing programs assume that trailing blanks always are
- unnecessary. Therefore they strip trailing blanks from every mail file.
- If it is only text you want to read you will not notice the difference.
- But an UUDECODing program will find out that the lines are too short
- (the first character of the line gives information about the line
- length!).
-
- There are different solutions for this problem.
-
- 1) Replace blanks by ` (the single opening quote having ASCII value
- 32+64=96)
-
- 2) Add an additional nonblank character at the end of each line
-
- 3) Make the decoding program smart enough to produce the missing
- blanks by itself.
-
- All the solutions are nonstandardized, so if you have some troubles when
- decoding you have to analyze them carefully. Solution number 2 usually
- works better than the two other solutions. So you should try to get an
- encoding program adding that additional character. Using an editor also
- makes it possible to transform the different "extended" formats of
- UUENCODEd files into one another.
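-
- The trailing blank problem and the three solutions can be
- demonstrated with Python's binascii module. This is a sketch: the
- sample data, the guard character "x" and the function name pad_line
- are arbitrary choices made for the example.
-
```python
import binascii

data = b"AB\x00\x00\x00\x00"    # ends in zero bytes, so blanks end the line

classic = binascii.b2a_uu(data).decode("ascii").rstrip("\n")
stripped = classic.rstrip(" ")  # what a "smart" mailing program may deliver

# Solution 1: write zero as ` instead of a blank
with_backticks = binascii.b2a_uu(data, backtick=True).decode("ascii")
with_backticks = with_backticks.rstrip("\n")

# Solution 2: add a nonblank character at the end of the line
guarded = classic + "x"


# Solution 3: let the decoder produce the missing blanks by itself
def pad_line(line):
    count = (ord(line[0]) - 32) & 0x3F       # bytes represented by the line
    expected = 1 + 4 * -(-count // 3)        # count character plus groups
    return line.ljust(expected)


repaired = pad_line(stripped)
print(repr(classic), repr(stripped), repr(with_backticks), repr(repaired))
```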
-
-
-
- How do XXENCODE and XXDECODE work?
-
- XXENCODE uses the same splitting technique as the UU scheme (3 bytes
- into 4 6-digit binary numbers). Then every such number is converted
- into a character according to the following sequence:
-
- +-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
-
- So (decimal) 0 becomes +, 1 becomes -, (number) 2 becomes (character) 0,
- .... 63 becomes z.
-
- The mechanism for adding byte counts to lines is identical to the UU
- scheme with the difference that the numbers again are coded according to
- the above sequence of letters, digits, + and -. So it even is possible
- to convert UUENCODEd files into XXENCODEd files using the replace
- feature of a text editor.
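-
- Such a conversion is a plain character-by-character replacement, as
- the following Python sketch shows. It must only be applied to the
- encoded lines, not to the "begin" and "end" lines. The names are made
- up, and treating ` as zero is an assumption covering the UU variant
- that writes ` instead of a blank.
-
```python
UU = "".join(chr(32 + i) for i in range(64))      # blank ... underscore
XX = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

UU_TO_XX = str.maketrans(UU + "`", XX + "+")      # ` also stands for zero
XX_TO_UU = str.maketrans(XX, UU)


def uu_line_to_xx(line):
    """Convert one encoded line from the UU to the XX character set."""
    return line.translate(UU_TO_XX)


def xx_line_to_uu(line):
    """Convert one encoded line from the XX to the UU character set."""
    return line.translate(XX_TO_UU)


print(uu_line_to_xx("#0V%T"))     # prints 1Eq3o
```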
-
-
-
- ACKNOWLEDGEMENTS
-
- The author wishes to thank Ted Werntz whose comments and suggestions
- helped enormously to improve the paper.
-