25-Nov-97 23:13:44-GMT,17151;000000000005
Return-Path: <JCHBN@CUVMB.CC.COLUMBIA.EDU>
Received: from CUVMB.CC.COLUMBIA.EDU (cuvmb.cc.columbia.edu [128.59.40.129])
by watsun.cc.columbia.edu (8.8.5/8.8.5) with SMTP id SAA26165
for <fdc@watsun.CC.COLUMBIA.EDU>; Tue, 25 Nov 1997 18:13:44 -0500 (EST)
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP V2R1)
with BSMTP id 8461; Tue, 25 Nov 97 18:14:36 EST
Date: Tue, 25 Nov 1997 18:13 EST
From: "John F. Chandler" <JCHBN@CUVMB.CC.COLUMBIA.EDU>
To: Frank da Cruz <fdc@watsun.CC.COLUMBIA.EDU>
Subject: Re: New stuff draft 2
In-reply-to: fdc@watsun.cc.columbia.edu message
<CMM.0.90.4.878519477.fdc@watsun.cc.columbia.edu> of Sun, 2 Nov 97 20:11:17
EST
Message-id: <JCHBN.971125.181315.RC0@CUVMB.CC.COLUMBIA.EDU>
Frank,
Here are some reactions, prefixed with ">>>".
John
---------
SOME MINOR ADDITIONS TO THE KERMIT PROTOCOL
D R A F T # 2
Sun Nov 2 13:49:46 1997
1. DIRECTORY OPERATIONS
The aim of these changes is to allow the exchange of directory trees or file
systems. It is assumed that all file systems are either tree-structured or
flat. Hardly any protocol changes are needed, mainly just agreements on data
formats. Most of the features are implemented outside the protocol: recursive
SEND commands, automatic directory creation during RECEIVE commands, etc.
1.0. Directory Name Format Selection
(This is simplified considerably in Draft 2 after I implemented it in C-K...)
SET FILE NAMES { CONVERTED, LITERAL }
Now applies to pathnames too. For pathnames, CONVERTED means that the
native directory notation is converted to standard format when sending,
and the standard format is assumed when receiving.
The related command:
SET { SEND, RECEIVE } PATHNAMES { OFF, ABSOLUTE, RELATIVE }
then applies as usual. PATHNAMES are OFF by default, in which case nothing is
different. When SEND PATHNAMES is ABSOLUTE or RELATIVE, then the FILE NAMES
setting is applied to them just as it is to the rest of the filename.
When receiving files, a Kermit program should be expected to understand its
own native format and the standard one; it cannot be expected to understand a
foreign directory notation. Thus SET FILE NAMES CONVERTED should be used
between unlike systems.
Note: There is no reason why there can't be separate SET FILE NAMES commands
and settings for each direction.
Note 2: We haven't said anything that affects the protocol yet; that comes
in the next section.
1.1. Kermit Protocol Directory Name Representation
UNIX notation shall be used for directories when FILE NAMES are CONVERTED.
Forward slash (/) is the directory separator. If a / appears as a literal
character in a directory name, then it should be written as //. A file or
directory specification beginning with / is absolute, otherwise it is relative.
This is more or less the same scheme used by Info-ZIP and so it is widely
proven in the real world.
Note: I have this working now in VMS as well as UNIX, but so far just in
the sender -- will do the receiver tomorrow I hope. Example from today's
actual logs:
FILENAMES   SEND PATHNAMES   UNIX Result                 VMS Result
CONVERTED   OFF              OOFA.TXT                    OOFA.TXT
CONVERTED   RELATIVE         BLAH/OOFA.TXT               BLAH/OOFA.TXT
CONVERTED   ABSOLUTE         /W/FDC/TMP/BLAH/OOFA.TXT    /FDC/BLAH/OOFA.TXT
LITERAL     OFF              oofa.txt                    OOFA.TXT
LITERAL     RELATIVE         blah/oofa.txt               [.BLAH]OOFA.TXT
LITERAL     ABSOLUTE         /w/fdc/tmp/blah/oofa.txt    [FDC.BLAH]OOFA.TXT
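The slash-doubling rule of 1.1 can be sketched in Python as follows. This is
only an illustration, not code from any Kermit program: the names to_standard
and from_standard are hypothetical, and the left-to-right pairing of slashes
when decoding is an assumption, since the draft does not say how a run of
three or more slashes should be read.

```python
def to_standard(components, absolute=False):
    """Join path components into the standard (UNIX-style) notation.

    A literal "/" inside a component is written as "//", per section 1.1.
    """
    escaped = [c.replace("/", "//") for c in components]
    return ("/" if absolute else "") + "/".join(escaped)


def from_standard(path):
    """Split a standard-format path into (absolute, components).

    ASSUMPTION: slashes are paired greedily from the left, so "//" is
    always read as one literal slash and a lone "/" as a separator.
    """
    absolute = path.startswith("/") and not path.startswith("//")
    i = 1 if absolute else 0
    components, current = [], ""
    while i < len(path):
        if path.startswith("//", i):      # doubled slash: literal "/"
            current += "/"
            i += 2
        elif path[i] == "/":              # single slash: separator
            components.append(current)
            current = ""
            i += 1
        else:
            current += path[i]
            i += 1
    components.append(current)
    return absolute, components
```

For instance, to_standard(["BLAH", "OOFA.TXT"]) gives BLAH/OOFA.TXT, matching
the CONVERTED/RELATIVE row of the table above.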
1.2. Client/Server Directory Operations
REMOTE MKDIR <name>
G packet function code "m" (yes, lowercase). Creates the specified
directory. Names are as in 1.1 (absolute or relative).
REMOTE RMDIR <name>
G packet function code "r". Removes specified directory. Name can
be wild.
REMOTE RMDIR /RECURSIVE <name>
G packet function code "s". Removes specified directory tree and all
its contents. Like rm -Rf in UNIX. Name can be wild.
1.3. GET /RECURSIVE
New packet types:
V for GET /RECURSIVE.
Tells server to send all files that match the given specification in
the current or given directory tree. Otherwise just like G for GET.
W for GET /DELETE /RECURSIVE.
Like V, but the server should delete each file after it is sent
successfully.
That should do it.
2. 32-BIT CRC
We might as well, why not. The code for the CHKT field in the init string
is "4". 32-bit CRC must not be implemented in the absence of 16-bit CRC. A
special rule applies here, namely if one Kermit says "4" and the other says
"3", then fall back to "3" instead of "1". The generating polynomial is:
>>> Can't do that because it violates existing protocol. If the other
>>> Kermit says "3" and doesn't know about "4", it falls back to "1", so
>>> we must do the same.
X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0
taken "backwards" with the highest-order term in the lowest-order bit. The
X^32 term is "implied"; the LSB is the X^31 term, etc. The X^0 term (usually
shown as "+1") results in the MSB being 1. Code will be based on the well
known and open Gary Brown code that everybody else uses.
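As an illustration of the "backwards" polynomial described above, here is a
small table-driven sketch in Python. It is not the Gary Brown code itself.
The all-ones preset and the final complement are assumptions borrowed from
common CRC-32 practice; the draft does not specify either.

```python
# Exponents of the generating polynomial, leaving out the implied X^32 term.
EXPONENTS = (26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0)

# "Backwards" form: the X^k term goes in bit (31 - k), so the X^0 term
# lands in the MSB and the (absent) X^31 term would be the LSB.
POLY_REFLECTED = sum(1 << (31 - k) for k in EXPONENTS)


def make_table(poly=POLY_REFLECTED):
    """Build the 256-entry lookup table for a right-shifting CRC."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = make_table()


def crc32(data, crc=0):
    """CRC of a bytes object; pass a previous result to continue a stream.

    ASSUMPTION: register preset to all ones and complemented at the end.
    """
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

Under those assumptions POLY_REFLECTED comes out as 0xEDB88320, with the MSB
set as the text says.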
Unlike the type 1, 2, and 3 block checks, the 32-bit one should be encoded
to never contain a blank. We can either use the same encoding as for the
16-bit CRC but excess-33 instead of -32 (resulting in 6 bytes), or we can
write it more compactly as a base-94 number whose lowest digit is "!". (How
many bytes is that?)
>>> five -- could use "5" as the code instead of "4"??
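The two candidate encodings can be compared with a short Python sketch. The
function names are hypothetical, and writing the most significant digit first
is an assumption; the draft only says that the lowest digit is "!".

```python
def encode_excess33(crc):
    """Six characters of 6 bits each (the first holds only the top 2 bits),
    each offset by 33 so that a blank (ASCII 32) can never appear."""
    return "".join(chr(((crc >> shift) & 0x3F) + 33)
                   for shift in (30, 24, 18, 12, 6, 0))


def encode_base94(crc, digits=5):
    """Fixed-width base-94 number; digit value 0 is written as "!"."""
    out = []
    for _ in range(digits):
        crc, d = divmod(crc, 94)
        out.append(chr(d + 33))
    return "".join(reversed(out))   # ASSUMPTION: most significant digit first


def decode_base94(text):
    """Inverse of encode_base94."""
    value = 0
    for ch in text:
        value = value * 94 + (ord(ch) - 33)
    return value
```

Five base-94 digits suffice because 94^4 = 78,074,896 is less than 2^32 while
94^5 = 7,339,040,224 is greater.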
(Joe notes that there might not be much value here, but we have learned that
filling blackboards full of math to explain to the masses why we don't have
such-and-such a feature that the others (read "Zmodem") have never works --
better to just go along... Anyway, this is just for the protocol definition,
not necessarily to be implemented anywhere, and certainly not *required*
anywhere.)
3. EX-POST-FACTO PER-FILE CRC CHECKING
MS-DOS Kermit and C-Kermit can accumulate a 16-bit CRC of an entire
transaction, and they include a rather cumbersome process for comparing the
CRCs afterward, which works only in a client/server setting, and is script
based:
<file-transfer-command>
if fail <do something>
remote query kermit crc16
if not = \v(query) \v(crc16) <we got trouble>
Obviously this can be expected to succeed only for binary-mode transfers,
and so scripts that use this technique will break in text mode.
A more general mechanism can be added to the protocol itself as follows:
a. Add a new S/I packet parameter, after the last one that is defined,
whatever that is (don't worry, I'll look it up). A single byte, this
character has the same values as the Block Check parameter, except only
"3" or "4" should be allowed.
b. Add SET commands to turn the feature ON and OFF. It should be OFF by
default, to avoid the extra overhead.
c. When ON, it should be operative only for binary-mode transfers.
>>> Why can't this be applied to the "canonical" text form? I.e., after
>>> line delimiters have been converted to CRLF (if necessary) and the
>>> text has been translated to the transfer character set, but before
>>> control-quoting and repeat-count compression, etc.
d. At the end of file, the file sender puts the following in the Z-packet
data field: The letter C and then the decimal character representation of
the negotiated type of CRC for the file.
>>> Didn't you say the MSB will always be on? If so, then the decimal
>>> representation will depend on whether you understand the value to be
>>> signed 32-bit or unsigned 32-bit. Why not just encode the value in
>>> the same way as for packet checksums?
e. If the CRC from (d) does not agree with the receiver's CRC, the receiver
ACKs the Z packet with a Data field of N, optionally followed by its own
CRC, otherwise it ACKs with either an empty data field or the letter C
followed by the CRC (exactly as in the Z packet). It is up to the
receiver how to dispose of the file when the CRCs don't match.
f. When the sender receives a CRC mismatch indication, the SEND command must
fail. But what does this mean when a file group is being sent? Should it
stop and send an error packet or go on to the next file? This must be a
user choice, so there will need to be some SET commands... In any case,
if it is a SEND /DELETE (aka MOVE) operation, then the source file must
not be deleted. Appropriate notations must be made in the transaction
log, if any, etc.
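Steps (d) through (f) might look roughly like this in Python. This is a
sketch under stated assumptions, not a definitive implementation: the
function names are hypothetical, and the CRC is written as an unsigned
decimal string, which is only one reading of (d) (see the reviewer's remark
about signed versus unsigned values).

```python
def z_data_field(crc):
    """Step (d): sender's Z-packet data, the letter C plus the decimal CRC.

    ASSUMPTION: the CRC is treated as an unsigned integer.
    """
    return "C" + str(crc)


def ack_data_for_z(z_data, receiver_crc, echo_crc=True):
    """Step (e): data field of the receiver's ACK to the Z packet."""
    if not z_data.startswith("C"):
        return ""                          # no CRC offered: ordinary empty ACK
    sender_crc = int(z_data[1:])
    if sender_crc != receiver_crc:
        return "N" + str(receiver_crc)     # mismatch, with our own CRC attached
    return "C" + str(receiver_crc) if echo_crc else ""


def source_may_be_deleted(ack_data):
    """Step (f): a SEND /DELETE must keep the source file on a mismatch."""
    return not ack_data.startswith("N")
```

What the sender does next after a mismatch (stop, or go on to the next file)
is left out on purpose, since the draft makes that a user choice.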
The per-file CRC mechanism operates independently of the \v(crc16) variable,
which accumulates a CRC over the entire transfer, which could obviously become
bollixed if a mixture of text and binary files were transferred in the same
transaction, as can occur with VMS C-Kermit.
4. The Capabilities Mask
We're out of bits, except for the "continued" bit. But if we use the
continuation mechanism, we'll no doubt break every non-Kermit-Project Kermit
implementation on earth, and probably also many of the old ones in our own
collection. So to add more capability bits, we'll need to leave the
"continued" bit blank, and add the second capabilities mask at the end.
5. Info Exchange
The idea is for the two Kermits to exchange information with each other that
applies to the transaction as a whole, but is beyond the scope of (too
voluminous for) the S/Y or I/Y exchange.
a. Add a new capability bit for this.
b. The file sender sets this bit in its S packet.
c. The file receiver agrees by setting the same bit in its ACK(S).
At this point, if the two Kermits have agreed, the sender may (but need not)
send an "L" packet, which contains an unencoded parameter-length-value (PLV)
sequence (just like an "A" packet) of information applying to the connection
and the entire transfer. Parameters (all are optional):
F = (Sender only) Number of files (expressed as decimal string)
L = (Sender only) Total length, decimal string. Obviously iffy for
text-mode transfers, but we've always had that problem.
E = Encoding: Kermit transfer character-set designation for text used in
any of these fields that can contain arbitrary text. Default = ASCII.
Syntax: exactly as in A packet.
H = Hostname (e.g. so local Kermit can show remote host's name on the
file transfer display).
D = Current directory, syntax according to SET FILE NAMES.
O = Organization name. Arbitrary text, encoding specified in E.
C = Country code (ISO 3166).
T = Connection type (to allow automatic choices of various things based
on whether the connection is known to be reliable -- e.g. TCP/IP at
*both* ends). Number. 0 = unknown (usually the case when in remote
mode); 1 = serial port; 2 = ISDN; 3 = TCP; 4 = UDP; 5 = CTERM; 6 = LAT;
etc etc.
A = Address. Interpreted according to connection type. This can be the IP
hostname, IP address, or other address specific to the network type, or
telephone number in +1(212)7654321 format, for display on the
other Kermit's screen, or logging, or callback, or any other desired
reason. All sorts of uses for this one can be imagined.
Z = Timezone. There is some standard for this. Can be used to adjust
A-packet date/times, which are always in local time. Applies only to
terrestrial transfers.
>>> Can't use the current time zone to adjust the date/time of files
>>> previously created -- the daylight/standard switching mechanism is
>>> not universal. Also, the time stamp on a file may or may not be
>>> set in local time. Some installations choose to run on UT! Further,
>>> the time stamp on a file doesn't reflect the time zone that was
>>> in effect when the file was created. If the system manager decides
>>> it was stupid to run on UT, and switches over to local standard time,
>>> there's no "paper trail".
X = Encryption identifier (this needs spelling out).
K = Public key for X, when applicable.
N = (Receiver only): No. Refuses the transaction. Optionally one or more
parameter letters are given as data, to indicate the reason for
refusal.
etc etc...
The order doesn't matter, except that if E is given, it must precede any
arbitrary-text fields. We can have up to 96 parameters, one for each 7-bit
graphic character. One must be reserved as an escape for when we run out.
NOTE: "L" was our last unused uppercase letter for packet types. Additional
packet types will be lowercase letters or other graphic characters. At least
one must be reserved as an escape for when we run out.
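A parameter-length-value sequence of the kind proposed for the "L" packet can
be sketched in Python as follows. The helper names are hypothetical. The
length byte is assumed to be a single excess-32 character, as in the "A"
packet, which limits each value to 94 characters.

```python
def tochar(n):
    """Kermit's tochar(): make a small number printable by adding 32."""
    if not 0 <= n <= 94:
        raise ValueError("value does not fit in one printable character")
    return chr(n + 32)


def unchar(c):
    """Inverse of tochar()."""
    return ord(c) - 32


def encode_plv(fields):
    """Build a PLV string from (parameter, value) pairs."""
    return "".join(p + tochar(len(v)) + v for p, v in fields)


def decode_plv(data):
    """Split a PLV string back into (parameter, value) pairs."""
    fields, i = [], 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated PLV header")
        param, length = data[i], unchar(data[i + 1])
        value = data[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated PLV value")
        fields.append((param, value))
        i += 2 + length
    return fields
```

For example, a sender announcing 12 files over a TCP connection (F = 12,
T = 3; made-up values) would put F"12T!3 in the data field.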
6. Extended Sequence Numbers and Window Size
32 just isn't big enough, e.g. for interplanetary transfers, not to mention
the Internet some days. But we can't increase it beyond 32 because it is
limited to half the sequence-number range. Thus for larger windows we
must increase the sequence number space. But we can't do this in the regular
sequence number field, at least not significantly, because it is restricted
to a 64-byte codeset (in theory maybe 94, but that too would require a change
in the protocol, and as long as we're changing it, let's shoot higher).
6.1. Negotiation
a. Add a new capability bit for this.
b. The file sender sets this bit in its S packet.
c. The file receiver agrees by setting the same bit in its ACK(S).
d. Add another 2-byte field to the init string, XWINDO.
This works exactly like long packet negotiation. If the bit is set then we
fetch the actual window size from the two XWINDO bytes, which are in excess-32
base 95 notation, just like the extended packet length. The receiver that
doesn't understand this option, of course, fetches the window size from the
regular WINDO field. The maximum extended window size is:
(95^2 - 1) / 2 = 9024 / 2 = 4512
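The two-byte excess-32 base-95 notation can be illustrated with a short
Python sketch. The names are hypothetical, and the high-order digit is
assumed to come first, as it does in the extended packet length.

```python
BASE = 95
MAX_PAIR = BASE * BASE - 1       # largest value two base-95 digits can hold
MAX_XWINDOW = MAX_PAIR // 2      # window is limited to half the sequence range


def encode_pair(n):
    """Encode 0..9024 as two printable characters, high-order digit first."""
    if not 0 <= n <= MAX_PAIR:
        raise ValueError("value out of range for two base-95 digits")
    return chr(n // BASE + 32) + chr(n % BASE + 32)


def decode_pair(text):
    """Inverse of encode_pair()."""
    if len(text) != 2:
        raise ValueError("expected exactly two characters")
    return (ord(text[0]) - 32) * BASE + (ord(text[1]) - 32)
```

With this notation the largest window, 4512, is written OO, and the largest
two-digit value, 9024, is written ~~.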
6.2. Packet Format
When an extended window size is negotiated, the packet sequence number is
indicated as ` (backquote, ASCII 96) to indicate that the full 2-byte base-95
packet number is included in the extended header. For long packets, this goes
between the length and the header checksum. For short packets, it forms the
extended header by itself (plus a checksum).
>>> This applies only to D packets, right?
The maximum extended sequence number is thus 95^2 - 1 = 9024, and the maximum
window size is half that, or 4512. A 4512-packet window of 9024-byte packets
(the theoretical maximum) would require about 40MB of packet buffers
(4512 x 9024 = 40,716,288 bytes).
Obviously a smaller actual maximum can be imposed by the implementation.
6.3. Improved Packet Framing
This is changed from yesterday -- now it's simply folded in with the
new packet format.
There is nothing in a basic Kermit packet to indicate where the data ends and
the block check begins. But we have the opportunity in extended-sequence
packets to use a better format. In these packets, the packet length indicates
the beginning of a PLV format block check. Parameters are the block-check
codes (1, 2, 3, B, 4). The length indicates the number of bytes in the block
check. Then the block check. In addition to preventing foulups, this allows
the block check type to be varied dynamically throughout the transaction. It
>>> Why would we want to do that???
also allows a graphic character to be placed after the block check in case it
ends with a blank.
>>> In practice, we can do that already.
7. Supervisory Packets
These can be used for "out of band" functions. Supervisory packets must be
numbered, just like regular ones, because otherwise there is no way for the
receiver to indicate that it was or wasn't received.
>>> Is this going to play havoc with sliding windows? E.g., changing
>>> the packet size upon request of the receiver would logically
>>> demand that the packets already in transit be "flushed" somehow.
>>> Ouch!
Let's call this a "u" packet. It can be sent only by the file sender, and
it can be sent at any time during a transaction if negotiated:
a. Add a new capability bit for this.
b. The file sender sets this bit in its S packet.
c. The file receiver agrees by setting the same bit in its ACK(S).
Contents are, again, the familiar PLV sequences. Some possible parameters:
M = Message. To be logged or shown in the display.
W = Change window size
P = Change packet length
R = Reset to defaults
S = Sync
D = Drain
B = Buffer credit
(I'm not really sure yet whether any of these make sense, or what they would
do, or how they would work, or what else we can do here, so this is mainly
just a placeholder.)
The receiver ACKs with the normal indications (Y or N, length, list of tags).
If the file receiver wants to send a supervisory message, it can be placed
into the data field of any D-packet ACK: the letter "u" followed by PLV
sequences (we can't put these in *any* ACK because some already are allowed to
contain arbitrary string data, e.g. ACK(F), tsk tsk). The file sender
"acknowledges" by sending a "u" packet, which must then be ACK'd by the
receiver with an empty ACK.