SOME MINOR ADDITIONS TO THE KERMIT PROTOCOL
Begun: November 1997
D R A F T # 11
Thu Oct 17 16:33:57 2002
ROUGH DRAFT - TO BE FLESHED OUT AFTER DISCUSSION
For next book... prefixing. Suppose you have 96 Ctrl-C's in a row,
and Ctrl-C is unprefixed. Then you get ~~xxx (where x is Ctrl-C), because the
95th and 96th are only two in a row and don't get collapsed. But this leaves
three of them in a row in the packet, which can kill the transfer. Ditto for
plus signs, or any other char. There has to be a rule to prevent this.
(Maybe Ctrl-C should always be prefixed, since C-Kermit accepts ^C^C^C to
break out of packet mode by default.)
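To make the problem concrete, here is a rough sketch of repeat-count encoding in Python. It assumes the usual "~" repeat prefix, a maximum count of 94 per group, and that runs shorter than three characters are sent as-is; it ignores control and 8th-bit prefixing entirely, and assumes the data never contains the "~" prefix itself. The function and constant names are made up for illustration. The trouble appears whenever a run leaves exactly two characters over after a full group of 94.

```python
REPT = "~"      # repeat-count prefix
MAX_RUN = 94    # largest count one prefix can carry: tochar(94) == "~"
MIN_RUN = 3     # assumed threshold: shorter runs are not collapsed


def tochar(n: int) -> str:
    """Make a small number (0..94) printable."""
    return chr(n + 32)


def encode_runs(data: str) -> str:
    """Collapse runs of identical characters into <REPT><count><char>."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        run = 1
        # Extend the run, but never past what one count byte can express.
        while i + run < len(data) and data[i + run] == ch and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            out.append(REPT + tochar(run) + ch)
        else:
            out.append(ch * run)
        i += run
    return "".join(out)


CTRL_C = "\x03"
encoded = encode_runs(CTRL_C * 96)
# 94 are collapsed into "~~^C"; the 2 left over follow literally,
# so three raw Ctrl-C's end up adjacent in the packet data.
```

A rule to prevent this could, for instance, split the run unevenly so that no leftover pair follows a full group, but that is a design choice this sketch does not make.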
The same thing can happen (with very low probability) at the end of the data
field, before the block check. For that matter, it's possible to have a
Type-3 block check of "+++", which could disconnect the modem. Not that I've
ever heard of such a thing happening in 20 years...
Items that still need to be addressed:
. Do something about Error packets -- block check type, frame number?
. Add something about sending the file's original name -- a new attribute
maybe.
. A generic way to represent file permissions on the wire, and then
REMOTE SET FILE PERMISSION, etc. Very cumbersome if done portably...
Also, how to represent permissions in "universal directory listings"?
0. REMOTE SET TRANSFER MODE (DONE)
Allows the client to set the server's transfer mode. Code: 410.
Value: 0 for automatic, 1 for manual.
How about REMOTE SET FILE PATTERNS { ON, OFF }? Not needed, since REMOTE SET
TRANSFER-MODE takes care of this. If it were needed, however, then we'd also
need a way for the client to modify the server's pattern list.
0.0.1. REMOTE SET MATCH { FIFO, DOTFILE } { ON, OFF }
DOTFILE = 330
FIFO = 331
332-339 saved for other wildcard/match settings.
(Note: 320 is FILE CHARACTER-SET)
0.1. WHATAMI2
For greater client/server coupling however, we're going to need a second
WHATAMI field.
0.1.1. Transfer Mode
REMOTE SET TRANSFER MODE isn't good enough, because it only sets the
*prevailing* mode. GET /BINARY or GET /TEXT really have to force the transfer
mode to the one indicated, no matter what, just as SEND /TEXT (or /BINARY) does.
Internally these also set TRANSFER MODE to MANUAL for the duration of the
command, but there is presently no protocol for the client to tell the server
its transfer mode, so if the server has XFER MODE AUTO and PATTERNS ON, this
will still take precedence on a per-file basis. In the worst case "get /text
foo.com" will result in a binary transfer if "*.com" is in the server's binary
pattern list.
Thus we need a way for GET /<mode> to *temporarily* set the server's transfer
mode to MANUAL. One way to do this is already sketched out for Extended GET.
But it requires the client to send an O packet, which few servers understand.
This should be implemented eventually anyway, but in the meantime we can do
the same thing by adding a new WHATAMI bit. Unfortunately, however, the
WHATAMI field is full, so now we need a second one, which directly follows the
(variable-length) system ID field. The new field is called WHATAMI2, with the
same format as WHATAMI:
   Bit 5     Bit 4      Bit 3      Bit 2       Bit 1      Bit 0
 +--------+----------+----------+----------+-----------+----------+
 |    1   | reserved | reserved | CHARSETS | RECURSIVE | XFERMODE |
 +--------+----------+----------+----------+-----------+----------+
Bit 5
Is set to 1 to indicate that the other bits are to be believed.
This allows this field to be skipped over in the event it is not
implemented, but a subsequent field is added.
XFERMODE
0 = Automatic.
1 = Manual.
RECURSIVE (See Section 0.1.2)
0 = This will not be a recursive transfer
1 = This is to be a recursive transfer
CHARSETS (See Section 0.1.3)
0 = My TRANSFER CHARACTER SET is TRANSPARENT
1 = My TRANSFER CHARACTER SET is not TRANSPARENT
The binary bit-encoded value is made printable by tochar() and then decoded
by the receiver using unchar().
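As a sketch of how the field might be packed and unpacked (Python used only for illustration; the function names and the dictionary returned by the decoder are assumptions, while the bit positions follow the diagram above):

```python
def tochar(n: int) -> str:
    """Make a 6-bit value printable by adding 32."""
    return chr(n + 32)


def unchar(c: str) -> int:
    """Reverse of tochar()."""
    return ord(c) - 32


VALID = 32       # bit 5: "the other bits are to be believed"
CHARSETS = 4     # bit 2
RECURSIVE = 2    # bit 1
XFERMODE = 1     # bit 0: 0 = automatic, 1 = manual


def encode_whatami2(manual=False, recursive=False, charsets=False) -> str:
    """Build the one-character WHATAMI2 field; reserved bits stay 0."""
    bits = VALID
    if manual:
        bits |= XFERMODE
    if recursive:
        bits |= RECURSIVE
    if charsets:
        bits |= CHARSETS
    return tochar(bits)


def decode_whatami2(field: str):
    """Return the flags, or None if bit 5 says the field is not in use."""
    bits = unchar(field)
    if not bits & VALID:
        return None
    return {
        "manual": bool(bits & XFERMODE),
        "recursive": bool(bits & RECURSIVE),
        "charsets": bool(bits & CHARSETS),
    }
```

For example, a client doing GET /BINARY with no other options would send encode_whatami2(manual=True), which is the single character "A".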
This mechanism lets the client control the server's transfer mode. The
client's transfer mode is controlled by:
a. The most recent SET TRANSFER MODE { AUTOMATIC, MANUAL }, which
sets the prevailing mode (the default is AUTOMATIC); and:
b. Any mode switch (/BINARY, /TEXT, etc) included with a SEND, GET, or other
file-transfer command, which sets the transfer to MANUAL for the duration
of the command; the prevailing mode is restored afterwards.
The WHATAMI2 XFERMODE bit lets the client control the server's transfer mode.
This is automatically "temporary", since each client SEND/GET command will set
it appropriately.
0.1.2. Pathnames
Next problem: GET /RECURSIVE works without having to tell the server to SET
SEND PATHNAMES RELATIVE first because GET /RECURSIVE has its own packet type.
But SEND /RECURSIVE still requires you to SET RECEIVE PATHNAMES RELATIVE on
the server first, since there is nothing special in the I or S packet to tell
the server to expect a recursive transfer.
The WHATAMI2 RECURSIVE bit lets the client tell the server that incoming files
will have pathnames attached and the server should automatically switch to
RECEIVE PATHNAMES RELATIVE. But what if the user does not want the server
to do this? We need:
SET RECEIVE PATHNAMES AUTO
to be the default, which means OFF normally, but RELATIVE if I'm a server
and the client sets the WHATAMI2 RECURSIVE bit. If the user has set RECEIVE
PATHNAMES to anything else but AUTO, then that value is used instead.
The RECEIVE PATHNAMES value is saved and restored around the transfer, so if
it was AUTO it goes back to AUTO.
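The AUTO rule boils down to a small decision, sketched here in Python. The function name and the string values are illustrative assumptions, not anything in C-Kermit:

```python
def effective_receive_pathnames(setting: str, is_server: bool,
                                recursive_bit: bool) -> str:
    """Resolve SET RECEIVE PATHNAMES for one incoming transfer.

    setting        the user's value: "AUTO", "OFF", "RELATIVE", "ABSOLUTE"
    is_server      True if this Kermit is acting as the server
    recursive_bit  the WHATAMI2 RECURSIVE bit sent by the client
    """
    if setting != "AUTO":
        # An explicit user setting always wins.
        return setting
    if is_server and recursive_bit:
        return "RELATIVE"
    return "OFF"
```

Because the function returns a value instead of changing the stored setting, the save-and-restore requirement is satisfied trivially: the user's AUTO is never overwritten.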
The reserved bits in the WHATAMI2 field must be set to 0. At such time as
they are defined, their definitions must be such that the 0 value corresponds
to previous and/or default behavior, so that their use will not cause
interoperability problems with Kermit versions in which they are still
reserved.
0.1.3. Character Sets
Now, assuming all the aforementioned rules are in effect, we still have a
problem. Suppose the client and server are on "like platforms" and therefore
would slip into FILE TYPE BINARY and TRANSFER MODE MANUAL automatically. This
is just fine as long as character-set translation was not desired. Therefore
the WHATAMI2 word also includes a CHARSETS bit, which is 0 for Kermits whose
TRANSFER CHARACTER-SET is TRANSPARENT, and 1 if an actual transfer charset
has been selected. Now the rule is:
Do not switch to TRANSFER MODE MANUAL automatically if the WHATAMI2 CHARSETS
bit is 1.
But this makes it potentially harder to recover broken transfers between like
systems.
1. DIRECTORY OPERATIONS
The aim of these changes is to allow the exchange of directory trees or file
systems. It is assumed that all file systems are either tree-structured or
flat. Hardly any protocol changes are needed, mainly just agreements on data
formats. Most of the features are implemented outside the protocol: recursive
SEND commands, automatic directory creation during RECEIVE commands, etc.
1.0. Directory Name Format Selection (DONE)
(This is simplified considerably in Draft 2 after I implemented it in C-K...)
SET FILE NAMES { CONVERTED, LITERAL }
Now applies to pathnames too. For pathnames, CONVERTED means that the
native directory notation is converted to standard format when sending,
and the standard format is assumed when receiving.
The related command:
SET { SEND, RECEIVE } PATHNAMES { OFF, ABSOLUTE, RELATIVE }
then applies as usual. PATHNAMES are OFF by default, in which case nothing is
different. When SEND PATHNAMES is ABSOLUTE or RELATIVE, then the FILE NAMES
setting is applied to them just as it is to the rest of the filename.
When receiving files, a Kermit program should be expected to understand its
own native format and the standard one; it cannot be expected to understand a
foreign directory notation. Thus SET FILE NAMES CONVERTED should be used
between unlike systems.
Notes:
1. There is no reason why there can't be separate SET FILE NAMES commands
and settings for each direction.
2. We haven't said anything that affects the protocol yet; that comes
in the next section.
1.1. Kermit Protocol Directory Name Representation (DONE)
UNIX notation shall be used for directories when FILE NAMES are CONVERTED.
Forward slash (/) is the directory separator. If a / appears as a literal
character in a directory name, then it should be written as //. A file or
directory specification beginning with / is absolute, otherwise it is relative.
This is more or less the same scheme used by Info-ZIP and so it is widely
proven in the real world.
As always, the rule regarding letters when FILE NAMES are CONVERTED is to
uppercase when sending. The receiver handles letters according to its own
convention.
Symbolic names like "." and ".." should be expanded before transmission.
For the time being, we should use the rule that device names are always
discarded (e.g. DOS disk letters, VMS disk names, etc).
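A minimal sketch of the sending side of these rules, in Python. It assumes the caller has already split the native path into directory components, expanded "." and "..", and dropped any device name; the function name and argument layout are made up for illustration:

```python
def to_standard_path(directories, filename, absolute=False):
    """Build the on-the-wire name used when FILE NAMES are CONVERTED.

    directories  list of directory names, outermost first (may be empty)
    filename     the file's own name
    absolute     True to send an absolute path (leading "/")
    """
    # A literal "/" inside any component is written as "//".
    parts = [part.replace("/", "//") for part in list(directories) + [filename]]
    path = "/".join(parts)
    if absolute:
        path = "/" + path
    # CONVERTED names are uppercased when sending.
    return path.upper()
```

Receiving is the harder half, since "//" has to be told apart from an empty component, and the receiver applies its own case conventions.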
Note: I have this working now in VMS as well as UNIX:
  FILENAMES   SEND PATHNAMES   UNIX Result                VMS Result
  CONVERTED   OFF              OOFA.TXT                   OOFA.TXT
  CONVERTED   RELATIVE         BLAH/OOFA.TXT              BLAH/OOFA.TXT
  CONVERTED   ABSOLUTE         /W/FDC/TMP/BLAH/OOFA.TXT   /FDC/BLAH/OOFA.TXT
  LITERAL     OFF              oofa.txt                   OOFA.TXT
  LITERAL     RELATIVE         blah/oofa.txt              [.BLAH]OOFA.TXT
  LITERAL     ABSOLUTE         /w/fdc/tmp/blah/oofa.txt   [FDC.BLAH]OOFA.TXT
1.2. Client/Server Directory Operations
REMOTE MKDIR <name>
G packet function code "m" (yes, lowercase). Creates the specified
directory. Names are as in 1.1 (absolute or relative). (DONE)
REMOTE RMDIR <name>
G packet function code "d". Removes specified directory. Name can
be wild. (DONE)
REMOTE RMDIR /RECURSIVE <name>
G packet function code "t". Removes specified directory tree and all
its contents. Like rm -Rf in UNIX. Name can be wild. (NOT DONE)
1.3. GET /RECURSIVE (DONE)
New packet types:
V for GET /RECURSIVE.
Tells server to send all files that match the given specification in
the current or given directory tree. Otherwise just like G for GET
(DONE).
W for GET /DELETE /RECURSIVE.
Like V, but the server should delete each file after it is sent
successfully (DONE).
That should do it.
1.4. EXTENDED GET
July 1998: No, it shouldn't. Because what about /RECOVER, etc? We are nearly
out of (uppercase) packet types, and can't afford to add a new one for every
combination of GET switches; even if we could, this unnecessarily
overcomplicates the FSA that implements the protocol.
Definition: Simple GET-Class Packet -- Any of the following:
R - Original GET packet
H - GET /DELETE (= RETRIEVE)
V - GET /RECURSIVE
W - GET /DELETE /RECURSIVE
We should not have used up those packet types, but it's too late now. From
now on, all new GET options go through a new Extended GET (XGET) packet, type
"O", which is (a) capable of expressing all combinations of GET options
(including those already expressed in the existing simple GET-class packets),
and (b) extensible:
O - (New) Extended GET
With this addition, GET-Class Packets include the Simple GET-Class Packets
plus the O-packet.
Note that many GET commands are "ambiguous" in the sense that they could
result in either a Simple GET-Class packet or an Extended GET packet.
Suppose the client picks one form, but the server only implements the other?
To resolve this situation in a user-friendly manner, the rule must be:
Any Kermit client that implements Extended GET must also implement all
of the Simple GET-Class packets (R, H, V, and W). Any GET command that
can be expressed in a Simple GET-Class packet must be expressed that way;
an Extended GET packet should be used only for combinations that are not
expressible in a Simple GET-Class packet.
The server, of course, should accept either form.
Negotiation:
None. If a server receives an O packet and does not understand it, it
returns an Error packet in the normal fashion.
Format:
Packet type: O. Data field contains options and selectors in Modified PLV
(Parameter, Length, Value) format. Modified PLV format is just like PLV
format, except that a special escape character may be placed in the Length
field to indicate that the Value field begins with a 2-character length.
This allows for fields longer than 94, and in fact allows fields up to 8836
bytes long. Since any printable character is allowed in the regular PLV
length field, this escape character must be either a control character or an
8-bit character, which is OK, since it will be encoded according to normal
rules (see below). The escape mechanism should be used only when a value
is longer than 94 bytes, which should happen only with filenames, and then
only rarely. The escape character is Ctrl-V (SYN, ASCII 22).
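Here is one way Modified PLV might be encoded and decoded, sketched in Python. The draft does not spell out the base of the 2-character length; this sketch assumes excess-32 base-95 (as for extended packet lengths), which agrees with the "!)" length in sample 5 later in this section. It works on the data field before any prefix encoding is applied, and the function names are illustrative only:

```python
ESC = "\x16"          # Ctrl-V in the length position: long form follows
MAX_SHORT = 94        # longest value the one-character length can express
MAX_LONG = 8836       # upper limit stated in this draft


def tochar(n: int) -> str:
    return chr(n + 32)


def unchar(c: str) -> int:
    return ord(c) - 32


def encode_plv(param: str, value: str = "") -> str:
    """Encode one parameter as P, L, V (long form only when needed)."""
    n = len(value)
    if n <= MAX_SHORT:
        return param + tochar(n) + value
    if n > MAX_LONG:
        raise ValueError("value too long for Modified PLV")
    hi, lo = divmod(n, 95)            # assumed base-95 two-digit length
    return param + ESC + tochar(hi) + tochar(lo) + value


def decode_plv(data: str):
    """Split a decoded data field into (parameter, value) pairs."""
    fields = []
    i = 0
    while i < len(data):
        param = data[i]
        if data[i + 1] == ESC:
            n = unchar(data[i + 2]) * 95 + unchar(data[i + 3])
            start = i + 4
        else:
            n = unchar(data[i + 1])
            start = i + 2
        fields.append((param, data[start:start + n]))
        i = start + n
    return fields
```

With these definitions encode_plv("F", "blah.x") gives "F&blah.x", and a 104-character filename gets the prefix F, Ctrl-V, "!)".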
Parameters:
O: Options (bits to be ORed together, result converted to a decimal string):
1 = Delete each source file after it is sent successfully (/DELETE)
2 = Recursive (/RECURSIVE)
4 = Recover (/RECOVER)
8 = Filename is a command (/COMMAND)
16 = Reserved
32 = Reserved
*** Is there some reason this is limited to a single 6-bit byte?
*** To be added (to first or second byte):
xx = Use filemode given in yy, no matter what -- overrides all else
yy = 0 = text, 1 = binary
This is to allow GET /TEXT and GET /BINARY in the client to override any other
kind of automatic transfer-mode determination in the server. If the user says
/TEXT or /BINARY, they mean it.
o: Reserved as a second Options byte.
M: Local Transfer Mode (sets the server's mode for this transaction only):
0 = Text
1 = Binary or Image
2 = Auto (default)
3 = Labeled
P: Pathnames:
0 = Server should send with pathnames stripped
1 = Server should send with relative pathnames
2 = Server should send with absolute pathnames
N: Name Conversion:
0 = Server should send with literal names
1 = Server should send with converted names
X: Transfer character set (client tells server which xfer charset to use;
server picks corresponding file charset automatically by association).
E: Exception name or pattern (was X).
There can be more than one of these. The entire exception list applies
to all filespecs.
F: Filespec:
Name or wildcard for requested file(s).
There can be more than one of these.
L: Larger than (size in bytes)
S: Smaller than (size in bytes)
A: After. File date-time, yyyymmdd hh:mm:ss, client's local time.
Only send files modified AFTER the given date-time.
a: After2. File date-time, yyyymmdd hh:mm:ss, client's local time.
Only send files modified ON OR AFTER the given date-time.
B: Before. File date-time, yyyymmdd hh:mm:ss, client's local time.
Only send files modified BEFORE the given date-time.
b: Before2. File date-time, yyyymmdd hh:mm:ss, client's local time.
Only send files modified ON OR BEFORE the given date-time.
C: After. File date-time, yyyymmdd hh:mm:ss, GMT.
Only send files modified AFTER the given date-time.
c: After2. File date-time, yyyymmdd hh:mm:ss, GMT.
Only send files modified ON OR AFTER the given date-time.
D: Before. File date-time, yyyymmdd hh:mm:ss, GMT.
Only send files modified BEFORE the given date-time.
d: Before2. File date-time, yyyymmdd hh:mm:ss, GMT.
Only send files modified ON OR BEFORE the given date-time.
@: End of Parameters
Note that the P and N parameters raise a tricky question for the command
language, since these parameters can apply separately at each end. For
example, does GET /FILENAMES:LITERAL mean the server should send filenames
literally, the client store them literally, or both? Ditto for GET
/COMMAND? Currently this means the incoming file is to be fed to a command,
as opposed to telling the server that it should be sending from a command
(for which purpose we presently use "!" notation in the filename).
O-Packets must be encoded -- unlike S/I/A packets -- because parameters
and/or length fields might have any value at all. Thus PLV processing by
the server must take place AFTER decoding.
Examples:
Here are some sample O packets:
1. ^A0 OO!7F&blah.x@ 0        ; GET /DEL /RECURS /RECOV blah.x
2. ^A3 OO!7M!1F&blah.x@ T     ; GET /DEL /RECURS /RECOV /BIN blah.x
3. ^A. OO!7F##abc@ 7          ; GET /DEL /RECURS /RECOV abc
4. ^A. OO!7F##~#a@ R          ; GET /DEL /RECURS /RECOV aaa
5. ^A  O!3FO!7F#V!)abcdefghij...(lots more)...qrstuvwxyz@ 3
(1) shows that the M field is omitted when /TEXT or /BINARY not given.
(2) shows that the M field is included when /TEXT or /BINARY is given.
(3) shows how a length field of 3 is encoded as ##.
(4) shows a filename that compresses to 3 characters.
(5) shows what happens when a filename is more than 94 chars long --
the O-packet data field begins with an extended header ("!3F"), then the F
parameter length field is a Control-V character, indicating the first two
characters of the value field are a 2-byte length ("!)"). On clear channels
or when Ctrl-V is unprefixed, it will be inserted literally rather than
encoded as "#V".
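The short samples can be reproduced with a few lines of Python, which is a handy cross-check on the format. This is only a sketch under several assumptions: short (non-extended) packets, a Type-1 block check, "#" as the control prefix, no 8th-bit prefixing, and no repeat-count compression (so it covers samples 1 through 3 but not 4). All function names are made up:

```python
SOH = "\x01"
QCTL = "#"            # control prefix, assumed to be the default


def tochar(n: int) -> str:
    return chr(n + 32)


def plv(param: str, value: str = "") -> str:
    """Short-form PLV only; long values need the Ctrl-V escape."""
    if len(value) > 94:
        raise ValueError("long form not handled in this sketch")
    return param + tochar(len(value)) + value


def prefix_encode(data: str) -> str:
    """Quote control characters and the prefix character itself."""
    out = []
    for ch in data:
        code = ord(ch)
        if code < 32 or code == 127:
            out.append(QCTL + chr(code ^ 64))
        elif ch == QCTL:
            out.append(QCTL + QCTL)
        else:
            out.append(ch)
    return "".join(out)


def chk1(body: str) -> str:
    """Kermit Type-1 (6-bit) block check over LEN through the data."""
    total = sum(ord(c) for c in body)
    return tochar((total + ((total & 192) >> 6)) & 63)


def o_packet(options: int, filespec: str, mode=None, seq: int = 0) -> str:
    """Build one short O packet: options, optional mode, one filespec."""
    fields = plv("O", str(options))
    if mode is not None:
        fields += plv("M", str(mode))
    fields += plv("F", filespec) + plv("@")
    data = prefix_encode(fields)          # PLV first, then packet encoding
    if len(data) + 3 > 94:
        raise ValueError("needs a long packet")
    body = tochar(len(data) + 3) + tochar(seq) + "O" + data
    return SOH + body + chk1(body)
```

o_packet(7, "blah.x") yields sample 1 exactly, including the "0" block check; adding mode=1 yields sample 2, and o_packet(7, "abc") yields sample 3, where the length byte "#" comes out quoted as "##".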
Protocol:
If the client sends an option not understood by the server, the server MUST
send an Error packet and return to server command wait. Otherwise, the
resulting transfer could be incorrect (wrong mode, wrong file, wrong
destination, etc). Thus, the client should not send options that were not
specified by the user (e.g. it should not supply default options that were not
given explicitly).
Since filenames can be quite long, and any number of them can be included in
an XGET command, the resulting parameter list could easily be greater than the
negotiated packet length. Therefore we must allow for a series of O packets,
as we do with A packets.
We do, however, require that each parameter be totally contained within a
packet, just as we do for A packets. Although it might be desirable to allow
filenames, etc, to span packets, there is no pressing need for this (it is not
allowed in F or R packets, nor with A-packet parameters, and nobody is
complaining), and it would add considerable complication to the
implementation. Therefore, the restriction that a filename must fit within
the negotiated packet length is not changed by this protocol addition.
Note that Simple GET-Class packets are not acknowledged; instead the server
reverses the direction of the protocol by sending an S packet. O packets, on
the other hand, must be numbered and, except for the last (or only) one,
acknowledged individually. This means the final O packet MUST contain an End
Of Parameters marker (@) as its last parameter. (Of course the final O-packet
can be NAK'd, in which case the client must retransmit it.)
As with any other GET operation, the server responds by resetting the sequence
number to 0 and sending S(0), except in this case, only after the final
O-Packet. A potential problem occurs if the S(0) sent in response to a final
O-Packet whose sequence number was not 0 is lost. In this case, the client
might time out and retransmit O(x). But x is not a valid sequence number any
more so the server's transport layer will reject it with an error packet. But
this is an unnecessary error, since all the server really needs to do is
retransmit S(0). To avoid this situation, the following rule should be
added:
When the window size is 1 and a packet arrives, save it (or if memory
is at a premium, save its control fields). Whenever a new packet arrives,
compare it with the previous one and if it is a duplicate, ignore it.
Or... if this causes too much overhead, put another ugly heuristic into the
transport layer similar to the one for E packets...
Wildcards and Patterns:
Unlike regular GET, XGET should define a standard format for filenames and
patterns, so clients need not know the special syntax of the server's
underlying platform.
Thus the following characters in filenames and patterns are reserved:
* = matches any sequence of 0 or more characters
? = matches any single character
/ = directory separator (portable filenames are in UNIX format)
But how to quote these characters when they are to be taken literally?
First note that we also want to accept platform-specific syntax, and in a
very common case, this includes DOS-format pathnames. Which rules out
backslash as a quote character. Similarly for any other ASCII character.
Therefore the quote character should be a control character:
^V (Control-V)
This is natural for UNIX and TOPS-10/20 users and is very unlikely to appear
in a filename (in case it does, it can quote itself). The Ctrl-V is encoded
with the SET SEND CONTROL-PREFIX provided it is not included in the
SET CONTROL UNPREFIX set.
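A server could turn such a pattern into whatever its platform uses for matching. As an illustration only, this Python sketch converts it to a regular expression. It takes "*" literally as "any sequence of characters"; whether "*" should be allowed to match across "/" is not settled above, so treat that as an assumption. The function names are hypothetical:

```python
import re

QUOTE = "\x16"   # Ctrl-V: take the next character literally


def xget_pattern_to_regex(pattern: str):
    """Translate a portable XGET pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == QUOTE and i + 1 < len(pattern):
            # Quoted character: match it exactly, even "*", "?" or Ctrl-V.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def matches(pattern: str, name: str) -> bool:
    """True if the whole name matches the pattern."""
    return xget_pattern_to_regex(pattern).fullmatch(name) is not None
```

So matches("a*.txt", "abc.txt") is true, while a pattern of "a", Ctrl-V, "*", "b" matches only the literal name "a*b".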
Possible conflicts occur on platforms that use wildcards differently,
e.g. AOS/VS, where "*" matches any string of characters up to a period, and
"+" matches any string of characters. If incoming "*" is translated to "+",
then how would the client get the AOS/VS functionality? (With "*." -- so
let's not worry about it.)
2. 32-BIT CRC
We might as well, why not. The code for the CHKT field in the init string
is "4". 32-bit CRC must not be implemented in the absence of 16-bit CRC. A
special rule applies here, namely if one Kermit says "4" and the other says
"3", then fall back to "3" instead of "1". The generating polynomial is:
X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0
taken "backwards" with the highest-order term in the lowest-order bit. The
X^32 term is "implied"; the LSB is the X^31 term, etc. The X^0 term (usually
shown as "+1") results in the MSB being 1. Code will be based on the well
known and open Gary Brown code that everybody else uses.
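For reference, a table-driven sketch in Python of the calculation described above. The "backwards" constant is derived from the exponent list rather than typed in. The initial value and final inversion (all ones in both cases) are what the commonly used implementations do; this draft does not state them, so they are an assumption here:

```python
# Exponents of the generating polynomial, X^32 implied.
EXPONENTS = (26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0)

# "Backwards": the X^31 term is the LSB and the X^0 term is the MSB.
POLY_REFLECTED = sum(1 << (31 - k) for k in EXPONENTS)   # 0xEDB88320


def make_table():
    """Precompute the CRC of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc32(data: bytes) -> int:
    """32-bit CRC of data, with assumed all-ones preset and final XOR."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

With these conventions the ASCII string "123456789" gives 0xCBF43926, the usual check value for this polynomial.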
Unlike the type 1, 2, and 3 block checks, the 32-bit one should be encoded
to never contain a blank. We can either use the same encoding as for the
16-bit CRC but excess-33 instead of -32 (resulting in 6 bytes), or we can
write it more compactly as a base-94 number whose lowest digit is "!". (How
many bytes is that?)
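The arithmetic for the base-94 option: 94^4 = 78,074,896 is less than 2^32 = 4,294,967,296, and 94^5 = 7,339,040,224 is greater, so five digits are enough, one byte fewer than the excess-33 form. A small Python sketch, with made-up function names and a fixed width of five digits assumed:

```python
BASE = 94
LOWEST = ord("!")    # digit 0 is "!", so a blank can never appear
WIDTH = 5            # 94**4 < 2**32 <= 94**5


def encode_base94(value: int) -> str:
    """Write a 32-bit value as five printable, non-blank characters."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("not a 32-bit value")
    digits = []
    for _ in range(WIDTH):
        value, d = divmod(value, BASE)
        digits.append(chr(LOWEST + d))
    return "".join(reversed(digits))     # most significant digit first


def decode_base94(text: str) -> int:
    """Inverse of encode_base94()."""
    value = 0
    for ch in text:
        value = value * BASE + (ord(ch) - LOWEST)
    return value
```

Digit order (most significant first) is a choice made for the sketch; the draft does not specify one.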
(Joe notes that there might not be much value here, but we have learned that
trying to persuade the masses that the reason we don't have such-and-such a
feature that the others (read "Zmodem") have by filling blackboards full of
math never works -- better to just go along... Anyway, this is just for the
protocol definition, not necessarily to be implemented anywhere, and certainly
not *required* anywhere.)
3. EX-POST-FACTO PER-FILE CRC CHECKING
MS-DOS Kermit and C-Kermit can accumulate a 16-bit CRC of an entire
transaction, and they include a rather cumbersome process for comparing the
CRCs afterward, which works only in a client/server setting, and is script
based:
<file-transfer-command>
if fail <do something>
remote query kermit crc16
if not = \v(query) \v(crc16) <we got trouble>
Obviously this can be expected to succeed only for binary-mode transfers,
and so scripts that use this technique will break in text mode.
A more general mechanism can be added to the protocol itself as follows:
a. Add a new S/I packet parameter, after the last one that is defined,
whatever that is (don't worry, I'll look it up). A single byte, this
character has the same values as the Block Check parameter, except only
"3" or "4" should be allowed.
b. Add SET commands to turn the feature ON and OFF. It should be OFF by
default, to avoid the extra overhead.
c. When ON, it should be operative only for binary-mode transfers.
d. At the end of file, the file sender puts the following in the Z-packet
data field: The letter C and then the decimal character representation of
the negotiated type of CRC for the file.
e. If the CRC from (d) does not agree with the receiver's CRC, the receiver
ACKs the Z packet with a Data field of N, optionally followed by its own
CRC, otherwise it ACKs with either an empty data field or the letter C
followed by the CRC (exactly as in the Z packet). It is up to the
receiver how to dispose of the file when the CRCs don't match.
f. When the sender receives a CRC mismatch indication, the SEND command must
fail. But what does this mean when a file group is being sent? Should it
stop and send an error packet or go on to the next file? This must be a
user choice, so there will need to be some SET commands... In any case,
if it is a SEND /DELETE (aka MOVE) operation, then the source file must
not be deleted. Appropriate notations must be made in the transaction
log, if any, etc.
The per-file CRC mechanism operates independently of the \v(crc16) variable,
which accumulates a CRC over the entire transfer, which could obviously become
bollixed if a mixture of text and binary files were transferred in the same
transaction, as can occur with VMS C-Kermit.
4. The Capabilities Mask
We're out of bits, except for the "continued" bit. But if we use the
continuation mechanism, we'll no doubt break every non-Kermit-Project Kermit
implementation on earth, and probably also many of the old ones in our own
collection. So to add more capability bits, we'll need to leave the
"continued" bit blank, and add the second capabilities mask at the end.
But the next available field is after a PLV field (system ID) and so it's
also not in a fixed place...
Solution: Recycle the three Checkpoint bytes, since Checkpointing has never
been implemented and nobody has seen the spec. Currently we have (counting
from 0):
S[13] = '0'; <-- '0' means WONT CHECKPOINT.
S[14] = '_';
S[15] = '_';
S[16] = '_';
S[13] (according to the checkpoint proposal) can have the following values:
0 = WONT I won't do it (SET CHECKPOINT DISABLED)
1 = WILL I will do it if asked (SET CHECKPOINT ENABLED)
2 = DO Please do it (SET CHECKPOINT ON)
Now we give it a new one:
9 = XCAPAS (extended capability field)
This clearly identifies the following bytes as capability words and not
some vestige of checkpointing. The XCAPAS bytes are filled right to left
in normal 6bit+32 format. Unused XCAPAS bytes are set to accent grave (`),
which is outside the 6bit+32 range and therefore would not be mistaken for
a capability word.
New S[16] Capability bits:
1 = UTF8 Filenames (UTF8NAMES)
2 = GMT (UCT) file timestamps (GMTSTAMPS)
4 = (free...)
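Put together, the four recycled bytes might be built and read like this. This is a Python sketch with hypothetical function names; it handles only the first capability word, S[16], and leaves S[14] and S[15] unused:

```python
UTF8NAMES = 1
GMTSTAMPS = 2

XCAPAS = "9"     # S[13] value announcing extended capability bytes
UNUSED = "`"     # accent grave: outside the 6bit+32 range


def tochar(n: int) -> str:
    return chr(n + 32)


def unchar(c: str) -> int:
    return ord(c) - 32


def xcapas_fields(bits: int) -> str:
    """Return S[13]..S[16]; words are filled right to left, so the
    first capability word lands in S[16]."""
    if not 0 <= bits < 64:
        raise ValueError("one capability word holds 6 bits")
    return XCAPAS + UNUSED + UNUSED + tochar(bits)


def parse_xcapas(fields: str):
    """Return the S[16] capability bits, or None if S[13] is not '9'
    (e.g. an old Kermit sending '0' for WONT CHECKPOINT)."""
    if len(fields) < 4 or fields[0] != XCAPAS:
        return None
    word = fields[3]
    return 0 if word == UNUSED else unchar(word)
```

A Kermit offering both UTF8NAMES and GMTSTAMPS would thus send "9``#" in S[13] through S[16].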
5. Info Exchange (NOT IMPLEMENTED YET)
The idea is for the two Kermits to exchange information with each other that
applies to the transaction as a whole, but is beyond the scope of (too
voluminous for) the S/Y or I/Y exchange.
a. Add a new capability bit for this.
b. The file sender sets this bit in its S packet.
c. The file receiver agrees by setting the same bit in its ACK(S).
At this point, if the two Kermits have agreed, the sender may (but need not)
send an "L" packet, which contains an unencoded parameter-length-value (PLV)
sequence (just like an "A" packet) of information applying to the connection
and the entire transfer. Parameters (all are optional):
F = (Sender only) Number of files (expressed as decimal string)
L = (Sender only) Total length, decimal string. Obviously iffy for
text-mode transfers, but we've always had that problem.
E = Encoding: Kermit transfer character-set designation for text used in
any of these fields that can contain arbitrary text. Default = ASCII.
Syntax: exactly as in A packet.
H = Hostname (e.g. so local Kermit can show remote host's name on the
file transfer display).
D = Current directory, syntax according to SET FILE NAMES.
O = Organization name. Arbitrary text, encoding specified in E.
C = Country code (ISO 3166).
T = Connection type (to allow automatic choices of various things based
on whether the connection is known to be reliable -- e.g. TCP/IP at
*both* ends). Number. 0 = unknown (usually the case when in remote
mode); 1 = serial port; 2 = ISDN; 3 = TCP; 4 = UDP; 5 = CTERM; 6 = LAT;
etc etc.
A = Address. Interpreted according to connection type. This can be the IP
hostname, IP address, or other address specific to the network type, or
telephone phone number in +1(212)7654321 format, for display on the
other Kermit's screen, or logging, or callback, or any other desired
reason. All sorts of uses for this one can be imagined.
X = Encryption identifier (this needs spelling out).
K = Public key for X, when applicable (more thought needed).
N = (Receiver only): No. Refuses the transaction. Optionally one or more
parameter letters are given as data, to indicate the reason for
refusal.
Also add specific platform identifier, OS name and version, Kermit software
name and version, endianness, ...
The order doesn't matter, except that if E is given, it must precede any
arbitrary-text fields. We can have up to 94 parameters, one for each 7-bit
graphic character. One must be reserved as an escape for when we run out.
NOTE: "L" was our last unused uppercase letter for packet types. Additional
packet types will be lowercase letters or other graphic characters. At least
one must be reserved as an escape for when we run out.
Notes on encryption (from Jeff):
Now that the PGP style of public key encryption is no longer covered by patent
and it looks like the IETF is going to accept PGP encryption as their
standard, Kermit public key encryption could work like this:
. The sender and receiver would negotiate the type of encryption to use.
. The receiver would then deliver its public key to the sender.
. The sender would then encrypt all data for the transaction using
that public key, which only the receiver would be able to decrypt.
This would allow Kermit to generate keys completely on the fly without any
need for local files or user intervention.
6. Extended Sequence Numbers and Window Size
32 just isn't big enough, e.g. for interplanetary transfers, not to mention
the Internet some days. But we can't increase it beyond 32 because it is
limited to half the sequence-number range, which is 64. So for larger
windows we must increase the sequence number space. But we can't do this in
the regular sequence number field, at least not significantly, because it is
restricted to a 64-byte codeset (in theory maybe 94, but that too would
require a change in the protocol, and as long as we're changing it, let's
shoot higher).
(*** This is not so important any more because of streaming ***)
6.1. Negotiation
a. Add a new capability bit for this.
b. The file sender sets this bit in its S packet.
c. The file receiver agrees by setting the same bit in its ACK(S).
d. Add another 2-byte field to the init string, XWINDO.
This works exactly like long packet negotiation. If the bit is set then we
fetch the actual window size from the two XWINDO bytes, which are in excess-32
base-95 notation, just like the extended packet length. The receiver that
doesn't understand this option, of course, fetches the window size from the
regular WINDO field. When this option is negotiated, the maximum sequence
number is thus 95^2 - 1 = 9024, and the maximum window size is half that, or
4512. A 4512-packet window of 9024-byte packets (the theoretical maximum)
would require about 40MB of packet buffers. Obviously a smaller actual maximum
can be imposed by the implementation.
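The numbers above, and the two-byte XWINDO encoding, worked through in Python. The function names are illustrative; the encoding is the same excess-32 base-95 pair used for extended packet lengths:

```python
def tochar(n: int) -> str:
    return chr(n + 32)


def unchar(c: str) -> int:
    return ord(c) - 32


MAX_SEQ = 95 ** 2 - 1            # 9024: largest two-digit base-95 number
MAX_WINDOW = MAX_SEQ // 2        # 4512: half the sequence-number range
MAX_PACKET = 9024                # theoretical maximum packet length

# Worst-case buffer space for a full window of maximum-length packets.
BUFFER_BYTES = MAX_WINDOW * MAX_PACKET   # 40,716,288 bytes


def encode_xwindo(window: int) -> str:
    """Encode a window size as two excess-32 base-95 characters."""
    if not 1 <= window <= MAX_WINDOW:
        raise ValueError("window size out of range")
    hi, lo = divmod(window, 95)
    return tochar(hi) + tochar(lo)


def decode_xwindo(field: str) -> int:
    """Inverse of encode_xwindo()."""
    return unchar(field[0]) * 95 + unchar(field[1])
```

The maximum window of 4512 is 47 * 95 + 47, so it happens to encode as "OO".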
6.2. Packet Format
When an extended window size is negotiated, the packet sequence number is
indicated as ` (backquote, ASCII 96) to indicate that the full 2-byte base-95
packet number is included in the extended header. For long packets, this goes
between the length and the header checksum. For short packets, it forms the
extended header by itself (with the header checksum of course).
6.3. Improved Packet Framing
There is nothing in a basic Kermit packet to indicate where the data ends and
the block check begins. But we have the opportunity in extended-sequence
packets to use a better format. In these packets, the packet length indicates
the beginning of a PLV format block check. Parameters are the block-check
codes (1, 2, 3, B, 4). The length indicates the number of bytes in the block
check. Then the block check. In addition to preventing foulups, this allows
the block check type to be varied dynamically throughout the transaction. It
also allows a graphic character to be placed after the block check in case it
ends with a blank.
Thus "Kermit-II" packets add 6 bytes of overhead to short packets:
. The wasted SEQ byte
. The 3-byte extended header
. 2 extra bytes for the packet block check
and 5 bytes for long packets:
. The wasted SEQ byte
. 2 bytes in the extended header that is already there
. 2 extra bytes for the packet block check
7. Supervisory Packets
These can be used for "out of band" functions. Supervisory packets must be
numbered, just like regular ones, because otherwise there is no way for the
receiver to indicate that it was or wasn't received.
Let's call this a "u" packet. It can be sent only by the file sender, and
it can be sent at any time during a transaction if negotiated:
a. Add a new capability bit for this.
b. The file sender sets this bit in its S packet.
c. The file receiver agrees by setting the same bit in its ACK(S).
Contents are, again, the familiar PLV sequences. Some possible parameters:
M = Message. To be logged or shown in the display.
W = Change window size
P = Change packet length
R = Reset to defaults
S = Sync
D = Drain
B = Buffer credit
(I'm not really sure yet whether any of these make sense, or what they would
do, or how they would work, or what else we can do here, so this is mainly
just a placeholder.)
The file receiver ACKs with the normal indications (Y or N, length, list of tags).
If the file receiver wants to send a supervisory message, it can be placed
into the data field of any D-packet ACK: the letter "u" followed by PLV
sequences (we can't put these in *any* ACK because some already are allowed to
contain arbitrary string data, e.g. ACK(F), tsk tsk). The file sender
"acknowledges" by sending a "u" packet, which must then be ACK'd by the
receiver with an empty ACK.
8. Compression
(Note: much of this discussion also applies to per-file encryption...)
This is indicated in the A packet. The book says attribute * (Encoding) is
the place to do this and lists Huffman Encoding (Q) as an example of
compression. So we can add something like "Z" for ZIP/Zlib compression. So
far so good.
The " (Type) field lists the file type, A (text) or B (binary).
Unfortunately, this has become synonymous with "transfer mode". Which has not
been a problem until now.
What if we want to send a text file with compression? We must do all the
character-set and record-format conversion first, then compress it, and the
transfer must occur in binary mode, yet the receiver must know to apply its
normal text-mode conversions upon it after decompressing.
Questions:
1. Should we define a capability bit for compression?
. Yes, so the two Kermits can negotiate about it in the normal way.
. No, because there might be many compression methods.
Maybe it's best to skip the capability bit and simply lump this in
with Attribute capability, and then let the Attribute refusal mechanism
take care of negotiation.
But then there's no way for the sender to bid for compression but fall back
to noncompression if the receiver fails to agree. UNLESS...
If the receiver explicitly "ACKs" the compression in its ACK(A), then it will
be compressed, otherwise it won't be.
2. How do we specify that we are sending a compressed text file?
. The *Z attribute overrides the "A attribute? No, because old Kermits
would not know to do this and so would corrupt the file.
. Always send in binary mode ("B), but notify the receiver in some other
way that once uncompressed, it's a text file. This would work with
old Kermits (the received compressed file would be stored as sent,
binary, and could be decompressed afterwards).
But where is the other info? How about this: *ZA means compressed text,
*ZB means compressed binary.
When compression is selected, the SET FILE TYPE value would move to the *Z?
field, and the "file type" would be binary.
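A minimal Python sketch of how the sender might build these two attributes
under the *ZA / *ZB idea. The attribute layout (tag, tochar-encoded length,
data) is the normal A-packet format; the function names and the use of a bare
"A" or "B" as the Type value are assumptions made for illustration.

```python
# Sketch only: Type and Encoding attributes for the *ZA / *ZB proposal.

def tochar(x):
    """Kermit tochar(): map a small integer to a printable character."""
    return chr(x + 32)

def attribute(tag, value):
    """One A-packet attribute: tag, tochar(length), data."""
    return tag + tochar(len(value)) + value

def type_and_encoding(is_text, compress):
    """Return the Type (") and, if compressing, Encoding (*) attributes."""
    real_type = "A" if is_text else "B"
    if not compress:
        return attribute('"', real_type)
    # Compressed: the wire type is always binary, and the real file type
    # rides along after the Z in the Encoding attribute.
    return attribute('"', "B") + attribute("*", "Z" + real_type)
```

An old Kermit that ignores the * attribute would still see a binary file and
store the compressed data as sent.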
9. Format of System-Dependent File Permissions in A-Packets (DONE)
The format of this field (the "," attribute) is interpreted according to the
System ID ("." Attribute).
For UNIX (System ID = U1), it's the familiar 3-digit octal number, the
low-order 9 bits of the filemode: Owner, Group, World, e.g. 660 = read/write
access for owner and group, none for world, recorded as a 3-digit octal string.
For VMS (System ID = D7), it's a 4-digit hex string, representing the 16-bit
file protection WGOS fields (World,Group,Owner,System), in that order (which
is the reverse of how they're shown in a directory listing); in each field,
Bit 0 = Read, 1 = Write, 2 = Execute, 3 = Delete. A bit value of 0 means
permission is granted, 1 means permission is denied. Sample:
r-01-00-^A/!FWERMIT.EXE'"
s-01-00-^AE!Y/amd/watsun/w/fdc/new/wermit.exe.DV
r-02-01-^A]"A."D7""B8#119980101 18:14:05!#8531&872960,$A20B-!7(#512@ #.Y
s-02-01-^A%"Y.5!                                     ^^^^^^
A VMS directory listing shows the file's protection as (E,RWED,RED,RE) which
really means (S=E,O=RWED,G=RED,W=RE), which is reverse order from the internal
storage, so (RE,RED,RWED,E). Now translate each field to its permission bits
(written high bit first, i.e. in the order D, E, W, R):
RE=0101, RED=1101, RWED=1111, E=0100
Now complement the bits, since 0 means granted and 1 means denied:
RE=1010, RED=0010, RWED=0000, E=1011
This gives the 16-bit quantity:
1010001000001011
This is the internal representation of the VMS file permission; in hex:
A20B
as shown in the sample packet above.
The VMS format probably would also apply to RSX or any other FILES-11 system.
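The two encodings above can be sketched in Python as follows. This is only an
illustration of the arithmetic; the function names are invented, and the VMS
routine takes the four fields in directory-listing order (System, Owner,
Group, World) as strings of the letters R, W, E, D.

```python
# Sketch only: building the "," attribute value for UNIX and VMS.

VMS_BITS = {"R": 1, "W": 2, "E": 4, "D": 8}   # bit 0..3 as described above

def unix_permission_octal(mode):
    """Low-order 9 bits of a UNIX file mode as a 3-digit octal string."""
    return "%03o" % (mode & 0o777)

def vms_protection_hex(system, owner, group, world):
    """16-bit VMS protection word as 4 hex digits, fields stored W,G,O,S."""
    word = 0
    for field in (world, group, owner, system):   # high-order nibble first
        granted = 0
        for letter in field:
            granted |= VMS_BITS[letter]
        # A 0 bit means granted and a 1 bit means denied, so complement.
        word = (word << 4) | (~granted & 0xF)
    return "%04X" % word
```

With the directory listing (E,RWED,RED,RE) from the example,
vms_protection_hex("E", "RWED", "RED", "RE") returns "A20B".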
10. Handling of Generic Protection
To be used when the two systems are different (and/or do not recognize or
understand each other's local protection codes).
First of all, the book is wrong. This should not be the World protection,
but the Owner protection. The other fields should be set according to system
defaults (e.g. UNIX umask, VMS default protection, etc), except that no
non-Owner field should give more permissions than the Owner field.
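On UNIX the rule might look something like the following Python sketch. It
assumes the generic protection has already been mapped to UNIX-style owner
bits (r=4, w=2, x=1); that mapping and the function name are not specified
here and are purely illustrative.

```python
# Sketch only: apply a generic (Owner) protection on a UNIX receiver.

def unix_mode_from_owner(owner_bits, umask=0o022):
    """Owner bits come from the sender; group and world come from the
    umask-derived defaults, but never exceed what the owner has."""
    owner_bits &= 0o7
    default = 0o777 & ~umask                     # system default permissions
    group = (default >> 3) & 0o7 & owner_bits    # clamp to owner's bits
    world = default & 0o7 & owner_bits           # clamp to owner's bits
    return (owner_bits << 6) | (group << 3) | world
```

For instance, owner read/write with a umask of 022 gives mode 644, and a
read-only owner never ends up with a group- or world-writable file.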
11. Dates and Times in Attribute Packets
In keeping with good protocol design, conversions of dates and times between
two Kermit partners, if they are to be done at all, require a standard
date/time on the wire, so each Kermit program needs to know only how to
convert between its local time and the standard, and does NOT need to know
anything about the other Kermit's timezone. The standard time is GMT.
The date-time attribute in the A packet should be clearly described as LOCAL
TIME, not to be converted.
The use of GMT can be negotiated via capability bits. See Section 4.
Ditto for the Extended GET packet described above...
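If GMT is negotiated, each side only has to convert between its own clock and
GMT. A minimal Python sketch, assuming the date-time value keeps the
"yyyymmdd hh:mm:ss" layout seen in the sample A packet in Section 9 and that
the local file time is available as seconds since the UNIX epoch (the function
names are hypothetical):

```python
# Sketch only: GMT date-time strings for the wire.
from datetime import datetime, timezone

WIRE_FORMAT = "%Y%m%d %H:%M:%S"

def gmt_string_from_epoch(epoch_seconds):
    """Format a file time (seconds since the epoch) as GMT for the wire."""
    gmt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return gmt.strftime(WIRE_FORMAT)

def epoch_from_gmt_string(text):
    """Parse a GMT wire date-time back into seconds since the epoch."""
    gmt = datetime.strptime(text, WIRE_FORMAT).replace(tzinfo=timezone.utc)
    return int(gmt.timestamp())
```

Neither function needs to know anything about the other Kermit's timezone,
which is the point of putting a standard time on the wire.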
12. Tight Coupling of Client and Server via TELNET Protocol
Described in IKSD and TELNET KERMIT OPTION RFCs.
13. REMOTE EXIT (DONE)
BYE logs out the server's job. FINISH returns to either the command prompt
or the shell depending on how the server was started. But the client does
not necessarily know how the server was started. REMOTE EXIT addresses this
(partially) by telling the server to exit to the shell, no matter how it was
started. Format: Generic Server Command X. Protocol: If EXIT is disabled,
server sends an Error packet and does not exit; otherwise, it sends an ACK
and exits. The classic problem of the ACK being lost can occur here, just
as it can with BYE, or the B packet.
14. REMOTE STATUS
This one is in the Kermit book but was never described or implemented.
Let's define it as a short string that indicates the server's capabilities,
in PLV notation. The string is returned in the Data field of the ACK to
the REMOTE STATUS command, and thus may not exceed the negotiated packet
length. The data field is encoded in the normal fashion. The parameters
returned are:
0 - Login status (3 bytes)
0 = Not logged in but login is required
1 = Logged in as a user
2 = Logged in anonymously
1 - IKS status (3 bytes)
0 = IKS not available
1 = IKS available but not negotiated
2 = IKS negotiated, indicates tight coupling of client and server
2 - Acceptable Client Packet Type List (up to about 24 bytes):
A string containing the "top level" commands that are available for
execution (i.e. that are both implemented and enabled). The string
is composed of the packet types that may be sent to the server when it
is in server command wait state, e.g. "CGHIJORSVW". There is no need to
include standard types such as BDEFXNY, etc; if they are included, they
are ignored.
3 - Acceptable REMOTE Command List (up to about 30 bytes):
A string containing the REMOTE commands that are available for execution
(implemented and enabled). The string is composed of the Generic Server
Command subtypes, e.g. "ACDEFHIJKLQRSTUVWXdm".
The client parses the response and sets local variables accordingly, and
also may display an appropriate message, and set up detailed information to
be displayed in a subsequent SHOW SERVER command. It might also
disable/remove/mark client commands that are unavailable in the server.
NOTE: In case this feature should grow beyond the capacity of a single
Data field, it can become a long-form reply, but a new packet would be
needed to distinguish it from other server long-form replies.
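On the client side, parsing the reply could be as simple as the following
Python sketch. It assumes each PLV item is a one-character parameter, a
tochar-encoded length, and the value, and that the data field has already been
decoded; parse_plv is an invented name.

```python
# Sketch only: split a REMOTE STATUS reply into its PLV items.

def unchar(c):
    """Kermit unchar(): inverse of tochar()."""
    return ord(c) - 32

def parse_plv(data):
    """Return a dict mapping each parameter character to its value."""
    fields = {}
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated PLV header")
        parameter = data[i]
        length = unchar(data[i + 1])
        value = data[i + 2:i + 2 + length]
        if length < 0 or len(value) != length:
            raise ValueError("truncated PLV value")
        fields[parameter] = value
        i += 2 + length
    return fields
```

A reply of 0!1 1!2 2*CGHIJORSVW (shown here with spaces added between items
for readability) would parse to login status "1", IKS status "2", and the
packet type list "CGHIJORSVW".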
15. REMOTE LOGIN (DONE)
The book doesn't say what should happen if it fails. The server should
send an Error packet with text "Access denied."
Also, the book says nothing about the authentication method, which is fine.
It depends entirely on the implementation.
16. UNICODE (DONE)
UCS-2 Level 1, Group 00, Plane 00: FCS and TCS
UTF-8 Level 1, Group 00, Plane 00: FCS and TCS
TCS Kermit designators:
UCS-2: I162 (= Level 1)
UTF-8: I190 (= Level 1, but accept I196 incoming = Level unspecified).
There is no restriction regarding breaking of UCS-2 or UTF-8 sequences
across packets.
17. FILE SIZE (OPEN)
When sending a file, we put the file size into the A packet. But this is
not terribly useful when FCS is single-byte and TCS is multibyte (or v.v.).
But the protocol definition says the file size must be used, not the estimated
"transfer size". The receiver has no way of knowing about any expansion or
compaction of the original file, since it can only see the transfer encoding.
So we need another attribute: estimated expansion factor (as a percent).
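One way the sender might come up with such a number is to convert a sample of
the file and compare sizes. The Python sketch below is only an illustration:
the function names are invented, the sample is assumed to be already-decoded
text, and Python codec names stand in for the FCS and TCS.

```python
# Sketch only: estimate the expansion factor from a sample of the file.

def expansion_percent(sample, file_charset, transfer_charset):
    """Size of the sample on the wire as a percent of its size on disk."""
    local_size = len(sample.encode(file_charset))
    wire_size = len(sample.encode(transfer_charset))
    if local_size == 0:
        return 100                     # nothing to measure; assume no change
    return round(100 * wire_size / local_size)

def estimated_transfer_size(file_size, percent):
    """What the receiver could compute from the file size and the factor."""
    return file_size * percent // 100
```

For a Latin-1 file sent as UTF-8, a sample such as "caf\u00e9" gives 125, so a
1000-byte file would be estimated at 1250 bytes of transfer data.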
18. CANCELLATION (DONE)
The protocol allows X or Z in the ACK to a D packet. It must be (has been)
extended to also allow this in the ACK to the Z packet, because of streaming,
to catch the edge case when an entire file's contents fit in a single
data packet, or for empty files.
19. UTF-8 FILENAMES
The coding of filenames has never been specified because the A packet, in
which the charset is given, comes before the F packet. So when a filename
comes in, we have no idea what its character set is (we can *guess* that it
is the current TRANSFER character set, but that's far from certain).
Now that there is a universal character set and a standard Internet
representation for it, i.e. UTF-8, we can use that for all filenames,
regardless of the file or transfer character set, as long as the two Kermits
agree beforehand. Negotiation is done simply by setting the UTF8NAMES
capability bit. If both Kermits set it, names are encoded in UTF-8. If not,
the (unspecified and unpredictable) previous method is used. When UTF-8
names are used:
. When receiving files, we convert all incoming filenames from UTF-8 to the
current file character set (after deciding what it is based on the
incoming transfer charset via file associations). The tricky part is, we
don't know what the transfer charset is until the A-packet comes;
therefore we have to defer opening the output file until we get the Z or
first D packet -- but we already do that. But when the F-packet comes, we
have to put the local name in the ACK, and what character set do we use for
that? UTF-8 is the only one that makes sense, but since we generally
return the full pathname and possibly do other conversions (or we have an
as-name), this requires that we (a) convert the incoming UTF-8 name to
whatever character set is used locally for filenames, then (b) construct
the new filename, and (c) convert the result back to UTF-8.
Unfortunately, this whole area is a minefield. We can neither assume
that all local filenames have the same encoding, nor that every filename
has the same encoding as the file's contents. In any case, binary file
contents have no encoding at all. Also:
- What about filename collisions? They will work only if we GUESS
right about the encoding for the local filename before we have the
information we must know to guess right (the transfer character set).
- What about "set file names converted"? Case folding in Unicode
requires lookups in two big databases.
- Do we allow filenames to include combining characters? If we do,
that's MORE database lookups (character properties) AND a sort, e.g.
to convert to Normalization Form C. What if there is no mapping to
the current character set?
- etc etc etc... No doubt other problems would surface in the course of
implementation.
. When sending files, we get the local file's character set with filescan
or whatever, and then ASSUME the filename is in the same character set,
and convert to UTF-8 for the F-packet. Obviously this assumption might
be wrong. Perhaps we can check it by filescanning the name itself, but
names are generally not long enough to give a reliable result. Even if
this works:
- The ACK comes back in UTF-8. How do we display it? Convert it to the
local FILE character set? How do we record it in the Transaction log?
Do we need to let the user specify not only the file and transfer
character sets, but also the console and log character sets?
- etc etc.
This is not as simple as it seemed at first!
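To show where some of these problems bite, here is a small Python sketch of
the receive-side round trip (UTF-8 in, local name, UTF-8 back for the ACK).
The function names are hypothetical, the local filename character set is
simply passed in (which is exactly the thing we may have to guess), and
Python's unicodedata module stands in for the normalization database lookups.

```python
# Sketch only: converting filenames between UTF-8 and a local charset.
import unicodedata

def incoming_name_to_local(utf8_name, local_charset):
    """Convert an F-packet name from UTF-8 to the local filename charset.
    Returns None if some character has no mapping in the local charset."""
    text = utf8_name.decode("utf-8")
    # Compose combining sequences first (Normalization Form C), so that
    # e + combining acute can map to a single precomposed local character.
    text = unicodedata.normalize("NFC", text)
    try:
        return text.encode(local_charset)
    except UnicodeEncodeError:
        return None

def local_name_to_utf8(local_name, local_charset):
    """Convert the final local name back to UTF-8 for the ACK."""
    return local_name.decode(local_charset).encode("utf-8")
```

The None case is the "no mapping to the current character set" problem; the
sketch only detects it and does not say what the receiver should do about it.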