KERMIT CHECKPOINT RESTART CAPABILITY REQUIREMENTS
August 16, 1993
Author:
Frank da Cruz
Columbia University
Internet: fdc@columbia.edu
Telephone: W: 854-3508, H: 866-4894
Prepared By:
Computer Sciences Corporation
M/C 265
3160 Fairview Park Dr.
Falls Church, VA 22042
Table of Contents
INTRODUCTION
REQUIREMENTS DEFINITION
DESIGN
PARAMETERS
CONTROLLING THE CHECKPOINT FEATURE
NEGOTIATION OF CHECKPOINTING PROTOCOL
THE CHECKPOINT SYNC PACKET
THE FILE TRANSFER PHASE
SPACE-CHECKING
TERMINATING A SUCCESSFUL TRANSACTION
TAKING CHECKPOINTS
TEXT TRANSFER MODE
BINARY TRANSFER MODE
RECORDING CHECKPOINTS
THE CHECKPOINT REQUEST AND CONFIRMATION PACKETS
THE RECOVERY FILE
THE RECOVERY PROCESS
SERVER-MODE CONSIDERATIONS
SECURITY CONSIDERATIONS
MULTI-FILE TRANSFERS
AUTOMATIC REESTABLISHMENT OF CONNECTION
RECOVERY FROM AN INTERRUPTED TRANSFER THAT WAS NOT CHECKPOINTED
SUMMARY OF NEW COMMANDS AND PROTOCOL MESSAGES
PROJECT OVERVIEW
DEVELOPMENT AND IMPLEMENTATION
DOCUMENTATION
TESTING
TIMELINE
ESTIMATED BUDGET
LICENSING
REFERENCES
.c.INTRODUCTION
This report outlines a method for restarting a Kermit file transfer from a
point of failure that should work correctly and dependably for all types of
files independent of the underlying operating system and file system, plus a
tentative implementation plan for MS-DOS, Windows, OS/2, UNIX, (Open)VMS,
VM/CMS, MVS/TSO, and CICS.
This is a preliminary discussion of the design, and an estimate of the cost to
create this functionality.
Familiarity with Kermit file transfer protocol [1] is assumed, as well as with
the operation of popular Kermit programs such as MS-DOS Kermit [2], C-Kermit
[3] and IBM Mainframe Kermit [4].
.c.REQUIREMENTS DEFINITION
The fundamental requirement of this project is the addition of a
restart-from-point-of-failure capability to Kermit file transfer protocol and
software. This means that the transfer of a particular file can be resumed
where it was interrupted, e.g. by loss of connection, with a minimum of
retransmission overhead, and with the resulting destination file exactly as it
would have been if it had been transferred successfully without interruption.
In particular:
1. A checkpoint/restart-capable Kermit program should be fully interoperable
with a Kermit program that does not have this capability.
2. Recovery must work for both text and binary files.
3. Recovery methods must be workable between any pair of
computer/operating-system platforms, and be easily adaptable to future
systems.
4. Recovery must not require the two computers to have similar file formats.
5. The design must not lock out any popular type of computer or file system.
6. The design must not depend on specific capabilities that some computers or
operating systems are likely to lack.
7. Automatic (unattended) recovery should be possible.
8. Manual (attended) recovery must be possible when automatic recovery is not.
9. The net result of recovery must be a received file identical to what would
have been received in an uninterrupted transfer.
10. Within reason, the constraints of the checkpointing mechanism should not
cause checkpointed transfers to fail in cases where non-checkpointed
transfers would succeed, nor vice versa.
11. Neither Kermit program should make assumptions about the internal
operation of the other, nor about the other's underlying file system.
12. Checkpointing should operate independently of the underlying communication
and protocol settings. That is, it should work uniformly on serial and
network connections, slow and fast connections, full and half duplex
connections, 7- and 8-bit connections, with or without sliding windows or
long packets, with or without text-file character-set conversion, and so
on.
13. It is not desirable to invent a new notation or language for recovery
information. Ordinary Kermit commands, parsable by the existing command
parsers, should be used to record and recover checkpointing information.
Commands and terminology should be internally consistent within each
Kermit software version, and also uniform among versions.
To assure that our design meets these requirements, we will implement it on
the following diverse platforms:
1. PCs with MS-DOS or Windows (MS-DOS Kermit) and PCs with OS/2 (C-Kermit).
These computers have a sequential stream-oriented file system, in which
text files consist of lines terminated by CRLF.
2. All computers running the UNIX operating system or any of its many
variants as well as Data General MV-series computers running the AOS/VS
operating system (C-Kermit). UNIX and AOS/VS have sequential
stream-oriented file systems, in which text files consist of lines
terminated by LF, and thus change size when they are transferred to
(say) MS-DOS.
3. DEC VAX or Alpha AXP computers running the OpenVMS operating system
(C-Kermit). OpenVMS has an extremely complex record-oriented file system,
with many different record formats and file attributes. Both text and
binary files almost always change size and format when transferred to
non-VMS systems.
4. IBM mainframes with the VM/CMS, MVS/TSO, and CICS operating systems (IBM
Mainframe Kermit-370). IBM mainframe operating systems have complicated
record-oriented file systems, but with details and capabilities different
from those of OpenVMS. In addition, text files are encoded in EBCDIC
rather than ASCII-based codes. As with VMS, both text and binary files
almost always change size and format when transferred to non-IBM-mainframe
systems.
Thus, checkpoint/restart capability will be added to three separate Kermit
software programs, each of which can be built for and/or executed on various
different hardware platforms and/or software environments. These are, at
present and for the foreseeable future, the three major Kermit software
programs.
It is recognized that the immediate requirements of the contractor might not
call for checkpoint/restart-capable Kermit software on all these platforms,
but it is essential that we obtain operational proof-of-concept over a wide
variety of computers and file systems to be reasonably certain that our design
is adequate to cover all contingencies.
Before checkpoint/restart capability can be added to a Kermit software
program, the program must already include the following capabilities:
- Basic Kermit file transfer protocol.
- Attribute packets: file modification date/time, transfer mode (text/binary).
- An interactive command parser.
- The ability to execute commands from files.
These capabilities are available in C-Kermit, MS-DOS Kermit, and IBM Mainframe
Kermit. Note, in particular, that long packets, sliding windows,
international character sets, single shifts, locking shifts, and other
optional negotiated protocol features are neither required nor prohibited.
In addition, in order to automatically establish and reestablish connections,
a Kermit program must support:
- Local-mode operation
- Connection-establishment commands such as DIAL or TELNET
- A script programming language (INPUT, OUTPUT, IF, GOTO, etc)
These capabilities are available in MS-DOS Kermit and C-Kermit, but not in
IBM Mainframe Kermit. IBM Mainframe Kermit is never the initiator of a
connection.
Finally, the underlying operating system, file system, and programming
interface must provide the following capabilities:
- To restart a transfer from the point of failure, the file sender should be
capable of positioning its file pointer to a given byte or record within the
input (source) file. Thus, the source file must be on a random-access
device.
- The file receiver is assumed to be able to append new material to the end of
an existing file, or at least to be able to append two files together. The
destination file need not be on disk -- it can also be a printer or other
type of sequential output device.
- To keep crash-resistant recovery files, both the file sender and receiver
must be capable of appending new material to the end of an existing file.
It is believed that every operating system offers these features.
.c.DESIGN
File transfer failures can be recoverable or unrecoverable. If the Kermit
program can determine the reason for a protocol failure, it must set a return
or status code accordingly, which can be tested to determine whether automatic
recovery should be attempted. This will require:
- Assignment of standard error codes for transmission in error packets.
These would be numeric strings at the head of the Error-packet field,
which would cause no problems with Kermit programs which did not
understand them, but which could be used by updated programs.
- Creation of some kind of status variable that can be queried by a script
program, e.g. \v(recovery) in MS-DOS Kermit or C-Kermit. This variable
would be set locally, or from an incoming E-packet's error code.
A recoverable failure is one that can be handled AUTOMATICALLY by a
checkpoint-restart mechanism. These include:
- Loss of connectivity, e.g. a dialup or network connection that was dropped,
but which can be reestablished a short time later.
- System failure, e.g. one of the two systems crashed for a short time.
An unrecoverable failure is one that can NOT be handled AUTOMATICALLY by a
checkpoint-restart mechanism. Examples include:
- Destination disk filled up or storage quota exceeded.
- Incorrect communication or protocol settings that prevented the transaction
from beginning successfully.
- Lack of sufficient transparency on the communication channel; for example, a
device that changes modes when it receives a certain sequence of characters.
- A system, component, or connection method that disappeared forever, and
similar "natural disasters".
Note that most unrecoverable failures can still be recovered manually, for
example by fixing a broken computer, changing protocol or communication
settings, or cleaning up a full disk.
.c2.PARAMETERS
Terminology:
- In any particular connection, the Kermit program that originated the
connection is in LOCAL MODE, and the other Kermit program is in REMOTE
mode. Similarly, one Kermit program is the file SENDER and the other is the
file RECEIVER.
- The file being transferred is the SOURCE FILE from the SENDER's point of
view, and is the DESTINATION FILE from the receiver's point of view.
The information required to recover a failed transfer is the information
necessary and sufficient to locate and verify the source and destination
files, plus the information required for the sender to interpret the contents
of the source file, and the receiver to interpret the contents of the incoming
data packets:
- The TYPE OF TRANSFER: TEXT or BINARY.
- The fully qualified FILE SPECIFICATION of the source file: node, device,
directory, version, etc, sufficient to locate the same file again. If a
fully qualified name is not available, then the RELATIVE NAME plus,
separately, the DEVICE, DIRECTORY, and any other necessary location
information.
- The SIZE and MODIFICATION DATE AND TIME of the source file, to verify it has
not changed.
- Kermit's LOCAL ACCESS METHOD for the source file: TEXT, BINARY, MACBINARY,
V-BINARY, D-BINARY, LABELED, IMAGE, BLOCK, etc. These items depend on the
Kermit implementation and the underlying file system.
- Any QUALIFIERS necessary for the source-file access method: ORGANIZATION
(sequential, indexed, relative, random, etc), RECORD FORMAT (fixed,
variable, variable with fixed header, stream CR, stream LF, stream CRLF),
RECORD-LENGTH, CARRIAGE CONTROL, MARGINS, etc.
- For text-mode transfers, the source FILE CHARACTER-SET.
- For text-mode transfers, the TRANSFER CHARACTER-SET.
- The fully-qualified FILE SPECIFICATION of the destination file: node,
device, directory, version, etc. or the relative name plus other location
information.
- Kermit's LOCAL ACCESS METHOD for the destination file: TEXT, BINARY,
MACBINARY, V-BINARY, D-BINARY, LABELED, IMAGE, BLOCK, etc.
- Any QUALIFIERS necessary for the local access method: ORGANIZATION, RECORD
FORMAT, RECORD-LENGTH, CARRIAGE CONTROL, MARGINS, etc.
- The SIZE of the destination file, to tell whether it has changed.
- For text-mode transfers, the local FILE CHARACTER-SET of the destination
file.
The information given above points up a minor inconsistency in Kermit command
nomenclature. The command:
SET FILE TYPE { TEXT, BINARY, <others> }
actually does two things. It defines the local file access method, and,
by implication, also the transfer mode. Examples:
SET FILE TYPE TEXT -- implies TEXT transfer mode
SET FILE TYPE BINARY -- implies BINARY transfer mode
SET FILE TYPE IMAGE -- implies BINARY transfer mode
SET FILE TYPE V-BINARY -- implies BINARY transfer mode
SET FILE TYPE LABELED -- implies BINARY transfer mode
Strictly speaking, these are separate issues. We might, for example, want to
transfer a text file in binary mode, but using local access methods
appropriate for text files. Or we might want to transfer a binary file in
text mode in order to get CRLFs appended to each record. It is therefore
worth distinguishing, at least conceptually, between the FILE TYPE and the
TRANSFER MODE, and postulating (if not requiring) the availability of a new
command:
SET TRANSFER MODE { TEXT, BINARY }
.c2.CONTROLLING THE CHECKPOINT FEATURE
Checkpoint-restart capability might add perceptible overhead to file transfer
operations. Obviously, every attempt will be made to ensure that the
checkpoint-restart implementation is as efficient as possible, but the priority
must be ironclad reliability. As currently envisioned, however, checkpointing
overhead will occur because separate recovery files must be maintained, files
must be closed and opened repeatedly, and additional messages must be exchanged
throughout file transfer. For this reason, and for compatibility with earlier
Kermit software releases, this capability WILL NOT BE USED unless specifically
requested. The command is:
SET CHECKPOINT { ENABLED, DISABLED, ON }
ENABLED will be the default. It means "I will do checkpointing if requested by
the other Kermit." DISABLED means "I won't do it", period. ON tells your
Kermit program to actively negotiate the use of checkpointing with another
Kermit program. For checkpointing to take place, at least one of the Kermits
must SET CHECKPOINT ON, and the other must SET CHECKPOINT ON or ENABLED. When
recoverability is always the priority, SET CHECKPOINT ON can be included in
the Kermit initialization file.
There must also be a control over how frequently checkpoints are taken:
SET CHECKPOINT INTERVAL <number>
where <number> is the number of transmitted bytes at or after which a
checkpoint should be taken. The default is implementation-dependent, and also
dependent on the type and characteristics of the file. Let's say the nominal
default is around 10K. At 2400 bps -- a common dialup transmission speed --
this amounts to about 45 seconds of transfer time. At 9600 bps, it's only
about 11 seconds, and the time shrinks further as transmission speed increases.
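As a rough check of these figures, the following Python sketch converts a
checkpoint interval into transfer time. It assumes that "10K" means 10 * 1024
bytes and that each byte costs 10 transmitted bits (8 data bits plus start and
stop bits on an asynchronous serial line). Packet overhead and retransmissions
are ignored, which is why the results come out slightly below the estimates
quoted above. The function name is illustrative only.

```python
def seconds_per_checkpoint(interval_bytes, bits_per_second, bits_per_byte=10):
    """Approximate time needed to transmit one checkpoint interval."""
    bytes_per_second = bits_per_second / bits_per_byte
    return interval_bytes / bytes_per_second


if __name__ == "__main__":
    # Print the time between checkpoints for a few line speeds.
    for bps in (2400, 9600, 19200):
        print(bps, "bps:", round(seconds_per_checkpoint(10 * 1024, bps), 1), "s")
```

Under these assumptions the interval takes roughly 42.7 seconds at 2400 bps
and 10.7 seconds at 9600 bps.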
Checkpoint information is kept in a separate "recovery file" by each transfer
partner. The user should be allowed to specify the name of this file, even
though this can complicate checkpointing setup for the user as well as the
recovery process, particularly automated recovery. The advantage of this
feature is that it allows multiple recoveries to be pending. For example, the
user might have an automated procedure that connects to several hosts or
services each night and transfers some files. If one of these operations
fails, it would be desirable to go on immediately to the next one, rather than
wait the indefinite amount of time required to recover from the failed one,
even if more than one transfer had failed.
The following command can be used to specify the name of the recovery file:
SET CHECKPOINT RECOVERY-FILE <filespec>
If this command is not given, an implementation-dependent default is used,
which should be a fully qualified absolute pathname, so it can be found
automatically in the event of an unattended restart. In practice, this would
be a file of a certain name that is located (for example) according to the
same rules as the initialization file. Examples:
UNIX: .kermrf in the user's home (login) directory (rf = recovery file)
MS-DOS: MSKERMIT.RF in the same directory as the MSKERMIT.INI file
OS/2: CKERMIT.RF in the same directory as the CKERMIT.INI file
VMS: CKERMIT.RF in the user's home directory
VM/CMS: KERMIT RF A1 (???)
NOTE: The recommended device for recovery files on PCs is the boot drive,
since drive letters of other drives can change unexpectedly, e.g. when file
servers are involved.
There are, of course, dangers in recording information in separate recovery
files. For example, there might not be sufficient disk space for a recovery
file. In particular, it will not be possible to send a file with
checkpointing from a computer whose storage is completely full or
write-protected; in such cases, the SET CHECKPOINT RECOVERY-FILE command
allows the recovery file to be placed in a separate storage area.
More subtly, a recovery file might grow to fill available storage on the file
sender, receiver, or both. Before proceeding, let's consider this situation.
Suppose that a particular file transfer would have succeeded without
checkpointing, but would fail with checkpointing because the recovery file
filled up the disk, or there was an I/O error writing the recovery file, or
there was some kind of checkpoint-related protocol error (e.g. caused by a
programming mistake). Should the transfer fail? The user should be given the
choice. This can be accomplished with another SET CHECKPOINT command:
SET CHECKPOINT ERROR-ACTION { PROCEED, QUIT }
The default action should be PROCEED, so that a file transfer will not fail
simply because it is checkpointed. In this case, the transfer continues but
checkpointing is canceled. When QUIT is elected, a checkpointing failure
(e.g. failure to write to the recovery file) is fatal, and the transfer is
canceled by an Error packet.
SET CHECKPOINT commands can be given to the file sender or the file receiver
or both. Checkpointing may be initiated by either party to the file
transfer. CHECKPOINT ERROR-ACTION QUIT, given to either party, is sufficient
to stop a transfer when a checkpointing error occurs.
Finally, there should be a command:
SHOW CHECKPOINT
This displays the current SET CHECKPOINT settings, and whether an active
recovery file exists.
.c2.NEGOTIATION OF CHECKPOINTING PROTOCOL
The protocol initialization string (the data field of the S and I packets, and
of their acknowledgements) contains the following new fields for checkpoint
negotiation:
      10                                    new      new
 ---+-----------+-------+--------+--------+--------+--------+
... | CAPAS ... | WINDO | MAXLX1 | MAXLX2 | CHKPNT | CHKINT |
 ---+-----------+-------+--------+--------+--------+--------+
These fields are positional. The CAPAS field (capabilities mask), beginning
at position 10, is extensible to multiple bytes by setting its low-order bit
(currently it occupies only one byte). The WINDO byte is the first byte after
the last CAPAS byte (we call this position CAPAS+1, currently byte 11).
MAXLX1 is at CAPAS+2, and so forth.
The Attribute Packet Capability bit must be set in the capabilities mask.
If it isn't, the following items are ignored and checkpointing is not done.
If the Kermit program does not support lower-numbered fields (e.g. WINDO,
MAXLX1, MAXLX2), then their positions must be filled with blanks so that the
CHKPNT field is at the CAPAS+4 position and the CHKINT field takes up the next
three bytes.
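The positional rule can be sketched in Python as follows. This is only an
illustration under stated assumptions: the data field is taken to be a string
with CAPAS at position 10 (index 9), each CAPAS byte is assumed to use the
usual offset-32 printable encoding, and the Attribute Packet Capability bit is
assumed to have the value 8 within the first CAPAS byte. The function and
constant names are hypothetical. The CHKPNT and CHKINT fields are returned as
raw characters, without interpreting them.

```python
ATTRIBUTE_CAPABILITY_BIT = 8   # assumed bit value for "can accept A packets"
CAPAS_INDEX = 9                # position 10 of the data field, 0-based


def unchar(c):
    """Convert a printable protocol character back to its numeric value."""
    return ord(c) - 32


def find_checkpoint_fields(init_data):
    """Return (CHKPNT, CHKINT) as raw characters, or None if unusable."""
    if len(init_data) <= CAPAS_INDEX:
        return None
    if not unchar(init_data[CAPAS_INDEX]) & ATTRIBUTE_CAPABILITY_BIT:
        return None                      # no A-packets, so no checkpointing
    # The CAPAS field continues while the low-order bit of a byte is set.
    last = CAPAS_INDEX
    while last < len(init_data) and unchar(init_data[last]) & 1:
        last += 1
    if last >= len(init_data):
        return None                      # CAPAS field was cut short
    chkpnt_at = last + 4                 # WINDO, MAXLX1, MAXLX2 come first
    chkint = init_data[chkpnt_at + 1:chkpnt_at + 4]
    if len(chkint) < 3:
        return None                      # fields missing or incomplete
    return init_data[chkpnt_at], chkint
```

Because the offsets are computed from the last CAPAS byte rather than from a
fixed position, the same code keeps working if the capabilities mask later
grows to more than one byte.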
The new fields are encoded as follows:
1. CHKPNT, 1 byte, values:
0 = WONT I won't do it (SET CHECKPOINT DISABLED)
1 = WILL I will do it if asked (SET CHECKPOINT ENABLED)
2 = DO Please do it (SET CHECKPOINT ON)
Anything else (including absence of this byte) is interpreted as WONT.
These work as follows:
Sender    Receiver  Checkpointing  Initiator
WONT      (any)     No             None
WILL      WONT      No             None
WILL      WILL      No             None
WILL      DO        Yes            Receiver
DO        WONT      No             None
DO        WILL      Yes            Sender
DO        DO        Yes            Both
2. CHKINT, checkpoint interval: 3 bytes, containing a base-95 number, with
digits in the normal offset-32 notation (SP = 0, ..., ~ = 94). Maximum
value is 857374 = 95^3 - 1. If this field is missing from, or incomplete
in, the receiver's ACK packet, or is zero (SP SP SP), checkpointing is not
done. The protocol should allow any checkpoint interval at all, even an
interval of one byte, but the implementation (e.g. the command parser)
can prevent the user from selecting nonsensical values.
The FILE SENDER sets this field to the largest value it can handle. For
example, if the file sender is limited to 16-bit arithmetic, it might send a
value of 65535. If the file sender has no particular limit on its checkpoint
interval, it should set it to the maximum: 857374 (~~~).
The FILE RECEIVER tells the file sender the checkpoint interval that should
actually be used. This value must be no larger than the CHKINT value sent by
the file sender. It may be any value equal to or less than the sender's
value.
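A minimal Python sketch of the CHKINT encoding and of the receiver's choice
follows. It assumes that the three digits are sent most significant first,
which the text above does not state explicitly. The function names are
illustrative and not part of any Kermit program.

```python
MAX_CHKINT = 95 ** 3 - 1   # 857374, encoded as "~~~"


def encode_chkint(value):
    """Encode 0..857374 as three base-95 digits in offset-32 notation."""
    if not 0 <= value <= MAX_CHKINT:
        raise ValueError("checkpoint interval out of range")
    digits = []
    for _ in range(3):
        value, digit = divmod(value, 95)
        digits.append(chr(digit + 32))
    return "".join(reversed(digits))     # assumed: most significant first


def decode_chkint(field):
    """Decode a CHKINT field; a missing or incomplete field counts as 0."""
    if len(field) != 3:
        return 0
    value = 0
    for c in field:
        value = value * 95 + (ord(c) - 32)
    return value


def receiver_interval(sender_field, preferred):
    """Pick the interval the receiver reports back: never above the sender's."""
    return min(decode_chkint(sender_field), preferred)
```

For example, a sender with no particular limit sends "~~~", and a receiver
that prefers 10240 bytes answers with the encoding of 10240. A sender limited
to 65535 caps the receiver's choice at that value.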
In cases where checkpointing is supported but not elected (i.e. CHKPNT = 0),
the content of the CHKINT field is immaterial. However, if the CHKPNT field
is present, then the CHKINT field is required too. In that case, the
recommended contents for the CHKINT field is "___" (three underscores) to
allow easy (human) identification.
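The outcome table for the CHKPNT field given earlier in this section can be
expressed as a small Python function. This is a sketch for illustration; the
names are hypothetical, and the values are handled here as the integers 0, 1,
and 2 regardless of how they are encoded on the wire.

```python
WONT, WILL, DO = 0, 1, 2


def negotiate(sender, receiver):
    """Return (checkpointing_used, initiator) for the two CHKPNT values."""
    # Anything other than WILL or DO is interpreted as WONT.
    if sender not in (WILL, DO):
        sender = WONT
    if receiver not in (WILL, DO):
        receiver = WONT
    # Both sides must be willing, and at least one must actively ask.
    if WONT in (sender, receiver) or DO not in (sender, receiver):
        return False, None
    if sender == DO and receiver == DO:
        return True, "both"
    if sender == DO:
        return True, "sender"
    return True, "receiver"
```

Note that WILL on both sides yields no checkpointing: each side is willing,
but neither one requests it.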
.c2.THE CHECKPOINT SYNC PACKET
At the beginning of a checkpointed (or recovery) file transfer, after the
A-packet but before the first data packet, there is a CHECKPOINT SYNC packet.
Its packet type is H. Its data field contains the following information:
<len><xfer-id><len><checkpoint-id>
The <len> fields are single-character numbers in base-95 excess-32 notation.
The <xfer-id> is an identifier for this file transfer. This is a dynamically
computed quantity that should be more-or-less globally unique, and so a
many-digit date-and-time stamp, accurate to at least the second, would be a
good choice, for example: 930808152832. The <checkpoint-id> is described
later, but on an "original" (first attempt, non-recovery) file transfer,
it is the null string, i.e. its <len> field is 0 (SP).
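The layout of the H packet's data field can be sketched in Python as shown
here. The sketch assumes that both identifiers consist of printable
characters and are at most 94 characters long, so that each length fits in a
single offset-32 character. Packet framing, prefixing, and checksums are
outside its scope, and the function names are illustrative.

```python
def tochar(n):
    """Encode 0..94 as a single printable character (offset-32 notation)."""
    if not 0 <= n <= 94:
        raise ValueError("value does not fit in one character")
    return chr(n + 32)


def build_sync_data(transfer_id, checkpoint_id=""):
    """Build <len><xfer-id><len><checkpoint-id> for a CHECKPOINT SYNC packet."""
    return (tochar(len(transfer_id)) + transfer_id +
            tochar(len(checkpoint_id)) + checkpoint_id)


def parse_sync_data(data):
    """Split a CHECKPOINT SYNC data field into (transfer_id, checkpoint_id)."""
    id_len = ord(data[0]) - 32
    transfer_id = data[1:1 + id_len]
    rest = data[1 + id_len:]
    cp_len = ord(rest[0]) - 32
    return transfer_id, rest[1:1 + cp_len]
```

On an original transfer with the Transfer ID 930808152832, the data field
is a comma (length 12), the twelve digits, and a trailing space (length 0 for
the null Checkpoint ID).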
If the CHECKPOINT SYNC packet fails to appear when expected -- that is, if a
Data (D) or End-Of-File (Z) packet appears when an H packet is expected (this
should not happen) -- the transaction is cancelled with an Error packet (if
CHECKPOINT ERROR-ACTION is QUIT) or else checkpointing is disabled and the
file transfer proceeds.
.c2.THE FILE TRANSFER PHASE
With checkpointing enabled, normal (i.e. non-recovery) file transfer proceeds
as follows. For each file:
- The file sender sends the F packet.
- The file receiver acknowledges it.
- The file sender sends one or more A-packets.
- The file receiver acknowledges the A-packets, accepting or rejecting the
file.
If the file is accepted:
- Upon receipt of the file acceptance notification (in the ACK to the
A-packet), the file sender opens a new recovery file (overwriting any
previous recovery file of the same name), computes the Transfer ID, writes
it to the recovery file, writes the "prelude" (file name, settings, etc) to
it (discussed below), closes it, and then sends a CHECKPOINT SYNC (H) packet
with the Transfer ID and with a null (zero-length) Checkpoint ID. If
creation and initialization of the recovery file fails, the file sender
first ensures that the recovery file is destroyed, and then sends an Error
packet if CHECKPOINT ERROR-ACTION is QUIT, otherwise sends an H packet with
a null (zero-length) Transfer ID to cancel checkpointing operations.
- Upon receipt of an H packet containing a null Transfer ID, the file
receiver cancels checkpointing operations if its CHECKPOINT ERROR-ACTION is
PROCEED, and ACKs the H packet, with an uppercase letter X occupying the data
field of the ACK. If its CHECKPOINT ERROR-ACTION is QUIT, it responds with
an E packet to cancel the entire file transfer.
- Upon receipt of an H packet containing a valid, non-null Transfer ID, the
file receiver opens and initializes its own recovery file (deleting any
previous recovery file), and ACKs the H-packet. The ACK contains the same
Transfer ID and the receiver's checkpoint ID, which, on a non-recovery
transfer, is also null. If the file receiver failed to open and initialize
its recovery file, then, if CHECKPOINT ERROR-ACTION is PROCEED, it places an
uppercase letter X in the data field of the ACK to the H packet; if it is
QUIT, then an Error packet is sent.
- At this point, data packets will start to arrive (unless the source file is
empty). The file receiver writes incoming file data out to a TEMPORARY FILE
rather than to the real output file. (The temporary file, obviously, must
be created in such a way as not to overwrite any existing files.)
- Checkpoints are taken and recovery files updated as described below.
- If a fatal error occurs during the data transfer phase, an error packet is
sent and a status code should be set to indicate the cause of the failure, so
the higher-level procedures can decide whether the failure is recoverable.
- Upon receipt of a Z packet, the file receiver takes the normal actions:
closes the output file and responds with an ACK if and only if the file was
closed successfully, otherwise with an E packet.
If possible, the partial destination file's modification date / time should be
reset from the A-packet value each time the file is closed, to ensure that the
destination file can be correctly identified should recovery be necessary. In
any case, the date/time should be set when the file is successfully (fully)
received and closed.
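The file sender's step after the file has been accepted can be sketched in
Python as follows. This is a hedged illustration, not a definitive
implementation: the recovery file contents are treated as opaque text, since
the prelude format is defined elsewhere in this report, and the wording of
the Error packet message is invented. The function name is hypothetical. A
null Transfer ID is represented as a zero length field followed by a zero
length Checkpoint ID, i.e. two spaces.

```python
import os


def start_checkpointing(recovery_path, transfer_id, prelude,
                        error_action="PROCEED"):
    """Return the (packet_type, data) the file sender should send next."""
    try:
        # Opening for writing overwrites any previous recovery file.
        with open(recovery_path, "w") as recovery_file:
            recovery_file.write(transfer_id + "\n")
            recovery_file.write(prelude)
    except OSError:
        # Make sure no partial recovery file is left behind.
        try:
            os.remove(recovery_path)
        except OSError:
            pass
        if error_action == "QUIT":
            return "E", "cannot create recovery file"   # illustrative text
        return "H", "  "      # null Transfer ID cancels checkpointing
    # Success: H packet with the Transfer ID and a null Checkpoint ID.
    return "H", chr(len(transfer_id) + 32) + transfer_id + " "
```

The recovery file is closed before the H packet is returned for sending, in
keeping with the order of operations described above.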
.c2.SPACE-CHECKING
An optional feature of Kermit protocol and software is the ability to check
available disk space before agreeing to accept an incoming file. The file
sender includes the file size (at best, an approximation, since it does not
know what transformations will be done by the receiver); the receiver
compares this number against available disk space, IF IT HAS THIS ABILITY
(certain operating systems, notably UNIX and MVS/TSO, offer no good way to do
this).
The use of temporary files and recovery files during checkpointing must be
accounted for in the space calculation -- that is, the receiver must compare
available space against the incoming file's size PLUS the negotiated
checkpoint interval PLUS the estimated maximum size for the recovery file
(if it is on the same storage device), with the customary allowance for
expansion, depending on the transfer mode and operating systems involved.
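The space calculation can be sketched as simple integer arithmetic in Python.
The parameter names, and the choice to express the expansion allowance as a
percentage applied to the incoming file size, are assumptions made for this
illustration; the report does not prescribe how the allowance is computed.

```python
def space_required(file_size, checkpoint_interval, recovery_file_estimate,
                   recovery_on_same_device=True, expansion_percent=0):
    """Bytes that must be free before the receiver accepts the file."""
    # Allowance for expansion, rounded up to a whole byte.
    allowance = -(-file_size * expansion_percent // 100)
    needed = file_size + allowance + checkpoint_interval
    if recovery_on_same_device:
        needed += recovery_file_estimate
    return needed


def can_accept(available_space, *args, **kwargs):
    """True if the available space covers the checkpointed transfer."""
    return available_space >= space_required(*args, **kwargs)
```

For a 100000-byte file, a 10240-byte interval, and a 2048-byte recovery file
estimate on the same device, the receiver would need 112288 bytes free before
any allowance for expansion.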
.c2.TERMINATING A SUCCESSFUL TRANSACTION
At the end of a successful transaction (B packet sent and ACK'd), both
recovery files can (and should) be deleted. Thus, recovery files are deleted
at the beginning of each file transfer and at the end of the transaction (this
prevents the final recovery file from remaining on disk after a transaction
is completed successfully).
If the B packet is ACK'd but the ACK is never received, the sender can still
delete its recovery file, because it knows the (last) file was received
successfully, since the End-Of-File (Z) packet had already been ACK'd.
.c2.TAKING CHECKPOINTS
When should checkpoints be taken? We have to satisfy the constraints of both
the sender and receiver. Record-oriented file systems cannot be expected to
write out a partial record, close the file, reopen it in append mode, and
finish the partial record later. Thus checkpoints must be taken at record
boundaries when one or both of the file systems involved is record-oriented.
Text- and binary-mode transfers, however, must be handled in different ways.
.c2.TEXT TRANSFER MODE
We take it for granted that all computer operating systems are capable of
writing out a record (line) to a text file, no matter what the record
format. We do not assume that an operating system can write partial lines.
Therefore, in text mode transfers, the file sender must send checkpoint
requests only on record (line) boundaries.
This means that the data packet preceding a checkpoint request might not be
filled to capacity and, in fact, could be very short. This should cause no
protocol or data-integrity problems, but will, of course, have a slight impact
on performance.
If a text line is longer than the checkpoint interval, there is no choice but
to postpone the checkpoint until the end of the record, because we can not
assume that the receiver can commit a partial record to disk.
Thus, in text mode, we view the checkpoint interval as a MINIMUM rather than a
maximum, which simplifies matters quite a bit. If we had to send a checkpoint
*before* the checkpoint interval, there would be a need for record-oriented
lookahead, and we would still need special handling for the case in which a
record was longer than the checkpoint interval. But note that this strategy
also precludes the use of in-memory buffers in lieu of temp files, since there
is no limit on the amount of data that might need to be stored in such a
buffer.
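Treating the interval as a minimum makes the sender's logic very simple, as
the following Python sketch suggests. It counts source-file bytes as a
stand-in for transmitted bytes, which is an assumption, and it represents the
file as a sequence of complete lines including their terminators. The
function name is illustrative.

```python
def text_checkpoint_offsets(lines, interval):
    """Yield the byte offsets at which checkpoint requests would be sent."""
    offset = 0        # bytes of the source file consumed so far
    since_last = 0    # bytes sent since the previous checkpoint
    for line in lines:
        offset += len(line)
        since_last += len(line)
        # Only test at a line boundary, so a record is never split.
        if since_last >= interval:
            yield offset
            since_last = 0
```

A line longer than the interval simply postpones the checkpoint to the end of
that line, and no lookahead is required.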
.c2.BINARY TRANSFER MODE
Protocols like ZMODEM include a checkpoint-restart capability for binary files
based on the assumption that the length, format, and layout of a binary file will be
exactly the same on both ends. Nothing special happens during a normal file
transfer. To recover a binary-mode transfer, the file receiver sends the
length of the partially-received destination file back to the file sender; the
file sender positions its file pointer to the corresponding next byte in the
source file and resumes sending from there. This method assumes that both
systems have a stream-oriented file system in which the file length is recorded
as an exact number of bytes and that a byte-oriented file pointer capability
is available to the sender. There are numerous exceptions to this model.
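The stream-oriented recovery model just described can be sketched in a few
lines of Python. This illustrates only that simple model, with both files
assumed to be byte streams of identical layout; it is not the design proposed
in this report, and the function names are hypothetical.

```python
import os


def resume_offset(partial_destination_path):
    """Receiver side: the length of the partially received file."""
    return os.path.getsize(partial_destination_path)


def remaining_data(source_path, offset):
    """Sender side: seek to the reported offset and read what is left."""
    with open(source_path, "rb") as source:
        source.seek(offset)
        return source.read()
```

The sketch breaks down as soon as the two files differ in length or layout,
for example when record structure or text conversion is involved.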
When transferring in binary mode, record boundaries will still be important if
the file receiver has a record-oriented file system, and thus checkpoints
should still occur only on record boundaries. But in this case, how does the
file sender know when to send checkpoint requests?
Conversely, the file sender might have a record-oriented file system, and can
only restart a transfer from a record boundary.
In the worst case, both systems are record-oriented, but use different record
lengths.
Assumptions:
1. On record-oriented systems, binary files have either fixed-length records
or else a fixed-length "allocation unit" (e.g. blocksize).
Discussion: CMS MODULEs and VMS object files are examples of binary files with
variable length records. Each record includes a header giving its length.
Normally, the record header is NOT considered part of the data.
The Kermit protocol has a mechanism for dealing with such files, but this
method has never been used because when such a file is sent to a
stream-oriented file system, there is no way to preserve the record boundaries
without also including the record headers. Therefore, all existing Kermit
programs transfer such files by including the record headers as part of the
data itself. The receiver is ignorant of the difference between files encoded
this way and ordinary stream-binary files.
To accomplish such transfers, the Kermit program on the record-oriented system
is put into a special "local file mode", known only to itself, such as
V-BINARY (VM/CMS) or LABELED (VMS). Files sent in such modes to
non-record-oriented systems are said to be "archived", since the result
contains structuring information as well as file data, and is, in general,
not useful on the system to which it has been sent. Rather, it is designed
to be sent back to the type of system on which it originated, where it can be
restored to its original (useful) format.
2. One Kermit program cannot be expected to understand the archiving format
of a different Kermit program.
3. Checkpoint requests can NOT be initiated by the file receiver.
Therefore archived files are transferred in regular binary mode, and if record
length is an issue, it must be handled with a fixed number, whose value is to
be determined.
Facts:
1. There is presently no mechanism in the A- or F-packet exchange for the file
receiver to tell the sender the destination file's record length, blocksize,
etc (let's call this the "allocation unit").
2. It does not make sense to do this in the S-packet exchange, because the
allocation unit can change from file to file.
Therefore, we need to invent new syntax for the ACK to the A packet, in which
the receiver informs the sender of its file allocation unit. This will be the
new attribute tag '3' (ASCII 51). In the sender's attribute packet, this
works in the normal way: the sender informs the receiver of its allocation
unit:
3<len><number>
e.g.:
3#512
However, the treatment of this attribute in the receiver's ACK to the
attribute packet must be different from how other items in the Attribute ACK
are handled. Normally, the file receiver's ACK contains Y or N to accept or
reject the file, respectively, followed by a list of attribute tags, but with
no associated data. The '3' tag, however, will have to carry data in the
Attribute ACK. This is an ugly special case, but it is preferable to
exchanging an extra packet to convey this information. The '3' tag is
followed by a single-character base-94 offset-32 length field, and then a
numeric value. A value of 0 means "I don't care", and a value of 1 means that
the Kermit program is capable of writing one byte at a time to an output file
(in practice, 0 and 1 would be equivalent). A value of 2 might be used by
systems (like PRIME) that can do i/o only in "words" rather than bytes.
Record-oriented systems would specify values like 80, 128, 512, 800, etc.
NOTE: The effect of this field when received by Kermit programs that
are not aware of it must be considered. Such Kermit programs will not
understand the '3' and might misinterpret the subsequent data as
attribute tags.
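As an illustration only, the '3' field described above could be built and parsed as sketched below. The function names are invented for this example; tochar() and unchar() follow Kermit's usual offset-32 convention for single-character numbers.

```python
def tochar(n: int) -> str:
    """Kermit offset-32 encoding of a small number (0-94) as one printable character."""
    if not 0 <= n <= 94:
        raise ValueError("value out of range for a single length character")
    return chr(n + 32)


def unchar(c: str) -> int:
    """Inverse of tochar()."""
    return ord(c) - 32


def encode_alloc_unit(unit: int) -> str:
    """Build the '3<len><number>' field, e.g. 512 -> '3#512'."""
    digits = str(unit)
    return "3" + tochar(len(digits)) + digits


def decode_alloc_unit(field: str) -> int:
    """Parse a '3<len><number>' field and return the allocation unit."""
    if len(field) < 2 or field[0] != "3":
        raise ValueError("not an allocation-unit field")
    length = unchar(field[1])
    digits = field[2:2 + length]
    if len(digits) != length or not digits.isdigit():
        raise ValueError("malformed allocation-unit field")
    return int(digits)
```

Under these assumptions, encode_alloc_unit(512) yields the string 3#512 shown earlier, because the three-digit value is announced by the length character '#' (35 - 32 = 3).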
Now, assuming we have a mechanism to allow the receiver to inform the sender
of the destination file's allocation unit, the sender must compute a
checkpoint interval that allows checkpoints to occur on record boundaries that
the source and destination files share in common. This would be a number into
which both the source and destination record lengths divide evenly, and which
is also in the neighborhood of the desired checkpoint interval, e.g. 10240 for
512 and 80; in the worst case it would be the product of the two numbers.
In the most common cases, e.g. UNIX, MS-DOS, etc, there are no records and
therefore binary-mode checkpoints can occur anywhere at all. In the case where
only one Kermit is record-oriented, the sender can choose any value close to
the negotiated checkpoint interval that is a multiple of the record size.
NOTE: The precise mechanism for binary mode checkpointing will require
further study and refinement during the development stage.
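Pending that refinement, one possible way to pick the interval is sketched here: take the least common multiple of the two allocation units and choose the multiple of it that lies closest to the desired interval. The function name and the rounding rule are assumptions of this sketch, not part of the design.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def checkpoint_interval(src_unit: int, dst_unit: int, desired: int) -> int:
    """Return a checkpoint interval near `desired` that both units divide evenly."""
    # 0 ("don't care") and 1 (byte-at-a-time) impose no constraint.
    src = max(src_unit, 1)
    dst = max(dst_unit, 1)
    common = lcm(src, dst)
    # Nearest whole number of common units, but never fewer than one.
    multiples = max(1, (desired + common // 2) // common)
    return multiples * common
```

For record lengths 512 and 80 the common unit is 2560, so a desired interval of about 10000 gives 10240. When the two lengths share no factors and the desired interval is small, the result is their product, which is the worst case mentioned above.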
.c2.RECORDING CHECKPOINTS
Checkpointing must be entirely consistent with sliding windows. Checkpoint
requests and confirmations should flow smoothly among the data packets, which
means that checkpoint requests and confirmations can be widely separated in
time.
Since checkpoint requests and confirmations are separate packets, there can
never be more than 31 of them in the window, since 31 is Kermit's maximum
window size. In fact, there will always be at least one data packet between
checkpoints, so no more than 16 checkpoint requests would ever be in the
window.
Each checkpoint is assigned a serial number, or ID, on which the two Kermit
programs can synchronize during recovery. Since there can never be more than
16 checkpoint requests outstanding, the checkpoint ID ranges from 0 to 15 and
then recycles.
To handle checkpoints in the general case (windowed as well as non-windowed
transfers), the file sender keeps a checkpoint window, implemented as a
16-element array indexed by the Checkpoint ID, which contains the recovery
information associated with each checkpoint. The checkpoint window is
guaranteed to contain all the checkpoints that are also in the packet window.
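A minimal sketch of such a checkpoint window follows. The class and method names are hypothetical; the only facts taken from the text are the maximum window size of 31, the resulting limit of 16 outstanding checkpoints, and the recycling of IDs from 15 back to 0.

```python
MAX_WINDOW = 31                      # Kermit's maximum window size
ID_SPACE = (MAX_WINDOW + 1) // 2     # 16: at most every other slot holds a J packet


class CheckpointWindow:
    """Sender-side table of recovery information, indexed by Checkpoint ID."""

    def __init__(self):
        self.slots = [None] * ID_SPACE
        self.next_id = 0

    def take(self, recovery_info):
        """Store recovery info under the next ID and return that ID."""
        cid = self.next_id
        self.slots[cid] = recovery_info
        self.next_id = (cid + 1) % ID_SPACE   # recycle after 15
        return cid

    def lookup(self, cid):
        """Return the recovery info most recently stored under this ID."""
        return self.slots[cid]
```

Because an ID is reused only after 16 further checkpoints, and no more than 16 can be outstanding, a slot is never overwritten while its checkpoint is still in the packet window.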
A CHECKPOINT RECORD is written to the recovery file for each checkpoint.
The format of a checkpoint record is:
CHECKPOINT <id> <recovery-info>
where the Checkpoint ID is a decimal number, 0-15, and the system-dependent
recovery information is as follows:
- For the FILE SENDER: how to identify the point in the source file
corresponding to the checkpoint, e.g. a file pointer to the next byte to be
read from the file, or the number or location or ID of the next record.
- For the FILE RECEIVER: the size of the destination file after the checkpoint
operation is completed, expressed in units appropriate to the file system:
bytes, blocks, etc. It is, however, essential that this number grow as each
checkpoint is recorded.
Examples:
CHECKPOINT 0 10240
CHECKPOINT 1 20480
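The record format, and the requirement that the receiver's recorded size must grow, can be sketched as follows. The helper names are invented for illustration, and the recovery information is assumed here to be a simple byte count.

```python
def format_checkpoint(cid: int, info) -> str:
    """Return a checkpoint record such as 'CHECKPOINT 0 10240'."""
    if not 0 <= cid <= 15:
        raise ValueError("Checkpoint ID must be in the range 0-15")
    return f"CHECKPOINT {cid} {info}"


def sizes_grow(records) -> bool:
    """True if each receiver record shows a larger size than the one before it."""
    sizes = [int(line.split()[2]) for line in records]
    return all(a < b for a, b in zip(sizes, sizes[1:]))
```

A receiver could apply a check like sizes_grow() before trusting its recovery file, since a size that fails to grow would make it impossible to tell whether an unrecorded checkpoint had been taken.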
.c2.THE CHECKPOINT REQUEST AND CONFIRMATION PACKETS
Checkpoint requests are made by the FILE SENDER by sending a discrete packet,
with a new packet type of J. The CHECKPOINT REQUEST packet contains the
Checkpoint ID as a decimal ASCII numeric string, "0"-"15", in its Data field.
The CHECKPOINT CONFIRMATION packet is simply an Acknowledgement (Y) for a
CHECKPOINT REQUEST packet, containing the same Checkpoint ID in the same
format. If the data field of the J packet or its ACK contains the uppercase
letter X instead of a numeric Checkpoint ID, this indicates a checkpointing
error, which is to be handled according to the CHECKPOINT ERROR-ACTION
setting.
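The Data field shared by the J packet and its ACK might be handled along these lines. This is a sketch with hypothetical function names; None is used here to stand for the X error indicator.

```python
def encode_checkpoint_data(cid=None) -> str:
    """Data field for a J packet or its ACK; None produces the 'X' error indicator."""
    if cid is None:
        return "X"
    if not 0 <= cid <= 15:
        raise ValueError("Checkpoint ID must be in the range 0-15")
    return str(cid)


def decode_checkpoint_data(data: str):
    """Return the Checkpoint ID, or None if the field carries the 'X' indicator."""
    if data == "X":
        return None
    if data.isascii() and data.isdigit() and 0 <= int(data) <= 15:
        return int(data)
    raise ValueError("malformed checkpoint data field")
```

A result of None would then be passed to whatever code implements the CHECKPOINT ERROR-ACTION setting.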
A checkpoint is taken as follows:
1. Sender opens the recovery file in append mode, writes a checkpoint record
into it, and then closes it. If this operation fails, the transfer is
canceled with an E packet, or checkpointing is canceled with a J(X) packet,
according to CHECKPOINT ERROR-ACTION.
2. Sender sends a J packet with the checkpoint ID in the data field, for
example J3.
3. Upon receipt of the J packet, the file receiver performs the following
actions:
a. Closes the temp file to ensure all data has been written out to it.
b. Creates the destination file if it doesn't exist yet.
c. Appends the temp file to the destination file. NOTE: There is a
window of vulnerability if the computer should crash at this point,
or if the append operation succeeds, but fills the disk: the destination
file is updated, but the update is not recorded in the recovery file.
This situation is detected and handled during the recovery operation.
d. If and only if all the above actions were successful, and if the J
packet did not contain the "X" cancellation indicator, the file receiver
opens its recovery file in append mode, writes the current checkpoint
info to it, and closes it.
e. Deletes the temp file, creates a new one, and opens it for write access.
f. If and only if all the above actions were successful, the receiver sends
a CHECKPOINT CONFIRMATION (ACK with Checkpoint ID in Data field) back to
the file sender. Otherwise, the error is handled according to the
CHECKPOINT ERROR-ACTION setting: if PROCEED, cancel checkpointing and
respond with X in the data field of the ACK; if QUIT, send an E-packet.
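The receiver's steps might look roughly like the sketch below on a system with an ordinary byte-stream file system. The function name and arguments are hypothetical, the caller is assumed to have closed the temp file already (step a) and to reopen it afterwards, and the calls to os.fsync() are an extra precaution of this sketch rather than a stated requirement. Only the PROCEED error action is shown.

```python
import os
import shutil


def receiver_checkpoint(temp_path, dest_path, recovery_path, cid):
    """Carry out steps b through e for checkpoint `cid`.

    Returns the Data field for the ACK: the ID on success, "X" on failure.
    """
    try:
        # Steps b and c: create the destination if needed, append the temp file.
        with open(temp_path, "rb") as src, open(dest_path, "ab") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())
        # Step d: record the new destination size in the recovery file.
        size = os.path.getsize(dest_path)
        with open(recovery_path, "a") as rec:
            rec.write(f"CHECKPOINT {cid} {size}\n")
            rec.flush()
            os.fsync(rec.fileno())
        # Step e: discard the temp file and leave a fresh, empty one in its place.
        os.remove(temp_path)
        open(temp_path, "wb").close()
    except OSError:
        return "X"
    return str(cid)
```

Note that the destination file is updated before the recovery file, which is exactly the ordering that produces the window of vulnerability described in step c.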
Observe that the connection can fail after the J packet has been sent, but
before it was received, and therefore the two recovery files will be out of
sync. Similarly, the connection can fail after the ACK is sent but before it
is received. It is impossible to devise a strategy to assure that the two
recovery files always WILL be in sync, especially on a long-delay connection
with sliding windows active.
The simple strategy given above resolves this dilemma by ensuring that when
the recovery files ARE out of sync, the SENDER IS ALWAYS AHEAD of the
receiver. We know that it is possible for the sender to move its source-file
pointer back to any desired position (byte or record), but we cannot make any
such assumption about the file receiver. For all practical purposes, the
destination file could be a printer, a deck of cards, or a punched paper tape,
where what is done cannot be undone.
.c2.THE RECOVERY FILE
Since a file transfer failure might have been caused by a computer crash,
information about the transfer must be recoverable after a computer restart.
Therefore it must be recorded on a nonvolatile device. This would normally be
in the file system as a separate file on disk. Each Kermit program keeps its
own recovery file.
The recovery file will not contain connection information such as phone
number, communication settings, etc. In order to reestablish a connection
automatically from the recovery file, it would be necessary to store a
password there, and this violates the most fundamental concepts of computer
security. Therefore, automatic connection reestablishment must be
accomplished using other methods, to be discussed later.
The recovery file must contain sufficient information to ensure that in a
recovery operation:
- The two recovery files apply to the same file transfer transaction.
- The correct source and destination files are identified for recovery, and
have not changed in the meantime.
- All settings that affect the final result of the transfer are the same as
in the original transfer operation.
- The two Kermit programs agree upon the exact point of failure.
The recovery file is composed of ordinary Kermit commands (some of them new)
and executed just like any other command file. Certain commands might make
sense only in recovery mode; those commands could be marked as invisible or
invalid in other modes.
A new recovery file is written for EACH FILE that is transferred. No attempt
is made to include transfer history for multiple files. There are several
reasons for this:
- The recovery file could become quite large.
- Processing of the recovery file could take a long time and cause a lot of
disk activity (e.g. accessing directory information for many files).
- Various complications arise when we allow the recovery file to apply to many
files. For example, ASSERT commands (see below) could fail, causing
premature termination of a RECOVER operation, even though the file that the
ASSERT commands apply to was transferred successfully (e.g. the file was
modified after it was transferred).
- There is no particular benefit in keeping records for multiple files. It
does not, for example, tell us which files were NOT transferred yet.
For this reason, a recovery file is created as the transfer of EACH file
begins, and is destroyed (only) after the file is transferred successfully.
The first command in the recovery file should be:
SET TRANSFER ID <text>
The transfer ID is the key that joins the sender's and receiver's recovery
files together.
Each Kermit program should then write the commands corresponding to all
settings that could affect the contents and form of the destination file, for
example:
SET FILE TYPE TEXT
SET FILE CHARACTER-SET CP850
SET TRANSFER CHARACTER-SET LATIN1
SET FILE ... (system-dependent things -- record length, etc)
Next we specify the direction of file transfer.
SET TRANSFER ACTION { SEND, RECEIVE, MAIL <address>, PRINT <options> }
And then, if necessary (that is, if fully qualified file specifications are
not available), we specify the current location at the time the SEND or
RECEIVE command was given. If this command appears in the recovery file, all
subsequent filenames that are not fully qualified are relative to this path:
CD <path>
In a moment, we will give the name of the file that is being transferred and
make several ASSERTIONS about it. An assertion fails if it is not
true. Therefore we must ensure that if any of the subsequent commands fail,
the recovery operation itself fails:
SET TAKE ERROR ON; (This is the syntax for C-Kermit)
(This would be equivalent, in MS-DOS Kermit 3.13 and earlier, to putting the
command IF FAILURE END 1 after each command.) It is essential that processing
fail if any of these assertions proves false, otherwise there is no guarantee
that the recorded checkpoints are accurate.
Now we identify the file:
SET TRANSFER FILE <name>
<name> is either a fully qualified path name or else, if a CD command was
given, a path name relative to the path given in the CD command.
If the TRANSFER ACTION is SEND, MAIL, or PRINT, the Kermit program also
obtains the file's size and its modification date and time, and then checks to
make sure they haven't changed. The new command, ASSERT, tells Kermit to
check that the given condition is true and to FAIL if it isn't:
ASSERT TRANSFER FILE DATE <modification-date-time>
This ensures that the current modification date and time of the file given in
the SET TRANSFER FILE command is the same as the given date and time.
Similarly, the file sender also includes:
ASSERT TRANSFER FILE SIZE <number>
to ensure that the file's size is still the one given.
The remainder of the recovery file consists of checkpoint records and a final
STATUS record:
CHECKPOINT 0 <recovery-info>
CHECKPOINT 1 <recovery-info>
CHECKPOINT 2 <recovery-info>
...
At the end of a transaction, a STATUS statement records the status of the file
transfer:
STATUS <code>
The code is the numeric status code (values to be assigned). 0 means the file
was transferred successfully.
If there is no STATUS statement -- that is, if the file ends on a CHECKPOINT
statement -- it means the computer (or Kermit) crashed in the midst of file
transfer, and the status is assumed to be a recoverable failure.
SAMPLE RECOVERY FILE
Here is a sample recovery file for a successful file transfer, from the file
sender's point of view:
; ... PRELUDE
SET TRANSFER ID 930719152832 ; File transfer ID
SET FILE TYPE TEXT ; Transfer settings
SET FILE CHARACTER-SET CP850
SET TRANSFER CHARACTER-SET LATIN1
SET TRANSFER ACTION SEND ; We're sending files
CD /usr/olga ; Current directory
SET TAKE ERROR ON;
SET TRANSFER FILE oofa.txt ; File identification
ASSERT TRANSFER FILE DATE 930808125959
ASSERT TRANSFER FILE SIZE 1234567
; ... CHECKPOINT HISTORY
CHECKPOINT 0 10240 ; Checkpoint records
CHECKPOINT 1 20480
CHECKPOINT 2 30720
...
STATUS 0 ; Completion status
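A recovery file in this format could be read back roughly as follows. This is only a sketch: the function names are invented, a semicolon is assumed always to start a comment, and any nonzero STATUS is treated as a failure because the actual code values are still to be assigned.

```python
def parse_recovery_file(text):
    """Return (settings, checkpoints, status) from the text of a recovery file."""
    settings, checkpoints, status = [], [], None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()    # drop trailing comments
        if not line:
            continue
        words = line.split()
        keyword = words[0].upper()
        if keyword == "CHECKPOINT":
            checkpoints.append((int(words[1]), " ".join(words[2:])))
        elif keyword == "STATUS":
            status = int(words[1])
        else:
            settings.append(line)              # prelude command, kept verbatim
    return settings, checkpoints, status


def needs_recovery(status) -> bool:
    """No STATUS record (a crash) or a nonzero STATUS means there is work to recover."""
    return status != 0
```

In a real Kermit program the prelude commands would be executed by the command parser rather than collected in a list; the list here simply stands in for that step.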
.c2.THE RECOVERY PROCESS
A transfer has failed and the connection is broken. The user reestablishes the
connection, logs back in to the remote computer, starts Kermit, and gives the
following new command:
RECOVER
or, to identify a non-default recovery file:
RECOVER <filespec>
The RECOVER command enables checkpointing automatically, so if the recovery
operation itself fails, it can be recovered just like any other interrupted
file transfer for which checkpoints were taken.
The RECOVER command is similar to the TAKE command in that it directs the
command parser to execute commands from a file, but with certain key
differences:
1. It sets a recovery-in-progress flag that persists until the transfer
described in the recovery file is complete (or fails).
2. It enables or recognizes certain commands that are invalid or ignored
during ordinary command processing (such as CHECKPOINT and STATUS).
3. It disables certain commands that are valid outside of recovery mode
(such as SEND, RECEIVE, CONNECT, EXIT, HELP, etc), to protect against
"hand-crafted" recovery files.
4. It enters protocol mode automatically upon encountering the end of a
valid recovery file.
The remote Kermit reads the recovery file, executes all the settings, makes
all the checks, etc, and if all is well, gives the KERMIT READY TO
xxx... message and enters packet mode ("xxx" is SEND or RECEIVE, depending on
the given TRANSFER ACTION).
Now the user escapes back to the local Kermit and gives a RECOVER command
there too. The local Kermit reads its own recovery file.
When the FILE SENDER (which may be the remote or local Kermit program) reads
the checkpoint records from its recovery file, it loads them into its
checkpoint window, so in case an earlier checkpoint must be used, it can be
located immediately without having to re-read the recovery file.
The FILE RECEIVER (which may be the local or remote Kermit program) reads
checkpoint records until it has found the last one. Now it compares the
<recovery-info> with the current status (most commonly, the size) of the
destination file, and then:
IF THE DESTINATION FILE IS BIGGER THAN THE SIZE RECORDED IN THE
FINAL CHECKPOINT RECORD, THE RECEIVER'S CHECKPOINT ID IS INCREMENTED
BY ONE (modulo 16).
If the destination file size is larger than the final recorded checkpoint, we
know that exactly one checkpoint had been taken but not recorded. This shuts
the "window of vulnerability" noted previously.
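The rule can be expressed in a few lines. The function name is hypothetical, and treating a destination file that is smaller than the recorded size as an error is an assumption of this sketch; the text above does not address that case.

```python
def receiver_restart_id(last_id: int, recorded_size: int, actual_size: int) -> int:
    """Return the Checkpoint ID the receiver should report during recovery."""
    if actual_size < recorded_size:
        # Assumption: a shrunken destination file cannot be recovered safely.
        raise ValueError("destination file is smaller than the last checkpoint")
    if actual_size > recorded_size:
        # Exactly one checkpoint was taken but never recorded.
        return (last_id + 1) % 16
    return last_id
```

For example, if the last record is CHECKPOINT 15 with a size of 163840 but the destination file now holds 174080 bytes, the receiver reports ID 0.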
Each program enters packet mode upon encountering the end of the recovery
file, but only if the final entry was a CHECKPOINT statement or a STATUS
statement indicating a failure. Otherwise, an error message is printed and
the RECOVER command fails because there is nothing to recover.
After the normal S, F, and A packet exchanges, the file sender sends the
CHECKPOINT SYNC (H) packet, and the receiver checks it. If the Transfer IDs
don't agree, the transfer terminates in error.
NOTE: If, by chance, the wrong recovery file is used on one end, and we wind
up with two recovery files specifying the same TRANSFER ACTION (SEND or
RECEIVE), the operation will quickly fail with an unexpected packet type.
Now the checkpoints from the CHECKPOINT SYNC packet are compared. If they do
not agree, then -- by design -- the sender's will be the higher of the two,
and the sender rolls back its checkpoint to the one reported by the receiver.
This information is already loaded into the sender's checkpoint window from
the CHECKPOINT records in the recovery file.
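The sender's side of this comparison might be sketched as follows, with the checkpoint window modeled as a 16-element list. The function name is invented, and the returned step count is included only to show how far the sender had run ahead.

```python
def roll_back(window, sender_id: int, receiver_id: int):
    """Return (steps_back, recovery_info) for the checkpoint the receiver reported.

    `window` is the sender's 16-element checkpoint window, loaded from the
    CHECKPOINT records in its recovery file; unused slots hold None.
    """
    steps_back = (sender_id - receiver_id) % 16   # 0 when the two sides agree
    info = window[receiver_id]
    if info is None:
        raise LookupError("receiver reported a checkpoint the sender never recorded")
    return steps_back, info
```

The modulo arithmetic matters when the IDs have recycled: a sender at ID 0 and a receiver at ID 15 are one checkpoint apart, not fifteen.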
Now the file sender positions the source file to the agreed-upon checkpoint
and begins sending from there. The file receiver writes out incoming data to
temporary files and appends them to the destination file in the normal manner.
Checkpoints are appended to the SAME recovery files that were used to launch
the recovery operation.
Note that the recovery-in-progress flag should inhibit the re-writing
of the recovery-file "prelude", i.e. the material preceding the first
CHECKPOINT record.
If a recovered transfer fails, the RECOVER command sets a failure code for
IF SUCCESS / IF FAILURE, and the recovery file -- perhaps with additional
checkpoints and status appended to it -- is preserved so subsequent recovery
attempts can be made.
.c2.SERVER-MODE CONSIDERATIONS
Some sites might wish to run a Kermit program only in server mode. For
example, a Kermit server might be installed as the login shell on a particular
computer for users who log in as "kermit" or "guest". Or a Kermit server
might be set up on an Internet TCP socket, similar to an FTP server. Escape
to command mode might be disabled for security or other reasons.
Kermit servers, too, can participate in checkpointed file transfers. The
protocol and procedures are the same. Checkpointing must be initiated by the
client program unless the server has been told to SET CHECKPOINT ON before
entering server mode.
In order to recover an interrupted checkpointed file transfer when a Kermit
server is involved, a new protocol message is required by which the client
program instructs the server to recover the interrupted transfer. This will
be in the form of a "Generic" server command, packet-type G, new subtype O:
+-----+--------------------+
|  G  | O <len> <filename> |
+-----+--------------------+
 Type   Data
This packet would be sent to the server when the client executed the command:
REMOTE RECOVER [ <filename> ]
Normally, the filename would not be given, since the client would usually
have no way of knowing what it was. Thus the <len> would normally be zero
(expressed as a SP character).
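The Data field of this packet could be assembled as in the following sketch. The function name is hypothetical; the length character uses the same offset-32 encoding as other Kermit length fields, so a zero length is the space character.

```python
def remote_recover_data(filename: str = "") -> str:
    """Build the Data field of the G packet for REMOTE RECOVER [ <filename> ]."""
    if len(filename) > 94:
        raise ValueError("filename too long for a single-character length field")
    return "O" + chr(len(filename) + 32) + filename
```

With no filename the field is the letter O followed by one space; with a filename such as oofa.rec (a made-up name, eight characters long) the length character is '('.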
Upon receipt of a REMOTE RECOVER command packet, the Kermit server would
behave exactly as if it had been given an interactive RECOVER command except
that any errors would cause the server to send an error packet and return to
server command wait, rather than setting a FAILURE status and returning to the
prompt. That is, neither success nor failure of the recovery operation should
cause the server to exit from server mode.
.c2.SECURITY CONSIDERATIONS
Kermit software programs should never give users access to files that they
would not otherwise have access to.
NOTE: The statement above is subject to minor caveats. For example,
in UNIX, it is sometimes necessary to grant a Kermit program
special privileges to access communication devices or UUCP lockfiles
or UUCP lockfile directories that are not normally accessible, but
these privileges should not otherwise amplify the user's access rights.
See the discussion in the UNIX C-Kermit installation notes, CKUINS.DOC.
Managers of multiuser computer systems in which it is possible to confer
privileges on a program are always cautioned to install Kermit software as an
ordinary, unprivileged user program. Obviously, this recommendation can not
be enforced any more for Kermit than it can for any other application software
program. Thus any discussion of security relative to Kermit software has to
assume it is installed according to recommendations.
During a checkpointed file transfer, unprivileged Kermit software programs
will not create any files that the user could not have created by other
conventional means. The additional files are the temporary files created by
the file receiver and the recovery files created by both sender and receiver.
Neither do recovery files themselves pose a risk, as long as the Kermit
programs are unprivileged. Recovery files do not contain passwords or other
authentication material. Even if users alter recovery files in an attempt to
gain access to forbidden information or resources, unprivileged Kermit
software programs will not grant them such access. That is, Kermit software
does not run with any kind of privilege or identity in checkpointing or
recovery mode that it does not ordinarily have.
Thus, addition of checkpoint/restart capability to Kermit software introduces
NO NEW SECURITY RISKS.
.c2.MULTI-FILE TRANSFERS
A multi-file transfer is the transfer of a group of files in a single Kermit
TRANSACTION; that is, a series of protocol messages initiated by an S-packet
exchange and terminated by a B-packet exchange. Zero, one, or more files may
be transferred in this way. Multi-file transfers are typically initiated by
the use of wildcards or with an MSEND command containing a file list.
The file list (either directly given or the result of wildcard expansion) is
not conveyed from one Kermit to another, nor is it necessarily recorded
locally. The order in which files are transferred cannot be guaranteed from
one transaction to another. Thus, the files themselves -- and their
operating-system-dependent attributes -- are the database from which we must
construct recovery information.
The Kermit protocol already offers a mechanism to recover from multi-file
transfers at the point of failure, on a per-file basis. To enable this type
of recovery, one of the following settings is given to the file RECEIVER:
SET FILE COLLISION DISCARD
If a file arrives that has the same name as a file that already exists in
the current device/directory, the incoming file is refused via Kermit's
attribute refusal mechanism, and the existing file is preserved.
SET FILE COLLISION UPDATE
If a file arrives that has the same name as a file that already exists in
the current device/directory, AND the incoming file's modification date and
time is less than or equal to (that is, not newer than) that of the existing file, the
incoming file is refused via Kermit's attribute refusal mechanism, and the
existing file is preserved.
This mechanism is independent of ordering, but entails a small amount of
overhead, since F, A, and Z packet exchanges still occur for each file already
transferred.
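The receiver's decision under these two settings can be summarized in a short sketch. The function name is invented, and modification times are assumed to be directly comparable values such as integers.

```python
def refuse_incoming(policy: str, exists: bool,
                    incoming_mtime=None, existing_mtime=None) -> bool:
    """Return True if the receiver should refuse the incoming file."""
    if not exists:
        return False                      # no collision, always accept
    if policy == "DISCARD":
        return True                       # keep the existing file
    if policy == "UPDATE":
        # Refuse unless the incoming file is strictly newer.
        return incoming_mtime <= existing_mtime
    raise ValueError("unsupported collision policy: " + policy)
```

When a file group is resent after a failure, every file that already arrived intact is refused in this way, and only the remaining files are actually transferred.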
This mechanism can be used in conjunction with checkpoint-restart to recover a
multi-file transfer:
1. Recover the file that failed.
2. Resend the file group with the appropriate collision action selected
at the receiver.
.c2.AUTOMATIC REESTABLISHMENT OF CONNECTION
Connection establishment occurs before Kermit protocol is activated, using
commands like SET PORT, SET SPEED, DIAL, CONNECT, etc, and then by
authenticating oneself to the remote host or service. This process is easily
automated in MS-DOS Kermit or C-Kermit using a script program -- a procedure
written in the Kermit program's own command language. Here is a crude example
that would apply to both C-Kermit and MS-DOS Kermit.
set count 20 ; Try up to 20 times to transfer the file
set checkpoint on ; Turn on checkpointing
askq \%p Password: ; Prompt for password interactively
; (This is used by LOGIN.SCR)
:LOGIN
hangup
dial 7654321 ; Dial the phone number
if fail end 1
take login.scr ; Execute the login script
set file type text ; Set transfer parameters
output kermit\13 ; Start Kermit on remote end
input 5 ermit> ; Wait for prompt
if failure end 1 ; No prompt, fail
if > 0 \v(count) -
goto recover ; Go to separate section for recovery
:FIRST ; First try (non-recovery)
output receive\13 ; Send RECEIVE command
input 5 RECEIVE... ; Wait for packet-mode prompt
send message.txt ; Try to send a file
if success end 0 ; Success, we're finished
if not = \v(recovery) 0 -
end 1 ; Failure not recoverable, quit
if count goto login ; Recoverable, go try again.
end 1 Too many tries.
:RECOVER
output recover\13 ; Send RECOVER command
input 5 RECEIVE... ; Wait for packet-mode prompt
recover ; Tell local Kermit to RECOVER
if success end 0 ; Success, we're finished
if not = \v(recovery) 0 -
end 1 ; Failure not recoverable, quit
if count goto login ; Recoverable failure, try again
end 1 Too many tries
Automated recovery is always initiated by the caller, since only the caller
knows how to reestablish the connection. Some Kermit programs, such as
Kermit-370, are never the caller, and so need not implement any of the
connection re-establishment features.
.c2.RECOVERY FROM AN INTERRUPTED TRANSFER THAT WAS NOT CHECKPOINTED
The checkpoint/restart protocol described in this document takes place only
when: (a) both Kermit programs have implemented the checkpoint/restart
protocol, and (b) the user has enabled its use.
Suppose a non-checkpointed file transfer is interrupted? Normally, the
receiving Kermit discards any incoming file that is not completely received.
However, most Kermit programs include a command:
SET FILE INCOMPLETE { DISCARD, KEEP }
The default is DISCARD, which is proper because users should never be given
the false impression that an incomplete file transfer was successful. To
enable the retention of partially received files, the user must give the
command to the file receiver prior to the transfer:
SET FILE INCOMPLETE KEEP
When this option is in effect, interrupted file transfers can be recovered
manually by a somewhat laborious and error-prone process:
1. The user examines the partially received destination file to determine
exactly where the transfer was interrupted.
2. The user uses a text editor or other utility to extract the as-yet
unsent portion of the source file into a separate file.
3. The user transfers the newly created source-file fragment to the
destination system as a new and separate file.
4. The user appends the two destination files together.
(Steps 3 and 4 can be combined via some trickery plus SET FILE COLLISION
APPEND, if available, on the receiving Kermit.)
A simple modification to existing Kermit software -- independent of the
checkpoint/restart feature and of the Kermit protocol itself -- can simplify
this process somewhat. A new command, PSEND (Partial Send):
PSEND <filename> <number>
can be used to tell the Kermit program to send the given file (the name of a
single file, not a wildcard or file-group specification) starting at the
position given by the <number>, where <number> is a system-dependent quantity,
representing a byte position, a record number, etc. Meanwhile, the file
receiver is told to:
SET FILE COLLISION APPEND
meaning: when a file arrives that has the same name as an existing file,
append the new material to the end, rather than creating a new file or
overwriting the old one.
To handle the case where the file sender cannot be given a starting position
that corresponds exactly to the end of the partially received destination file
(for example, if the file sender has a record-oriented file system, but the
receiver has a byte-oriented file system, or a different record size), the
following new command can be given to the file receiver:
PRECEIVE <filename> <number>
This instructs the file receiver to write incoming bytes beginning at the
position given by <number>, possibly overwriting existing material. PRECEIVE
capability is not necessarily possible in all operating systems.
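On a byte-stream file system the two commands reduce to a seek followed by a read or a write, as in this sketch. The function names are hypothetical, positions are assumed to be byte offsets, and the record-alignment helper models a sender that can restart only on a record boundary.

```python
def aligned_restart(dest_size: int, record_len: int) -> int:
    """Largest record boundary that does not exceed the partial file's size."""
    return dest_size // record_len * record_len


def psend_read(path: str, position: int) -> bytes:
    """Data that PSEND would send: everything from `position` to end of file."""
    with open(path, "rb") as f:
        f.seek(position)
        return f.read()


def preceive_write(path: str, position: int, data: bytes) -> None:
    """Write incoming data at `position`, overwriting any existing material there."""
    with open(path, "r+b") as f:
        f.seek(position)
        f.write(data)
```

For example, if 530 bytes arrived before the interruption and the sender uses 80-byte records, both sides would be given position 480; the last 50 bytes already received are simply written again.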
It is, of course, the user's responsibility to reestablish all the original
settings before attempting this type of recovery: text vs binary, character
set, record length, etc.
Recovery from interrupted transfers using this method can never be automatic
(because the required information is not recorded anywhere) and is possible
only when the file receiver has been given the command SET FILE INCOMPLETE
KEEP in advance. If this mode of recovery is always desired as a fallback
when true checkpoint/restart protocol has not been enabled or successfully
negotiated, the SET FILE INCOMPLETE KEEP command can be added to the Kermit
initialization file.
.c2.SUMMARY OF NEW COMMANDS AND PROTOCOL MESSAGES
Commands:
SET CHECKPOINT { ENABLED, DISABLED, OFF }
SET CHECKPOINT INTERVAL <number>
SET CHECKPOINT RECOVERY-FILE <filename>
SET CHECKPOINT ERROR-ACTION { PROCEED, QUIT }
SHOW CHECKPOINT
SET TRANSFER ID <number>
SET TRANSFER ACTION { SEND, RECEIVE, MAIL <address>, PRINT <options> }
SET TRANSFER FILE <filename>
ASSERT TRANSFER FILE DATE <modification-date-time>
ASSERT TRANSFER FILE SIZE <number>
CHECKPOINT <id> <recovery-info>
STATUS <code>
RECOVER [ <filespec> ]
REMOTE RECOVER [ <filename> ]
PSEND <filename> <position>
PRECEIVE <filename> <position>
Protocol Messages:
CAPAS mask in Initialization string must include Attribute Packets bit.
CHKPNT and CHKINT fields added to Initialization string.
New CHECKPOINT SYNC (H) packet.
New CHECKPOINT REQUEST (J) packet.
New ALLOCATION UNIT field (Tag 3) in ACK to A-Packet.
New REMOTE RECOVER generic server command (G packet, subtype O).
.c.PROJECT OVERVIEW
The checkpoint-restart project will consist of five phases:
1. Requirements Definition
2. Design
3. Development / Implementation
4. Testing
5. Deployment
The initial requirements definition and design for Kermit protocol extensions,
user interface, nomenclature, and recovery procedures are given in this
document. This design will be refined and expanded during the development
process.
.c2.DEVELOPMENT AND IMPLEMENTATION
This is to be considered a small project that should not be subject to the
formality of controls that apply to large projects. It will be conducted by
one principal designer, with design review by the contractor and by the other
developers, and with programming work by no more than four programmers on
three different bodies of source code.
Development and implementation will proceed in build-a-little, test-a-little
increments. The individuals involved in the project will cooperate closely
at all times (primarily by Internet email and file transfer), rather than
working on discrete components in isolation from one another. Most of the
work after the design stage will proceed in parallel, with developments and
discoveries constantly feeding back into the design and implementation plan
as real-life experience is gained with issues that have, so far, been
completely abstract. Thus a highly detailed and specific "critical path"
analysis would not apply to this project. However, the overall structure
of the workflow can be depicted as follows:
+---------+         +-------------+         +---------+      +------------+
| Initial | ------> | Development | ------> | Public  | ---> | Acceptance |
| Design  | --+     | and testing |         | Beta    |      | testing    |
+---------+   |     +-------------+    +--> | testing |      +------------+
              |                        |    +---------+
              |    +---------------+   |
              +--> | User & tech   | --+
                   | Documentation |
                   +---------------+
Omitted from this diagram (for simplicity) is the obvious fact that each stage
can feed back to earlier stages. For example, development and testing might
require changes in the design; Beta or acceptance testing might reveal bugs
that need fixing or even previously undiscovered design problems, and so on.
Here is a preliminary outline of the work to be done. This outline gives a
suggested order in which tasks are to be accomplished, proceeding from the
general (items not strictly related to checkpoint/restart capability, but
needed as underpinnings to it) to the specific, with prototyping done at
appropriate points.
In most cases, later items depend on earlier items. This outline is not,
however, a rigid prescription. In particular, developers should feel free to
proceed to a later item if they are temporarily blocked (perhaps for reasons
beyond their control) by an earlier item. For example, if a particular
feature is to be tested among all combinations of MS-DOS Kermit, C-Kermit, and
IBM Mainframe Kermit, but that feature is not yet ready in, say, C-Kermit, the
MS-DOS and IBM Mainframe Kermit developers should not feel compelled to do
nothing until C-Kermit is ready, but rather, they may proceed with other items
that do not depend on that feature.
A. LAYING THE FOUNDATION
1. Implementation, testing, and documentation of requisite capabilities that
are lacking from specific Kermit software programs:
- SET FILE COLLISION APPEND capability in MS-DOS Kermit and VMS C-Kermit.
- SET FILE COLLISION UPDATE capability in MS-DOS Kermit.
- SET { TAKE, MACRO } { ERROR, ECHO } { ON, OFF } in MS-DOS Kermit.
- Carrier-loss detection in MS-DOS Kermit: SET CARRIER { ON, OFF, AUTO }.
- Possible addition of an intrinsic DIAL command to MS-DOS Kermit, with
associated SET MODEM and SET DIAL commands as in C-Kermit.
2. Redesign of C-Kermit's file input module specification to deliver
consistently marked records, and recoding of system-dependent file i/o
modules according to the new specification.
3. Implementation, testing, and documentation of PSEND and PRECEIVE commands
to establish the ability to seek within a file.
4. Definition of standardized Kermit protocol error codes and their meanings.
5. Addition of standardized error codes to E packets in the three major
Kermit versions.
6. Classification of error codes into recoverable and nonrecoverable
categories and addition of a new variable or status code that can be
queried to see whether a failed transfer is recoverable.
7. Coding and testing of the addition of the file allocation unit to the
Attribute-packet reply.
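Items 4 through 6 above could be prototyped along the following lines. This
is only a sketch: the numeric codes, their names, and the division into
recoverable and nonrecoverable categories are placeholders invented for
illustration, since the standardized codes have not yet been defined.

```python
from enum import IntEnum


class KermitError(IntEnum):
    """Hypothetical error codes; the real list is still to be defined."""
    CONNECTION_LOST = 1
    TIMEOUT = 2
    DISK_FULL = 3
    ACCESS_DENIED = 4
    USER_CANCELLED = 5


# Assumed classification: failures of the connection are worth retrying;
# file-system failures and deliberate cancellations are not.
RECOVERABLE = frozenset({KermitError.CONNECTION_LOST, KermitError.TIMEOUT})


def is_recoverable(code: int) -> bool:
    """Return True if a transfer that failed with this code may be recovered."""
    try:
        return KermitError(code) in RECOVERABLE
    except ValueError:
        # Unknown codes are treated as nonrecoverable, to be safe.
        return False
```

A script or user could then query the result of is_recoverable() after a
failed transfer to decide whether a recovery attempt is worthwhile.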
B. CHECKPOINT/RESTART FRAMEWORK CODE DEVELOPMENT
1. Definition of file and variable names for checkpoint/restart, and
specification of the associated semantics.
2. Coding, testing, and documentation of the commands to set and display
checkpoint-related variables and capabilities: SET CHECKPOINT { ON,
DISABLED, ENABLED, INTERVAL, RECOVERY-FILE, ERROR-ACTION }, SHOW
CHECKPOINT. At this stage, these commands simply set and display internal
variables.
3. Coding, testing, and documentation of the ASSERT TRANSFER FILE command.
4. Coding, testing, and documentation of the following prototype
(nonoperational) commands:
- SET TRANSFER ID <xfer-id>
- SET TRANSFER FILE <filename>
- SET TRANSFER ACTION { SEND, RECEIVE }
- CHECKPOINT <number-list>
- STATUS <number>
The Kermit program will parse these commands, but will not associate any
actions with them. Check for syntactic problems or conflicts with other
commands as well as conceptual problems or difficulties with documentation.
5. Create and test a new module used by the file sender to generate a transfer
ID and write the initial part of the recovery file. At this stage of
development, this module would be called by the file sender at the time it
opens the input (source) file.
6. Ensure that the same Kermit program can read the prototype recovery files
back without syntax errors, and set the corresponding variables correctly.
7. Ensure that the file sender creates a new Transfer ID and recovery file for
each file in a file group, and that each recovery file is destroyed after
the file is successfully transferred, and that the proper recovery file
remains on disk when a transfer is interrupted. Test with file groups
consisting of zero files, one file, and more than one file.
8. Coding and testing of a module (in some cases, perhaps a preexisting
system call) to create a temporary output file without destroying any
existing file.
9. Coding of a module that appends one file to another. Install a new
file-management command, APPEND <source> <dest>, to test this code.
10. Enable use of temporary files by the file receiver without checkpointing.
Receive into a temporary file, and when the transfer is complete, rename
the temporary file to the desired name.
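Items 8 through 10 above can be illustrated with a short sketch. It is
written in Python purely for illustration; the function names and the
temporary-file naming are assumptions, and the final rename simply replaces
any existing destination file, whereas a real Kermit program would honor the
SET FILE COLLISION setting.

```python
import os
import shutil
import tempfile


def create_temp_output(dest_path: str) -> str:
    """Create a new, empty temporary file in the destination's directory.

    mkstemp() always picks an unused name, so no existing file is destroyed.
    """
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, temp_path = tempfile.mkstemp(prefix="kermit-", suffix=".tmp",
                                     dir=directory)
    os.close(fd)
    return temp_path


def append_file(source: str, dest: str) -> None:
    """Append the contents of source to dest, creating dest if necessary."""
    with open(source, "rb") as src, open(dest, "ab") as dst:
        shutil.copyfileobj(src, dst)


def receive_without_checkpointing(chunks, dest_path: str) -> None:
    """Write incoming data to a temporary file, then rename it into place."""
    temp_path = create_temp_output(dest_path)
    try:
        with open(temp_path, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
        # Assumption: overwrite an existing destination file on completion.
        os.replace(temp_path, dest_path)
    except BaseException:
        # The transfer failed: discard the partial temporary file.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

Creating the temporary file in the same directory as the destination keeps
the final rename on one file system, which is what allows it to be a simple
rename rather than a copy.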
C. CODE DEVELOPMENT FOR CHECKPOINTED FILE TRANSFER
1. Coding and testing of checkpoint/restart protocol negotiation: WILL, WONT,
DO; communication of checkpoint interval. Add display and/or debugging
tools to monitor the progress and/or results of the negotiation. Test all
combinations of SET CHECKPOINT { ON, OFF, ENABLED } among MS-DOS Kermit,
C-Kermit, and Kermit-370, as well as against a non-checkpoint-capable
Kermit version.
2. Add the new CHECKPOINT SYNC (H) packet and the appropriate protocol state
transitions and actions. Sender generates a new Transfer ID, and uses a
null Checkpoint ID. Create and initialize the recovery file in both file
sender and receiver. To test, collect packet logs and ensure that the new
packets are exchanged, have the correct format, and that file transfers
still work. Also, ensure that the H-packet is NOT exchanged when
checkpointing has not been negotiated. Also, ensure that failure of the
CHECKPOINT SYNC packet to appear at the proper time when checkpointing has
been negotiated is handled according to CHECKPOINT ERROR-RECOVERY.
Inspect the recovery files and ensure they are correct.
3. Implementation of the checkpointing process in the file sender:
- Determination of when to initiate a checkpoint request, based on
negotiated checkpoint interval and file record/line boundaries.
- Addition of capability to terminate a file-transfer data packet on a
record (e.g. text line) boundary, rather than filling the packet.
- Creation of the checkpoint window structure and recording of checkpoints.
- Writing of CHECKPOINT and STATUS records to the recovery file.
- Transmission of CHECKPOINT REQUEST packets.
- Ability of receiver to accept CHECKPOINT REQUEST packets without error,
but without actually processing them, so the sender's code can be tested.
4. Testing of checkpointing process in the file sender:
- Collect packet logs.
- Test both text and binary mode transfers.
- Binary transfers should be tested for stream and record-oriented systems.
- Ensure that checkpoints were taken at the right places.
- Ensure that CHECKPOINT records were written correctly.
- Dump the checkpoint window periodically to ensure it is correct.
- Ensure that appropriate STATUS records are written for both successful
and failed transfers.
5. Implementation of the checkpointing process in the file receiver. Upon
receipt of CHECKPOINT REQUEST packet:
- Flush, close temp file.
- Append temp file to destination file and delete temp file.
- Create new temp file for subsequent incoming data.
- If all OK, write CHECKPOINT record to recovery file.
- If OK, send CHECKPOINT CONFIRMATION packet, otherwise cancel transfer
or checkpointing according to CHECKPOINT ERROR-RECOVERY setting.
6. Testing of the checkpointing process in the file receiver:
- Ensure destination files (both text and binary) are created correctly.
- Ensure that CHECKPOINT CONFIRMATION packets are correct.
- Ensure recovery file updated correctly with CHECKPOINT and STATUS
records.
- Ensure recovery file is retained when transfer fails, destroyed when
transfer succeeds.
7. Further testing of checkpointed transfers. Verify that all mechanisms
coded so far work on or for:
- Text and binary files
- For binary files: stream and record oriented.
- For record-oriented binary files: records of different sizes / formats
- All window sizes (test on long-delay connections with large window
sizes)
- Long and short packets
- Single-file transfers and file-group transfers
- 7-bit and 8-bit connections
- With and without text character-set conversion.
- Serial and network connections
- Noisy and clean connections
- Between all combinations of MS-DOS Kermit, C-Kermit, and Kermit-370
8. Performance Evaluation. Compare file transfer efficiency with and without
checkpointing:
- For various checkpoint intervals
- For text and binary files
- For various window-size / packet-length combinations
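The receiver-side steps listed in item C.5 can be sketched as follows. This
is a minimal illustration, not the planned implementation: the class and
method names, the ".tmp" naming convention, and the layout of the CHECKPOINT
record (a checkpoint number followed by the committed file length) are all
assumptions made for the example.

```python
import os
import shutil


class CheckpointingReceiver:
    """Sketch of the receiver's checkpoint handling (names are hypothetical)."""

    def __init__(self, dest_path: str, recovery_path: str):
        self.dest_path = dest_path
        self.recovery_path = recovery_path
        self.temp_path = dest_path + ".tmp"   # assumed naming convention
        self.temp = open(self.temp_path, "wb")

    def write_data(self, data: bytes) -> None:
        """Store file data received since the last checkpoint."""
        self.temp.write(data)

    def on_checkpoint_request(self, checkpoint_id: int) -> bool:
        """Commit the data received so far; return True to confirm."""
        try:
            # Flush and close the temporary file.
            self.temp.flush()
            os.fsync(self.temp.fileno())
            self.temp.close()
            # Append it to the destination file, then delete it.
            with open(self.temp_path, "rb") as src, \
                 open(self.dest_path, "ab") as dst:
                shutil.copyfileobj(src, dst)
                dst.flush()
                os.fsync(dst.fileno())
            os.remove(self.temp_path)
            position = os.path.getsize(self.dest_path)
            # Create a new temporary file for subsequent incoming data.
            self.temp = open(self.temp_path, "wb")
            # Only when all of the above succeeded, record the checkpoint.
            with open(self.recovery_path, "a") as rec:
                rec.write(f"CHECKPOINT {checkpoint_id} {position}\n")
        except OSError:
            return False   # caller cancels the transfer or checkpointing
        return True        # caller sends CHECKPOINT CONFIRMATION
```

Writing the CHECKPOINT record last means that a failure at any earlier step
leaves the recovery file no further ahead than the destination file.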
D. CODE DEVELOPMENT FOR RECOVERY FROM POINT OF FAILURE
1. Implementation of the CHECKPOINT command. This command simply loads the
given information into the indicated slot in the checkpoint window.
2. Implementation of the STATUS command. This simply sets an internal
variable to the given number.
3. Implementation of SET TRANSFER { FILE, ID, ACTION } commands.
4. Implementation of the RECOVER command:
- Locate and read recovery file, validate info, determine TRANSFER ACTION.
- Open the transfer file in the given mode (read or write)
- Load checkpoint window.
- Determine final transfer status, don't recover if 0.
- Close recovery file.
- Enable checkpointing.
- Enter protocol mode according to TRANSFER ACTION.
- Process and synchronize CHECKPOINT SYNC from recovery file info.
- Sender positions source-file pointer to indicated position.
- Receiver opens destination file in append mode.
- File transfer resumes where it left off when interrupted.
- Recovery files are updated and disposed of in the normal way.
5. Testing the RECOVER command:
- Collect packet logs on both ends, inspect to ensure correctness.
- Using all combinations of connections, protocol settings, file settings,
systems, etc., ensure that files of all types are recovered correctly.
- Ensure that a recovery operation can itself be interrupted and recovered.
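The first steps of the RECOVER command (reading the recovery file, loading
the checkpoint window, and checking the final status) might look roughly like
the sketch below. It assumes that the recovery file holds one command per
line using the SET TRANSFER, CHECKPOINT, and STATUS commands named above, and
that the numbers in a CHECKPOINT record are separated by blanks; the record
contents, the data structure, and the treatment of a missing STATUS record as
recoverable are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecoveryInfo:
    transfer_id: str = ""
    filename: str = ""
    action: str = ""                        # "SEND" or "RECEIVE"
    checkpoints: List[List[int]] = field(default_factory=list)
    status: Optional[int] = None            # None: no STATUS record found


def parse_recovery_file(lines) -> RecoveryInfo:
    """Read recovery-file records and set the corresponding variables."""
    info = RecoveryInfo()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        words = line.split()
        keyword = words[0].upper()
        if (keyword == "SET" and len(words) >= 4
                and words[1].upper() == "TRANSFER"):
            what = words[2].upper()
            value = line.split(None, 3)[3]  # keeps blanks inside filenames
            if what == "ID":
                info.transfer_id = value
            elif what == "FILE":
                info.filename = value
            elif what == "ACTION":
                info.action = value.upper()
            else:
                raise ValueError(f"unrecognized recovery record: {line!r}")
        elif keyword == "CHECKPOINT":
            # The meaning of each number is left open in this sketch.
            info.checkpoints.append([int(n) for n in words[1:]])
        elif keyword == "STATUS" and len(words) == 2:
            info.status = int(words[1])
        else:
            raise ValueError(f"unrecognized recovery record: {line!r}")
    return info


def should_recover(info: RecoveryInfo) -> bool:
    """Do not recover if the final transfer status is 0."""
    return info.status != 0 and info.action in ("SEND", "RECEIVE")
```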
E. SAMPLE SCRIPTS FOR AUTOMATIC RECOVERY
1. Write and refine sample scripts for automated connection establishment,
file transfer, detection of recoverable failures, connection
reestablishment, and file transfer recovery.
2. Test with single- and multiple-file transfers.
3. Test on direct serial, dialed, and network connections.
4. Test for recovery by caller and callee.
5. Test recovery when receiver's recovery file is one checkpoint behind
the destination file.
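The control flow that such scripts need to express is sketched below. The
actual scripts would be written in each Kermit program's own script language;
Python is used here only to show the logic. The four callables, the retry
limit, and the convention that a return code of 0 means success are
assumptions made for the example.

```python
import time


def transfer_with_recovery(connect, transfer, recover, is_recoverable,
                           max_attempts=5, delay_seconds=0.0):
    """Run a transfer, reconnecting and recovering after recoverable failures.

    connect()             establishes (or reestablishes) the connection
    transfer()            starts the transfer; returns 0 on success,
                          otherwise an error code
    recover()             resumes from the last checkpoint; same convention
    is_recoverable(code)  says whether a failure is worth another attempt
    All four are hypothetical callables supplied by the caller.
    """
    connect()
    code = transfer()
    attempts = 1
    while code != 0 and is_recoverable(code) and attempts < max_attempts:
        time.sleep(delay_seconds)   # give the line or network time to settle
        connect()
        code = recover()
        attempts += 1
    return code
```

The loop stops on success, on a nonrecoverable failure, or when the attempt
limit is reached, so that a permanently failed connection cannot cause
endless redialing.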
.c2.DOCUMENTATION
Throughout the development / implementation stage, technical documentation
will be updated and refined, and trial copies of user documentation of each
user-visible feature will be produced as that feature is added.
The technical documentation will include extensions to the Kermit protocol
specification [1] as well as additions to the relevant program logic manuals
(PLMs). Eventually the extensions to the Kermit protocol will be published in
a new edition of [1]. PLMs are generally maintained as online
English-language plain-text files.
Trial user documentation will be compiled from the documentation written
during the development period, in the form of online English-language plain
text, to be issued in the update or release notes with beta-test or newly
released versions of the updated Kermit software programs. Eventually, the
user interface to the checkpoint/restart features will be described in new
editions of the relevant published or online user manuals.
.c2.TESTING
Testing is performed by the developers at each step of the development
process. When the initial implementation is complete, further testing will
be done by project members who were not personally involved in the coding.
Once the internal tests are complete, the updated software and documentation
will be turned over to the contractor for evaluation and testing, and, upon
the contractor's go-ahead, will also be released to the general Internet user
community for Beta testing. We feel that the wider Internet community will
give the updated software a much more thorough workout on a much wider variety
of platforms and communication methods than the developers or contractors
could ever hope to accomplish by themselves.
After the public Beta test period is complete, the resulting software and
documentation will be turned over to the contractor for acceptance testing.
A detailed formal test plan will accompany the software. The contractor may,
of course, devise its own tests. Once the contractor has accepted the
software, all test notices will be removed from its banners and documentation,
and it will be released.
.c2.TIMELINE
Requirements Definition and Initial Design: One month. Done.
The development and testing phase is expected to take four months. Thus, if
work commences September 1, it will be complete by December 31.
Development (keyed to section labels used above), 3 calendar months, 3-5
people working in parallel on each phase:
A: One calendar month
B: One half calendar month
C: One half calendar month
D: One half calendar month
E: One half calendar month
Testing: One month
.c.REFERENCES
[1] da Cruz, Frank, "Kermit, A File Transfer Protocol", Digital Press (1987).
[2] Gianone, Christine, "Using MS-DOS Kermit", 2nd Ed., Digital Press (1992).
[3] da Cruz, F., and C. Gianone, "Using C-Kermit", Digital Press (1993).
[4] Chandler, John, "IBM System/370 Kermit User's Guide", unpublished (1993).
(End of Document)