-
-
-
-
-
-
- Network Working Group                                          R. Braden
- Request for Comments: 1644                                           ISI
- Category: Experimental                                         July 1994
-
- T/TCP -- TCP Extensions for Transactions
- Functional Specification
-
- Status of this Memo
-
- This memo describes an Experimental Protocol for the Internet
- community, and requests discussion and suggestions for improvements.
- It does not specify an Internet Standard. Distribution is unlimited.
-
- Abstract
-
- This memo specifies T/TCP, an experimental TCP extension for
- efficient transaction-oriented (request/response) service. This
- backwards-compatible extension could fill the gap between the current
- connection-oriented TCP and the datagram-based UDP.
-
- This work was supported in part by the National Science Foundation
- under Grant Number NCR-8922231.
-
- Table of Contents
-
- 1. INTRODUCTION .................................................. 2
- 2. OVERVIEW ..................................................... 3
- 2.1 Bypassing the Three-Way Handshake ........................ 4
- 2.2 Transaction Sequences .................................... 6
- 2.3 Protocol Correctness ..................................... 8
- 2.4 Truncating TIME-WAIT State ............................... 12
- 2.5 Transition to Standard TCP Operation ..................... 14
- 3. FUNCTIONAL SPECIFICATION ..................................... 17
- 3.1 Data Structures .......................................... 17
- 3.2 New TCP Options .......................................... 17
- 3.3 Connection States ........................................ 19
- 3.4 T/TCP Processing Rules ................................... 25
- 3.5 User Interface ........................................... 28
- 4. IMPLEMENTATION ISSUES ........................................ 30
- 4.1 RFC-1323 Extensions ...................................... 30
- 4.2 Minimal Packet Sequence .................................. 31
- 4.3 RTT Measurement .......................................... 31
- 4.4 Cache Implementation ..................................... 32
- 4.5 CPU Performance .......................................... 32
- 4.6 Pre-SYN Queue ............................................ 33
- 6. ACKNOWLEDGMENTS .............................................. 34
- 7. REFERENCES ................................................... 34
- APPENDIX A. ALGORITHM SUMMARY ................................... 35
-
-
-
- Braden [Page 1]
-
- RFC 1644 Transaction/TCP July 1994
-
-
- Security Considerations .......................................... 38
- Author's Address ................................................. 38
-
- 1. INTRODUCTION
-
- TCP was designed around the virtual circuit model, to support
- streaming of data. Another common mode of communication is a
- client-server interaction, a request message followed by a response
- message. The request/response paradigm is used by application-layer
- protocols that implement transaction processing or remote procedure
- calls, as well as by a number of network control and management
- protocols (e.g., DNS and SNMP). Currently, many Internet user
- programs that need request/response communication use UDP, and when
- they require transport protocol functions such as reliable delivery
- they must effectively build their own private transport protocol at
- the application layer.
-
- Request/response, or "transaction-oriented", communication has the
- following features:
-
- (a) The fundamental interaction is a request followed by a response.
-
- (b) An explicit open or close phase may impose excessive overhead.
-
- (c) At-most-once semantics is required; that is, a transaction must
- not be "replayed" as the result of a duplicate request packet.
-
- (d) The minimum transaction latency for a client should be RTT +
- SPT, where RTT is the round-trip time and SPT is the server
- processing time.
-
- (e) In favorable circumstances, a reliable request/response
- handshake should be achievable with exactly one packet in each
- direction.
-
- This memo concerns T/TCP, a backwards-compatible extension of TCP to
- provide efficient transaction-oriented service in addition to
- virtual-circuit service. T/TCP provides all the features listed
- above, except for (e); the minimum exchange for T/TCP is three
- segments.
-
- In this memo, we use the term "transaction" for an elementary
- request/response packet sequence. This is not intended to imply any
- of the semantics often associated with application-layer transaction
- processing, like 3-phase commits. It is expected that T/TCP can be
- used as the transport layer underlying such an application-layer
- service, but the semantics of T/TCP is limited to transport-layer
- services such as reliable, ordered delivery and at-most-once
- operation.
-
- An earlier memo [RFC-1379] presented the concepts involved in T/TCP.
- However, the real-world usefulness of these ideas depends upon
- practical issues like implementation complexity and performance. To
- help explore these issues, this memo presents a functional
- specification for a particular embodiment of the ideas presented in
- RFC-1379. However, the specific algorithms in this memo represent a
- later evolution than RFC-1379. In particular, Appendix A in RFC-1379
- explained the difficulties in truncating TIME-WAIT state. However,
- experience with an implementation of the RFC-1379 algorithms in a
- workstation later showed that accumulation of TCB's in TIME-WAIT
- state is an intolerable problem; this necessity led to a simple
- solution for truncating TIME-WAIT state, described in this memo.
-
- Section 2 introduces the T/TCP extensions, and section 3 contains the
- complete specification of T/TCP. Section 4 discusses some
- implementation issues, and Appendix A contains an algorithmic
- summary. This document assumes familiarity with the standard TCP
- specification [STD-007].
-
- 2. OVERVIEW
-
- The TCP protocol is highly symmetric between the two ends of a
- connection. This symmetry is not lost in T/TCP; for example, T/TCP
- supports TCP's symmetric simultaneous open from both sides (Section
- 2.5 below). However, transaction sequences use T/TCP in a highly
- unsymmetrical manner. It is convenient to use the terms "client
- host" and "server host" for the host that initiates a connection and
- the host that responds, respectively.
-
- The goal of T/TCP is to allow each transaction, i.e., each
- request/response sequence, to be efficiently performed as a single
- incarnation of a TCP connection. Standard TCP imposes two
- performance problems for transaction-oriented communication. First,
- a TCP connection is opened with a "3-way handshake", which must
- complete successfully before data can be transferred. The 3-way
- handshake adds an extra RTT (round trip time) to the latency of a
- transaction.
-
- The second performance problem is that closing a TCP connection
- leaves one or both ends in TIME-WAIT state for a time 2*MSL, where
- MSL is the maximum segment lifetime (defined to be 120 seconds).
- TIME-WAIT state severely limits the rate of successive transactions
- between the same (host,port) pair, since a new incarnation of the
- connection cannot be opened until the TIME-WAIT delay expires. RFC-
- 1379 explained why the alternative approach, using a different user
- port for each transaction between a pair of hosts, also limits the
- transaction rate: (1) the 16-bit port space limits the rate to
- 2**16/240 transactions per second, and (2) more practically, an
- excessive amount of kernel space would be occupied by TCP state
- blocks in TIME-WAIT state [RFC-1379].
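-
- As a rough check of the port-space bound in (1), the arithmetic can be
- worked through directly. The sketch below is illustrative only; the
- function name port_limited_rate and its parameters are hypothetical and
- are not defined by this memo.
-
```python
def port_limited_rate(msl_seconds: float = 120.0, port_bits: int = 16) -> float:
    """Upper bound on transactions per second when every transaction uses a
    fresh port that then remains in TIME-WAIT state for 2*MSL."""
    time_wait = 2 * msl_seconds            # 240 s for MSL = 120 s
    return (2 ** port_bits) / time_wait    # 2**16 / 240


print(f"{port_limited_rate():.1f} transactions per second")  # about 273
```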
-
- T/TCP solves these two performance problems for transactions, by (1)
- bypassing the 3-way handshake (3WHS) and (2) shortening the delay in
- TIME-WAIT state.
-
- 2.1 Bypassing the Three-Way Handshake
-
- T/TCP introduces a 32-bit incarnation number, called a "connection
- count" (CC), that is carried in a TCP option in each segment. A
- distinct CC value is assigned to each direction of an open
- connection. A T/TCP implementation assigns monotonically
- increasing CC values to successive connections that it opens
- actively or passively.
-
- T/TCP uses the monotonic property of CC values in initial <SYN>
- segments to bypass the 3WHS, using a mechanism that we call TCP
- Accelerated Open (TAO). Under the TAO mechanism, a host caches a
- small amount of state per remote host. Specifically, a T/TCP host
- that is acting as a server keeps a cache containing the last valid
- CC value that it has received from each different client host. If
- an initial <SYN> segment (i.e., a segment containing a SYN bit but
- no ACK bit) from a particular client host carries a CC value
- larger than the corresponding cached value, the monotonic property
- of CC's ensures that the <SYN> segment must be new and can
- therefore be accepted immediately. Otherwise, the server host
- does not know whether the <SYN> segment is an old duplicate or was
- simply delivered out of order; it therefore executes a normal 3WHS
- to validate the <SYN>. Thus, the TAO mechanism provides an
- optimization, with the normal TCP mechanism as a fallback.
-
- The CC value carried in non-<SYN> segments is used to protect
- against old duplicate segments from earlier incarnations of the
- same connection (we call such segments 'antique duplicates' for
- short). In the case of short connections (e.g., transactions),
- these CC values allow TIME-WAIT state delay to be safely truncated,
- as discussed in Section 2.3.
-
- T/TCP defines three new TCP options, each of which carries one
- 32-bit CC value. These options are named CC, CC.NEW, and CC.ECHO.
- The CC option is normally used; CC.NEW and CC.ECHO have special
- functions, as follows.
-
- (a) CC.NEW
-
- Correctness of the TAO mechanism requires that clients
- generate monotonically increasing CC values for successive
- connection initiations. These values can be generated using
- a simple global counter. There are certain circumstances
- (discussed below in Section 2.3) when the client knows that
- monotonicity may be violated; in this case, it sends a CC.NEW
- rather than a CC option in the initial <SYN> segment.
- Receiving a CC.NEW causes the server to invalidate its cache
- entry and do a 3WHS.
-
- (b) CC.ECHO
-
- When a server host sends a <SYN,ACK> segment, it echoes the
- connection count from the initial <SYN> in a CC.ECHO option,
- which is used by the client host to validate the <SYN,ACK>
- segment.
-
- Figure 1 illustrates the TAO mechanism bypassing a 3WHS. The
- cached CC values, denoted by cache.CC[host], are shown on each
- side. The server host compares the new CC value x in segment #1
- against x0, its cached value for client host A; this comparison is
- called the "TAO test". Since x > x0, the <SYN> must be new and
- can be accepted immediately; the data in the segment can therefore
- be delivered to the user process B, and the cached value is
- updated. If the TAO test failed (x <= x0), the server host would
- do a normal three-way handshake to validate the <SYN> segment, but
- the cache would not be updated.
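-
- The TAO test described above can be sketched as follows. This is a
- minimal illustration under stated assumptions, not the specification:
- the names tao_test and cc_gt and the dictionary standing in for
- cache.CC[] are hypothetical, a missing cache entry is treated as a
- failed test, and the comparison uses the modular arithmetic that
- Section 2.3 calls for.
-
```python
def cc_gt(a: int, b: int) -> bool:
    """Modular 'a > b' for 32-bit CC values (assumed helper)."""
    return 0 < ((a - b) % 2**32) < 2**31


def tao_test(cache_cc: dict, client: str, seg_cc: int) -> bool:
    """Apply the TAO test to an initial <SYN> carrying CC = seg_cc.

    Returns True if the <SYN> can be accepted immediately; False means the
    server falls back to a normal three-way handshake.
    """
    cached = cache_cc.get(client)          # None = no cached value
    if cached is not None and cc_gt(seg_cc, cached):
        cache_cc[client] = seg_cc          # TAO test OK: update the cache
        return True
    return False                           # test failed: cache is not updated
```
-
- For example, with cache.CC[A] = x0 = 10, a <SYN> carrying CC = 11
- passes and the cache advances to 11; a second <SYN> carrying CC = 11
- then fails and leaves the cache unchanged.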
-
-          TCP A  (Client)                         TCP B (Server)
-          _______________                         ______________
-
-                                                    cache.CC[A]
-                                                        V
-
-                                                     [ x0 ]
-
-   #1         --> <SYN, data1, CC=x> -->     (TAO test OK (x > x0) =>
-                                              data1->user_B and
-                                              cache.CC[A]= x; )
-
-                                                     [ x  ]
-   #2         <-- <SYN, ACK(data1), data2, CC=y, CC.ECHO=x> <--
-      (data2->user_A;)
-
-
-          Figure 1. TAO: Three-Way Handshake is Bypassed
-
-
- The CC value x is echoed in a CC.ECHO option in the <SYN,ACK>
- segment (#2); the client side uses this option to validate the
- segment. Since segment #2 is valid, its data2 is delivered to the
- client user process. Segment #2 also carries B's CC value; this
- is used by A to validate non-SYN segments from B, as explained in
- Section 2.3.
-
- Implementing the T/TCP extensions expands the connection control
- block (TCB) to include the two CC values for the connection; call
- these variables TCB.CCsend and TCB.CCrecv (or CCsend, CCrecv for
- short). For example, the sequence shown in Figure 1 sets
- TCB.CCsend = x and TCB.CCrecv = y at host A, and vice versa at
- host B. Any segment that is received with a CC option containing
- a value SEG.CC different from TCB.CCrecv will be rejected as an
- antique duplicate.
-
- 2.2 Transaction Sequences
-
- T/TCP applies the TAO mechanism described in the previous section
- to perform a transaction sequence. Figure 2 shows a minimal
- transaction, when the request and response data can each fit into
- a single segment. This requires three segments and completes in
- one round-trip time (RTT). If the TAO test had failed on segment
- #1, B would have queued data1 and the FIN for later processing,
- and then it would have returned a <SYN,ACK> segment to A, to
- perform a normal 3WHS.
-
-
-          TCP A  (Client)                         TCP B (Server)
-          _______________                         ______________
-
-          CLOSED                                        LISTEN
-
-   #1     SYN-SENT*    --> <SYN,data1,FIN,CC=x> -->     CLOSE-WAIT*
-                                                    (TAO test OK)
-                                                    (data1->user_B)
-
-                                                  <--   LAST-ACK*
-   #2     TIME-WAIT  <-- <SYN,ACK(FIN),data2,FIN,CC=y,CC.ECHO=x>
-        (data2->user_A)
-
-
-   #3     TIME-WAIT       --> <ACK(FIN),CC=x> -->       CLOSED
-
-          (timeout)
-           CLOSED
-
-          Figure 2: Minimal T/TCP Transaction Sequence
-
-
- T/TCP extensions require additional connection states, e.g., the
- SYN-SENT*, CLOSE-WAIT*, and LAST-ACK* states shown in Figure 2.
- Section 3.3 describes these new connection states.
-
- To obtain the minimal 3-segment sequence shown in Figure 2, the
- server host must delay acknowledging segment #1 so the response
- may be piggy-backed on segment #2. If the application takes
- longer than this delay to compute the response, the normal TCP
- retransmission mechanism in TCP B will send an acknowledgment to
- forestall a retransmission from TCP A. Figure 3 shows an example
- of a slow server application. Although the sequence in Figure 3
- does contain a 3-way handshake, the TAO mechanism has allowed the
- request data to be accepted immediately, so that the client still
- sees the minimum latency.
-
-
-          TCP A  (Client)                         TCP B (Server)
-          _______________                         ______________
-
-          CLOSED                                        LISTEN
-
-   #1     SYN-SENT*    --> <SYN,data1,FIN,CC=x> -->     CLOSE-WAIT*
-                                                    (TAO test OK =>
-                                                     data1->user_B)
-
-                                                        (timeout)
-   #2     FIN-WAIT-1  <-- <SYN,ACK(FIN),CC=y,CC.ECHO=x> <--  CLOSE-WAIT*
-
-
-   #3     FIN-WAIT-1   --> <ACK(SYN),FIN,CC=x> -->      CLOSE-WAIT
-
-
-   #4     TIME-WAIT   <-- <ACK(FIN),data2,FIN,CC=y> <-- LAST-ACK
-        (data2->user_A)
-
-   #5     TIME-WAIT    --> <ACK(FIN),CC=x> -->          CLOSED
-
-          (timeout)
-           CLOSED
-
-          Figure 3: Acknowledgment Timeout in Server
-
-
- 2.3 Protocol Correctness
-
- This section fills in more details of the TAO mechanism and
- provides an informal sketch of why the T/TCP protocol works.
-
- CC values are 32-bit integers. The TAO test requires the same
- kind of modular arithmetic that is used to compare two TCP
- sequence numbers. We assume that the boundary between y < z and z
- < y for two CC values y and z occurs when they differ by 2**31,
- i.e., by half the total CC space.
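-
- One way to realize this comparison is sketched below, in the same
- style as TCP sequence-number comparison. The function names are
- hypothetical, and the treatment of two values that differ by exactly
- 2**31 (neither is considered greater) is an assumption of this sketch.
-
```python
CC_MOD = 2**32     # CC values are 32-bit integers
CC_HALF = 2**31    # half the total CC space


def cc_gt(a: int, b: int) -> bool:
    """Return True if CC value a is 'greater than' b in modular arithmetic."""
    diff = (a - b) % CC_MOD
    return 0 < diff < CC_HALF


def cc_lt(a: int, b: int) -> bool:
    """Return True if CC value a is 'less than' b in modular arithmetic."""
    return cc_gt(b, a)
```
-
- Under this sketch a value just past the wrap point still compares as
- newer: cc_gt(1, 0xffffffff) is true, while cc_gt(0xffffffff, 1) is
- false.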
-
- The essential requirement for correctness of T/TCP is this:
-
- [R1] CC values must advance at a rate slower than 2**31
-      counts per 2*MSL
-
- where MSL denotes the maximum segment lifetime in the Internet.
- The requirement [R1] is easily met with a 32-bit CC. For example,
- it will allow 10**6 transactions per second with the very liberal
- MSL of 1000 seconds [RFC-1379]. This is well in excess of the
- transaction rates achievable with current operating systems and
- network latency.
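-
- The figure quoted above follows directly from [R1]. The short sketch
- below works the arithmetic; the function name max_cc_rate is
- hypothetical and is used only for illustration.
-
```python
def max_cc_rate(msl_seconds: float) -> float:
    """Largest CC advance rate (counts per second) permitted by [R1]:
    fewer than 2**31 counts in any interval of 2*MSL."""
    return 2**31 / (2 * msl_seconds)


# With the very liberal MSL of 1000 seconds the bound is just over
# one million CC values (transactions) per second.
print(round(max_cc_rate(1000)))   # 1073742
```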
-
- Assume for the present that successive connections from client A
- to server B contain only monotonically increasing CC values. That
- is, if x(i) and x(i+1) are CC values carried in two successive
- initial <SYN> segments from the same host, then x(i+1) > x(i).
- Assuming the requirement [R1], the CC space cannot wrap within the
- range of segments that can be outstanding at one time. Therefore,
- those successive <SYN> segments from a given host that have not
- exceeded their MSL must contain an ordered set of CC values:
-
- x(1) < x(2) < x(3) ... < x(n),
-
- where the modular comparisons have been replaced by simple
- arithmetic comparisons. Here x(n) is the most recent acceptable
- <SYN>, which is cached by the server. If the server host receives
- a <SYN> segment containing a CC option with value y where y >
- x(n), that <SYN> must be newer; an antique duplicate SYN with CC
- value greater than x(n) must have exceeded its MSL and vanished.
- Hence, monotonic CC values and the TAO test prevent erroneous
- replay of antique <SYN>s.
-
- There are two possible reasons for a client to generate non-
- monotonic CC values: (a) the client may have crashed and
- restarted, causing the generated CC values to jump backwards; or
- (b) the generated CC values may have wrapped around the finite
- space. Wraparound may occur because CC generation is global to
- all connections. Suppose that host A sends a transaction to B,
- then sends more than 2**31 transactions to other hosts, and
- finally sends another transaction to B. From B's viewpoint, CC
- will have jumped backward relative to its cached value.
-
- In either of these two cases, the server may see the CC value jump
- backwards only after an interval of at least MSL since the last
- <SYN> segment from the same client host. In case (a), client host
- restart, this is because T/TCP retains TCP's explicit "Quiet Time"
- of an MSL interval [STD-007]. In case (b), wraparound, [R1]
- ensures that a time of at least MSL must have passed before the CC
- space wraps around. Hence, there is no possibility that a TAO
- test will succeed erroneously due to either cause of non-
- monotonicity; i.e., there is no chance of replays due to TAO.
-
- However, although CC values jumping backwards will not cause an
- error, it may cause a performance degradation due to unnecessary
- 3WHS's. This results from the generated CC values jumping
- backwards through approximately half their range, so that all
- succeeding TAO tests fail until the generated CC values catch up
- to the cached value. To avoid this degradation, a client host
- sends a CC.NEW option instead of a CC option in the case of either
- system restart or CC wraparound. Receiving CC.NEW forces a 3WHS,
- but when this 3WHS completes successfully the server cache is
- updated to the new CC value. To detect CC wraparound, the client
- must cache the last CC value it sent to each server. It therefore
- maintains cache.CCsent[B] for each server B. If this cached value
- is undefined or if it is larger than the next CC value generated
- at the client, then the client sends a CC.NEW instead of a CC
- option in the next SYN segment.
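-
- The client-side choice between CC and CC.NEW might be sketched as
- follows. The names choose_syn_option and cc_gt and the dictionary
- standing in for cache.CCsent[] are hypothetical. The sketch assumes
- the comparison is modular, as for the TAO test, and sends CC.NEW
- whenever the new value is not greater than the cached one.
-
```python
def cc_gt(a: int, b: int) -> bool:
    """Modular 'a > b' for 32-bit CC values (assumed helper)."""
    return 0 < ((a - b) % 2**32) < 2**31


def choose_syn_option(cache_ccsent: dict, server: str, cc: int) -> str:
    """Pick the option for an initial <SYN> that will carry CC value cc.

    The cache is deliberately not updated here; in the text above,
    cache.CCsent[] is set only once the <SYN> has been acknowledged.
    """
    last_sent = cache_ccsent.get(server)   # None = undefined (e.g. after restart)
    if last_sent is None or not cc_gt(cc, last_sent):
        return "CC.NEW"                    # monotonicity may be violated
    return "CC"
```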
-
- This is illustrated in Figure 4, which shows the scenario for the
- first transaction from A to B after the client host A has crashed
- and recovered. A similar sequence occurs if x is not greater than
- cache.CCsent[B], i.e., if there is a wraparound of the generated
- CC values. Because segment #1 contains a CC.NEW option, the
- server host invalidates the cache entry and does a 3WHS; however,
- it still sets B's TCB.CCrecv for this connection to x. TCP B uses
- this CCrecv value to validate the <ACK> segment (#3) that
- completes the 3WHS. Receipt of this segment updates cache.CC[A],
- since the cache entry was previously undefined. (If a 3WHS always
- updated the cache, then out-of-order SYN segments could cause the
- cached value to jump backwards, possibly allowing replays).
- Finally, the CC.ECHO option in the <SYN,ACK> segment #2 defines
- A's cache.CCsent entry.
-
- This algorithm delays updating cache.CCsent[] until the <SYN> has
- been ACK'd. This allows the undefined cache.CCsent value to be used
- as a "first-time switch" to reliably resynchronize the cached value
- at the server after a crash or wraparound.
-
- When we use the term "cache", we imply that the value can be
- discarded at any time without introducing erroneous behavior,
- although discarding it may degrade performance.
-
- (a) If a server host receives an initial <SYN> from client A but
- has no cached value cache.CC[A], the server simply forces a
- 3WHS to validate the <SYN> segment.
-
- (b) If a client host has no cached value cache.CCsent[B] when it
- needs to send an initial <SYN> segment, the client simply
- sends a CC.NEW option in the segment. This forces a 3WHS at
- the server.
-
-          TCP A  (Client)                         TCP B (Server)
-          _______________                         ______________
-
-          cache.CCsent[B]                           cache.CC[A]
-                V                                        V
-
-          (Crash and restart)
-             [ ?? ]                                   [ x0 ]
-
-   #1         --> <SYN, data1,CC.NEW=x> -->      (invalidate cache;
-                                                  queue data1;
-                                                  3-way handshake)
-
-             [ ?? ]                                   [ ?? ]
-   #2         <-- <SYN, ACK(data1),CC=y,CC.ECHO=x> <--
-      (cache.CCsent[B]= x;)
-
-             [ x  ]                                   [ ?? ]
-
-   #3         --> <ACK(SYN),CC=x> -->            data1->user_B;
-                                                  cache.CC[A]= x;
-
-             [ x  ]                                   [ x  ]
-
-          Figure 4. Client Host Restarting
-
-
- So far, we have considered only correctness of the TAO mechanism
- for bypassing the 3WHS. We must also protect a connection against
- antique duplicate non-SYN segments. In standard TCP, such
- protection is one of the functions of the TIME-WAIT state delay.
- (The other function is the TCP full-duplex close semantics, which
- we need to preserve; that is discussed below in Section 2.5). In
- order to achieve a high rate of transaction processing, it must be
- possible to truncate this TIME-WAIT state delay without exposure
- to antique duplicate segments [RFC-1379].
-
- For short connections (e.g., transactions), the CC values assigned
- to each direction of the connection can be used to protect against
- antique duplicate non-SYN segments. Here we define "short" as a
- duration less than MSL. Suppose that there is a connection that
- uses the CC values TCB.CCsend = x and TCB.CCrecv = y. By the
- requirement [R1], neither x nor y can be reused for a new
- connection from the same remote host for a time at least 2*MSL.
- If the connection has been in existence for a time less than MSL,
- then its CC values will not be reused for a period that exceeds
- MSL, and therefore all antique duplicates with that CC value must
- vanish before it is reused. Thus, for "short" connections we can
- guard against antique non-SYN segments by simply checking the CC
- value in the segment against TCB.CCrecv. Note that this check
- does not use the monotonic property of the CC values, only that
- they not cycle in less than 2*MSL. Again, the quiet time at
- system restart protects against errors due to crash with loss of
- state.
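-
- The check on non-SYN segments is a plain equality test, as the
- following sketch illustrates. The Tcb class and the function name
- accept_non_syn are hypothetical; only the two CC fields of the TCB
- are modeled.
-
```python
from dataclasses import dataclass


@dataclass
class Tcb:
    cc_send: int   # TCB.CCsend: CC value this end places in outgoing segments
    cc_recv: int   # TCB.CCrecv: CC value expected in incoming segments


def accept_non_syn(tcb: Tcb, seg_cc: int) -> bool:
    """Accept a non-SYN segment only if its CC option matches TCB.CCrecv.

    No ordering of CC values is needed here, only that a value is not
    reused within 2*MSL.
    """
    return seg_cc == tcb.cc_recv
```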
-
- If the connection duration exceeds MSL, safety from old duplicates
- still requires a TIME-WAIT delay of 2*MSL. Thus, truncation of
- TIME-WAIT state is only possible for short connections. (This
- problem has also been noticed by Shankar and Lee [ShankarLee93]).
- This difference in behavior for long and for short connections
- does create a slightly complex service model for applications
- using T/TCP. An application has two different strategies for
- multiple connections. For "short" connections, it should use a
- fixed port pair and use the T/TCP mechanism to get rapid and
- efficient transaction processing. For connections whose durations
- are of the order of MSL or longer, it should use a different user
- port for each successive connection, as is the current practice
- with unmodified TCP. The latter strategy will cause excessive
- overhead (due to TCB's in TIME-WAIT state) if it is applied to
- high-frequency short connections. If an application makes the
- wrong choice, its attempt to open a new connection may fail with a
- "busy" error. If connection durations may range between long and
- short, an application may have to be able to switch strategies
- when one fails.
-
- 2.4 Truncating TIME-WAIT State
-
- Truncation of TIME-WAIT state is necessary to achieve high
- transaction rates. As Figure 2 illustrates, a standard
- transaction leaves the client end of the connection in TIME-WAIT
- state. This section explains the protocol implications of
- truncating TIME-WAIT state, when it is allowed (i.e., when the
- connection has been in existence for less than MSL). In this
- case, the client host should be able to interrupt TIME-WAIT state
- to initiate a new incarnation of the same connection (i.e., using
- the same host and ports). This will send an initial <SYN>
- segment.
-
- It is possible for the new <SYN> to arrive at the server before
- the retransmission state from the previous incarnation is gone, as
- shown in Figure 5. Here the final <ACK> (segment #3) from the
- previous incarnation is lost, leaving retransmission state at B.
- However, the client received segment #2 and thinks the transaction
- completed successfully, so it can initiate a new transaction by
- sending <SYN> segment #4. When this <SYN> arrives at the server
- host, it must implicitly acknowledge segment #2, signalling
- success to the server application, deleting the old TCB, and
- creating a new TCB, as shown in Figure 5. Still assuming that the
- new <SYN> is known to be valid, the server host marks the new
- connection half-synchronized and delivers data3 to the server
- application. (The details of how this is accomplished are
- presented in Section 3.3.)
-
- The earlier discussion of the TAO mechanism assumed that the
- previous incarnation was closed before a new <SYN> arrived at the
- server. However, TAO cannot be used to validate the <SYN> if
- there is still state from the previous incarnation, as shown in
- Figure 5; in this case, it would be exceedingly awkward to perform
- a 3WHS if the TAO test should fail. Fortunately, a modified
- version of the TAO test can still be performed, using the state in
- the earlier TCB rather than the cached state.
-
- (A) If the <SYN> segment contains a CC or CC.NEW option, the
- value SEG.CC from this option is compared with TCB.CCrecv,
- the CC value in the still-existing state block of the
- previous incarnation. If SEG.CC > TCB.CCrecv, the new <SYN>
- segment must be valid.
-
- (B) Otherwise, the <SYN> is an old duplicate and is simply
- discarded.
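-
- Rules (A) and (B) can be sketched as a single decision function.
- The names below are hypothetical, the comparison is assumed to be
- modular as in the TAO test, and a <SYN> that carries neither a CC
- nor a CC.NEW option is assumed to fall under rule (B).
-
```python
from typing import Optional


def cc_gt(a: int, b: int) -> bool:
    """Modular 'a > b' for 32-bit CC values (assumed helper)."""
    return 0 < ((a - b) % 2**32) < 2**31


def syn_with_old_state(tcb_ccrecv: int, seg_cc: Optional[int]) -> str:
    """Handle a new <SYN> that arrives while the TCB of the previous
    incarnation still exists at the server."""
    if seg_cc is not None and cc_gt(seg_cc, tcb_ccrecv):
        return "accept"    # rule (A): the new <SYN> must be valid
    return "discard"       # rule (B): treat as an old duplicate
```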
-
- Truncating TIME-WAIT state may be looked upon as composing an
- extended state machine that joins the state machines of the two
- incarnations, old and new. It may be described by introducing new
- intermediate states (which we call I-states), with transitions
- that join the two diagrams and share some state from each. I-
- states are detailed in Section 3.3.
-
- Notice also segment #2' in Figure 5. TCP's mechanism to recover
- from half-open connections (see Figure 10 of [STD-007]) would cause
- TCP A to send a RST when segment #2' arrives, which would
- incorrectly make B think that the previous transaction did not
- complete successfully. The half-open recovery mechanism must be
- defeated in this case, by A ignoring segment #2'.
-
-
-          TCP A  (Client)                         TCP B (Server)
-          _______________                         ______________
-
-          CLOSED                                        LISTEN
-
-   #1                  --> <...,FIN,CC=x> -->           LAST-ACK*
-
-   #2       <-- <...ACK(FIN),data2,FIN,CC=y,CC.ECHO=x> <---  LAST-ACK*
-          TIME-WAIT
-        (data2->user_A)
-
-
-   #3     TIME-WAIT    --> <ACK(FIN),CC=x> -->     X (DROP)
-
-          (New Active Open)                       (New Passive Open)
-
-   #4     SYN-SENT*    --> <SYN, data3,CC=z> ...
-
-                                                        LISTEN-LA
-   #2'    (discard)   <-- <...ACK(FIN),data2,FIN,CC=y> <--- (retransmit)
-
-   #4     SYN-SENT*    ... <SYN,data3,CC=z> -->         ESTABLISHED*
-                                                  SYN OK (see text) =>
-                                                    {Ack seg #2;
-                                                     Delete old TCB;
-                                                     Create new TCB;
-                                                     data3 -> user_B;
-                                                     cache.CC[A]= z;}
-
-          Figure 5: Truncating TIME-WAIT State: SYN as Implicit ACK
-
-
- 2.5 Transition to Standard TCP Operation
-
- T/TCP includes all normal TCP semantics, and it will continue to
- operate exactly like TCP when the particular assumptions for
- transactions do not hold. There is no limit on the size of an
- individual transaction, and behavior of T/TCP should merge
- seamlessly from pure transaction operation as shown in Figure 2,
- to pure streaming mode for sending large files. All the sequences
- shown in [STD-007] are still valid, and the inherent symmetry of
- TCP is preserved.
-
- Figure 6 shows a possible sequence when the request and response
- messages each require two segments. Segment #2 is a non-SYN
- segment that contains a TCP option. To avoid compatibility
- problems with existing TCP implementations, the client side should
- send segment #2 only if cache.CCsent[B] is defined, i.e., only if
- host A knows that host B plays the new game.
-
-
-
-          TCP A  (Client)                         TCP B (Server)
-          _______________                         ______________
-
-          CLOSED                                        LISTEN
-
-
-   #1     SYN-SENT*    --> <SYN,data1,CC=x> -->         ESTABLISHED*
-                                                    (TAO test OK =>
-                                                     data1-> user)
-
-   #2     SYN-SENT*    --> <data2,FIN,CC=x> -->         CLOSE-WAIT*
-                                                    (data2-> user)
-
-                                                        CLOSE-WAIT*
-   #3     FIN-WAIT-2  <-- <SYN,ACK(FIN),data3,CC=y,CC.ECHO=x> <--
-        (data3->user)
-
-   #4     TIME-WAIT   <-- <ACK(FIN),data4,FIN,CC=y> <-- LAST-ACK*
-        (data4->user)
-
-   #5     TIME-WAIT    --> <ACK(FIN),CC=x> -->          CLOSED
-
-
-          Figure 6. Multi-Packet Request/Response Sequence
-
- Figure 7 shows a more complex example, one possible sequence with
- TAO combined with simultaneous open and close. This may be
- compared with Figure 8 of [STD-007].
-
-
-          TCP A                                         TCP B
-          _______________                         ______________
-
-          CLOSED                                        CLOSED
-
-   #1     SYN-SENT*    --> <SYN,data1,FIN,CC=x> ...
-
-   #2     CLOSING*     <-- <SYN,data2,FIN,CC=y> <--     SYN-SENT*
-        (TAO test OK =>
-         data2->user_A)
-
-   #3     CLOSING*     --> <FIN,ACK(FIN),CC=x,CC.ECHO=y> ...
-
-   #1'                 ... <SYN,data1,FIN,CC=x> -->     CLOSING*
-                                                    (TAO test OK =>
-                                                     data1->user_B)
-
-   #4     TIME-WAIT   <-- <FIN,ACK(FIN),CC=y,CC.ECHO=x> <--  CLOSING*
-
-   #5     TIME-WAIT    --> <ACK(FIN),CC=x> ...
-
-   #3'                 ... <FIN,ACK(FIN),CC=x,CC.ECHO=y> -->  TIME-WAIT
-
-   #6     TIME-WAIT    <-- <ACK(FIN),CC=y> <---         TIME-WAIT
-
-   #5'    TIME-WAIT    ... <ACK(FIN),CC=x> -->          TIME-WAIT
-
-          (timeout)                                     (timeout)
-           CLOSED                                        CLOSED
-
-          Figure 7: Simultaneous Open and Close
-
- 3. FUNCTIONAL SPECIFICATION
-
- 3.1 Data Structures
-
- A connection count is an unsigned 32-bit integer, with the value
- zero excluded. Zero is used to denote an undefined value.
-
- A host maintains a global connection count variable CCgen, and
- each connection control block (TCB) contains two new connection
- count variables, TCB.CCsend and TCB.CCrecv. Whenever a TCB is
- created for the active or passive end of a new connection, CCgen
- is incremented by 1 and placed in TCB.CCsend of the TCB; however,
- if the previous CCgen value was 0xffffffff (-1), then the next
- value should be 1. TCB.CCrecv is initialized to zero (undefined).
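-
- The generation rule for CCgen can be sketched as below. The Tcb
- class and the function names next_cc and new_tcb are hypothetical;
- only the behavior stated above (increment by 1, skip zero, CCrecv
- initially undefined) is modeled.
-
```python
from dataclasses import dataclass

CC_MAX = 0xFFFFFFFF    # largest 32-bit connection count
CC_UNDEFINED = 0       # zero denotes an undefined value


@dataclass
class Tcb:
    cc_send: int                  # TCB.CCsend
    cc_recv: int = CC_UNDEFINED   # TCB.CCrecv starts out undefined


def next_cc(ccgen: int) -> int:
    """Advance the global counter, wrapping from 0xffffffff to 1 so that
    zero is never generated."""
    return 1 if ccgen == CC_MAX else ccgen + 1


def new_tcb(ccgen: int):
    """Create a TCB for a new connection; returns (updated CCgen, TCB)."""
    ccgen = next_cc(ccgen)
    return ccgen, Tcb(cc_send=ccgen)
```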
-
- T/TCP adds a per-host cache to TCP. An entry in this cache for
- foreign host fh includes two CC values, cache.CC[fh] and
- cache.CCsent[fh]. It may include other values, as discussed in
- Sections 4.3 and 4.4. According to [STD-007], a TCP is not
- permitted to send a segment larger than the default size 536,
- unless it has received a larger value in an MSS (Maximum Segment
- Size) option. This could constrain the client to use the default
- MSS of 536 bytes for every request. To avoid this constraint, a
- T/TCP may cache the MSS option values received from remote hosts,
- and we allow a TCP to use a cached MSS option value for the
- initial SYN segment.
-
- When the client sends an initial <SYN> segment containing data, it
- does not have a send window for the server host. This is not a
- great difficulty; we simply define a default initial window; our
- current suggestion is 4K. Such a non-zero default should be
- conditioned upon the existence of a cached connection count for
- the foreign host, so that data may be included on an initial SYN
- segment only if cache.CC[foreign host] is non-zero.
-
- In TCP, the window is dynamically adjusted to provide congestion
- control/avoidance [Jacobson88]. It is possible that a particular
- path might not be able to absorb an initial burst of 4096 bytes
- without congestive losses. If this turns out to be a problem, it
- should be possible to cache the congestion threshold for the path
- and use this value to determine the maximum size of the initial
- packet burst created by a request.
-
- 3.2 New TCP Options
-
- Three new TCP options are defined: CC, CC.NEW, and CC.ECHO. Each
- carries a connection count SEG.CC. The complete rules for sending
- and processing these options are given in Section 3.4 below.
-
-
- CC Option
-
- Kind: 11
-
- Length: 6
-
- +--------+--------+--------+--------+--------+--------+
- |00001011|00000110| Connection Count: SEG.CC |
- +--------+--------+--------+--------+--------+--------+
- Kind=11 Length=6
-
- This option may be sent in an initial SYN segment, and it may
- be sent in other segments if a CC or CC.NEW option has been
- received for this incarnation of the connection. Its SEG.CC
- value is the TCB.CCsend value from the sender's TCB.
-
- CC.NEW Option
-
- Kind: 12
-
- Length: 6
-
- +--------+--------+--------+--------+--------+--------+
- |00001100|00000110| Connection Count: SEG.CC |
- +--------+--------+--------+--------+--------+--------+
- Kind=12 Length=6
-
- This option may be sent instead of a CC option in an initial
- <SYN> segment (i.e., SYN but not ACK bit), to indicate that the
- SEG.CC value may not be larger than the previous value. Its
- SEG.CC value is the TCB.CCsend value from the sender's TCB.
-
- CC.ECHO Option
-
- Kind: 13
-
- Length: 6
-
- +--------+--------+--------+--------+--------+--------+
- |00001101|00000110| Connection Count: SEG.CC |
- +--------+--------+--------+--------+--------+--------+
- Kind=13 Length=6
-
- This option must be sent (in addition to a CC option) in a
- segment containing both a SYN and an ACK bit, if the initial
- SYN segment contained a CC or CC.NEW option. Its SEG.CC value
- is the SEG.CC value from the initial SYN.
-
-
- A CC.ECHO option should be sent only in a <SYN,ACK> segment and
- should be ignored if it is received in any other segment.
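-
- The three option layouts above differ only in the Kind octet, so
- one encoder and one decoder can serve all of them. The following is
- a sketch only: the function names are hypothetical, and network
- (big-endian) byte order for SEG.CC is assumed from general TCP
- convention rather than stated in this section.

```python
import struct

CC_KIND, CC_NEW_KIND, CC_ECHO_KIND = 11, 12, 13
CC_OPTION_LEN = 6  # kind (1) + length (1) + connection count (4)
_KINDS = (CC_KIND, CC_NEW_KIND, CC_ECHO_KIND)


def encode_cc_option(kind, seg_cc):
    """Build the 6-byte option: kind, length, 32-bit SEG.CC (big-endian assumed)."""
    if kind not in _KINDS:
        raise ValueError("not a T/TCP option kind")
    return struct.pack("!BBI", kind, CC_OPTION_LEN, seg_cc)


def decode_cc_option(data):
    """Return (kind, seg_cc) for a well-formed option, else raise ValueError."""
    if len(data) != CC_OPTION_LEN:
        raise ValueError("CC options are exactly 6 bytes long")
    kind, length, seg_cc = struct.unpack("!BBI", data)
    if kind not in _KINDS or length != CC_OPTION_LEN:
        raise ValueError("malformed T/TCP option")
    return kind, seg_cc
```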
-
- 3.3 Connection States
-
- T/TCP requires new connection states and state transitions.
- Figure 8 shows the resulting finite state machine; see [RFC-1379]
- for a detailed development. If all state names ending in stars
- are removed from Figure 8, the state diagram reduces to the
- standard TCP state machine (see Figure 6 of [STD-007]), with two
- exceptions:
-
- * STD-007 shows a direct transition from SYN-RECEIVED to FIN-
- WAIT-1 state when the user issues a CLOSE call. This
- transition is suspect; a more accurate description of the
- state machine would seem to require the intermediate SYN-
- RECEIVED* state shown in Figure 8.
-
- * In STD-007, a user CLOSE call in SYN-SENT state causes a
- direct transition to CLOSED state. The extended diagram of
- Figure 8 forces the connection to open before it closes,
- since calling CLOSE to terminate the request in SYN-SENT
- state is normal behavior for a transaction client. In the
- case that no data has been sent in SYN-SENT state, it is
- reasonable for a user CLOSE call to immediately enter CLOSED
- state and delete the TCB.
-
- Each of the new states in Figure 8 bears a starred name, created
- by suffixing a star onto a standard TCP state. Each "starred"
- state bears a simple relationship to the corresponding "unstarred"
- state.
-
- o SYN-SENT* and SYN-RECEIVED* differ from the SYN-SENT and
- SYN-RECEIVED state, respectively, in recording the fact that
- a FIN needs to be sent.
-
- o The other starred states indicate that the connection is
- half-synchronized (hence, a SYN bit needs to be sent).
-
-
- ________ g ________
- | |<------------| |
- | CLOSED |------------>| LISTEN |
- |________| h ------|________|
- | / | |
- | / i| j|
- | / | |
- a| a'/ | _V______ ________
- | / j | |ESTAB- | e' | CLOSE- |
- | / -----------|-->| LISHED*|------------>| WAIT*|
- | / / | |________| |________|
- | / / | | | | |
- | / / | | c| d'| c|
- ____V_V_ / _______V | __V_____ | __V_____
- | SYN- | b' | SYN- |c | |ESTAB- | e | | CLOSE- |
- | SENT |------>|RECEIVED|---|->| LISHED|----------|->| WAIT |
- |________| |________| | |________| | |________|
- | | | | | |
- | | | | __V_____ |
- | | | | | LAST- | |
- d'| d'| d'| d| | ACK* | |
- | | | | |________| |
- | | | | | |
- | | ______V_ | ________ |c' |d
- | k | | FIN- | | e''' | | | |
- | -------|-->| WAIT-1*|---|------>|CLOSING*| | |
- | / | |________| | |________| | |
- | / | | | | | |
- | / | c'| | c'| | |
- ___V___ / ____V___ V_____V_ ____V___ V____V__
- | SYN- | b'' | SYN- | c | FIN- | e'' | | | LAST- |
- | SENT* |---->|RECEIVD*|---->| WAIT-1 |---->|CLOSING | | ACK |
- |________| |________| |________| |________| |________|
- | | |
- f| f| f'|
- ___V____ ____V___ ___V____
- | FIN- | e |TIME- | T | |
- | WAIT-2 |---->| WAIT |-->| CLOSED |
- |________| |________| |________|
-
-
- Figure 8A: Basic T/TCP State Diagram
-
-
- ________________________________________________________________
- | |
- | Label Event / Action |
- | _____ ________________________ |
- | |
- | a Active OPEN / create TCB, snd SYN |
- | a' Active OPEN / snd SYN |
- | b rcv SYN [no TAO]/ snd ACK(SYN) |
- | b' rcv SYN [no TAO]/ snd SYN,ACK(SYN) |
- | b'' rcv SYN [no TAO]/ snd SYN,FIN,ACK(SYN) |
- | c rcv ACK(SYN) / |
- | c' rcv ACK(SYN) / snd FIN |
- | d CLOSE / snd FIN |
- | d' CLOSE / snd SYN,FIN |
- | e rcv FIN / snd ACK(FIN) |
- | e' rcv FIN / snd SYN,ACK(FIN) |
- | e'' rcv FIN / snd FIN,ACK(FIN) |
- | e''' rcv FIN / snd SYN,FIN,ACK(FIN) |
- | f rcv ACK(FIN) / |
- | f' rcv ACK(FIN) / delete TCB |
- | g CLOSE / delete TCB |
- | h passive OPEN / create TCB |
- | i (= b') rcv SYN [no TAO]/ snd SYN,ACK(SYN) |
- | j rcv SYN [TAO OK] / snd SYN,ACK(SYN) |
- | k rcv SYN [TAO OK] / snd SYN,FIN,ACK(SYN) |
- | T timeout=2MSL / delete TCB |
- | |
- | |
- | Figure 8B. Definition of State Transitions |
- |________________________________________________________________|
-
- This simple correspondence leads to an alternative state model,
- which makes it easy to incorporate the new states in an existing
- implementation. Each state in the extended FSM is defined by the
- triplet:
-
- (old_state, SENDSYN, SENDFIN)
-
- where 'old_state' is a standard TCP state and SENDFIN and SENDSYN
- are Boolean flags; see Figure 9. The SENDFIN flag is turned on (on
- the client side) by a SEND(... EOF=YES) call, to indicate that a
- FIN should be sent in a state which would not otherwise send a
- FIN. The SENDSYN flag is turned on when the TAO test succeeds to
- indicate that the connection is only half synchronized; as a
- result, a SYN will be sent in a state which would not otherwise
- send a SYN.
-
-
-    ________________________________________________________________
-   |                                                                |
-   | New state:          Old_state:        SENDSYN:   SENDFIN:      |
-   | __________          __________        ______     ______        |
-   |                                                                |
-   | SYN-SENT*       =>  SYN-SENT          FALSE      TRUE          |
-   |                                                                |
-   | SYN-RECEIVED*   =>  SYN-RECEIVED      FALSE      TRUE          |
-   |                                                                |
-   | ESTABLISHED*    =>  ESTABLISHED       TRUE       FALSE         |
-   |                                                                |
-   | CLOSE-WAIT*     =>  CLOSE-WAIT        TRUE       FALSE         |
-   |                                                                |
-   | LAST-ACK*       =>  LAST-ACK          TRUE       FALSE         |
-   |                                                                |
-   | FIN-WAIT-1*     =>  FIN-WAIT-1        TRUE       FALSE         |
-   |                                                                |
-   | CLOSING*        =>  CLOSING           TRUE       FALSE         |
-   |                                                                |
-   |                                                                |
-   |            Figure 9: Alternative State Definitions             |
-   |________________________________________________________________|
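-
- The triplet model of Figure 9 can be expressed as a small lookup
- table. This is an illustrative sketch: the type and function names
- are hypothetical, and the rule "a state is starred when either flag
- is set" is an assumption that holds for the seven rows of Figure 9.

```python
from typing import NamedTuple


class ExtendedState(NamedTuple):
    old_state: str   # a standard TCP state name
    sendsyn: bool    # a SYN still needs to be sent (half-synchronized)
    sendfin: bool    # a FIN still needs to be sent


# The seven starred states of Figure 9 as (old_state, SENDSYN, SENDFIN).
STARRED = {
    "SYN-SENT*": ExtendedState("SYN-SENT", False, True),
    "SYN-RECEIVED*": ExtendedState("SYN-RECEIVED", False, True),
    "ESTABLISHED*": ExtendedState("ESTABLISHED", True, False),
    "CLOSE-WAIT*": ExtendedState("CLOSE-WAIT", True, False),
    "LAST-ACK*": ExtendedState("LAST-ACK", True, False),
    "FIN-WAIT-1*": ExtendedState("FIN-WAIT-1", True, False),
    "CLOSING*": ExtendedState("CLOSING", True, False),
}


def state_name(state):
    """Map a triplet back to its name; starred if either flag is set."""
    return state.old_state + ("*" if state.sendsyn or state.sendfin else "")
```

- An existing implementation would keep its state variable unchanged
- and add only the two flags to the TCB.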
-
-
- Here is a more complete description of these boolean variables.
-
- * SENDFIN
-
- SENDFIN is turned on by the SEND(...EOF=YES) call, and turned
- off when FIN-WAIT-1 state is entered. It may only be on in
- SYN-SENT* and SYN-RECEIVED* states.
-
- SENDFIN has two effects. First, it causes a FIN to be sent
- on the last segment of data from the user. Second, it causes
- the SYN-SENT[*] and SYN-RECEIVED[*] states to transition
- directly to FIN-WAIT-1, skipping ESTABLISHED state.
-
- * SENDSYN
-
- The SENDSYN flag is turned on when an initial SYN segment is
- received and passes the TAO test. SENDSYN is turned off when
- the SYN is acknowledged (specifically, when there is no RST
- or SYN bit and SND.UNA < SEG.ACK).
-
- SENDSYN has three effects. First, it causes the SYN bit to
- be set in segments sent with the initial sequence number
- (ISN). Second, it causes a transition directly from LISTEN
- state to ESTABLISHED*, if there is no FIN bit, or otherwise
- to CLOSE-WAIT*. Finally, it allows data to be received and
- processed (passed to the application) even if the segment
- does not contain an ACK bit.
-
- According to the state model of the basic TCP specification [STD-
- 007], the server side must explicitly issue a passive OPEN call,
- creating a TCB in LISTEN state, before an initial SYN may be
- accepted. To accommodate truncation of TIME-WAIT state within
- this model, it is necessary to add the five "I-states" shown in
- Figure 10. The I-states are: LISTEN-LA, LISTEN-LA*, LISTEN-CL,
- LISTEN-CL*, and LISTEN-TW. These are 'bridge states' between the
- state diagrams of two successive incarnations.
- Here D is the duration of the previous connection, i.e., the
- elapsed time since the connection opened. The transitions labeled
- with lower-case letters are taken from Figure 8.
-
- Fortunately, many TCP implementations have a different user
- interface model, in which the user can issue a generic passive open
- ("listen") call; thereafter, when a matching initial SYN arrives,
- a new TCB in LISTEN state is automatically generated. With this
- user model, the I-states of Figure 10 are unnecessary.
-
- For example, suppose an initial SYN segment arrives for a
- connection that is in LAST-ACK state. If this segment carries a
- CC option and if SEG.CC is greater than TCB.CCrecv in the existing
- TCB, the "q" transition shown in Figure 10 can be made directly
- from the LAST-ACK state. That is, the previous TCB is processed
- as if an ACK(FIN) had arrived, causing the user to be notified of
- a successful CLOSE and the TCB to be deleted. Then processing of
- the new SYN segment is repeated, using a new TCB that is generated
- automatically. The same principle can be used to avoid
- implementing any of the I-states.
-
-
- ______________________________
- | P: Passive OPEN / |
- | |
- | Q: Rcv SYN, special TAO test | d'| d|
- | (see text) / Delete TCB, | ________ ___V____ |
- | create TCB, snd SYN | |LISTEN- | P | LAST- | |
- | | | LA* |<-----| ACK* | |
- | Q': (same as Q) if D < MSL | |________| |________| |
- | | | | | |
- | R: Rcv ACK(FIN) / Delete TCB,| Q| c'| c'| |
- | create TCB | | | | |
- | | | ___V____ V______V
- | S': Active OPEN if D < MSL / | | |LISTEN- | P | LAST- |
- | Delete TCB, create TCB, | | | LA |<-----| ACK |
- | snd SYN. | | |________| |________|
- |______________________________| | | | |
- | Q| R| f|
- ________ ________ | | | |
- e''' | | P |LISTEN- | | | V V
- ---->|CLOSING*|----->| CL* | | | LISTEN CLOSED
- |________| |________| | |
- | | Q| | |
- c'| c'| V V V
- | | ESTABLISHED*
- ____V___ V_______
- e'' | | P |LISTEN- |
- ---->|CLOSING |------>| CL |
- |________| |________|
- | R| Q|
- f| V V
- | LISTEN ESTABLISHED*
- ____V___ _________
- e |TIME- | P | LISTEN- |
- ---->| WAIT |------------->| TW |
- |________| |_________|
- / | | | |
- S'/ T| T| Q'| |S'
- | _____V_ h _____V__ | V
- | | |-------->| | | SYN-SENT
- | | CLOSED |<--------| LISTEN | |
- | |________| ------|________| |
- | | / | j| |
- | a| a'/ i| V V
- | | / | ESTABLISHED*
- V V V V
- SYN-SENT ...
-
- Figure 10: I-States for TIME-WAIT Truncation
-
-
- 3.4 T/TCP Processing Rules
-
- This section summarizes the rules for sending and processing the
- T/TCP options.
-
- INITIALIZATION
-
- I1: All cache entries cache.CC[*] and cache.CCsent[*] are
- undefined (zero) when a host system initializes, and CCgen
- is set to a non-zero value.
-
- I2: A new TCB is initialized with TCB.CCrecv = 0 and
- TCB.CCsend = current CCgen value; CCgen is then
- incremented. If the result is zero, CCgen is incremented
- again.
-
-
- SENDING SEGMENTS
-
- S1: Sending initial <SYN> Segment
-
- An initial <SYN> segment is sent with either a CC option
- or a CC.NEW option. If cache.CCsent[fh] is undefined or
- if TCB.CCsend < cache.CCsent[fh], then the option
- CC.NEW(TCB.CCsend) is sent and cache.CCsent[fh] is set to
- zero. Otherwise, the option CC(TCB.CCsend) is sent and
- cache.CCsent[fh] is set to CCsend.
-
- S2: Sending <SYN,ACK> Segment
-
- If the sender's TCB.CCrecv is non-zero, then a <SYN,ACK>
- segment is sent with both a CC(TCB.CCsend) option and a
- CC.ECHO (TCB.CCrecv) option.
-
- S3: Sending Non-SYN Segment
-
- A non-SYN segment is sent with a CC(TCB.CCsend) option if
- the TCB.CCrecv value is non-zero, or if the state is SYN-
- SENT or SYN-SENT* and cache.CCsent[fh] is non-zero (this
- last is required to send CC options in the segments
- following the first of a multi-segment request message;
- see segment #2 in Figure 6).
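-
- Rule S1 can be sketched as a single function. This is a minimal
- illustration under stated assumptions: the function name is
- hypothetical, the per-host cache is modeled as a dictionary keyed by
- foreign host, and a missing or zero entry means "undefined".

```python
def initial_syn_option(ccsend, cache_ccsent, fh):
    """Apply rule S1: choose CC or CC.NEW for an initial <SYN> and update the cache.

    Returns a (name, value) pair, where name is "CC" or "CC.NEW".
    """
    last_sent = cache_ccsent.get(fh, 0)  # zero means undefined
    if last_sent == 0 or ccsend < last_sent:
        # No usable history, or CCgen has wrapped: the server must not
        # apply the TAO test to this value.
        cache_ccsent[fh] = 0
        return ("CC.NEW", ccsend)
    cache_ccsent[fh] = ccsend
    return ("CC", ccsend)
```

- After a CC.NEW exchange, the cache entry stays zero until rule R2.3
- sets it on receipt of the <SYN,ACK> segment.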
-
- RECEIVING INITIAL <SYN> SEGMENT
-
- Suppose that a server host receives a segment containing a SYN
- bit but no ACK bit in LISTEN, SYN-SENT, or SYN-SENT* state.
-
-
- R1.1:If the <SYN> segment contains a CC or CC.NEW option,
- SEG.CC is stored into TCB.CCrecv of the new TCB.
-
- R1.2:If the segment contains a CC option and if the local cache
- entry cache.CC[fh] is defined and if
- SEG.CC > cache.CC[fh], then the TAO test is passed and the
- connection is half-synchronized in the incoming direction.
- The server host replaces the cache.CC[fh] value by SEG.CC,
- passes any data in the segment to the user, and processes
- a FIN bit if present.
-
- Acknowledgment of the SYN is delayed to allow piggybacking
- on a response segment.
-
- R1.3:If SEG.CC <= cache.CC[fh] (the TAO test has failed), or if
- cache.CC[fh] is undefined, or if there is no CC option
- (but possibly a CC.NEW option), the server host proceeds
- with normal TCP processing. If the connection was in
- LISTEN state, then the host executes a 3-way handshake
- using the standard TCP rules. In the SYN-SENT or SYN-
- SENT* state (i.e., the simultaneous open case), the TCP
- sends ACK(SYN) and enters SYN-RECEIVED state.
-
- R1.4:If there is no CC option (but possibly a CC.NEW option),
- then the server host sets cache.CC[fh] undefined (zero).
- Receiving an ACK for a SYN (following application of rule
- R1.3) will update cache.CC[fh], by rule R3.
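-
- Rules R1.1 through R1.4 can be sketched as follows. This is only an
- illustration: the function name, the dictionary used for the TCB,
- and the dictionary used for the per-host cache are hypothetical, and
- connection counts are compared as plain unsigned integers, as the
- rules are written.

```python
def receive_initial_syn(option, seg_cc, tcb, cache_cc, fh):
    """Apply R1.1-R1.4 to an initial <SYN>; return True if the TAO test passes.

    option is "CC", "CC.NEW", or None when the segment carries neither.
    """
    if option in ("CC", "CC.NEW"):
        tcb["CCrecv"] = seg_cc              # R1.1
    cached = cache_cc.get(fh, 0)            # zero means undefined
    if option == "CC" and cached != 0 and seg_cc > cached:
        cache_cc[fh] = seg_cc               # R1.2: TAO test passed
        return True
    if option != "CC":
        cache_cc[fh] = 0                    # R1.4: invalidate the cache entry
    return False                            # R1.3: fall back to the 3-way handshake
```

- A True result means the data and any FIN may be processed at once;
- a False result means the caller proceeds with standard TCP rules.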
-
- Suppose that an initial <SYN> segment containing a CC or CC.NEW
- option arrives in an I-state (i.e., a state with a name of the
- form 'LISTEN-xx', where xx is one of TW, LA, LA*, CL, or CL*):
-
- R1.5:If the state is LISTEN-TW, then the duration of the
- current connection is compared with MSL. If duration >
- MSL then send a RST:
-
- <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>
-
- drop the packet, and return.
-
- R1.6:Perform a special TAO test: compare SEG.CC with
- TCB.CCrecv.
-
- If SEG.CC is greater, then processing is performed as if
- an ACK(FIN) had arrived: signal the application that the
- previous close completed successfully and delete the
- previous TCB. Then create a new TCB in LISTEN state and
- reprocess the SYN segment against the new TCB.
-
- Otherwise, silently discard the segment.
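-
- The decision made by rules R1.5 and R1.6 can be sketched as a
- function returning one of three outcomes. The function name and the
- outcome labels are hypothetical; duration and msl are assumed to be
- in the same time unit.

```python
def syn_in_i_state(state, seg_cc, tcb_ccrecv, duration, msl):
    """Classify an initial <SYN> carrying CC or CC.NEW that arrives in an I-state.

    Returns "RST", "REOPEN", or "DISCARD".
    """
    if state == "LISTEN-TW" and duration > msl:
        return "RST"      # R1.5: connection lasted too long to truncate TIME-WAIT
    if seg_cc > tcb_ccrecv:
        # R1.6: act as if ACK(FIN) arrived, delete the old TCB, and
        # reprocess the SYN against a new TCB in LISTEN state.
        return "REOPEN"
    return "DISCARD"      # R1.6: old duplicate, silently dropped
```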
-
- RECEIVING <SYN,ACK> SEGMENT
-
- Suppose that a client host receives a <SYN,ACK> segment for a
- connection in SYN-SENT or SYN-SENT* state.
-
- R2.1:If SEG.ACK is not acceptable (see [STD-007]) and
- cache.CCsent[fh] is non-zero, then simply drop the segment
- without sending a RST. (The new SYN that the client is
- (re-)transmitting will eventually acknowledge any
- outstanding data and FIN at the server.)
-
- R2.2:If the segment contains a CC.ECHO option whose SEG.CC is
- different from TCB.CCsend, then the segment is
- unacceptable and is dropped.
-
- R2.3:If cache.CCsent[fh] is zero, then it is set to TCB.CCsend.
-
- R2.4:If the segment contains a CC option, its SEG.CC is stored
- into TCB.CCrecv of the TCB.
-
- RECEIVING <ACK> SEGMENT IN SYN-RECEIVED STATE
-
- R3.1:If a segment contains a CC option whose SEG.CC differs
- from TCB.CCrecv, then the segment is unacceptable and is
- dropped.
-
- R3.2:Otherwise, a 3-way handshake has completed successfully at
- the server side. If the segment contains a CC option and
- if cache.CC[fh] is zero, then cache.CC[fh] is replaced by
- TCB.CCrecv.
-
- RECEIVING OTHER SEGMENT
-
- R4: Any other segment received with a CC option is
- unacceptable if SEG.CC differs from TCB.CCrecv. However,
- a RST segment is exempted from this test.
-
- OPEN REQUEST
-
- To allow truncation of TIME-WAIT state, the following changes
- are made in the state diagram for OPEN requests (see Figure
- 10):
-
- O1.1:A new passive open request is allowed in any of the
- states: LAST-ACK, LAST-ACK*, CLOSING, CLOSING*, or TIME-
- WAIT. This causes a transition to the corresponding I-
- state (see Figure 10), which retains the previous state,
- including the retransmission queue and timer.
-
- O1.2:A new active open request is allowed in TIME-WAIT or
- LISTEN-TW state, if the elapsed time since the current
- connection opened is less than MSL. The result is to
- delete the old TCB and create a new one, send a new SYN
- segment, and enter SYN-SENT or SYN-SENT* state (depending
- upon whether or not the SYN segment contains a FIN bit).
-
- Finally, T/TCP has a provision to improve performance for the case
- of a client that "sprays" transactions rapidly using many
- different server hosts and/or ports. If TCB.CCrecv in the TCB is
- non-zero (and still assuming that the connection duration is less
- than MSL), then the TIME-WAIT delay may be set to min(K*RTO,
- 2*MSL). Here RTO is the measured retransmission timeout time and
- the constant K is currently specified to be 8.
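-
- The shortened TIME-WAIT delay can be sketched as below. The
- function name is hypothetical, and all times are assumed to be in
- the same unit. When the conditions for shortening do not hold, the
- sketch falls back to the standard 2*MSL delay.

```python
K = 8  # multiplier currently specified for the shortened delay


def time_wait_delay(rto, msl, ccrecv, duration):
    """Return the TIME-WAIT delay for a closing connection."""
    if ccrecv != 0 and duration < msl:
        return min(K * rto, 2 * msl)
    return 2 * msl
```

- For example, with an illustrative MSL of 120 seconds and a measured
- RTO of 1.5 seconds, a short T/TCP connection would wait 12 seconds
- instead of 240.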
-
- 3.5 User Interface
-
- STD-007 defines a prototype user interface ("transport service")
- that implements the virtual circuit service model [STD-007,
- Section 3.8]. One addition to this interface is required for
- transaction processing: a new Boolean flag "end-of-file" (EOF),
- added to the SEND call. A generic SEND call becomes:
-
- Send
-
- Format: SEND (local connection name, buffer address,
- byte count, PUSH flag, URGENT flag, EOF flag [,timeout])
-
- The following text would be added to the description of SEND in
- [STD-007]:
-
- If the EOF (End-Of-File) flag is set, any remaining queued
- data is pushed and the connection is closed. Just as with the
- CLOSE call, all data being sent is delivered reliably before
- the close takes effect, and data may continue to be received
- on the connection after completion of the SEND call.
-
- Figure 8A shows a skeleton sequence of user calls by which a
- client could initiate a transaction. The SEND call initiates a
- transaction request to the foreign socket (host and port)
- specified in the passive OPEN call. The predicate "recv_EOF"
- tests whether or not a FIN has been received on the connection;
- this might be implemented using the STATUS command of [STD-007],
- or it might be implemented by some operating-system-dependent
- mechanism. When recv_EOF returns TRUE, the connection has been
- completely closed and the client end of the connection is in
- TIME-WAIT state.
-
- __________________________________________________________________
- | |
- | |
- | OPEN(local_port, foreign_socket, PASSIVE) -> conn_name; |
- | |
- | SEND(conn_name, request_buffer, length, |
- | PUSH=YES, URG=NO, EOF=YES); |
- | |
- | while (not recv_EOF(conn_name)) { |
- | |
- | RECEIVE(conn_name, reply_buffer, length) -> count; |
- | |
- | <Process reply_buffer.> |
- | } |
- | |
- | |
- | Figure 8A: Client Side User Interface |
- |__________________________________________________________________|
-
- If a client is going to send a rapid series of such requests to
- the same foreign_socket, it should use the same local_port for
- all. This will allow truncation of TIME-WAIT state. Otherwise,
- it could leave local_port wild, allowing TCP to choose successive
- local ports for each call, realizing that each transaction may
- leave behind a significant control block overhead in the kernel.
-
- Figure 8B shows a basic sequence of server calls. The server
- application waits for a request to arrive and then reads and
- processes it until a FIN arrives (recv_EOF returns TRUE). At this
- time, the connection is half-closed. The SEND call used to return
- the reply completes the close in the other direction. It should
- be noted that the use of SEND(... EOF=YES) in Figure 8B instead of
- a SEND, CLOSE sequence is only an optimization; it allows
- piggybacking the FIN in order to minimize the number of segments.
- It should have little effect on transaction latency.
-
-
- __________________________________________________________________
- | |
- | |
- | OPEN(local_port, ANY_SOCKET, PASSIVE) -> conn_name; |
- | |
- | <Wait for connection to open.> |
- | |
- | STATUS(conn_name) -> foreign_socket |
- | |
- | while (not recv_EOF(conn_name)) { |
- | |
- | RECEIVE(conn_name, request_buffer, length) -> count; |
- | |
- | <Process request_buffer.> |
- | } |
- | |
- | <Compute reply and store into reply_buffer.> |
- | |
- | SEND(conn_name, reply_buffer, length, |
- | PUSH=YES, URG=NO, EOF=YES); |
- | |
- | |
- | Figure 8B: Server Side User Interface |
- |__________________________________________________________________|
-
-
- 4. IMPLEMENTATION ISSUES
-
- 4.1 RFC-1323 Extensions
-
- A recently-proposed set of TCP enhancements [RFC-1323] defines a
- Timestamps option, which carries two 32-bit timestamp values.
- This option is used to accurately measure round-trip time (RTT).
- The same option is also used in a procedure known as "PAWS"
- (Protect Against Wrapped Sequence) to prevent erroneous data
- delivery due to a combination of old duplicate segments and
- sequence number reuse at very high bandwidths. The approach to
- transactions specified in this memo is independent of the RFC-1323
- enhancements, but implementation of RFC-1323 is desirable for all
- TCP's.
-
- The RFC-1323 extensions share several common implementation issues
- with the T/TCP extensions. Both require that TCP headers carry
- options. Accommodating options in TCP headers requires changes in
- the way that the maximum segment size is determined, to prevent
- inadvertent IP fragmentation. Both require some additional state
- variable in the TCB, which may or may not cause implementation
- difficulties.
-
-
- 4.2 Minimal Packet Sequence
-
- Most TCP implementations will require some small modifications to
- allow the minimal packet sequence for a transaction shown in
- Figure 2.
-
- Many TCP implementations contain a mechanism to delay
- acknowledgments of some subset of the data segments, to cut down
- on the number of acknowledgment segments and to allow piggybacking
- on the reverse data flow (typically character echoes). To obtain
- minimal packet exchanges for transactions, it is necessary to
- delay the acknowledgment of some control bits, in an analogous
- manner. In particular, the <SYN,ACK> segment that is to be sent
- in ESTABLISHED* or CLOSE-WAIT* state should be delayed. Note that
- the amount of delay is determined by the minimum RTO at the
- transmitter; it is a parameter of the communication protocol,
- independent of the application. We propose to use the same delay
- parameter (and if possible, the same mechanism) that is used for
- delaying data acknowledgments.
-
- To get the FIN piggy-backed on the reply data (segment #3 in
- Figure 2), those implementations that have an implied PUSH=YES on
- all SEND calls will need to augment the user interface so that
- PUSH=NO can be set for transactions.
-
- 4.3 RTT Measurement
-
- Transactions introduce new issues into the problem of measuring
- round trip times [Jacobson88].
-
- (a) With the minimal 3-segment exchange, there can be exactly one
- RTT measurement in each direction for each transaction.
- Since dynamic estimation of RTT cannot take place within a
- single transaction, it must take place across successive
- transactions. Therefore, caching the measured RTT and RTT
- variance values is essential for transaction processing; in
- normal virtual circuit communication, such caching is only
- desirable.
-
- (b) At the completion of a transaction, the values for RTT and
- RTT variance that are retained in the cache must be some
- average of previous values with the values measured during
- the transaction that is completing. This raises the question
- of the time constant for this average; quite different
- dynamic considerations hold for transactions than for file
- transfers, for example.
-
- (c) An RTT measurement by the client will yield the value:
-
- T = RTT + min(SPT, ATO),
-
- where SPT (server processing time) was defined in the
- introduction, and ATO is the timeout period for sending a
- delayed ACK. Thus, the measured RTT includes SPT, which may
- be arbitrarily variable; however, the resulting variability
- of the measured T cannot exceed ATO. (In a popular TCP
- implementation, for example, ATO = 200ms, so that the
- variance of SPT makes a relatively small contribution to the
- variance of RTT.)
-
- (d) Transactions sample the RTT at random times, which are
- determined by the client and the server applications rather
- than by the network dynamics. When there are long pauses
- between transactions, cached path properties will be poor
- predictors of current values in the network.
-
- Thus, the dynamics of RTT measurement for transactions differ from
- those for virtual circuits. RTT measurements should work
- correctly for very short connections but reduce to the current TCP
- algorithms for long-lasting connections. Further study of this
- issue is needed.
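-
- Points (b) and (c) above can be illustrated with a short sketch.
- The function names are hypothetical. The averaging step uses an
- exponentially weighted mean with a gain of 1/8, which is the usual
- TCP smoothing gain and is only an assumption here, since this memo
- leaves the time constant open.

```python
def measured_t(rtt, spt, ato):
    """Client-side measurement from point (c): T = RTT + min(SPT, ATO)."""
    return rtt + min(spt, ato)


def update_cached_rtt(cached, sample, gain=0.125):
    """Blend a new sample into the cached RTT; gain is an assumed constant."""
    if cached is None:
        return sample  # first transaction with this host: no history yet
    return cached + gain * (sample - cached)
```

- With RTT = 50 ms and ATO = 200 ms, a server that takes 1000 ms to
- reply yields T = 250 ms, while one that takes 10 ms yields T = 60 ms;
- the spread is bounded by ATO.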
-
- 4.4 Cache Implementation
-
- This extension requires a per-host cache of connection counts.
- This cache may also contain values of the smoothed RTT, RTT
- variance, congestion avoidance threshold, and MSS values.
- Depending upon the implementation details, it may be simplest to
- build a new cache for these values; another possibility is to use
- the routing cache that should already be included in the host
- [RFC-1122].
-
- Implementation of the cache may be simplified because it is
- consulted only when a connection is established; thereafter, the
- CC values relevant to the connection are kept in the TCB. This
- means that a cache entry may be safely reused during the lifetime
- of a connection, avoiding the need for locking.
-
- 4.5 CPU Performance
-
- TCP implementations are customarily optimized for streaming of
- data at high speeds, not for opening or closing connections.
- Jacobson's Header Prediction algorithm [Jacobson90] handles the
- simple common cases of in-sequence data and ACK segments when
- streaming data. To provide good performance for transactions, an
- implementation might be able to do an analogous "header
- prediction" specifically for the minimal request and the response
- segments.
-
- The overhead of UDP provides a lower bound on the overhead of
- TCP-based transaction processing. It will probably not be
- possible to reach this bound for TCP transactions, since opening a
- TCP connection involves creating a significant amount of state
- that is not required by UDP.
-
- McKenney and Dove [McKenney92] have pointed out that transaction
- processing applications of TCP can stress the performance of the
- demultiplexing algorithm, i.e., the algorithm used to look up the
- TCB when a segment arrives. They advocate the use of hash-table
- techniques rather than a linear search. The effect of
- demultiplexing on performance may become especially acute for a
- transaction client using the extended TCP described here, due to
- TCB's left in TIME-WAIT state. A high rate of transactions from a
- given client will leave a large number of TCB's in TIME-WAIT
- state, until their timeout expires. If the TCP implementation
- uses a linear search for demultiplexing, all of these control
- blocks must be traversed in order to discover that the new
- association does not exist. In this circumstance, performance of
- a hash table lookup should not degrade severely due to
- transactions.
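-
- The hash-table approach can be sketched with a dictionary keyed by
- the connection 4-tuple. This is an illustration only, not the
- scheme of [McKenney92]: the class and method names are hypothetical,
- and Python's built-in dictionary stands in for the hash table.

```python
class TcbTable:
    """Hypothetical TCB lookup table keyed by (local, foreign) socket pairs."""

    def __init__(self):
        self._tcbs = {}

    def insert(self, local, foreign, tcb):
        # local and foreign are (address, port) pairs.
        self._tcbs[(local, foreign)] = tcb

    def remove(self, local, foreign):
        self._tcbs.pop((local, foreign), None)

    def lookup(self, local, foreign):
        """Return the matching TCB, or None if the association does not exist."""
        return self._tcbs.get((local, foreign))
```

- A miss costs about the same as a hit, no matter how many TIME-WAIT
- control blocks are present, whereas a linear search must visit all
- of them before reporting that the association does not exist.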
-
- 4.6 Pre-SYN Queue
-
- Suppose that segment #1 in Figure 4 is lost in the network; when
- segment #2 arrives in LISTEN state, it will be ignored by the TCP
- rules (see [STD-007] p.66, "fourth other text and control"), and
- must be retransmitted. It would be possible for the server side
- to queue any ACK-less data segments received in LISTEN state and
- to "replay" the segments in this queue when a SYN segment does
- arrive. A data segment received with an ACK bit, which is the
- normal case for existing TCP's, would still generate a RST
- segment.
-
- Note that queueing segments in LISTEN state is different from
- queueing out-of-order segments after the connection is
- synchronized. In LISTEN state, the sequence number corresponding
- to the left window edge is not yet known, so that the segment
- cannot be trimmed to fit within the window before it is queued.
- In fact, no processing should be done on a queued segment while
- the connection is still in LISTEN state. Therefore, a new "pre-
- SYN queue" would be needed. A timeout would be required, to flush
- the Pre-SYN Queue in case a SYN segment was not received.
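-
- A pre-SYN queue along these lines can be sketched as follows. The
- class and method names are hypothetical, segments are treated as
- opaque objects, and the caller is assumed to supply the current
- time. In keeping with the text above, queued segments are stored
- untouched and are only handed back for processing once a SYN has
- arrived.

```python
import collections


class PreSynQueue:
    """Hypothetical holding area for ACK-less data segments seen in LISTEN state."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._segments = collections.deque()  # (arrival_time, segment) pairs

    def hold(self, segment, now):
        """Queue a segment without trimming or otherwise processing it."""
        self._segments.append((now, segment))

    def flush_expired(self, now):
        """Drop segments that have waited at least the timeout for a SYN."""
        while self._segments and now - self._segments[0][0] >= self.timeout:
            self._segments.popleft()

    def replay(self, now):
        """On SYN arrival, return surviving segments in arrival order and empty the queue."""
        self.flush_expired(now)
        held = [segment for _, segment in self._segments]
        self._segments.clear()
        return held
```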
-
- Although implementation of a pre-SYN queue is not difficult in BSD
- TCP, its limited contribution to throughput probably does not
- justify the effort.
-
- 6. ACKNOWLEDGMENTS
-
- I am very grateful to Dave Clark for pointing out bugs in RFC-1379
- and for helping me to clarify the model. I also wish to thank Greg
- Minshall, whose probing questions led to further elucidation of the
- issues in T/TCP.
-
- 7. REFERENCES
-
- [Jacobson88] Jacobson, V., "Congestion Avoidance and Control", ACM
- SIGCOMM '88, Stanford, CA, August 1988.
-
- [Jacobson90] Jacobson, V., "4BSD Header Prediction", Comp Comm
- Review, v. 20, no. 2, April 1990.
-
- [McKenney92] McKenney, P., and K. Dove, "Efficient Demultiplexing
- of Incoming TCP Packets", ACM SIGCOMM '92, Baltimore, MD, October
- 1992.
-
- [RFC-1122] Braden, R., Ed., "Requirements for Internet Hosts --
- Communications Layers", STD-3, RFC-1122, USC/Information Sciences
- Institute, October 1989.
-
- [RFC-1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
- for High Performance", RFC-1323, LBL, USC/Information Sciences
- Institute, Cray Research, February 1991.
-
- [RFC-1379] Braden, R., "Transaction TCP -- Concepts", RFC-1379,
- USC/Information Sciences Institute, September 1992.
-
- [ShankarLee93] Shankar, A. and D. Lee, "Modulo-N Incarnation
- Numbers for Cache-Based Transport Protocols", Report CS-TR-3046/
- UIMACS-TR-93-24, University of Maryland, March 1993.
-
- [STD-007] Postel, J., "Transmission Control Protocol - DARPA
- Internet Program Protocol Specification", STD-007, RFC-793,
- USC/Information Sciences Institute, September 1981.
-
- APPENDIX A. ALGORITHM SUMMARY
-
- This appendix summarizes the additional processing rules introduced
- by T/TCP. We define the following symbols:
-
- Options
-
- CC(SEG.CC): TCP Connection Count (CC) Option
- CC.NEW(SEG.CC): TCP CC.NEW option
- CC.ECHO(SEG.CC): TCP CC.ECHO option
-
- Here SEG.CC is the option value carried in the segment.
-
- Per-Connection State Variables in TCB
-
- CCsend: CC value to be sent in segments
- CCrecv: CC value to be received in segments
- Elapsed: Duration of connection
-
- Global Variables:
-
- CCgen: CC generator variable
- cache.CC[fh]: Cache entry: Last CC value received.
- cache.CCsent[fh]: Cache entry: Last CC value sent.
-
-
- PSEUDO-CODE SUMMARY:
-
- Passive OPEN => {
-     Create new TCB;
- }
-
- Active OPEN => {
-     <Create new TCB>
-     CCrecv = 0;
-     CCsend = CCgen;
-     If (CCgen == 0xffffffff) then Set CCgen = 1;
-     else Set CCgen = CCgen + 1.
-     <Send initial {SYN} segment (see below)>
- }
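
The CCgen update in the Active OPEN rule can be sketched in Python as follows. This is only an illustration: the class and method names are hypothetical, and the starting value of 1 is an assumption. The point it shows is that the generator skips zero when it wraps.

```python
CC_MAX = 0xFFFFFFFF  # largest 32-bit CC value


class CCGenerator:
    """Hypothetical holder for the global CCgen variable."""

    def __init__(self, start=1):
        self.ccgen = start  # assumed initial value; must be nonzero

    def next_cc(self):
        # CCsend = CCgen, then advance CCgen, wrapping from 0xffffffff to 1.
        cc_send = self.ccgen
        self.ccgen = 1 if self.ccgen == CC_MAX else self.ccgen + 1
        return cc_send
```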
-
-
- Send initial {SYN} segment => {
-
-     If (cache.CCsent[fh] == 0 OR CCsend < cache.CCsent[fh] ) then {
-
-         Include CC.NEW(CCsend) option in segment;
-         Set cache.CCsent[fh] = 0;
-     }
-     else {
-
-         Include CC(CCsend) option in segment;
-         Set cache.CCsent[fh] = CCsend;
-     }
- }
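
The choice between the CC and CC.NEW options on an initial SYN might be sketched as below. The function name and the use of a dictionary keyed by foreign host are assumptions; a missing entry stands in for a cache value of zero. The comparison is the plain one written in the rule above.

```python
def initial_syn_option(cc_send, cache_ccsent, fh):
    """Return (option_name, value) for the initial SYN; update the cache."""
    last_sent = cache_ccsent.get(fh, 0)  # assumption: absent entry means 0
    if last_sent == 0 or cc_send < last_sent:
        # No usable history, or CCsend is not larger than the last value
        # sent (for example after CCgen wrapped): send CC.NEW.
        cache_ccsent[fh] = 0
        return ("CC.NEW", cc_send)
    cache_ccsent[fh] = cc_send
    return ("CC", cc_send)
```

For example, with a cached value of 5 for a host, a CCsend of 6 yields a CC option, while a CCsend of 2 yields CC.NEW and clears the cache entry.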
-
-
- Send {SYN,ACK} segment => {
-
-     If (CCrecv != 0) then
-         Include CC(CCsend), CC.ECHO(CCrecv) options in segment.
- }
-
-
- Receive {SYN} segment in LISTEN, SYN-SENT, or SYN-SENT* state => {
-
-     If state == LISTEN then {
-         CCrecv = 0;
-         CCsend = CCgen;
-         If (CCgen == 0xffffffff) then Set CCgen = 1;
-         else Set CCgen = CCgen + 1.
-     }
-
-     If (Segment contains CC option OR
-         Segment contains CC.NEW option) then
-         Set CCrecv = SEG.CC.
-
-     if (Segment contains CC option AND
-         cache.CC[fh] != 0 AND
-         SEG.CC > cache.CC[fh] ) then { /* TAO Test OK */
-
-         Set cache.CC[fh] = CCrecv;
-         <Mark connection half-synchronized>
-         <Process data and/or FIN and return>
-     }
-
-     If (Segment does not contain CC option) then
-         Set cache.CC[fh] = 0;
-
-     <Do normal TCP processing and return>.
- }
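
A minimal Python sketch of the TAO test for a SYN arriving in LISTEN state follows. It covers only the cache and CCrecv handling. The function name, the outcome strings, and the dictionary cache (where a missing entry stands for zero) are assumptions made for illustration.

```python
def receive_syn_in_listen(cache_cc, fh, option, seg_cc):
    """option is "CC", "CC.NEW", or None. Returns (ccrecv, outcome)."""
    # In LISTEN state CCrecv starts at 0 and takes the option value if any.
    ccrecv = seg_cc if option in ("CC", "CC.NEW") else 0
    cached = cache_cc.get(fh, 0)  # assumption: absent entry means 0

    if option == "CC" and cached != 0 and seg_cc > cached:
        # TAO test OK: accept the SYN without a three-way handshake.
        cache_cc[fh] = ccrecv
        return ccrecv, "TAO_OK"

    if option != "CC":
        # CC.NEW or no option at all invalidates the cached value.
        cache_cc[fh] = 0
    return ccrecv, "NORMAL_TCP"
```

Note that an old duplicate SYN carrying a CC value no larger than the cached one fails the test and falls back to normal processing, leaving the cache unchanged.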
-
- Receive {SYN} segment in LISTEN-TW, LISTEN-LA, LISTEN-LA*, LISTEN-CL,
- or LISTEN-CL* state => {
-
-     If (Segment contains CC option AND CCrecv != 0) then {
-
-         If (state == LISTEN-TW AND Elapsed > MSL) then
-             <Send RST, drop segment, and return>.
-
-         if (SEG.CC > CCrecv) then {
-             <Implicitly ACK FIN and data in retransmission queue>;
-             <Close and delete TCB>;
-             <Reprocess segment>.
-             /* Expect to match new TCB
-              * in LISTEN state.
-              */
-         }
-     }
-     else
-         <Drop segment>.
- }
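
The decision structure of this rule can be sketched as a small Python function. The function name, the string state labels, and the returned action names are hypothetical. Where the summary above specifies no action (SEG.CC not greater than CCrecv), the sketch returns None rather than guessing.

```python
def receive_syn_after_close(state, has_cc, seg_cc, ccrecv, elapsed, msl):
    """Return the action for a SYN arriving in a LISTEN-xx state."""
    if not (has_cc and ccrecv != 0):
        return "DROP"
    if state == "LISTEN-TW" and elapsed > msl:
        # The connection lasted longer than MSL, so TIME-WAIT
        # cannot be truncated.
        return "RST"
    if seg_cc > ccrecv:
        # New incarnation: implicitly ACK outstanding FIN and data,
        # delete the old TCB, and reprocess the SYN against LISTEN.
        return "CLOSE_AND_REPROCESS"
    return None  # the summary above specifies no action for this case
```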
-
-
- Receive {SYN,ACK} segment => {
-
-     if (Segment contains CC.ECHO option AND
-         SEG.CC != CCsend) then
-         <Send a reset and discard segment>.
-
-     if (Segment contains CC option) then {
-         Set CCrecv = SEG.CC.
-
-         if (cache.CC[fh] is undefined) then
-             Set cache.CC[fh] = CCrecv.
-     }
- }
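
The {SYN,ACK} rule might look like this in Python. The function name, the option dictionary, and the outcome strings are assumptions. The sketch also assumes that an "undefined" cache.CC entry is represented by a missing key or by zero.

```python
def receive_syn_ack(options, ccsend, cache_cc, fh, ccrecv=0):
    """options maps option names to values, e.g. {"CC": 7, "CC.ECHO": 3}.

    Returns (outcome, ccrecv), where outcome is "RESET" or "OK".
    """
    # A CC.ECHO value that does not match what we sent is rejected.
    if "CC.ECHO" in options and options["CC.ECHO"] != ccsend:
        return "RESET", ccrecv

    if "CC" in options:
        ccrecv = options["CC"]
        # Assumption: undefined is modelled as an absent or zero entry.
        if cache_cc.get(fh, 0) == 0:
            cache_cc[fh] = ccrecv
    return "OK", ccrecv
```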
-
-
- Send non-SYN segment => {
-
-     if (CCrecv != 0 OR
-         (cache.CCsent[fh] != 0 AND
-          state is SYN-SENT or SYN-SENT*)) then
-         Include CC(CCsend) option in segment.
- }
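
The condition for carrying a CC option on a non-SYN segment reduces to a single boolean expression. The sketch below restates it in Python; the function name and the string state labels are hypothetical.

```python
def include_cc_on_non_syn(ccrecv, cache_ccsent_fh, state):
    """Return True if a non-SYN segment should carry CC(CCsend)."""
    # Either the peer has already sent us a CC value, or we sent a CC
    # option on our SYN and are still waiting in a SYN-SENT state.
    waiting_on_syn = state in ("SYN-SENT", "SYN-SENT*")
    return ccrecv != 0 or (cache_ccsent_fh != 0 and waiting_on_syn)
```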
-
-
- Receive non-SYN segment in SYN-RECEIVED state => {
-
-     if (Segment contains CC option AND RST bit is off) {
-         if (SEG.CC != CCrecv) then
-             <Segment is unacceptable; drop it and send an
-              ACK segment, as in normal TCP processing>.
-
-         if (cache.CC[fh] is undefined) then
-             Set cache.CC[fh] = CCrecv.
-     }
- }
-
-
- Receive non-SYN segment in (state >= ESTABLISHED) => {
-
-     if (Segment contains CC option AND RST bit is off) {
-         if (SEG.CC != CCrecv) then
-             <Segment is unacceptable; drop it and send an
-              ACK segment, as in normal TCP processing>.
-     }
- }
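
The acceptability check shared by the two non-SYN receive rules can be sketched as follows. The function name is hypothetical, and a seg_cc of None is an assumed way to represent a segment without a CC option. The cache update performed in SYN-RECEIVED state is deliberately left out.

```python
def cc_acceptable(seg_cc, rst, ccrecv):
    """Return False if the segment must be dropped and answered with an ACK."""
    # The check applies only to segments that carry a CC option
    # and do not have the RST bit set.
    if seg_cc is not None and not rst:
        return seg_cc == ccrecv
    return True
```

A segment left over from an earlier incarnation carries a CC value that differs from CCrecv, so it is rejected even if its sequence number happens to fall within the window.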
-
-
- Security Considerations
-
- Security issues are not discussed in this memo.
-
- Author's Address
-
- Bob Braden
- University of Southern California
- Information Sciences Institute
- 4676 Admiralty Way
- Marina del Rey, CA 90292
-
- Phone: (310) 822-1511
- EMail: Braden@ISI.EDU
-