INTERNET-DRAFT
TCP Selective Acknowledgement Option
Matthew B. Mathis Mathis@psc.edu
Jamshid Mahdavi Mahdavi@psc.edu
Sally Floyd floyd@ee.lbl.gov
Allyn Romanow allyn@eng.sun.com
Version: 1.0, Thu Jan 4 14:48:16 PST 1996
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as ``work in
progress.''
To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
Abstract
TCP may experience poor performance when multiple packets are lost
from one window of data. With the limited information available
from cumulative acknowledgements, a TCP sender can only learn
about a single lost packet per round trip time. An aggressive
sender could choose to retransmit packets early, but such
retransmitted segments may have already been successfully
received.
A Selective Acknowledgement (SACK) mechanism, combined with a
selective repeat retransmission policy, can help to overcome these
limitations. The receiving TCP sends back SACK packets to the
sender informing the sender of data that has been received. The
sender can then retransmit only the missing data segments.
This draft proposes an implementation of SACK and discusses its
performance and related issues.
Acknowledgements:
Much of the text in this document is taken directly from RFC1072
``TCP Extensions for Long-Delay Paths'' by Bob Braden and Van
Jacobson. The authors would like to thank Kevin Fall (LBNL),
Christian Huitema (Inria), Van Jacobson (LBNL), Greg Minshall
(Ipsilon), and Lixia Zhang (XEROX PARC and UCLA) for their review
and constructive comments.
1. Introduction
Multiple packet losses from a window of data can have a
catastrophic effect on TCP throughput. TCP [Postel81] uses a
cumulative acknowledgment scheme in which received segments that
are not at the left edge of the receive window are not
acknowledged. This forces the sender to either wait a roundtrip
time to find out about each lost packet, or to unnecessarily
retransmit segments which have been correctly received [Fall95].
With the cumulative acknowledgment scheme, multiple dropped
segments generally cause TCP to lose its ACK-based clock, reducing
overall throughput.
Selective Acknowledgment (SACK) is a strategy which corrects this
behavior and restores full throughput in the face of multiple
dropped segments. With selective acknowledgments, the data
receiver can inform the sender about all segments that have arrived
successfully, so the sender need retransmit only the segments that
have actually been lost.
Several transport protocols, including NETBLT [Clark87], XTP
[Strayer92], RDP [Velten84], NADIR [Huitema81],
and VMTP [Cheriton88], have used selective
acknowledgement. There is some empirical evidence in favor of
selective acknowledgments -- simple experiments with RDP have shown
that disabling the selective acknowledgment facility greatly
increases the number of retransmitted segments over a lossy,
high-delay Internet path [Partridge87]. A recent simulation study
by Kevin Fall and Sally Floyd [Fall95] demonstrates the strength
of TCP with SACK over the non-SACK Tahoe and Reno TCP implementations.
RFC1072 [VJ88] describes one possible implementation of SACK
options for TCP. Unfortunately, it has never been deployed in the
Internet, as there was disagreement about how SACK options should
be used in conjunction with the TCP window shift option (initially
described in RFC1072 and revised in [Jacobson92]).
We propose slight modifications to the SACK options as proposed in
RFC1072. Specifically, sending a selective acknowledgment for the
most recently received data reduces the need for long SACK
options [Keshav94, Mathis95]. In addition, sequence numbers are
now 32 bits. These two modifications represent the only changes to
the proposal in RFC1072. They make SACK easier to implement and
address concerns about robustness.
The selective acknowledgment extension uses two TCP options. The
first is an enabling option, "SACK-permitted", which may be sent in
a SYN segment to indicate that the SACK option can be used once the
connection is established. The other is the SACK option itself,
which may be sent over an established connection once permission
has been given by SACK-permitted.
The SACK option is to be included in a segment sent from a TCP that
is receiving data to the TCP that is sending that data; we will
refer to these TCPs as the data receiver and the data sender,
respectively. We will consider a particular simplex data flow; any
data flowing in the reverse direction over the same connection can
be treated independently.
2. SACK-Permitted Option
This two-byte option may be sent in a SYN by a TCP that has been
extended to receive (and presumably process) the SACK option once
the connection has opened.
    TCP Sack-Permitted Option:

    Kind: 4

    +---------+---------+
    | Kind=4  | Length=2|
    +---------+---------+
[We need to formally decide (in the BOF?) if we are going to reuse
option numbers.]
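
As an illustration only, a receiver of a SYN could look for this
option by walking the TCP options field. The following is a minimal
Python sketch, not part of the proposal: the function name
syn_permits_sack is hypothetical, the Kind value 4 is the one
proposed above, and the handling of Kind 0 (End of Option List) and
Kind 1 (No-Operation) follows the option format of RFC 793.

```python
SACK_PERMITTED_KIND = 4  # Kind value proposed in this draft


def syn_permits_sack(options: bytes) -> bool:
    """Return True if a SYN's options field carries SACK-permitted.

    `options` is the raw options field of the TCP header (hypothetical
    input for this sketch).
    """
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # End of Option List
            return False
        if kind == 1:            # No-Operation: single byte, no length
            i += 1
            continue
        if i + 1 >= len(options):
            return False         # truncated option, give up
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            return False         # malformed length, give up
        if kind == SACK_PERMITTED_KIND and length == 2:
            return True
        i += length              # skip over any other option
    return False
```

For example, an options field holding a Maximum Segment Size option,
two No-Operation bytes, and the SACK-permitted option would be
accepted, while a field holding only the MSS option would not.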
3. SACK Option Format
The SACK option is to be used to convey extended acknowledgment
information from the receiver to the sender over an established
TCP connection.
    TCP SACK Option:

    Kind: 5

    Length: Variable

                      +--------+--------+
                      | Kind=5 | Length |
    +--------+--------+--------+--------+
    |      Left Edge of 1st Block       |
    +--------+--------+--------+--------+
    |      Right Edge of 1st Block      |
    +--------+--------+--------+--------+
    |                                   |
    /            . . .                  /
    |                                   |
    +--------+--------+--------+--------+
    |      Left Edge of nth Block       |
    +--------+--------+--------+--------+
    |      Right Edge of nth Block      |
    +--------+--------+--------+--------+
The SACK option is to be sent by a data receiver to inform the
data sender of non-contiguous blocks of data that have been
received and queued. The data receiver awaits the receipt of data
(perhaps by means of retransmissions) to fill the gaps in sequence
space between received blocks. When missing segments are
received, the data receiver acknowledges the data normally by
advancing the left window edge in the Acknowledgment Number field
of the TCP header. The SACK option does not change the meaning of
the Acknowledgment Number field, whose value will still specify
the left window edge, i.e., one byte beyond the last sequence
number of fully-received data.
The SACK option provides additional information which the data
transmitter can use to optimize retransmissions. The TCP data
receiver includes the SACK option in an acknowledgment segment
whenever it has data that is queued and unacknowledged.
The SACK option may be sent only when the TCP has received the
SACK-permitted option in the SYN segment for that connection.
This option contains a list of some of the blocks of contiguous
sequence space occupied by data that has been received and queued
within the window.
Each contiguous block of data queued at the data receiver is
defined in the SACK option by two 32-bit unsigned integers in
network byte order:
* Left Edge of Block
This is the first sequence number of this block.
* Right Edge of Block
This is the sequence number immediately following the last
sequence number of this block.
Each block represents received bytes of data that are contiguous and
isolated; that is, the bytes just below the block, (Left Edge of
Block - 1), and just above the block, (Right Edge of Block), have
not been received.
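
The layout and block definitions above can be sketched as a pair of
encode/decode helpers. This is a minimal Python illustration under
stated assumptions: the names encode_sack and decode_sack are
hypothetical, Kind 5 is the value proposed in this draft, and each
block is a (left, right) pair of 32-bit unsigned integers in network
byte order, with the right edge being the sequence number immediately
following the block.

```python
import struct

SACK_KIND = 5  # Kind value proposed in this draft


def encode_sack(blocks):
    """Encode a list of (left_edge, right_edge) pairs as a SACK option."""
    length = 2 + 8 * len(blocks)          # Kind + Length + 8 bytes per block
    option = struct.pack("!BB", SACK_KIND, length)
    for left, right in blocks:
        option += struct.pack("!II", left, right)  # network byte order
    return option


def decode_sack(option):
    """Decode a SACK option back into a list of (left_edge, right_edge)."""
    kind, length = struct.unpack_from("!BB", option, 0)
    if kind != SACK_KIND or length != len(option) or (length - 2) % 8 != 0:
        raise ValueError("not a well-formed SACK option")
    return [struct.unpack_from("!II", option, offset)
            for offset in range(2, length, 8)]
```

A single block covering sequence numbers 5500 through 5999 would be
passed as (5500, 6000) and would occupy 10 bytes on the wire.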
A SACK option that specifies n blocks will have a length of
8*n+2 bytes, so the 40 bytes available for TCP options can
specify a maximum of 4 blocks. It is expected that SACK will
often be used in conjunction with the Timestamp option used for
RTTM [Jacobson92], which takes an additional 10 bytes (plus two
bytes of padding); thus a maximum of 3 SACK blocks will be
allowed in this case.
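
The arithmetic in the preceding paragraph can be checked with a few
lines of Python. The function name max_sack_blocks and its parameter
are hypothetical; the constants (40 bytes of option space, 2 bytes of
Kind and Length, 8 bytes per block) are the ones stated above.

```python
TCP_OPTION_SPACE = 40  # bytes available for TCP options


def max_sack_blocks(other_option_bytes=0):
    """Largest n such that 8*n + 2 fits in the remaining option space."""
    available = TCP_OPTION_SPACE - other_option_bytes
    return max(0, (available - 2) // 8)
```

With no other options this gives 4 blocks; with the Timestamp option
and its padding (10 + 2 = 12 bytes) it gives 3 blocks.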
The SACK option is advisory, in that, while it notifies the data sender
that the data receiver has received the indicated segments, the
data receiver is permitted to later discard data which have been
reported in a SACK option. A detailed discussion of the advisory
nature of the SACK option appears below, following the discussion
of the normal case.
4. Generating SACK Options: Data Receiver Behavior
If the data receiver has received a SACK-Permitted option on the
SYN for this connection, the data receiver MAY elect to generate
SACK options as described below. If the data receiver generates
SACK options under any circumstance, it SHOULD generate them under
all permitted circumstances. If the data receiver has not received
a SACK-Permitted option for a given connection, it MUST NOT send
SACK options on that connection.
If sent at all, SACK options SHOULD be included in all ACKs which
do not ACK the highest sequence number in the data receiver's queue.
In this situation the network has lost or mis-ordered data, such
that the receiver holds non-contiguous data in its queue. RFC
1122, Section 4.2.2.21, discusses the reasons for the receiver to
send ACKs in response to additional segments received in this
state. The receiver SHOULD send an ACK for every valid segment
that arrives containing new data, and each of these "duplicate"
ACKs SHOULD bear a SACK option.
If the data receiver chooses to send a SACK option, the following
rules apply:
* The first SACK block (i.e., the one immediately following the
kind and length fields in the option) MUST specify the
contiguous block of data containing the segment which triggered
this ACK, unless that segment advanced the Acknowledgment Number
field in the header. This assures that the ACK with the SACK
option reflects the most recent state change at the data receiver.
* The data receiver SHOULD include as many distinct SACK blocks
as possible in the SACK option. Note that the maximum
available option space may not be sufficient to report all
blocks present in the receiver's queue.
* The SACK option SHOULD be filled out by repeating the most
recently reported SACK blocks (based on first SACK blocks in
previous SACK options) that are not subsets of a SACK block
already included in the SACK option being constructed. This
assures that in normal operation every SACK block is repeated
several times. (At least three times for large-window TCP
implementations [Jacobson92]).
It is very important that the SACK option always reports
the block containing the most recently received segment, because
this provides the sender with the most up-to-date information
about the state of the network and the data receiver's queue.
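
One way a data receiver might follow these rules is sketched below in
Python. This is an illustration under stated assumptions, not a
reference implementation: the class SackReceiver and its attribute
names are hypothetical, sequence-number wraparound is ignored (plain
integers are compared), and no data is ever discarded from the queue.
Queued blocks are kept in order of most recent change, so the block
containing the triggering segment is always reported first and older
blocks are repeated as space allows.

```python
class SackReceiver:
    """Tracks out-of-order data and builds the SACK block list."""

    def __init__(self, rcv_nxt, max_blocks=3):
        self.rcv_nxt = rcv_nxt        # left window edge (next byte expected)
        self.max_blocks = max_blocks  # option space limit, e.g. 3 with timestamps
        self.blocks = []              # queued (left, right) blocks, newest first

    def receive(self, seq, length):
        """Process one segment; return (ack_number, sack_blocks)."""
        left, right = max(seq, self.rcv_nxt), seq + length
        if right > self.rcv_nxt:      # segment carries some new data
            kept = []
            for b_left, b_right in self.blocks:
                if b_left <= right and left <= b_right:
                    # Overlapping or adjacent: absorb into one block.
                    left, right = min(left, b_left), max(right, b_right)
                else:
                    kept.append((b_left, b_right))
            if left <= self.rcv_nxt:
                # Segment advanced the Acknowledgment Number.
                self.rcv_nxt = right
                self.blocks = kept
            else:
                # Block containing the triggering segment goes first.
                self.blocks = [(left, right)] + kept
        return self.rcv_nxt, self.blocks[:self.max_blocks]
```

Feeding this sketch segments of 500 bytes starting at 5000, 6000,
7000, and 8000 (with the left window edge at 5000) yields an
Acknowledgment Number of 5500 and the blocks 8000-8500, 7000-7500,
and 6000-6500, in that order.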
5. Interpreting the SACK Option and Retransmission Strategy:
Data Sender Behavior
When receiving an ACK containing a SACK option, the data sender
SHOULD record the selective acknowledgement for future reference.
The data sender is assumed to have a retransmission queue
that contains the segments that have been transmitted but not yet
acknowledged, in sequence-number order. If the data sender
performs re-packetization before retransmission, the block
boundaries in a SACK option that it receives may not fall on
boundaries of segments in the retransmission queue; however, this
does not pose a serious difficulty for the sender.
One possible implementation of the sender's behavior is as
follows. Let us suppose that for each segment in the
retransmission queue there is a (new) flag bit "RESEND", to be used
to indicate that this particular segment is on the list to be
retransmitted. When a segment is first transmitted, it will be
entered into the retransmission queue with its RESEND bit off.
When an acknowledgment segment arrives containing a SACK option,
the data sender will turn on the ACK'd bits for segments that
have been selectively acknowledged. More specifically, for each
block in the SACK option, the data sender will turn on the
ACK'ed flags for all segments in the retransmission queue that are
wholly contained within that block. This requires straightforward
sequence number comparisons.
After the ACKed bit is turned on (as the result of processing a received
SACK option), the data sender will skip that segment during
any later retransmission. Any segment that has the ACKed bit
turned off and is less than the highest ACKed segment is
available for retransmission.
However, after a retransmit timeout all of the ACKed bits are
turned off. A segment will not be dequeued and its buffer freed
until the left window edge is advanced over it.
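
The sender behavior described in this section might be sketched as
follows. This Python fragment is illustrative only: the names Segment
and RetransmissionQueue are hypothetical, the per-segment flag is
stored in a field called acked, sequence-number wraparound is ignored,
and congestion control is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int
    length: int
    acked: bool = False  # the per-segment flag bit; off when first sent

    @property
    def end(self):
        return self.seq + self.length


class RetransmissionQueue:
    def __init__(self):
        self.segments = []  # kept in sequence-number order

    def transmit(self, seq, length):
        self.segments.append(Segment(seq, length))

    def on_ack(self, ack, sack_blocks=()):
        # Dequeue only segments the left window edge has advanced over.
        self.segments = [s for s in self.segments if s.end > ack]
        # Flag every segment wholly contained within a SACK block.
        for left, right in sack_blocks:
            for s in self.segments:
                if left <= s.seq and s.end <= right:
                    s.acked = True

    def retransmit_candidates(self):
        # Unflagged segments below the highest flagged segment.
        flagged = [s.seq for s in self.segments if s.acked]
        if not flagged:
            return []
        highest = max(flagged)
        return [s.seq for s in self.segments
                if not s.acked and s.seq < highest]

    def on_timeout(self):
        # After a retransmit timeout all of the flags are turned off.
        for s in self.segments:
            s.acked = False
```

Note that flagged segments stay in the queue: a segment is removed
only when the Acknowledgment Number passes it, never because of a
SACK block alone.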
5.1 Congestion Control Issues
This document does not attempt to specify in detail the congestion
control algorithms for implementations of TCP with SACK. However,
the congestion control algorithms present in the de facto standard
TCP implementations MUST be preserved [Stevens94]. In particular,
to preserve robustness in the presence of packets reordered by the
network, recovery is not triggered by a single ACK reporting
out-of-order packets at the receiver. Further, during recovery the
data sender limits the number of segments sent in response to each
ACK. Existing implementations limit the data sender to sending one
segment during Reno-style fast recovery, or to two segments during
slow-start [Jacobson88]. Other aspects of congestion control, such
as reducing the congestion window in response to congestion, must
similarly be preserved.
The use of time-outs as a fall-back mechanism for detecting dropped
packets is unchanged by the SACK option. Because the data receiver
is allowed to discard SACKed data, when a retransmit timeout
occurs the data sender MUST ignore prior SACK information in determining
which data to retransmit.
Future research into congestion control algorithms may take
advantage of the additional information provided by SACK. One such
area for future research concerns modifications to TCP for a
wireless or satellite environment where packet loss is not
necessarily an indication of congestion.
6. Efficiency and Worst Case Behavior
If the return path carrying ACKs and SACK options were lossless,
one block per SACK option packet would always be sufficient. Every
segment arriving while the data receiver holds discontinuous data
would cause the data receiver to send an ACK with a SACK option
containing the one altered block in the receiver's queue. The data
sender is thus able to construct a precise replica of the
receiver's queue by taking the union of all the first SACK blocks.
However, since the return path is not lossless, the SACK option is
defined to include more than one SACK block in a single packet.
The redundant blocks in the SACK option packet increase the
robustness of SACK delivery in the presence of lost ACKs. For a
receiver that is also using the time stamp option [Jacobson92], the
SACK option has room to include three SACK blocks. Thus each SACK
block will generally be repeated at least three times, in three
successive ACK packets. However, if all of the ACK packets
reporting a particular SACK block are dropped, then the sender
might assume that the data in that SACK block has not been
received, and unnecessarily retransmit those segments.
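
The effect of the redundant blocks can be illustrated with a small
Python sketch. The helper names union_of_blocks and sender_view are
hypothetical, each SACK option is modeled as a list of (left, right)
pairs with the first block first, and data later covered by the
Acknowledgment Number is not removed from the result.

```python
def union_of_blocks(blocks):
    """Merge (left, right) blocks that overlap or touch."""
    merged = []
    for left, right in sorted(blocks):
        if merged and left <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


def sender_view(sack_options, first_only=False):
    """Rebuild the receiver's queue from the SACK options that arrived.

    With first_only=True only the first block of each option is used,
    which suffices when no ACK is lost.
    """
    blocks = []
    for option in sack_options:
        blocks.extend(option[:1] if first_only else option)
    return union_of_blocks(blocks)
```

If three successive options report 6000-6500, then 7000-7500, then
8000-8500 as their first blocks, the union of first blocks recovers
all three. If the second ACK is lost, the first blocks alone miss
7000-7500, but the repeated blocks in the third option still
report it.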
The worst-case conditions necessary for the sender to needlessly
retransmit data are discussed in more detail in a separate document
[Floyd96]. As is shown in that paper, the exposure of TCP with
SACK in regard to the unnecessary retransmission of packets is
strictly less than the exposure of current implementations of TCP.
In current implementations of TCP, the sender can unnecessarily
retransmit packets whenever multiple packets dropped from a single
window of data are followed by a slow-start. In contrast, as is
shown in [Floyd96], the simplest condition that can cause
duplicated (needlessly retransmitted) data sent to the receiver for
TCP with SACK requires a sender congestion window of 11 packets and a
precise (and therefore rather improbable) sequence of 4 lost data
packets and 3 lost ACKs for that window of data.
7. SACK Option Examples
The following examples attempt to demonstrate the proper behavior of
SACK generation by the data receiver.
Assume the left window edge is 5000 and that the data transmitter
sends a burst of 8 segments, each containing 500 data bytes.
Case 1: The first 4 segments are received but the last 4 are
dropped.
The data receiver will return a normal TCP ACK segment
acknowledging sequence number 7000, with no SACK option.
Case 2: The first segment is dropped but the remaining 7 are
received.
Upon receiving each of the last seven packets, the data
receiver will return a TCP ACK segment that acknowledges
sequence number 5000 and contains a SACK option specifying
one block of queued data:
    Triggering    ACK       Left Edge   Right Edge
    Segment

    5000          (lost)
    5500          5000      5500        6000
    6000          5000      5500        6500
    6500          5000      5500        7000
    7000          5000      5500        7500
    7500          5000      5500        8000
    8000          5000      5500        8500
    8500          5000      5500        9000
Case 3: The 2nd, 4th, 6th, and 8th (last) segments are
dropped.
The data receiver ACKs the first packet normally. The
third, fifth, and seventh packets trigger SACK options as
follows:
    Triggering  ACK      First Block    2nd Block      3rd Block
    Segment              Left   Right   Left   Right   Left   Right
                         Edge   Edge    Edge   Edge    Edge   Edge

    5000        5500
    5500        (lost)
    6000        5500     6000   6500
    6500        (lost)
    7000        5500     7000   7500    6000   6500
    7500        (lost)
    8000        5500     8000   8500    7000   7500    6000   6500
    8500        (lost)
Suppose at this point, the 4th packet is received out of
order. (This could either be because the data was badly
misordered in the network, or because the 2nd packet was
retransmitted and lost). At this point the data receiver has
only two SACK blocks to report. The data receiver replies
with the following Selective Acknowledgement:
    Triggering  ACK      First Block    2nd Block      3rd Block
    Segment              Left   Right   Left   Right   Left   Right
                         Edge   Edge    Edge   Edge    Edge   Edge

    6500        5500     6000   7500    8000   8500
Suppose at this point, the 2nd segment is received. The
data receiver then replies with the following Selective
Acknowledgement:
    Triggering  ACK      First Block    2nd Block      3rd Block
    Segment              Left   Right   Left   Right   Left   Right
                         Edge   Edge    Edge   Edge    Edge   Edge

    5500        7500     8000   8500
8. Data Receiver Reneging
Note that the data receiver is permitted to discard data in its
queue that has not been acknowledged to the data sender, even if
the data has already been reported in a SACK option. Such
discarding of SACKed packets is discouraged, but may be used if the
receiver runs out of buffer space.
The data receiver MAY elect not to keep data which it has reported
in a SACK option. In this case, the receiver SACK generation is
additionally qualified:
* The first SACK block MUST reflect the newest segment. Even
if the newest segment is going to be discarded and the receiver
has already discarded adjacent segments, the first SACK block
MUST report, at a minimum, the left and right edges of the
newest segment.
* Except for the newest segment, all SACK blocks MUST NOT
report any old data which is no longer actually held by the
receiver.
Since the data receiver may later discard data reported in a SACK
option, the sender MUST NOT discard data before it is acknowledged
by the Acknowledgment Number field in the TCP header.
9. Security Considerations
This document neither strengthens nor weakens TCP's current
security properties.
10. Support
Matt Mathis and Jamshid Mahdavi are supported by the National
Science Foundation Grant No. NCR-9415552. Sally Floyd is supported
by the Director, Office of Energy Research, Scientific Computing
Staff, of the U.S. Department of Energy under Contract No.
DE-AC03-76F00098. Allyn Romanow is supported by Sun Microsystems.
11. References
[Cheriton88] Cheriton, D., "VMTP: Versatile Message Transaction
Protocol", RFC 1045, Stanford University, February 1988.
[Clark87] Clark, D., Lambert, M., and L. Zhang, "NETBLT: A Bulk
Data Transfer Protocol", RFC 998, MIT, March 1987.
[Fall95] Fall, K. and Floyd, S., "Comparisons of Tahoe, Reno,
and Sack TCP", ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z, December 1995.
[Floyd96] Floyd, S., "Issues of TCP with SACK",
ftp://ftp.ee.lbl.gov/papers/issues_sa.ps.Z, January 1996.
[Huitema81] Huitema, C., and Valet, I., "An Experiment on High
Speed File Transfer using Satellite Links", 7th Data Communication
Symposium, Mexico, October 1981.
[Jacobson88] Jacobson, V., "Congestion Avoidance and Control", to
be presented at SIGCOMM '88, Stanford, CA., August 1988.
[Jacobson92] Jacobson, V., Braden, R., and Borman, D., "TCP
Extensions for High Performance", RFC 1323, May 1992.
[Keshav94] Keshav, presentation to the Internet End-to-End Research Group,
November 1994.
[Mathis95] Mathis, M., and Mahdavi, J., "TCP Forward
Acknowledgement Option", presentation to the Internet End-to-End
Research Group, June 1995.
[Partridge87] Partridge, C., "Private Communication", February
1987.
[Postel81] Postel, J., "Transmission Control Protocol - DARPA
Internet Program Protocol Specification", RFC 793, DARPA,
September 1981.
[Stevens94] Stevens, W., "TCP/IP Illustrated, Volume 1: The
Protocols", Addison-Wesley, 1994.
[Strayer92] Strayer, T., Dempsey, B., and Weaver, A., "XTP -- the
xpress transfer protocol", Addison-Wesley Publishing Company,
1992.
[Velten84] Velten, D., Hinden, R., and J. Sax, "Reliable Data
Protocol", RFC 908, BBN, July 1984.