TCP Implementation Working Group Joe R. Doupnik
Internet Draft Utah State University
Expiration Date: December 1999 June 1999
draft-doupnik-tcpimpl-nagle-mode-00.txt
A new TCP transmission policy replacing Nagle mode
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
Both Nagle mode and delayed ACKs attempt to conserve network and host
machine resources by delaying transmissions in the expectation that
the current material can be piggybacked onto a future transmission.
Unfortunately, when both mechanisms are active at the same time on a
connection, a deadlock can arise that is broken only by the arrival
of new data for transmission or the firing of the delayed ACK timer.
The result is classical timer-based ACKing, which with the common
200ms ACK delay yields five exchanges per second.
This memo discusses a new TCP transmission policy that decides when
to send segments using only information known to the transmitter. It
groups octets by filling segments, and it sends a small segment when
the application indicates that no more data are immediately
available, not when ACKs arrive. It works well with delayed ACKs and
avoids deadlocks with them. It is automatic and does not need to be
turned off. It is a suitable replacement for Nagle mode.
A new TCP transmission policy replacing Nagle mode [Page 2]
Table of Contents
1.0 Introduction
1.1 Maximum Segment Size, MSS
1.2 Nagle mode
1.3 Strict Nagle mode
1.4 Strict Nagle example
1.5 Liberal Nagle mode
1.6 Delayed ACKs
2.0 New transmission policy
2.1 Formal statement of new policy
2.2 Discussion
2.3 Operation between like and unlike TCP stacks
3.0 Experimental results
4.0 Conclusions
5.0 Security Considerations
6.0 Acknowledgments
7.0 References
8.0 Author's address
1.0 Introduction
Nagle mode [TCP:1] and delayed ACKs are TCP heuristics designed to
reduce network traffic and the consequent load on both originating
and receiving hosts. They work by slightly different means, but the
common factor is to delay a transmission in the expectation that
another will be required shortly, so that the present and next
transmissions can be combined into one (piggybacking).
When both modes are active, as they should be to conserve resources,
they may interact: the transmitter holds data while the receiver
holds (delays) its ACKs until a very slow (200ms) timer forces them
out. The delay matters most when the conversation alternates between
hosts: one side makes a request, the other responds, and the pattern
repeats. The response is delayed until the entire request has arrived
at the receiver, yet the next-to-last packet of the request can elicit
a delayed ACK, which in turn delays release of the last packet held by
the Nagle condition. A delay in sending all octets from one side or
the other can slow the conversation to about 1/delayed_ack_time
exchanges per second (typically 5 exchanges per second). Such
patterns are common in web serving, SMTP mail queues, and other modern
applications.
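As a rough check of this bound, the Python sketch below (function names are hypothetical) computes the exchange rate when every request/response cycle waits out one delayed-ACK interval. It ignores the time spent actually transferring data, so real rates will be somewhat lower than the bound; the 33KB reply size is an illustrative assumption.

```python
def timer_bound_rate(delayed_ack_s: float) -> float:
    """Upper bound on exchanges per second when every request/response
    cycle must wait out one delayed-ACK timer (the held-tail deadlock)."""
    return 1.0 / delayed_ack_s


def timer_bound_throughput(delayed_ack_s: float, reply_bytes: int) -> float:
    """Bytes per second delivered when each reply costs one timer interval,
    regardless of how fast the underlying link is."""
    return timer_bound_rate(delayed_ack_s) * reply_bytes


# 200 ms delayed-ACK timer and a hypothetical 33 KB reply.
print(timer_bound_rate(0.2))                   # 5.0
print(timer_bound_throughput(0.2, 33 * 1024))  # 168960.0
```

The point of the second function is that a timer-bound connection delivers a fixed number of replies per second, so throughput scales with reply size rather than with link speed.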
Today many application programmers turn off Nagle mode to overcome
the interaction. They cannot control delayed ACKs, which are often
enabled or disabled only on a system-wide basis. Unfortunately,
turning off Nagle mode increases network traffic, host machine
workload, and router workload. If applications cannot turn off Nagle
mode to avoid the delayed ACK effect, then UDP is the next candidate,
and that means no regard for the network and little regard for lost
packets (or a lot of work in the application to handle them). Today's
growing request/reply traffic would be better served by responsive
TCP-based communication.
Doupnik Page 2
1.1 Maximum Segment Size, MSS
In the following discussion we use MSS, Maximum Segment Size, as the
test criterion for full segments. What is meant is the full capacity
for TCP data after allowing for IP and TCP headers and options, which
RFC1122 [TCP:2] represents as Eff.snd.MSS. Also, some hosts treat a
power-of-two buffer size as a full segment even though the MSS is
larger. Nevertheless, we use the term MSS to mean the host's notion of
its largest segment size at a given moment.
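For concreteness, RFC1122 section 4.2.2.6 computes the effective send MSS as min(SendMSS + 20, MMS_S) - TCPhdrsize - IPoptionsize, where SendMSS is the MSS announced by the peer and MMS_S is the largest transport-layer message the local IP layer can send. A minimal Python sketch of that formula follows; the function name and the Ethernet figures in the example (1500-octet MTU, 20-octet IP header, optional 12-octet TCP timestamp option) are illustrative assumptions.

```python
def eff_snd_mss(send_mss: int, mms_s: int, tcp_hdr_size: int = 20,
                ip_option_size: int = 0) -> int:
    """Effective send MSS per the RFC 1122 section 4.2.2.6 formula.

    send_mss: MSS announced by the peer (536 if none was received)
    mms_s: largest transport-layer message the local IP layer can send
    tcp_hdr_size: TCP header including options, in octets
    ip_option_size: IP options carried on each datagram, in octets
    """
    # SendMSS excludes the 20-octet basic TCP header, MMS_S includes it.
    return min(send_mss + 20, mms_s) - tcp_hdr_size - ip_option_size


# Ethernet: 1500-octet MTU minus a 20-octet IP header gives MMS_S = 1480.
print(eff_snd_mss(1460, 1480))                   # 1460
# A 12-octet timestamp option grows the TCP header to 32 octets.
print(eff_snd_mss(1460, 1480, tcp_hdr_size=32))  # 1448
```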
1.2 Nagle mode
The current definition of Nagle mode is found in RFC1122 [TCP:2],
section 4.2.3.4, "When to Send Data":
(start quote)
The Nagle algorithm is generally as follows:
If there is unacknowledged data (i.e., SND.NXT > SND.UNA),
then the sending TCP buffers all user data (regardless of the
PSH bit), until the outstanding data has been acknowledged or
until the TCP can send a full-sized segment (Eff.snd.MSS
bytes; see Section 4.2.2.6).
(end quote)
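The quoted rule reduces to a single test per candidate segment. The Python sketch below is one hypothetical rendering; the parameter names mirror the RFC1122 variables, and sequence number wraparound is ignored for simplicity.

```python
def nagle_may_send(snd_nxt: int, snd_una: int, seg_len: int,
                   eff_snd_mss: int) -> bool:
    """Nagle test as worded in RFC 1122 section 4.2.3.4.

    Data may go out when nothing is outstanding (SND.NXT == SND.UNA)
    or when a full-sized segment can be sent; otherwise it is buffered.
    """
    nothing_outstanding = snd_nxt == snd_una
    full_segment = seg_len >= eff_snd_mss
    return nothing_outstanding or full_segment


# A small segment with 1460 octets outstanding is held back.
print(nagle_may_send(snd_nxt=2460, snd_una=1000, seg_len=10, eff_snd_mss=1460))
```

Note that the test never asks whether the small segment is the end of the application's data; that omission is what the rest of this memo addresses.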
Nagle mode has been implemented in at least two different forms,
leading to different behaviors. Each is discussed below. The forms
differ in how they answer the question: if more than one Eff.snd.MSS
of data has accumulated, how much beyond full segments may be sent at
once?
The strict approach answers by sending only full segments; a final
short segment is retained for later release. The liberal approach
answers by sending all available data, including a possible (indeed
likely) short final piece. The labels strict Nagle and liberal Nagle
are used in this document for purposes of discussion. As a matter of
interest, TCP/IP stacks derived from BSD sources often use the strict
Nagle mechanism.
1.3 Strict Nagle mode
The strict Nagle form transmits only full-sized segments while
awaiting ACKs for previously sent data. A partial segment of unsent
data remaining afterward is retained in the transmit buffer until all
preceding data have been ACKed, or until more application data arrive
to compose full-length segments. Window size and the congestion
avoidance criteria of Van Jacobson [TCP:3] may keep even full
segments unsent for some time.
Holding back the last partial segment lets it be grouped with later
application data, so that full segments are sent when possible.
Delayed ACKs assist grouping in the transmitter by allowing time for
the application to add more octets, assuming there is more data and
the receiver's window is large enough. But they also introduce the
problem of delaying release of the held tail octets. Up to the tail
segment, strict Nagle mode does a fine job of forming full-length
segments for transmission. Timely release of held tail octets is the
essence of the interaction problem discussed in this document.
1.4 Strict Nagle example
As an example, suppose the TCP buffer is empty and the application
writes 3.5 MSS worth of data to it. The remote host's window size and
congestion avoidance criteria are applied to determine the size of
the candidate transmission. We consider two cases: one where all data
are allowed and one where less is allowed.
In the first case all octets are allowed. A full MSS of data is
fetched from the buffer and the Nagle test is applied. It passes
because the size is a full MSS, and the segment is sent. The
transmitter loops back for a second fetch. The Nagle test finds a full
segment and transmits it even though unACKed data from the first
transmission exist. This repeats until the last piece, 0.5 MSS, is
fetched. The Nagle test fails for it because it is smaller than a full
segment and there are unACKed data in transit. The test keeps failing
until no unACKed data remain (or enough application data arrive). The
small tail piece is held until all preceding octets have been ACKed,
not just the first or second segment. Thus up to three ACKs may be
required to release the tail. This is a "held tail" effect.
In the second case, windowing and congestion avoidance allow only
part of the data to be transmitted, say two MSS worth. The first two
segments are full length and are sent promptly. Nothing more can be
sent until either a fresh write from the application or the arrival
of a packet creates another transmission opportunity. The remaining
1.5 MSS of data stay blocked and invisible to Nagle tests. Suppose the
application does not write more data. The transmitter awaits a packet
from the receiver that causes the transmission code to be called
again. At that time as many full segments as windowing and congestion
avoidance permit are sent. The partial segment remainder is blocked by
strict Nagle rules because it is smaller than a full segment and
unACKed data are in transit. Up to three ACKs may be required to
release the tail. This is a "held tail" effect.
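Both cases can be reproduced with a small simulation of one pass through a strict Nagle transmitter. This is a Python sketch under simplifying assumptions (octet counts only, no timers, an MSS of 1460 octets in the examples), not code from any particular stack; the function name is hypothetical.

```python
def strict_nagle_pass(unsent: int, allowed: int, mss: int, unacked: int):
    """One invocation of a hypothetical strict-Nagle transmitter.

    unsent:  octets waiting in the send buffer
    allowed: octets the window and congestion limits permit right now
    unacked: octets sent but not yet ACKed when the pass starts
    Returns (segments_sent, octets_left_in_buffer).
    """
    segments = []
    budget = min(unsent, allowed)
    while budget > 0:
        seg = min(mss, budget)
        # Strict test, applied segment by segment: a short segment goes
        # out only when nothing at all is outstanding.
        if seg < mss and unacked > 0:
            break
        segments.append(seg)
        unacked += seg
        budget -= seg
        unsent -= seg
    return segments, unsent
```

With 3.5 MSS written and everything allowed, strict_nagle_pass(5110, 10**6, 1460, 0) returns ([1460, 1460, 1460], 730): the 0.5 MSS tail is held. In the second case, strict_nagle_pass(5110, 2920, 1460, 0) returns ([1460, 1460], 2190), and a later pass after one ACK, strict_nagle_pass(2190, 10**6, 1460, 1460), returns ([1460], 730), again holding the tail.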
Unfortunately, the last ACK may be delayed, and thus the last piece
may not go onto the wire for the duration of the receiver's delayed
ACK timer. The receiver does not know that the transmitter has data
blocked waiting for the final ACK (as opposed to, say, data being
forced out by new writes from the application). Waiting for the last
ACK can take the full delayed ACK interval, often 200ms, and that
results in timer-based ACKing.
1.5 Liberal Nagle mode
The second form of Nagle mode applies the full segment rule from
RFC1122 but interprets it as allowing a trailing partial segment to
be transmitted along with full segments while unACKed data are
outstanding. In essence, the size determination is made on all
allowed unsent data rather than on each candidate segment
individually, as in the strict Nagle case. The test is applied to all
unsent data after they have been reduced by the remote host's window
capacity and congestion avoidance limits; that is, it is really a test
on the minimum of "allowed" (by window size and congestion avoidance)
and "available" (the number of unsent octets visible to the TCP
transmitter at that moment). Strict Nagle mode, of course, experiences
the same size filtering before data reach it.
The liberal Nagle form reduces, but does not eliminate, the incidence
of held tails, as the following example illustrates, whereas strict
Nagle mode creates them at almost every application write. Liberal
Nagle blocks with a partial segment when window size and congestion
avoidance combine to hold back data during the next-to-last
transmission opportunity, leaving only a fraction of an MSS of data
for the last transmission opportunity. The initial hold-back is
invisible to Nagle mode at that time, so the small piece is not
available to be included with the full segments. Because unACKed data
may remain from the previous send, the small segment stays blocked
until the preceding octets have been ACKed. Large transmitter and
small receiver TCP window sizes, and slow links, contribute markedly
to this held tail effect in liberal Nagle mode.
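A matching Python sketch for the liberal form applies the size test once, to min(unsent, allowed), rather than segment by segment. As before, the function name and octet counts are hypothetical, and timers are not modeled.

```python
def liberal_nagle_pass(unsent: int, allowed: int, mss: int, unacked: int):
    """One invocation of a hypothetical liberal-Nagle transmitter.

    The size test is applied once, to min(unsent, allowed).  If that
    amount is at least a full segment (or nothing is outstanding), all of
    it goes out, including a short final piece.
    Returns (segments_sent, octets_left_in_buffer).
    """
    candidate = min(unsent, allowed)
    if candidate == 0 or (candidate < mss and unacked > 0):
        return [], unsent              # everything held until an ACK arrives
    full, tail = divmod(candidate, mss)
    segments = [mss] * full + ([tail] if tail else [])
    return segments, unsent - candidate
```

With everything allowed, liberal_nagle_pass(5110, 10**6, 1460, 0) sends [1460, 1460, 1460, 730] and holds nothing. The held tail described above appears when the window admits only three full segments: liberal_nagle_pass(5110, 4380, 1460, 0) returns ([1460, 1460, 1460], 730), and at the next opportunity liberal_nagle_pass(730, 10**6, 1460, 4380) returns ([], 730) because the 0.5 MSS piece is short and unACKed data remain.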
One may infer that liberal Nagle mode was created in part to reduce
the incidence of the held tail problem. Alas, it reduces but does not
eliminate it, and in the process it may send small segments within
application data.
1.6 Delayed ACKs
Delayed ACKs are a popular TCP mechanism for avoiding an ACK for each
received segment. Typically, every other arrival generates an ACK.
The mechanism is to create a delayed ACK queue which is flushed to
the wire as a single ACK when a delayed ACK timer expires, when the
queue length reaches a certain value (such as two entries), or when
the local machine sends data. Although ACKs are tinygrams, they take
time and CPU resources to create and to receive, and their per-packet
routing load is the same as that of full-length segments. Even on a
local wire without routers, sending an ACK for each arriving segment
creates noticeable additional load on both machines and on network
capacity. Thus delaying to coalesce two or more ACKs is a good
concept, following the same philosophy as grouping octets into full
packets rather than many smaller ones.
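The three flush conditions listed above can be modeled as a small state machine. The class below is a hypothetical Python sketch, assuming a 200ms timer and a two-entry queue limit; it counts an ACK piggybacked on outgoing data as one ACK event and represents time as plain seconds passed in by the caller.

```python
class DelayedAckQueue:
    """Hypothetical model of a receiver's delayed-ACK queue."""

    def __init__(self, delay: float = 0.2, max_pending: int = 2):
        self.delay = delay              # delayed-ACK timer, seconds
        self.max_pending = max_pending  # queue length that forces an ACK
        self.pending = 0                # segments received but not yet ACKed
        self.deadline = None            # when the running timer fires
        self.acks_sent = 0              # ACK events put on the wire

    def _flush(self) -> None:
        if self.pending:
            self.acks_sent += 1         # one ACK covers every queued segment
            self.pending = 0
            self.deadline = None

    def segment_arrived(self, now: float) -> None:
        self.pending += 1
        if self.deadline is None:
            self.deadline = now + self.delay  # timer starts on first entry
        if self.pending >= self.max_pending:
            self._flush()                     # queue limit reached

    def local_send(self) -> None:
        self._flush()                   # ACK rides on outgoing data

    def tick(self, now: float) -> None:
        if self.deadline is not None and now >= self.deadline:
            self._flush()               # timer expiry
```

A lone arriving segment (the held-tail situation) produces no ACK until tick() reaches the 0.2 second deadline, whereas two back-to-back segments, or a local send, release the ACK at once.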
To paraphrase private communication from John Nagle, delaying ACKs is
a guess: that more data will arrive immediately, or that the receiver
will transmit within a very short time, or that the receiver does not
care about immediacy, so that delaying will be a good tactic.
Unfortunately, the receiver has little basis for making the guess: the
sending machine provides no hints, and the local receiving application
gives no notice of data it is about to send. The delay time is fixed,
so it is bound to be a poor match for either local or long-distance
communication. Nor can the PUSH bit serve as a hint, because it is
carried by the last, held segment. At best, a receiver may infer that
tiny arrivals come from human typing, for which the operating system
will provide an immediate echo.
Delayed ACKs would be more effective if the receiver adjusted the
delay time to match the session, say in a manner similar to making
round-trip time estimates. One or two round-trip times seems
appropriate, where that information is available. One-way transfers
such as the FTP data channel make this approach impractical, because
a receiver that sends no data has no round trips of its own to time.
In addition, fine-grained timers for crisp responses are a burden on
the operating system and may not be available for the short intervals
of local area networking. For example, the 200ms delay of the fast
timer in many BSD systems is long even relative to many of today's
long-distance links. Thus a dynamic delay time is difficult to
achieve at present, and it becomes more so as network speeds increase.
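One possible shape for the adaptive delay suggested above is sketched below in Python. It is entirely hypothetical: it borrows the 1/8-gain moving average commonly used for sender round-trip estimates, delays ACKs by 1.5 smoothed round trips (within the suggested one to two), clamps the result, and falls back to the fixed 200ms timer when, as in a one-way transfer, no samples exist.

```python
class AdaptiveAckDelay:
    """Hypothetical receiver-side delayed-ACK timer that tracks the session."""

    def __init__(self, rtts: float = 1.5, min_delay: float = 0.001,
                 max_delay: float = 0.2, gain: float = 0.125):
        self.rtts = rtts            # how many smoothed round trips to wait
        self.min_delay = min_delay  # finest timer the OS can offer (assumed)
        self.max_delay = max_delay  # never wait longer than the fixed timer
        self.gain = gain            # weight of each new round-trip sample
        self.srtt = None            # smoothed round-trip time, seconds

    def add_sample(self, rtt: float) -> None:
        if self.srtt is None:
            self.srtt = rtt
        else:
            # Exponentially weighted moving average of round-trip samples.
            self.srtt += self.gain * (rtt - self.srtt)

    def delay(self) -> float:
        if self.srtt is None:
            return self.max_delay   # no samples yet: behave like a fixed timer
        return min(self.max_delay, max(self.min_delay, self.rtts * self.srtt))
```

The clamps reflect the two practical limits noted in the text: the operating system may not offer timers finer than min_delay, and a session with long round trips should not wait longer than the existing fixed timer.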
2.0 New transmission policy
This document proposes a new TCP transmission policy that allows
delayed ACKs to work as at present, thus retaining their advantages.
It groups octets much as the Nagle algorithms do, yet avoids
deadlocks.
Two terms need to be defined to simplify discussion: "available"
data and "allowed" capacity. "Available" data are all the data from
the application that are not yet sent; that is, what a single write
or output statement would provide. The TCP stack may see only a
portion of these data on each invocation, or it may see them all.
This implies that the TCP stack knows this length, either explicitly
or through an indicator from its caller. Current TCP stacks already
perform this test to set the PUSH (PSH) bit properly.
"Allowed" capacity is the number of octets permitted to be sent based
on the calculated receiver window size and congestion avoidance
limits; it is the minimum of these two constraints. The calculated
receiver window size is the usual value: the last announced window
size minus the sent but unACKed data. It is not necessarily a whole
multiple of the MSS, and heuristics in the transmitter may modify the
calculation. Congestion avoidance uses the normal Van Jacobson
congestion window [TCP:3], which normally yields whole multiples of
the MSS.
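In code, allowed capacity might be computed as below. This Python sketch assumes that the congestion window, like the receiver window, bounds the total of unACKed plus new data, so the unACKed octets are subtracted once from the smaller of the two limits; the function and parameter names are hypothetical.

```python
def allowed_octets(snd_wnd: int, cwnd: int, snd_nxt: int, snd_una: int) -> int:
    """Octets of new data the transmitter may send right now.

    snd_wnd: last window announced by the receiver
    cwnd:    congestion window
    snd_nxt, snd_una: next sequence number to send, oldest unACKed one
    min(snd_wnd - in_flight, cwnd - in_flight) equals
    min(snd_wnd, cwnd) - in_flight, floored at zero.
    """
    in_flight = snd_nxt - snd_una          # sent but not yet ACKed
    return max(0, min(snd_wnd, cwnd) - in_flight)


# Window-limited case: a 4096-octet window with one 1460-octet segment
# outstanding leaves 2636 octets, which is not a multiple of the MSS.
print(allowed_octets(4096, 14600, 1460, 0))  # 2636
```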
The new policy acts after the window size and congestion avoidance
size restrictions are applied.
The transmitting side has a transmission policy designed to group
data into full segments and not to hold the very last segment. Stated
loosely (and ambiguously): transmit now if a full segment is
available (after the limitations of receiver window size and
congestion avoidance are applied). A small segment candidate should
be sent immediately only if it exhausts all data from the
application; otherwise it should be held to be joined by more
application data.
Two parts of the above paragraph are unclear. First, "transmit now"
does not state how much can be transmitted at one time, the same
problem seen with the Nagle algorithm. The policy can be strict:
transmit whole segments only, and withhold a final small segment until
an indicator of "no more data will follow" has been obtained. Or it
can be liberal: transmit a partially full segment if one or more full
segments immediately precede it, even though this puts smaller
segments on the wire than the strict case. These two policies mimic
the strict and liberal Nagle modes used today, but without any
dependence on ACKs or on the presence of unACKed data.
What the policy should not do is hold back a small segment because
unACKed data are present. That creates the held tail deadlock seen
with Nagle mode combined with delayed ACKs.
The second ambiguous part is the size of the transmission buffer.
Some systems expose the entire application buffer to the protocol
stack. In such systems TCP may easily decide when the current
candidate for transmission will empty the buffer. Other systems may
divide the application buffer into many smaller intermediate buffers
and expose only an intermediate buffer to TCP, one for each call upon
the transmitter. The latter requires the operating system to provide
an indicator of end of application data (a flag, variable, or
equivalent) marking the current buffer as the last in a series, so
that no more data will follow it. In either case, the TCP stack knows
how much data is "available" and thus it knows when to properly set
the PUSH (PSH) bit.
2.1 Formal statement of new policy
Stated formally the new transmission policy is as follows:
Rule 1. Transmit all full segments in min(available, allowed).
Rule 2. If a partial segment occurs in min(available, allowed)
then transmit it now if it includes the end of application data;
otherwise retain it.
And optionally
Rule 3. If a partial segment occurs in min(available, allowed)
then transmit it now if min(available, allowed) is larger than a
full segment. This modifies the phrase "otherwise retain it" in Rule 2.
min(a, b) represents the smaller value of a or b.
Available is the total amount of unsent application data at the time
of transmission.
Allowed is the smaller of the receiver's apparent window size and
the congestion avoidance constraint.
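Rules 1 to 3 translate directly into a decision function. The Python sketch below is one hypothetical rendering that works on octet counts; it assumes the caller supplies an end_of_app_data flag of the kind described in Section 2.0, and that any retained partial segment simply stays in the send buffer.

```python
def new_policy_pass(available: int, allowed: int, mss: int,
                    end_of_app_data: bool = True, liberal: bool = False):
    """Segment sizes to send now under Rules 1-3 (hypothetical sketch).

    available:       unsent application octets visible to TCP
    allowed:         min(receiver window, congestion limit) for new data
    end_of_app_data: True when `available` ends the application's data
    liberal:         also apply optional Rule 3
    UnACKed data play no part in any decision.
    """
    candidate = min(available, allowed)
    full, partial = divmod(candidate, mss)
    segments = [mss] * full                       # Rule 1: all full segments
    if partial:
        # The partial piece includes the end of application data only if
        # `allowed` cut nothing off and no more data will follow.
        reaches_end = end_of_app_data and candidate == available
        if reaches_end:                           # Rule 2: send final piece
            segments.append(partial)
        elif liberal and candidate > mss:         # Rule 3: optional liberal send
            segments.append(partial)
        # Otherwise the partial segment is retained for more data.
    return segments
```

For a 3.5 MSS write with everything allowed, new_policy_pass(5110, 10**6, 1460) returns [1460, 1460, 1460, 730]: the tail goes out at once instead of waiting for ACKs. If the window allows only 5000 octets, the strict form sends three full segments and retains the 620-octet remainder, while liberal=True also sends that remainder.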
2.2 Discussion
We see that Rule 2 represents a policy of strict grouping until the
end of application data. Rules 1 and 2 are necessary and sufficient
for good network behavior and good application response.
The key points of the new policy are that the release conditions are
generated by the transmitter rather than the receiver, and that the
conditions are a full segment or an indication of end of application
data. For Nagle modes, the release is generated by both transmitter
and receiver, and the conditions are a full segment or acknowledgment
of all previous data.
Optional Rule 3 is a liberal policy to permit sending small segments
from data immediately available but not at the end of application
data. Rule 3 is presented only because some existing TCP/IP stacks
are designed for the liberal Nagle approach.
In practice, the above rules can be overlaid upon current Nagle mode
code. The full segment test is performed, and the case where a small
segment is to be delayed is modified to be: transmit a small segment
if end of application data is reached, else delay it as before.
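The overlay changes only the branch where existing Nagle code would delay a small segment. The Python sketch below is a hypothetical illustration of that one-line change, not code from any particular stack.

```python
def overlay_may_send(seg_len: int, mss: int, unacked: int,
                     end_of_app_data: bool) -> bool:
    """Existing Nagle decision with the new policy overlaid (hypothetical)."""
    if seg_len >= mss:
        return True          # full segment: unchanged
    if unacked == 0:
        return True          # nothing outstanding: unchanged Nagle behavior
    # This is the branch where Nagle mode used to delay the small segment.
    return end_of_app_data   # new: release it if it ends the application data
```

A literal reading of the overlay keeps the existing Nagle release of small segments when nothing is outstanding; an implementation following Rule 2 strictly would retain a non-final partial segment even then.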
At this point we must discuss a useful and important side effect of
the new policy: the network will do what the application asks! When
an application does small immediate-mode writes, it largely controls
the size of segments sent onto the wire. This is because each output
statement implies its own end of application data (give or take
whatever the operating system may do between it and the protocol
stack). In an extreme case the application may perform a massive
succession of single-octet writes before reading a response. If the
network can drain data faster than the application can create them
(a classical queueing problem), then massive quantities of tiny
segments will appear on the network. That imposes a very heavy load
on both hosts and on network communications. Slower draining yields
larger segments, naturally, but of erratic size because the delays
are erratic.
By way of contrast, Nagle mode will send small segments if ACKs
arrive promptly; when they don't, Nagle mode strongly groups data.
The difference between Nagle mode and the new policy is that timing
governs Nagle mode, whereas end of application data governs the new
policy. The new policy strongly groups bytes that are within the
application data set, independent of ACKs. One method uses network
delays to group data; the other uses the application and the local
operating system.
Non-Nagle mode waits for neither ACKs nor an indication from the
application. Liberal Nagle mode behaves like strict Nagle mode when
all unsent data are smaller than a full segment, and like non-Nagle
mode otherwise.
In the above case of one octet writing by the application, new policy
and non-Nagle modes behave alike: send tinygrams. Nagle modes group
data to the extent that ACKs are delayed.
To remove the uncertain element of ACK arrival time, and its
consequences for held tails and timer-based ACKing, as well as to
bring the small segment problem under control, the best strategy is
for the application to write large components. This is readily
accomplished by the application programmer. For example, rather than
using immediate-mode write operations, such as the Unix function
write(), one may use equivalents that are buffered automatically in
the application, such as the Unix functions fwrite() or printf().
Unix functions are only illustrative here, as are BSD sockets. With
buffered functions the protocol stack sees large amounts of data per
write even if the application generates them in small increments.
The question then becomes whether release is governed by ACK timing
or by the application's indication.
Buffering is often accompanied by a buffer flush function, such as
fflush() in Unix, to ensure all data are released at that time rather
than waiting for the data pathway to be formally closed. A buffer flush
function also serves as an indirect signal to the protocol stack that
application data writing is complete, without there being a need to
invent a special programmer's equivalent to flush TCP transmit data.
The new policy is closely analogous to this file system buffering.
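The same buffering idea can be sketched in Python, whose buffered file objects play the role of fwrite() and fflush(). The sketch assumes a connected stream socket, and send_reply is a hypothetical helper: small pieces accumulate in user space, and flush() hands the protocol stack one larger write that ends this reply.

```python
import socket
from typing import Iterable


def send_reply(sock: socket.socket, pieces: Iterable[bytes]) -> None:
    """Write many small pieces through a user-space buffer, then flush once."""
    # makefile("wb", buffering=...) returns a buffered writer over the
    # socket; closing it does not close the underlying socket.
    with sock.makefile("wb", buffering=64 * 1024) as out:
        for piece in pieces:
            out.write(piece)   # stays in the application's buffer
        out.flush()            # one large write reaches the protocol stack


a, b = socket.socketpair()
send_reply(a, [b"HTTP/1.0 200 OK\r\n", b"Server: demo\r\n\r\n", b"hello"])
```

Without the buffered writer, each piece would be a separate sock.send() call and, under the new policy, each would be its own end of application data and its own small segment.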
It seems to the author that data aggregation at the application
level makes the most sense because the natural end of writing is known
only at that level. Trying to predict the end of writing at the
protocol stack level by either transmitter or receiver, in
expectation of avoiding held tails from delayed ACKs and yet
delaying transmission to form full length segments, is a very
difficult task. It probably has no solution in the general case
because a stack does not know when the application is truly finished
writing. At best the stack is told when a portion of the output has
been prepared. The new policy uses that information, as does the
stack to set the PUSH bit.
The new policy provides an immediate response by the network when the
application so indicates, which, as noted, is a double-edged sword;
otherwise it groups data independently of network timing.
The alternatives seem to be that we must endure the delayed ACK effect
of Nagle modes, or risk many small segments from poorly designed
applications, or see application writers turn to UDP and bypass
network protection mechanisms.
2.3 Operation between like and unlike TCP stacks
The new transmission policy proposed here resides entirely on the
transmitting host; receivers remain unchanged. Clearly, with
bilateral exchanges both sides should implement the policy for best
speed. The new policy sends the trailing segment of a series without
waiting for ACKs of previous data, the same as non-Nagle mode. The new
policy groups data into full segments (strict Rule 2), or does so
most of the time (Rule 2 plus optional Rule 3), whereas non-Nagle mode
and liberal Nagle mode may send short segments as each portion of
application data is delivered to the TCP stack. The PUSH bit should be
set at end of application data by all policies.
The receiver and network are ready to deal with the data, because
window size and congestion avoidance criteria are still in effect and
are applied before either the Nagle or the new policy mechanisms. New
policy transmitters send the trailing segment when the network and
remote host are ready, whereas Nagle mode transmitters may wait for
one or more ACKs to arrive.
The new policy works well with the classical case of write(small),
write(small), read(). Each write() creates a new application data set,
and each is sent immediately. Both strict and liberal Nagle
transmitters hold the second write's data; that is the held tail
effect. The new policy transmitter does not hold the second write's
data, nor does non-Nagle mode.
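The write(small), write(small), read() case can be simulated with three hypothetical release tests in Python. The sketch assumes every write is smaller than an MSS and that the peer is delaying its ACK for the first write, so no ACK arrives during the sequence; held data merge with the next write.

```python
MSS = 1460


def nagle(seg, unacked, end_of_data):
    return seg >= MSS or unacked == 0       # strict and liberal agree here


def new_policy(seg, unacked, end_of_data):
    return seg >= MSS or end_of_data        # Rules 1 and 2


def no_nagle(seg, unacked, end_of_data):
    return True                             # Nagle mode turned off


def run_writes(writes, may_send):
    """Feed small write() calls to a transmitter while ACKs are delayed.

    Each write() is a complete application data set, so end_of_data is
    True for every call.  Returns (segments_sent, octets_still_held).
    """
    unacked = 0
    held = 0
    sent = []
    for size in writes:
        held += size
        if may_send(held, unacked, True):
            sent.append(held)
            unacked += held
            held = 0
    return sent, held
```

Here run_writes([100, 200], nagle) returns ([100], 200): the second write waits for the delayed ACK. Both run_writes([100, 200], new_policy) and run_writes([100, 200], no_nagle) return ([100, 200], 0).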
The new policy results in more tinygrams when a user is typing by
hand, because each keystroke constitutes an entire application
buffer. In practice this is a non-problem because people do not type
fast compared with even a 200ms delayed ACK. Thus for human typing,
all three approaches and non-Nagle mode are about the same on the
wire. See the discussion of data aggregation by applications in
Section 2.2.
Let us compare the three approaches for longer data transmissions.
Strict Nagle induces a held-tail for each application buffer longer
than one segment. Liberal Nagle can also, but only when windowing or
congestion avoidance hold back octets. New policy and non-Nagle
transmitters do not hold tails. While the application buffer is being
sent, liberal Nagle, liberal new policy, and non-Nagle transmitters
may send short segments if the data are delivered to the transmitter
in small pieces. Strict Nagle and strict new policy transmitters join
interior small pieces into full segments. However, small segments may
arise naturally if the application buffer is short and/or its filling
is slower than its draining by the network.
In summary, new policy transmitters should work well with existing
TCP/IP stacks and should produce no known side effects.
3.0 Experimental results
Four machines were used in a test configuration to examine serving
web page activity with and without Nagle mode, and with the new
transmission policy.
Operating System Descriptions:
UnixWare 7.0.1
400MHz AMD cpu, 200ms delayed ACK, strict Nagle mode.
32KB receive window. Source code was not available.
FreeBSD v3.2
233MHz AMD cpu, 200ms delayed ACK, strict Nagle mode.
Source code was modified for new policy. The TCP receive window
size (rwnd) used in each test is noted in Table 1.
Solaris 7/Intel
350MHz AMD cpu, 50ms delayed ACK, liberal Nagle mode.
8KB receive window. Source code was modified for new policy.
Linux 2.2.5-15
350MHz AMD cpu, 10ms dynamically adjusted delayed ACK,
liberal Nagle mode. 16KB receive window. Source code was
modified for new policy.
Interconnections were via a 100Mbps Ethernet hub. This has
implications for the tests: the fast network is able to drain TCP
data faster than the application can supply it. This exposes protocol
behavior that would otherwise be hidden when congestion avoidance and
window size constraints force data to be held back.
The test procedure employs a web request client to request a web
page, receive and discard it without examining the content, request it
again, and so on, and report timing results. The client sends a short,
one-packet GET request, reads the server's HTTP headers, and then
counts the octets of the data file that follows. Once all data file
octets have been read, the original request is repeated.
Each Unix machine runs a simplified web server that replies to the
request with two short packets, HTTP web server identification and
the HTTP document description, followed by the document itself. Thus
there are two short write()'s followed by a succession of 4KB
write()'s for the file body. The client counts file octets and when
done initiates the next request. Keep-alive connections were used to
create a succession of request and replies on the same TCP
connection. The serial nature of the request and reply means the
longer the file the fewer requests occur per second.
The web client produces delayed ACKs to all servers. Whether it uses
Nagle mode has no influence, because each request is only one segment
and occurs after each long response from the server. Thus the
server's protocol behavior is being examined in the presence of
delayed ACKs.
The web server is run as a single process without threads, to
simplify the experiment and to emphasize serialized request and
response interaction. Requests were repeated as fast as the systems
could perform, up to 60 seconds or 50000 requests.
The short file is smaller than window size and congestion avoidance
limits, as well as fitting into a single Unix write() statement. The
longer file may encounter the window size limit, and it is written
with a sequence of Unix write() calls. Both files have
tails to be held (should we name this the monkey effect?).
The interaction between Nagle mode and 200ms delayed ACKs is evident.
Also present is a case where liberal Nagle mode is caught by delayed
ACKs when window size constraints leave a small segment without
preceding large segments to drag it out.
The results show that the new policy works. It works better than
strict Nagle. It works as well as both liberal Nagle (but without the
held tail effect) and non-Nagle (but without sending small segments
gratuitously). It does not require control at the application layer.
However, as discussed previously, applications can abuse the swift
responsiveness of the network by performing many small writes in
succession without buffering at the application layer.
Table 1. Web page test results, requests and kilobytes per second.
Policy is the server's transmission policy; rwnd is the FreeBSD
client's TCP receive window.

Client   Client  Server    Server         2.2KB file        33KB file
         rwnd              policy      req/sec  KB/sec   req/sec  KB/sec
-------  ------  --------  ----------  -------  ------   -------  ------
UW7      -       FreeBSD   Nagle on          5      12         5     165
UW7      -       FreeBSD   Nagle off      1249    2914       222    7142
UW7      -       FreeBSD   new policy     1247    2909       228    7324

UW7      -       Solaris7  Nagle on        991    2311       221    7112
UW7      -       Solaris7  Nagle off       935    2181       219    7041
UW7      -       Solaris7  new policy      993    2317       219    7041

FreeBSD  16KB    UW7       Nagle on          5      12         5     177
FreeBSD  16KB    UW7       Nagle off      1508    3519       264    8478
FreeBSD  4KB     UW7       Nagle on          5      12         5     166
FreeBSD  4KB     UW7       Nagle off      1421    3315       235    7553

FreeBSD  16KB    Linux     Nagle on       1665    3912       277    8876
FreeBSD  16KB    Linux     Nagle off      1709    3987       279    8970
FreeBSD  16KB    Linux     new policy     1665    3883       277    8894
FreeBSD  4KB     Linux     Nagle on       1685    3930        55    1776
FreeBSD  4KB     Linux     Nagle off      1692    3946       238    7634
FreeBSD  4KB     Linux     new policy     1699    3964       241    7740

FreeBSD  4KB     Solaris7  Nagle on       1104    2575       180    5795
FreeBSD  4KB     Solaris7  Nagle off      1090    2544       180    5772
FreeBSD  4KB     Solaris7  new policy     1090    2543       165    5290
FreeBSD  16KB    Solaris7  Nagle on       1151    2685       233    7474
FreeBSD  16KB    Solaris7  Nagle off      1186    2768       239    7669
FreeBSD  16KB    Solaris7  new policy     1206    2813       237    7634
Solaris7 as a client produced erratic results because of long,
variable delays preceding each request. This occurred with both stock
and modified Solaris7. It is suspected that its performance as a
server may be affected too.
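The claim that the new policy performs essentially the same as
non-Nagle mode can be checked directly against Table 1. The sketch
below copies the KB/sec figures for every client/server/rwnd
combination that has both a "Nagle off" and a "new policy" row, and
computes the relative change of the new policy against Nagle off. The
data structure and function names are illustrative only.

```python
# KB/sec figures copied from Table 1: (2.2KB file, 33KB file).
# Key: (client, server, rwnd); rwnd is "" where the table gives none.
TABLE1_KBPS = {
    ("UW7", "FreeBSD", ""): {"off": (2914, 7142), "new": (2909, 7324)},
    ("UW7", "Solaris7", ""): {"off": (2181, 7041), "new": (2317, 7041)},
    ("FreeBSD", "Linux", "16KB"): {"off": (3987, 8970), "new": (3883, 8894)},
    ("FreeBSD", "Linux", "4KB"): {"off": (3946, 7634), "new": (3964, 7740)},
    ("FreeBSD", "Solaris7", "4KB"): {"off": (2544, 5772), "new": (2543, 5290)},
    ("FreeBSD", "Solaris7", "16KB"): {"off": (2768, 7669), "new": (2813, 7634)},
}


def relative_change(new: float, base: float) -> float:
    """Fractional change of new relative to base (0.05 means 5% higher)."""
    return new / base - 1.0


def compare_new_vs_off(table):
    """Return (client, server, rwnd, file, change) for each paired cell."""
    rows = []
    for (client, server, rwnd), r in table.items():
        for label, off, new in zip(("2.2KB", "33KB"), r["off"], r["new"]):
            rows.append((client, server, rwnd, label, relative_change(new, off)))
    return rows


for client, server, rwnd, label, change in compare_new_vs_off(TABLE1_KBPS):
    print(f"{client}/{server} {rwnd or '-':>4} {label:>5} file: {change:+.1%}")
```

Computed this way, the new policy is within 3% of Nagle off in ten of
the twelve paired cells. The two larger differences (+6.2% for the
UW7/Solaris7 2.2KB file and -8.4% for the FreeBSD/Solaris7 4KB rwnd
33KB file) both involve Solaris7.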
4.0 Conclusions
The new TCP transmission policy solves the problem of Nagle mode
deadlocking with delayed ACKs. It retains data grouping but operates
using only information available at the transmitter. It accommodates
both systems that implement a liberal sending policy for partial
segments not at the end of application data and systems that prefer
the stronger grouping of a strict sending policy.
The new policy works well with delayed ACKs and when sending into
small receiver windows. Its performance is essentially the same as
that of non-Nagle mode, yet it retains the grouping that non-Nagle
mode lacks. It does not need an on/off control visible to
applications. The new transmission policy is a suitable replacement
for Nagle mode.
The caveat is the same as for non-Nagle mode: what the application
hands to the protocol stack is what the network tries to send.
Grouping data in applications and/or operating systems therefore
remains a good idea.
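Grouping at the application layer can be as simple as collecting the
pieces of a response and handing them to the stack in one call rather
than issuing one small write per piece. The sketch below does this
with Python's standard socket module; send_grouped and recv_exact are
illustrative names, and the demonstration uses a local socketpair(),
so it shows only the application-side pattern, not TCP's sending
behaviour.

```python
import socket


def send_grouped(sock: socket.socket, pieces: list[bytes]) -> int:
    """Coalesce many small application writes into one buffer and hand
    it to the stack in a single sendall() call. Returns octets sent."""
    data = b"".join(pieces)
    sock.sendall(data)
    return len(data)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n octets; a stream socket may return fewer per recv()."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


# Four small pieces that would otherwise be four separate writes.
pieces = [b"HTTP/1.0 200 OK\r\n", b"Content-Type: text/html\r\n",
          b"\r\n", b"<html></html>"]
left, right = socket.socketpair()
with left, right:
    sent = send_grouped(left, pieces)
    received = recv_exact(right, sent)
print(sent, received == b"".join(pieces))
```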
5.0 Security Considerations
There are no security considerations in this memo.
6.0 Acknowledgements
Special thanks to John Nagle for candid discussions on the problem
and for reviewing the draft document. Thanks to Gehri Grimaud at Utah
State University for introducing the author to FreeBSD and helping to
run experiments, and to Miles Johnson at USU, Richard J. Letts at
Salford University in the UK and Diana Osborn at San Diego State
University for reading the rough draft of this document.
7.0 References
[TCP:1] "Congestion Control in IP/TCP Internetworks", J. Nagle,
RFC-896, January 1984.
[TCP:2] "Requirements for Internet Hosts -- Communication Layers",
R. Braden, RFC-1122, October 1989.
[TCP:3] "Congestion Avoidance and Control", V. Jacobson, ACM
SIGCOMM '88, August 1988.
8.0 Author's address
Joe R. Doupnik
Dept of Electrical and Computer Engineering
Utah State University
Logan, Utah 84322
Phone: (801) 797-2982
Email: jrd@cc.usu.edu
Full Copyright Statement
"Copyright (C) The Internet Society (1999). All Rights Reserved. This
document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."