From van@ee.lbl.gov Mon Mar 14 07:46:23 1994
Message-Id: <9403141543.AA02092@rx7.ee.lbl.gov>
To: end2end-tf@isi.edu
Cc: danzig@catarina.usc.edu
Subject: problems with Arizona's vegas
Date: Mon, 14 Mar 94 07:43:28 PST
From: Van Jacobson <van@ee.lbl.gov>
Status: OR

During the e2e meeting I tried to point out that there were a
number of serious problems with Arizona's vegas TCP mods.
Unfortunately I needed to draw a picture to illustrate what was
going on & couldn't do that from 600 miles away. Appended to
this message is the picture.

Just to recap, I tried to explain that Arizona's mod to the Fast
Retransmit algorithm was a very bad idea because (a) it couldn't
possibly work if RTO was set correctly and (b) even if it worked,
the gain would be negligible. The effect of (a) is that the algorithm
does nothing if the RTT/RTO estimate is correct but generates data
very quickly if the estimate gets screwed up. This is exactly the
opposite of the behavior one wants in a large-scale Internet.

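The two trigger rules being compared can be sketched in a few lines. This is a simplified illustration with hypothetical names, not the actual BSD or Vegas code; it only captures the decision logic described above.

```python
# Simplified sketch (hypothetical names, not the actual BSD or Vegas code):
# standard Fast Retransmit fires on the third duplicate ack, while the
# Arizona/Vegas modification checks a fine-grained timer on the first one.

DUP_ACK_THRESHOLD = 3  # standard TCP trigger


def standard_fast_retransmit(dup_acks: int) -> bool:
    """Retransmit once three duplicate acks have arrived."""
    return dup_acks >= DUP_ACK_THRESHOLD


def vegas_fast_retransmit(dup_acks: int, now: float,
                          sent_time: float, rto: float) -> bool:
    """Vegas-style check: on the first duplicate ack, retransmit if the
    segment's retransmit timer has already expired.  Since the first dup
    ack arrives roughly one RTT after the segment was sent, this test
    can only succeed when RTO <= RTT, i.e. when the timer is broken."""
    return dup_acks >= 1 and (now - sent_time) > rto
```

Note that the Vegas-style condition is exactly the pathology described above: with a correctly set RTO it never fires, and it fires eagerly only when the RTT/RTO estimate is wrong.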
To see why this is true, the attached postscript file shows the
rough chronology of a packet loss. (The scale is approximately
right for the T1 NSFNet. Keep the packet spacing the same and
move the RTT marks out by a factor of 30 for the T3 NSFNet.)

The thick dashed diagonal line at the top of the page is the
packet that was lost & the thin dashed diagonal line marks where
its ack would have come back. There are a window's worth of
packets in transit following the loss and some of them and their
acks are shown. Since a packet was lost, all the acks are
duplicates. Arizona claims that making retransmit fire on the
first duplicate ack (labeled '1') instead of the third (labeled
'3') will make a significant performance difference. This is
obviously silly: The cost (in throughput) of the packet loss is
proportional to the time it takes the sender to learn it has
filled the hole in the receiver's sequence space which must be
at least one RTT (roughly the time from 1RTT to 2RTT in the
diagram). If the RTT is R and a packet time is P, the recovery
time shrinks at most from R+3P to R+P, i.e. the reduction in
recovery time (increase in throughput) could be at most
2P/(R+3P). For the old T1 backbone, R is about 26 packet
times so the theoretical maximum improvement is ~7% and, as we
scale up to faster backbones, the expected improvement falls
like the inverse square of the backbone bandwidth.

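The ~7% figure can be checked with a few lines of arithmetic, working in packet times (P = 1) and using the stated R of about 26 packet times for the old T1 backbone:

```python
# Worked check of the recovery-time arithmetic above, in packet times.
R = 26.0  # RTT of the old T1 backbone, ~26 packet times
P = 1.0   # one packet time

old_recovery = R + 3 * P   # trigger on the third duplicate ack
new_recovery = R + P       # trigger on the first duplicate ack

# Fractional reduction in recovery time = maximum throughput gain.
max_improvement = (old_recovery - new_recovery) / old_recovery
print(f"max improvement: {max_improvement:.1%}")  # ~6.9%
```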
One might be tempted to implement Arizona's algorithm in the
hope of getting some part of the 7% but a bit of inspection
shows that would be a very antisocial thing to do: Arizona
decides to send if the retransmit timer has expired when the
first dup ack arrives. Notice that the first dup ack arrives
essentially 1 RTT after the missing packet was sent so the only
way the retransmit timer could have expired would be if RTO =
RTT. In other words, Arizona's algorithm sends whenever the
retransmit timer is broken.

Since most of the TCP timing code was broken by vegas, it's
worthwhile pointing out that whenever there are competing
connections sharing a path, transient RTT fluctuations of twice
the minimum are completely normal (they just represent other
connections starting or restarting after a loss) so it is never
reasonable for RTO to be less than 2*RTT. Notice from the
picture that the last possible dup ack arrives just before 2*RTT
so there is no variation of Arizona's algorithm that would work
with a correctly set timer. In passing, it should be noted that
the 2*RTT minimum for RTO is necessary but not sufficient -- If
your path is `short' but you share a bottleneck with someone
whose path is `long', you can see transient fluctuations in RTT
of roughly his path length. As I tried to say during the
meeting, this means that conservative (i.e., scalable)
engineering should guarantee that RTO is never less than
100-200ms. This is basically what BSD does but is something
else that Arizona broke.

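A conservative RTO computation along the lines argued above might look like the following sketch. The estimator shape and the 200 ms floor are assumptions for illustration, not the actual BSD code:

```python
# Sketch of a conservative RTO, per the constraints argued above.
# The constants and estimator details are illustrative assumptions.

MIN_RTO = 0.2  # seconds; the 100-200 ms absolute floor argued for above


def conservative_rto(srtt: float, rttvar: float) -> float:
    """Smoothed-RTT-plus-variance estimate, clamped so RTO is never
    below 2*RTT (transient RTT doubling from competing connections is
    completely normal) nor below an absolute floor of ~200 ms."""
    estimate = srtt + 4 * rttvar     # mean-plus-variance estimate
    return max(estimate, 2 * srtt, MIN_RTO)
```

With a short path (srtt = 50 ms) the floor dominates; with a longer one (srtt = 500 ms) the 2*RTT clamp does.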
- Van