- Path: sparky!uunet!usc!cs.utexas.edu!sun-barr!news2me.EBay.Sun.COM!seven-up.East.Sun.COM!tyger.Eng.Sun.COM!geoff
- From: geoff@tyger.Eng.Sun.COM (Geoff Arnold @ Sun BOS - R.H. coast near the top)
- Newsgroups: comp.protocols.nfs
- Subject: Re: Sun PC-NFS performance (again)
- Date: 15 Dec 1992 15:10:07 GMT
- Organization: SunSelect
- Lines: 145
- Message-ID: <1gksggINNjmh@seven-up.East.Sun.COM>
- References: <1992Dec7.173718.12792@Comtech.com> <1g3673INNa64@seven-up.East.Sun.COM> <1992Dec15.015951.20329@Comtech.com>
- NNTP-Posting-Host: tyger.east.sun.com
-
- Quoth aga@Comtech.com (Alan G. Arndt) (in <1992Dec15.015951.20329@Comtech.com>):
- [Performance report claiming that PC-NFS ...]
- # ... CAN'T go faster than 277.7 KB/sec
- #except in one circumstance, which is believed to be because of the
- #closeness of the machine.
- #
- #So the PC isn't the limiting factor, the PC's card isn't the limiting
- #factor, the network isn't the limiting factor and the SERVER ISN'T
- #the limiting factor, the protocol isn't the limiting factor, in fact
- #the LOW Level drivers aren't even the limiting factor. So that only
- #leaves ONE item, PC-NFS itself.
-
- ????? PC-NFS is just a BWOS (big wad of software). Its performance is
- determined by all of those things that you just said were *not*
- limiting factors. (OK. Let me cover *all* the bases. It is
- theoretically possible for PC-NFS to self-time in such a way that the
- performance is independent of, e.g., CPU speed. I can assure you that
- it doesn't do this.) Obviously the design of PC-NFS will affect the
- performance, but the assertion that there is an absolute limit of 277KB
- because of PC-NFS is incongruous.
-
- #Now that I have shown all the data I have I will go into what we
- #found out. We then used etherfind to watch the packets going across.
- #Years ago (5+) with PC-NFS V2.x I watched this transaction and I
- #could swear the Sun was sending 8KB blocks in 5-packet bursts and
- #the pc would receive them and request another block. HOWEVER,
- #yesterday we noticed that the PC was only requesting 1K blocks (1166
- #bytes). The request was a healthy 218 bytes as well.
-
- Your memory is faulty. PC-NFS has always used a read size of 1K.
- (Initially this was because most/all PC Ethernet cards were incapable
- of dealing with back-to-back packets. Unfortunately it got ingrained
- into the buffer management logic, and it's still there. But
- I digress...) As for the sizes of the request (and response) packets,
- these are NFS issues rather than anything to do with PC-NFS.
-
- #When one
- #starts to add up the requests and individual 1K bytes I can believe
- #that the performance is ONLY 277 KB/sec and that the machines are
- #doing their BEST to get that. That is why there is a noticeable
- #difference between 50 meters away and a machine RIGHT next to its
- #client.
-
- OK, but see below. (And the last sentence only makes sense if you are
- going through some painfully slow routers. Bridges or repeaters
- shouldn't introduce any noticeable delay.)
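- The arithmetic is worth making explicit. Assuming one synchronous
- 1 KB read RPC at a time (a sketch using the numbers from this thread,
- not measured PC-NFS internals):

```c
/* Implied round-trip time for synchronous 1 KB read RPCs.
 * 277.7 KB/sec of 1 KB reads means ~277.7 RPCs/sec, i.e. roughly
 * 3.6 ms per request/reply pair -- which is why an extra slow router
 * hop is visible while a bridge or repeater is not. */
double ms_per_rpc(double kb_per_sec, double kb_per_rpc)
{
    double rpcs_per_sec = kb_per_sec / kb_per_rpc;  /* ~277.7 */
    return 1000.0 / rpcs_per_sec;                   /* ~3.6 ms */
}
```

- At 277.7 KB/sec this works out to about 3.6 ms per round trip, so
- even a millisecond of added latency per hop is a large fraction of
- the budget.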
-
- #We also noticed that the PC was only creating WRITES of apparently
- #512 bytes (734).
-
- Small writes strike again.
-
- PC-NFS uses a dual write strategy. Small writes (under 256 bytes) are
- buffered up into 512-byte blocks. Larger writes (256 bytes or more) are
- written directly from user memory without any intervening copies (in
- most cases the data is DMA'd directly from the user buffer to the
- NIC). These values were developed
- by testing the performance of real-world applications - *not* synthetic
- benchmarks. We found that in most cases writes of less than 256 bytes
- were in fact writes of 1 byte - typically from operations such as
- "print to file" or i/o redirection. Why 512 bytes for the buffer? A
- typical space/time tradeoff, unfortunately.
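- In outline, the dual write strategy looks something like this. This is
- a hypothetical sketch -- the names, the flush helper, and the stub RPC
- routine are invented for illustration, not PC-NFS source:

```c
#include <string.h>
#include <stddef.h>

#define SMALL_WRITE_MAX  256   /* below this, coalesce the data    */
#define WBUF_SIZE        512   /* size of the staging buffer       */

static unsigned char wbuf[WBUF_SIZE];
static size_t wbuf_used;

/* Stand-in for the real "send an NFS WRITE RPC" routine; here it
 * just counts calls and bytes so the logic can be exercised. */
static size_t rpc_calls, rpc_bytes;
static void nfs_write_rpc(const void *data, size_t len)
{
    (void)data;
    rpc_calls++;
    rpc_bytes += len;
}

static void flush_wbuf(void)
{
    if (wbuf_used > 0) {
        nfs_write_rpc(wbuf, wbuf_used);
        wbuf_used = 0;
    }
}

void pcnfs_write(const void *data, size_t len)
{
    if (len >= SMALL_WRITE_MAX) {
        flush_wbuf();                 /* keep writes in order       */
        nfs_write_rpc(data, len);     /* zero-copy path from caller */
        return;
    }
    /* small write: coalesce into the 512-byte staging buffer */
    if (wbuf_used + len > WBUF_SIZE)
        flush_wbuf();
    memcpy(wbuf + wbuf_used, data, len);
    wbuf_used += len;
}
```

- The point of the two paths: one-byte writes from i/o redirection get
- batched into one RPC per 512 bytes, while application writes of 256
- bytes or more never pay for an extra copy.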
-
- On an NFS server, any write other than an optimally aligned write of
- exactly bsize bytes may require a read-modify-write to get the disk
- updated. Small transfers are clearly much less efficient than large
- ones in this mode, and in fact using a 512 byte write size will reduce
- performance noticeably. I'm posting as a separate follow-up copies of
- my "reader" and "writer" programs which do 8K writes and reads. Under
- PC-NFS, 8K writes are performed using a single RPC (assuming that the
- server can support this tsize); each UDP datagram is obviously
- fragmented based on the network MTU.
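- The separately posted programs are the authoritative versions; a
- minimal stand-in for the "writer" side looks like this (illustrative
- only -- not the posted code):

```c
#include <stdio.h>
#include <string.h>

#define CHUNK (8 * 1024)   /* write in 8 KB chunks so each one can
                            * become a single 8 KB WRITE RPC when the
                            * server's tsize allows it */

/* Returns the number of bytes written, or -1 on error. */
long write_file(const char *path, long total)
{
    static char buf[CHUNK];
    FILE *fp = fopen(path, "wb");
    long done = 0;

    if (fp == NULL)
        return -1;
    memset(buf, 'x', sizeof buf);
    while (done < total) {
        size_t n = (total - done) < CHUNK
                       ? (size_t)(total - done) : CHUNK;
        if (fwrite(buf, 1, n, fp) != n) {
            fclose(fp);
            return -1;
        }
        done += (long)n;
    }
    fclose(fp);
    return done;
}
```

- Each 8 KB fwrite maps onto one WRITE RPC (one UDP datagram,
- fragmented per the network MTU) rather than sixteen 512-byte ones.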
-
- This explains most of the reason for the discrepancy between your
- numbers and those I reported earlier (310-420K reads, 113-480K
- writes). The only remaining question is why you were seeing
- almost identical performance on an SS1+ and a 4/670. The most likely
- "levelling" factor is the disk: doing 512-byte R/M/W cycles will
- pretty much negate any performance benefits of a faster IPI disk,
- and may in fact slow things down enough to offset the CPU differences.
-
-
- #Now we looked at the manuals and the only option I found was TSIZE
- #which WAS set at 8Kbytes. The TSIZE is supposed to be only for
- #writes but as there is only one option I assumed it might also be
- #used for the read request size. It is however obvious that PC-NFS
- #pays NO ATTENTION to this parameter and is wasting a tremendous
- #amount of time dealing with small block sizes.
-
- [Watch those capital letters - it reads as though you're shouting.]
- The tsize parameter is not "supposed to be only for writes." Consulting
- AnswerBook, I find that it defines tsize as
-
- The optimum transfer size of the server in bytes. This is
- the number of bytes the server would like to have in the data
- part of READ and WRITE requests.
-
- This is actually misleading. A client should treat tsize as an upper
- bound on the read and write size, since this is the only way a
- server with a marginal network interface can discourage the client
- from sending back-to-back packets.
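- In other words, a client honoring this reading would clamp its
- preferred transfer size to the advertised tsize. A sketch, with
- illustrative names (the "0 means no preference" convention is an
- assumption, not part of the spec text quoted above):

```c
/* Treat the server's tsize as a ceiling on the READ/WRITE transfer
 * size, never as a target to exceed. A tsize of 0 is taken here to
 * mean the server states no preference. */
unsigned effective_xfer_size(unsigned client_pref, unsigned server_tsize)
{
    if (server_tsize != 0 && server_tsize < client_pref)
        return server_tsize;
    return client_pref;
}
```

- So a client that would like 8 KB transfers against a server
- advertising tsize=1024 would fall back to 1 KB, protecting a server
- with a marginal network interface from back-to-back packets.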
-
- #So what can be done to PC-NFS to get it to use the LARGE block sizes?
- #Is there some parameter I have missed? The SMC8013 cards have
- #buffers of 16K on the board and certainly could handle 4K blocks if
- #not full 8K blocks. The 3C509 cards will stream the data off the
- #card as it comes in so the block size is irrelevant.
-
- These factors are only a tiny part of the whole problem. Let's assume
- that we're never going to drop packets, and that we can get the data
- off the board as fast as memory will take it. The more difficult issues
- revolve around buffer management. How many buffers? How big? In
- conventional memory, UMB, EMS? If EMS, do you transfer directly to EMS
- from the net or copy it up? If the former, what happens to interrupt
- latency? If the latter, how do you avoid excessive copying of data? How
- do you avoid the EMS performance hit in the simple case? Do you
- optimize around an EMS model, and if so what do you do about the
- millions of PCs which don't/can't support EMS? But wait - there's
- more! How do you handle ReadDir response buffering? Do you use the
- same buffer pool as for file data? If so, how do you deal with the
- radically different buffer aging and re-use patterns for data and
- directories? If not, where do you copy directory information to? Do you
- hard-code a single policy and configuration, or do you make the whole
- thing infinitely configurable? If the latter, how much does the
- configurability cost in terms of code size and performance?
-
- Over the lifetime of PC-NFS, the most consistent demand has been for
- size reduction. We've added features over the years, but we've tried to
- keep the footprint more or less constant. What avenues are open to us
- to improve performance? More buffering of small writes; support for
- larger (up to 8K) reads; more read buffers. All these will increase the
- size significantly. Putting buffers in EMS will slow some things down
- (EMS access times plus at least one extra copy). The present design
- and configuration represent our current best shot at balancing all of
- these conflicting demands. You (and others) want better performance:
- we'll obviously consider that in our product plans, but please
- recognize that there are competing demands that we must consider.
-
- Geoff
-
-
- --
- Geoff Arnold, PC-NFS architect, Sun Select. (geoff.arnold@East.Sun.COM)
- ADMINISTRIVIA==========ADMINISTRIVIA=====ADMINISTRIVIA==========ADMINISTRIVIA
- New address: SunSelect, 2 Elizabeth Drive, Chelmsford, MA 01824-4195
- New numbers: Phone: 508-442-0317 FAX: 508-250-5068
-