- Path: sparky!uunet!usc!cs.utexas.edu!sun-barr!news2me.EBay.Sun.COM!seven-up.East.Sun.COM!tyger.Eng.Sun.COM!geoff
- From: geoff@tyger.Eng.Sun.COM (Geoff Arnold @ Sun BOS - R.H. coast near the top)
- Newsgroups: comp.protocols.nfs
- Subject: Re: Sun PC-NFS performance (again)
- Date: 15 Dec 1992 15:10:07 GMT
- Organization: SunSelect
- Lines: 145
- Message-ID: <1gksggINNjmh@seven-up.East.Sun.COM>
- References: <1992Dec7.173718.12792@Comtech.com> <1g3673INNa64@seven-up.East.Sun.COM> <1992Dec15.015951.20329@Comtech.com>
- NNTP-Posting-Host: tyger.east.sun.com
-
- Quoth aga@Comtech.com (Alan G. Arndt) (in <1992Dec15.015951.20329@Comtech.com>):
- [Performance report claiming that PC-NFS ...]
- # ... CAN'T go faster than 277.7 KB/sec
- #except in one circumstance, which is believed to be because of the
- #closeness of the machine.
- #
- #So the PC isn't the limiting factor, the PC's card isn't the limiting
- #factor, the network isn't the limiting factor and the SERVER ISN'T
- #the limiting factor, the protocol isn't the limiting factor, in fact
- #the LOW Level drivers aren't even the limiting factor. So that only
- #leaves ONE item, PC-NFS itself.
-
- ????? PC-NFS is just a BWOS (big wad of software). Its performance is
- determined by all of those things that you just said were *not*
- limiting factors. (OK. Let me cover *all* the bases. It is
- theoretically possible for PC-NFS to self-time in such a way that the
- performance is independent of, e.g., CPU speed. I can assure you that
- it doesn't do this.) Obviously the design of PC-NFS will affect the
- performance, but the assertion that there is an absolute limit of 277KB
- because of PC-NFS is incongruous.
-
- #Now that I have shown all the data I have I will go into what we
- #found out. We then used etherfind to watch the packets going across.
- #Years ago (5+) with PC-NFS V2.x I watched this transaction and I
- #could swear the Sun was sending 8KB blocks in 5-packet bursts and
- #the pc would receive them and request another block. HOWEVER,
- #yesterday we noticed that the PC was only requesting 1K blocks (1166
- #bytes). The request was a healthy 218 bytes as well.
-
- Your memory is faulty. PC-NFS has always used a read size of 1K.
- (Initially this was because most/all PC Ethernet cards were incapable
- of dealing with back-to-back packets. Unfortunately it got ingrained
- into the buffer management logic, and it's still there. But
- I digress...) As for the sizes of the request (and response) packets,
- these are NFS issues rather than anything to do with PC-NFS.
-
- #When one
- #starts to add up the requests and individual 1K bytes I can believe
- #that the performance is ONLY 277 KB/sec and that the machines are
- #doing their BEST to get that. That is why there is a noticeable
- #difference between 50 meters away and a machine RIGHT next to its
- #client.
-
- OK, but see below. (And the last sentence only makes sense if you are
- going through some painfully slow routers. Bridges or repeaters
- shouldn't introduce any noticeable delay.)
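- The arithmetic is worth making explicit. Assuming one synchronous
- 1 KB read RPC at a time (a sketch using the numbers from this thread,
- not measured PC-NFS internals):

```c
/* Implied round-trip time for synchronous 1 KB read RPCs.
 * 277.7 KB/sec of 1 KB reads means ~277.7 RPCs/sec, i.e. roughly
 * 3.6 ms per request/reply pair -- which is why an extra slow router
 * hop is visible while a bridge or repeater is not. */
double ms_per_rpc(double kb_per_sec, double kb_per_rpc)
{
    double rpcs_per_sec = kb_per_sec / kb_per_rpc;  /* ~277.7 */
    return 1000.0 / rpcs_per_sec;                   /* ~3.6 ms */
}
```

- At 277.7 KB/sec this works out to about 3.6 ms per round trip, so
- even a millisecond of added latency per hop is a large fraction of
- the budget.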
-
- #We also noticed that the PC was only creating WRITES of apparently
- #512 bytes (734).
-
- Small writes strike again.
-
- PC-NFS uses a dual write strategy. Small writes (under 256 bytes) are
- buffered up into 512-byte blocks. Larger writes (256 bytes or more) are
- written directly from user memory without any intervening copies (in
- most cases the data is DMA'd directly from the user buffer to the
- NIC). These values were developed
- by testing the performance of real-world applications - *not* synthetic
- benchmarks. We found that in most cases writes of less than 256 bytes
- were in fact writes of 1 byte - typically from operations such as
- "print to file" or i/o redirection. Why 512 bytes for the buffer? A
- typical space/time tradeoff, unfortunately.
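- In outline, the dual write strategy looks something like this. This is
- a hypothetical sketch -- the names, the flush helper, and the stub RPC
- routine are invented for illustration, not PC-NFS source:

```c
#include <string.h>
#include <stddef.h>

#define SMALL_WRITE_MAX  256   /* below this, coalesce the data    */
#define WBUF_SIZE        512   /* size of the staging buffer       */

static unsigned char wbuf[WBUF_SIZE];
static size_t wbuf_used;

/* Stand-in for the real "send an NFS WRITE RPC" routine; here it
 * just counts calls and bytes so the logic can be exercised. */
static size_t rpc_calls, rpc_bytes;
static void nfs_write_rpc(const void *data, size_t len)
{
    (void)data;
    rpc_calls++;
    rpc_bytes += len;
}

static void flush_wbuf(void)
{
    if (wbuf_used > 0) {
        nfs_write_rpc(wbuf, wbuf_used);
        wbuf_used = 0;
    }
}

void pcnfs_write(const void *data, size_t len)
{
    if (len >= SMALL_WRITE_MAX) {
        flush_wbuf();                 /* keep writes in order       */
        nfs_write_rpc(data, len);     /* zero-copy path from caller */
        return;
    }
    /* small write: coalesce into the 512-byte staging buffer */
    if (wbuf_used + len > WBUF_SIZE)
        flush_wbuf();
    memcpy(wbuf + wbuf_used, data, len);
    wbuf_used += len;
}
```

- The point of the two paths: one-byte writes from i/o redirection get
- batched into one RPC per 512 bytes, while application writes of 256
- bytes or more never pay for an extra copy.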
-
- On an NFS server, any write other than an optimally aligned write of
- exactly bsize bytes may require a read-modify-write to get the disk
- updated. Small transfers are clearly much less efficient than large
- ones in this mode, and in fact using a 512 byte write size will reduce
- performance noticeably. I'm posting as a separate follow-up copies of
- my "reader" and "writer" programs which do 8K writes and reads. Under
- PC-NFS, 8K writes are performed using a single RPC (assuming that the
- server can support this tsize); each UDP datagram is obviously
- fragmented based on the network MTU.
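- The separately posted programs are the authoritative versions; a
- minimal stand-in for the "writer" side looks like this (illustrative
- only -- not the posted code):

```c
#include <stdio.h>
#include <string.h>

#define CHUNK (8 * 1024)   /* write in 8 KB chunks so each one can
                            * become a single 8 KB WRITE RPC when the
                            * server's tsize allows it */

/* Returns the number of bytes written, or -1 on error. */
long write_file(const char *path, long total)
{
    static char buf[CHUNK];
    FILE *fp = fopen(path, "wb");
    long done = 0;

    if (fp == NULL)
        return -1;
    memset(buf, 'x', sizeof buf);
    while (done < total) {
        size_t n = (total - done) < CHUNK
                       ? (size_t)(total - done) : CHUNK;
        if (fwrite(buf, 1, n, fp) != n) {
            fclose(fp);
            return -1;
        }
        done += (long)n;
    }
    fclose(fp);
    return done;
}
```

- Each 8 KB fwrite maps onto one WRITE RPC (one UDP datagram,
- fragmented per the network MTU) rather than sixteen 512-byte ones.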
-
- This explains most of the reason for the discrepancy between your
- numbers and those I reported earlier (310-420K reads, 113-480K
- writes). The only remaining question is why you were seeing
- almost identical performance on an SS1+ and a 4/670. The most likely
- "levelling" factor is the disk: doing 512-byte R/M/W cycles will
- pretty much negate any performance benefits of a faster IPI disk,
- and may in fact slow things down enough to offset the CPU differences.
-
-
- #Now we looked at the manuals and the only option I found was TSIZE
- #which WAS set at 8Kbytes. The TSIZE is supposed to be only for
- #writes but as there is only one option I assumed it might also be
- #used for the read request size. It is however obvious that PC-NFS
- #pays NO ATTENTION to this parameter and is wasting a tremendous
- #amount of time dealing with small block sizes.
-
- [Watch those capital letters - it reads as though you're shouting.]
- The tsize parameter is not "supposed to be only for writes." Consulting
- AnswerBook, I find that it defines tsize as
-
- The optimum transfer size of the server in bytes. This is
- the number of bytes the server would like to have in the data
- part of READ and WRITE requests.
-
- This is actually misleading. A client should treat tsize as an upper
- bound on the read and write size, since this is the only way a
- server with a marginal network interface can discourage the client
- from sending back-to-back packets.
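- In other words, a client honoring this reading would clamp its
- preferred transfer size to the advertised tsize. A sketch, with
- illustrative names (the "0 means no preference" convention is an
- assumption, not part of the spec text quoted above):

```c
/* Treat the server's tsize as a ceiling on the READ/WRITE transfer
 * size, never as a target to exceed. A tsize of 0 is taken here to
 * mean the server states no preference. */
unsigned effective_xfer_size(unsigned client_pref, unsigned server_tsize)
{
    if (server_tsize != 0 && server_tsize < client_pref)
        return server_tsize;
    return client_pref;
}
```

- So a client that would like 8 KB transfers against a server
- advertising tsize=1024 would fall back to 1 KB, protecting a server
- with a marginal network interface from back-to-back packets.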
-
- #So what can be done to PC-NFS to get it to use the LARGE block sizes?
- #Is there some parameter I have missed? The SMC8013 cards have
- #buffers of 16K on the board and certainly could handle 4K blocks if
- #not full 8K blocks. The 3C509 cards will stream the data off the
- #card as it comes in so the block size is irrelevant.
-
- These factors are only a tiny part of the whole problem. Let's assume
- that we're never going to drop packets, and that we can get the data
- off the board as fast as memory will take it. The more difficult issues
- revolve around buffer management. How many buffers? How big? In
- conventional memory, UMB, EMS? If EMS, do you transfer directly to EMS
- from the net or copy it up? If the former, what happens to interrupt
- latency? If the latter, how do you avoid excessive copying of data? How
- do you avoid the EMS performance hit in the simple case? Do you
- optimize around an EMS model, and if so what do you do about the
- millions of PCs which don't/can't support EMS? But wait - there's
- more! How do you handle ReadDir response buffering? Do you use the
- same buffer pool as for file data? If so, how do you deal with the
- radically different buffer aging and re-use patterns for data and
- directories? If not, where do you copy directory information to? Do you
- hard-code a single policy and configuration, or do you make the whole
- thing infinitely configurable? If the latter, how much does the
- configurability cost in terms of code size and performance?
-
- Over the lifetime of PC-NFS, the most consistent demand has been for
- size reduction. We've added features over the years, but we've tried to
- keep the footprint more or less constant. What avenues are open to us
- to improve performance? More buffering of small writes; support for
- larger (up to 8K) reads; more read buffers. All these will increase the
- size significantly. Putting buffers in EMS will slow some things down
- (EMS access times plus at least one extra copy). The present design
- and configuration represent our current best shot at balancing all of
- these conflicting demands. You (and others) want better performance:
- we'll obviously consider that in our product plans, but please
- recognize that there are competing demands that we must consider.
-
- Geoff
-
-
- --
- Geoff Arnold, PC-NFS architect, Sun Select. (geoff.arnold@East.Sun.COM)
- ADMINISTRIVIA==========ADMINISTRIVIA=====ADMINISTRIVIA==========ADMINISTRIVIA
- New address: SunSelect, 2 Elizabeth Drive, Chelmsford, MA 01824-4195
- New numbers: Phone: 508-442-0317 FAX: 508-250-5068
-