NetNews Usenet Archive 1993 #1

Text File | 1993-01-04 | 15.7 KB | 409 lines

Newsgroups: comp.client-server
Path: sparky!uunet!wupost!zaphod.mps.ohio-state.edu!magnus.acs.ohio-state.edu!usenet.ins.cwru.edu!gatech!udel!rochester!rocksanne!news
From: weltman@adoc.xerox.com (Rob Weltman)
Subject: NFS slowness
Message-ID: <1993Jan4.175456.20845@spectrum.xerox.com>
Keywords: NFS
Sender: news@spectrum.xerox.com
Reply-To: weltman@adoc.xerox.com
Organization: Xerox AODS, Palo Alto, CA
Date: Mon, 4 Jan 1993 17:54:56 GMT
Lines: 396

I received a number of informative responses to my question about why large NFS writes are so slow, and a few suggestions on how to improve performance, and where to look for more info on tuning NFS. The following are all the responses (hope that's OK to post; everybody responded directly to me, so people on the net can't really know what's been communicated).

Rob

---------------------------------------------
| Rob Weltman                               |
| Xerox                                     |
| 3400 Hillview Ave, building 5             |
| Palo Alto, CA 94303                       |
|                                           |
| weltman@adoc.xerox.com                    |
| phone (415)-813-7477 fax (415)-813-6792   |
---------------------------------------------

Responses to question "Why is NFS so slow for large file writes"
----------------------------------------------------------------

From auspex.com!guy Wed Dec 30 17:47:59 1992
From: guy@auspex.com (Guy Harris)
To: weltman@adoc.xerox.com
Subject: Re: NFS and RPC
Content-Length: 720
X-Lines: 15
Status: RO

>What was surprising was that it was so slow to just do a
>large write

You may have said the magic word - "write".

Did the SPARCstation 2 have a Prestoserve board on it? If not, then every write request that the client sent over the wire to the server would block the process doing the write (which is *not* necessarily the process doing the writes to the file; writes may be handed off to a "biod" process) until the data being written made it out to disk.

The Prestoserve board, plus its supporting software, "pretends" that a write to disk completes when the data being written to disk has been copied into the Prestoserve board's battery-backed-up static RAM. Data written there gets written out to disk later.

----------------------------------------------------------------

From auspex.com!guy Wed Dec 30 18:05:01 1992
From: guy@auspex.com (Guy Harris)
To: weltman@adoc.xerox.com
Subject: Re: NFS and RPC
Content-Length: 749
X-Lines: 18
Status: RO

> Thanks for your comment on the NFS server process blocking
> during a write.

Well, yes, the server process does block, although there are probably 7 or so more server processes ready to serve other requests. My main point, though, was that some process on the NFS *client* would block.

> If the client did a large binary write() - say 2 MB - would
> that count as one transaction that would block until completion,
> or does the OS or NFS break it up into a number of smaller
> buffers, and block/resume for each one?

NFS writes are generally limited to about 8K, so a 2MB write would be broken into 256 individual NFS write operations, and some process on the client would block and resume for each one (not necessarily the same process).

----------------------------------------------------------------

From rus.uni-stuttgart.de!Kurt.Jaeger Thu Dec 31 07:48:55 1992
From: Kurt.Jaeger@rus.uni-stuttgart.de (Kurt Jaeger aka PI)
To: weltman@adoc.xerox.com
Subject: Re: NFS and RPC
Newsgroups: comp.client-server
Status: RO
Content-Length: 525
X-Lines: 15

In article <1992Dec30.155939.13344@spectrum.xerox.com> you write:
> - have a small server process on the remote machine open a file there
> - transfer the file to the remote process with RPC or socket
> - have the remote process write to the file

This is what AFS does.

So short, PI
--
PI at the User Help Desk Comp.Center U of Stuttgart, FRG  28 Years to go !
SMTP: pi@rus.uni-stuttgart.de  Phone: +49 711 685-4828
X.400: pi@rus.uni-stuttgart.dbp.de  Bitnet: zrzr0111@ds0rus54.bitnet  (aka Kurt Jaeger)

----------------------------------------------------------------

From logos.ucs.indiana.edu!hughes Thu Dec 31 08:37:58 1992
From: larry hughes <hughes@logos.ucs.indiana.edu>
To: weltman@adoc.xerox.com
Subject: Re: NFS and RPC
Status: RO
Content-Length: 1037
X-Lines: 20

NFS is really "Not a File System". :-)

Seriously, it's somewhat of a hack. You _can_ get much better throughput, however, with a "black box" NFS system. Running NFS on a general purpose Unix box is functional but not optimal.

Check out Auspex. They make high-end NFS dedicated servers that are maybe an order of magnitude faster than a Sparc running NFS. They utilize multiprocessor technology, and capture the NFS packets at a very low layer (right above data link) so they can be farmed out to the I/O processors. By contrast, a typical Unix-based NFS server processes the NFS requests way up at the application layer (nfsd is just a user process after all).

//==================================================================\\
|| Larry J. Hughes, Jr.          | hughes@indiana.edu               ||
|| Indiana University            | "The person who knows everything ||
|| University Computing Services | has a lot to learn."             ||
\\==================================================================//

----------------------------------------------------------------

From uu5.psi.com!shearson!shearson.com!fgreco Thu Dec 31 09:12:12 1992
From: fgreco@shearson.com (Frank Greco)
To: weltman@adoc.xerox.com
Subject: Re: NFS and RPC
Newsgroups: comp.client-server
Status: RO
Content-Length: 1611
X-Lines: 44

In article <1992Dec30.155939.13344@spectrum.xerox.com> you write:
>
>Why is NFS so slow?
>-------------------

The question should really read "Why are NFS-writes so slow?". NFS-reads aren't really that bad.

The "problem" with NFS-writes is that they are synchronous; the write must be completed on the server before write() can return to your workstation. There are several well-known ways of speeding this up.

* Make sure NFS is tuned properly for your network/server. See Hal Stern's book for great tips.

* Get a Legato Prestoserve board, which "fakes out" NFS into thinking the write() was completed and buffers the writes for later completion. Fairly cheap.

* Get an NFS-machine like an Auspex NFS server. Fairly expensive and really cannot be used as a CPU-server.

* Use other mechanisms of network files (rcp/your own fileserver), but note that one benefit of NFS is that it uses XDR, which you should duplicate. Others include ONC+ (Solaris 2.1) and AFS (which caches files locally).

* There's another product called eNFS that apparently increases NFS performance, but I really don't know that much about it.

Hope this helps,

Frank G.
--
+-On Assignment at: Lehman Brothers-+-Office: Mercury Technologies, Inc.-+
| World Financial Center, 11th Floor| PO Box 529                         |
| New York City, NY 10285 Desk: 1515| Fanwood, NJ 07023                  |
| email: fgreco@shearson.com        | email: fgreco@mercury.com          |
+ voice: (212)-640-9159-------------+ voice: (908)-754-7820--------------+
My comments reflect my own opinions, not my clients'.

----------------------------------------------------------------

From iscp.bellcore.com!jona Sun Jan 3 10:54:36 1993
To: weltman@adoc.xerox.com (Rob Weltman)
Subject: NFS
From: "Jon Alperin" <jona@iscp.bellcore.com>
Content-Length: 307
X-Lines: 11
Status: RO

Rob, it sounds like you have implemented caching, whereby the file is transferred to a local machine. The only problem you will have is maintaining the lock on the remote file so that some other user doesn't modify it underneath you. This is what the Andrew File System (AFS) did.

jon alperin
Bellcore.
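
[Editor's sketch] A rough back-of-the-envelope illustration of the chunking Guy Harris describes earlier in this thread: a 2MB write split into NFS writes of about 8K, each one blocking until the server has the data on disk. The function names and the 20 ms per-write latency are illustrative assumptions, not measurements from anyone's system.

```python
import math

NFS_WRITE_SIZE = 8 * 1024  # "about 8K" per NFS write, as stated in the thread


def nfs_write_count(total_bytes, chunk_bytes=NFS_WRITE_SIZE):
    """Number of NFS write operations needed for one large write()."""
    return math.ceil(total_bytes / chunk_bytes)


def serial_write_seconds(total_bytes, per_write_latency_s,
                         chunk_bytes=NFS_WRITE_SIZE):
    """Rough elapsed time if every chunk waits for the disk in turn.

    Ignores network transfer time and any overlap gained from several
    biod processes, so treat it as an order-of-magnitude estimate only.
    """
    return nfs_write_count(total_bytes, chunk_bytes) * per_write_latency_s


if __name__ == "__main__":
    two_mb = 2 * 1024 * 1024
    print(nfs_write_count(two_mb))  # 256 operations, matching the thread
    # 0.020 s per synchronous write is an assumed figure for illustration.
    print(serial_write_seconds(two_mb, 0.020))
```

Under that assumed latency the 256 blocking round trips alone add up to several seconds, which is consistent with the complaint that a single large write feels slow even on an idle network.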

----------------------------------------------------------------

From uunet.uu.net!imatron!raven1!jmm Thu Dec 31 08:10:55 1992
From: imatron!raven1!jmm@uunet.uu.net (Jon Meyers)
To: guy@auspex.com
Subject: NFS and RPC discussion.
Cc: weltman@parc.xerox.com, jmm@uunet.uu.net
Status: RO
Content-Length: 1556
X-Lines: 33

I've noted the E-mail conversation you've been having with Rob Weltman at Xerox Parc regarding NFS and RPC (Rob's been cc'ing me). I'm actually the (not so) mysterious friend of Rob's who did the benchmarks he initially mentioned.

First off, I'd like to say thanks for the cogent comments you've been making. I've found it very interesting to see people at various companies wrangling with the NFS transfer issue. Universally, writing to a remote-mounted file has caused them performance headaches. The discussion you and Rob have been having has pretty much supported my thought that there's something in NFS (and SunOS?) that really should be improved.

Out of curiosity, where did you come upon your information on how NFS is working internally? A reference on the topic would be very useful. Thanks in advance.

Another question - which company provides Prestoserve boards? Sounds like the board is designed to fit neatly into one niche, but I'd like to go get more information.

As Rob mentioned, my solution to the NFS issue was to provide servers on the Sparcs I wanted to write to, and to let the clients send data to them using RPC's utilizing TCP/IP. Of course, the servers do more than just write - we had the need here to have them do several file and other operations. Anyway, the performance looks good so far - within the expectations of ethernet.

So, thanks again for the information you've provided! Thanks in advance also for any more thoughts you might have.

Happy New Year,

Jonathan Meyers
Imatron, Inc.
jmm@imatron.com

----------------------------------------------------------------

From auspex.com!guy Thu Dec 31 11:08:24 1992
From: guy@auspex.com (Guy Harris)
To: jmm@imatron.com
Subject: Re: NFS and RPC discussion.
Cc: weltman@parc.xerox.com
Status: RO
Content-Length: 5983
X-Lines: 116

> First off, I'd like to say thanks for the cogent comments you've been making.
> I've found it very interesting to see people at various companies wrangling
> with the NFS transfer issue. Universally, writing to a remote-mounted file
> has caused them performance headaches. The discussion you and Rob have been
> having has pretty much supported my thought that there's something in NFS
> (and SunOS?) that really should be improved.

It's mainly in NFS, and not SunOS-specific. The idea is that the protocol is (mostly...) "stateless", and if a server crashes and reboots, the client shouldn't lose any data or otherwise have any problems other than delays.

If a machine writes data to a file, when the write is flagged as "done", the software on the machine generally assumes that it is not obliged to keep the data around in the buffer from which it wrote the data, because it can fetch it from the file again. (It may keep it around *anyway* for performance reasons, but if it needs that buffer for other purposes, it can just recycle it.)

This means that if the file is being accessed over NFS from a file server, the file server shouldn't reply - and thus tell the machine doing the writing to the file that the write is "done" - unless, if the server were to crash after the reply is sent out, and then reboot, the client could then ask the server for the data again, and get the data it wrote.

If the server *also* buffers data written to the file, so that the file system doesn't necessarily wait until the data is written to some form of storage that survives a server crash before saying the write is "done", then, in order for the NFS write to be "safe", the server must ensure that the file system *does* wait until the data is written to some form of storage that survives a server crash before saying the write is "done" (or must, at least, wait until the data is written to some form of storage that survives a server crash before replying to the client's write request).

Normally, at least on most UNIX systems, "some form of storage that survives a server crash" means "the disk", because the UNIX buffer cache or page pool is *not* necessarily preserved if the system crashes and reboots (although a crash *might* cause the OS to try to sync data out to disk, that isn't guaranteed to work).

The Prestoserve board acts as "some form of storage that survives a server crash", but can be written to faster than can a disk; it's static RAM with a battery backup (so that even if the crash is caused by a power failure, the data will still survive the crash).

Some vendors, including SGI, HP, and Auspex, have an *option* for their NFS server code that allows the system administrator to specify that writes to a particular file system by the NFS server should *not* wait until the data makes it to "some form of storage that survives a server crash" before being marked as "done". This allows you to get higher write performance, but take the risk that a write by an NFS client might not survive a crash.

We (Auspex) do not make that the default; we use it here only for diskless clients' swap space. SGI *does* make it the default, claiming that crashes are rare enough that you're not actually going to lose any data. Sun and SGI folk spend a fair bit of time yelling at one another about this....

We (Auspex) also provide our own implementation of the Prestoserve idea.
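
[Editor's sketch] The rule described above, that a server should not report a write as "done" until the data is on storage that survives a crash, can be illustrated with a small local analogue. This is not nfsd code: `handle_write` and its `stable` flag are hypothetical names, and the `stable=False` path only imitates the kind of vendor option mentioned above.

```python
import os


def handle_write(path, offset, data, stable=True):
    """Apply one write request and return only when it is safe to reply.

    With stable=True the data is forced to disk with fsync() before the
    function returns, so a reply sent afterwards is "safe" in the sense
    described in the thread. With stable=False the data may still be in
    the OS buffer cache on return: faster, but it can be lost in a crash.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        written = 0
        while written < len(data):      # os.write may write fewer bytes
            written += os.write(fd, data[written:])
        if stable:
            os.fsync(fd)                # block until the disk has the data
    finally:
        os.close(fd)
    return written
```

Calling `handle_write("/tmp/demo.dat", 0, b"x" * 8192)` repeatedly with and without `stable=True` (the path is just an example) is a simple way to see the cost of waiting for the disk on each request; a Prestoserve-style board attacks the same cost by making the "stable" step fast rather than skipping it.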

> Out of curiosity, where did you come upon your information on how NFS
> is working internally?

Which part? The bit about writes being, or not being, acknowledged by the server until they make it to "some form of storage that survives a server crash" is discussed in, for example, RFC 1094, the RFC that describes the NFS protocol:

    All of the procedures in the NFS protocol are assumed to be synchronous. When a procedure returns to the client, the client can assume that the operation has completed and any data associated with the request is now on stable storage. For example, a client WRITE request may cause the server to update data blocks, filesystem information blocks (such as indirect blocks), and file attribute information (size and modify times). When the WRITE returns to the client, it can assume that the write is safe, even in case of a server crash, and it can discard the data written. This is a very important part of the statelessness of the server. If the server waited to flush data from remote requests, the client would have to save those requests so that it could resend them in case of a server crash.

I found out about the way the client code in SunOS implements file system operations from having worked for 3 years in the OS group at Sun, and from having worked for 4 years in the software group here (we're too small to have an OS group yet...).

> Another question - which company provides Prestoserve boards?

A company called Legato, in Palo Alto. They were originally founded as a software company; they came up with the idea of the Prestoserve board and started selling the idea (they never made the boards, as far as I know; I think they originally got somebody to make it, and then sold the idea to Sun and DEC and possibly other people) to fund themselves. They also sell various software products, such as Networker Backup, which is an RPC-based backup product; I think they're now providing it for Netware, as well as for various UNIX systems and DOS.

> Sounds like the
> board is designed to fit neatly into one niche, but I'd like to go get more
> information.

"Niche" in the hardware sense, or the marketplace sense? They offer both VMEbus and SBus versions for Suns; I don't know if their DEC version is a Turbochannel board or not. I think you can get them directly from Sun and DEC; I don't know if Legato still sells them directly (as indicated, they're really a software company, and got in the hardware business to keep themselves fed while they built their software).

---