- Newsgroups: comp.client-server
- Path: sparky!uunet!wupost!zaphod.mps.ohio-state.edu!magnus.acs.ohio-state.edu!usenet.ins.cwru.edu!gatech!udel!rochester!rocksanne!news
- From: weltman@adoc.xerox.com (Rob Weltman)
- Subject: NFS slowness
- Message-ID: <1993Jan4.175456.20845@spectrum.xerox.com>
- Keywords: NFS
- Sender: news@spectrum.xerox.com
- Reply-To: weltman@adoc.xerox.com
- Organization: Xerox AODS, Palo Alto, CA
- Date: Mon, 4 Jan 1993 17:54:56 GMT
- Lines: 396
-
- I received a number of informative responses to my question
- about why large NFS writes are so slow, and a few suggestions on
- how to improve performance, and where to look for more info on
- tuning NFS. The following are all the responses (hope that's OK
- to post; everybody responded directly to me, so people on the
- net can't really know what's been communicated).
-
- Rob
-
-
- ---------------------------------------------
- | Rob Weltman |
- | Xerox |
- | 3400 Hillview Ave, building 5 |
- | Palo Alto, CA 94303 |
- | |
- | weltman@adoc.xerox.com |
- | phone (415)-813-7477 fax (415)-813-6792 |
- ---------------------------------------------
-
-
-
- Responses to question "Why is NFS so slow for large file writes"
- ----------------------------------------------------------------
-
- From auspex.com!guy Wed Dec 30 17:47:59 1992
- From: guy@auspex.com (Guy Harris)
- To: weltman@adoc.xerox.com
- Subject: Re: NFS and RPC
- Content-Length: 720
- X-Lines: 15
- Status: RO
-
- >What was surprising was that it was so slow to just do a
- >large write
-
- You may have said the magic word - "write".
-
- Did the SPARCstation 2 have a Prestoserve board on it? If not, then
- every write request that the client sent over the wire to the server
- would block the process doing the write (which is *not* necessarily the
- process doing the writes to the file; writes may be handed off to a
- "biod" process) until the data being written made it out to disk.
-
- The Prestoserve board, plus its supporting software, "pretends" that a
- write to disk completes when the data being written to disk has been
- copied into the Prestoserve board's battery-backed-up static RAM. Data
- written there gets written out to disk later.
-
- ----------------------------------------------------------------
-
- From auspex.com!guy Wed Dec 30 18:05:01 1992
- From: guy@auspex.com (Guy Harris)
- To: weltman@adoc.xerox.com
- Subject: Re: NFS and RPC
- Content-Length: 749
- X-Lines: 18
- Status: RO
-
- > Thanks for your comment on the NFS server process blocking
- > during a write.
-
- Well, yes, the server process does block, although there are probably 7
- or so more server processes ready to serve other requests.
-
- My main point, though, was that some process on the NFS *client* would
- block.
-
- > If the client did a large binary write() - say 2 MB - would
- > that count as one transaction that would block until completion,
- > or does the OS or NFS break it up into a number of smaller
- > buffers, and block/resume for each one?
-
- NFS writes are generally limited to about 8K, so a 2MB write would be
- broken into 256 individual NFS write operations, and some process on the
- client would block and resume for each one (not necessarily the same
- process).
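-
- A rough sketch of the arithmetic above, in Python. The 8K limit and the
- 2MB write come from the message; the 20 ms per-request figure in the
- usage lines is purely a hypothetical placeholder, not a measurement, and
- the time estimate assumes only one request is outstanding at a time
- (with several "biod" processes, more than one may be in flight).

```python
def nfs_write_count(total_bytes, max_write_bytes=8 * 1024):
    """Number of NFS write requests needed to carry one large write()."""
    # Ceiling division: a partial final chunk still costs one request.
    return -(-total_bytes // max_write_bytes)


def serialized_write_seconds(total_bytes, per_request_seconds,
                             max_write_bytes=8 * 1024):
    """Elapsed time if each request must finish before the next starts."""
    return nfs_write_count(total_bytes, max_write_bytes) * per_request_seconds


two_mb = 2 * 1024 * 1024
print(nfs_write_count(two_mb))                  # 256
print(serialized_write_seconds(two_mb, 0.020))  # 0.020 s/request is assumed
```

- The point of the second function is only that the per-request wait is
- multiplied by the request count, so a modest delay per 8K write adds up
- quickly over a multi-megabyte file.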
-
- ----------------------------------------------------------------
-
- From rus.uni-stuttgart.de!Kurt.Jaeger Thu Dec 31 07:48:55 1992
- From: Kurt.Jaeger@rus.uni-stuttgart.de (Kurt Jaeger aka PI)
- To: weltman@adoc.xerox.com
- Subject: Re: NFS and RPC
- Newsgroups: comp.client-server
- Status: RO
- Content-Length: 525
- X-Lines: 15
-
- In article <1992Dec30.155939.13344@spectrum.xerox.com> you write:
- > - have a small server process on the remote machine open a file there
- > - transfer the file to the remote process with RPC or socket
- > - have the remote process write to the file
-
- This is what AFS does.
-
- So short, PI
-
- --
- PI at the User Help Desk Comp.Center U of Stuttgart, FRG 28 Years to go !
- SMTP: pi@rus.uni-stuttgart.de Phone: +49 711 685-4828
- X.400: pi@rus.uni-stuttgart.dbp.de
- Bitnet: zrzr0111@ds0rus54.bitnet (aka Kurt Jaeger)
-
-
- ----------------------------------------------------------------
-
- From logos.ucs.indiana.edu!hughes Thu Dec 31 08:37:58 1992
- From: larry hughes <hughes@logos.ucs.indiana.edu>
- To: weltman@adoc.xerox.com
- Subject: Re: NFS and RPC
- Status: RO
- Content-Length: 1037
- X-Lines: 20
-
- NFS is really "Not a File System". :-)
-
- Seriously, it's somewhat of a hack. You _can_ get much better
- throughput, however, with a "black box" NFS system. Running
- NFS on a general purpose Unix box is functional but not optimal.
-
- Check out Auspex. They make high-end NFS dedicated servers that
- are maybe an order of magnitude faster than a Sparc running NFS.
- They utilize multiprocessor technology, and capture the NFS packets
- at a very low layer (right above data link) so they can be farmed
- out to the I/O processors. By contrast, a typical Unix-based
- NFS server processes the NFS requests way up at the application
- layer (nfsd is just a user process after all).
-
- //==================================================================\\
- || Larry J. Hughes, Jr. | hughes@indiana.edu ||
- || Indiana University | "The person who knows everything ||
- || University Computing Services | has a lot to learn." ||
- \\===================================================================//
-
-
- ----------------------------------------------------------------
-
- From uu5.psi.com!shearson!shearson.com!fgreco Thu Dec 31 09:12:12 1992
- From: fgreco@shearson.com (Frank Greco)
- To: weltman@adoc.xerox.com
- Subject: Re: NFS and RPC
- Newsgroups: comp.client-server
- Status: RO
- Content-Length: 1611
- X-Lines: 44
-
- In article <1992Dec30.155939.13344@spectrum.xerox.com> you write:
- >
- >Why is NFS so slow?
- >-------------------
-
- The question should really read "Why are NFS-writes so slow?".
-
- NFS-reads aren't really that bad.
-
- The "problem" with NFS-writes is that they are synchronous;
- the write must be completed on the server before write() can
- return to your workstation.
-
- There are several well-known ways of speeding this up.
-
- * Make sure NFS is tuned properly for your network/server.
- See Hal Stern's book for great tips.
-
- * Get a Legato Prestoserve board which "fakes out" NFS
- into thinking the write() was completed and buffers the
- writes for later completion. Fairly cheap.
-
- * Get an NFS-machine like an Auspex NFS server. Fairly
- expensive and really cannot be used as a CPU-server.
-
- * Use other mechanisms for network file access (rcp, or your
- own fileserver), but note that one benefit of NFS is that it uses
- XDR, which you should duplicate. Others include ONC+ (Solaris
- 2.1) and AFS (which caches files locally).
-
- * There's another product called eNFS that apparently increases
- NFS performance, but I really don't know that much about it.
-
- Hope this helps,
-
- Frank G.
- --
- +-On Assignment at: Lehman Brothers-+-Office: Mercury Technologies, Inc.-+
- | World Financial Center, 11th Floor| PO Box 529 |
- | New York City, NY 10285 Desk: 1515| Fanwood, NJ 07023 |
- | email: fgreco@shearson.com | email: fgreco@mercury.com |
- + voice: (212)-640-9159-------------+ voice: (908)-754-7820--------------+
-
- My comments reflect my own opinions, not my clients'.
-
-
- ----------------------------------------------------------------
-
- From iscp.bellcore.com!jona Sun Jan 3 10:54:36 1993
- To: weltman@adoc.xerox.com (Rob Weltman)
- Subject: NFS
- From: "Jon Alperin" <jona@iscp.bellcore.com>
- Content-Length: 307
- X-Lines: 11
- Status: RO
-
-
- Rob,
-
- it sounds like you have implemented caching, whereby the file is
- transferred to a local machine. The only problem you will have is
- maintaining the lock on the remote file so that some other user
- doesn't modify it underneath you. This is what the Andrew File
- System (AFS) did.
-
- jon alperin
- Bellcore.
-
-
- ----------------------------------------------------------------
-
- From uunet.uu.net!imatron!raven1!jmm Thu Dec 31 08:10:55 1992
- From: imatron!raven1!jmm@uunet.uu.net (Jon Meyers)
- To: guy@auspex.com
- Subject: NFS and RPC discussion.
- Cc: weltman@parc.xerox.com, jmm@uunet.uu.net
- Status: RO
- Content-Length: 1556
- X-Lines: 33
-
- I've noted the E-mail conversation you've been having with Rob Weltman at Xerox
- Parc regarding NFS and RPC (Rob's been cc'ing me). I'm actually the (not so)
- mysterious friend of Rob's who did the benchmarks he initially mentioned.
-
- First off, I'd like to say thanks for the cogent comments you've been making.
- I've found it very interesting to see people at various companies wrangling
- with the NFS transfer issue. Universally, writing to a remote-mounted file
- has caused them performance headaches. The discussion you and Rob have been
- having has pretty much supported my thought that there's something in NFS
- (and SunOS?) that really should be improved.
-
- Out of curiosity, where did you come upon your information on how NFS is working
- internally? A reference on the topic would be very useful. Thanks in advance.
-
- Another question - which company provides Prestoserve boards? Sounds like the
- board is designed to fit neatly into one niche, but I'd like to go get more
- information.
-
- As Rob mentioned, my solution to the NFS issue was to provide servers on the
- Sparcs I wanted to write to, and to let the clients send data to them using
- RPC's utilizing TCP/IP. Of course, the servers do more than just write - we
- had the need here to have them do several file and other operations. Anyway,
- the performance looks good so far - within the expectations of ethernet.
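-
- The general shape of that approach can be sketched as follows. This is
- not the actual Imatron code: it uses a plain TCP socket rather than RPC,
- and every name in it (serve_one_file, send_file, demo, CHUNK) is made up
- for illustration. A small process on the remote machine accepts a
- connection and writes whatever arrives to a local file.

```python
import os
import socket
import tempfile
import threading

CHUNK = 64 * 1024  # arbitrary receive size, unrelated to the NFS 8K limit


def serve_one_file(listener, dest_path):
    """Accept one connection and write everything received to a local file."""
    conn, _addr = listener.accept()
    with conn, open(dest_path, "wb") as out:
        while True:
            data = conn.recv(CHUNK)
            if not data:       # sender closed the connection: end of file
                break
            out.write(data)    # ordinary local write, buffered by the OS


def send_file(host, port, payload):
    """Stream a bytes payload to the remote writer over TCP."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)


def demo(payload):
    """Run writer and sender on the loopback interface; return what was written."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "received.bin")
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
            listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick one
            listener.listen(1)
            port = listener.getsockname()[1]
            writer = threading.Thread(target=serve_one_file,
                                      args=(listener, dest))
            writer.start()
            send_file("127.0.0.1", port, payload)
            writer.join()
        with open(dest, "rb") as f:
            return f.read()
```

- Note the trade-off: the sender here never waits for the data to reach
- the remote disk, which is where the speed comes from, and also means it
- gets no guarantee that the data survives a crash of the remote machine.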
-
- So, thanks again for the information you've provided! Thanks in advance also
- for any more thoughts you might have.
-
-
- Happy New Year,
-
- Jonathan Meyers
- Imatron, Inc.
- jmm@imatron.com
-
-
- ----------------------------------------------------------------
-
- From auspex.com!guy Thu Dec 31 11:08:24 1992
- From: guy@auspex.com (Guy Harris)
- To: jmm@imatron.com
- Subject: Re: NFS and RPC discussion.
- Cc: weltman@parc.xerox.com
- Status: RO
- Content-Length: 5983
- X-Lines: 116
-
- > First off, I'd like to say thanks for the cogent comments you've been making.
- > I've found it very interesting to see people at various companies wrangling
- > with the NFS transfer issue. Universally, writing to a remote-mounted file
- > has caused them performance headaches. The discussion you and Rob have been
- > having has pretty much supported my thought that there's something in NFS
- > (and SunOS?) that really should be improved.
-
- It's mainly in NFS, and not SunOS-specific.
-
- The idea is that the protocol is (mostly...) "stateless", and if a
- server crashes and reboots, the client shouldn't lose any data or
- otherwise have any problems other than delays.
-
- If a machine writes data to a file, when the write is flagged as "done",
- the software on the machine generally assumes that it is not obliged to
- keep the data around in the buffer from which it wrote the data, because
- it can fetch it from the file again. (It may keep it around *anyway*
- for performance reasons, but if it needs that buffer for other purposes,
- it can just recycle it.)
-
- This means that if the file is being accessed over NFS from a file
- server, the file server shouldn't reply - and thus tell the machine
- doing the writing to the file that the write is "done" - unless, if the
- server were to crash after the reply is sent out, and then reboot, the
- client could then ask the server for the data again, and get the data it
- wrote.
-
- If the server *also* buffers data written to the file, so that the file
- system doesn't necessarily wait until the data is written to some form
- of storage that survives a server crash before saying the write is
- "done", then, in order for the NFS write to be "safe", the server must
- ensure that the file system *does* wait until the data is written to
- some form of storage that survives a server crash before saying the
- write is "done" (or must, at least, wait until the data is written to
- some form of storage that survives a server crash before replying to the
- client's write request).
-
- Normally, at least on most UNIX systems, "some form of storage that
- survives a server crash" means "the disk", because the UNIX buffer cache
- or page pool is *not* necessarily preserved if the system crashes and
- reboots (although a crash *might* cause the OS to try to sync data out
- to disk, that isn't guaranteed to work).
-
- The Prestoserve board acts as "some form of storage that survives a
- server crash", but can be written to faster than can a disk; it's static
- RAM with a battery backup (so that even if the crash is caused by a
- power failure, the data will still survive the crash).
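-
- A toy model of that ordering, as a sketch only. The class and method
- names are invented, and a Python list is of course not battery-backed
- RAM; the model shows just that the write is reported "done" once it is
- staged, with the copy to the slower backing store happening later.

```python
import io


class StableStagingCache:
    """Toy model: report a write as done once staged; write to disk later."""

    def __init__(self, backing_file):
        self.backing_file = backing_file  # stands in for the slow disk
        self.staged = []                  # stands in for battery-backed RAM

    def write(self, offset, data):
        self.staged.append((offset, bytes(data)))
        return "done"                     # acknowledged before any disk I/O

    def flush(self):
        """Copy staged writes to the backing file, oldest first."""
        for offset, data in self.staged:
            self.backing_file.seek(offset)
            self.backing_file.write(data)
        self.backing_file.flush()
        self.staged.clear()               # drop entries only after the copy


disk = io.BytesIO()
cache = StableStagingCache(disk)
cache.write(0, b"hello")
cache.write(5, b" world")
print(disk.getvalue())   # still empty: nothing has reached the "disk" yet
cache.flush()
print(disk.getvalue())   # now holds both writes
```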
-
- Some vendors, including SGI, HP, and Auspex, have an *option* for their
- NFS server code that allows the system administrator to specify that
- writes to a particular file system by the NFS server should *not* wait
- until the data makes it to "some form of storage that survives a server
- crash" before being marked as "done". This allows you to get higher
- write performance, but take the risk that a write by an NFS client might
- not survive a crash.
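-
- In user-space terms, the difference between the two behaviours comes
- down to whether the server forces the data to disk before it replies.
- The sketch below is an illustration with POSIX-style calls, not any
- vendor's server code; handle_write and its wait_for_stable_storage flag
- are hypothetical names standing in for the per-file-system option
- described above.

```python
import os
import tempfile


def handle_write(fd, offset, data, wait_for_stable_storage=True):
    """Apply one write request; return only when it is safe to reply."""
    os.lseek(fd, offset, os.SEEK_SET)
    view = memoryview(data)
    written = 0
    while written < len(view):           # os.write may accept fewer bytes
        written += os.write(fd, view[written:])
    if wait_for_stable_storage:
        os.fsync(fd)                     # block until the OS reports it flushed
    return written                       # the caller would send the reply now


def demo():
    """Do one safe and one fast write to a scratch file; return its contents."""
    fd, path = tempfile.mkstemp()
    try:
        handle_write(fd, 0, b"safe ")
        handle_write(fd, 5, b"fast", wait_for_stable_storage=False)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        contents = f.read()
    os.remove(path)
    return contents
```

- With the flag off, the reply can go out while the data is still only in
- the server's buffer cache, which is exactly the window in which a crash
- would lose a write the client believes is complete.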
-
- We (Auspex) do not make that the default; we use it here only for
- diskless clients' swap space. SGI *does* make it the default, claiming
- that crashes are rare enough that you're not actually going to lose any
- data. Sun and SGI folk spend a fair bit of time yelling at one another
- about this....
-
- We (Auspex) also provide our own implementation of the Prestoserve idea.
-
- > Out of curiosity, where did you come upon your information on how NFS
- > is working internally?
-
- Which part?
-
- The bit about writes being, or not being, acknowledged by the server
- until they make it to "some form of storage that survives a server
- crash" is discussed in, for example, RFC 1094, the RFC that describes
- the NFS protocol:
-
- All of the procedures in the NFS protocol are assumed to be
- synchronous. When a procedure returns to the client, the client can
- assume that the operation has completed and any data associated with
- the request is now on stable storage. For example, a client WRITE
- request may cause the server to update data blocks, filesystem
- information blocks (such as indirect blocks), and file attribute
- information (size and modify times). When the WRITE returns to the
- client, it can assume that the write is safe, even in case of a
- server crash, and it can discard the data written. This is a very
- important part of the statelessness of the server. If the server
- waited to flush data from remote requests, the client would have to
- save those requests so that it could resend them in case of a server
- crash.
-
- I found out about the way the client code in SunOS implements file
- system operations from having worked for 3 years in the OS group at Sun,
- and from having worked for 4 years in the software group here (we're too
- small to have an OS group yet...).
-
- > Another question - which company provides Prestoserve boards?
-
- A company called Legato, in Palo Alto. They were originally founded as
- a software company; they came up with the idea of the Prestoserve board
- and started selling the idea (they never made the boards, as far as I
- know; I think they originally got somebody to make it, and then sold the
- idea to Sun and DEC and possibly other people) to fund themselves.
-
- They also sell various software products, such as Networker Backup,
- which is an RPC-based backup product; I think they're now providing it
- for Netware, as well as for various UNIX systems and DOS.
-
- > Sounds like the
- > board is designed to fit neatly into one niche, but I'd like to go get more
- > information.
-
- "Niche" in the hardware sense, or the marketplace sense? They offer
- both VMEbus and SBus versions for Suns; I don't know if their DEC
- version is a Turbochannel board or not. I think you can get them
- directly from Sun and DEC; I don't know if Legato still sells them
- directly (as indicated, they're really a software company, and got in
- the hardware business to keep themselves fed while they built their
- software).
-