- Path: sparky!uunet!olivea!decwrl!sgi!rhyolite!vjs
- From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
- Newsgroups: comp.protocols.nfs
- Subject: Re: NFS corruption discovery
- Summary: you've read it all before
- Message-ID: <or7a4f8@rhyolite.wpd.sgi.com>
- Date: 21 Aug 92 04:41:43 GMT
- References: <1992Aug19.225010.18306@den.mmc.com> <1992Aug21.015953.10705@decuac.dec.com>
- Organization: Silicon Graphics, Inc. Mountain View, CA
- Lines: 94
-
- Just to bore everyone, I'll repeat what I said before. Please note
- that I said all of this before:
-
- 1. UDP checksums are good. There is no current performance reason to
- turn them off.
-
- 2. all else equal, NVRAM is more reliable than naked DRAM or even
- UPS-backed DRAM. NVRAM is less reliable than a synchronous write
- to a non-caching disk.
-
- 3. things are never equal, and the reliability of an NFS server is
- usually, and always in my experience since leaving my previous
- position in 1986, much more dependent on other things than the
- volatility of its memory.
-
- 4. Ultrix has had NFS bugs. That should surprise no one.
- We have had NFS bugs. That should surprise no one.
- To the best of my knowledge, just watching recent reports in the
- SGI bug tracking systems, whichever versions of Ultrix are commonly
- connected today to SGI boxes are a little buggier than current SGI
- versions. Given the relative age of the implementations (SGI's
- first shipped in 1986), that should surprise no one. I have no
- doubt that SGI and DEC will reverse that in the future, making the
- SGI code the buggier, and then reverse it again, as both
- organizations add and remove bugs. Such is life.
-
- 5. Those Ultrix bugs that I've heard about, in my honest but perhaps
- biased judgment, have caused customers grief that would not be
- affected by an NVRAM accelerator or by async writes. The same
- applies to bugs in previous generations of SGI NFS code. Because I
- am conservative, I think that is likely to be true of both in the
- future: other problems will affect reliability more than (a)sync
- writes.
-
- 6. Mr. Ranum is the fellow spreading FUD by equating UDP checksums
- with async writes. It does not matter which of the two is more or
- less evil, dangerous or unsafe. Equating UDP checksums with async
- writes is as silly as equating either with using ECC instead of
- parity protected memory, or with using a failsafe RAID instead of
- single disk spindles. Many things affect the cost, speed, and
- reliability of a system, and it is simply self-serving nonsense to
- say they are all the same. Equating UDP checksums with async
- writes is only FUD, conceivably but probably not consciously
- intended to sell more NVRAM packages.
-
- Please recall the start of this thread. Someone asked about file
- corruption in circumstances that clearly had nothing to do with
- server crashes, and so absolutely nothing to do with (a)sync writes.
- Why did Mr. Ranum drag in async writes? Because he cares about and is
- proud of his NVRAM, not because it was relevant. He was, in his words,
- spreading FUD.
-
- As I've observed before, it is a funny fact of life that those who are
- most offended by async NFS writes are those who sell hardware NFS
- accelerators which do "asynchronous writes to the real disk but
- synchronous to storage more stable than ordinary DRAM." I think the
- connection is not greed, but the human tendency to think that your
- extra "value added" is absolutely required instead of only extra.
- NVRAM-async is less reliable than a straight write to a non-caching
- disk drive and more reliable than a write to UPS protected DRAM. I
- trust no one disputes either of those facts. I think the small
- increase in reliability among all 3 is not worth the cost of the
- hardware. However, we looked at PrestoServ and decided not to support
- it, and so I might be as biased in my way as people at DEC, Sun, and
- Legato.
-
- Someone else has observed that the phrase "stable storage" is an
- unintentional misuse of an existing technical term. B. Lampson coined
- the term in the literature years before NFS. As far as I know, that
- technical meaning of "stable storage" has nothing to do with any part
- of any NFS implementation anywhere from any organization. The LADDIS
- committee has recently been trying to invent a new meaning for "stable
- storage" that somehow fits devices like PrestoServ. I think that is an
- exercise in marketing wordsmithing, but then I'm biased.
-
- Mr. Ranum asks that I defend async writes. The most that seems
- appropriate or necessary is to observe that to the best of my
- knowledge, I can recall absolutely no customer who has said that the
- default of async-on should be changed. I can recall absolutely no
- customer who has complained of lost data because of SGI async writes.
- Intellectually, I believe it has probably happened, but I do not know
- of even one case. Regrettably, I do know that some customers have lost
- data because of bugs in our NFS client and server code. Because of my
- consistent position in the SGI development organization since before
- the first SGI NFS release, I honestly believe that I have an accurate
- knowledge of the relative frequencies, and so of the relative dangers.
- The dangers of async writes are just not relevant for modern servers
- that rarely crash.
-
- In 1985, no-UDP-checksums and sync-writes was the right combination for
- best speed and reliability. Things have changed.
-
-
- Vernon Schryver, vjs@sgi.com
-