NetNews Usenet Archive 1992 #26

Internet Message Format | 1992-11-13 | 19.8 KB

Xref: sparky comp.benchmarks:1684 comp.arch.storage:768
Path: sparky!uunet!europa.asd.contel.com!emory!swrinde!cs.utexas.edu!rutgers!cbmvax!jesup
From: jesup@cbmvax.commodore.com (Randell Jesup)
Newsgroups: comp.benchmarks,comp.arch.storage
Subject: Re: Disk performance issues, was IDE vs SCSI-2 using iozone
Message-ID: <37043@cbmvax.commodore.com>
Date: 14 Nov 92 04:17:29 GMT
References: <1992Nov11.064154.17204@fasttech.com> <1992Nov11.210749.3953@igor.tamri.com> <36995@cbmvax.commodore.com> <1992Nov12.193308.20297@igor.tamri.com>
Reply-To: jesup@cbmvax.commodore.com (Randell Jesup)
Organization: Commodore, West Chester, PA
Lines: 352

jbass@igor.tamri.com (John Bass) writes:
>
>I've gotten a lot of wonderful mail on this series of posting, the few
>detractors make arguments similar to Randell's, so I will rebut his
>public position a little more strongly than otherwise.

Hmmm, I have this vision of a bunch of spectators waving flags in stands as we "battle" it out.... :-)

The discussion topic (as you noted in the subject) has changed quite a bit from the original topic; I suspect assumptions about this topic have been part of the reasons for disagreement. (For historical interest, this started out with a "why is IOZone dangerous to use without understanding" message.)

For the record, I used the Amiga (nee Tripos) filesystem more as an example of how implementation details of an OS, such as the Unix buffer cache, can influence filesystem design and performance, and benchmarks. The Amiga filesystem is by no means perfect. It has some good qualities, such as good (close to driver limits) performance on large reads/writes and fast opening of files (due to hashed directories). It has some bad qualities too, such as using block lists instead of extents (fixing this would be an easy big win), slow directory scanning (though the recent release has an ok solution to this), and a few other things. It is designed for micros (multitasking, but micros).
It performs well in a single-user environment in most situations.

Now on to the real discussion...

> 2) Substantial in-field performance anomalies where critical
>    data is in remapped blocks ... what if this was a benchmark
>    evaluation purchase and the customer based a $50M order
>    on its performance? What about the ordinary customer who
>    just has to live with its poor behaviour?

That example is a good example of why I said I think we've been talking about different things/worlds. That's not the desktop market. That's not even close. In that sort of market, spending LOTS of engineering dollars to avoid worst-case problems or to get the last bit of performance out of a drive that is the gating performance issue for many users makes sense. It (usually) doesn't in the desktop market.

Most drives that remap do so to spares on the same cylinder. While fetching such a spare will take time, it won't take much (average 1/2 rev). If you have contiguous data and the drive is doing read-ahead, it may only take an extra sector time (since the drive will be using that extra half-rev to throw data you'll want into its buffer). Of course, if you run out of spares on the cylinder, you have to do an expensive extra seek.

>my position was that remapping should be done in the filesystem
>so that the bad-blocks would NEVER be allocated and need to be remapped. This
>IS a major divergence in thought from current practice in UNIX ... not at all
>for DOS which has always managed bad block info in the filesystem.

There are costs with that approach also. Quite possibly the costs aren't large, but they are there. In our particular application, we make use of knowing that all blocks are usable to allow image-backups and copies of partitions - that's painful at best and at worst impossible without mapping in the driver or the device. However, for systems like most Unixes which rarely use removable media this would not be very important.

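As a rough illustration of the "average 1/2 rev" remap cost mentioned above, here is a minimal sketch of the arithmetic. The spindle speed and sectors-per-track values are assumptions chosen for illustration, not figures from this thread, and the function names are hypothetical.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def remap_penalty_ms(rpm: float, sectors_per_track: int,
                     read_ahead: bool = False) -> float:
    """Rough extra delay for fetching a spare sector on the same cylinder.

    Without read-ahead, the drive waits on average half a revolution for
    the spare to come around. With read-ahead on contiguous data, the
    estimate in the post is about one extra sector time.
    """
    rev = revolution_ms(rpm)
    if read_ahead:
        return rev / sectors_per_track
    return rev / 2.0


if __name__ == "__main__":
    # Hypothetical drive geometry, chosen only for illustration.
    RPM = 3600
    SECTORS_PER_TRACK = 36
    print(f"half-rev penalty:   {remap_penalty_ms(RPM, SECTORS_PER_TRACK):.2f} ms")
    print(f"read-ahead penalty: {remap_penalty_ms(RPM, SECTORS_PER_TRACK, True):.2f} ms")
```

With these assumed values the half-revolution case comes to about 8.33 ms and the read-ahead case to about 0.46 ms. The sketch does not model the extra seek needed once a cylinder runs out of spares.
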
>For WD1003 type controllers the long accepted practice was to flag each
>sector/track bad at low level format, so that when DOS format or SCO badtrk
>scanned the media it was ASSURED of finding the area bad and marking it so.
>From a performance point of view (the one I consistently take) this is vastly
>better!

One thing you miss in your discussion, though: all reasonable SCSI drives will tell you what sectors have been mapped out - Read Defect Data. If the filesystem wishes to use the information, it's there. You can even figure out if it will replace a cylinder by examining some of the mode pages, or change the formatting so it doesn't spare on a per-cylinder basis, at least on some drives (exact options vary from drive to drive, but you can check the mode pages).

>> The Read Multiple command performs similarly to the Read Sectors command.
>> Interrupts are not generated on every sector, but on the transfer of a block
>> which contains the number of sectors defined by a Set Multiple command.
>
>Sorry, but for PC's BRDY of the WD1010 is tied to IRQ, and does generate
>an interrupt per sector -- EVEN WITH IDE .... This discussion started out as
>a IDE/SCSI from a PC point of view ... you need to keep that in mind ... since
>it may not be implemented that way on your Amiga.

Let me reverse that: I told you what the AT-IDE spec says. Have you read the spec? IDE (_not_ WD1010) has NO BRDY signal. When using Read Multiple, it does NOT generate more than 1 interrupt per N sectors, where N is the value set with Set Multiple. IDE has one interrupt line: INTRQ.

>This is a host adapter issue, and hopefully will go away in the future as
>cheap DMA hostadapters become available.

If they become available.
The Dos/Windows people have little reason to care about DMA, and the OS/2/Windows-NT/Unix people generally opt for SCSI, for many reasons, including tape drives, larger drives, CDROM, more devices per interface, disconnect, ability to mount things outside the CPU box (IDE is limited to 12" of cable, basically it's a straight CPU bus), etc. We used IDE for only one reason: it's _cheap_. For mid/high-end we're committed to SCSI (with one exception for reasons that have little to do with technical issues).

>>      Yes, write-buffering does lose some error recovery chances, especially
>>if there's no higher-level knowledge of possible write-buffering so
>>filesystems can insert lock-points to retain consistency. However, it can be
>>a vast speed improvement. It all depends on your (the user's) needs. Some
>>can easily live with it, some can't, some need raid arrays and UPS's.
>
>It is only a vast speed improvement on single block filesystem designs ...
>any design which combines requests into a single I/O will not see such
>improvement ... log structured filesystems are a good modern example.
>It certainly has no such effect for my current filesystem design.

If it can provide the optimization and make the design of a higher level significantly simpler, it may be a win. Also, it can help non-"single-block filesystems" - the Amiga FS writes in as large amounts as possible, but it doesn't gather separate requests. Small writes end up in filesystem buffers, and when flushed the buffers are non-contiguous, so they must be sent as separate requests (drivers don't do gather-scatter on the Amiga). Large writes will become large writes at the drive, and write-caching will be irrelevant.

>For the DOS user, while the speed up may be great ... I have grave questions
>regarding data reliability when the drive fails to post an error for the
>sectors involved because the spare table overflowed.
>I also strongly
>disagree with drive based automatic remapping since an overloaded power supply
>or power supply going out of regulation will create excessive soft errors
>which will trigger unnecessary remapping.

First, I've only seen spurious drive errors once in my entire time working with HD's (when the air conditioning went out and it hit ~110F - without AC my office roasts), and the drive quickly shut down totally. Second, as I mentioned, write-buffering needn't be hidden from higher layers. Introducing filesystem lock-points (similar in concept to trap barrier instructions on a CPU) would resolve all your worries about missed errors. I also have almost never seen errors on write; I see a vast majority of read errors (which isn't surprising given how few people use verify).

>Write buffering requires automatic remapping ... A good filesystem design
>should not see any benefits from write buffering, and doesn't need/want
>remapping. Nor do customers want random/unpredictable performance/response
>times.

Again, in the desktop market, few if any people notice what you call random performance.

>>> Tapes are (or will be) here, and I
>>>expect CDROMS (now partly proprietary & SCSI) to be mostly IDE & SCSI
>>>in the future. IDE is already extending the WD1003 interface, I expect
>>>additional drive support will follow at some point, although multiple
>>>hostadapters is a minor cost issue for many systems.
>>
>>      There are rumbles in that direction. I'm not certain it's worth
>>it, or that it can be compatible enough to gain any usage. Existing users
>>who need lots of devices have no reason to switch from SCSI to IDE, and
>>systems vendors have few reasons to spend money on lots of interfaces
>>for devices that don't exist. The reason IDE became popular is that it was
>>_cheap_, and no software had to be modified to use it.
>
>The fact is that IDE has become the storage bus of choice for low end
>systems ...
>and other storage vendors will follow it to reduce interface
>(extra slot/adapter) costs. In laptops, IDE IS THE STORAGE BUS,
>no slots for other choices.

Actually, the laptop/palmtop market seems to be moving towards PCMCIA slots for IO and storage. I don't disagree that in low-power and/or low-cost applications IDE has its place. That's why it's in our 2 lowest-end machines. However, as I said, I don't see significant movement to extend it to more drives (I hear rumbles, but see no lightning...). Certainly there's no rush to IDE tape drives or CDROM drives. IDE is only _just_ starting to try to support removable media at all (pushed by Syquest and/or Bernoulli, I think). Again, there's no indication that this is going to be a major factor. It could happen; I doubt it will.

>>      Sounds like the old IPI vs SCSI arguments over whether smart or dumb
>>controllers are better (which is perhaps very similar to our current
>>discussion, with a few caveats).
>
>This IS VERY MUCH LIKE that discussion ... BUT about how a seemingly good
>10 year old decision has gone bad. Given the processor and drive speeds
>of that era ... I also supported SCSI while actively pushing for reduced
>Command Decode times. See article by Dan Jones @ Fortune Systems 1986 I
>think in EDN regarding SCSI performance issues ... resulting from
>the WD1000 emulating SCSI hostadapter I did for them under contract.

Could you summarize for those without large libraries close at hand?

>Has IPI largely died due to a lack of volume? ... SCSI proved easier to
>interface, lowering base system costs .... just as IDE has. Certainly
>the standardization on SCSI by Apple after my cheap DDJ published
>hostadapter, was a major volume factor toward the success of SCSI
>and embedded SCSI drives. The market changed so fast after the MacPlus
>that DTC, XEBEC, and OMTI became has been's, even though they shaped
>the entire market up to that point.

Quite true.
I don't follow IPI, but it seems to have faded from view at least.

>>I would suggest
>>      (a) that's a highly contrived example, especially for
>>          the desktop machines that all IDE and most SCSI drives
>>          are designed for,
>
>For DOS you are completely right .... for any multitasking with a
>high performance filesystem you missed the mark.

You need to separate multi-tasking from multi-user. Single-user machines (and this includes most desktop Unix boxes) don't have the activity levels for the example you gave to have any relevance. It's rare that more than one or two files are being accessed in any given second or even minute. Also, in a single-user environment, average response time becomes the overriding factor over total throughput.

>>      (b) both C-SCAN and most other disksorting algorithms have tradeoffs
>>          for the increase in total-system throughput and decrease in
>>          worst-case performance; FCFS actually performs quite well in
>>          actual use (I have some old comp.arch articles discussing this
>>          if people want to see them). The tradeoffs are usually in
>>          fairness, response time, and average performance (no straight
>>          disksort will help you until 3 requests are queued in the first
>>          place).
>
>While your short queue observations are quite true, your assumptions
>stating this is the norm are quite different than mine, and are largely
>an artifact of 1975 filesystem designs and buffering constraints.
>In my world systems with 5-20 active users are common, with similar
>average queue depths -- and FCFS is not an acceptable solution since it
>blocks request locality resulting from steady state read-ahead service,
>resulting in a significant loss of thruput (80% or more).

I never said that FCFS was the best (or even acceptable) for all uses. Your "world" is not the desktop world.

>The primary assumption in FCFS proponents is that all requests are unrelated
>and have no bearing on the locality of future requests.
>In addition they
>extrapolate response time fairness for a given single block globally.
>In reality, from the users perspective, they judge such from how quickly
>a given task completes ... and most things improve thruput, will improve
>task completion times .... as long as they don't create the ability for
>some process to hog resources.

I think that even in larger systems, both throughput and response time are important. Throughput does you no good if you have to wait 5 seconds because some other user was loading a 10MB simulation. Again, on desktop machines throughput becomes even less relevant. If you can show something is better than FCFS in those constraints, fine.

>>      For most desktop machines, even if we ignore single-tasking OS's,
>>probably 99.44% of the time when disk requests occur there are no other
>>currently active requests.
>
>Any strategy that offers performance gains under load can not be
>dismissed out of hand. Especially RISC systems that completely out run
>the disk subsystems. If your company is happy with slow single request
>single process filesystems and hardware ... so be it, but to generalize
>it to the rest of the market is folly. There are better filesystem designs
>that do not produce this profile on even a single user, single process
>desktop box.

Certainly they shouldn't be "dismissed out of hand". However, recognition of the costs and complexity involved should be there also. Filesystems are complex beasts, and that complexity can lead to high maintenance costs. Things that increase the complexity of an already complex item for performance gains that are only rarely needed for the application are suspect. Also, adding something to an already complex object can be more costly than adding it to a simple object, because the complexities interact.

An example: the Amiga FS is built around coroutines (which should give a good clue as to when its base was designed...), basically one per active filehandle plus others as needed.
This alone makes it very hard to modify; I have to take time to absorb the design every time before I can work on it, because of the complexity. Anything that added significant complexity to it would be a nightmare (rewriting from scratch is not an option, though I'd love to throw it out and start again...). This makes it easier to add mapping to either the driver or the device itself.

>If this single user is going to run all the requests thru the cache
>anyway ... why not help it up front ... and queue a significant amount
>of the I/O on open or first reference. There are a few files where this
>is not true ... such as append only log files ... but there are clues
>that can be taken.

Why should the single-user have to run everything through a cache? I think direct-DMA works very nicely for a single-user environment, especially if you don't own stock in RAM vendors (as the current maintainers of Unix, OS/2, and WindowsNT seem to). Reserve the buffers for things that are likely to be requested again or soon - most sequential reads of files are not re-used.

>>>IDE drives should support the WD1010 SCAN_ID command allowing the driver
>>>locate current head position ... no such SCSI feature exists.
>The key word was should .... it was part of the WD1010 chipset that formed
>the WD1003WHA interface standard. Again if nobody makes use of a feature,
>it is optional ... I make the argument that it has significant value.

It also has significant cost for them to support, especially since they try to present an interface that looks like an old ST-506 MFM disk to the system. 99% of them already remap, since few of them have 17 sectors per track any more, not even counting zone recording or sparing.

>>change, performance/cost ratios change, etc.
>>I also think your opinions
>>come from a large, heavy-io multi-user systems background; which while
>>interesting for those working with such systems is pretty irrelevant to
>>the desktop systems that IDE and (most of) SCSI are designed for.
>
>My background and perspective run the full range, single user single process
>desktops to multiuser multiprocess multiprocessor servers. Clinging to old
>filesystems designs is an option, but as I outlined ... your assumptions
>and conclusions are vastly in conflict with designing storage systems
>capable of servicing 486/RISC desktop machines ... and certainly in
>conflict with the needs of fileservers and multiuser applications engines
>found in 100,000's point of sale systems managing inventory and register
>scanners in each sales island -- Sears, Kmart, every supermarket, autoparts
>stores and a growing number of restaurants and fast food stores.

Those are interesting areas, but those are not desktop markets, and those are not single-user markets. Those are large, multi-user (from the point of the server) transaction and database server systems.

>Your perspective for design tradeoffs for the Amiga home market may well
>be correct ... but are in conflict with the larger commercial markets that
>are the focus of the technologies I have discussed here and elsewhere.

I made the mistake of trying to keep the discussion relevant to the original topic. If you had been more clear that your interests and comments were directed elsewhere, there would have been less confusion.

>Ignore what your system does ... what CAN IT DO if things were better?
>
>Again ... Performance comes by DESIGN, not tuning, selection, or
>non-reflective evolution. Too much performance engineering is making do
>with existing poor designs ... instead of designing how it should be.

True, but one rarely gets to re-do an existing design. If you're lucky, you get to occasionally redo a part of it, or spend a bit of time smoothing out one aspect of it.
Most companies can't afford to take something that is sub-optimal but functional and start again from scratch. Look at the glorified program loader called MSDOS for an example of inertia.

Even if I agreed that XYZ filesystem would be a major improvement for desktop systems (such as the Amiga), I would have to weigh the many months of work it takes to write and debug a full filesystem against both its gains AND what other things I could do in that time elsewhere. If the gains will only be seen by .1% of the users, .1% of the time, I'm not going to spend _any_ time on it. Call it the FS version of RISC philosophy - keep things simple. Sometimes good enough is all that's needed.

--
To be or not to be = 0xff
-
Randell Jesup, Jack-of-quite-a-few-trades, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.cbm.commodore.com  BIX: rjesup
Disclaimer: Nothing I say is anything other than my personal opinion.