- Xref: sparky comp.benchmarks:1673 comp.arch.storage:767
- Newsgroups: comp.benchmarks,comp.arch.storage
- Path: sparky!uunet!ukma!darwin.sura.net!sgiblab!sgigate!sgi!igor!jbass
- From: jbass@igor.tamri.com (John Bass)
- Subject: Re: Disk performance issues, was IDE vs SCSI-2 using iozone
- Message-ID: <1992Nov12.193308.20297@igor.tamri.com>
- Organization: TOSHIBA America MRI, South San Francisco, CA
- References: <1992Nov11.064154.17204@fasttech.com> <1992Nov11.210749.3953@igor.tamri.com> <36995@cbmvax.commodore.com>
- Date: Thu, 12 Nov 92 19:33:08 GMT
- Lines: 303
-
-
- I've gotten a lot of wonderful mail on this series of postings; the few
- detractors make arguments similar to Randell's, so I will rebut his
- public position a little more strongly than I otherwise would.
-
- jesup@cbmvax.commodore.com (Randell Jesup) writes:
- > You're making the assumption that IDE doesn't hide that from you as
- >well. It does. Some (many?) current IDE drives use the cylinder/head/sector
- >registers as merely a convoluted way to specify a block number, use zone
- >recording, etc. I strongly suspect that this will continue, as the benefits
- >of zone recording, sector replacement, etc are too large to ignore (a number
- >of filesystems (most?) require that the device drivers present them with a
- >perfect media, no unusable blocks. This requires remapping of bad blocks in
- >the disk controller (SCSI or IDE) or in the device driver itself. Usually the
- >controller has a better chance to do a good job at this.)
- > ...
- > There's a big philosophical argument over who should deal with media
- >problems. The consensus seems to be that they should be pushed into the
- >lowest level possible (fs->driver->host controller->drive controller). Having
- >written both filesystems and device drivers, I must agree with that (and yes
- >I've had to implement bad-block mapping at the driver level, such as for IDE).
-
- I didn't assume so; in fact I bitched about drives that refuse to turn it
- off and don't allow the driver/filesystem to deal directly with the unmapped
- zones. Remapping is certainly an issue to debate ... while it makes
- filesystems and drivers a little easier (a minor development saving in
- a relatively complex filesystem/driver design) ... as I mentioned, the
- performance cost for mid-to-high performance systems is too high on
- two counts:
-
- 1) The remapping task can be performed more quickly by any
- 386/486/RISC processor ... the on-drive micros are slow.
-
- 2) Substantial in-field performance anomalies where critical
- data lands in remapped blocks ... what if this was a benchmark
- evaluation purchase and the customer based a $50M order on its
- performance? What about the ordinary customer who just has to
- live with its poor behaviour?
-
- I don't understand why "the controller has a better chance to do a good job
- at this" ... my position was that remapping should be done in the filesystem,
- so that the bad blocks would NEVER be allocated and would never need
- remapping. This IS a major divergence in thought from current practice in
- UNIX ... not at all for DOS, which has always managed bad block info in the
- filesystem.
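-
- As a minimal sketch of what I mean (my illustration, not anyone's
- production code): fold the vendor defect list into the allocation bitmap
- when the filesystem is made, and a bad block looks like any other
- allocated block -- the question of run-time remapping never comes up.
-
-     /* Minimal sketch: bad blocks pre-marked "in use" at mkfs time. */
-     #include <stdio.h>
-
-     #define NBLOCKS 4096
-     static unsigned char freemap[NBLOCKS / 8]; /* 1 bit/block; 1 = in use */
-
-     static void mark_used(unsigned b) { freemap[b >> 3] |= 1 << (b & 7); }
-     static int  is_used(unsigned b) { return freemap[b >> 3] & (1 << (b & 7)); }
-
-     /* mkfs time: absorb the media defect list into the bitmap. */
-     static void absorb_defects(const unsigned *bad, int nbad)
-     {
-         for (int i = 0; i < nbad; i++)
-             mark_used(bad[i]);
-     }
-
-     /* Allocator: a bad block is indistinguishable from an allocated one,
-      * so it can NEVER be handed out and never needs remapping. */
-     static int alloc_block(void)
-     {
-         for (unsigned b = 0; b < NBLOCKS; b++)
-             if (!is_used(b)) { mark_used(b); return (int)b; }
-         return -1;                           /* filesystem full */
-     }
-
-     int main(void)
-     {
-         unsigned defects[] = { 7, 8, 100 };  /* hypothetical defect list */
-         absorb_defects(defects, 3);
-         printf("first allocation: block %d\n", alloc_block());
-         return 0;
-     }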
-
- For WD1003 type controllers the long accepted practice was to flag each
- sector/track bad at low level format, so that when DOS format or SCO badtrk
- scanned the media it was ASSURED of finding the area bad and marking it so.
- From a performance point of view (the one I consistently take) this is vastly
- better!
-
- I have also written filesystems and drivers, and I take a STRONGLY different
- stand ... given the DOS filesystem's handling of bad blocks, I hardly
- consider it a consensus to do it in the drive, although many software
- guys would like it there to make their job simpler. I strongly prefer
- the ability to have the drive present vendor defects with the IDs marked
- as bad.
-
- Nor would I like to be the customer who has his FAT or root inode over
- a remapped sector.
-
- >>> Once per sector? Don't PC's use the ReadMultiple/WriteMultiple
- >>>commands? I guess not (which matches what I've heard elsewhere). Our IDE
- >>Yes, Yes ... the interrupt for WD1003/IDE interfaces means the 512 byte sector
- >>buffer is full, and must be emptied. R/W Multiple are used, but it requires
- >>handling a transfer request interrupt for each sector, or busy waiting on
- >>data_request in the command status register ... hence poor man's disconnect
- >>from the processor bus.
- >
- > I think you're confused. The CAM-ATA spec (and all the IDE drives I've
- >played with) says that when read/write Multiple is used (with SetMultiple),
- >you get 1 interrupt per N sectors. From CAM-ATA rev 2.3:
- >
- > 9.12 Read Multiple Command
- >
- > The Read Multiple command performs similarly to the Read Sectors command.
- > Interrupts are not generated on every sector, but on the transfer of a block
- > which contains the number of sectors defined by a Set Multiple command.
-
- Sorry, but on PC's the WD1010's BRDY is tied to IRQ, and does generate
- an interrupt per sector -- EVEN WITH IDE .... This discussion started out as
- an IDE vs SCSI comparison from a PC point of view ... you need to keep that
- in mind ... since it may not be implemented that way on your Amiga.
-
- This is a host adapter issue, and hopefully will go away in the future as
- cheap DMA host adapters become available.
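-
- For those who haven't written one of these drivers, a rough sketch of the
- per-sector service loop (register access is stubbed so the sketch stands
- alone; on a real PC AT these would be port reads from 1F0-1F7 hex):
-
-     #include <stdint.h>
-     #include <string.h>
-
-     #define STATUS_BSY 0x80              /* drive busy */
-     #define STATUS_DRQ 0x08              /* sector buffer needs service */
-
-     /* Stubs standing in for inb(0x1F7) and insw(0x1F0). */
-     static uint8_t read_status(void) { return STATUS_DRQ; }
-     static void read_data(uint16_t *buf, int nw) { memset(buf, 0, nw * 2); }
-
-     /* READ SECTORS on a WD1003-style interface: the host must drain the
-      * 512-byte sector buffer once per sector, either on IRQ14 or by
-      * busy-waiting on DRQ as here -- the "poor man's disconnect". */
-     void pio_read(uint16_t *dst, int sectors)
-     {
-         for (int s = 0; s < sectors; s++) {
-             while (read_status() & STATUS_BSY)
-                 ;                        /* wait out the drive */
-             while (!(read_status() & STATUS_DRQ))
-                 ;                        /* wait for the buffer */
-             read_data(dst + s * 256, 256);   /* 256 words = 512 bytes */
-         }
-     }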
-
- > Yes, write-buffering does lose some error recovery chances, especially
- >if there's no higher-level knowledge of possible write-buffering so
- >filesystems can insert lock-points to retain consistency. However, it can be
- >a vast speed improvement. It all depends on your (the user's) needs. Some
- >can easily live with it, some can't, some need raid arrays and UPS's.
-
- It is only a vast speed improvement for single-block filesystem designs ...
- any design which combines requests into a single I/O will not see such an
- improvement ... log-structured filesystems are a good modern example.
- It certainly has no such effect on my current filesystem design.
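-
- A sketch of the point about combining (my illustration): a filesystem
- that clusters adjacent dirty blocks into one transfer leaves the drive's
- write buffer nothing useful to do.
-
-     #include <stdio.h>
-
-     /* Coalesce a sorted list of dirty block numbers into contiguous
-      * (start, count) runs, each issued as a single I/O request. */
-     void cluster_writes(const int *dirty, int n)
-     {
-         int i = 0;
-         while (i < n) {
-             int start = dirty[i], len = 1;
-             while (i + len < n && dirty[i + len] == start + len)
-                 len++;                   /* extend the contiguous run */
-             printf("one I/O: blocks %d..%d\n", start, start + len - 1);
-             i += len;
-         }
-     }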
-
- For the DOS user, while the speedup may be great ... I have grave questions
- regarding data reliability when the drive fails to post an error for the
- sectors involved because the spare table overflowed. I also strongly
- disagree with drive-based automatic remapping, since an overloaded power
- supply, or a power supply going out of regulation, will create excessive
- soft errors which trigger unnecessary remapping. When it was demanded by the
- powers that be at Fortune Systems ... we put it in ... only to take it out
- when Field Service grew tired of the bad block tables overflowing, and of
- taking a big loss on good drives being returned for "excessive bad blocks"
- as the result of normal (or abnormal) soft error rates due to other factors.
-
- Write buffering requires automatic remapping ... A good filesystem design
- should not see any benefits from write buffering, and doesn't need/want
- remapping. Nor do customers want random/unpredictable performance/response
- times.
-
- >
- >> Tapes are (or will be) here, and I
- >>expect CDROMS (now partly proprietary & SCSI) to be mostly IDE & SCSI
- >>in the future. IDE is already extending the WD1003 interface, I expect
- >additional drive support will follow at some point, although multiple
- >>hostadapters is a minor cost issue for many systems.
- >
- > There are rumbles in that direction. I'm not certain it's worth
- >it, or that it can be compatible enough to gain any usage. Existing users
- >who need lots of devices have no reason to switch from SCSI to IDE, and
- >systems vendors have few reasons to spend money on lots of interfaces
- >for devices that don't exist. The reason IDE became popular is that it was
- >_cheap_, and no software had to be modified to use it.
-
- The fact is that IDE has become the storage bus of choice for low end
- systems ... and other storage vendors will follow it to reduce interface
- (extra slot/adapter) costs. In laptops, IDE IS THE STORAGE BUS,
- no slots for other choices.
-
-
- I combined both his postings into a single reply.....
-
- > Sounds like the old IPI vs SCSI arguments over whether smart or dumb
- >controllers are better (which is perhaps very similar to our current
- >discussion, with a few caveats).
-
- This IS VERY MUCH LIKE that discussion ... BUT about how a seemingly good
- 10-year-old decision has gone bad. Given the processor and drive speeds
- of that era ... I also supported SCSI while actively pushing for reduced
- command decode times. See the article by Dan Jones @ Fortune Systems, 1986 I
- think, in EDN regarding SCSI performance issues ... resulting from
- the WD1000-emulating SCSI host adapter I did for them under contract.
-
- Hasn't IPI largely died due to a lack of volume? ... SCSI proved easier to
- interface, lowering base system costs .... just as IDE has. Certainly
- Apple's standardization on SCSI after my cheap DDJ-published
- host adapter was a major volume factor in the success of SCSI
- and embedded SCSI drives. The market changed so fast after the MacPlus
- that DTC, XEBEC, and OMTI became has-beens, even though they had shaped
- the entire market up to that point.
-
- >I would suggest
- > (a) that's a highly contrived example, especially for
- > the desktop machines that all IDE and most SCSI drives
- > are designed for,
-
- For DOS you are completely right .... for any multitasking system with a
- high-performance filesystem you missed the mark.
-
- > (b) both C-SCAN and most other disksorting algorithms have tradeoffs
- > for the increase in total-system throughput and decrease in
- > worst-case performance; FCFS actually performs quite well in
- > actual use (I have some old comp.arch articles discussing this
- > if people want to see them). The tradeoffs are usually in
- > fairness, response time, and average performance (no straight
- > disksort will help you until 3 requests are queued in the first
- > place).
-
- While your short-queue observations are quite true, your assumption
- that this is the norm is quite different from mine, and is largely
- an artifact of 1975 filesystem designs and buffering constraints.
- In my world, systems with 5-20 active users are common, with similar
- average queue depths -- and FCFS is not an acceptable solution, since it
- breaks up the request locality produced by steady-state read-ahead
- service, costing a significant amount of thruput (80% or more).
-
- The primary assumption of FCFS proponents is that all requests are unrelated
- and have no bearing on the locality of future requests. In addition, they
- extrapolate single-block response-time fairness globally. In reality, users
- judge fairness by how quickly a given task completes ... and most things
- that improve thruput will improve task completion times .... as long as
- they don't create the ability for some process to hog resources.
-
- As such, windowed CSCAN (cyl/trk) in the reverse order of file block
- allocation gives you the best of both worlds ... the ability to gain
- localized burst behaviour during read-ahead, as long as the application
- has low enough cpu requirements and can process the blocks ... plus forced
- breaks to round-robin the queue at each window boundary. I first presented
- this issue at the Santa Monica USENIX conference in the late 70's, and have
- made it a key point in numerous other performance presentations at other
- conferences and lectures.
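-
- A much-simplified sketch of the windowed sort (mine, for illustration --
- a real driver re-merges new arrivals at each window boundary instead of
- printing):
-
-     #include <stdio.h>
-     #include <stdlib.h>
-
-     struct req { int cyl; int blkno; };
-
-     static int by_cyl(const void *a, const void *b)
-     {
-         return ((const struct req *)a)->cyl - ((const struct req *)b)->cyl;
-     }
-
-     #define WINDOW 8     /* requests per segment -- an assumed tuning */
-
-     void dispatch(struct req *q, int n)
-     {
-         qsort(q, n, sizeof *q, by_cyl);  /* one C-SCAN sweep order */
-         for (int i = 0; i < n; i += WINDOW) {
-             int end = i + WINDOW < n ? i + WINDOW : n;
-             for (int j = i; j < end; j++)
-                 printf("issue cyl %d blk %d\n", q[j].cyl, q[j].blkno);
-             /* window boundary: forced break -- round-robin the queue and
-              * merge newly arrived requests before the next segment */
-         }
-     }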
-
- In addition, my current filesystem design, and the XENIX work I did for SCO,
- both attempt to read ahead entire files of moderate length. Single-block
- read-ahead doesn't allow for improved scheduling ... burst read-ahead
- does, and uniformly gets the task completed quicker.
-
- > (c) Your example also depends on very fast track-to-track stepping
- > times AND very high locality of the requests (within your fast
- > stepping distance), and becomes less relevant on drives with
- > large numbers of cylinders and fast spin times.
-
- Or drives with steppers or dedicated servo that have more than one head.
- The larger the number of heads, the bigger the gain. In practice any drive
- with multiple heads, or a track-to-track seek time less than 1/2 the rev
- time, will service a minimum of 15%-30% additional requests per second under
- normal UNIX timesharing loads ... with a potential of several hundred percent
- on fast-seek and/or many-headed drives.
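-
- (To put rough numbers on the 1/2-rev rule -- my arithmetic, not from any
- spec sheet: at 3600 RPM a revolution is about 16.7 ms, so the average
- rotational wait is about 8.3 ms; any head switch or track-to-track seek
- cheaper than that picks up a sorted neighboring request faster, on
- average, than FCFS waits out the rotation.)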
-
- Any facility that uses a disk reorg utility that provides locality
- will benefit greatly .... in the case of my filesystem, which does
- active data migration and maintains a high degree of locality, the
- results are VERY significant.
-
- >
- > For most desktop machines, even if we ignore single-tasking OS's,
- >probably 99.44% of the time when disk requests occur there are no other
- >currently active requests.
-
- Any strategy that offers performance gains under load cannot be
- dismissed out of hand, especially on RISC systems that completely outrun
- the disk subsystems. If your company is happy with slow single-request,
- single-process filesystems and hardware ... so be it, but to generalize
- that to the rest of the market is folly. There are better filesystem designs
- that do not produce this profile on even a single-user, single-process
- desktop box.
-
- If this single user is going to run all the requests thru the cache
- anyway ... why not help it up front ... and queue a significant amount
- of the I/O on open or first reference. There are a few files where this
- is not true ... such as append-only log files ... but there are clues
- that can be taken.
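-
- In sketch form (the queue_read() interface here is hypothetical, just to
- show the shape of the idea):
-
-     #include <stdio.h>
-
-     #define RA_MAX_BLOCKS 64    /* "moderate length" cutoff -- assumed */
-
-     struct inode { int nblocks; int blklist[RA_MAX_BLOCKS]; };
-
-     /* Stub for an async request queue; a real one feeds the disk sort. */
-     static void queue_read(int blkno) { printf("queued blk %d\n", blkno); }
-
-     /* On open or first reference, queue the file's whole block list so
-      * the scheduler sees the burst at once, not one block at a time. */
-     void open_readahead(const struct inode *ip)
-     {
-         if (ip->nblocks <= RA_MAX_BLOCKS)   /* skip huge/append files */
-             for (int i = 0; i < ip->nblocks; i++)
-                 queue_read(ip->blklist[i]);
-     }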
-
- >>In addition, if the filesystem were then presented with storing a 9K
- >>file it would use 12/3/4 to 12/3/12 (best fit nearest active region).
- >>A big win over using 10/0/[2,5,8], 10/3/[4,8], 10/4/6-7, and 11/0/8 as
- >>convential filesystem with a bitmap would tend to allocate.
- >
- > Any best-fit will produce better read and rewrite performance over
- >a first-fit, bitmapped or not, at the potential cost of increased fragmentation
- >and slower block allocation (especially if the bitmap or free-list isn't all
- >kept in memory at once).
-
- I argue that, given the distribution of file sizes and lifetimes,
- my filesystem model shows otherwise ... decreased fragmentation, a slight CPU
- cost (0.2-1.5%), and a significant reduction in disk requests -- the primary
- resource to optimize. Much of the decreased fragmentation comes from a
- secondary effect of freeing contiguous files ... more contiguous free space.
- Allocating fragmented files .... promotes fragmented free space. Again, the
- common wisdom from studies on resource allocators fails to take these side
- effects into account .... their models don't match this usage, and it is an
- error to extrapolate their results to all systems/applications.
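-
- A simplified sketch of such an allocator (my illustration -- the real
- thing keeps the extent list sorted, and the tie-break policy is tunable):
-
-     #include <stdlib.h>
-
-     struct extent { int start, len; };
-
-     /* Best fit, nearest active region: among free extents that can hold
-      * the whole file contiguously, prefer the tightest fit, breaking
-      * ties by distance from 'near' (the current active region). */
-     int best_fit(const struct extent *fl, int n, int want, int near)
-     {
-         int best = -1;
-         long bestkey = 0;
-         for (int i = 0; i < n; i++) {
-             if (fl[i].len < want)
-                 continue;                /* can't keep the file whole */
-             long waste = fl[i].len - want;
-             long dist  = labs((long)fl[i].start - near);
-             long key   = waste * 1000000L + dist;  /* fit, then distance */
-             if (best < 0 || key < bestkey) { best = i; bestkey = key; }
-         }
-         return best;                     /* extent index, or -1: none fits */
-     }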
-
- >>IDE drives should support the WD1010 SCAN_ID command, allowing the driver
- >>to locate the current head position ... no such SCSI feature exists.
- >
- > No such IDE feature exists. This _was_ a discussion of IDE vs. SCSI.
- >As far as I know, no one has even proposed such a feature to the CAM-ATA
- >committee, let alone implemented it. The only positional information is the
- >index bit in the status register (I suspect many drives just leave it 0).
-
- The key word was should .... it was part of the WD1010 chipset that formed
- the WD1003WHA interface standard. Again, if nobody makes use of a feature,
- it becomes optional ... I make the argument that it has significant value.
-
- >>Given 1974-1986 hardware, most of the current filesystem design issues
- >>were correct .... to just OK. Given 1992-1995 hardware, the same tradeoffs
- >>are mostly WRONG. Performance comes by DESIGN, not tuning, selection, or
- >>non-reflective evolution. Too much performance engineering is making do with
- >>existing poor designs ... instead of designing how it should be.
- >
- > While your statement is correct, I think you're guilty here also.
- >Just because method/hack/whatever was a good choice for V7 Unix running RPS
- >drives, doesn't mean that that approach is (as) effective today. Technologies
- >change, performance/cost ratios change, etc. I also think your opinions
- >come from a large, heavy-io multi-user systems background; which while
- >interesting for those working with such systems is pretty irrelevant to
- >the desktop systems that IDE and (most of) SCSI are designed for.
-
- My background and perspective run the full range, from single-user,
- single-process desktops to multiuser, multiprocess, multiprocessor servers.
- Clinging to old filesystem designs is an option, but as I outlined ... your
- assumptions and conclusions are vastly in conflict with designing storage
- systems capable of servicing 486/RISC desktop machines ... and certainly in
- conflict with the needs of fileservers and the multiuser applications engines
- found in 100,000's of point-of-sale systems managing inventory and register
- scanners in each sales island -- Sears, Kmart, every supermarket, autoparts
- stores, and a growing number of restaurants and fast food stores.
-
- Your perspective on design tradeoffs for the Amiga home market may well
- be correct ... but it is in conflict with the larger commercial markets that
- are the focus of the technologies I have discussed here and elsewhere.
-
- Ignore what your system does ... what COULD IT DO if things were better?
-
- Again ... Performance comes by DESIGN, not tuning, selection, or
- non-reflective evolution. Too much performance engineering is making do
- with existing poor designs ... instead of designing how it should be.
-