- Xref: sparky comp.unix.bsd:4320 comp.sys.ibm.pc.hardware:22073
- Newsgroups: comp.unix.bsd,comp.sys.ibm.pc.hardware
- Path: sparky!uunet!mcsun!sunic!aun.uninett.no!barsoom!barsoom!tih
- From: tih@barsoom.nhh.no (Tom Ivar Helbekkmo)
- Subject: Re: Disklabeling (was Re: Another Adaptec Question)
- Message-ID: <tih.714115743@barsoom>
- Sender: news@barsoom.nhh.no (USENET News System)
- Organization: Norwegian School of Economics
- References: <1992Aug16.144341.24052@Informatik.TU-Muenchen.DE> <15776@star.cs.vu.nl> <1992Aug17.173807.2309@Informatik.TU-Muenchen.DE> <#79m_a.alm@netcom.com>
- Distribution: world,fj,spec
- Date: Tue, 18 Aug 1992 05:29:03 GMT
- Lines: 133
-
- alm@netcom.com (Andrew Moore) writes:
-
- >To return to 386BSD and disklabeling a SCSI drive:
- >am I correct in assuming that the specification of # of cylinders,
- >sectors per cylinder, etc does not matter at all, so long as the total
- >number of blocks is correct?
-
- No, you're not... It *does* matter to the file system. Although the
- following was written while I was trying to figure out how to set up
- SCSI disks correctly for Ultrix, it's relevant to any system using
- the Berkeley Fast File System. This is a summary I posted to the net
- in comp.unix.ultrix a while back:
-
- I recently asked how to correctly set up disk partitions for the SCSI
- disks connected to our DECstations, specifying some of the problems I
- had understanding what was right and wrong. I've had several very
- interesting responses, and feel that I've learned quite a bit of
- useful stuff here... Thanks go to Klaus Steinberger, Walter Wong,
- Mike Mitchell, and especially to Stefan Esser, who took the time to
- explain a lot of details to me in our email correspondence.
-
- Anyway -- to recap my situation, I wanted to make sure I partitioned
- my disks so partition boundaries were placed at cylinder boundaries,
- and their sizes worked out properly to an integral number of cylinder
- groups, 16 cylinders per group being the default number. Looking at
- the /etc/disktab entries for the disks helped me little, since DEC
- obviously hadn't cared about this in their setup, and multiplying
- sectors/track by tracks/cylinder by cylinders didn't work out to the
- specified total number of sectors on the disks anyway!
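-
- As a rough illustration of the arithmetic involved, here is a small
- Python sketch. The function names and the numbers are made up for the
- example; they are not taken from any real /etc/disktab entry.

```python
def geometry_mismatch(sectors_per_track, tracks_per_cylinder, cylinders,
                      total_sectors):
    """Return total_sectors minus the size implied by the geometry."""
    implied = sectors_per_track * tracks_per_cylinder * cylinders
    return total_sectors - implied


def aligned_partition_size(wanted_sectors, sectors_per_cylinder,
                           cylinders_per_group=16):
    """Round a partition size down to a whole number of cylinder groups."""
    group = sectors_per_cylinder * cylinders_per_group
    return (wanted_sectors // group) * group


# Hypothetical disktab-style numbers, for illustration only.
spt, tpc, cyl, total = 33, 8, 1000, 264500
print(geometry_mismatch(spt, tpc, cyl, total))    # sectors unaccounted for
print(aligned_partition_size(100000, spt * tpc))  # partition size in sectors
```

- A nonzero mismatch is the symptom described above: the stated
- geometry does not multiply out to the stated disk size.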
-
- Well, it turns out that the situation is more complex than that...
-
- The BSD Fast File System uses certain heuristics to allocate disk
- blocks within a partition. Some of these are supposed to increase
- data security (against accidental loss), some to make file access more
- efficient. For instance:
-
- - Groups of inodes are allocated in each cylinder group in the
- partition, and attempts are made to keep file data blocks near the
- inodes describing them. (Efficiency)
-
- - Each cylinder group has a redundant copy of the superblock. The
- copies are staggered by one track per cylinder group, to keep them
- on different platters. Inodes follow the superblock copy, so they
- are staggered as well. (Security)
-
- - File data blocks are allocated for rotational contiguity. The
- optimal block is not necessarily the one following the previous one;
- if the system is not fast enough to schedule a new disk transfer in
- time, a "rotationally later" block is selected. If the optimal block
- on the disk is already taken, the file system tries to allocate the
- same block (or one following it as closely as possible) on another
- track in the same cylinder instead, and so on. (Efficiency)
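-
- The "rotationally later" idea can be sketched roughly as follows.
- This is a simplification that counts in sectors rather than file
- system blocks, and the function names are hypothetical, not the
- actual FFS allocator.

```python
import math


def sectors_to_skip(delay_ms, rpm, sectors_per_track):
    """Sectors that pass under the head while the system is busy."""
    ms_per_revolution = 60000.0 / rpm
    ms_per_sector = ms_per_revolution / sectors_per_track
    return math.ceil(delay_ms / ms_per_sector)


def rotationally_next(prev_sector, skip, sectors_per_track):
    """Sector (within the track) that arrives once the delay has passed."""
    return (prev_sector + 1 + skip) % sectors_per_track


# Assumed values: 3600 rpm, 32 sectors per track, 4 ms to schedule
# the next transfer.
skip = sectors_to_skip(4, 3600, 32)
print(skip, rotationally_next(30, skip, 32))
```

- Note that the calculation only works if the rotational speed and the
- sectors per track given to the file system are the real ones.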
-
- There's more to it than this, of course, but note the assumptions
- being made here: The file system needs to know the correct disk
- geometry; cylinders, heads, and sectors per track. The product of the
- last two of these must be the correct number of sectors per cylinder.
- It must also know the rotational speed of the disk, and it assumes
- that sectors within tracks are numbered in parallel, so that sector 0
- of each track in a cylinder passes the read/write head at the same
- time.
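-
- Under these assumptions, mapping a logical sector number to a
- position on the disk is simple division. A minimal Python sketch
- (to_chs is a hypothetical helper name, not a kernel function):

```python
def to_chs(sector, heads, sectors_per_track):
    """Map a logical sector to (cylinder, head, sector-in-track),
    assuming a uniform geometry with no spares and no skew."""
    sectors_per_cylinder = heads * sectors_per_track
    cylinder, rest = divmod(sector, sectors_per_cylinder)
    head, sector_in_track = divmod(rest, sectors_per_track)
    return cylinder, head, sector_in_track


# Assumed geometry for the example: 8 heads, 33 sectors per track.
print(to_chs(1000, 8, 33))
```

- Every placement decision the file system makes rests on this mapping
- matching what the drive actually does.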
-
- Guess what? This doesn't hold true for SCSI disks! These disks tend
- to do quite a bit of optimization of their own, behind the file
- system's back... For instance:
-
- - Tracks are usually rotationally staggered, to optimize the time to
- sequentially get from the last sector of one track to the first sector
- of the (logically) next one. This counteracts the rotational delay
- optimization in the file system.
-
- - Spare sectors (for bad block replacement) are usually allocated on
- a per-cylinder basis. This is a good strategy for optimal disk
- utilization and effective relocation, but it breaks the file
- system's calculation of where cylinder boundaries are, since heads
- multiplied by sectors per track does not equal (usable) sectors
- per cylinder.
-
- - Large SCSI disks tend to use zone bit recording, which means that
- there are more sectors per track on the outer tracks than on the
- inner ones. Then they lie to the file system about geometry, giving
- it something that works out close to the correct size of the disk.
- Again, this ruins the file system's attempt to intelligently use
- cylinder boundary information, which is guaranteed to be wrong.
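-
- To see how far off the reported geometry can be with zone bit
- recording, consider this Python sketch. The zone layout is invented
- for the example (two zones, no spare sectors), and both function
- names are hypothetical.

```python
def real_cylinder(sector, zones, heads):
    """Cylinder that really holds a logical sector on a zoned disk.

    zones is a list of (cylinders_in_zone, sectors_per_track) pairs,
    outermost zone first.
    """
    first_cylinder = 0
    for cylinders, sectors_per_track in zones:
        zone_sectors = cylinders * heads * sectors_per_track
        if sector < zone_sectors:
            return first_cylinder + sector // (heads * sectors_per_track)
        sector -= zone_sectors
        first_cylinder += cylinders
    raise ValueError("sector is beyond the end of the disk")


def assumed_cylinder(sector, heads, sectors_per_track):
    """Cylinder the file system computes from the reported geometry."""
    return sector // (heads * sectors_per_track)


# Invented disk: 4 heads, 100 cylinders at 60 sectors per track, then
# 100 cylinders at 40.  It reports a uniform 50 sectors per track,
# which gives the same total of 40000 sectors.
zones = [(100, 60), (100, 40)]
print(real_cylinder(24000, zones, 4), assumed_cylinder(24000, 4, 50))
```

- The totals agree, but the two cylinder numbers do not, so the file
- system's idea of what lies in the same cylinder is simply wrong.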
-
- So, what do you do if you want optimal performance from a SCSI disk?
- Well, as long as the disk does not do zone bit recording, there may
- be hope. SCSI disks can be reparameterized and reformatted. However,
- the number of parameters that you can change varies from disk to disk.
- (See the man page entry on 'rzdisk' for more information on how to
- examine and change these parameters.)
-
- - If you can set track skew and cylinder skew parameters to zero, thus
- reorienting the geometry of the disk to what the file system expects,
- you can get the timing calculations to work.
-
- - If you can make the disk allocate spare sectors on a per-track basis,
- you can make the cylinder boundary calculations work right, by
- using, say, one spare per track, and telling the file system that
- the disk has one less sector per track than it really does. (The
- file system doesn't know about spares; it counts usable sectors.)
- This means that tracks with more than one fault will be reallocated
- to the spare cylinders you reserve at the end of the disk, but that
- can't very well be helped.
-
- - If spare sectors can only be allocated on a per-cylinder basis, a
- hack is still possible! According to Stefan Esser, you can specify
- (through /etc/disktab and/or mkfs) that the disk has only one head,
- with a rather large number of sectors per track (the number of
- actually usable sectors per real cylinder). He notes, however, that
- a patch to the ufs_alloc() function in the file system is necessary,
- because, as shipped from DEC, it can't handle this large number of
- sectors per track.
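-
- The two sparing workarounds come down to what you tell the file
- system about heads and sectors per track. A Python sketch of the
- arithmetic, with hypothetical function names and example numbers:

```python
def per_track_sparing(heads, physical_spt, spares_per_track=1):
    """Geometry to report when spares are reserved on every track."""
    return heads, physical_spt - spares_per_track


def single_head_hack(heads, physical_spt, spares_per_cylinder):
    """Geometry to report when spares are reserved per cylinder:
    one head, with one long 'track' per real cylinder."""
    return 1, heads * physical_spt - spares_per_cylinder


# Assumed drive: 8 heads, 36 physical sectors per track.
for reported_heads, reported_spt in (per_track_sparing(8, 36),
                                     single_head_hack(8, 36, 6)):
    # heads * sectors per track now equals usable sectors per cylinder
    print(reported_heads, reported_spt, reported_heads * reported_spt)
```

- In both cases the product of the reported heads and sectors per
- track equals the usable sectors per real cylinder, which is what the
- cylinder boundary calculations need.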
-
- It would seem, then, that the correct choice, when you need high disk
- throughput, is to get a disk that does not do zone bit recording, and
- that can be reparameterized to use a non-staggered layout with a spare
- sector per track. This will normally mean smaller disks, and thus an
- increased number of drives to achieve the same storage space -- which
- isn't too bad anyway if you're really into speed; e.g. two optimized
- drives on each of two SCSI controllers should be much better than two
- non-optimized, bigger drives on one controller.
-
- I expect, though, that future versions of the BSD Fast File System
- will have knowledge of SCSI disks, and how to use them effectively.
- I understand that Sun has already made such changes, resulting in
- noticeable improvements.
-
- -tih
- --
- Tom Ivar Helbekkmo, NHH, Bergen, Norway. Telephone: +47-5-959205
- Postmaster for domain nhh.no. Internet mail: tih@barsoom.nhh.no
-