- Newsgroups: comp.unix.questions
- Path: sparky!uunet!decwrl!pa.dec.com!nntpd2.cxo.dec.com!nabeth!alan
- From: alan@nabeth.enet.dec.com (Alan Rollow - Alan's Home for Wayward Tumbleweeds.)
- Subject: Re: Berkely Fast File System
- Message-ID: <1992Sep15.230627.16980@nntpd2.cxo.dec.com>
- Lines: 159
- Sender: alan@nabeth (Alan Rollow - Alan's Home for Wayward Tumbleweeds.)
- Reply-To: alan@nabeth.enet.dec.com (Alan Rollow - Alan's Home for Wayward Tumbleweeds.)
- Organization: Digital Equipment Corporation
- References: <1992Sep7.085336.22530@nntp.uoregon.edu> <1992Sep9.025316.19275@nntpd2.cxo.dec.com> <BuBvL6.2MJ@gumby.ocs.com> <1992Sep12.003806.20983@nntpd2.cxo.dec.com>
- Date: Tue, 15 Sep 1992 23:06:27 GMT
-
-
- In article <1992Sep12.003806.20983@nntpd2.cxo.dec.com>, alan@nabeth.enet.dec.com (Alan Rollow - Alan's Home for Wayward Tumbleweeds.) writes:
-
- >Recent improvements in the implementation of the Fast File System
- >by Sun and by DEC in ULTRIX V4.3 make this much less of a problem.
- >That will be the topic of my next post in this thread.
-
- Get your ULTRIX manual pages handy and turn to section 8. Find the
- manual page for tunefs(8). Under the options section is the
- description of the -a option for setting the value of maxcontig.
-
- I believe the intent was that the file system would take user
- I/O requests, break them up into appropriate file system block
- size (or fragment size) I/O requests, and pass these off to the
- underlying driver. It was expected that the driver would look at
- the requests closely enough to notice if any of them just happened
- to be for contiguous blocks on the device. Rather than issue many
- requests, it would issue one larger request with appropriate page
- mapping to have the data end up in the right place. I'm not sure
- if there are any disk drivers that do this. I know we (DEC) don't
- have any.
-
- But, suppose you let the file system layer do this request
- aggregation. It can do all the necessary page mapping to
- make a group of data buffers look contiguous and then use
- a large I/O request to the underlying driver for the data.
- Consider a sequential read. Today, if the file system discovers
- that the user is doing sequential reads, it will do a block
- (or more) of read-ahead for the next block. With support
- for maxcontig it can assume that the data may have been
- written in long contiguous blocks and check. When it finds
- that there are long contiguous blocks, it can then read the
- current long block and start a read-ahead of the next long
- block. You could easily get 128 KB into the buffer in two
- I/O requests.
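
The aggregation described above can be sketched in a few lines. This is only an illustration, not the ULTRIX or SunOS code: the function name coalesce, the 8 KB block size, and a maxcontig of 8 are assumptions, chosen so that two requests cover 128 KB.

```python
def coalesce(blocks, maxcontig):
    """Group sorted device block numbers into (start, count) requests.

    A run is extended only while the next block is physically adjacent
    and the run is still shorter than maxcontig blocks.
    """
    requests = []
    for blk in blocks:
        if requests:
            start, count = requests[-1]
            if blk == start + count and count < maxcontig:
                requests[-1] = (start, count + 1)
                continue
        requests.append((blk, 1))
    return requests


BLOCK_KB = 8  # assumed file system block size

# Sixteen contiguous blocks: the current cluster plus one cluster of read-ahead.
reqs = coalesce(list(range(1000, 1016)), maxcontig=8)
total_kb = sum(count for _, count in reqs) * BLOCK_KB
```

With these assumed numbers the sixteen blocks collapse into two requests of eight blocks each. A file whose blocks are not adjacent on the disk would still produce one request per block.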
-
- On the write side, the file system can introduce a certain
- amount of delay between when it chooses WHERE to write a block
- and when it actually writes the block. If it knows that there
- are a group of blocks all together for a particular file, it
- can map the data to look contiguous (if it isn't already) and
- do a single large write instead of many smaller writes.
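
A rough sketch of the write side, under the same caveats: the class name WriteCluster and its methods are hypothetical, and a real implementation would also have to deal with buffer mapping, timeouts, and sync. The only point is that delaying the write lets adjacent blocks go out as one request.

```python
class WriteCluster:
    """Hold dirty blocks for one file until a contiguous run can be written."""

    def __init__(self, maxcontig):
        self.maxcontig = maxcontig
        self.start = None   # first device block of the pending run
        self.count = 0      # number of blocks in the pending run
        self.issued = []    # (start, count) writes handed to the driver

    def flush(self):
        """Hand the pending run, if any, to the driver as one write."""
        if self.count:
            self.issued.append((self.start, self.count))
        self.start, self.count = None, 0

    def write(self, blk):
        # A block that does not extend the pending run forces the run out.
        if self.count and blk != self.start + self.count:
            self.flush()
        if self.count == 0:
            self.start = blk
        self.count += 1
        # A full cluster goes out as one large write.
        if self.count == self.maxcontig:
            self.flush()


wc = WriteCluster(maxcontig=8)
for blk in list(range(200, 208)) + [500, 501]:
    wc.write(blk)
wc.flush()  # for example at sync or close time
```

Ten block writes from the caller end up as two writes to the driver: one of eight blocks and one of two.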
-
- Sun Microsystems added maxcontig support at least a year ago
- with what I hear were significant performance improvements
- for sequential I/O. In ULTRIX V4.3, which should be available
- real soon, we have also added similar improvements. My experience
- with the ULTRIX support so far has been to use it to feed large
- I/O requests to a striping driver from the file system. The
- results have been pretty impressive for having five disks on
- one SCSI controller.
-
- The other options to tunefs(8) are less interesting, but still
- worth going through. The -d option sets the rotational delay.
- As noted in a previous post, the intent is to place gaps in the
- block allocation that allow time for I/O requests to get to the
- controller, in the hope of avoiding rotational misses. Depending
- on the value you set, you can get gaps of zero, one, two, and
- so on, file system blocks. For the typical MSCP disk,
- arrangements might look like:
-
-             8 KB
-          +------+------+------+------+------+------+------+
-  0 ms    | data | data | data | data | data | data | data |
-          +------+------+------+------+------+------+------+
-
-          +------+------+------+------+------+------+------+
-  4 ms    | data |      | data |      | data |      | data |
-          +------+------+------+------+------+------+------+
-
-          +------+------+------+------+------+------+------+
-  8 ms    | data |      |      | data |      |      | data |
-          +------+------+------+------+------+------+------+
-
- In the limited testing I've done, the only useful values seem
- to be 4 or 0. Some gap is probably appropriate for disks that
- don't do their own read-ahead. Disks with read-ahead will have
- cached one or more tracks. By not putting in rotational delays
- you encourage the blocks to be allocated contiguously, which puts
- more of them in the disk's cache and reduces the number of reads
- it has to do. Even on disks with read-ahead, if you're willing to give
- up some read performance for improved write performance there might
- be some value in setting a rotational delay. The only way to be sure
- is to experiment.
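
To see how a delay in milliseconds turns into a gap in blocks, here is a small arithmetic sketch. It is not the allocator's actual computation, which the discussion above does not spell out. It simply assumes the gap is the delay rounded up to whole blocks, and the disk geometry (3600 RPM, 64 sectors per track, 16 sectors per 8 KB block) is made up for the example.

```python
import math


def gap_blocks(rotdelay_ms, rpm, sectors_per_track, sectors_per_block):
    """Blocks to skip so that at least rotdelay_ms passes under the head."""
    if rotdelay_ms <= 0:
        return 0
    ms_per_rev = 60_000 / rpm
    ms_per_block = ms_per_rev * sectors_per_block / sectors_per_track
    return math.ceil(rotdelay_ms / ms_per_block)


# Hypothetical disk: 3600 RPM, 64 sectors per track, 16 sectors per block.
gaps = {ms: gap_blocks(ms, 3600, 64, 16) for ms in (0, 4, 8)}
```

On this made-up disk one block takes a little over 4 ms to pass under the head, so delays of 0, 4, and 8 ms give gaps of zero, one, and two blocks, matching the three rows of the diagram.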
-
- One thing to watch is that maxcontig and rotdelay can get in each
- other's way. We recommend that if you use maxcontig you set
- the rotdelay to 0. What will happen if you don't is that you'll
- probably get groups of contiguous blocks separated by a block size
- gap. In this example an 8 KB block is indicated by a D and the
- gap by a space.
-
-          +-----------------------------------+
-  288 KB  |DDDDDDDD DDDDDDDD DDDDDDDD DDDDDDDD|
-          +-----------------------------------+
-
- The gap might still help read/write performance on some disks, but
- it also fragments the free space more quickly. Having the space
- fragmented is probably not desirable.
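
The picture above is easy to reproduce with a toy model. The function below is purely illustrative (the name layout and the one-block gap are assumptions). It draws a D for each data block and a space for each block skipped between clusters.

```python
def layout(nblocks, maxcontig, gap):
    """Return an allocation map: 'D' per data block, ' ' per skipped block."""
    runs = []
    remaining = nblocks
    while remaining > 0:
        run = min(maxcontig, remaining)
        runs.append("D" * run)
        remaining -= run
    return (" " * gap).join(runs)


with_gap = layout(32, maxcontig=8, gap=1)  # rotdelay left non-zero
no_gap = layout(32, maxcontig=8, gap=0)    # rotdelay set to 0
```

With a one-block gap the 32 blocks come out as four clusters of eight with a hole after each of the first three. With rotdelay at 0 they form one unbroken run.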
-
- Recall, from probably the first post, that the block allocation scheme
- is to try to keep the blocks of a file in the same cylinder group
- as the file, UNLESS the file gets too big. "Too big" is determined
- by "maxbpg", which can be set with the -e option. Our default
- for this is 256 blocks, or 2 MB. The desired value really depends on how
- big the cylinder group is and how the disk is being used. If the
- cylinder group is much larger than 2 MB then you might be able to
- raise it and still leave plenty of space for other files. If the
- cylinder group is smaller than 2 MB you might want to lower it.
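
The arithmetic behind that judgment is simple enough to write down. The helper name maxbpg_share and the two cylinder group sizes below are hypothetical. Only the 256-block default and the 8 KB block size come from the discussion above.

```python
def maxbpg_share(maxbpg, block_size, cg_bytes):
    """Fraction of one cylinder group that a single file may fill."""
    return maxbpg * block_size / cg_bytes


KB = 1024
MB = 1024 * KB

limit_bytes = 256 * 8 * KB                        # 256 blocks of 8 KB
share_large = maxbpg_share(256, 8 * KB, 16 * MB)  # hypothetical 16 MB group
share_small = maxbpg_share(256, 8 * KB, 1 * MB)   # hypothetical 1 MB group
```

In the 16 MB group a single file can take one eighth of the space before it is moved on, so there is room to raise maxbpg. In the 1 MB group the limit is twice the size of the group, so it never takes effect within that group.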
-
- If the file system is going to hold exclusively large files, you
- might want to make the cylinder groups be very large and set maxbpg
- correspondingly large to keep the individual files as contiguous as
- possible. To a limited extent you can use the -c option of newfs(8)
- to control the number of cylinders per group. If you want to make
- them really large, you can lie about the geometry. If your vendor
- supports the "Fat" version of the BSD file system, you may not need
- to lie. I don't recall that much about it.
-
- The last option to tunefs(8) is -m which is used to set the minfree
- value. I haven't experimented with the performance loss of having to
- use the slower algorithms when the file system is full. My inclination
- would be to change minfree only when:
-
- 1. Desperate for a little more user space.
- 2. Setting up an archive disk, where space is going to be allocated
- once up front and never or only rarely changed.
-
- For the case of the archive disk, you can get a little extra space out
- of it at the expense of some files being badly allocated. If the files
- are not frequently referenced this might not be a problem.
-
- Specific to ULTRIX, we have a -c option to change our "clean byte"
- timeout factor. The "clean byte" is a flag in the file system that
- is set when the file system is unmounted cleanly. When the file system
- is mounted the flag is marked as dirty. Thus, if the system
- crashes it won't have been unmounted cleanly. If it was unmounted cleanly
- fsck(8) will not check it, unless forcibly told to do so. The clean byte
- timeout was added so that you could force an fsck from time to time.
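
As a toy model of the clean byte logic: the class below is hypothetical and is not the ULTRIX implementation. In particular, how the timeout factor is counted is not described above, so the sketch assumes a simple countdown of mounts since the last fsck.

```python
class FileSystem:
    def __init__(self, timeout):
        self.clean = True
        self.timeout = timeout       # assumed: mounts allowed between checks
        self.mounts_left = timeout

    def mount(self):
        self.clean = False           # marked dirty for as long as it is mounted
        self.mounts_left -= 1

    def unmount(self):
        self.clean = True            # only a clean unmount sets the flag

    def needs_fsck(self, force=False):
        return force or not self.clean or self.mounts_left <= 0

    def fsck(self):
        self.clean = True
        self.mounts_left = self.timeout


fs = FileSystem(timeout=3)
fs.mount()
fs.unmount()
after_clean_unmount = fs.needs_fsck()

fs.mount()                           # system crashes here: no unmount
after_crash = fs.needs_fsck()

fs.fsck()
for _ in range(3):                   # three clean mount/unmount cycles
    fs.mount()
    fs.unmount()
after_timeout = fs.needs_fsck()
```

In this model the check is skipped after a clean unmount, required after the crash, and required again once the assumed countdown runs out, even though every unmount in between was clean.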
-
- This covers the file system tuning that can be done with tunefs(8)
- dependent on the vendor supporting maxcontig in some way. If anybody
- knows of other vendors (besides Sun and DEC) using the Fast File System
- that also support maxcontig, I'm interested in knowing who they are.
-
- Potential topics for discussion next are "Stupid Pet File System Tricks"
- or "We Don't Need No Stinking Disk Defragmenters". The first would be
- a discussion of using knowledge of how the file system allocates files
- to do additional location optimization by hand. The other discusses
- whether a disk defragmenter is interesting for the Fast File System.
- I'll probably be able to find time again on Thursday to post the next
- one. Mail your suggestion before then...
-
- >For now I'm off to see "Unforgiven"...
-
- Didn't get there in time, so I saw "A League of their Own" instead...
- --
- Alan Rollow alan@nabeth.cxo.dec.com
-
-