NetNews Usenet Archive 1992 #20

Text File | 1992-09-15 | 8.6 KB | 172 lines

Newsgroups: comp.unix.questions
Path: sparky!uunet!decwrl!pa.dec.com!nntpd2.cxo.dec.com!nabeth!alan
From: alan@nabeth.enet.dec.com (Alan Rollow - Alan's Home for Wayward Tumbleweeds.)
Subject: Re: Berkely Fast File System
Message-ID: <1992Sep15.230627.16980@nntpd2.cxo.dec.com>
Lines: 159
Sender: alan@nabeth (Alan Rollow - Alan's Home for Wayward Tumbleweeds.)
Reply-To: alan@nabeth.enet.dec.com (Alan Rollow - Alan's Home for Wayward Tumbleweeds.)
Organization: Digital Equipment Corporation
References: <1992Sep7.085336.22530@nntp.uoregon.edu> <1992Sep9.025316.19275@nntpd2.cxo.dec.com> <BuBvL6.2MJ@gumby.ocs.com> <1992Sep12.003806.20983@nntpd2.cxo.dec.com>
Date: Tue, 15 Sep 1992 23:06:27 GMT

In article <1992Sep12.003806.20983@nntpd2.cxo.dec.com>, alan@nabeth.enet.dec.com (Alan Rollow - Alan's Home for Wayward Tumbleweeds.) writes:
>Recent improvements in the implementation of the Fast File System
>by Sun and by DEC in ULTRIX V4.3 make this much less of a problem.
>That will be the topic of my next post in this thread.

Get your ULTRIX manual pages handy and turn to section 8.  Find the
manual page for tunefs(8).  Under the options section is the
description of the -a option for setting the value of maxcontig.

I believe the intent is that the file system will go along and take
user I/O requests, break them up into appropriate file system block
size (or fragment size) I/O requests, and pass these off to the
underlying driver.  It was expected that the driver would look at the
requests closely enough to notice if any of them just happened to be
for contiguous blocks on the device.  Rather than issue many requests
it would issue one larger request with appropriate page mapping to
have the data end up in the right place.

I'm not sure if there are any disk drivers that do this.  I know we
(DEC) don't have any.  But, suppose you let the file system layer do
this request aggregation.
It can do all the necessary page mapping to make a group of data
buffers look contiguous and then use a large I/O request to the
underlying driver for the data.

Consider a sequential read.  Today, if the file system discovers that
the user is doing sequential reads, it will do a block (or more) of
read-ahead for the next block.  With support for maxcontig it can
assume that the data may have been written in long contiguous blocks
and check.  When it finds that there are long contiguous blocks it can
then read the current long block and start a read-ahead of the next
long block.  You could easily get 128 KB into the buffer in two I/O
requests.

On the write side, the file system can introduce a certain amount of
delay between when it chooses WHERE to write a block and when it
actually writes the block.  If it knows that there is a group of
blocks all together for a particular file, it can map the data to look
contiguous (if it isn't already) and do a single large write instead
of many smaller writes.

Sun Microsystems added maxcontig support at least a year ago, with
what I hear were significant performance improvements for sequential
I/O.  In ULTRIX V4.3, which should be available real soon, we have
also added similar improvements.  My experience with the ULTRIX
support so far has been to use it to feed large I/O requests to a
striping driver from the file system.  The results have been pretty
impressive for having five disks on one SCSI controller.

The other options to tunefs(8) are less interesting, but still worth
going through.

The -d option sets the rotational delay.  As noted in a previous post,
the intent is to place gaps in the block allocation that allow time
for I/O requests to get to the controller, in the hope of avoiding
missed rotations.  Depending on the value you set you can get gaps of
zero, one, two, and so on, file system blocks.
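
As a rough illustration of how a rotational delay in milliseconds
could turn into a gap of whole blocks, here is a small Python sketch.
The arithmetic is modeled on the way the 4.3BSD allocator is usually
described (delay times revolutions per second times sectors per track,
converted to fragments and rounded up to a block); ULTRIX may differ
in detail.  The function names and all geometry values (60 revolutions
per second, 51 sectors per track, 1 KB fragments, 8 KB blocks) are
assumptions for illustration, not measurements of any particular disk.

```python
# Sketch only: how a rotational delay (tunefs -d) could become a gap
# between file system blocks. All geometry defaults are assumed values.

def roundup(x, y):
    """Round x up to the next multiple of y."""
    return ((x + y - 1) // y) * y


def gap_blocks(rotdelay_ms, rps=60, nsect=51, sect_per_frag=2,
               frags_per_block=8):
    """Return the number of block-sized gaps left between two blocks.

    rotdelay_ms     -- rotational delay in milliseconds
    rps             -- revolutions per second (assumed 3600 RPM disk)
    nsect           -- sectors per track (assumed)
    sect_per_frag   -- 512-byte sectors per 1 KB fragment (assumed)
    frags_per_block -- fragments per 8 KB block (assumed)
    """
    if rotdelay_ms == 0:
        return 0
    # Fragments passing under the head during the delay (integer math).
    frags = rotdelay_ms * rps * nsect // (sect_per_frag * 1000)
    # Only whole blocks are skipped, so round up to a block multiple.
    return roundup(frags, frags_per_block) // frags_per_block


if __name__ == "__main__":
    for ms in (0, 4, 8):
        print(ms, "ms ->", gap_blocks(ms), "block gap(s)")
```

With these assumed values the sketch gives gaps of 0, 1 and 2 blocks
for delays of 0, 4 and 8 ms.  Change nsect or rps and the boundaries
move, which is one reason experimenting on the real disk is still the
only sure test.
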
For the typical MSCP disk, arrangements might look like:

          8 KB
       +------+------+------+------+------+------+------+
 0 ms  | data | data | data | data | data | data | data |
       +------+------+------+------+------+------+------+

       +------+------+------+------+------+------+------+
 4 ms  | data |      | data |      | data |      | data |
       +------+------+------+------+------+------+------+

       +------+------+------+------+------+------+------+
 8 ms  | data |      |      | data |      |      | data |
       +------+------+------+------+------+------+------+

In the limited testing I've done, the only useful values seem to be
4 or 0.  Some gap is probably appropriate for disks that don't do
their own read-ahead.  Disks with read-ahead will have cached one or
more tracks.  By not putting in rotation delays you encourage the
blocks to be contiguous, which puts more of the data in the disk's
cache and reduces the number of reads it has to do.

Even on disks with read-ahead, if you're willing to give up some read
performance for improved write performance there might be some value
in setting a rotation delay.  The only way to be sure is to
experiment.

One thing to watch is that maxcontig and rotdelay can get in each
other's way.  We recommend that if you use maxcontig you set the
rotdelay to 0.  What will happen if you don't is that you'll probably
get groups of contiguous blocks separated by a block-size gap.  In
this example an 8 KB block is indicated by a D and the gap by a space.

         +-----------------------------------+
 288 KB  |DDDDDDDD DDDDDDDD DDDDDDDD DDDDDDDD|
         +-----------------------------------+

The gap might still help read/write performance on some disks, but it
also fragments the space more quickly.  Having the space fragmented is
probably not desirable.

Recall, from probably the first post, that the block allocation scheme
is to try to keep the blocks of a file in the same cylinder group as
the file, UNLESS the file gets too big.  "Too big" is determined from
"maxbpg", which can be set with the -e option.  Our default for this
is 256 blocks, or 2 MB.  The desired value really depends on how big
the cylinder group is and how the disk is being used.

If the cylinder group is much larger than 2 MB then you might be able
to raise it and still leave plenty of space for other files.  If the
cylinder group is smaller than 2 MB you might want to lower it.

If the file system is going to hold exclusively large files, you might
want to make the cylinder groups very large and set maxbpg
correspondingly large to keep the individual files as contiguous as
possible.  To a limited extent you can use the -c option of newfs(8)
to control the number of cylinders per group.  If you want to make
them really large, you can lie about the geometry.

If your vendor supports the "Fat" version of the BSD file system, you
may not need to lie.  I don't recall that much about it.

The last option to tunefs(8) is -m, which is used to set the minfree
value.  I haven't experimented with the performance loss of having to
use the slower algorithms when the file system is full.  My
inclination would be to change minfree only when:

    1.  Desperate for a little more user space.

    2.  Setting up an archive disk, where space is going to be
        allocated once up front and never or only rarely changed.

For the case of the archive disk, you can get a little extra space out
of it at the expense of some files being badly allocated.  If the
files are not frequently referenced this might not be a problem.

Specific to ULTRIX, we have a -c option to change our "clean byte"
timeout factor.  The "clean byte" is a flag in the file system that is
set when the file system is unmounted cleanly.  When the file system
is mounted the flag is marked dirty.  Thus, if the system crashes it
won't have been unmounted cleanly.  If it was unmounted cleanly
fsck(8) will not check it, unless forcibly told to do so.
The clean byte timeout was added so that you could force an fsck from
time to time.

This covers the file system tuning that can be done with tunefs(8),
though the maxcontig part depends on the vendor supporting it in some
way.  If anybody knows of other vendors (besides Sun and DEC) using
the Fast File System that also support maxcontig, I'm interested in
knowing who they are.

Potential topics for discussion next are "Stupid Pet File System
Tricks" or "We Don't Need No Stinking Disk Defragmenters".  The first
would be a discussion of using knowledge of how the file system
allocates files to do additional location optimization by hand.  The
other would discuss whether a disk defragmenter is interesting for the
Fast File System.

I'll probably be able to find time again on Thursday to post the next
one.  Mail your suggestions before then...

>For now I'm off to see "Unforgiven"...

Didn't get there in time, so I saw "A League of their Own" instead...
--
Alan Rollow                             alan@nabeth.cxo.dec.com