home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.unix.questions
- Path: sparky!uunet!spool.mu.edu!decwrl!pa.dec.com!nntpd2.cxo.dec.com!nabeth!alan
- From: alan@nabeth.enet.dec.com (Alan Rollow - Alan's Home for Wayward Tumbleweeds.)
- Subject: Re: Berkely Fast File System
- Message-ID: <1992Sep9.025316.19275@nntpd2.cxo.dec.com>
- Lines: 110
- Sender: alan@nabeth (Alan Rollow - Alan's Home for Wayward Tumbleweeds.)
- Reply-To: alan@nabeth.enet.dec.com (Alan Rollow - Alan's Home for Wayward Tumbleweeds.)
- Organization: Digital Equipment Corporation
- References: <1992Sep7.085336.22530@nntp.uoregon.edu>
- Date: Wed, 9 Sep 1992 02:53:16 GMT
-
-
- In article <1992Sep7.085336.22530@nntp.uoregon.edu>, swynne@cie.uoregon.edu.uoregon.edu (Stephen Wynne) writes:
- >
- >I'm having difficulties understanding the Berkeley Fast File System.
- >I can see how fragmentation would be a problem under the old method,
- >but I'm getting lost in the details of cylinder groups, bitmaps, etc...
- >Would anyone care to start a discussion on this topic with me?
-
- Sure.
-
- The Fast File System offers a variety of solutions to a variety
- of problems not all related. Two of the more prominent problems
- solved were:
-
- A. Low bandwidth used compared to what was available.
- B. Very low locality of reference.
-
- The solution to A was straight forward, increase the block
- size. This presented the problem of wasting space on files
- that used very little of a block. Hence file fragments were
- born. This presented the additional problem of how to quickly
- find a good fragment of the right size when you needed one.
- By keeping each free block list in a table on a per cylinder
- group basis, one you had decided what cylinder group to put
- a block in, it was easy to do a pattern search for the a
- fragment that matched the one you wanted. This was aided by
- a VAX instructions that did this search very well.
-
- Problem B had lots of pieces of solutions. In the old file
- system blocks were allocated off the free list in the simplest
- way possible. When block were freed, they were put back on the
- list in the simplest way possible (I'm doing this from memory,
- if I'm wrong please correct me). As the result the free list
- got quickly scrambled.
-
- There was also another problem with the old file system. The
- arrangement of the superblock, inode table and data block was:
-
- +------------+
- | Superblock |
- +------------+
- | Inodes |
- +------------+
- | Data |
- +------------+
-
- If the data you were using happened to be at the far end of the
- disk from the inodes, any update to the inode result in a LONG
- seek. Now the invention of cylinder groups.
-
- A cylinder group resembles the old file system, with the addition
- of the cylinder group data. The file system is a collection of
- these groups of cylinders. The file allocation algorithms were
- changed so that when a new directory was created it was created
- in a cylinder with an above average amount of free space. As files
- were created in this directory the preference was to allocate them
- in the same cylinder group as the directory. As blocks were allocated
- it was preferred to allocate them in the same cylinder group as the
- file, except when the file was very large. In this way most files
- would be kept close together and if they were small enough in a
- single contiguous block. Even today a large percentage of files
- will fit in an 8 KB block.
-
- Large files would be scattered across the disk, in such a way that
- big pieces of the file would be close together, but these pieces
- might scattered completely across the disk. Further the blocks
- were allocated in such a way that they would be in a rotationally
- optimal location. One of the problems with many disk controllers
- is that two separate I/O requests for block next to each other
- will probably result in getting the first one, and having to wait
- a rotation for the 2nd block. In this case it is better to put the
- later block in a location such that the I/O has a chance to get to
- the controller before the block rotates by. The affect is very
- pronounced on DEC MSCP controllers with SDI disks. The rotational
- delay can sometimes double the bandwidth being used for sequential
- I/Os. This is not as much a problem with disks that have their own
- read-ahead caches.
-
- These block and file allocation methods greatly improved the locality
- of reference for a single user, especially if the applications used
- followed the assumptions under which the improvements were designed.
- The next problem that came up was what to do on a multi-user system.
- While one user may see most of his disk activity in one little area
- of the disk a number of user would cause the activity to be scattered
- all over the disk. The obvious solution was the to do seek ordering.
- I/O requests would be put on queues handled by the disk driver. When
- an entry was added the location would be noted so that it could be
- added with other requests for the same area of the disk.
-
- As long as many I/O requests are going to the disk, this will allow
- an through-put improvement. If there aren't many outstanding I/O
- requests, then disk probably isn't busy enough to worry about.
-
- This particular optimization was disabled for the DEC MSCP controllers
- that did their own seek ordering. I don't know of any performance
- tests to compare the two. I have heard one or two complaints from
- people that would have preferred the controller seek order feature
- be disabled and the Berkeley disksort kept.
-
- Well, I guess that's enough for now. I should probably go re-read the
- Fast File System paper and see how much I got wrong.
-
- >
- >Thanks,
- >
- >STEVE
- >
- --
- Alan Rollow alan@nabeth.cxo.dec.com
-
-