home *** CD-ROM | disk | FTP | other *** search
- .\" Copyright (c) 1982 The Regents of the University of California.
- .\" All rights reserved.
- .\"
- .\" Redistribution and use in source and binary forms, with or without
- .\" modification, are permitted provided that the following conditions
- .\" are met:
- .\" 1. Redistributions of source code must retain the above copyright
- .\" notice, this list of conditions and the following disclaimer.
- .\" 2. Redistributions in binary form must reproduce the above copyright
- .\" notice, this list of conditions and the following disclaimer in the
- .\" documentation and/or other materials provided with the distribution.
- .\" 3. All advertising materials mentioning features or use of this software
- .\" must display the following acknowledgement:
- .\" This product includes software developed by the University of
- .\" California, Berkeley and its contributors.
- .\" 4. Neither the name of the University nor the names of its contributors
- .\" may be used to endorse or promote products derived from this software
- .\" without specific prior written permission.
- .\"
- .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
- .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
- .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- .\" SUCH DAMAGE.
- .\"
- .\" @(#)2.t 4.4 (Berkeley) 4/17/91
- .\"
- .ds RH Overview of the file system
- .NH
- Overview of the file system
- .PP
- The file system is discussed in detail in [Mckusick84];
- this section gives a brief overview.
- .NH 2
- Superblock
- .PP
- A file system is described by its
- .I "super-block" .
- The super-block is built when the file system is created (\c
- .I newfs (8))
- and never changes.
- The super-block
- contains the basic parameters of the file system,
- such as the number of data blocks it contains
- and a count of the maximum number of files.
- Because the super-block contains critical data,
- .I newfs
- replicates it to protect against catastrophic loss.
- The
- .I "default super block"
- always resides at a fixed offset from the beginning
- of the file system's disk partition.
- The
- .I "redundant super blocks"
- are not referenced unless a head crash
- or other hard disk error causes the default super-block
- to be unusable.
- The redundant blocks are sprinkled throughout the disk partition.
- .PP
- Within the file system are files.
- Certain files are distinguished as directories and contain collections
- of pointers to files that may themselves be directories.
- Every file has a descriptor associated with it called an
- .I "inode".
- The inode contains information describing ownership of the file,
- time stamps indicating modification and access times for the file,
- and an array of indices pointing to the data blocks for the file.
- In this section,
- we assume that the first 12 blocks
- of the file are directly referenced by values stored
- in the inode structure itself\(dg.
- .FS
- \(dgThe actual number may vary from system to system, but is usually in
- the range 5-13.
- .FE
- The inode structure may also contain references to indirect blocks
- containing further data block indices.
- In a file system with a 4096 byte block size, a singly indirect
- block contains 1024 further block addresses,
- a doubly indirect block contains 1024 addresses of further single indirect
- blocks,
- and a triply indirect block contains 1024 addresses of further doubly indirect
- blocks (the triple indirect block is never needed in practice).
- .PP
- In order to create files with up to
- 2\(ua32 bytes,
- using only two levels of indirection,
- the minimum size of a file system block is 4096 bytes.
- The size of file system blocks can be any power of two
- greater than or equal to 4096.
- The block size of the file system is maintained in the super-block,
- so it is possible for file systems of different block sizes
- to be accessible simultaneously on the same system.
- The block size must be decided when
- .I newfs
- creates the file system;
- the block size cannot be subsequently
- changed without rebuilding the file system.
- .NH 2
- Summary information
- .PP
- Associated with the super block is non replicated
- .I "summary information" .
- The summary information changes
- as the file system is modified.
- The summary information contains
- the number of blocks, fragments, inodes and directories in the file system.
- .NH 2
- Cylinder groups
- .PP
- The file system partitions the disk into one or more areas called
- .I "cylinder groups".
- A cylinder group is comprised of one or more consecutive
- cylinders on a disk.
- Each cylinder group includes inode slots for files, a
- .I "block map"
- describing available blocks in the cylinder group,
- and summary information describing the usage of data blocks
- within the cylinder group.
- A fixed number of inodes is allocated for each cylinder group
- when the file system is created.
- The current policy is to allocate one inode for each 2048
- bytes of disk space;
- this is expected to be far more inodes than will ever be needed.
- .PP
- All the cylinder group bookkeeping information could be
- placed at the beginning of each cylinder group.
- However if this approach were used,
- all the redundant information would be on the top platter.
- A single hardware failure that destroyed the top platter
- could cause the loss of all copies of the redundant super-blocks.
- Thus the cylinder group bookkeeping information
- begins at a floating offset from the beginning of the cylinder group.
- The offset for
- the
- .I "i+1" st
- cylinder group is about one track further
- from the beginning of the cylinder group
- than it was for the
- .I "i" th
- cylinder group.
- In this way,
- the redundant
- information spirals down into the pack;
- any single track, cylinder,
- or platter can be lost without losing all copies of the super-blocks.
- Except for the first cylinder group,
- the space between the beginning of the cylinder group
- and the beginning of the cylinder group information stores data.
- .NH 2
- Fragments
- .PP
- To avoid waste in storing small files,
- the file system space allocator divides a single
- file system block into one or more
- .I "fragments".
- The fragmentation of the file system is specified
- when the file system is created;
- each file system block can be optionally broken into
- 2, 4, or 8 addressable fragments.
- The lower bound on the size of these fragments is constrained
- by the disk sector size;
- typically 512 bytes is the lower bound on fragment size.
- The block map associated with each cylinder group
- records the space availability at the fragment level.
- Aligned fragments are examined
- to determine block availability.
- .PP
- On a file system with a block size of 4096 bytes
- and a fragment size of 1024 bytes,
- a file is represented by zero or more 4096 byte blocks of data,
- and possibly a single fragmented block.
- If a file system block must be fragmented to obtain
- space for a small amount of data,
- the remainder of the block is made available for allocation
- to other files.
- For example,
- consider an 11000 byte file stored on
- a 4096/1024 byte file system.
- This file uses two full size blocks and a 3072 byte fragment.
- If no fragments with at least 3072 bytes
- are available when the file is created,
- a full size block is split yielding the necessary 3072 byte
- fragment and an unused 1024 byte fragment.
- This remaining fragment can be allocated to another file, as needed.
- .NH 2
- Updates to the file system
- .PP
- Every working day hundreds of files
- are created, modified, and removed.
- Every time a file is modified,
- the operating system performs a
- series of file system updates.
- These updates, when written on disk, yield a consistent file system.
- The file system stages
- all modifications of critical information;
- modification can
- either be completed or cleanly backed out after a crash.
- Knowing the information that is first written to the file system,
- deterministic procedures can be developed to
- repair a corrupted file system.
- To understand this process,
- the order that the update
- requests were being honored must first be understood.
- .PP
- When a user program does an operation to change the file system,
- such as a
- .I write ,
- the data to be written is copied into an internal
- .I "in-core"
- buffer in the kernel.
- Normally, the disk update is handled asynchronously;
- the user process is allowed to proceed even though
- the data has not yet been written to the disk.
- The data,
- along with the inode information reflecting the change,
- is eventually written out to disk.
- The real disk write may not happen until long after the
- .I write
- system call has returned.
- Thus at any given time, the file system,
- as it resides on the disk,
- lags the state of the file system represented by the in-core information.
- .PP
- The disk information is updated to reflect the in-core information
- when the buffer is required for another use,
- when a
- .I sync (2)
- is done (at 30 second intervals) by
- .I "/etc/update" "(8),"
- or by manual operator intervention with the
- .I sync (8)
- command.
- If the system is halted without writing out the in-core information,
- the file system on the disk will be in an inconsistent state.
- .PP
- If all updates are done asynchronously, several serious
- inconsistencies can arise.
- One inconsistency is that a block may be claimed by two inodes.
- Such an inconsistency can occur when the system is halted before
- the pointer to the block in the old inode has been cleared
- in the copy of the old inode on the disk,
- and after the pointer to the block in the new inode has been written out
- to the copy of the new inode on the disk.
- Here,
- there is no deterministic method for deciding
- which inode should really claim the block.
- A similar problem can arise with a multiply claimed inode.
- .PP
- The problem with asynchronous inode updates
- can be avoided by doing all inode deallocations synchronously.
- Consequently,
- inodes and indirect blocks are written to the disk synchronously
- (\fIi.e.\fP the process blocks until the information is
- really written to disk)
- when they are being deallocated.
- Similarly inodes are kept consistent by synchronously
- deleting, adding, or changing directory entries.
- .ds RH Fixing corrupted file systems
-