home *** CD-ROM | disk | FTP | other *** search
- .s2
- 3.6 I/O calls
- .es
- The system calls to do I/O are designed to eliminate
- the differences between the various devices and styles of
- access.
- There is no distinction between ``random''
- and ``sequential'' I/O, nor is any logical record size imposed
- by the system.
- The size of an ordinary file is determined
- by the highest byte written on it;
- no predetermination of the size of a file is necessary
- or possible.
- .pg
- To illustrate the essentials of I/O in \*sUNIX\*n, some of the basic calls are
- summarized below
- in an anonymous language which will
- indicate the required parameters without getting into the
- complexities of machine language programming.
- Each call to the system may potentially result in an error
- return, which for simplicity is not represented
- in the calling sequence.
- .pg
- To read or write a file assumed to exist already, it must
- be opened by the following call:
- .dc
- filep = open\|(\|name, flag\|)
- .ec
- \fIName\fR indicates the name of the file.
- An arbitrary path name may be given.
- The \fIflag\fR argument indicates whether the file is to be read, written,
- or ``updated,'' that is read and written simultaneously.
- .pg
- The returned value
- .ft I
- filep
- .ft R
- is called a
- .ft I
- file descriptor.
- .ft R
- It is a small integer used to identify the file
- in subsequent calls to read, write
- or otherwise manipulate the file.
- .pg
- To create a new file or completely rewrite an old one,
- there is a \fIcreate\fR system call which
- creates the given file if it does not exist,
- or truncates it to zero length
- if it does exist.
- \fICreate\fR also opens the new file for writing
- and, like \fIopen,\fR returns a file descriptor.
- .pg
- There are no user-visible locks in the file system, nor is there any
- restriction on the number of users who may have a file
- open for reading or writing.
- Although it is possible for the contents of a file
- to become scrambled when two users write on it simultaneously,
- in practice difficulties do not arise.
- We take the view that locks are neither
- necessary nor sufficient, in our environment,
- to prevent interference between users of the same file.
- They are unnecessary because we are not
- faced with large, single-file data bases
- maintained by independent processes.
- They are insufficient because
- locks in the ordinary sense, whereby
- one user is prevented from writing on a file which another
- user is reading,
- cannot prevent confusion
- when, for example, both users are editing
- a file with an editor which makes
- a copy of the file being edited.
- .pg
- It should be said that the system has
- sufficient internal interlocks to maintain
- the logical consistency of the file system
- when two users engage simultaneously in such inconvenient
- activities as writing on
- the same file,
- creating files in the same directory,
- or deleting each other's open files.
- .pg
- Except as indicated below, reading and writing
- are sequential.
- This means that if a particular
- byte in the file was the last byte written (or read),
- the next I/O call implicitly refers to the
- first following byte.
- For each open file there is a pointer, maintained
- by the system,
- which indicates the next byte to be read
- or written.
- If \fIn\fR bytes are read or written, the pointer advances
- by \fIn\fR bytes.
- .pg
- Once a file is open, the following calls
- may be used.
- .dc
- n = read\|(\|filep, buffer, count\|)
- .dc
- n = write\|(\|filep, buffer, count\|)
- .ec
- Up to
- .ft I
- count
- .ft R
- bytes are transmitted between the file specified
- by
- .ft I
- filep
- .ft R
- and the byte array
- specified by
- .ft I
- buffer.
- .ft R
- The returned value
- .ft I
- n
- .ft R
- is the number of bytes actually transmitted.
- In the
- .ft I
- write
- .ft R
- case,
- .ft I
- n
- .ft R
- is the same as
- .ft I
- count
- .ft R
- except under exceptional conditions like I/O errors or
- end of physical medium on special files;
- in a
- .ft I
- read,
- .ft R
- however,
- .ft I
- n
- .ft R
- may without error be less than
- .ft I
- count.
- .ft R
- If the read pointer is so near the end of the
- file that reading \fIcount\fR characters
- would cause reading beyond the end, only sufficient
- bytes are transmitted to reach the end of the
- file;
- also, typewriter-like devices
- never return more than one line of input.
- When a \fIread\fR call returns with \fIn\fR equal
- to zero, it indicates the end of the file.
- For disk files this occurs when the read pointer
- becomes equal to the current
- size of the file.
- It is possible to generate an end-of-file
- from a typewriter by use of an escape
- sequence which depends on the device used.
- .pg
- Bytes written on a file affect only those implied by
- the position of the write pointer and the
- count; no other part of the file
- is changed.
- If the last byte lies beyond the end of the file, the
- file is grown as needed.
- .pg
- To do random (direct access) I/O
- it is only necessary to move the read or write pointer
- to the appropriate location in the file.
- .dc
- location = seek\|(\|filep, offset, base\|)
- .ec
- The pointer
- associated with \fIfilep\fR is moved to a position \fIoffset\fR
- bytes from the beginning of the file, from the current position
- of the pointer, or from the end of the file,
- depending on
- .ft I
- base.
- .ft R
- .ft I
- Offset
- .ft R
- may be negative.
- For some devices (e.g. paper
- tape and
- typewriters) seek calls are
- ignored.
- The actual offset from the beginning of the file
- to which the pointer was moved is returned
- in \fIlocation\fR.
- .s2
- 3.6.1 Other I/O calls
- .es
- There are several additional system entries
- having to do with I/O and with the file
- system which will not be discussed.
- For example:
- close a file,
- get the status of a file,
- change the protection mode or the owner
- of a file,
- create a directory,
- make a link to an existing file,
- delete a file.
- .s1
- 4. Implementation of the file system
- .es
- As mentioned in \(sc3.2 above, a directory entry contains
- only a name for the associated file and a pointer to the
- file itself.
- This pointer is an integer called the
- .ft I
- i-number
- .ft
- (for index number)
- of the file.
- When the file is accessed,
- its i-number is used as an index into
- a system table (the \fIi-list\|\fR) stored in a known
- part of the device on which
- the directory resides.
- The entry thereby found (the file's
- \fIi-node\fR\|)
- contains
- the description of the file:
- .sp
- .tr |
- .in .5i
- .ti -\w'1. 'u
- 1.|its owner;
- .ti -\w'1. 'u
- 2.|its protection bits;
- .ti -\w'1. 'u
- 3.|the physical disk or tape addresses for the file contents;
- .ti -\w'1. 'u
- 4.|its size;
- .ti -\w'1. 'u
- 5.|time of last modification;
- .ti -\w'1. 'u
- 6.|the number of links to the file; that is, the number
- of times it appears in a directory;
- .ti -\w'1. 'u
- 7.|a bit indicating whether the
- file is a directory;
- .ti -\w'1. 'u
- 8.|a bit indicating whether the file is a special file;
- .ti -\w'1. 'u
- 9.|a bit indicating whether the file is ``large'' or ``small.''
- .sp
- .tr ||
- .in 0
- The purpose of an
- .ft I
- open
- .ft R
- or
- .ft I
- create
- .ft R
- system call is to turn the path name given by the user
- into an i-number
- by searching the explicitly or implicitly named directories.
- Once a file is open,
- its device, i-number, and read/write pointer are stored in a system table
- indexed by the file descriptor returned by the
- .ft I
- open
- .ft R
- or
- .ft I
- create.
- .ft R
- Thus the file descriptor supplied during a subsequent
- call to read or write the
- file
- may be easily related to the information necessary to access the file.
- .pg
- When a new file is created,
- an i-node is allocated for it and a directory entry is made
- which contains the name of the file and the i-node
- number.
- Making a link to an existing file involves
- creating a directory entry with the new name,
- copying the i-number from the original file entry,
- and incrementing the link-count field of the i-node.
- Removing (deleting) a file is done by
- decrementing the
- link-count of the i-node specified by its directory entry
- and erasing the directory entry.
- If the link-count drops to 0,
- any disk blocks in the file
- are freed and the i-node is deallocated.
- .pg
- The space on all fixed or removable disks which
- contain a file system is divided into a number of
- 512-byte
- blocks logically addressed from 0 up to a limit which
- depends on the device.
- There is space in the i-node of each file for eight device addresses.
- A \fIsmall\fR (non-special) file fits into eight or fewer
- blocks; in this case the addresses of the
- blocks themselves are stored.
- For \fIlarge\fR (non-special) files, seven of the eight device addresses may point
- to indirect blocks each containing 256 addresses
- for the data blocks of the file.
- If required,
- the eighth word is the address of a double-indirect block containing 256 more addresses
- of indirect blocks.
- Thus files may conceptually grow to (7+256)\v'-.3'.\v'.3'256\v'-.3'.\v'.3'512 bytes;
- actually they are restricted to
- 16,777,216
- (\|2\u\*t24\*n\d\|) bytes.
- Once opened, a small file (size 1 to 8 blocks)
- can be accessed directly.
- A large file (size 9 to 32768 blocks)
- requires one additional access
- to read below logical block 1792
- (7\v'-.3'.\v'.3'256)
- and two additional references above 1792.
- .pg
- The foregoing discussion applies to ordinary files.
- When an I/O request is made to a file whose i-node indicates that it
- is special,
- the last seven device address words are immaterial,
- and the first is interpreted as
- a pair of bytes which constitute an internal
- .ft I
- device name.
- .ft R
- These bytes specify
- respectively a device type
- and subdevice number.
- The device type indicates which
- system routine will deal with I/O on that device;
- the subdevice number selects, for example, a disk drive
- attached to a particular controller or one of several
- similar typewriter interfaces.
- .pg
- In this environment, the implementation of the
- .ft I
- mount
- .ft R
- system call (\(sc3.4) is quite straightforward.
- .ft I
- Mount
- .ft R
- maintains a system table whose
- argument is the i-number and device name of the
- ordinary file specified
- during the
- .ft I
- mount,
- .ft R
- and whose corresponding value is the
- device name of the indicated special file.
- This table is searched for each (i-number, device)-pair
- which turns up while a path name is being scanned
- during an
- .it open
- or
- .it create;
- if a match is found,
- the i-number is replaced by 1 (which is the i-number of the root
- directory on all file systems),
- and the device name is replaced by the table value.
- .pg
- To the user, both reading and writing of files appear to
- be synchronous and unbuffered.
- That is, immediately after
- return from a \fIread\fR call the data are available, and conversely
- after a \fIwrite\fR the user's workspace may be reused.
- In fact the system maintains a rather complicated
- buffering mechanism which reduces greatly the number
- of I/O operations required to access a file.
- Suppose a \fIwrite\fR call is made specifying transmission
- of a single byte.
- U\*sNIX\*n will search its buffers to see
- whether the affected disk block currently resides in core memory;
- if not, it will be read in from the device.
- Then the affected byte is replaced in the buffer and an
- entry is made in a list of blocks to be written.
- The return from the \fIwrite\fR call may then take place,
- although the actual I/O may not be completed until a later time.
- Conversely, if a single byte is read, the system determines
- whether the secondary storage block in which the byte is located is already
- in one of the system's buffers; if so, the byte can be returned immediately.
- If not, the block is read into a buffer and the byte picked out.
- .pg
- The system recognizes when
- a program has
- made accesses to
- sequential blocks of a file,
- and asynchronously
- pre-reads the next block.
- This significantly reduces
- the running time of most programs
- while adding little to
- system overhead.
- .pg
- A program which reads or writes files in units of 512 bytes
- has an advantage over a program which reads or writes
- a single byte at a time, but the gain is not immense;
- it comes mainly from the avoidance of system overhead.
- A program which is used rarely or which does
- no great volume of I/O may quite reasonably
- read and write in units as small as it wishes.
- .pg
- The notion of the i-list is an unusual feature
- of \*sUNIX\*n.
- In practice, this method of organizing the file system
- has proved quite reliable and easy to deal with.
- To the system itself, one of its strengths is
- the fact that each file has a short, unambiguous name
- which is related in a simple way to the protection, addressing,
- and other information needed to access the file.
- It also permits a quite simple and rapid
- algorithm for checking the consistency of a file system,
- for example verification
- that the portions of each device containing useful information
- and those free to be allocated are disjoint and together
- exhaust the space on the device.
- This algorithm is independent
- of the directory hierarchy, since it need only scan
- the linearly-organized i-list.
- At the same time the notion of the i-list induces certain
- peculiarities not found in other file system organizations.
- For example, there is the question of who is to be charged
- for the space a file occupies,
- since all directory entries for a file have equal status.
- Charging the owner of a file is unfair in general,
- since one user may create a file, another may link to
- it, and the first user may delete the file.
- The first user is still the owner of the
- file, but it should be charged
- to the second user.
- The simplest reasonably fair algorithm
- seems to be to spread the charges
- equally among users who have links to a file.
- The current version of \*sUNIX\*n avoids the
- issue by not charging any fees at all.
- .s2
- 4.1 Efficiency of the file system
- .es
- To provide an indication of the overall efficiency
- of \*sUNIX\*n and of the file system in particular,
- timings were made of the assembly of
- a 8848-line program.
- The assembly was run alone on the machine;
- the total clock time was 32 seconds,
- for a rate of 276 lines per second.
- The time was divided as follows:
- 66% assembler execution time,
- 21% system overhead,
- 13% disk wait time.
- We will not attempt any interpretation of these figures
- nor any comparison with other systems, but merely
- note that we are
- generally satisfied with the overall performance
- of the system.
-