- Submitted-by: pc@hillside.co.uk (Peter Collinson)
-
- An Update on UNIX-Related Standards
-
- ANSI X3B11.1: WORM File Systems
-
- USENIX Standards Watchdog Committee
- Jeffrey S. Haemer <jsh@usenix.org>, Report Editor
-
-
- March 26, 1991
-
-
- Andrew Hume <andrew@research.att.com> reports on the January 22-24, 1991
- meeting in Murray Hill, NJ:
-
- Introduction
-
- X3B11.1 is working on a standard for file interchange on write-once
- media (both sequential and non-sequential, i.e., random access): a
- portable file system for WORMs. First let me apologize for laggardly
- snitching; we have had an extra meeting (in December) to accelerate our
- progress with the draft proposal and I have been busy writing a
- programmer's guide to the draft proposal. I shall describe the results
- of the last three meetings, October (Nashua, NH), December (Murray
- Hill, NJ), and January (San Jose, CA), not in chronological order, but
- rather as a summary of where we are now. Although many details remain
- to be ironed out, we have broad agreement on the current proposal.
-
- Multi-volume file systems
-
- The draft proposal supports multi-volume file systems. To avoid the
- confusion that reigned at our meetings, I will define what this means.
- A volume is a logical address space (on some medium). Thus, a typical
- WORM disk is two volumes, as each side is addressed separately. A
- volume partition is simply a contiguous subset of a volume's address
- space. A logical volume is simply a set of (volume) partitions upon
- which a file system is recorded. Finally, a logical volume set is a
- set of volumes with a single volume set identifier. (That is, it is
- simply a publishing concept.) Note, however, that when I say file
- system, I mean a set of files and directories described by possibly
- multiple directory hierarchies (typically each would be in a different
- character set). The (logical) block size, not the physical sector
- size, is $2^i$ bytes, $9 \le i < 65536$, and implementations would
- have to support at least a block size of 64KB. The various size
- limits are generous; internal block addresses allow 64K volumes, 64K
- partitions per volume, and $2^{32}$ blocks per partition.
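-
- To make these limits concrete, here is a minimal C sketch of an
- internal block address. It assumes a 16-bit volume number, a 16-bit
- partition number and a 32-bit block number, the widths implied by the
- limits above; the struct and function names are hypothetical and not
- taken from the draft.
-
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical internal block address. Field widths follow the quoted
   limits: 64K volumes, 64K partitions per volume, 2^32 blocks per
   partition. Names and field order are illustrative only. */
struct block_addr {
    uint16_t volume;     /* which volume (e.g., one side of a WORM platter) */
    uint16_t partition;  /* partition within that volume */
    uint32_t block;      /* logical block within the partition */
};

/* True if size is 2^i bytes with i >= 9: a power of two of at least 512.
   (The upper bound on i is far beyond what a uint64_t can hold.) */
bool valid_block_size(uint64_t size)
{
    return size >= 512 && (size & (size - 1)) == 0;
}

/* Byte offset of the addressed block from the start of its partition. */
uint64_t block_offset(struct block_addr a, uint64_t block_size)
{
    return (uint64_t)a.block * block_size;
}
```
-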
-
- Volume Headers
-
- The location of the volume header (the analog of the superblock) is a
- tricky issue because of the requirement that systems be able to boot
- off a disk in our format, and there is simply no consensus on the size
- or location of the boot area. Accordingly, pointers to the volume
- header (actually a sequence of various descriptor records) are
- recorded at one or more of 0, 16, 64, 128, 192, 256, $N - 16$, $N - 4$
- (where $N$ is the size of the disk). The seek speed (or rather the
- lack of seek speed) of WORM disks encouraged us to put these at both
- ends of the disk. The volume header record, like all the other major
- control structures, has a 16-bit CRC and a unique 8-byte tag, which
- should prevent misrecognition.
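-
- The C sketch below shows how a reader might locate and validate the
- volume header. It assumes the listed locations are logical block
- numbers, that the 16-bit CRC uses the CCITT polynomial (0x1021) with a
- zero initial value, and that a descriptor begins with its 8-byte tag
- followed by the CRC stored LSB first; none of these details, nor the
- function names, come from the draft.
-
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store the candidate header-pointer locations that fit on a disk of n
   blocks into out[]; returns how many were stored (at most 8). Tiny disks
   may yield duplicates, which is harmless for a search. */
size_t header_candidates(uint64_t n, uint64_t out[8])
{
    static const uint64_t fixed[] = {0, 16, 64, 128, 192, 256};
    size_t count = 0;
    for (size_t i = 0; i < sizeof fixed / sizeof fixed[0]; i++)
        if (fixed[i] < n)
            out[count++] = fixed[i];
    if (n >= 16)
        out[count++] = n - 16;   /* copies near the end of the disk too */
    if (n >= 4)
        out[count++] = n - 4;
    return count;
}

/* CRC-16, polynomial x^16 + x^12 + x^5 + 1, zero initial value, MSB first.
   The polynomial choice is an assumption of this sketch. */
uint16_t crc16(const uint8_t *p, size_t len)
{
    uint16_t crc = 0;
    while (len--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical layout: 8-byte tag, 16-bit CRC (LSB first), then the body
   the CRC covers. Both the tag and the CRC must match. */
bool descriptor_ok(const uint8_t *rec, size_t len, const uint8_t tag[8])
{
    if (len < 10 || memcmp(rec, tag, 8) != 0)
        return false;
    uint16_t stored = (uint16_t)(rec[8] | (rec[9] << 8));
    return stored == crc16(rec + 10, len - 10);
}
```
-
- A reader would try each candidate location in turn and accept the
- first record that passes descriptor_ok().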
-
- Volume/Partition Structure
-
- The volume layer handles space allocation for the volume, definitions
- of partitions, and bad-block mapping. The partition layer does its own
- space allocation, supports the file system, and does partition-access
- logging. Partitions have file-system-type tags; the intent is to allow
- partition $w$ to be an X3B11.1 file system, partition $x$ to be a CD-ROM
- file system, partition $y$ to be an MS-DOS floppy file system and
- partition $z$ to be of unknown type. There should be a registry for
- this type field; vendors may want to register their file-system
- formats.
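-
- A hypothetical C sketch of a partition table whose entries carry a
- file-system-type tag. The type codes stand in for values a registry
- would assign, and the layout is illustrative only.
-
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes; real values would come from the registry. */
enum fs_type { FS_UNKNOWN = 0, FS_X3B11_1 = 1, FS_CDROM = 2, FS_MSDOS = 3 };

struct partition {
    uint32_t start;    /* first block of the partition within the volume */
    uint32_t length;   /* number of contiguous blocks */
    uint16_t fs_type;  /* registry code identifying the on-disk format */
};

/* Return the partition containing volume block blk, or NULL if none does. */
const struct partition *find_partition(const struct partition *tab, size_t n,
                                       uint32_t blk)
{
    for (size_t i = 0; i < n; i++)
        if (blk >= tab[i].start && blk - tab[i].start < tab[i].length)
            return &tab[i];
    return NULL;
}

/* A driver claims only partitions tagged with the format it understands. */
int can_mount(const struct partition *p, uint16_t my_type)
{
    return p != NULL && p->fs_type == my_type;
}
```
-
- Partitions of unknown type are simply left alone rather than
- misinterpreted.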
-
- Bad-Block Handling
-
- A simple defect-management scheme has been adopted; it is similar to
- the bad-block remapping scheme used for most SMD disks. There was
- considerable resistance to such a scheme, particularly from the
- representatives of the hardware vendors, as the (SCSI) WORM disks
- already do as much error detection/correction as is possible. However,
- defect management (above the disk driver level) is still necessary
- because
-
- 1. error correction/detection in the drive can be, and for
- performance reasons often is, turned off,
-
- 2. errors can easily occur between the disk and the host's main
- memory (have you ever heard of DMA or bus errors?), and
-
- 3. even though SCSI disks present an ``error free'' interface, most
- drives have a limited number of errors they can cope with, and
- many early drives did little or no error correction.
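-
- A minimal C sketch of SMD-style remapping: a list of (bad block, spare
- block) pairs consulted on every access. Keeping the list sorted so it
- can be binary-searched is an assumption of this sketch, as are the
- names.
-
```c
#include <stddef.h>
#include <stdint.h>

/* One defect-list entry: a bad block and the spare that replaces it. */
struct remap {
    uint32_t bad;
    uint32_t spare;
};

/* Translate a block number through a defect list sorted by bad block.
   Blocks not in the list are returned unchanged. */
uint32_t remap_block(const struct remap *tab, size_t n, uint32_t blk)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].bad < blk)
            lo = mid + 1;
        else if (tab[mid].bad > blk)
            hi = mid;
        else
            return tab[mid].spare;
    }
    return blk;  /* not a known defect: use the block as is */
}
```
-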
-
- FCB Format
-
- As you may recall, multiple versions of the direct entry (the
- equivalent of the inode) are stored in a data structure called the
- file control block (FCB). The original proposal involved various
- levels of indirect blocks exactly like classic Unix file systems. We
- adopted my proposal (adapted from an observation by Dennis Ritchie)
- for a simpler, more general format that allows arbitrary structures,
- which can be specialized for different applications.
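-
- The FCB format itself is not spelled out here, so the C sketch below is
- only a hypothetical illustration of the versioning idea: on write-once
- media an update appends a new direct entry instead of overwriting the
- old one, and a reader takes the newest version as current. The fields
- and the version counter are assumptions.
-
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified direct entry; real ones carry far more. */
struct direct_entry {
    uint32_t version;  /* increases with each appended update */
    uint64_t size;
    uint32_t mtime;
};

/* Return the current (highest-version) entry in an FCB, or NULL if the
   FCB holds no entries. */
const struct direct_entry *current_entry(const struct direct_entry *fcb,
                                         size_t n)
{
    const struct direct_entry *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (best == NULL || fcb[i].version > best->version)
            best = &fcb[i];
    return best;
}
```
-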
-
- Partition Access Records
-
- This is more like logging changes to the file system than a security
- thing like access control lists. The idea is to have periods of
- writing to the partition bracketed by specific control records so that
- it will be possible to tell if a system closed out that partition
- gracefully. (More bluntly, did we unmount the partition gracefully or
- did the system crash in the middle of a session?) These records are
- kept on a per-file-system basis and are recorded as variants of
- direct entries in a structure identical to FCBs. Another side issue
- is support for a so-called ``stable'' record, which is analogous to
- the proposed stable sync feature of BSD Unix. (The control structures
- such as inodes and indirect blocks are written to disk but the user's
- data may not be, yet.) This peculiar state avoids the need to run fsck
- (or its equivalent) on the disk but you still have to get the user's
- data from somewhere. [Ed: does anyone really need this ``stable''
- state?]
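-
- A hypothetical C sketch of how a mount-time check might read the
- partition access records. The record kinds and their encoding are
- assumptions; the point is only that the record closing out the last
- session tells you whether fsck (or its equivalent) is needed.
-
```c
#include <stddef.h>

/* Hypothetical kinds of partition access record. */
enum access_kind { ACC_OPEN, ACC_STABLE, ACC_CLOSE };

/* What the most recent write session left behind. */
enum session_state { SESSION_CLEAN, SESSION_STABLE, SESSION_CRASHED };

/* Scan access records in the order they were written; the last bracketing
   record determines the state. */
enum session_state last_session_state(const enum access_kind *log, size_t n)
{
    enum session_state st = SESSION_CLEAN;  /* empty log: never written */
    for (size_t i = 0; i < n; i++) {
        switch (log[i]) {
        case ACC_OPEN:   st = SESSION_CRASHED; break;  /* open until closed */
        case ACC_STABLE: st = SESSION_STABLE;  break;  /* metadata on disk */
        case ACC_CLOSE:  st = SESSION_CLEAN;   break;  /* graceful close */
        }
    }
    return st;
}
```
-
- SESSION_CRASHED calls for the fsck equivalent; SESSION_STABLE means the
- control structures are consistent but some user data may still have to
- be recovered from elsewhere.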
-
- Recording Directories
-
- For performance reasons, it is proposed that directories, or rather the
- records (FIDs) identifying the files (and subdirectories) in that
- directory, be kept in optionally sorted order. This would be in binary
- and not lexicographic order (thus evading nettlesome character-set-
- collating-order issues). It is not trivial to support this but is
- probably worth it. Related to this is the issue of system areas in
- directories and FIDs. It is expected that these areas will contain
- accelerator structures, such as B-tree indices and so on. Here, and
- elsewhere in the standard, the governing principle is to allow systems
- to use such structures but to neither mandate nor standardize their
- use.
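-
- Sorting in binary order means comparing names byte by byte as unsigned
- values, with no collating table. The C sketch below assumes a
- hypothetical, stripped-down FID holding only a name and an FCB
- location, and shows the comparison plus a binary-search lookup.
-
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced FID: a name and where its FCB lives. */
struct fid {
    const char *name;
    size_t len;
    unsigned long fcb_block;
};

/* Byte-wise comparison (memcmp compares as unsigned char); a name sorts
   before any longer name it is a prefix of. No locale is consulted. */
int binary_name_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    int r = memcmp(a, b, alen < blen ? alen : blen);
    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}

/* Binary search of a directory whose FIDs are recorded in binary order. */
const struct fid *lookup(const struct fid *dir, size_t n,
                         const char *name, size_t len)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = binary_name_cmp(dir[mid].name, dir[mid].len, name, len);
        if (c < 0)
            lo = mid + 1;
        else if (c > 0)
            hi = mid;
        else
            return &dir[mid];
    }
    return NULL;
}
```
-
- In this order every uppercase ASCII letter sorts before every lowercase
- one, so "README" precedes "a.out"; that is the price of evading
- collation.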
-
- Anonymous Files
-
- There are numerous FCBs, or file-like objects, that have no FID. An
- example might be a Macintosh resource fork. The question is whether to
- make these visible to the user. This is a serious issue, and one not
- confined to this standard. It is an issue for the system supporting
- access to the file system on the disk. Do we rely on this system to do
- the right thing or should we mandate a mechanism? For example,
- consider a Macintosh file (with its resource fork) on a system
- (say Unix) that doesn't have that concept. We can either trust that
- the vendor supplying your Unix has implemented an fcntl (or ioctl) to
- access the resource fork, or we can evade the issue completely by
- mandating that the resource fork be available for normal access by a
- reserved name such as foo.RFORK. The general feeling is that users
- will not allow a standard to reserve parts of the file name space for
- its own use. Thus, it seems likely that access would have to be via
- standardized fcntl calls, but these are outside the scope of our
- standard.
-
- Byte Order
-
- I have pressed the issue of the byte order for numeric fields. The
- previous notion was to allow the recording system to choose the byte
- order. The issue is not technical (everyone seems happy to pick just
- one and stick with it) but political. We picked LSB order, the order
- used by the low-end (and slowest) systems, and then measured what it
- costs a low-end MSB system (the slowest Macintosh we could find)
- running straightforward C code. Converting the byte order in the worst
- case (a block full of integer block numbers) took about 10ms,
- comparable to a single disk I/O and one or two orders of magnitude
- less than the cost of a disk seek. (Careful assembly code would be
- much faster than this.)
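-
- Straightforward C code for LSB-first fields might look like the sketch
- below; it is not the code that was measured, and the function names
- are hypothetical. Because each value is assembled with shifts, the
- same source gives the right answer on both LSB and MSB hosts.
-
```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit unsigned value recorded least-significant byte first. */
uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Encode a value LSB first, for writing. */
void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* The worst case above: a block full of 32-bit block numbers, converted
   to host integers in one pass. out must hold block_size / 4 entries. */
void decode_block_numbers(const uint8_t *block, size_t block_size,
                          uint32_t *out)
{
    for (size_t i = 0; i + 4 <= block_size; i += 4)
        out[i / 4] = get_le32(block + i);
}
```
-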
-
- Extended Attributes
-
- The direct entry for a file has many attributes or fields. Some of
- these will be stored directly in the direct entry for fast access.
- The rest will be stored in an extended attribute record area
- much like resources in a Macintosh resource fork. There are two
- issues: which attributes get faster access and how do you access the
- other attributes? The former is something the standard specifies; our
- guiding principle was to include the fields needed for a Unix stat or
- an MS-DOS (or VMS) dir command. Unfortunately, the issue of access is
- beyond the domain of our standard and needs to be addressed by POSIX,
- probably best by 1003.8. Within our standard, extended attributes are
- identified by 32-bit numbers, some of which are assigned by the
- standard itself and the rest through a registry maintained by some
- authority (such as ANSI). The current list of extended attributes is given below;
- treat it as very preliminary and subject to change.
-
-     information creation        file abstract
-     information modification    file type
-     information expiration      associated file
-     information effective       data compression
-     file creation               protection
-     file access                 application-specific data segment
-     file modification           implementation segment
-     file backup                 escape sequences segment
-     file expiration             action history
-     file attribute              icon
-     file effective              environment type
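-
- The layout of the extended attribute area is not described here. The C
- sketch below assumes, purely for illustration, a sequence of records
- each holding a 32-bit attribute number, a 32-bit data length and the
- data, all LSB first, and shows a bounds-checked lookup by attribute
- number. The function names are hypothetical.
-
```c
#include <stddef.h>
#include <stdint.h>

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Find attribute `type` in an assumed type/length/data record sequence.
   Returns a pointer to the data and stores its length in *len_out, or
   returns NULL if the attribute is absent or a record overruns the area. */
const uint8_t *find_ext_attr(const uint8_t *area, size_t size,
                             uint32_t type, uint32_t *len_out)
{
    size_t off = 0;
    while (size - off >= 8) {          /* room for a record header */
        uint32_t t = le32(area + off);
        uint32_t len = le32(area + off + 4);
        if (len > size - off - 8)
            return NULL;               /* record runs past the area */
        if (t == type) {
            *len_out = len;
            return area + off + 8;
        }
        off += 8 + (size_t)len;        /* skip to the next record */
    }
    return NULL;
}
```
-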
-
- Character Sets
-
- We have adopted a somewhat simpler way of dealing with character sets
- than the CD-ROM standard (ISO 9660). The current schemes available are
- ----------------------------------------------------------------------
-  Scheme  Characters
- ----------------------------------------------------------------------
-       0  0-9A-Z . from Latin-1 (ISO 8859-1)
-       1  portable filename character set 0-9A-Za-z._- (POSIX 1003.1)
-       2  $G_0$ set from Latin-1
-       3  all graphic characters from Latin-1
-     255  defined via escape sequences: the full-scale mechanisms of
-          ISO 2022, which are only rarely implemented
- ----------------------------------------------------------------------
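-
- Scheme 1 is the easiest to check mechanically. A minimal C sketch of
- such a check (function names hypothetical) tests explicit character
- ranges rather than calling isalnum(), whose answer depends on the
- current locale.
-
```c
#include <stdbool.h>
#include <stddef.h>

/* True if c is in the POSIX portable filename character set:
   A-Z, a-z, 0-9, period, underscore and hyphen. */
bool portable_char(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
}

/* True if every byte of the name belongs to character-set scheme 1. */
bool valid_scheme1_name(const char *name, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!portable_char((unsigned char)name[i]))
            return false;
    return true;
}
```
-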
-
- International Activity
-
- The appropriate ISO committee (SC15) has been reconstituted with Japan
- supplying secretariat duties. A meeting is expected in July or
- September and it is hoped that there will be close cooperation between
- X3B11.1 and SC15. There is some concern that ANSI might awaken the
- long-dormant file structure committee and that this might delay
- acceptance of X3B11.1's work. Also, because of a request by a working
- group involved in the Philips CD-WO device (a combination medium that
- is a 5.25-inch WORM with a CD-ROM portion), ECMA might also reconstitute
- its file structure committee (TC15).
-
- Finale
-
- What can, or should, you do? As always, I welcome any feedback,
- specific or general, on the work our committee does. (I must express
- my appreciation to USENIX for publishing these reports; nearly all the
- mail I have received about X3B11.1's work starts off like, ``I read
- your report in the so-and-so ;login:''.) In particular, I invite
- comments on any fields or attributes you would like standardized and,
- perhaps more important to the Unix community, on how to access
- auxiliary information about a file in a standard way. Plenty of ad hoc
- solutions already exist for versioned files (VMS file systems on
- Ultrix systems), Macintosh files mounted as NFS file systems, and
- CD-ROM file systems. The number of these problems will certainly
- increase over time; we need to address them now, before we
- standardize on file system interfaces (such as 1003.8) that omit such
- mechanisms.
-
- If you would like more details on X3B11.1's work, you should contact
- either me (andrew@research.att.com, (908) 582-6262) or the committee
- chair, Ed Beshore (edb@hpgrla.hp.com). I think the two most useful
- documents are the current draft of the working paper (about 80 pages)
- and a programmer's guide to the draft (about 12 pages written by me).
- I will send you copies of the latter document; requests for other
- documents or more general inquiries about X3B11.1's work would be best
- sent to Ed Beshore.
-
- The next meeting is in North Falmouth, MA on April 23-26, 1991. Anyone
- interested in attending should contact either me or Ed Beshore.
-
-
-
- Volume-Number: Volume 23, Number 22
-
-