<?xml version="1.0" encoding="ISO-8859-1"?>
<article id="paper-294">
<articleinfo>
<title>XFS for Linux</title>
<author>
<firstname>Christoph</firstname>
<surname>Hellwig</surname>
</author>
<author>
<firstname>Russell</firstname>
<surname>Cattelan</surname>
</author>
<author>
<firstname>Steve</firstname>
<surname>Lord</surname>
</author>
<author>
<firstname>Jim</firstname>
<surname>Mostek</surname>
</author>
<copyright>
<year>2003</year>
<holder>SGI Inc.</holder>
</copyright>
</articleinfo>

<section>
<title>
Abstract
</title>

<para>
The limitations of traditional file systems were becoming evident as new application demands arose and
larger hardware became available, first for SGI's MIPS-based systems and later for Intel-based Linux systems.
This paper describes the advanced and unique features of
SGI's XFS filesystem for IRIX and Linux, the lessons learned during the development and porting of XFS,
and the differences between the IRIX and Linux implementations of XFS.
</para>

</section>
<section>
<title>
Introduction to XFS
</title>

<para>
In the early 1990s, SGI realized its existing file system, EFS (Extent File System), would be
inadequate to support the new application demands arising from the increased disk capacity,
bandwidth, and parallelism available on its systems. Applications in film and video, supercomputing,
and huge databases all required performance and capacities beyond what EFS, with a design similar
to the Berkeley Fast File System, could provide. EFS limitations were similar to those found in
Linux file systems until a few years ago: small file system sizes (8 gigabytes), small file sizes
(2 gigabytes), statically allocated metadata, and slow recovery times using fsck.
</para>
<para>
To address these issues in EFS, in 1994 SGI released an advanced, journaled file system on IRIX;
this file system was called XFS. Since that time, XFS has proven itself in production as a fast,
highly scalable file system suitable for computer systems ranging from the desktop to supercomputers.
</para>
<para>
To help address these same issues in Linux, SGI has made XFS technology available for Linux
under the GNU General Public License (GPL).
</para>

<section>
<title>
Features
</title>

<para>
XFS uses B+ trees extensively in place of traditional linear file system structures. B+ trees
provide an efficient indexing method that is used to rapidly locate free space, to index directory
entries, to manage file extents, and to keep track of the locations of file index information
within the file system. XFS is a fully 64-bit file system. Most of the global counters in the
system are 64 bits in length, as are the addresses used for each disk block and the unique
number assigned to each file (the inode number). A single file system can theoretically be as
large as 18 million terabytes. The file system is partitioned into regions called Allocation Groups
(AGs). Like UFS cylinder groups, each AG manages its own free space and inodes. The primary purpose
of Allocation Groups is to provide scalability and parallelism within the file system. This
partitioning also limits the size of the structures needed to track this information and allows
the internal pointers to be 32 bits. AGs typically range in size from 0.5 to 4 GB. Files and
directories are not limited to allocating space within a single AG.
</para>
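<para>
To make the addressing scheme concrete, the following sketch (with illustrative
type and function names, not verbatim XFS source) shows how a 64-bit file system
block number decomposes into an allocation group number and a 32-bit offset
within that AG:
</para>
<programlisting><![CDATA[
#include <stdint.h>

typedef uint64_t xfs_fsblock_t;   /* global (64-bit) block number */
typedef uint32_t xfs_agnumber_t;  /* allocation group number */
typedef uint32_t xfs_agblock_t;   /* block offset within one AG */

/* agblklog: log2 of the AG size in blocks, kept in the superblock */
static inline xfs_agnumber_t fsb_to_agno(xfs_fsblock_t fsb, int agblklog)
{
        return (xfs_agnumber_t)(fsb >> agblklog);
}

static inline xfs_agblock_t fsb_to_agbno(xfs_fsblock_t fsb, int agblklog)
{
        /* the low bits fit in 32 bits, so AG-internal pointers stay small */
        return (xfs_agblock_t)(fsb & ((1ULL << agblklog) - 1));
}
]]></programlisting>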
<para>
XFS has a variety of sophisticated support utilities to enhance its usability. These include a fast
mkfs (make a file system), dump and restore utilities for backup, xfs_db (XFS debug), xfs_check
(XFS check), and xfs_repair to perform file system checking and repairing. The xfs_fsr utility
defragments existing XFS file systems. The xfs_bmap utility can be used to interpret the metadata
layouts for an XFS file system. The xfs_growfs utility allows XFS file systems to be enlarged on-line.
</para>

</section>
<section>
<title>
Architecture
</title>

<para>
The high level structure of XFS is similar to a conventional file system with the addition of a
transaction manager. XFS supports all of the standard Unix file interfaces and is entirely POSIX
and XPG4 compliant. It sits below the vnode interface in the IRIX kernel and takes full advantage
of services provided by the kernel, including the buffer/page cache, the directory name lookup cache,
and the dynamic vnode cache.
</para>
<para>
XFS is modularized into several parts, each of which is responsible for a separate piece of the file
system's functionality. The central and most important piece of the file system is the space manager.
This module manages the file system free space, the allocation of inodes, and the allocation of space
within individual files. The I/O manager is responsible for satisfying file I/O requests and depends
on the space manager for allocating and keeping track of space for files. The directory manager
implements the XFS file system name space. The buffer cache is used by all of these pieces to cache
the contents of frequently accessed blocks from the underlying volume in memory. It is an integrated
page and file cache shared by all file systems in the kernel. The transaction manager is used by the
other pieces of the file system to make all updates to the metadata of the file system atomic. This
enables the quick recovery of the file system after a crash. While the XFS implementation is modular,
it is also large and complex. The current implementation is over 110,000 lines of C code (not including
the buffer cache or user-level XFS utilities); in contrast, the EFS implementation is approximately
12,000 lines.
</para>
</section>
<section>
<title>
Journaling
</title>
<para>
XFS journals metadata updates by first writing them to an in-core log buffer, then asynchronously
writing log buffers to the on-disk log. The on-disk log is a circular buffer: new log entries are
written to the head of the log, and old log entries are removed from the tail once the in-place
metadata updates occur. After a crash, the on-disk log is read by the recovery code, which is called
during a mount operation.
</para>
<para>
XFS metadata modifications use transactions: create, remove, link, unlink,
allocate, truncate, and rename operations all require transactions. This means the operation,
from the standpoint of the file system on-disk metadata, either never starts or always completes.
These operations are never partially completed on disk: they either happened or they didn't.
Transactional semantics are required for databases, but until recently have not been considered
necessary for file systems. This is likely to change, as huge disks and file systems require the
fast recovery and good performance journaling can provide.
</para>
<para>
An important aspect of journaling is
write-ahead logging: metadata objects are pinned in kernel memory while the transaction is being
committed to the on-disk log. The metadata is unpinned once the in-core log has been written to
the on-disk log. Note that each in-core log buffer may contain multiple transactions, and multiple
in-core log buffers allow new transactions to proceed while another buffer is being written. Each transaction
requires a space reservation from the log system (i.e., the maximum number of blocks this transaction
may need to write). All metadata objects modified by an operation, e.g., create, must be contained
in one transaction.
</para>
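<para>
The overall shape of a metadata transaction is sketched below. This is a
simplified illustration modeled on the XFS transaction interfaces
(xfs_trans_alloc, xfs_trans_reserve, xfs_trans_commit); the exact arguments
and reservation macros differ in the real code:
</para>
<programlisting><![CDATA[
/* Simplified sketch of a metadata transaction; not verbatim XFS source. */
int example_transaction(xfs_mount_t *mp, xfs_inode_t *ip)
{
        xfs_trans_t *tp;
        int error;

        tp = xfs_trans_alloc(mp, XFS_TRANS_CREATE);

        /* Reserve the worst-case log space this operation may need;
         * the transaction cannot start without the reservation. */
        error = xfs_trans_reserve(tp, 0, XFS_CREATE_LOG_RES(mp), 0, 0, 0);
        if (error)
                return error;

        xfs_trans_ijoin(tp, ip);        /* join (and pin) the inode */
        /* ... modify metadata objects, logging each change ... */
        xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);

        /* Commit writes to the in-core log; the metadata stays pinned
         * until the in-core log buffer reaches the on-disk log. */
        return xfs_trans_commit(tp, 0);
}
]]></programlisting>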
</section>
</section>

<section>
<title>
Porting XFS to Linux
</title>

<section>
<title>
The vnode/vfs interface in IRIX
</title>

<para>
The vnode/vfs file system interface was developed in the mid-80s to allow the UNIX
kernel to support multiple file systems simultaneously. Up to that time, UNIX kernels typically
supported a single file system that was bolted directly into the kernel internals. With the
advent of local area networks in the mid-80s, file sharing across networks became possible,
and it was necessary to allow multiple file system types to be installed into the kernel.
The vnode/vfs interface separates the file-system-independent vnode from the file-system-dependent
inode. This separation allows new file systems to re-use existing file-system-independent code
and, at least in theory, to be developed independently of the internal kernel data structures.
IRIX and XFS use the following major structures to interface between the file system and the
rest of the IRIX OS components:
</para>

<itemizedlist>
<listitem><para>
vfs: Virtual File System structure.
</para></listitem>
<listitem><para>
vnode: Virtual node (as opposed to inode).
</para></listitem>
<listitem><para>
bhv_desc: behaviors, used for file system stacking.
</para></listitem>
<listitem><para>
buf: used as an interface to store data in memory (to and from disk).
</para></listitem>
<listitem><para>
xfs_mount: top-level per XFS file system structure.
</para></listitem>
<listitem><para>
xfs_inode: top-level per XFS file structure.
</para></listitem>
</itemizedlist>


<para>
The vnode structure points at the first behavior in the chain of file systems handling the file
associated with this vnode. The behavior in turn points to the function vector, xfs_vnodeops, which contains all the
file-system-specific routines at the file level. In IRIX, the vnodeops vector contains more than 57
routines which can be invoked on a "file". These routines cover many functions such as create, remove,
read, write, open, close, and others.
</para>
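<para>
The relationship between these structures can be pictured as follows; the
field names are illustrative of the IRIX behavior chain, not a verbatim
excerpt:
</para>
<programlisting><![CDATA[
/* Sketch of the vnode/behavior relationship; field names illustrative. */
struct bhv_desc {
        void            *bd_pdata;  /* file-system-private data (xfs_inode) */
        void            *bd_vobj;   /* vnode this behavior is chained to */
        struct bhv_desc *bd_next;   /* next behavior in the stacking chain */
        struct vnodeops *bd_ops;    /* function vector, e.g. xfs_vnodeops */
};

struct vnode {
        struct bhv_desc *v_fbhv;    /* first behavior in the chain */
        /* ... type, flags, reference count, etc. ... */
};

/* A file operation is invoked through the first behavior's vector: */
#define VOP_CLOSE(vp, ...) \
        ((vp)->v_fbhv->bd_ops->vop_close((vp)->v_fbhv, __VA_ARGS__))
]]></programlisting>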

</section>
<section>
<title>
Mapping the vnode/vfs interface to Linux
</title>

<para>
Changing XFS to fit directly into the Linux VFS interface would require significant changes to a
large portion of the XFS codebase. The current source code organization would need to be
significantly changed, and code sharing between the IRIX and Linux versions of XFS would
become much more difficult. The alternative is to integrate the vnode and vfs objects as
private file-system-dependent data in the struct inode and struct super_block data in Linux.
This approach introduces a translation layer between the XFS code and the Linux VFS interface
which translates Linux VFS calls into the equivalent XFS vnode operations. The XFS vnode itself
is attached to the private data area of the Linux inode, while the XFS vfs object is attached
to the private data area of the Linux superblock.
</para>
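<para>
A translation-layer routine then looks roughly like the following sketch;
the exact VOP_LOOKUP signature and the LINVFS_GET_VP/LINVFS_GET_IP accessors
are simplified for illustration:
</para>
<programlisting><![CDATA[
/* Sketch of a linvfs wrapper: Linux VFS call -> XFS vnode operation. */
static struct dentry *
linvfs_lookup(struct inode *dir, struct dentry *dentry)
{
        vnode_t *dvp = LINVFS_GET_VP(dir);  /* vnode from inode private data */
        vnode_t *vp = NULL;
        int error = 0;

        VOP_LOOKUP(dvp, dentry, &vp, error);  /* call into XFS */
        if (error)
                return ERR_PTR(-error);

        /* attach the Linux inode backing the returned vnode, if any */
        d_add(dentry, vp ? LINVFS_GET_IP(vp) : NULL);
        return NULL;
}
]]></programlisting>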

<para>
In the initial Linux port of XFS the vnode and vfs operations remained almost unchanged from
IRIX, and the translation layer, called linvfs, had to do a certain amount of argument and
semantics remapping. For example, the IRIX read/write entry points use the uio structure,
which allows multiple I/O requests to be passed to the filesystem in one call, while the Linux
read/write entry points use a much simpler scheme with one I/O request per call. In later
revisions of XFS for Linux some vnode/vfs operations were changed to fit the Linux model
better. For example, the lookup, create, and rename VOPs now pass down the Linux dentry
structure that describes a directory entry in all its details, instead of just the name
as in IRIX. This allows getting rid of superfluous internal lookup calls and access/race
checks already handled in the generic Linux code. As a result of these changes, more than
2,000 lines of code that were required on IRIX could be removed from the Linux XFS.
</para>

<para>
Another example is the already mentioned read/write entry points, which were simplified
to match their Linux counterparts.
</para>

</section>
<section>
<title>
fcntl Versus ioctl in IRIX and Linux
</title>

<para>
In IRIX, XFS supports approximately 20 special fcntl interfaces used for space pre-allocation,
extent retrieval, extended file information, etc. In addition, IRIX XFS has about a dozen
special system call interfaces, all implemented via the special syssgi system call.
These interfaces are used for operations such as growing the file system or retrieving internal
file system information.
</para>

<para>
The Linux file system interface has no fcntl operation. The only supported fcntl calls on Linux
are file locking calls. We proposed to the Linux community that an fcntl file operation be added.
After extensive discussion, it was decided to use the existing ioctl operation, linvfs_ioctl,
and all of the fcntl and syssgi usages have been converted into ioctls. A shortcoming of the
ioctl approach lies in the semantics of an ioctl on block or character special devices which
reside within the file system: in these cases, the device driver's ioctl routine will be used
rather than the file system's. Outside of that, porting the fcntl and syssgi interfaces to
ioctls has been straightforward.
</para>
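<para>
The resulting entry point is essentially a dispatcher. The command names in
the sketch below are real XFS ioctls; the dispatch logic is heavily
simplified:
</para>
<programlisting><![CDATA[
/* Sketch of the ioctl dispatcher replacing fcntl/syssgi; simplified. */
static int
linvfs_ioctl(struct inode *inode, struct file *filp,
             unsigned int cmd, unsigned long arg)
{
        switch (cmd) {
        case XFS_IOC_RESVSP:          /* space pre-allocation (IRIX fcntl) */
        case XFS_IOC_GETBMAP:         /* extent retrieval (IRIX fcntl) */
        case XFS_IOC_FSGROWFSDATA:    /* grow file system (IRIX syssgi) */
                /* ... copy in arg, call into XFS proper ... */
                return 0;
        default:
                return -EINVAL;
        }
}
]]></programlisting>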

</section>
<section>
<title>
IRIX XFS creds and Linux
</title>

<para>
In older UNIX systems, the file system code used the current process's data structures to
determine the user's credentials such as uid, gid, capabilities, etc. The VFS/vnode interface
removed this code and introduced a cred structure which is passed to certain file system
operations such as create and lookup. The file system uses this information to determine
permissions and ownership.
</para>
<para>
XFS was written using the VOP/vnode interface, so it regularly uses cred structures. One of
the more prevalent cred usages in IRIX XFS is get_current_cred, which returns this structure
for the current process.
</para>
<para>
Linux is similar to older UNIX implementations in that file systems are expected to look
directly at the task structure to determine the current process's credentials. Linux does not
use a cred structure.
</para>
<para>
In porting XFS to Linux, we first attempted to map the various cred fields onto the
corresponding task fields. This had the undesired side effect of producing code that
used a cred pointer which was actually pointing at a task. This was determined
to be unacceptable.
</para>
<para>
We then considered implementing a complete cred infrastructure, which would include a pool
of active creds, cred setup, teardown, lookup, etc. It was determined that this would require
too much overhead.
</para>
<para>
In looking at the Linux code, we saw that all of the access/permission work occurs above
the file-system-dependent code, so having a cred is important only on creation. We then
examined our own internal usage of cred fields in XFS, and found that more often than not,
a cred was passed down through a VOP and never used. The few places that did use a
cred field were changed to use the current task structure in Linux.
</para>
<para>
Early versions of the XFS Linux port still passed a cred address on the VOPs, but we later changed
linvfs to always pass NULL in the cred arguments.
</para>
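<para>
The creation-time credential handling thus reduces to reading the current
task, as in this sketch (Linux 2.4 field names; the function itself is
illustrative):
</para>
<programlisting><![CDATA[
/* Sketch: credp is kept for interface compatibility but arrives as NULL;
 * the current Linux task supplies the ids at creation time. */
void example_ialloc_ids(xfs_inode_t *ip, cred_t *credp /* always NULL */)
{
        /* IRIX: ip->i_d.di_uid = credp->cr_uid; */
        ip->i_d.di_uid = current->fsuid;
        ip->i_d.di_gid = current->fsgid;
}
]]></programlisting>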
<para>
In addition to these cred changes, we have removed many access checks from the XFS code,
since these are now performed at a higher layer and are redundant in the file-system-dependent
code.
</para>
</section>
</section>

<section>
<title>
XFS Caching and I/O
</title>

<para>
When XFS was first implemented within IRIX, the buffer cache was enhanced in a number
of ways to better support XFS, both for better file I/O performance and for better
journaling performance. The IRIX implementation of XFS depends on this buffer cache
functionality for several key facilities.
</para>
<para>
First, the buffer cache allows XFS to store file data which has been written by an
application without first allocating space on disk. The routines which flush delayed
writes are prepared to call back into XFS, when necessary, to get XFS to assign disk
addresses for such blocks when it is time to flush the blocks to disk. Since delayed
allocation means that XFS can determine if a large number of blocks have been written
before it allocates space, XFS is able to allocate large extents for large files,
without having to reallocate or fragment storage when writing small files. This facility
allows XFS to optimize transfer sizes for writes, so that writes can proceed at close
to the maximum speed of the disk, even if the application does its write operations
in small blocks. In addition, if a file is removed and its written data is still in
delayed allocation extents, the data can be discarded without ever allocating disk space.
</para>
<para>
Second, the buffer cache provides a reservation scheme, so that blocks with delayed
allocation will not result in deadlock. If too much of the available memory were used for
delayed allocation, a memory deadlock could occur when converting delayed allocations to
real ones, since the conversion requires metadata reads and writes which need available memory.
</para>
<para>
Third, the buffer cache and the interface to disk drivers support the use of a single
buffer object to refer to as much as an entire disk extent, even if the extent is
very large and the buffered pages in memory are not contiguous. This is important
for high performance, since allocating, initializing, and processing a control block
for each disk block in, for example, a 7 MB HDTV video frame, would represent a large
amount of processor overhead, particularly when one considers the cost of cache misses
on modern processors. XFS has been able to deliver 7 GB/second from a single file on an
SGI Origin 2000 system, so the overhead of processing millions of control blocks per
second is of practical significance.
</para>
<para>
Fourth, the buffer cache supports "pinning" buffered storage in memory, which means
that the affected buffers will not be written to disk until they have been "unpinned".
XFS uses a write-ahead log protocol for metadata writes, which means XFS writes a log entry
containing the desired after-image before updating the actual on-disk metadata. On recovery,
XFS just applies after-images from the log (in case some of the metadata writes were not
completed). In order to avoid having to force the log before updating metadata, XFS "pins"
modified metadata pages in memory. Such pages must count against the memory reservation
(just as do delayed allocation pages). XFS pins a metadata page before updating it, logs
the updates, and then unpins the page when the relevant log entries have been written to
disk. Since the log is usually written lazily, this in effect provides group commit
of metadata updates.
</para>

<section>
<title>
The pagebuf Module
</title>

<para>
Our approach to porting XFS has included adding pagebuf, a layered buffer cache module on
top of the Linux page cache. This allows XFS to act on extent-sized aggregates. Key to this
approach is the pagebuf structure, which is the major structure of the pagebuf layer. The
pagebuf objects implemented by this module include a list of physical pages associated with
the pagebuf, plus the device information needed to perform I/O.
</para>
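<para>
A pagebuf object therefore looks roughly like the sketch below; this is a
representative subset of fields with illustrative names, not the exact
declaration:
</para>
<programlisting><![CDATA[
/* Sketch of the pagebuf object; a representative subset of fields. */
typedef struct page_buf_s {
        struct list_head  pb_list;          /* delayed-write queue linkage */
        loff_t            pb_file_offset;   /* offset in file or device */
        size_t            pb_buffer_length; /* total size of the aggregate */
        kdev_t            pb_target;        /* device to perform I/O against */
        struct page     **pb_pages;         /* physical pages backing it */
        unsigned int      pb_page_count;    /* number of entries in pb_pages */
        unsigned short    pb_flags;         /* locked, dirty, pinned, ... */
} page_buf_t;
]]></programlisting>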
<para>
In earlier versions of XFS for Linux we experimented with a new device request interface,
so that we could queue one of these pagebuf objects directly to a device, rather than having
to create and queue a large number of single-block buffer_head objects for each logical
I/O request. These extensions have been superseded by the Linux 2.5 block layer rewrite,
which allows the submission of multi-page bio requests; current XFS versions for Linux 2.4
create and queue buffer_head objects to perform pagebuf I/O.
</para>
<para>
A key goal for the layered buffer cache module is that its objects be strictly temporary,
so that they are discarded when released by the file system, with all persistent data held
purely in the page cache. This avoids creating yet another class of permanent system object,
with separate locking and resource management issues. The IRIX buffer cache implementation
has about 11,000 lines of very complex code. By relying purely on the page cache for buffering,
we avoid most of the complexity, particularly in regard to locking and resource management,
of hybrid page and buffer caches, at the cost of having to pay careful attention to efficient
algorithms for assembling large buffers from pages.
</para>
</section>
<section>
<title>
Delayed Allocation of Disk Space for Cached Writes
</title>

<para>
Allocating space when appending to a file slows down writes, since reliable metadata updates
(to record extent allocations) result in extra writes. Also, incremental allocations can
produce too-small extents, if new extents are allocated each time a small amount of data is
appended to a file (as when many processes append to a log file). Delayed allocation reserves
disk space for a write but does not allocate any particular space; it simply buffers the write
in the page cache. Later, when pages are flushed to disk, the page writing path must ask the file
system to do the actual allocation. Also, to allow for optimal extent allocations and optimal write
performance, the page writing path must collect adjacent dirty pages ("cluster" them) and write
them as a unit.
</para>
<para>
Since allocation of disk space may be required in the page writing path when delayed allocation
is present, and such allocation may require the use of temporary storage for metadata I/O operations,
some provision must be made to avoid memory deadlocks. The delayed allocation path for writes must
make use of a main memory reservation system, which will limit the aggregate amount of memory used
for dirty pages for which disk space has not been allocated, so that there will always be some
minimum amount of space free to allow allocations to proceed. Any other non-preemptible memory
allocations, such as kernel working storage pages, must be counted against the reservation limit,
so that the remaining memory is genuinely available.
</para>
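<para>
Putting the two requirements together, the write-side flow can be sketched
as follows; every function name here is hypothetical, standing in for the
combination of file system and page cache code described above:
</para>
<programlisting><![CDATA[
/* Hypothetical sketch of a delayed-allocation buffered write. */
int delalloc_write(struct inode *inode, loff_t pos, size_t len)
{
        /* 1. Reserve file system blocks without picking an extent yet. */
        if (!fs_reserve_blocks(inode, blocks_needed(len)))
                return -ENOSPC;

        /* 2. Count the dirty pages against the main memory reservation,
         *    so the later flush can always obtain working memory. */
        if (!memory_reserve(pages_needed(len)))
                return -ENOMEM;

        /* 3. Buffer the data in the page cache, marked "delalloc".
         *    Real disk blocks are chosen in the page writing path,
         *    where adjacent dirty pages are clustered into one large,
         *    contiguous extent. */
        mark_pages_delalloc(inode, pos, len);
        return 0;
}
]]></programlisting>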
</section>

<section>
<title>
File I/O
</title>
<para>
Early versions of XFS for Linux used an I/O path different from that of the normal Linux filesystems,
basically a stripped-down version of the IRIX XFS I/O code sitting on top of the pagebuf
layer. In the XFS/Linux 1.2 release the buffered I/O path was completely rewritten
to use the generic Linux I/O path as much as possible while still providing XFS-unique features
such as delayed allocated writes and clustered writeout.
</para>
<para>
XFS now uses the generic read/write entry points for pagecache-based filesystems
(generic_file_read/generic_file_write) but wraps them with XFS-specific functionality
such as dealing with DMAPI callbacks and XFS-specific locking. This means all the hard work
for file I/O is done by the address_space operations, where XFS again uses the generic
versions for the readpage (read pages into memory), prepare_write, and commit_write (copy
file data into the pagecache and mark it for flushing) operations. The writepage
operation, which is responsible for flushing pagecache data to disk, is the heart of the
XFS I/O path and completely different from the generic code, as it handles delayed allocated
writes and clustered writeout.
</para>

<para>
A problem in Linux 2.4 is that the buffer layer directly writes out the buffers on
its LRU list to disk without any filesystem interaction, which makes the delayed disk
block allocation and clustered writeout features of XFS impossible.
</para>
<para>
To address this issue the Linux buffer layer has been modified to call back into the
filesystem's writepage method for a buffer marked for delayed allocation instead of directly
writing it out. These changes are localized to one file (fs/buffer.c) and provide XFS-compatible
semantics with minimal intrusion.
</para>
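<para>
In outline, the modified flush path tests for such buffers and bounces them
back to the file system; a minimal sketch (2.4-era interfaces, details
simplified):
</para>
<programlisting><![CDATA[
/* Sketch of the fs/buffer.c change: delalloc buffers go back to the
 * file system instead of being written directly. */
static void flush_dirty_buffer(struct buffer_head *bh)
{
        struct page *page = bh->b_page;

        if (buffer_delay(bh)) {
                /* Let the file system allocate real blocks and
                 * cluster adjacent dirty pages. */
                page->mapping->a_ops->writepage(page);
        } else {
                /* Ordinary buffer: write it out directly. */
                ll_rw_block(WRITE, 1, &bh);
        }
}
]]></programlisting>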
<para>
Linux 2.5 already performs all pagecache writeout through the filesystem's writepage
(or writepages) entry points, so no modification was necessary.
</para>
</section>
<section>
<title>
Direct I/O
</title>

<para>
Small files which are frequently referenced are best kept in cache. Huge files, such as image and
streaming media files and scientific data files, are best not cached, since blocks will always be
replaced before being reused. Direct I/O is raw I/O for files: I/O directly to or from user buffers,
with no data copying. The page cache must cooperate with direct I/O, so that any pages which are
cached and have been modified are read from memory, and so that writes update cached pages.
</para>
<para>
Direct I/O and raw I/O avoid copying, by addressing user pages directly. The application promises
not to change the buffer during a write. The physical pages are locked in place for the duration of
the I/O, via Linux kernel methods (kiobufs in 2.4, get_user_pages in 2.5).
</para>
<para>
Any dirty pages in the page cache must be flushed to disk before issuing direct I/O. The normal
case will find no pages in the cache, and this can be efficiently tested by checking the inode.
Once the pagebuf is assembled, the I/O path is largely common with the normal file I/O path,
except that neither the write nor the allocation is ever delayed.
</para>
<para>
Direct I/O is indicated at open() time by using the O_DIRECT flag. Usually the needed space
for the file is pre-allocated using an XFS ioctl call to ensure maximum performance.
</para>
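<para>
From user space the combination looks like the following sketch; the ioctl
shown (XFS_IOC_RESVSP64 with an xfs_flock64 argument) is the XFS space
reservation call, though the exact header providing it may vary between
installations:
</para>
<programlisting><![CDATA[
#define _GNU_SOURCE           /* for O_DIRECT */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <xfs/xfs.h>          /* XFS_IOC_RESVSP64, xfs_flock64_t */

/* Open a file for direct I/O and pre-allocate space for it. */
int open_direct(const char *path, off_t size)
{
        int fd = open(path, O_RDWR | O_CREAT | O_DIRECT, 0644);
        xfs_flock64_t fl = { 0 };

        fl.l_whence = SEEK_SET;      /* reserve from the start of the file */
        fl.l_start = 0;
        fl.l_len = size;             /* number of bytes to pre-allocate */
        ioctl(fd, XFS_IOC_RESVSP64, &fl);
        return fd;
}
]]></programlisting>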
<para>
Unlike other Linux filesystems, XFS allows multiple O_DIRECT writes to the same inode to proceed
in parallel.
</para>
</section>
</section>

<section>
<title>
Volume Management Layers
</title>

<para>
The integration of existing Linux volume managers with the XFS file system has created some
issues for the XFS port to Linux.
</para>
<para>
Traditional Linux file systems have been written to account for the requirements of the block
device interface, ll_rw_block(). ll_rw_block accepts a list of fixed-size I/O requests. For
any given block device on a system, the basic unit of I/O operation is set when the device is
opened. This size is then a fixed length of I/O for that device. The current implementations
of Linux volume managers have keyed off this fixed-size I/O and utilize an I/O dispatcher algorithm.
</para>
<para>
By using a fixed I/O length, the amount of "math" that is needed is significantly less than what
it would be if the I/O length were not fixed. All I/O requests from a file system will be of the
same size, as both metadata and user data are of fixed size. Therefore, all underlying devices of
a logical volume must accept I/O requests of the same size. All that the volume manager needs to
do for any I/O request is to determine which device in the logical volume the I/O should go to
and recalculate the start block on the new device. Each I/O request is directed wholly to a new
device.
</para>
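<para>
The arithmetic really is minimal. Here is a sketch of the dispatch for a
simple striped volume, with illustrative names and one-block fixed-size
requests assumed:
</para>
<programlisting><![CDATA[
#include <stdint.h>

struct stripe_map { int device; uint64_t block; };

/* Map a logical block onto (disk, disk-relative block) for a volume
 * striped across ndisks with stripe_blocks blocks per stripe unit. */
struct stripe_map map_block(uint64_t logical_block,
                            uint64_t stripe_blocks, int ndisks)
{
        uint64_t stripe = logical_block / stripe_blocks;
        struct stripe_map m;

        m.device = (int)(stripe % ndisks);            /* which disk */
        m.block  = (stripe / ndisks) * stripe_blocks  /* start of unit */
                 + logical_block % stripe_blocks;     /* offset inside */
        return m;
}
]]></programlisting>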
<para>
The XFS file system, however, does not assume fixed-size I/O. In an XFS file system, metadata
can be anywhere from 512 bytes to over 8 Kbytes. The basic minimum I/O size for user data is
set at file system creation time, with a typical installation using 4 Kbytes. One of the XFS
design goals was to aggregate I/O together, creating large sequential I/O.
</para>
<para>
This feature of XFS created a problem for Linux volume managers, since the XFS file system
can hand an I/O request off to a block device driver specifying the start position and
length, which is not always fixed. A logical volume manager is just another block device
to XFS, and a logical volume manager working in conjunction with XFS needs to be able to
handle whatever size I/O request XFS desires, to some reasonable limit.
</para>
<para>
One of the options to address this problem in XFS was to change the on-disk format of the
file system to use a fixed size. This would render the Linux version of XFS incompatible
with the current IRIX implementations, however, and so it was deemed unacceptable, just
as making different versions of NFS would be unacceptable.
</para>
<para>
The Linux 2.4 version of XFS works around the problem of variable I/O request sizes by
opening the device with the minimum I/O size needed, 512 bytes, and performing operations
in multiples of this size, unless the underlying device is in a blacklist of
volume managers that cannot handle such I/O requests.
</para>
<para>
In Linux 2.5 the new block layer interface allows the submission of variable-sized requests,
and the burden of splitting them up falls to the actual volume managers.
</para>
</section>

<section>
<title>
Moving XFS to Open Source
</title>

<para>
For XFS to be a viable alternative file system for the open source community, it was deemed essential
that XFS be released with a license at least compatible with the GNU General Public License (GPL).
</para>
<para>
The IRIX operating system in which XFS was originally developed has evolved over a long period of
time, and includes assorted code bases with a variety of associated third-party license agreements.
For the most part these agreements are in conflict with the terms and conditions of the GNU General
Public License.
</para>
<para>
The initial XFS project was an SGI initiative that started with a top-to-bottom file system design
rather than an extension of an existing file system. Based upon the assertions of the original
developers and the unique features of XFS, there was a priori a low probability of overlap between
the XFS code and the portions of IRIX to which third-party licenses might apply. However, it was
still necessary to establish that the XFS source code to be open sourced was free of all
encumbrances, including any associated with terms and conditions of third-party licenses applying
to parts of IRIX.
</para>
<para>
SGI's objectives were:
</para>

<itemizedlist>
<listitem><para>
to ensure the absence of any intellectual property infringements
</para></listitem>
<listitem><para>
to establish the likely derivation history to ensure the absence of any code
subject to third-party terms and conditions
</para></listitem>
</itemizedlist>


<para>
This was a major undertaking, as the initial release of buildable XFS open source
contained some 400 files and 199,000 lines of source. The process was long, but
relatively straightforward, and encumbrance relief was usually achieved by removal of code.
</para>
<para>
The encumbrance review was a combined effort of SGI's Legal and Engineering
organizations. The comments here will be confined to the technical issues and
techniques used by the engineers.
</para>

<section>
<title>
The Encumbrance Review Process
</title>

<para>
We were faced with making comparisons across several large code bases, and in particular
UNIX System V Release 4.2-MP, BSD4.3 NET/2, BSD4.4-lite and the open source version of
XFS. We performed the following tasks:
</para>

<itemizedlist>
<listitem><para>
Historical survey
</para>
<para>
We contacted as many of the original XFS developers and subsequent
significant maintainers as possible, and asked a series of questions. This information was most
useful as guideposts or to corroborate conclusions from the other parts of the review.
</para></listitem>
<listitem><para>
Keyword search (all case insensitive)
</para>
<para>
In each of the non-XFS code bases, we searched for keywords associated with unique XFS
concepts or technologies (e.g. journal, transaction, etc.). In the XFS code base, we
searched for keywords associated with ownership, concepts and technologies in the
non-XFS code bases (e.g. att, berkeley, etc.).
</para></listitem>
<listitem><para>
Literal copy check
</para>
<para>
Using a specially built tool, we compared every line of each XFS source file against
all of the source in the non-XFS code bases. The comparison ignored white space, and
filtered out some commonly occurring strings (e.g. matching "i++;" is never going to be helpful).
</para></listitem>
<listitem><para>
Symbol matching
</para>
<para>
We developed tools to post-process the ASCII format databases from cscope to generate
lists of symbols and their associated generic type (function, global identifier, macro,
struct, union, enum, struct/union/enum member, typedef, etc.). In each XFS source file
the symbols were extracted and compared against all symbols found in all the non-XFS
code bases. A match occurred when the same symbol name and type were found in two
different source files. Some post-processing of the symbols was done to include
plausible name transformations, e.g. adding an "xfs_" prefix, removal of all underscores, etc.
</para></listitem>
<listitem><para>
Prototype matching
</para>
<para>
Starting with a variant of the mkproto tool, we scanned the source code to extract
ANSI C prototypes. Based on some equivalence classes, "similar" types were mapped to a
smaller number of base types, and then the prototypes were compared. A match occurred when the
type of the function and the number and type of the arguments agreed.
</para></listitem>
<listitem><para>
Check for similarity of function, design, concept or implementation
</para>
<para>
This process is based upon an understanding, and a study, of the source code. In the XFS
code, for each source file, or feature implemented in a source file, or group of source
files implementing a feature, it was necessary to conduct a review of the implementation
of any similar source file or feature in each of the non-XFS code bases. The objective of
this review was to determine whether an issue of potential encumbrance arises as a consequence
of similarity in function, algorithms, source code structure, etc.
</para></listitem>
<listitem><para>
Check for evidence of license agreements
</para>
<para>
We examined the XFS code (especially the comments) to identify any references to relevant
copyrights or license agreements.
</para></listitem>
</itemizedlist>

<para>
In all of the steps above, the outcome was a list of possible matches. For each match,
it was necessary to establish, in the context of the matches (in one or more files),
whether there was a real encumbrance issue.
We used a modified version of the tkdiff tool to graphically highlight the areas of a
"match" without the visual confusion of all of the minutiae of the line-by-line differences.
However, the classification of the matches was ultimately a manual process, based on
the professional and technical skills of the engineers.
</para>

</section>
<section>
<title>
Encumbrance Relief
</title>

<para>
Especially in view of the size of the XFS source, a very small number of real encumbrance
issues were identified. In all cases the relief was relatively straightforward, with removal
of code required for IRIX, but not for Linux, being the most common technique.
</para>
</section>
</section>

</article>