<corpauthor>This work was sponsored by the Defense Advanced Research Projects Agency of the United States Government (DARPA). DARPA does not endorse this work; it merely sponsors it.</corpauthor>
</articleinfo>
<section>
<title>Abstract</title>
<para>
I will discuss the performance aspects of the Reiser4 design that allow us
both to 1) implement filesystem
operations as <emphasis>atomic</emphasis> operations, and 2) increase overall filesystem performance compared to
ReiserFS V3. I will also suggest that fully atomic filesystems keep
your data more secure than journaling filesystems, and make it easier for application writers to avoid certain types of security holes.</para>
</section>
<section>
<title>Reducing The Damage of Crashing</title>
<para>
When a computer crashes, data in RAM that has not yet reached disk is
lost. You might at first be tempted to think that we would then want to keep all of
the data that <emphasis>did</emphasis> reach disk.</para>
<para>
Suppose that you were performing a transfer of $10 from bank account A to bank account B, and that this transfer consisted of two operations:
1) debit $10 from A, and 2) credit $10 to B.
</para>
<para>
Suppose that 1) but not 2) reached disk and the computer crashed. It would be
better to disregard 1) than to let 1) but not 2) take effect, yes?</para>
<para>
When there is a set of operations for which we ensure that either all take effect or none
take effect, we call the set as a whole an <emphasis>atom</emphasis>. Reiser4
implements all of its filesystem system calls (requests to the kernel to do
something are called <emphasis>system calls</emphasis>) as fully atomic operations, and
allows one to define new atomic operations using its plugin infrastructure.
Why don't all filesystems do this? Performance. </para>
<para>Reiser4 employs new
algorithms that allow it to make these operations atomic at little additional
cost, where other
filesystems have paid a heavy, usually prohibitive, price to do so. We hope
to share with you how that is done.</para>
</section>
<section>
<title>A Brief History Of How Filesystems Have Handled Crashes</title>
<section>
<title>Filesystem Checkers</title>
<para> Originally filesystems had filesystem checkers that would run
after every crash. The problem with this is that 1) the checkers cannot
handle every form of damage well, and 2) the checkers run for a
long time. The amount of data stored on hard drives increased faster
than the transfer rate (the rate at which hard drives transfer their
data from the platter spinning inside them into the computer's RAM
when they are asked to do one large continuous read, or the rate in
the other direction for writes), which meant that the checkers took
longer and longer to run, and as the decades ticked by it became less and less
reasonable for a mission-critical server to wait for the checker.
</para>
</section>
<section><title>Fixed Location Journaling</title>
<para>The solution that was adopted was to first write each atomic
operation to a location on disk called the
<emphasis>journal</emphasis> or <emphasis>log</emphasis>, and then,
only after the atom had fully reached the journal, to write it to the
<emphasis>committed area</emphasis> of the filesystem (in describing
ReiserFS V3 we used to call this the real area, but nobody much liked
that term....).</para>
<para>The problem with this is that twice as much
data needs to be written. On the one hand, if the workload is dominated by seeks, this is not
as much of a burden as one might think. On the other hand, for writes of large files, it halves
performance because such writes are usually transfer time dominated.
</para>
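<para>The write-twice flow is easiest to see in code. The following is a toy
model of fixed location journaling, not the ReiserFS V3 journal code; the
block sizes, array layout, and names are all invented for illustration.</para>
<programlisting><![CDATA[
/* A toy model of fixed-location journaling, to make the "write twice"
 * cost concrete.  Names and layout are invented for illustration. */
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE   8
#define JOURNAL_LEN  4
#define DISK_BLOCKS 16

static char disk[DISK_BLOCKS][BLOCK_SIZE];    /* committed area */
static char journal[JOURNAL_LEN][BLOCK_SIZE];
static int  journal_target[JOURNAL_LEN];      /* home block of each journal slot */

/* Commit an atom of n modified blocks: every block is written twice,
 * once into the journal area and once into its home location. */
static void commit_atom(const int *blocks, char data[][BLOCK_SIZE], int n)
{
        /* Phase 1: copy the whole atom into the journal area. */
        for (int i = 0; i < n; i++) {
                memcpy(journal[i], data[i], BLOCK_SIZE);
                journal_target[i] = blocks[i];
        }
        /* (A real filesystem would issue a barrier/flush here so that the
         * journal is durable before phase 2 begins.) */

        /* Phase 2: write the same blocks again, to the committed area. */
        for (int i = 0; i < n; i++)
                memcpy(disk[blocks[i]], data[i], BLOCK_SIZE);
}

int main(void)
{
        int blocks[2] = { 3, 7 };
        char data[2][BLOCK_SIZE] = { "debit", "credit" };
        commit_atom(blocks, data, 2);
        printf("block 3: %s, block 7: %s\n", disk[3], disk[7]);
        return 0;
}
]]></programlisting>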
<para>For this reason, meta-data journaling came to dominate general
purpose usage. With meta-data journaling, the filesystem guarantees
that all of its operations on its meta-data will be done atomically.
If a file is being written to, the data being written may
be corrupted as a result of non-atomic data operations, but the
filesystem's internals will all remain consistent. The performance
advantage was substantial. V3 of ReiserFS offers both meta-data and
data journaling, and defaults to meta-data journaling because that is
the right solution for most users. Oddly enough, meta-data journaling
is much more work to implement because it requires being precise about
what needs to be journaled. As is so often the case in programming,
doing less work requires more code.
</para>
<para>
In sum, with fixed location data journaling, the overhead of making
each operation atomic is too high for it to be appropriate for average
applications that don't especially need it --- because of the cost of
writing twice. Applications that do need atomicity are written to use
fsync and rename to accomplish it, and these tools are simply
terrible for that job. Terrible in performance, and terrible in the
ugliness they add to the coding of applications. Stuffing a
transaction into a single file just because you need the transaction
to be atomic is hardly what one would call flexible semantics. Also,
data journaling, with all its performance cost, still does not
necessarily guarantee that every system call is fully atomic, much
less that one can construct sets of operations that are fully atomic.
It usually merely guarantees that the files will not contain random
garbage, however many blocks of them happen to get written, and
however much the application might view the result as inconsistent
data. In providing these atomicity guarantees, I hope you understand
that we are trying to set a new expectation for how secure a
filesystem should keep your data. </para>
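<para>To make concrete what we mean by using fsync and rename for atomicity,
here is a minimal user-space sketch of that idiom. Only the POSIX calls are
real; the helper function and file names are invented for illustration.</para>
<programlisting><![CDATA[
/*
 * A minimal sketch of the fsync-and-rename idiom that applications use
 * to get an atomic file update on a conventional journaling filesystem.
 * Only the POSIX calls are real; the helper and file names are invented.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int atomic_replace(const char *path, const char *data, size_t len)
{
        char tmp[4096];
        snprintf(tmp, sizeof(tmp), "%s.tmp", path);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
                return -1;

        if (write(fd, data, len) != (ssize_t)len)    /* write the new contents */
                goto fail;
        if (fsync(fd) != 0)                          /* force them to disk */
                goto fail;
        if (close(fd) != 0) {
                unlink(tmp);
                return -1;
        }

        /*
         * rename() replaces the name atomically: readers see either the
         * old file or the complete new one, never a mixture of the two.
         */
        return rename(tmp, path);

fail:
        close(fd);
        unlink(tmp);
        return -1;
}

int main(void)
{
        const char msg[] = "balance: 90\n";
        return atomic_replace("account", msg, sizeof(msg) - 1) ? 1 : 0;
}
]]></programlisting>
<para>Note how much machinery the application must carry just to get one
atomic update of one file; that is the awkwardness complained about
above.</para>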
</section></section>
<section>
<title>Wandering Logs</title>
<para>
One way to avoid having to write the data twice is to change one's
definition of where the log area and the committed area are, instead of
moving the data from the log to the committed area.
</para>
<para>
There is an annoying complication to this though, in that there are
probably a number of pointers to the data from the rest of the
filesystem, and we need them to point to the new data. When the commit occurs, we need to write those pointers
so that they point to the data we are committing. Fortunately, these
pointers tend to be highly concentrated as a result of our tree
design. But wait, if we are going to update those pointers, then we
want to commit those pointers atomically also, which we could do if we
write them to another location and update the pointers to them,
and.... up the tree the changes ripple. When we get to the top of the
tree, since disk drives write sectors atomically, the block number of
the top can be written atomically into the superblock by the disk,
thereby committing everything the new top points to. This is indeed
the way WAFL, the Write Anywhere File Layout filesystem invented by
Dave Hitz at Network Appliance, works. It always ripples changes all the way to
the top, and indeed that works rather well in practice, and most of
their users are quite happy with its performance.</para>
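<para>The sketch below illustrates the ripple-to-the-top idea. The structures
and function names are invented; only the shape of the algorithm (give each
dirtied node a new location, which dirties its parent in turn, and finish with
one sector-sized superblock update) is taken from the description above.</para>
<programlisting><![CDATA[
/* A hedged sketch of the copy-on-write "ripple to the top" described
 * above.  Every structure and function name here is invented for
 * illustration. */
#include <stdio.h>

struct node {
        struct node *parent;
        unsigned long blocknr;       /* current on-disk location */
};

static unsigned long next_free_block = 1000;
static unsigned long superblock_root;        /* one sector, written atomically */

/* Dirty a node: give it a new location and ripple the change upward. */
static void cow_dirty(struct node *n)
{
        while (n) {
                n->blocknr = next_free_block++;   /* write-anywhere: new copy */
                n = n->parent;                    /* the parent must now point
                                                     to it, so it is dirtied too */
        }
}

/* Commit: a single sector-sized write of the new root location. */
static void commit(struct node *root)
{
        superblock_root = root->blocknr;
}

int main(void)
{
        struct node root = { NULL, 1 }, twig = { &root, 2 }, leaf = { &twig, 3 };
        cow_dirty(&leaf);
        commit(&root);
        printf("root now at block %lu\n", superblock_root);
        return 0;
}
]]></programlisting>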
<section>
<title>Writing Twice May Be Optimal Sometimes</title>
<para>
Suppose that a file is currently well laid out, and you write to a single block in the
middle of it, and you then expect to do many reads of the file. That is an
extreme case illustrating that sometimes it is worth writing twice so that a
block can keep its current location. If one writes a node twice, over its
current location, one does not need to update its parent, and so no pointer
updates need to ripple to the top of the tree.
Our code is a toolkit that can be used to implement different layout
policies, and one of the available choices is whether to write over a block in
its current place, or to relocate it to somewhere else. I don't think there is
one right answer for all usage patterns.</para>
<para>
If a block is adjacent to many other dirty blocks in the tree, the
cost to read performance of relocating it and its neighbors becomes
less significant. If one knows that a repacker will
run once a week (a repacker is expected for V4.1, and is (a bit oddly)
absent from WAFL), this also decreases the cost of relocation. After
a few years of experimentation, measurement, and user feedback, we
will say more about our experiences in constructing user-selectable
policies.</para>
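<para>A hedged sketch of this kind of policy decision might look as follows.
The threshold and the field names are invented for illustration; the real
Reiser4 policy is pluggable and considerably more involved.</para>
<programlisting><![CDATA[
/* A hedged sketch of a relocate-vs-overwrite policy decision.  The
 * threshold and the structure are invented; only the considerations
 * (dirty neighbors, repacker, keeping a good existing layout) come
 * from the discussion above. */
#include <stdbool.h>
#include <stdio.h>

struct block_info {
        bool well_placed;        /* already sits where reads want it */
        int  dirty_neighbors;    /* adjacent dirty blocks in the tree */
        bool repacker_enabled;   /* a periodic repacker will fix layout later */
};

static bool should_relocate(const struct block_info *b)
{
        /* Many dirty neighbors: the whole slum moves together, so the
         * read-performance cost of leaving the old location is small. */
        if (b->dirty_neighbors >= 8)
                return true;
        /* A repacker will restore good layout eventually, so relocation
         * is cheaper than paying the write-twice cost now. */
        if (b->repacker_enabled && !b->well_placed)
                return true;
        /* A lone dirty block in a well laid out file: keep its location
         * and accept writing it twice. */
        return !b->well_placed;
}

int main(void)
{
        struct block_info lone = { true, 0, false };
        struct block_info slum = { true, 32, false };
        printf("lone block: %s\n", should_relocate(&lone) ? "relocate" : "overwrite");
        printf("slum block: %s\n", should_relocate(&slum) ? "relocate" : "overwrite");
        return 0;
}
]]></programlisting>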
<para>
Do we pay a performance penalty for making Reiser4 atomic? Yes, we
do. Is it an acceptable penalty? This is hard to measure, because we
picked up a lot more performance from other improvements in Reiser4
than we lost to atomicity, but I am fairly confident that the answer
is yes. If changes are either large or batched together with enough
other changes to become large, the performance penalty is low and
drowned out by other performance improvements. Scattered small
changes threaten us with read performance losses compared to
overwriting in place and taking our chances with the data's
consistency if there is a crash, but use of a repacker will mostly
alleviate this scenario. I have to say that in my heart I don't have
any serious doubts that for the general-purpose user the increase in
data security is worthwhile. The users, though, will have the final
say.</para>
</section>
<section>
<title>Committing</title>
<para>A transaction preserves the previous contents of all modified
blocks in their original locations on disk until the transaction
commits; commit means the transaction has reached a state
where it will be completed even if there is a crash. </para>
<para>
The dirty blocks of an atom (which were captured and subsequently
modified) are divided into two sets, <emphasis>relocate</emphasis> and
<emphasis>overwrite</emphasis>, each of which is preserved in a different
manner. </para>
<para>The <emphasis>relocatable</emphasis> set is the set of blocks that have a dirty
parent in the atom. </para>
<para>The <emphasis>relocate</emphasis> set is those members of the relocatable set
that will be written to a new or first location rather than
overwritten. </para>
<para>The <emphasis>overwrite</emphasis> set contains all dirty blocks in the atom that need to
be written to their original locations, which is all those not in the
relocate set. In practice this is those that do not have a parent we
want to dirty, plus those for which overwrite is the better layout
policy despite the write-twice cost. Note that the superblock is the parent of the root node
and the free space bitmap blocks have no parent. By these definitions,
the superblock and modified bitmap blocks are always part of the
overwrite set.</para>
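<para>Restated as code, the classification above might look like the sketch
below. The structure and field names are invented; only the rule itself comes
from the definitions in the text.</para>
<programlisting><![CDATA[
/* A sketch of the set definitions above, in code form.  The structure
 * is invented for illustration; only the classification rule (dirty
 * parent and relocation chosen => relocate, otherwise overwrite) is
 * taken from the text. */
#include <stdbool.h>
#include <stdio.h>

struct blk {
        bool dirty;
        bool parent_dirty;     /* superblock/bitmap blocks: treated as false */
        bool policy_relocate;  /* layout policy chose a new location */
};

enum set { RELOCATE, OVERWRITE };

static enum set classify(const struct blk *b)
{
        /* Only blocks whose parent is also dirty are even relocatable:
         * relocating them just changes a pointer we must rewrite anyway. */
        if (b->parent_dirty && b->policy_relocate)
                return RELOCATE;
        /* Everything else, including the superblock and modified bitmap
         * blocks (which have no dirty parent by definition), is preserved
         * via the overwrite/wander mechanism. */
        return OVERWRITE;
}

int main(void)
{
        struct blk leaf   = { true, true,  true  };
        struct blk bitmap = { true, false, false };
        printf("leaf: %s\n",   classify(&leaf)   == RELOCATE ? "relocate" : "overwrite");
        printf("bitmap: %s\n", classify(&bitmap) == RELOCATE ? "relocate" : "overwrite");
        return 0;
}
]]></programlisting>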
<para>The <emphasis>wandered set</emphasis> is the set of blocks to which the
overwrite set is temporarily written until the transaction commits.</para>
<para>An interesting definition is the <emphasis>minimum overwrite</emphasis> set,
which uses the same definitions as above with the following
modification. If at least two dirty blocks have a common parent that
is clean, then that parent is added to the minimum overwrite set. The
parent's dirty children are removed from the overwrite set and placed
in the relocate set. This policy is an example of what will be
experimented with in later versions of Reiser4 using the layout
toolkit. </para>
<para>For space reasons, we leave out the full details on exactly when we
relocate vs. overwrite, and the reader should not regret this because
years of experimenting probably lie ahead before we can speak with the
authority necessary for a published paper on the effects of the many
details and variations possible.</para>
<para>
When we commit, we write a wander list, which consists of a mapping of the wandered
set to the overwrite set. The wander list is a linked list of blocks
containing pairs of block numbers. The last act of committing a transaction is to
update the superblock to point to the front of that list. Once that is done,
if there is a crash, the crash recovery will go through that list and "play"
it, which means to write the wandered set over the overwrite set. If there is
no crash, we play it as well.</para>
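<para>The sketch below shows the shape of such a wander list and what it means
to "play" it. The structures are simplified and all names are invented for
illustration; the real on-disk layout differs.</para>
<programlisting><![CDATA[
/* A hedged sketch of a wander list: a linked list of blocks holding
 * (wandered, original) block-number pairs, replayed by copying each
 * wandered block over its original location. */
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 16
#define NBLOCKS    64

static char disk[NBLOCKS][BLOCK_SIZE];

struct wander_entry {
        unsigned wandered;   /* temporary location the data was written to */
        unsigned original;   /* home location in the overwrite set */
};

struct wander_block {
        struct wander_block *next;   /* on disk this would be a block number */
        unsigned nr_entries;
        struct wander_entry entry[4];
};

/* "Playing" the list: done after commit, and again during crash recovery. */
static void play_wander_list(const struct wander_block *wb)
{
        for (; wb; wb = wb->next)
                for (unsigned i = 0; i < wb->nr_entries; i++)
                        memcpy(disk[wb->entry[i].original],
                               disk[wb->entry[i].wandered],
                               BLOCK_SIZE);
}

int main(void)
{
        strcpy(disk[40], "new bitmap");           /* wandered copy */
        struct wander_block wb = { NULL, 1, { { 40, 5 } } };
        /* ...updating the superblock to point at the front of this list
         * would be the commit point... */
        play_wander_list(&wb);
        printf("block 5 now holds: %s\n", disk[5]);
        return 0;
}
]]></programlisting>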
<para>
There are many more details of how we handle the deallocation of
wandered blocks, the handling of bitmap blocks, and so forth. You are
encouraged to read the comments at the top of our source code files (e.g. wander.c) for such
details.... </para>
</section>
</section>
<section>
<title>Reiser4's Performance Design</title>
<section>
<title>Tree Design Concepts</title>
<section>
<title>Height Balancing versus Space Balancing</title>
<para> <emphasis>Height Balanced Trees</emphasis> are trees such that
each possible search path from root node to leaf node has exactly the
same length (<emphasis>Length</emphasis> = number of nodes traversed
from root node to leaf node). For instance, the height of the tree in
Figure 2 is four, while the height of the left-hand tree in Figure 1.3
is three, and that of the single node in Figure 1.0 is one. </para>
<para>The term <emphasis>balancing</emphasis> is used for several very
distinct purposes in the balanced tree literature. Two of the most
common are: to describe balancing the <emphasis>height</emphasis>, and
to describe balancing the <emphasis>space usage</emphasis> within the nodes of the
tree. These quite different definitions are unfortunately a classic
source of confusion for readers of the literature. </para>
<para>
Most algorithms for accomplishing height balancing do so
by only growing the tree at the top. Thus the tree never gets out of
height balance. </para>
<para>
[Figure omitted: a tree that was four levels until a BLOB was inserted
with a pointer from a leaf node; in this case the BLOB's blocks are all
contiguous.]</para>
<para>
Gray and Reuter say the criterion for searching external memory is
to "minimize the number of different pages along the average (or longest)
search path. ....by reducing the number of different pages for an arbitrary
search path, the probability of having to read a block from disk is reduced."
(1993, Transaction Processing: Concepts and Techniques, Morgan Kaufmann
Publishers, San Francisco, CA, p. 834 ...)</para>
<para>
My problem with this explanation of why the height balanced approach
is effective is that it does not convey that you can get away with
having a moderately unbalanced tree provided that you do not
significantly increase the total number of internal nodes. In
practice, most unbalanced trees do have significantly more
internal nodes: they suffer a
moderate increase in the cost of in-memory tree traversals, and an
immoderate increase in the amount of IO due to the increased number of
internal nodes. But if one were to put all the BLOBs together in the
same location in the tree, since the number of internal nodes would
not significantly increase, the performance penalty for having them on
a lower level of the tree than all other leaf nodes would not be a
significant additional IO cost. There would be a moderate increase in
that part of the tree traversal time cost which is dependent on RAM
speed, but this would not be so critical. Segregating BLOBs could
perhaps substantially recover the performance lost by architects not
noticing the drift in the definition of height balancing for trees.
It might be undesirable to segregate objects by their size rather than
just their semantics though. Perhaps someday someone will try it and
see what results. </para>
</section></section></section>
<section><title>Block Allocation</title>
<section><title>Parent First Pre-Order Is What We Try For</title>
<section><title>Version 3 Too Readily Adds Disk Seeks Between Nodes In Response To Changes In The Tree</title>
<para>In ReiserFS we attempt to assign keys for what we store in the tree
such that typical accesses reach objects in the order of
their keys. Since one generally has to access a parent before
accessing its children, this means accesses will be in parent-first
pre-order.</para>
<para>In the block allocation code we then attempt to assign block
numbers such that accesses to nodes in parent-first pre-order involve
accessing nodes in the order of their block numbers. </para>
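<para>The block number assignment we aim for is easy to state in code: walk
the tree parent first and hand out ascending block numbers. The sketch below
uses an invented in-memory tree purely for illustration; it is not the Reiser4
allocator.</para>
<programlisting><![CDATA[
/* A sketch of assigning block numbers in parent-first pre-order, so that
 * reading the tree in key order also reads the disk in ascending block
 * order.  The tree structure here is invented for illustration. */
#include <stdio.h>

struct tnode {
        const char   *name;
        struct tnode *child[4];
        int           nr_children;
        unsigned long blocknr;
};

static unsigned long next_blocknr;

static void allocate_preorder(struct tnode *n)
{
        n->blocknr = next_blocknr++;              /* parent first ... */
        for (int i = 0; i < n->nr_children; i++)
                allocate_preorder(n->child[i]);   /* ... then its children */
}

int main(void)
{
        struct tnode leaf1 = { "leaf1" }, leaf2 = { "leaf2" };
        struct tnode twig  = { "twig", { &leaf1, &leaf2 }, 2 };
        struct tnode root  = { "root", { &twig }, 1 };

        next_blocknr = 100;
        allocate_preorder(&root);
        printf("%s=%lu %s=%lu %s=%lu %s=%lu\n",
               root.name, root.blocknr, twig.name, twig.blocknr,
               leaf1.name, leaf1.blocknr, leaf2.name, leaf2.blocknr);
        return 0;
}
]]></programlisting>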
<para>This added layer of abstraction, in which we control layout by
key assignment policy and assign block numbers by tree order, is very
flexible in practice, and we find that it substantially improves our
ability to perform more experiments with layout alternatives, and to
adapt to particular application needs. </para>
<para>In Version 3, whenever we insert a node into the tree, we assign it
a block number at the time we insert it, placing it
as close as we can to its left neighbor in the tree. </para>
<para>One problem is that the tree can be very different in its structure
at the time we insert a given block compared to when we finish some
batch operation such as copying a directory and all of its files, or
even just copying the current file if it is a large file. The less
fanout there is, the more that shifts of items tend to lead to
structural changes in the tree that change what the optimal block
allocation is. The less accurate we are in allocating blocks, the more
seeks. </para>
<para>In Version 4, we adopt new techniques to reduce this problem: we
increase fanout by getting rid of BLOBs as described at
www.namesys.com/v4/v4.html, and we allocate block numbers at flush
time (when dirty nodes are written to disk) instead of at node
insertion time. </para>
<para>We determined, as part of the art of design compromise, that
placing twigs (the nodes on the level just above the leaves) before
their children sadly matters enough to
performance to merit doing it, but placing branches (the internal
nodes above the twigs) before their
children does not, because branches are more likely to be cached and
are relatively rare. Branches are therefore write-optimized rather
than read-optimized in their layout --- that means we write the
branches at the end of flushing an atom, and we try to write them all
together in a location likely not to require a seek away from what was
written just before them. In respect to allocating blocks for twigs,
I say sadly because the added complexity of allocating blocks for
twigs just before their children is immense when you consider that we
are also squeezing them, and squeezing and encrypting and compressing
their children, as we allocate blocks; and the effects of the squeezing
and allocating and encrypting can change the number of extents
required to store a file, which affects how much fits into a twig,
which affects which nodes are children of which twig.... We did it, but not easily, I
regret to say. If a future designer finds that he can neglect
read-optimizing twigs, perhaps as a result of a design improvement or
a different usage pattern, it will immensely simplify the code, and
this point is worth noting. </para>
</section></section>
<section><title>Dancing Trees</title>
<para> Traditional balanced tree algorithms apply a fixed criterion
for whether a node is tightly enough packed, or should be combined
with its neighbors. For example, some will check on each modification
whether the node modified plus its left and its right neighbor can be
squeezed into one fewer nodes. ReiserFS V3 does this on the leaf
level of the tree. The stricter this algorithm is, and the more
neighbors it considers, the tighter the packing is, and the more bytes
must be shifted on average per modification in order to make space for
that modification. That means more overhead in both CPU and
IO.</para>
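<para>As a concrete illustration, a fixed space-balancing criterion of the
kind just described might be coded as the sketch below; the capacity constant,
structure, and names are invented.</para>
<programlisting><![CDATA[
/* A sketch of a fixed space-balancing check: after a modification, see
 * whether the node plus its left and right neighbors fit into one fewer
 * nodes, in which case items would be shifted to merge them.  The
 * capacity constant and names are invented for illustration. */
#include <stdbool.h>
#include <stdio.h>

#define NODE_CAPACITY 4096   /* usable bytes per node, illustrative */

struct node_usage {
        int left_bytes;
        int this_bytes;
        int right_bytes;
};

static bool should_squeeze(const struct node_usage *u)
{
        /* Three nodes' worth of items fitting into two nodes means one
         * node can be freed -- at the cost of shifting bytes around on
         * every modification that makes this true. */
        return u->left_bytes + u->this_bytes + u->right_bytes
               <= 2 * NODE_CAPACITY;
}

int main(void)
{
        struct node_usage u = { 1500, 900, 1600 };
        printf("squeeze into two nodes: %s\n",
               should_squeeze(&u) ? "yes" : "no");
        return 0;
}
]]></programlisting>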
<para>
For this reason, some databases actually employ free-on-empty (a node
is freed only when it becomes completely empty) as their
static space usage balancing criterion.</para>
<para> "Garbage collection" algorithms are a different way of managing
space usage in memory. When space runs short, they run through memory
repacking it and reclaiming unused space, rather than repacking with
each usage or freeing of memory as they might otherwise have done.
This is like not driving to the recycler every time you finish a
bottle of milk. (ReiserFS is environment friendly --- we recycle
memory rather than discarding it.... )</para>
<para> With Reiser4, when we are short on space in RAM, the memory
manager asks us to write a dirty node to disk. We collect together
the maximal set of nodes that includes the node we were asked to flush,
whose members are contiguous with each other, and are all dirty. We call this set a
<emphasis>slum</emphasis>. We squeeze the slum into the minimum number
of nodes it will fit into. If the slum is, say, 64 nodes large
after squeezing, that will be much tighter than a fixed balancing
criterion tree would pack it because it will be at least 63/64ths
full, and most tree balancing algorithms examine only the immediate
neighbors when deciding how tight they can pack. (If the slum is one
node in size, we have several options we are experimenting with,
including the one we suspect will be best for "typical" use, which is
leaving it for the repacker to optimize well.) </para>
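<para>The sketch below shows slum collection and squeezing on a toy array of
leaf nodes; the names, the node capacity, and the numbers are invented for
illustration.</para>
<programlisting><![CDATA[
/* A hedged sketch of slum collection and squeezing.  The leaf array
 * stands in for neighbor pointers in the tree; all names and the byte
 * counts are invented for illustration. */
#include <stdbool.h>
#include <stdio.h>

#define NODE_CAPACITY 4096
#define NR_LEAVES     16

struct leaf {
        bool dirty;
        int  bytes_used;
};

static struct leaf leaves[NR_LEAVES];

/* Find the maximal run of contiguous dirty leaves containing 'start'. */
static void find_slum(int start, int *first, int *last)
{
        int lo = start, hi = start;
        while (lo > 0 && leaves[lo - 1].dirty)
                lo--;
        while (hi < NR_LEAVES - 1 && leaves[hi + 1].dirty)
                hi++;
        *first = lo;
        *last  = hi;
}

/* Squeeze the slum: how many nodes does its content actually need? */
static int squeeze(int first, int last)
{
        int total = 0;
        for (int i = first; i <= last; i++)
                total += leaves[i].bytes_used;
        return (total + NODE_CAPACITY - 1) / NODE_CAPACITY;   /* ceiling */
}

int main(void)
{
        for (int i = 4; i <= 9; i++) {
                leaves[i].dirty = true;
                leaves[i].bytes_used = 1800;    /* about 44% full each */
        }
        int first, last;
        find_slum(6, &first, &last);
        printf("slum [%d..%d]: %d dirty nodes squeeze into %d\n",
               first, last, last - first + 1, squeeze(first, last));
        return 0;
}
]]></programlisting>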
<para>The effect of this approach is that hot spots become relatively
loosely packed so that insertions are cheap, and slums that are no
longer hot spots get very tightly packed before being sent to
disk. </para>
<para>An area for possible future work is to allow for slum squeezing to
eliminate the need for performing IO if enough memory is freed by it.
This might require changes to the kernel memory manager though.</para>
<para> This resembles traditional garbage collection algorithms. So far
as we know though, applying this approach to trees is unique to
Reiser4, and we call a tree whose packing is managed in this way a
"dancing tree" rather than a "space usage balanced tree". (Remember,
there is a difference between height balancing and space usage
balancing, and Reiser4 trees are still (very) height balanced
trees....) </para>
<section>
<title>Repacker</title>
<para>Another way of escaping from the balancing time vs. space
efficiency tradeoff is to use a repacker. Roughly 80% of the files on a disk
remain unchanged for long periods of time. It is efficient to pack
them perfectly, by using a repacker that runs much less often than
every write to disk. This repacker goes through the entire tree
ordering, from left to right and then from right to left, alternating
each time it runs. When it goes from left to right in the tree
ordering, it shoves everything as far to the left as it will go, and
when it goes from right to left it shoves everything as far to the
right as it will go. (Left means small in key or in block number:-)
). In the absence of FS activity the effect of this over time is to
sort by tree order (defragment), and to pack with perfect
efficiency.</para>
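<para>The alternating pass is easy to picture as a toy model over an array of
block slots rather than a real tree ordering; everything in the sketch below
is invented for illustration.</para>
<programlisting><![CDATA[
/* A toy model of the alternating repacker pass described above,
 * modelled on an array of allocated/free block slots. */
#include <stdbool.h>
#include <stdio.h>

#define NR_SLOTS 12

static char slots[NR_SLOTS + 1] = ".A.B..C..D.E";   /* '.' = free */

/* Shove every allocated block as far left (or right) as it will go. */
static void repack(bool left_to_right)
{
        int dst  = left_to_right ? 0 : NR_SLOTS - 1;
        int step = left_to_right ? 1 : -1;

        for (int src = dst; src >= 0 && src < NR_SLOTS; src += step) {
                if (slots[src] == '.')
                        continue;
                char c = slots[src];
                slots[src] = '.';
                slots[dst] = c;
                dst += step;
        }
}

int main(void)
{
        printf("before: %s\n", slots);
        repack(true);                  /* one week: pack to the left   */
        printf("left:   %s\n", slots);
        repack(false);                 /* next week: pack to the right */
        printf("right:  %s\n", slots);
        return 0;
}
]]></programlisting>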
<para>
Reiser4.1 will modify the repacker to insert controlled "air holes", as
it is well known that insertion efficiency is harmed by overly tight
packing.</para>
<para>
I hypothesize that it is more efficient to periodically run a repacker
that systematically repacks using large IOs than to perform lots of
one-block reads of neighboring nodes of the modification points so as to
preserve a balancing invariant in the face of poorly localized
modifications to the tree.</para>
<section>
<title>What Is Best Done Just Before Flushing To Disk vs. At Tree
Insertion Time vs. In Nightly Repacker</title>
<para>Painting it in broad strokes, Reiser4 is designed to massage and
performance optimize changes to data in three stages: </para>
<itemizedlist>
<listitem>
<para>at the time the system call that invokes Reiser4 executes (this
may put the data into the tree, the page cache, the dentry cache,
etc.)</para>
</listitem>
<listitem>
<para>at the time the data is flushed from a cache (usually to
disk)</para>
</listitem>
<listitem>
<para>at the time the repacker runs </para>
</listitem>
</itemizedlist>
<para>Different kinds of information massaging belong in each of these stages: </para>
<itemizedlist>
<listitem>
<para>Object plugins modify the data in ways appropriate to do as the
data is being changed.</para>
</listitem>
<listitem>
<para>Flush plugins modify the data in ways appropriate to do as the
data is flushed to disk.</para>
</listitem>
<listitem>
<para>The repacker modifies the data in ways appropriate to do only at long time intervals. </para>
</listitem>
</itemizedlist>
<para>I see the flushing stage as the time most optimal for: </para>
<itemizedlist>
<listitem>
<para>allocating blocks to nodes, including deciding whether to
overwrite or relocate</para>
</listitem>
<listitem>
<para>squeezing nodes together for tight packing before they go to disk</para>
</listitem>
<listitem>
<para>encrypting data</para>
</listitem>
</itemizedlist>
</section>
</section>
<section><title>A Hint of Things Left For Other Papers</title> <para>At
the time of writing we implement all VFS operations atomically, and
the infrastructure for atomic operations is fully complete. The
interface for specifying multiple FS operations that are atomic is
still being tested and debugged; we are going to let our mailing list comment
before we fix it in stone, and space in this paper is limited, so we
have left it for a future, hopefully interesting, paper.</para>
<para>
Expediency often tempts security researchers into adding
architecturally inelegant extensions to filesystems. Reiser4's
objective was to systematically review all of the infrastructure
shortcomings that tempt security researchers into implementing inelegance,
and to systematically remedy them. It provides a complete
plugin infrastructure that allows you to construct custom-made file
objects. It implements inheritance, constraints, encryption at flush
time, and compression at flush time. We implement all of the features
needed to effectively implement security attributes as just flavors of
files. Space does not permit discussing these features, or the
philosophy behind why security attributes should be just particular
flavors of files, in this paper, but please watch our website at
www.namesys.com. </para>
</section>
<section><title>Conclusion</title>
<para>Atomic filesystems keep your data more secure. Historically
filesystems have not been made atomic in large part because of the
performance cost. One component of this performance cost is the need
to write twice, and Reiser4 avoids this. Reiser4 introduces a set of
performance improvements that dwarf whatever additional cost of atomicity
remains after the write-twice penalty is eliminated, and the result is
that you are going to have both greater data security and faster
performance when you use it. Application writers are going to be able
to avoid a variety of security holes by tapping into the atomic
operation functionality.</para>
</section>
<section><title>References</title>
<para>A summary of some influences upon Reiser4 performance design: </para>
<itemizedlist>
<listitem>
<para>Its avoidance of seeks for writes is influenced by LFS and WAFL.</para>
</listitem>
<listitem>
<para>Plans for a repacker in Reiser4.1 resemble the use of a cleaner in LFS.</para>
</listitem>
<listitem>
<para>Its flush plugins are a generalization of delayed block
allocation used in XFS.</para>
</listitem>
<listitem>
<para>The performance problems, benefits (larger blocks), and space
savings due to tails in ReiserFS V3 (not Reiser4) resemble the
performance problems (unfortunately not well documented in the
literature), benefits (the FFS paper cited below does document the
benefit of being able to increase block size as a result of not
fearing its cost in space usage as much), and space savings due
to fragments in FFS.</para>
</listitem>
<listitem>
<para>The use of balanced trees for aggregating multiple small
objects into one node resembles how they are used in
databases.</para>
</listitem>
</itemizedlist>
</section></section></section>
<section><title>Citations:</title>
<itemizedlist>
<listitem>
<para><emphasis>[Gray93]</emphasis></para>
<para>Jim Gray and Andreas Reuter. "Transaction Processing: Concepts and Techniques". Morgan Kaufmann Publishers, Inc., 1993. Old but good textbook on transactions.
Available at <ulink url="http://www.mkp.com/books_catalog/catalog.asp?ISBN=1-55860-190-2">
<para>D. Hitz, J. Lau and M. Malcolm. "File system design for an NFS file server appliance". Proceedings of the 1994 USENIX Winter Technical Conference, pp. 235-246, San Francisco, CA, January 1994
Available at <ulink url="http://citeseer.nj.nec.com/hitz95file.html">http://citeseer.nj.nec.com/hitz95file.html</ulink> </para>
Available at <ulink url="http://www.netapp.com/tech_library/3001.html">http://www.netapp.com/tech_library/3001.html</ulink> </para>
</listitem>
<listitem>
<para><emphasis>[TR3002]</emphasis></para>
<para>D. Hitz, J. Lau and M. Malcolm. "File system design for an NFS file server appliance". Tech. Rep. TR3002, Network Appliance, Inc., 1995
Available at <ulink url="http://www.netapp.com/tech_library/3002.html">http://www.netapp.com/tech_library/3002.html</ulink> </para>
</listitem>
<listitem>
<para><emphasis>[Kusick84]</emphasis></para>
<para>M. McKusick, W. Joy, S. Leffler, R. Fabry. "A Fast File System for UNIX". ACM Transactions on Computer Systems, Vol. 2, No. 3, pp. 181-197, August 1984
Available at <ulink url="http://citeseer.nj.nec.com/mckusick84fast.html">http://citeseer.nj.nec.com/mckusick84fast.html</ulink> </para>
</listitem>
<listitem>
<para><emphasis>[Ousterh89]</emphasis></para>
<para>J. Ousterhout and F. Douglis. "Beating the I/O Bottleneck: A Case for Log-Structured File Systems". ACM Operating System Reviews, Vol. 23, No. 1, pp.11-28, January 1989
Available at <ulink url="http://citeseer.nj.nec.com/ousterhout88beating.html">http://citeseer.nj.nec.com/ousterhout88beating.html</ulink> </para>
</listitem>
<listitem>
<para><emphasis>[Seltzer95]</emphasis></para>
<para>M. Seltzer, K. Smith, H. Balakrishnan, J. Chang, S. McMains and V. Padmanabhan. "File System Logging versus Clustering: A Performance Comparison". Proceedings of the 1995 USENIX Technical Conference, pp. 249-264, New Orleans, LA, January 1995
Available at <ulink url="http://citeseer.nj.nec.com/seltzer95file.html">http://citeseer.nj.nec.com/seltzer95file.html</ulink> </para>
</listitem>
<listitem>
<para><emphasis>[Seltzer95Supp]</emphasis></para>
<para>M. Seltzer. "LFS and FFS Supplementary Information". 1995
<para>M. Rosenblum and J. Ousterhout. "The design and implementation of a log-structured file system". ACM Transactions on Computer Systems, Vol. 10, No. 1, pp. 26-52, February 1992
Available at <ulink url="http://citeseer.nj.nec.com/rosenblum91design.html">http://citeseer.nj.nec.com/rosenblum91design.html</ulink> </para>
</listitem>
<listitem>
<para><emphasis>[SwD96]</emphasis></para>
<para>A. Sweeny, D. Doucette, W. Hu, C. Anderson, M. Nishimoto and G. Peck. "Scalability in the XFS File System". Proceedings of the 1996 USENIX Technical Conference, pp. 1-14, San Diego, CA, January 1996
Available at <ulink url="http://citeseer.nj.nec.com/sweeney96scalability.html">http://citeseer.nj.nec.com/sweeney96scalability.html</ulink> </para>