Distributed metadata

One of the biggest complaints about PVFS1 is the single metadata server. There are actually two grounds on which this complaint is typically raised. The first is that this is a single point of failure; we'll address that in a bit when we talk about data and metadata redundancy. The second is that it is a potential performance bottleneck.

In PVFS1 the metadata server steps aside for I/O operations, so in practice it is rarely a bottleneck for large parallel applications: they are busy writing data, not creating a billion files or some such thing. However, as systems continue to scale, it becomes ever more likely that any such single point of contact will become a bottleneck, even for well-behaved applications.

PVFS2 allows for configurations where metadata is distributed across some subset of I/O servers (which might or might not also serve data). Metadata for different files can then be placed on different servers, so that applications accessing different files tend to impact each other less.
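
To make the idea concrete, here is a minimal sketch in Python of how metadata for different files could end up on different servers. It is an illustration only, not the PVFS2 placement policy: the hash-of-path scheme, the function names metadata_server_for and group_by_server, and the server names are all assumptions made for this example.

```python
import hashlib


def metadata_server_for(path, metadata_servers):
    """Pick the server that would hold metadata for `path`.

    Hypothetical policy: hash the path and take the result modulo the
    number of metadata servers. This is an assumption for illustration,
    not how PVFS2 actually assigns metadata.
    """
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(metadata_servers)
    return metadata_servers[index]


def group_by_server(paths, metadata_servers):
    """Return a dict mapping each server to the paths placed on it."""
    placement = {server: [] for server in metadata_servers}
    for path in paths:
        placement[metadata_server_for(path, metadata_servers)].append(path)
    return placement


if __name__ == "__main__":
    # Hypothetical server names: a subset of the I/O servers that
    # have been configured to also hold metadata.
    servers = ["io0", "io1", "io2"]
    files = ["/pvfs/app_a/out.%d" % i for i in range(6)]
    for server, held in group_by_server(files, servers).items():
        print(server, held)
```

With a placement like this, two applications working on different files will usually talk to different metadata servers, which is the effect described above. A real system would also need to handle servers being added or removed, which this sketch ignores.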

Distributed metadata is a relatively tricky problem, but we're going to provide it in early releases anyway.