How can we test whether a system designed to keep delivering services
even in the presence of faults will in fact do so?
To help answer this question, we have implemented UMLinux, a User Mode
Linux that can be used for
realistic fault injection experiments. In order to simulate reality as closely as possible, our
UMLinux implements kernel memory protection and runs the complete
virtual machine (including operating system and all processes) as a
single process on the (real) host. UMLinux is
binary-compatible with the host, so all binaries that run on the host also
run on UMLinux without recompilation.
The system we want to examine is set up using virtual UMLinux
machines. For most Linux distributions, we can use the out-of-the-box
installation routine to install the virtual machine directly from
CD-ROM. The virtual hardware, i.e. the memory size, the
hard disk, floppy disk and CD-ROM drives, and the network interfaces, can be configured
freely within the limits imposed by the resources available on the
host.
When the virtual server system is up and running, the fault injector
can be configured to inject faults into the virtual hardware. We can
currently inject bit-flips into CPU registers and main memory, defective
bytes into any kind of block device, and network send and receive errors.
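To make the notion of a bit-flip fault concrete, the following is a minimal sketch in Python. It is not the UMLinux injector: the buffer standing in for main memory and the names flip_bit and inject_random_bit_flip are hypothetical and chosen only for illustration. It shows the core operation such an injector performs, namely inverting a single, randomly chosen bit of a memory image.

```python
import random
from typing import Tuple


def flip_bit(memory: bytearray, byte_index: int, bit_index: int) -> None:
    """Invert one bit of the buffer in place (hypothetical helper)."""
    if not 0 <= bit_index < 8:
        raise ValueError("bit_index must be in the range 0..7")
    # XOR with a one-bit mask inverts exactly that bit.
    memory[byte_index] ^= 1 << bit_index


def inject_random_bit_flip(memory: bytearray,
                           rng: random.Random) -> Tuple[int, int]:
    """Flip one randomly chosen bit and report where the fault landed."""
    byte_index = rng.randrange(len(memory))
    bit_index = rng.randrange(8)
    flip_bit(memory, byte_index, bit_index)
    return byte_index, bit_index


if __name__ == "__main__":
    # A 16-byte buffer stands in for the virtual machine's main memory.
    image = bytearray(16)
    location = inject_random_bit_flip(image, random.Random())
    print("flipped (byte, bit):", location)
```

Returning the fault location lets an experiment log which bit was disturbed, and passing in a seeded random.Random makes a run repeatable.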
The whole setup is currently controlled from a graphical front end;
we are working on a script-driven automatic experiment controller.