What happens during failover

The following example illustrates the steps that occur when a node fails:

  1. Node2 (which is running a pvfs2-server on the virtual2 IP address) suffers a failure
  2. Client node begins a timeout/retry cycle
  3. Heartbeat services running on remaining nodes notice that node2 is not responding
  4. After a timeout has elapsed, remaining nodes reach a quorum and vote to treat node2 as a failed node
  5. Node1 sends a stonith command to reset node2
  6. Node2 either reboots or remains powered off (depending on the nature of the failure)
  7. Once the stonith command succeeds, node5 is selected to replace node2
  8. The virtual2 IP address, mount point, and pvfs2-server service are started on node5
  9. Client node retry eventually succeeds, but now the network traffic is routed to node5
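
The ordering of steps 4 through 8 can be sketched as a small simulation. This Python is illustrative only and is not part of PVFS2 or Heartbeat: the Node class, has_quorum, fail_over, the fence callback, and the five-node cluster size are hypothetical names and assumptions made for this example. It models one constraint from the list above: resources move to the replacement node only after the surviving nodes have a quorum and the stonith step has reported success.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Node:
    # Hypothetical model of a cluster member and the resources it hosts.
    name: str
    alive: bool = True
    resources: Set[str] = field(default_factory=set)


def has_quorum(votes: int, cluster_size: int) -> bool:
    # Assumption: quorum means a strict majority of all configured nodes.
    return votes > cluster_size // 2


def fail_over(nodes: Dict[str, Node], failed: str, spare: str,
              fence: Callable[[Node], bool]) -> Node:
    victim = nodes[failed]
    # Steps 3-4: only responsive nodes other than the suspect get a vote.
    voters = [n for n in nodes.values() if n.alive and n.name != failed]
    if not has_quorum(len(voters), len(nodes)):
        raise RuntimeError("no quorum; cannot declare node failed")
    # Step 5: fence the failed node; 'fence' stands in for the stonith command.
    if not fence(victim):
        raise RuntimeError("stonith failed; resources stay where they are")
    victim.alive = False
    # Steps 7-8: move every resource to the replacement node.
    target = nodes[spare]
    target.resources |= victim.resources
    victim.resources = set()
    return target


# Usage: an assumed five-node cluster in which node2 hosts the virtual2 resources.
cluster = {f"node{i}": Node(f"node{i}") for i in range(1, 6)}
cluster["node2"].resources = {"virtual2 IP", "mount point", "pvfs2-server"}
cluster["node2"].alive = False  # step 1: node2 suffers a failure
replacement = fail_over(cluster, "node2", "node5", fence=lambda node: True)
print(replacement.name, sorted(replacement.resources))
```

In this sketch a failed fence aborts the failover rather than moving resources anyway. That mirrors the reason for fencing first: if node2 were still partly alive, starting its pvfs2-server on node5 could leave two servers acting on the same storage.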