- Path: sparky!uunet!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!cis.ohio-state.edu!ucbvax!HPLWK.HPL.HP.COM!albert
- From: albert@HPLWK.HPL.HP.COM (Joseph Albert)
- Newsgroups: comp.databases
- Subject: Re: Hot Standby DBMS's
- Message-ID: <9208172103.AA04378@hplwk.hpl.hp.com>
- Date: 17 Aug 92 21:03:36 GMT
- Sender: daemon@ucbvax.BERKELEY.EDU
- Lines: 56
-
-
- Jonathan Edwards writes:
-
- >In the transaction-processing world, there is the concept of a 'hot standby'
- >system, which is a geographically separated system containing a copy of the
- >database, and capable of coming online very quickly. The replicated data must
- >be close to current, and guaranteeing complete synchronization is required
- >by some applications. A further feature is the ability to 'catchup'
- >incrementally to missed changes after an outage, without a complete database
- >copy. Our database (homebrew non-relational) does this.
- >Are there any other databases that can do this?
-
- A more reliable form of fault tolerance can be obtained from redundancy at a
- level lower than the database's level of abstraction.
- Disks can be made redundant by having 2 or 3 physical disks, with
- drivers that make them look like a single device. If one disk fails,
- the driver can automatically read from the 2nd disk. With 3 disks acting
- as a single logical disk, failure can be detected at the disk page level:
- a read can look at the 3 pages and, if 2 of them are the same, return
- that value. With 2 disks, a checksum can be stored with each page
- to determine the valid page from among the two if a single one fails.
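-
- As a rough illustration of the two read strategies just described, here is
- a minimal Python sketch. The function names, the in-memory "pages", and the
- choice of CRC-32 as the checksum are all assumptions made for the example;
- a real driver would do this below the file system, on raw disk blocks.
-
```python
import zlib
from collections import Counter
from typing import Optional, Sequence

CRC_LEN = 4  # bytes used to store the (assumed) CRC-32 checksum


def read_majority(copies: Sequence[bytes]) -> Optional[bytes]:
    """3-disk case: return the value held by at least 2 copies, else None."""
    value, count = Counter(copies).most_common(1)[0]
    return value if count >= 2 else None


def write_with_checksum(page: bytes) -> bytes:
    """2-disk case: store the page followed by its CRC-32."""
    return page + zlib.crc32(page).to_bytes(CRC_LEN, "big")


def read_checked(stored_copies: Sequence[bytes]) -> Optional[bytes]:
    """2-disk case: return the first copy whose checksum still matches."""
    for stored in stored_copies:
        if len(stored) < CRC_LEN:
            continue  # too short to even hold a checksum
        page, crc = stored[:-CRC_LEN], stored[-CRC_LEN:]
        if zlib.crc32(page).to_bytes(CRC_LEN, "big") == crc:
            return page
    return None  # both copies are damaged
```
-
- With 3 copies the vote itself identifies the bad page; with 2 copies the
- checksum is what tells the reader which of the two to trust.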
-
- CPU/OS/DBMS failure can be handled by redundant CPUs. For example, a
- VAXCluster can be configured so that, say, 2 CPUs share a pool of
- disks as if each disk were `its own'. A dbms can be run on each CPU.
- The first thing each dbms will try to do at initialization is request
- a lock on the database. The dbms granted the lock will run; the other
- will be blocked, waiting on the lock. When a copy of the dbms runs, the
- first thing it does is execute the recovery code, scanning the log for
- any incomplete work.
-
- If the first CPU/OS/DBMS fails, the lock on the database is released,
- and the 2nd dbms copy will then run. The 2nd dbms will begin by executing
- the recovery code, which will examine the log left by the first dbms
- and recover from that log, accessing the same database copy. (Again,
- there are no redundant databases, but a single database implemented
- on a pool of logical disks, where each logical disk may be implemented
- as 2 or more redundant physical disks.)
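-
- The lock-based takeover can be sketched in a few lines of Python. This is
- only a single-process simulation: a threading.Lock stands in for the
- cluster-wide database lock, threads stand in for the two CPUs, and the
- names run_dbms and demo are invented for the example.
-
```python
import threading


def run_dbms(name, db_lock, log, events, running, stop):
    """Wait for the database lock, run recovery over the shared log, then serve."""
    with db_lock:  # the standby blocks here while the primary holds the lock
        events.append(f"{name}: recovering from {len(log)} log records")
        log.append(f"written by {name}")  # normal work appends to the shared log
        running.set()
        stop.wait()  # keep "serving" until told to fail or shut down
    # leaving the with-block models the lock being released at failure


def demo():
    db_lock = threading.Lock()
    log, events = [], []
    p_running, p_stop = threading.Event(), threading.Event()
    s_running, s_stop = threading.Event(), threading.Event()
    primary = threading.Thread(
        target=run_dbms, args=("primary", db_lock, log, events, p_running, p_stop))
    standby = threading.Thread(
        target=run_dbms, args=("standby", db_lock, log, events, s_running, s_stop))

    primary.start()
    p_running.wait()  # primary now holds the lock
    standby.start()   # standby blocks waiting on the same lock
    p_stop.set()      # simulate failure of the first CPU/OS/DBMS
    s_running.wait()  # standby has taken over and run recovery
    s_stop.set()
    primary.join()
    standby.join()
    return events
```
-
- Calling demo() returns the primary's recovery event followed by the
- standby's, and the standby sees the log record the primary left behind.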
-
- Thus, any uncommitted transactions on the first CPU that crashed will
- be aborted, but any transactions whose commit records made it to the
- log before the first CPU crashed will be recovered by the 2nd copy of
- the dbms. This scheme can tolerate a single failure of any logical
- disk page, a single CPU failure, or a single system software failure.
- The only glitch at a failure is that uncommitted active transactions
- are aborted.
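-
- The recovery rule (redo whatever has a commit record in the log, abort
- everything else) can be illustrated with a small Python sketch. The log
- record layout and the function name recover are assumptions for the
- example, and the sketch rebuilds state from the log alone; a real dbms
- would start from the disk pages and typically use checkpoints and undo
- information as well.
-
```python
def recover(log):
    """Replay a log of (txn_id, op, *args) records in write order.

    Assumed record shapes: (txn, "write", key, value) and (txn, "commit").
    Returns the recovered key/value state and the set of aborted transactions.
    """
    # A transaction survives only if its commit record reached the log.
    committed = {rec[0] for rec in log if rec[1] == "commit"}

    db = {}
    for txn, op, *args in log:
        if op == "write" and txn in committed:
            key, value = args
            db[key] = value  # redo the committed write

    # Everything that was active at the crash is aborted.
    aborted = {rec[0] for rec in log} - committed
    return db, aborted
```
-
- For instance, if T1 wrote x and committed while T2 was still writing when
- the first CPU failed, the 2nd dbms copy ends up with T1's value for x and
- reports T2 as aborted.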
-
- The scheme described at the beginning of this article, where there is
- a redundant logical database, with some form of update between
- the real and standby databases to keep the standby one current, is
- less fault-tolerant than what was just described. Effectively, the
- two databases become one large distributed dbms, with its own newly
- introduced points of failure.
-
- Joseph Albert
- albert@hplabs.hp.com
-