NetNews Usenet Archive 1992 #18


Internet Message Format | 1992-08-17 | 3.3 KB

Path: sparky!uunet!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!cis.ohio-state.edu!ucbvax!HPLWK.HPL.HP.COM!albert
From: albert@HPLWK.HPL.HP.COM (Joseph Albert)
Newsgroups: comp.databases
Subject: Re: Hot Standby DBMS's
Message-ID: <9208172103.AA04378@hplwk.hpl.hp.com>
Date: 17 Aug 92 21:03:36 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Lines: 56

Jonathan Edwards writes:
>In the transaction-processing world, there is the concept of a 'hot standby'
>system, which is a geographically separated system containing a copy of the
>database, and capable of coming online very quickly. The replicated data must
>be close to current, and guaranteeing complete synchronization is required
>by some applications. A further feature is the ability to 'catchup'
>incrementally to missed changes after an outage, without a complete database
>copy. Our database (homebrew non-relational) does this.
>Are there any other databases that can do this?

More reliable fault tolerance would be obtained from redundancy at a
level lower than the level of abstraction of the database.

Disks can be made redundant by having 2 or 3 physical disks, with
drivers that make them look like a single device. If one disk fails,
the driver can automatically look at the 2nd disk. With 3 disks acting
as a single logical disk, failure can be detected at the disk page
level: a read can look at the 3 pages and, if 2 of them are the same,
return that value. With 2 disks, a checksum can be stored with each
page to determine the valid page from among the two if a single one
fails.

CPU/OS/DBMS failure can be handled by redundant CPUs. For example, a
VAXCluster can be configured so that, say, 2 CPUs share a pool of disks
as if each disk was `their own'. A DBMS can be run on each CPU. The
first thing each DBMS will try to do at initialization is request a
lock on the database. The DBMS granted the lock will run; the other
will be blocked, waiting on the lock.

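The page-level redundancy described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not how any particular disk driver works: the function names
(read_voted, seal, read_checksummed) are made up for this sketch, pages
are plain byte strings, and CRC-32 stands in for whatever checksum a
real driver would store with each page.

```python
import zlib
from collections import Counter
from typing import Sequence, Tuple


def read_voted(copies: Sequence[bytes]) -> bytes:
    """3-disk case: return the page value held by a majority of copies."""
    value, votes = Counter(copies).most_common(1)[0]
    if votes * 2 <= len(copies):
        # No value is shared by more than half of the copies.
        raise IOError("no majority among mirrored copies")
    return value


def seal(page: bytes) -> Tuple[bytes, int]:
    """2-disk case: pair a page with its checksum at write time."""
    return page, zlib.crc32(page)


def read_checksummed(copies: Sequence[Tuple[bytes, int]]) -> bytes:
    """2-disk case: return the first copy whose stored checksum matches."""
    for page, stored in copies:
        if zlib.crc32(page) == stored:
            return page
    raise IOError("no copy passed its checksum")
```

With three copies, one corrupted page is simply outvoted; with two
copies, the checksum is what identifies the damaged one. In both cases
a second failure on the same page is not recoverable, which matches the
single-failure claim for this kind of scheme.
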
When a copy of a DBMS runs, the first thing it does is execute the
recovery code, scanning the log for any incomplete work. If the first
CPU/OS/DBMS fails, the lock on the database is released, and the 2nd
DBMS copy will then run. The 2nd DBMS will begin by executing the
recovery code, which will examine the log left by the first DBMS and
recover from that log, accessing the same database copy. (Again, there
are no redundant databases, but a single database implemented on a pool
of logical disks, where each logical disk may be implemented as 2 or
more redundant physical disks.) Thus, any uncommitted transactions on
the first CPU that crashed will be aborted, but any transactions whose
commit records made it to the log before the first CPU crashed will be
recovered by the 2nd copy of the DBMS.

This scheme can tolerate a single failure to any logical disk page, a
single CPU failure, or a single system software failure. The only
glitch at a failure is that uncommitted active transactions are
aborted.

The scheme described at the beginning of this article, where there is a
redundant logical database with some form of update between the real
and standby databases to keep the standby one current, is less
fault-tolerant than what was just described. Effectively, the two
databases become one large distributed DBMS, with its own newly
introduced points of failure.

Joseph Albert
albert@hplabs.hp.com
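
The lock-then-recover failover described in the article can be sketched
as follows. This is a toy model under stated assumptions, not a
description of any real DBMS: the log record layout ("begin", "write",
"commit" tuples), the names recover and run_dbms, and the use of a
Python threading.Lock in place of a cluster-wide database lock are all
invented for illustration. Recovery here is redo-only: writes of
committed transactions are replayed and everything else is dropped.

```python
import threading
from typing import Dict, List, Set, Tuple


def recover(log: List[tuple]) -> Tuple[Dict[str, int], Set[int]]:
    """Rebuild state from the shared log; return (state, aborted txids)."""
    started = {rec[1] for rec in log if rec[0] == "begin"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state: Dict[str, int] = {}
    for rec in log:
        # Replay only writes whose transaction has a commit record.
        if rec[0] == "write" and rec[1] in committed:
            _, _, key, value = rec
            state[key] = value
    # Transactions that began but never committed are aborted.
    return state, started - committed


def run_dbms(name: str, db_lock: threading.Lock, log: List[tuple],
             events: list) -> None:
    """A DBMS copy: wait for the database lock, then run recovery first."""
    with db_lock:  # blocks while the other copy holds the lock
        state, aborted = recover(log)
        events.append((name, state, sorted(aborted)))
```

In this model the standby copy sits blocked inside run_dbms until the
lock held by the first copy is released (standing in for the first
CPU/OS/DBMS failing). It then replays the same log against the same
single database, so committed work survives and only the in-flight
transactions are lost.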