- Path: sparky!uunet!cs.utexas.edu!usc!sdd.hp.com!hplabs!ucbvax!HPLWK.HPL.HP.COM!albert
- From: albert@HPLWK.HPL.HP.COM (Joseph Albert)
- Newsgroups: comp.databases
- Subject: Re: Hot Standby DBMS's
- Message-ID: <9208200523.AA06321@hplwk.hpl.hp.com>
- Date: 20 Aug 92 05:23:36 GMT
- Sender: daemon@ucbvax.BERKELEY.EDU
- Lines: 83
-
- References: <9208172103.AA04378@hplwk.hpl.hp.com> <Bt6IsC.F00@world.std.com>
- Organization: Hewlett-Packard Laboratories -- Database Technology Dept
-
- In article <Bt6IsC.F00@world.std.com> edwards@world.std.com (Jonathan Edwards) writes:
-
- >Yes, this is 'remote mirroring', which I discussed in my post.
- >Note that I am talking about keeping the systems 100 miles apart.
-
- Usually, when people talk about a `hot standby' DBMS, they mean one that
- becomes available when an active one crashes from a hardware or other failure,
- not a geographically distributed system.
-
- It is important to keep three separate issues distinct:
-
- one sort of functionality is recoverability from catastrophic errors--
- e.g. an inadvertently deleted table with vital data, malicious users who
- deliberately change data, complete failures, etc. this is handled by
- backing up the system. recovery from these errors is a manual process.
- these backups don't have to go to tape; they could be taken over a network,
- written to another disk, etc.
-
- a second sort of functionality arises from transaction management.
- if the system crashes while there are active transactions, the system must,
- when brought back up, restore the database to a transaction-consistent
- state. that is, all of the effects of committed transactions must
- be applied to the database, and none of the effects of uncommitted
- transactions may be visible in the database. a system normally
- uses a log (sometimes shadow pages) to achieve this.
- note that, given the presence of a transaction log, one can further
- use the log to do backups without taking the system down
- (by doing a fuzzy dump, where the backup may contain effects of active
- but uncommitted transactions, and using the log to roll the fuzzy state
- forward to a transaction-consistent state at recovery time).
-
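the redo half of that log-based recovery can be sketched in a few lines of
python. this is a minimal illustration, not any particular DBMS's algorithm:
the log format (per-transaction write and commit records) and the function
name `recover` are invented for the example, and undo/shadow-page handling
is omitted.

```python
# minimal sketch of redo-style log recovery: replay the log, apply writes
# of committed transactions, and discard writes of transactions that have
# no commit record (their effects must not be visible after recovery).

def recover(log):
    """log is a list of (txn_id, op, key, value) tuples, where op is
    'write' or 'commit'. returns the transaction-consistent database."""
    committed = {t for (t, op, _, _) in log if op == "commit"}
    db = {}
    for txn, op, key, value in log:
        if op == "write" and txn in committed:
            db[key] = value  # redo: committed effects are applied
        # writes by uncommitted transactions are simply not replayed
    return db

log = [
    ("T1", "write", "x", 1),
    ("T2", "write", "y", 2),
    ("T1", "commit", None, None),
    # crash here: T2 never committed
]
print(recover(log))  # {'x': 1} -- T2's write to y is discarded
```

the same replay step is what rolls a fuzzy dump forward: start from the
dump instead of an empty database, then apply the log from the dump's
start point.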
- a third sort of functionality consists of building a high-availability
- or fault-tolerant system, i.e. a system that can tolerate some
- precisely specified set of errors without missing a beat, with no need
- to do the sorts of recovery described in the first or second case for
- that specific set of errors.
-
- the 3rd functionality is what people often mean by `hot standby', and
- it is what i was proposing a solution to. the functionality of the
- 1st and 2nd cases is typically provided by most commercial DBMS's.
-
- the 3 things are listed in order of decreasing recovery time.
-
- for example, someone drops a bomb on the computer. if the backup tapes
- are stored somewhere else, you can buy a new computer, install the software,
- and restore from the backup tapes, setting the system up as it was-- this
- resolves very catastrophic errors, but it takes some time to get things back
- in order.
-
- less serious: the OS crashes, and with it, the dbms. the hardware still works,
- the disk is not damaged, everything is fine, so you restart the system, and
- the dbms recovers to a transaction-consistent state. the system is only
- down for however long it takes to get it restarted.
-
- finally, the 3rd case is designed to keep the system working even
- during an error. a page on the disk goes bad, the OS crashes, the dbms
- crashes, .... your business loses $1 million for every hour the
- computer is unavailable. so, you design a hardware configuration
- with redundancy, so that it can tolerate one instance of any of a number
- of faults.
-
- if you want geographic distribution, with replicated data, as noted in
- your 2nd post, a distributed dbms architecture will fill the bill. for
- such a system, the transaction model is enhanced to support distributed
- transactions and data replicas that are kept consistent across all
- sites. various protocols for maintaining consistency of replicated data
- have been studied. for example, you could require that a write to a data
- item atomically write every replica (after obtaining locks that prevent
- reads or other writes to any of the replicas); then a read on that item
- only has to read one replica. or you could force the write to only
- a majority of replicas (which must first be locked); then reads must
- (lock and) read a majority of replicas to be assured of getting the latest
- value. etc.
-
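the majority-quorum variant can be sketched briefly. this is an invented
illustration (the class name `QuorumItem` and the in-memory "replicas" are
assumptions, and the locking step is omitted): writes tag a value with a
version number and store it at a majority of replicas; reads consult a
majority and take the highest-versioned value. because any two majorities
overlap, a read always sees the latest write.

```python
# sketch of majority-quorum replication: write to any majority of replicas
# with an increasing version number; read any majority and return the value
# carrying the highest version. locking is omitted for brevity.
import random

class QuorumItem:
    def __init__(self, n_replicas):
        self.replicas = [{"version": 0, "value": None}
                         for _ in range(n_replicas)]
        self.majority = n_replicas // 2 + 1

    def write(self, value):
        # in a real system, the chosen majority must first be locked
        version = max(r["version"] for r in self.replicas) + 1
        for i in random.sample(range(len(self.replicas)), self.majority):
            self.replicas[i] = {"version": version, "value": value}

    def read(self):
        # any two majorities intersect, so the newest version is always seen
        sample = random.sample(self.replicas, self.majority)
        return max(sample, key=lambda r: r["version"])["value"]

item = QuorumItem(5)
item.write("a")
item.write("b")
print(item.read())  # "b": a majority read always finds the newest write
```

the write-all/read-one protocol described first is the degenerate case
where the write quorum is every replica and the read quorum is a single one.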
- joseph albert
- albert@hplabs.hp.com
-