- Path: sparky!uunet!cs.utexas.edu!usc!sdd.hp.com!hplabs!ucbvax!HPLWK.HPL.HP.COM!albert
- From: albert@HPLWK.HPL.HP.COM (Joseph Albert)
- Newsgroups: comp.databases
- Subject: Re: Hot Standby DBMS's
- Message-ID: <9208200523.AA06321@hplwk.hpl.hp.com>
- Date: 20 Aug 92 05:23:36 GMT
- Sender: daemon@ucbvax.BERKELEY.EDU
- Lines: 83
-
- References: <9208172103.AA04378@hplwk.hpl.hp.com> <Bt6IsC.F00@world.std.com>
- Organization: Hewlett-Packard Laboratories -- Database Technology Dept
-
- In article <Bt6IsC.F00@world.std.com> edwards@world.std.com (Jonathan Edwards) writes:
-
- >Yes, this is 'remote mirroring', which I discussed in my post.
- >Note that I am talking about keeping the systems 100 miles apart.
-
- Usually, when people talk about a `hot standby' DBMS, they mean one that
- becomes available when an active one crashes from a hardware or other failure,
- not a geographically distributed system.
-
- It is important to keep three separate issues distinct:
-
- one sort of functionality is recoverability from catastrophic errors--
- e.g. an inadvertently deleted table with vital data, malicious users who
- deliberately change data, complete failures, etc. this is handled by
- backing up the system. recovery from these errors is a manual process.
- these backups don't have to go to tape; they could be taken over a network,
- written to another disk, etc.
-
- a second sort of functionality arises from transaction management.
- if the system crashes while there are active transactions, the system must,
- when brought back up, restore the database to a transaction-consistent
- state. that is, all of the effects of committed transactions must
- be applied to the database, and none of the effects of uncommitted
- transactions may be visible in the database. a system normally
- uses a log (sometimes shadow pages) to achieve this.
- note that, given the presence of a transaction log, one can further
- use the log to do backups without taking the system down
- (by doing a fuzzy dump, where the backup may contain effects of active
- but uncommitted transactions, and using the log to roll the fuzzy state
- forward to a transaction-consistent state at recovery time).
-
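the redo half of that log-based recovery can be sketched in a few lines of
python. this is a minimal illustration, not any particular DBMS's algorithm:
the log format (per-transaction write and commit records) and the function
name `recover` are invented for the example, and undo/shadow-page handling
is omitted.

```python
# minimal sketch of redo-style log recovery: replay the log, apply writes
# of committed transactions, and discard writes of transactions that have
# no commit record (their effects must not be visible after recovery).

def recover(log):
    """log is a list of (txn_id, op, key, value) tuples, where op is
    'write' or 'commit'. returns the transaction-consistent database."""
    committed = {t for (t, op, _, _) in log if op == "commit"}
    db = {}
    for txn, op, key, value in log:
        if op == "write" and txn in committed:
            db[key] = value  # redo: committed effects are applied
        # writes by uncommitted transactions are simply not replayed
    return db

log = [
    ("T1", "write", "x", 1),
    ("T2", "write", "y", 2),
    ("T1", "commit", None, None),
    # crash here: T2 never committed
]
print(recover(log))  # {'x': 1} -- T2's write to y is discarded
```

the same replay step is what rolls a fuzzy dump forward: start from the
dump instead of an empty database, then apply the log from the dump's
start point.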
- a third sort of functionality consists of building a high-availability
- or fault-tolerant system, i.e. a system that can tolerate some
- precisely specified set of errors without missing a beat, with no need
- to do the sorts of recovery described in the first or second case for
- that specific set of errors.
-
- the 3rd functionality is what people often mean by `hot standby', and
- it is what i was proposing a solution to. the functionality of the
- 1st and 2nd cases is typically provided by most commercial DBMS's.
-
- the 3 things are listed in order of decreasing recovery time.
-
- for example, someone drops a bomb on the computer. if the backup tapes
- are stored somewhere else, you can buy a new computer, install the software,
- and restore from the backup tapes, setting the system up as it was-- this
- resolves very catastrophic errors, but it takes some time to get things back
- in order.
-
- less serious: the OS crashes, and with it, the dbms. the hardware still works,
- the disk is not damaged, everything is fine, so you restart the system, and
- the dbms recovers to a transaction-consistent state. the system is only
- down for however long it takes to get it restarted.
-
- finally, the 3rd case is designed to keep the system working even
- during an error. a page on the disk goes bad, the OS crashes, the dbms
- crashes, .... your business loses $1 million for every hour the
- computer is unavailable. so, you design a hardware configuration
- with redundancy, so that it can tolerate one instance of any of a number
- of faults.
-
- if you want geographic distribution, with replicated data, as noted in
- your 2nd post, a distributed dbms architecture will fill the bill. for
- such a system, the transaction model is enhanced to support distributed
- transactions and data replicas that are kept consistent across all
- sites. various protocols for maintaining consistency of replicated data
- have been studied. for example, you could require that a write to a data
- item atomically write every replica (after obtaining locks that prevent
- reads or other writes to any of the replicas); then a read on that item
- only has to read one replica. or you could force the write to only
- a majority of replicas (which must first be locked); then reads must
- (lock and) read a majority of replicas to be assured of getting the latest
- value. etc.
-
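the majority-quorum variant can be sketched briefly. this is an invented
illustration (the class name `QuorumItem` and the in-memory "replicas" are
assumptions, and the locking step is omitted): writes tag a value with a
version number and store it at a majority of replicas; reads consult a
majority and take the highest-versioned value. because any two majorities
overlap, a read always sees the latest write.

```python
# sketch of majority-quorum replication: write to any majority of replicas
# with an increasing version number; read any majority and return the value
# carrying the highest version. locking is omitted for brevity.
import random

class QuorumItem:
    def __init__(self, n_replicas):
        self.replicas = [{"version": 0, "value": None}
                         for _ in range(n_replicas)]
        self.majority = n_replicas // 2 + 1

    def write(self, value):
        # in a real system, the chosen majority must first be locked
        version = max(r["version"] for r in self.replicas) + 1
        for i in random.sample(range(len(self.replicas)), self.majority):
            self.replicas[i] = {"version": version, "value": value}

    def read(self):
        # any two majorities intersect, so the newest version is always seen
        sample = random.sample(self.replicas, self.majority)
        return max(sample, key=lambda r: r["version"])["value"]

item = QuorumItem(5)
item.write("a")
item.write("b")
print(item.read())  # "b": a majority read always finds the newest write
```

the write-all/read-one protocol described first is the degenerate case
where the write quorum is every replica and the read quorum is a single one.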
- joseph albert
- albert@hplabs.hp.com
-