Newsgroups: comp.databases
Path: sparky!uunet!decwrl!world!edwards
From: edwards@world.std.com (Jonathan Edwards)
Subject: Re: distributed transactions
Message-ID: <BtDA1n.JrB@world.std.com>
Organization: The World Public Access UNIX, Brookline, MA
References: <1736k9INNjhh@agate.berkeley.edu> <BtCrA6.Iuy@cup.hp.com> <173oogINNmrp@agate.berkeley.edu>
Date: Sat, 22 Aug 1992 04:05:46 GMT
Lines: 46

In article <173oogINNmrp@agate.berkeley.edu> mao@triplerock.CS.Berkeley.EDU (Mike Olson) writes:
>
>you're correct in your assertion that strong consistency in the face
>of catastrophic failures is very difficult to guarantee. skeen's work
>makes it possible for some nodes, under some conditions, to continue
>to make progress when machines crash or communications fail, but it
>doesn't solve the whole problem. other approaches are to relax your
>consistency requirements, or to have humans outside the loop grant
>permission to acquire locks and update tables during communications
>failures.
>
>the added complexity of three-phase commit, coupled with the fact that
>a clever adversary can force blockage anyway, have meant that it hasn't
>gotten a lot of attention by commercial vendors. in real life, what
>people do is spend a lot of money on redundant, highly reliable
>communications systems. when something crashes, some expensive employees
>get it working again fast.

And so we come full circle through a theoretical detour to my original
issue. The way to build a reliable distributed system in practice is:

1) Partition the design into separate autonomous databases that interact
via a reliable message-transfer protocol. This partitioning naturally falls
along organizational or legal boundaries, so human responsibility
correlates with system availability, i.e., your ass is covered if the other
guy's system is down. Because the systems interact only via messages,
responsibility flows between them in discrete quanta, each transferred in
an atomic step. Distributed transactions should never cross autonomous
systems, except perhaps to run the message transfers themselves.

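To make (1) concrete, here is a rough sketch of the kind of message transfer
I have in mind (the table names, functions, and the choice of Python/SQLite
are mine, purely for illustration). Each system commits its business row and
the outgoing message in one local transaction; a separate transfer step then
copies the message into the other system's inbox and only afterwards deletes
it from the outbox:

import sqlite3
import uuid

def setup(db):
    db.executescript("""
        CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, cents INTEGER);
        CREATE TABLE IF NOT EXISTS outbox (msg_id TEXT PRIMARY KEY, body TEXT);
        CREATE TABLE IF NOT EXISTS inbox  (msg_id TEXT PRIMARY KEY, body TEXT);
    """)

def place_order(db, cents):
    # The business row and the outgoing message commit in ONE local
    # transaction: either both are durable or neither is.
    msg_id = str(uuid.uuid4())
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (msg_id, cents))
        db.execute("INSERT INTO outbox VALUES (?, ?)",
                   (msg_id, "settle %d cents" % cents))
    return msg_id

def transfer(src, dst):
    # Move pending messages from src's outbox to dst's inbox.  The insert
    # on dst commits BEFORE the delete on src, so a crash in between just
    # re-sends the message; the primary key on msg_id makes the re-send a
    # no-op (at-least-once delivery plus idempotence = exactly-once effect).
    for msg_id, body in src.execute("SELECT msg_id, body FROM outbox").fetchall():
        with dst:
            dst.execute("INSERT OR IGNORE INTO inbox VALUES (?, ?)", (msg_id, body))
        with src:
            src.execute("DELETE FROM outbox WHERE msg_id = ?", (msg_id,))

if __name__ == "__main__":
    a = sqlite3.connect(":memory:")   # autonomous system A (say, order entry)
    b = sqlite3.connect(":memory:")   # autonomous system B (say, billing)
    setup(a); setup(b)
    place_order(a, 4200)
    transfer(a, b)
    print(b.execute("SELECT * FROM inbox").fetchall())

Note that no distributed transaction ever spans the two systems; if the other
side is down, the outbox simply accumulates until it comes back.
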
2) Reliability is achieved through redundant hardware and recoverable
databases. Recovery from 'disasters' is done by geographically separated
hardware which is kept as a 'hot standby' with a synchronously updated
database. This is optimally done via a copy of the journal stream,
which contains all the necessary data without the redundancy of the
disk-write stream. General-purpose replicated data with multi-phase
protocols and fancy voting and broadcast algorithms seems way too complex
and inefficient (but maybe it will improve).
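
And a crude sketch of what I mean by a hot standby driven off the journal
stream (again, every name here is invented for illustration). The primary
does not acknowledge a commit until the standby has hardened the same
journal record, so after a disaster the standby is at most one in-flight
record behind:

class Standby:
    """Geographically separate site, fed only by the journal stream."""
    def __init__(self):
        self.journal = []     # hardened journal (stand-in for a disk file)
        self.table = {}       # database state recovered by replaying it

    def apply(self, record):
        # Harden the journal record, then replay it into the local database.
        self.journal.append(record)
        self.table[record["key"]] = record["value"]
        return record["lsn"]  # acknowledge the highest hardened record

class Primary:
    def __init__(self, standby):
        self.standby = standby
        self.journal = []
        self.table = {}

    def commit(self, key, value):
        record = {"lsn": len(self.journal) + 1, "key": key, "value": value}
        self.journal.append(record)          # local journal write
        acked = self.standby.apply(record)   # synchronous ship to the standby
        if acked < record["lsn"]:
            raise RuntimeError("standby fell behind; commit not acknowledged")
        self.table[key] = value              # only now do we tell the user "done"
        return record["lsn"]

if __name__ == "__main__":
    standby = Standby()
    primary = Primary(standby)
    primary.commit("acct-17", 500)
    primary.commit("acct-17", 350)
    # Disaster at the primary site: the standby already holds every
    # acknowledged commit, so it can take over immediately.
    print(standby.table)

The point is that the journal stream already carries everything needed to
reconstruct the database, so no voting or multi-phase machinery is needed
between the two sites.
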
My original post was a query to find out whether any commercial databases
were capable of robust hot standby. So far, NOT. So far, there doesn't
even seem to be much acknowledgement that this is a correct approach
to the problem.