NetNews Usenet Archive 1992 #18

Internet Message Format | 1992-08-21 | 2.7 KB

Path: sparky!uunet!cis.ohio-state.edu!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!ames!agate!triplerock.CS.Berkeley.EDU!mao
From: mao@triplerock.CS.Berkeley.EDU (Mike Olson)
Newsgroups: comp.databases
Subject: Re: distributed transactions
Message-ID: <173oogINNmrp@agate.berkeley.edu>
Date: 21 Aug 92 21:55:28 GMT
References: <1736k9INNjhh@agate.berkeley.edu> <BtCrA6.Iuy@cup.hp.com>
Organization: University of California at Berkeley
Lines: 48
NNTP-Posting-Host: triplerock.cs.berkeley.edu

In <BtCrA6.Iuy@cup.hp.com>, dhepner@cup.hp.com (Dan Hepner) asks for details on how non-blocking commit protocols handle network partitions:

> Maybe you can explain how this apparent dilemma is addressed:
>
> Time 1: node n acknowledges prepare
> Time 2: node n notes an inability to communicate with anyone else, in
> particular any site capable of being the transaction coordinator
> Time 3: still no communications, patience exhausted at node n

here's what the distributed systems theorists say: a partition splits a network into two pieces. whichever piece contains a majority of the nodes in the original network may continue processing. nodes in the minority clique cannot make further updates until the partition is repaired.

there are some obvious problems with this -- for example, if no clique contains a majority of the nodes in the original network, no one can make progress.

> On the other hand, the rarity of actual blockage makes extreme concern
> not warranted for most real world systems. The coordinator must go
> away at precisely the right time, _and then never return_. Surely some
> way needs to be offered to resolve the tie-up, but this issue does
> not justify a fundamental complaint against distributed transactions.

partitions aren't that rare. ethernet bridges go down all the time. you're correct in your assertion that strong consistency in the face of catastrophic failures is very difficult to guarantee.
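The majority rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Skeen's protocol or any vendor's implementation: it assumes every node knows how many nodes made up the original network, and the node names and the helpers `has_majority` and `pieces_that_may_proceed` are hypothetical. It also shows the failure case mentioned above, where no clique holds a majority.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical helpers illustrating the majority (quorum) rule. Assumption:
# each node knows original_size, the membership count before the partition.


def has_majority(component: FrozenSet[str], original_size: int) -> bool:
    """True if this connected piece of the network holds a strict majority
    of the original nodes and may therefore keep processing updates."""
    return len(component) > original_size // 2


def pieces_that_may_proceed(pieces: Iterable[FrozenSet[str]],
                            original_size: int) -> List[FrozenSet[str]]:
    """Return the pieces allowed to make progress. Two disjoint pieces of the
    original network cannot both hold a strict majority, so this returns at
    most one piece."""
    return [p for p in pieces if has_majority(p, original_size)]


if __name__ == "__main__":
    n = 5  # original network: nodes a..e (hypothetical names)

    # Two-way split: the three-node side continues, the two-node side waits.
    two_way = [frozenset("abc"), frozenset("de")]
    print([sorted(p) for p in pieces_that_may_proceed(two_way, n)])
    # -> [['a', 'b', 'c']]

    # Three-way split: no piece has a majority, so nobody can make progress.
    three_way = [frozenset("ab"), frozenset("cd"), frozenset("e")]
    print(pieces_that_may_proceed(three_way, n))  # -> []

    # A node cut off from everyone else, like node n in the quoted scenario,
    # is a minority of one and must wait for the partition to be repaired.
    isolated = frozenset({"e"})
    print(has_majority(isolated, n))  # -> False
```

Note the even-split case as well: with four nodes split two and two, `has_majority` is false for both halves, which is one concrete form of the "no clique contains a majority" problem.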
skeen's work makes it possible for some nodes, under some conditions, to continue to make progress when machines crash or communications fail, but it doesn't solve the whole problem. other approaches are to relax your consistency requirements, or to have humans outside the loop grant permission to acquire locks and update tables during communications failures.

the added complexity of three-phase commit, coupled with the fact that a clever adversary can force blockage anyway, has meant that it hasn't gotten a lot of attention from commercial vendors. in real life, what people do is spend a lot of money on redundant, highly reliable communications systems. when something crashes, some expensive employees get it working again fast.

mike olson
project sequoia 2000
uc berkeley
mao@cs.berkeley.edu