- Path: sparky!uunet!cis.ohio-state.edu!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!ames!agate!triplerock.CS.Berkeley.EDU!mao
- From: mao@triplerock.CS.Berkeley.EDU (Mike Olson)
- Newsgroups: comp.databases
- Subject: Re: distributed transactions
- Message-ID: <173oogINNmrp@agate.berkeley.edu>
- Date: 21 Aug 92 21:55:28 GMT
- References: <1736k9INNjhh@agate.berkeley.edu> <BtCrA6.Iuy@cup.hp.com>
- Organization: University of California at Berkeley
- Lines: 48
- NNTP-Posting-Host: triplerock.cs.berkeley.edu
-
- In <BtCrA6.Iuy@cup.hp.com>, dhepner@cup.hp.com (Dan Hepner) asks for
- details on how non-blocking commit protocols handle network partitions:
-
- > Maybe you can explain how this apparent dilemma is addressed:
- >
- > Time 1: node n acknowledges prepare
- > Time 2: node n notes an inability to communicate with anyone else, in
- > particular any site capable of being the transaction coordinator
- > Time 3: still no communications, patience exhausted at node n
-
- here's what the distributed systems theorists say: a partition splits
- a network into two or more pieces. whichever piece contains a strict
- majority of the nodes in the original network may continue processing.
- nodes in a minority clique cannot make further updates until the
- partition is repaired.
-
- there are some obvious problems with this -- for example, if no clique
- contains a majority of the nodes in the original network, no one can
- make progress.
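-
- the majority rule above can be sketched in a few lines. this is only
- an illustration, not code from any real system: the function names
- may_continue and writable_components are made up for the example, and
- it assumes every node knows the size of the original network.

```python
def may_continue(component_size, total_nodes):
    """Return True if a piece of the partitioned network holds a strict
    majority of the nodes in the original network (hypothetical helper)."""
    return 2 * component_size > total_nodes


def writable_components(components, total_nodes):
    """Given the pieces of a partition (each a set of node names), return
    the pieces that may keep processing updates. At most one piece can
    qualify, because two disjoint pieces cannot both hold a majority."""
    return [c for c in components if may_continue(len(c), total_nodes)]


# five nodes split 3/2: only the three-node piece may continue
split_3_2 = [{"a", "b", "c"}, {"d", "e"}]
majority = writable_components(split_3_2, 5)

# four nodes split 2/2: no piece has a majority, so nobody makes progress
split_2_2 = [{"a", "b"}, {"c", "d"}]
stalled = writable_components(split_2_2, 4)
```

- here majority holds just the piece {a, b, c}, and stalled is empty.
- the same thing happens with five nodes split 2/2/1.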
-
- > On the other hand, the rarity of actual blockage makes extreme concern
- > not warranted for most real world systems. The coordinator must go
- > away at precisely the right time, _and then never return_. Surely some
- > way needs to be offered to resolve the tie-up, but this issue does
- > not justify a fundamental complaint against distributed transactions.
-
- partitions aren't that rare. ethernet bridges go down all the time.
-
- you're correct in your assertion that strong consistency in the face
- of catastrophic failures is very difficult to guarantee. skeen's work
- makes it possible for some nodes, under some conditions, to continue
- to make progress when machines crash or communications fail, but it
- doesn't solve the whole problem. other approaches are to relax your
- consistency requirements, or to have humans outside the loop grant
- permission to acquire locks and update tables during communications
- failures.
-
- the added complexity of three-phase commit, coupled with the fact that
- a clever adversary can force blockage anyway, has meant that it hasn't
- gotten a lot of attention from commercial vendors. in real life, what
- people do is spend a lot of money on redundant, highly reliable
- communications systems. when something crashes, some expensive employees
- get it working again fast.
-
- mike olson
- project sequoia 2000
- uc berkeley
- mao@cs.berkeley.edu
-