- Path: sparky!uunet!zephyr.ens.tek.com!uw-beaver!cornell!ken
- From: ken@cs.cornell.edu (Ken Birman)
- Newsgroups: comp.sys.isis
- Subject: Re: A question about network partition on process groups
- Message-ID: <1992Jul25.004433.3234@cs.cornell.edu>
- Date: 25 Jul 92 00:44:33 GMT
- References: <1992Jul23.190006.3551@fig.citib.com>
- Organization: Cornell Univ. CS Dept, Ithaca NY 14853
- Lines: 123
-
- In article <1992Jul23.190006.3551@fig.citib.com> kpt@fig.citib.com (Kevin P. Tyson) writes:
- >I am very new to ISIS so please forgive me if this is a FAQ. I am interested
- >in using ISIS to implement replicated resource managers. My understanding is
- >that the Client-Server Group mechanism is the technique to use for this type
- >of processing and that the logging facility is the mechanism to use when new
- >replicas need to be added to the group.
-
- It depends on what you are trying to do. The logging tool is normally
- NOT needed, because most services of this sort don't need to preserve
- state across total failures. Instead, they tend to restart a failed
- server on some reasonable machine (perhaps picking it with the "network
- resource manager") and the new server does a pg_join and becomes part
- of the service group in that way.
-
- As for this being a client-server group: this would certainly be a group
- with clients. However, they might not be registered using the ISIS pg_client
- system call. For example, I know that Citibank is a DCE user, and for
- your setting it might be preferable to use a DCE RPC to talk to a
- server within the group, perhaps the one on your file server or otherwise
- picked to be in a sensible place, and this might use ISIS internally to
- maintain synchronized state with respect to other members of the service. So, you
- would see a standard DCE interface from the outside, but could use ISIS
- on the inside. Since ISIS can run under pthreads, this should work
- smoothly with no special problems relative to the DCE startup sequence...
-
- Generally, the case where we use the term "client-server group" and where
- you need to call pg_client in the client program is when the client will
- receive diffusion multicasts from the group, or will need to multicast
- atomically into the group, say if it was going to subdivide the search
- of a database among its members. Here, the pg_client call is necessary
- because the multicast will otherwise take an inefficient route, and over
- time you would prefer not to pay that inefficiency repeatedly.
-
- One annoyance with ISIS is that it offers a lot of options. In principle,
- you should be able to implement any architecture that appeals to you and
- that makes sense...
- >
- >My question concerns the effect of network partition on the server members of
- >the process group. Suppose we have a server process group running on four
- >nodes on a single ethernet segment. Each node has at least the Protos process
- >and the server process running. Each server process is responding to all rpc
- >calls made to the group. At some point the segment is split into two segments
- >of two nodes each. I assume that the Protos processes on each node will
- >detect the missing nodes and on the segment which now lacks the primary server
- >will promote one of the two remaining server processes to the role of primary.
- >The result of this will be two server process groups, each of which will
- >continue to operate.
-
- Actually, no. In this situation ISIS will only allow the partition with
- the most nodes (or the heaviest "weight", if you use the new "weight="
- feature in the sites file), to continue execution. The nodes that are
- partitioned away will see a disconnect from the system, e.g. their
- isis_failed() procedures will be called, and will need to reconnect to the
- system later when the partition heals. This is illustrated in the manual.
- The point is that ISIS will never allow the same group to be duplicated by
- a partition.
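
The rule above can be sketched in a few lines. This is a minimal illustration in Python, not ISIS code: the function name, the site names, and the weight table are all hypothetical, and the strict-majority test is an assumption made here so that an even split leaves no survivor. The text above says only that the side with the most nodes (or the heaviest weight) continues.

```python
from typing import Dict, Iterable, Optional, Set


def surviving_partition(
    weights: Dict[str, int], partitions: Iterable[Set[str]]
) -> Optional[Set[str]]:
    """Return the one partition allowed to keep running, or None.

    Assumption (not taken from ISIS): a side survives only if it holds
    strictly more than half of the total weight, so at most one side can
    ever qualify and the same group is never duplicated.
    """
    total = sum(weights.values())
    for part in partitions:
        part_weight = sum(weights[site] for site in part)
        if 2 * part_weight > total:
            return set(part)
    return None  # no side holds a strict majority of the weight
```

With four equally weighted sites, a 3-1 split lets the three-site side continue, while a 2-2 split leaves no survivor under this assumed rule. Giving one site a larger weight is one way such a tie could be broken in favour of its side.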
-
- If you are looking at a setting where partition is a real risk, the
- solution is to run ISIS separately on each of the two (or more) chunks.
- Each ISIS can independently support an instance of your service, say a
- "wide-area database service", and with the wide-area tools you can
- then multicast to the set of groups.
-
- A simpler way to deal with a WAN, however, is to use the NEWS system
- as a wide-area communication tool, since it is easy to run NEWS in a WAN
- configuration. Processes in the various LAN systems would monitor the
- important WAN topics and, for example, update the local database using
- what they see.
-
- One can't do better: Skeen proved this a few years ago. There is
- a basic choice between waiting for the partition to heal and making
- progress during a partition, and the cost we pay for going the latter
- route is that we can't allow a group to straddle a partition.
-
- On the other hand, we could hide this better. In the future (1993 or 1994)
- we will introduce a new ISIS system that will have a more uniform interface
- regardless of whether you work in the LAN or WAN setting. This will
- at least look better, although it won't actually overcome Skeen's
- impossibility results. Databases have the same problem, by the way.
-
- >
- >At some point the segments will have to be re-united. What support is there
- >in ISIS for detecting the changes that may have gone to one but not both of
- >the server groups while the network was partitioned? What support is there
- >for merging the two server groups into a single server group?
- >
-
- Well, if you use my recommendation -- WAN news -- you have a very strong
- guarantee that when the partition merges, stacked up messages will be
- despooled and replayed into the local representative for each group.
- So, with WAN news, you know that important data always gets through
- eventually, in the order sent, and without duplication or other errors.
- You can also get the equivalent behavior at a lower level, using the
- WAN interfaces to the spooler.
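
To make the spool-and-replay guarantee concrete, here is a minimal store-and-forward sketch in Python. It is not the ISIS spooler or NEWS interface: the `Spooler` and `Receiver` classes, the `link_up` flag, and the sequence-number scheme are all assumptions chosen for illustration. It shows messages stacking up while the link is down, then being replayed in the order sent, with duplicates dropped on the receiving side.

```python
from collections import deque


class Receiver:
    """Local representative that accepts each message exactly once, in order."""

    def __init__(self):
        self.expected = 0  # next sequence number we are willing to accept
        self.log = []      # messages delivered so far, in order

    def deliver(self, seq, message):
        if seq < self.expected:
            return  # already seen: drop the duplicate
        if seq > self.expected:
            raise ValueError("gap in sequence; sender must replay in order")
        self.log.append(message)
        self.expected += 1


class Spooler:
    """Sender-side queue that holds messages while the WAN link is down."""

    def __init__(self):
        self.next_seq = 0
        self.pending = deque()  # (seq, message) pairs not yet delivered
        self.link_up = True     # hypothetical flag set by whoever watches the link

    def send(self, message, receiver):
        self.pending.append((self.next_seq, message))
        self.next_seq += 1
        self.flush(receiver)

    def flush(self, receiver):
        if not self.link_up:
            return  # partitioned: keep everything spooled
        while self.pending:
            seq, message = self.pending[0]
            receiver.deliver(seq, message)
            self.pending.popleft()  # remove only after delivery succeeded
```

A caller would set `link_up` back to `True` and call `flush` when the partition heals. Everything spooled in the meantime then arrives at the receiver oldest first.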
-
- On the other hand, if you are really in a LAN system and, say, one
- machine has lost its connection to the local ether,
- the partitioned programs will have been out of touch and will look
- like newly created programs when they reconnect. They need to rejoin
- process groups, resubscribe to news topics, etc. In this setting,
- the program that was disconnected can save up things to play into the
- system (e.g. trades that your brokers and bankers entered while in
- disconnected mode) but can't be sure, without checking, what the outcome
- was on a trade that was in progress when the connection failed.
- News will save up and replay messages, under your control. But,
- since the process is treated like a new arrival, you don't see a
- guarantee that when the connection is restored everything that you
- would have seen gets replayed.
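
The disconnected-mode behaviour described above might look roughly like the following Python sketch. None of this is ISIS API: `LedgerServer`, `DisconnectedClient`, the trade dictionaries, and the `has` lookup are hypothetical stand-ins, and the server is assumed to be able to answer "did trade X get applied?". The client saves trades entered while disconnected, and for the one trade that was in flight when the link dropped it checks the outcome before deciding whether to resubmit.

```python
class LedgerServer:
    """Toy stand-in for the service; counts how often each trade was applied."""

    def __init__(self):
        self.trades = {}         # trade id -> number of times applied
        self.drop_reply = False  # simulate the link failing after the work is done

    def apply(self, trade):
        self.trades[trade["id"]] = self.trades.get(trade["id"], 0) + 1
        if self.drop_reply:
            raise ConnectionError("link failed before the reply arrived")

    def has(self, trade_id):
        return trade_id in self.trades


class DisconnectedClient:
    def __init__(self):
        self.connected = True
        self.saved = []       # trades entered while disconnected
        self.in_doubt = None  # trade that was in progress when the link failed

    def submit(self, trade, server):
        if not self.connected:
            self.saved.append(trade)
            return "saved"
        self.in_doubt = trade
        try:
            server.apply(trade)
        except ConnectionError:
            self.connected = False
            return "in doubt"  # outcome unknown until we can check
        self.in_doubt = None
        return "applied"

    def reconnect(self, server):
        self.connected = True
        # Check the in-doubt trade first; resubmit only if it never landed.
        if self.in_doubt is not None:
            if not server.has(self.in_doubt["id"]):
                server.apply(self.in_doubt)
            self.in_doubt = None
        # Then play in everything entered while disconnected, in order.
        for trade in self.saved:
            server.apply(trade)
        self.saved.clear()
```

The design choice worth noting is that the check happens before any replay, so a trade that did go through is never applied twice.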
-
- One reason is sheer volume of data. I don't know what Citibank is
- like, but we are talking to places with hundreds or thousands of transactions
- per second, system wide, and a program partitioned away could have
- missed megabytes while disconnected. ISIS doesn't have a good place
- to save all that stuff... if you ask news to do so, it will, but this
- makes it your decision to save it, and your problem to find space, and
- your responsibility to say how long to keep it...
-
- Hope this helps! Feel free to ask for clarification. I'm sure a lot
- of people are interested in issues like this.
- --
- Kenneth P. Birman E-mail: ken@cs.cornell.edu
- 4105 Upson Hall, Dept. of Computer Science TEL: 607 255-9199 (office)
- Cornell University Ithaca, NY 14853 (USA) FAX: 607 255-4428
-