NetNews Usenet Archive 1992 #16

Internet Message Format | 1992-07-25 | 7.3 KB

Path: sparky!uunet!zephyr.ens.tek.com!uw-beaver!cornell!ken
From: ken@cs.cornell.edu (Ken Birman)
Newsgroups: comp.sys.isis
Subject: Re: A question about network partition on process groups
Message-ID: <1992Jul25.004433.3234@cs.cornell.edu>
Date: 25 Jul 92 00:44:33 GMT
References: <1992Jul23.190006.3551@fig.citib.com>
Organization: Cornell Univ. CS Dept, Ithaca NY 14853
Lines: 123

In article <1992Jul23.190006.3551@fig.citib.com> kpt@fig.citib.com (Kevin P. Tyson) writes:
>I am very new to ISIS so please forgive me if this is a FAQ. I am interested
>in using ISIS to implement replicated resource managers. My understanding is
>that the Client-Server Group mechanism is the technique to use for this type
>of processing and that the logging facility is the mechanism to use when new
>replicas need to be added to the group.

It depends on what you are trying to do. The logging tool is normally NOT
needed, because most services of this sort don't need to preserve state
across total failures. Instead, they tend to restart a failed server on
some reasonable machine (perhaps picking it with the "network resource
manager") and the new server does a pg_join and becomes part of the
service group in that way.

As for this being a client-server group: this would certainly be a group
with clients. However, they might not be registered using the ISIS
pg_client system call. For example, I know that Citibank is a DCE user,
and for your setting it might be preferable to use a DCE RPC to talk to a
server within the group, perhaps the one on your file server or otherwise
picked to be in a sensible place, and this might use ISIS internally to
maintain synchronized state wrt. other members of the service. So, you
would see a standard DCE interface from the outside, but could use ISIS
on the inside. Since ISIS can run under pthreads, this should work
smoothly with no special problems relative to the DCE startup sequence...

Generally, the case where we use the term "client-server group" and where
you need to call pg_client in the client program is when the client will
receive diffusion multicasts from the group, or will need to multicast
atomically into the group, say if it was going to subdivide the search of
a database among its members. Here, the pg_client call is necessary
because the multicast will otherwise take an inefficient route, and over
time you would prefer not to pay that inefficiency repeatedly.

One annoyance with ISIS is that it offers a lot of options. In principle,
you should be able to implement any architecture that appeals to you and
that makes sense...

>
>My question concerns the effect of network partition on the server members of
>the process group. Suppose we have a server process group running on four
>nodes on a single ethernet segment. Each node has at least the Protos process
>and the server process running. Each server process is responding to all rpc
>calls made to the group. At some point the segment is split into two segments
>of two nodes each. I assume that the Protos processes on each node will
>detect the missing nodes and on the segment which now lacks the primary server
>will promote one of the two remaining server processes to the role of primary.
>The result of this will be two server process groups, each of which will
>continue to operate.

Actually, no. In this situation ISIS will only allow the partition with
the most nodes (or the heaviest "weight", if you use the new "weight="
feature in the sites file) to continue execution. The nodes that are
partitioned away will see a disconnect from the system, e.g. their
isis_failed() procedures will be called, and will need to reconnect to
the system later when the partition heals. This is illustrated in the
manual. The point is that ISIS will never allow the same group to be
duplicated by a partition.

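To make the rule concrete, here is a minimal sketch in Python (not ISIS
code) of a weighted-majority test that lets at most one side of a
partition continue. The function name, the site names and the weights are
hypothetical. The sketch also assumes that a side must hold strictly more
than half of the total weight, so an even split lets neither side
continue; the post does not say how ISIS itself treats a tie.

```python
def may_continue(partition, weights):
    """Return True if `partition` holds strictly more than half the total weight.

    Assumption for this sketch: a strict majority is required, so two
    sides can never both pass the test.
    """
    total = sum(weights.values())
    mine = sum(weights[site] for site in partition)
    return 2 * mine > total


# Hypothetical four-site system, one unit of weight per site.
weights = {"a": 1, "b": 1, "c": 1, "d": 1}
three_one = (may_continue({"a", "b", "c"}, weights),
             may_continue({"d"}, weights))
two_two = (may_continue({"a", "b"}, weights),
           may_continue({"c", "d"}, weights))

# Giving site "a" extra weight breaks the 2-2 tie in favour of its side.
weighted = {"a": 2, "b": 1, "c": 1, "d": 1}
tie_broken = (may_continue({"a", "b"}, weighted),
              may_continue({"c", "d"}, weighted))
```

Under these assumptions the three-site side of a 3-1 split continues and
the single site does not. An unweighted 2-2 split, as in the question,
leaves neither side with a majority, and putting extra weight on one site
breaks the tie in favour of its side.
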
If you are looking at a setting where partition is a real risk, the
solution is to run ISIS separately on each of the two (or more) chunks.
Each ISIS can independently support an instance of your service, say a
"wide-area database service", and with the wide-area tools you can then
multicast to the set of groups. A simpler way to deal with a WAN,
however, is to use the NEWS system as a wide-area communication tool,
since it is easy to run NEWS in a WAN configuration. Processes in the
various LAN systems would monitor the important WAN topics and, for
example, update the local database using what they see.

One can't do better: Skeen proved this a few years ago. There is a basic
choice between waiting for the partition to heal and making progress
during a partition, and the cost we pay for going the latter route is
that we can't allow a group to straddle a partition. On the other hand,
we could hide this better. In the future (1993 or 1994) we will introduce
a new ISIS system that will have a more uniform interface regardless of
whether you work in the LAN or WAN setting. This will at least look
better, although it won't actually overcome Skeen's impossibility
results. Databases have the same problem, by the way.

>
>At some point the segments will have to be re-united. What support is there
>in ISIS for detecting the changes that may have gone to one but not both of
>the server groups while the network was partitioned? What support is there
>for merging the two server groups into a single server group?
>

Well, if you use my recommendation -- WAN news -- you have a very strong
guarantee that when the partition merges, stacked up messages will be
despooled and replayed into the local representative for each group. So,
with WAN news, you know that important data always gets through
eventually, in the order sent, and without duplication or other errors.
You can also get the equivalent behavior at a lower level, using the WAN
interfaces to the spooler.

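The spool-and-replay guarantee can be pictured with a small Python model.
This is only a sketch of the general technique (sequence numbers on the
sending side, duplicate suppression on the receiving side). It is not the
ISIS NEWS or spooler interface, and the class names, method names and
sample messages are all hypothetical.

```python
from collections import deque


class Receiver:
    """Local representative that applies each message once, in order."""

    def __init__(self):
        self.expected = 0   # sequence number of the next message to apply
        self.applied = []

    def deliver(self, seq, body):
        if seq < self.expected:
            return False    # duplicate of something already applied
        if seq > self.expected:
            raise ValueError("gap in sequence; sender must resend in order")
        self.applied.append(body)
        self.expected += 1
        return True


class Spool:
    """Sender-side queue that holds messages while the link is down."""

    def __init__(self):
        self.next_seq = 0
        self.pending = deque()

    def post(self, body):
        self.pending.append((self.next_seq, body))
        self.next_seq += 1

    def drain(self, receiver, link_up):
        """Replay pending messages in the order posted; return how many were sent."""
        sent = 0
        while link_up and self.pending:
            seq, body = self.pending[0]
            receiver.deliver(seq, body)
            self.pending.popleft()   # forget a message only after delivery
            sent += 1
        return sent


spool, local_rep = Spool(), Receiver()
for update in ["open acct 17", "credit 17 +100", "debit 17 -40"]:
    spool.post(update)               # partition in effect: nothing is sent yet

sent_while_down = spool.drain(local_rep, link_up=False)
sent_after_heal = spool.drain(local_rep, link_up=True)
replayed_twice = local_rep.deliver(0, "open acct 17")   # stale retransmission
```

In this model nothing leaves the spool while the link is down, the three
updates arrive in posting order once it comes back, and a late
retransmission of the first update is ignored rather than applied twice.
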
On the other hand, if you are really in a LAN system and, say, we are
looking at one machine that lost its connection to the local ether, the
partitioned programs will have been out of touch and will look like newly
created programs when they reconnect. They need to rejoin process groups,
resubscribe to news topics, etc.

In this setting, the program that was disconnected can save up things to
play into the system (e.g. trades that your brokers and bankers entered
while in disconnected mode) but can't be sure, without checking, what the
outcome was on a trade that was in progress when the connection failed.

News will save up and replay messages, under your control. But, since the
process is treated like a new arrival, you don't see a guarantee that
when the connection is restored everything that you would have seen gets
replayed. One reason is sheer volume of data. I don't know what Citibank
is like, but we are talking to places with 100's or 1000's of
transactions per second, system wide, and a program partitioned away
could have missed megabytes while disconnected. Isis doesn't have a good
place to save all that stuff... if you ask news to do so, it will, but
this makes it your decision to save it, and your problem to find space,
and your responsibility to say how long to keep it...

Hope this helps! Feel free to ask for clarification. I'm sure a lot of
people are interested in issues like this.

-- 
Kenneth P. Birman                              E-mail: ken@cs.cornell.edu
4105 Upson Hall, Dept. of Computer Science     TEL: 607 255-9199 (office)
Cornell University Ithaca, NY 14853 (USA)      FAX: 607 255-4428