NetNews Usenet Archive 1993 #3

Text File | 1993-01-23 | 4.4 KB | 88 lines

Newsgroups: comp.sys.isis
Path: sparky!uunet!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!aplcen.apl.jhu.edu!ddsdx2.jhuapl.edu!jmp
From: jmp@ddsdx2.jhuapl.edu (Jim Pierce)
Subject: RE: Challenges?
Message-ID: <1993Jan22.141440.14520@aplcen.apl.jhu.edu>
Summary: Client groups are a helpful model
Keywords: client groups
Sender: news@aplcen.apl.jhu.edu (USENET News System)
Organization: Johns Hopkins University
Date: Fri, 22 Jan 93 14:14:40 GMT
Lines: 75

This is posted in response to Ken Birman's request for challenges.

Let's start with the Isis client-server model. (If I screw this up, Ken will let me know.) The idea is that the server is a collection of processes which perform some service and have been replicated for fault tolerance and/or load balancing purposes. How many of these processes there are is normally of no concern to the client, and that is as it should be. A process can become a client of this group, request and receive services, and never have to be aware of membership changes in the server group unless all the members fail. If the client fails, the server members will all clean up any state they may have been keeping with respect to that client.

Now to the problem we have been discussing with Ken. Suppose our server is coordinating the assignment of some resource. In one of our servers, the resource is a number (from a limited set of numbers) which the server guarantees is system-wide unique. The client requests a number and returns it when it is finished with it. If the client fails, the server can recover the numbers assigned to that client.

But wait a minute. Suppose that client is itself a replicated application (server), for fault tolerance and load leveling purposes. The failure of the client process that requested the number from the server does not mean that the number is available for reuse. This is because the other members of the client's group want to keep on going and are managing the use of the number within their application.
Also, we want to be sure that the surviving members of the client's group know about any outstanding requests made by the failed member but not yet satisfied by the server.

Before going on, I should mention that we have been thinking almost entirely in terms of asynchronous xbcasts. This makes the solution somewhat simpler conceptually from the client side: if client members are shadowing each other's transactions with the server, the communication would be asynchronous for the shadows anyway.

Back to the main thread. Suppose we could tell Isis that all the members of the client's group were a single logical client of the server. I view this as pg_client with a group address instead of a process address. When a member of the client group sends a message to the server group, we would want it to be atomically sent to the other members of the client group as well. Now everybody knows everything. The server members would not care which member of the client group sent the message. When the server member(s) responsible for responding to the client's message send their response, it would go atomically to all the members of the client group as well as the server group. All members of both groups are always in the same logical state with respect to the services provided by the server group.

Suppose a client member fails. A group view change would occur in the client group. Survivors would reallocate their work or do whatever makes sense for their group. They would be assured that they had seen all server requests made by the failed member before it failed, and that any outstanding responses would be forthcoming without the request being repeated. Also, no responses made by the server would have been lost with the failed member, so numbers do not get lost. As for the server members, they don't get a group view change if there are surviving members of the client group.
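
The server-side bookkeeping this proposal implies can be sketched in a few lines. What follows is an illustrative Python sketch, not Isis code: NumberServer, join, request, release and member_failed are hypothetical names, the group and member identifiers are made up, and the atomic delivery to both groups that the proposal relies on is assumed rather than modeled. It only shows numbers being owned by the logical client (the group) and recovered when the last member is gone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NumberServer:
    """Hands out unique numbers, tracked per logical client (a client group)."""

    free: set[int]
    # group name -> ids of members still alive (hypothetical identifiers)
    members: dict[str, set[str]] = field(default_factory=dict)
    # group name -> numbers currently held by that logical client
    assigned: dict[str, set[int]] = field(default_factory=dict)

    def join(self, group: str, member: str) -> None:
        self.members.setdefault(group, set()).add(member)
        self.assigned.setdefault(group, set())

    def request(self, group: str, member: str) -> int:
        # Which member asked does not matter: the number belongs to the group.
        if member not in self.members.get(group, set()):
            raise KeyError(f"{member} is not a member of {group}")
        if not self.free:
            raise RuntimeError("no numbers left")
        number = min(self.free)
        self.free.remove(number)
        self.assigned[group].add(number)
        return number

    def release(self, group: str, number: int) -> None:
        self.assigned[group].remove(number)
        self.free.add(number)

    def member_failed(self, group: str, member: str) -> bool:
        """Return True only when the whole logical client has departed."""
        live = self.members[group]
        live.discard(member)
        if live:
            return False  # survivors keep every number the group holds
        # Last member gone: recover all numbers held by the logical client.
        self.free |= self.assigned.pop(group)
        del self.members[group]
        return True


server = NumberServer(free={1, 2, 3})
server.join("app", "a1")
server.join("app", "a2")
n = server.request("app", "a1")
server.member_failed("app", "a1")  # a2 survives, so n stays assigned
server.member_failed("app", "a2")  # last member gone, n is recovered
```

Keying the state by group rather than by process is the whole point of the sketch: the failure of the requesting member changes nothing on the server until no members remain.
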
When the last member of a client group fails, the servers could see a group view change showing GV_CLDEPARTED. On this event, all state related to the logical client could be cleaned up. In our example, all numbers assigned to that logical client could be recovered and reused.

We also want it to be fast and cheap.

Jim Pierce                      phone: (301) 953-6326
The Johns Hopkins University    fax:   (301) 953-6141 (unclass)
Applied Physics Laboratory      email: pierce@capsrv.jhuapl.edu
Johns Hopkins Rd.
Laurel, MD 20723