NetNews Usenet Archive 1993 #3

Text File | 1993-01-22 | 6.0 KB | 109 lines

Newsgroups: comp.sys.isis
Path: sparky!uunet!srvr1.engin.umich.edu!batcomputer!cornell!ken
From: ken@cs.cornell.edu (Ken Birman)
Subject: Lazy replication
Message-ID: <1993Jan22.165709.28909@cs.cornell.edu>
Organization: Cornell Univ. CS Dept, Ithaca NY 14853
Date: Fri, 22 Jan 1993 16:57:09 GMT
Lines: 99

I've gotten a few inquiries about the comparison with Isis in a recent TOCS paper by Rivka Ladin and some others, titled "Providing High Availability Using Lazy Replication". The paper appeared in the Nov. 1992 issue of ACM TOCS, which was just sent out.

This is a nice paper and describes some very interesting work, but the comparison with Isis is slightly inaccurate in some ways. Also, I think the paper uses unusually strong language in some of these comparisons, but that's their decision; they wrote it. (By the way, I had no role in handling or reviewing this paper at TOCS.)

- The paper says that Isis sends more messages than the LR scheme when a client talks to a group. Actually, this seems to depend on the value of a parameter called K in their scheme, and on the replication approach you use in our case. It's true that in Isis we normally replicate data among all the members of a group, so that each member has a valid local copy, and in the LR scheme you don't do this. But if you did, the costs of the two methods would then be identical. So the real point is not so much that one scheme is fundamentally more expensive than the other, but rather that we make a design choice in the way we replicate data for Isis, which they make differently in the LR approach.

- Timestamp overhead: well, yes, if you want multi-group causality, which LR doesn't guarantee, we do carry timestamps, or we delay when switching from group to group. But we also compress timestamps (in Horus), which makes them quite small, and use other tricks to avoid putting timestamps in messages unless we need to. The evidence is that timestamp overhead can be kept to something very minor.
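
As an illustration of why timestamp overhead can be kept small, here is a minimal Python sketch of one generic compression idea: on a given channel, send only the vector-timestamp entries that changed since the previous message. This is an assumption made for illustration, not the scheme actually used in Horus or Isis, which the post does not describe. The class names DeltaSender and DeltaReceiver are hypothetical, and the sketch assumes a reliable FIFO channel between the two ends.

```python
class DeltaSender:
    """Sends only the vector-timestamp entries that changed since the
    previous message on this channel (assumed reliable and FIFO)."""

    def __init__(self, n):
        self.last_sent = [0] * n  # what the receiver is known to hold

    def encode(self, vt):
        # Keep only the positions whose value differs from the last send.
        delta = {i: v for i, v in enumerate(vt) if v != self.last_sent[i]}
        self.last_sent = list(vt)
        return delta


class DeltaReceiver:
    """Rebuilds the full vector timestamp from successive deltas."""

    def __init__(self, n):
        self.current = [0] * n

    def decode(self, delta):
        for i, v in delta.items():
            self.current[i] = v
        return list(self.current)


if __name__ == "__main__":
    sender, receiver = DeltaSender(4), DeltaReceiver(4)
    for vt in ([1, 0, 0, 0], [2, 0, 0, 0], [2, 0, 1, 0]):
        delta = sender.encode(vt)
        assert receiver.decode(delta) == vt
        print(len(delta), "of", len(vt), "entries sent")
```

When one sender is active at a time, each message carries a single changed entry no matter how large the group is, which is the kind of saving the paragraph above alludes to.
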
So, again, the real point seems to be a design choice: Isis, by default, provides an ordering property that LR doesn't provide, and this costs something, although it also gives you something. If you don't want Isis to provide multi-group causality you can tell it so and the cost goes away -- in Isis V3.0 this is the "O" option, and in Horus you will do it with something called a causal domain. In the case where Isis is asked to act like LR, I think the overhead is roughly the same.

- The observation about delays during view changes is correct; Isis V3.0 does delay in this way. Talking to the Transis people (Yair Amir), it became clear to us that we don't need to delay at this stage of our flush protocol -- the correctness proof remains valid even if we just leave out the delay(!), and Robbert van Renesse coded the Horus implementation of the flush to use this trick. Good idea on their part, and imitation is the sincerest flattery. So: good point, we fixed our protocol to do the same thing when Yair Amir suggested this. I was not aware that LR was doing this too, but it makes a lot of sense. A shame that Pat and Andre and I didn't notice that we weren't making any real use of the delay in the proof of correctness...

- Network partitions: this is a bit more complicated. The LR scheme logs updates, and it has a notion of group member in which the same process may come back after a crash and you resume talking to it by replaying logged messages. Isis, on the other hand, makes such a process "rejoin" the group and basically forces it to restore its state from scratch. With the Isis spooling tool, you could get the same replay behavior if you like. We don't normally do this, though. If a partition is a real risk, we prefer to use the long-haul tool over the flaky link and run Isis separately on both sides. This is a design preference that makes life simpler for us. Anyhow, LR and Psync both have this spooling scheme, so yes, they both "tolerate" partitions, in a sense.
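
The two recovery styles contrasted above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Isis, LR, or Psync API: the Replica class, the two recover functions, and the representation of an update as a (key, value) pair are all hypothetical.

```python
class Replica:
    """Toy replica whose state is a key/value map."""

    def __init__(self):
        self.state = {}

    def apply(self, update):
        key, value = update
        self.state[key] = value


def recover_by_replay(stale, log, applied_count):
    """Log-replay style: the returning process keeps its old state and
    applies only the logged updates it missed while it was away."""
    for update in log[applied_count:]:
        stale.apply(update)
    return stale


def recover_by_rejoin(live):
    """Rejoin style: the returning process discards its old state and
    copies a fresh one from a current group member."""
    fresh = Replica()
    fresh.state = dict(live.state)
    return fresh


if __name__ == "__main__":
    log = [("x", 1), ("y", 2), ("x", 3)]

    live = Replica()
    for update in log:
        live.apply(update)

    stale = Replica()
    stale.apply(log[0])  # saw only the first update before failing

    replayed = recover_by_replay(stale, log, applied_count=1)
    rejoined = recover_by_rejoin(live)
    assert replayed.state == rejoined.state == live.state
```

Both paths end in the same state. They differ in what must be kept around (a durable log versus a live member to copy from) and in how much data moves at recovery time.
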
But none of the three systems can actually allow updates in an unrestricted way while the partition is present, and in fact you can build exactly the same application in any of these schemes: Isis, LR, or Psync. I can't see why the performance would be expected to be different, although the issue here is one of engineering -- how well our logging scheme works, etc. I assume that any of these systems would be disk-I/O limited in this mode.

To summarize: the LR paper is a very nice paper and describes a good piece of work, and our own work has been influenced by it in several ways. But I doubt that either system is particularly faster than the other. I think the contrast is more interesting at the level of the design choices that were made, and the decisions about what the default behaviors of the systems should be -- these are quite different, and they do have performance, functionality, and user-transparency implications. We have tended to err on the side of making Isis more costly but simpler to use. If people are routinely forced to turn off Isis ordering because of this cost, perhaps the LR design point would make more sense. However, I am not aware of any widespread trend along these lines.

Frankly, I think the main lesson I have learned in 3 years of struggling with speed issues in Isis is that very little about performance has anything to do with which protocol you use. The real problems are flow control, where you run relative to UNIX and to the devices, memory management, when you send acks, when you retransmit dups, whether UNIX is losing packets... all that grungy stuff. At least, that's my insight. Not a very deep insight, unfortunately!

-- 
Kenneth P. Birman                            E-mail: ken@cs.cornell.edu
4105 Upson Hall, Dept. of Computer Science   TEL: 607 255-9199 (office)
Cornell University, Ithaca, NY 14853 (USA)   FAX: 607 255-4428