- Newsgroups: comp.sys.isis
- Path: sparky!uunet!caen!batcomputer!cornell!ken
- From: ken@cs.cornell.edu (Ken Birman)
- Subject: Re: join never returns
- Message-ID: <1992Nov12.012210.29358@cs.cornell.edu>
- Organization: Cornell Univ. CS Dept, Ithaca NY 14853
- References: <Af05oZO00hNSI1DYZ2@cs.cmu.edu>
- Date: Thu, 12 Nov 1992 01:22:10 GMT
- Lines: 40
-
- In article <Af05oZO00hNSI1DYZ2@cs.cmu.edu> Sean Levy <snl+@cs.cmu.edu> writes:
- (description of a join problem)
-
- I can see from your log that everything is piling up waiting for
- replies from one or two of your clients (e.g., "sent to xxxx, status W"
- means "waiting for a reply from xxxx"). But without logs from
- xxxx, I can't tell why.
-
- Some random ideas: if TCP channel breakage is not always detected
- reliably on your systems (we commonly see this on SUN systems, for
- example) and isis_probe isn't set, Isis might fail to notice that
- xxxx is dead, and so hang. But I bet that this is not the problem:
- V2.2.7 and V3.0.7, at least, would not have such a problem.
-
- Some evidence that your TCP is having trouble is the failure to restart
- after shutting down: it seems that UNIX is not deallocating the TCP
- data structure in the kernel, and hence Isis can't reopen it.
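-
- One way to check whether the kernel is still holding a TCP port after
- shutdown is to try binding it from a small test program. The sketch
- below is only an illustration and is not part of Isis: the function
- name port_is_free is hypothetical, and the port to test is whatever
- your site has configured (no particular number is assumed here).

```python
import errno
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port).

    Hypothetical helper for illustration only; it is not an Isis call.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # bind() fails with EADDRINUSE while the kernel still holds the port.
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise  # any other error is unexpected; let the caller see it
    finally:
        probe.close()
    return True
```

- If this keeps returning False well after every process using the port
- has exited, that would be consistent with the kernel not releasing the
- TCP data structure.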
-
- SUN had problems in this part of TCP in one of their releases a while
- back. If you are on ISIS V2.2.5 on a SUN 4.1.1c platform, for example,
- this could explain it. But a later release of either SUN OS or ISIS
- would probably not have this problem.
-
- Another idea: suppose you join multiple groups in different orders,
- say p joins A and then B, while q joins B and then A. If they don't
- call isis_start_done FIRST, then they can deadlock, because p needs to
- help q on its join and vice versa. You would only see this in
- "concurrent" join situations. This could explain why adding some extra
- groups caused the problem: maybe you did so in a way that introduced a
- cyclic join pattern?
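-
- The cyclic pattern can be checked on paper, or with a small script.
- The sketch below is only a model of the join orders and makes no Isis
- calls; the name find_join_cycle and the input layout (process name
- mapped to the list of groups it joins, in order) are assumptions made
- for illustration. It reports a cycle when two or more processes join
- the same groups in conflicting orders.

```python
def find_join_cycle(join_orders):
    """Return a list of groups forming a cyclic join order, or None.

    join_orders maps a process name to the groups it joins, in order.
    Hypothetical helper: it models join orders only, not Isis itself.
    """
    # Edge a -> b means some process joins group a before group b.
    edges = {}
    for groups in join_orders.values():
        for earlier, later in zip(groups, groups[1:]):
            edges.setdefault(earlier, set()).add(later)
            edges.setdefault(later, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {group: WHITE for group in edges}
    path = []

    def visit(group):
        color[group] = GREY
        path.append(group)
        for nxt in sorted(edges[group]):
            if color[nxt] == GREY:
                # Reached a group already on the current path: a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[group] = BLACK
        return None

    for group in sorted(edges):
        if color[group] == WHITE:
            cycle = visit(group)
            if cycle:
                return cycle
    return None
```

- For the example above, find_join_cycle({"p": ["A", "B"], "q": ["B", "A"]})
- reports the cycle A, B, A, while two processes that both join A and
- then B produce no cycle.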
-
- Did you find client-created xxxx.log files after your snapshot? When
- you see protos log files that show people waiting for certain programs
- to take an action, the next step is to have a close look at the state
- of those programs...
-
- --
- Kenneth P. Birman E-mail: ken@cs.cornell.edu
- 4105 Upson Hall, Dept. of Computer Science TEL: 607 255-9199 (office)
- Cornell University Ithaca, NY 14853 (USA) FAX: 607 255-4428
-