- Newsgroups: comp.sys.apollo
- Path: sparky!uunet!utcsri!helios.physics.utoronto.ca!alchemy.chem.utoronto.ca!system
- From: system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson))
- Subject: Re: Problems with 10.4
- Message-ID: <1992Jul23.144508.4401@alchemy.chem.utoronto.ca>
- Organization: University of Toronto Chemistry Department
- References: <14kv48INNh5r@agate.berkeley.edu>
- Date: Thu, 23 Jul 1992 14:45:08 GMT
- Lines: 36
-
- In article <14kv48INNh5r@agate.berkeley.edu> alanc@ocf.berkeley.edu writes:
- >We recently finished upgrading our cluster of 16 machines (2 DN 4500's + 14
- >3500's) to 10.4 - and since then we've been having several problems:
- >
- >- csh/tcsh will go into a "can't find any commands" mood, from which the only
- > way to do anything is exec /bin/ksh. (It seems to be linked to the
- > tty's as some tty's have the problem & others don't...rebuilding the
- > tty's "cures" the problem for a few days, but then it comes back)
-
- You don't happen to be running NFS on these systems, do you?
- Since I moved our home directories to an HP-UX system, our DN2500 systems
- have become so flaky when doing basic things like 'rn' that they are
- useless; the problem seems to be that they cannot reliably spawn
- subshells (since my .cshrc file has to be accessed over NFS).
-
- >- processes aren't dying properly. For example a user telnet's in, and
- > starts reading news. The telnet will die, but it's child csh and the
- > trn will keep going. In fact, the last process in the chain will
- > start using huge amounts of processor time. The only solution we've
- > found for this so far is to check the process table every 10 minutes
- > and kill -HUP every process who's unix_PPID is 1 (and is not owned
- > by root/daemon/user/etc.) kill(-9, -1) is also failing at times.
-
- This one is caused by improper handling of signals in telnetd and
- rlogind (and ftpd and ...). The behaviour was changed at SR10.4 so that
- the members of a process group (which basically corresponds to a
- login session) are not signalled when the parent of the group
- dies. The child processes then start eating CPU (reason unknown).
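
The stopgap described in the quoted text (periodically sending HUP to user processes that have been re-parented to PID 1) could be sketched roughly as follows. This is only an illustration, not the original poster's script: the names `find_orphans`, `hangup_orphans` and `EXEMPT_USERS` are hypothetical, and the sketch assumes a `ps` that accepts POSIX-style `-e` and `-o` options, which may not match the `ps` shipped with SR10.4.

```python
import os
import signal
import subprocess

# Hypothetical list of accounts whose processes legitimately have PPID 1.
EXEMPT_USERS = {"root", "daemon", "user"}


def find_orphans(ps_output, exempt=EXEMPT_USERS):
    """Return PIDs whose parent is PID 1 and whose owner is not exempt.

    ps_output is text with one header line followed by lines of the
    form "PID PPID USER COMMAND".
    """
    orphans = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        pid, ppid, user = fields[0], fields[1], fields[2]
        if not (pid.isdigit() and ppid.isdigit()):
            continue
        if int(ppid) == 1 and user not in exempt:
            orphans.append(int(pid))
    return orphans


def hangup_orphans():
    """Send SIGHUP to every orphan found in the current process table."""
    # Assumption: this ps accepts the POSIX -e and -o options.
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,user,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in find_orphans(out):
        try:
            os.kill(pid, signal.SIGHUP)
        except (ProcessLookupError, PermissionError):
            pass  # process already gone, or not ours to signal
```

Something like `hangup_orphans()` would have to be run as root from cron at whatever interval is tolerable (the quoted text mentions every 10 minutes). It only cleans up after the fact; it does not restore the signalling of the process group when the session's parent dies.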
-
- I suggest you call HP if you are on support; we called this in back
- in April (Call # A1975057, A1975055, Escalation issue EPIC 1801).
- We are testing a fixed rlogind for the DN10000, but there are still no
- fixes for telnetd, or for the M680x0 nodes, after more than 2 months.
- --
- What are the chances that any computer system will ever "work" properly?
- ... and Slim just left town. -*- Mike Peterson, SysAdmin, U/Toronto Chemistry
-