- Path: sparky!uunet!charon.amdahl.com!pacbell.com!mips!sdd.hp.com!uakari.primate.wisc.edu!ames!riacs!pioneer.arc.nasa.gov!lamaster
- From: lamaster@pioneer.arc.nasa.gov (Hugh LaMaster)
- Newsgroups: comp.unix.aix
- Subject: Re: runaway processes (was Re: Vi is still broken)
- Message-ID: <1992Aug19.163218.11786@riacs.edu>
- Date: 19 Aug 92 16:32:18 GMT
- References: <1992Aug17.163739.29534@APS.Atex.Kodak.COM> <133789@lll-winken.LLNL.GOV> <1992Aug19.132117.5939@msc.cornell.edu> <1992Aug19.145808.2342@murdoch.acc.Virginia.EDU>
- Sender: news@riacs.edu
- Organization: RIACS, NASA Ames Research Center
- Lines: 65
-
- In article <1992Aug19.145808.2342@murdoch.acc.Virginia.EDU>, scl@sasha.acc.Virginia.EDU (Steve Losen) writes:
- |> [ complaints about runaway processes eating up cpu time ]
- |>
- |> In article <1992Aug19.132117.5939@msc.cornell.edu>,
- |> |>
- |> |> Yup. We see this kind of thing frequently. Since we are about to start
- |> |> chargeback accounting, this will be a severe pain.
- |> |>
- |> |> Our most recent method of producing the problem is to close a window in
- |> |> which a process on the receiving end of a pipe is running.
- |> |>
- |> |> This has been going on for years but has lately become intolerable. I
- |> |> must now try to come up with a reproducible example and phone up IBM
- |> |> to see if it's a 'supported defect' :-}
- |>
- |> We've had this problem since day one way back at AIX 3.1. No AIX upgrade
- |> has fixed it yet. In our case, the runaways are all interactive jobs such
- |> as editors, mail readers, news readers, etc., and they all seem to happen
- |> when a telnet session ends abnormally.
-
-
- I had assumed that this problem dates from way back, probably from 4.2 BSD.
- I have seen it for years. Mail readers seem especially
- vulnerable. I have experienced the same or a similar problem on SunOS and
- Ultrix as well, including very recently (SunOS 4.1.1 and Ultrix 4.2).
-
- |>
- |> Very early on I wrote a perl script that runs "ps caux" every few minutes
- |> and looks for runaways. I have a "hit list" of interactive commands that
- |> are known to run away, including vi, jove, more, less, telnetd, rlogind,
- |> mail, mush, etc. The script kills off any of these commands if ps
- |> indicates that it is using over 9% of the cpu and has accumulated 2
- |> minutes of cpu time. I just pulled these heuristics out of thin air, but
- |> they have worked well on several loaded 540s and 550s.
- |>
- |> Sure beats getting called up several times a day to kill these things off.
- |>
- |> I would post the perl script, but it has grown very large because it does
- |> a whole lot of other stuff such as renicing long running cpu burners,
- |> detecting when a user is running >1 cpu burner at a time, etc. Also, I
- |> will have to fix the script to run under 3.2. IBM has changed the output
- |> format of ps. Thankfully the new format is easier to parse. Under 3.1.5,
- |> some of the fields can run together. I think under 3.2, you are always
- |> guaranteed at least one space of separation.
-
- This sounds useful. If you are feeling generous, you might post it to
- comp.sources.something one of these days. A systematic, generic problem
- like this ought to be cleaned up by the vendors if they expect to be able
- to market clusters of Unix boxes as alternatives to mainframes. In a
- production shop you can't very well just go killing off processes which
- happen to exceed the confines of a heuristic, but for now, it could be
- very useful.
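
For anyone who wants to experiment in the meantime, the heuristic quoted above (a hit list of commands, over 9% of the cpu, and 2 minutes of accumulated cpu time) might be sketched roughly as follows. This is a minimal Python sketch, not Steve's perl script. The names HIT_LIST, cpu_seconds, and find_runaways are hypothetical. It assumes a ps listing whose header contains PID and %CPU columns, whose last two columns are TIME and COMMAND, and whose fields are separated by whitespace, which by the quoted account may not always hold under 3.1.5. It only reports candidates and kills nothing.

```python
# Hypothetical sketch of the quoted heuristic; the column layout is an assumption.
HIT_LIST = {"vi", "jove", "more", "less", "telnetd", "rlogind", "mail", "mush"}
CPU_PERCENT_LIMIT = 9.0   # "over 9% of the cpu"
CPU_SECONDS_LIMIT = 120   # "2 minutes of cpu time"


def cpu_seconds(field):
    """Convert a TIME field such as '2:05' or '1:02:05' to seconds."""
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def find_runaways(ps_output):
    """Return (pid, command) pairs that match the heuristic."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_col = header.index("PID")
    cpu_col = header.index("%CPU")
    runaways = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < len(header):
            continue  # fields probably ran together; skip rather than guess
        try:
            pid = int(fields[pid_col])
            percent = float(fields[cpu_col])
            # Assumed: TIME and COMMAND are the last two columns.
            seconds = cpu_seconds(fields[-2])
        except ValueError:
            continue  # unparseable row (unexpected format); skip it
        command = fields[-1]
        if (command in HIT_LIST
                and percent > CPU_PERCENT_LIMIT
                and seconds >= CPU_SECONDS_LIMIT):
            runaways.append((pid, command))
    return runaways
```

Feeding it the text of a ps listing returns the candidate list, which an operator could review by hand before deciding whether to kill anything.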
-
-
- |>
- |> --
- |> Steve Losen scl@virginia.edu
- |>
- |> University of Virginia Academic Computing Center
-
- --
- Hugh LaMaster, M/S 233-9, UUCP: ames!lamaster
- NASA Ames Research Center Internet: lamaster@ames.arc.nasa.gov
- Moffett Field, CA 94035-1000 Or: lamaster@george.arc.nasa.gov
- Phone: 415/604-1056 #include <usenet/std_disclaimer.h>
-