- Path: sparky!uunet!nntp1.radiomail.net!cronos!Metaphor.COM!polk
- From: polk@Metaphor.COM (Ben Polk)
- Newsgroups: comp.unix.aix
- Subject: Re: Vanishing processes
- Message-ID: <2479@cronos.metaphor.com>
- Date: 31 Aug 92 18:42:31 GMT
- References: <2457@cronos.metaphor.com> <1992Aug24.160419.17269@awdprime.austin.ibm.com>
- Sender: news@cronos.metaphor.com
- Reply-To: polk@Metaphor.COM (Ben Polk)
- Organization: m4
- Lines: 38
-
- In article <1992Aug24.160419.17269@awdprime.austin.ibm.com>, curt@ekhadafi.austin.ibm.com (Curt Finch 903 2F021 curt@aixwiz.austin.ibm.com 512-838-2806) writes:
- |> In article <2457@cronos.metaphor.com> polk@Metaphor.COM (Ben Polk) writes:
- |>
- |> >I posted recently with a description of a problem where at least three
- |> >different processes running on my machine are terminated by AIX. We
- |> >have verified that a Signal 9 is being sent by the kernel, and that
- |>
- |> the only way the kernel ever does this 2 u as far as i know is if
- |> your program mallocs lots of space it never frees, uses up all the
- |> paging space and then the kernel kills it to free up that space.
- |>
-
- Nope. There appears to be a bug either in the kernel or in one of the
- device drivers that causes this problem. We are working with AIX support
- to try to determine what exactly is going on.
-
- I made two changes to one of the processes that were having this problem,
- and it seems to have gone away:
-
- 1. Increased the size of the signal stack from 10k to 100k.
- 2. Fixed a place in my code where I was passing uninitialized stack data as
- the bit mask for a select() system call. This means that bits were set for
- fd's that I really wasn't interested in. This should not cause signal 9 to
- be delivered to you, but then if computers always did what we wanted and
- expected, we wouldn't need this newsgroup.
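-
- For the first change, here is a minimal sketch of setting up a 100k signal
- stack. It assumes the modern POSIX sigaltstack() interface, which is not
- necessarily the call the original program used on AIX. The names
- install_alt_stack, handle_on_alt_stack and ALT_STACK_BYTES are made up for
- illustration.

```c
/* Must come before any #include so that sigaltstack() and stack_t are visible. */
#define _XOPEN_SOURCE 700

#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* 100k, the enlarged size mentioned in the post (hypothetical constant name). */
#define ALT_STACK_BYTES (100 * 1024)

/* Allocate and install an alternate signal stack of ALT_STACK_BYTES.
 * Returns 0 on success, -1 on failure. */
int install_alt_stack(void)
{
    stack_t ss;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = malloc(ALT_STACK_BYTES);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = ALT_STACK_BYTES;
    ss.ss_flags = 0;

    if (sigaltstack(&ss, NULL) == -1) {
        free(ss.ss_sp);
        return -1;
    }
    return 0;
}

/* Register handler for signo so that it runs on the alternate stack.
 * Returns sigaction()'s result: 0 on success, -1 on failure. */
int handle_on_alt_stack(int signo, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;   /* without this flag the alternate stack is ignored */
    return sigaction(signo, &sa, NULL);
}
```

- Call install_alt_stack() once at startup, then handle_on_alt_stack() for
- each signal whose handler needs the extra room.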
-
- My belief is that it is the latter that corrected the problem, based on the
- fact that the signal 9 arrived while the process was in the select() system
- call, and that the select() failed with errno set to ENODEV. (Yes, you do
- return from a system call even if there is a signal 9 pending on your
- process.)
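-
- For the select() fix, a minimal sketch of the pattern that avoids passing
- stack garbage: clear the fd_set with FD_ZERO before setting any bits, and
- report errno when the call fails. The helper name wait_readable is made up
- for illustration; this is not the code from the program in question.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_sec seconds for fd to become readable.
 * Returns select()'s result: >0 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    /* FD_SET on an out-of-range descriptor is undefined, so reject it here. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EBADF;
        return -1;
    }

    /* An fd_set declared on the stack holds whatever was there before.
     * FD_ZERO clears every bit so only the descriptors we add are watched. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n == -1) {
        int saved = errno;   /* fprintf may change errno, so save it first */
        fprintf(stderr, "select: %s (errno %d)\n", strerror(saved), saved);
        errno = saved;
    }
    return n;
}
```

- Logging errno at the point of failure is what makes an unexpected status
- such as ENODEV visible in the first place.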
-
- It's definitely not a paging space problem. And while they may tell you that
- the only reason processes can vanish is because of paging space problems, or
- that they called exit(), or that you sent a signal with kill(), DON'T BELIEVE
- THEM.
-
- Ben Polk
- polk@metaphor.com
-