- Newsgroups: comp.unix.aix
- Path: sparky!uunet!portal!kdenning
- From: kdenning@portal.hq.videocart.com (Karl Denninger)
- Subject: Re: AIX malloc and fault tolerance
- Message-ID: <1992Sep4.044125.861@portal.hq.videocart.com>
- Summary: I've been nailed by this here.
- Organization: VideOcart Inc.
- References: <1992Sep3.135156.9166@medtron.medtronic.com>
- Date: Fri, 4 Sep 1992 04:41:25 GMT
- Lines: 70
-
- In article <1992Sep3.135156.9166@medtron.medtronic.com> sh0001@israel (Scott Hansohn) writes:
- >In AIX, a nonzero return from malloc does not guarantee that the
- >memory requested has been allocated to the process. Neither does
- >a successful launch of a program containing a large static buffer
- >guarantee that the buffer may be fully usable.
- >
- > program: Hey malloc! I'd like a few megabytes please.
- > malloc: No problem, here you go -- it's at this address.
- > program: Thanks very much. I'll just go and fill this memory, and --
- > malloc: Just kidding! I didn't really give you the memory. And by
- > the way, you're going to die if you continue to use it. Ha!
- >
- >It turns out that when my program went to fill in the memory, it was sent
- >a SIGKILL by AIX. This was clearly a bug of some kind, so I reported it.
- >I sent a demo program that just malloc'd as big a buffer as it could get,
- >and then started zeroing a byte every 4k. When I started up a couple of
- >them, they all died. Some got bus errors, others were sent SIGKILL.
- >
- >IBM said that they don't allocate the paging space until it's needed, in
- >order to accommodate programs which ask for large amounts of memory that
- >they never use (some nonsense about sparse arrays).
- >
- >I have questions:
- > 1) Has anyone seen a system where static memory may not really be
- > there, or where a nonzero malloc doesn't guarantee the successful
- > usage of the memory?
-
- Yeah, on AIX :-)
-
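For anyone who wants to try this on their own box, here is a minimal sketch of the kind of demo program described above: grab a buffer, then zero one byte every 4k. It is not Scott's actual program, which was not posted. The function names (touch_pages, alloc_and_touch) and the 4096-byte page size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Write one zero byte into each page of buf so the system has to back
 * the page with real storage. Returns the number of pages touched.
 * The page size is passed in by the caller; 4096 is an assumption,
 * not something queried from the system.
 */
size_t touch_pages(volatile char *buf, size_t size, size_t page)
{
    size_t off;
    size_t touched = 0;

    assert(page > 0);
    for (off = 0; off < size; off += page) {
        buf[off] = 0;
        touched++;
    }
    return touched;
}

/*
 * Allocate size bytes and touch every page of the result.
 * Returns the number of pages touched, or 0 if malloc() returned NULL.
 */
size_t alloc_and_touch(size_t size, size_t page)
{
    char *buf = malloc(size);
    size_t n;

    if (buf == NULL)
        return 0;
    n = touch_pages(buf, size, page);
    free(buf);
    return n;
}
```

Calling alloc_and_touch() with a few megabytes from several processes at once is the experiment. On a system that defers paging-space allocation, the process can be killed inside touch_pages() even though malloc() returned non-NULL, so a successful return from alloc_and_touch() only tells you that this particular run survived.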
- > 2) Has anyone heard of SIGDANGER before?
-
- You bet. I've had entire PRODUCTION SYSTEMS come crashing down because of
- this. Believe it or not, I have had >curses< programs get SIGDANGER when
- the machine was heavily loaded. The result is a nice program crash.
-
- Further, I have had processes receive SIGDANGER when only one of two paging
- spaces on the system was close to filling. It seems that if >any< page
- space on an AIX box gets close to being full you get the SIGDANGER, even if
- there are other paging spaces with lots of room left. Yikes! So much for
- the performance advantages of spreading the page space across spindles!
-
- > 3) Read the POSIX standard for malloc from a "legal" standpoint.
- > If IBM claims POSIX compliance, can I use this as a weapon?
-
- Probably ;-)
-
- > 4) Even if I use this malloc wrapper everywhere in my own code,
- > how do I deal with third-party code I purchase that calls the
- > unwrapped malloc?
-
- You don't, other than to allow it to die. Oh, you had something >important<
- in that program going on, like perhaps a financial transaction? Too bad --
- that SIGKILL you just received can't be caught! So much for reliable
- software.
-
- This is one of the reasons I hate AIX. There are lots of them, but this is
- definitely one of the top 5. When malloc() returns non-NULL, you are supposed
- to have the space available >period<. Same is true for static arrays -- I
- typically will declare these for things I >must< be able to get at and can't
- afford a NULL malloc() return for. It is quite a surprise to get SIGDANGER
- or SIGKILL when you don't expect it, and have no way to deal with it.
- Oh, you mean that large static array I have declared really >can't< be
- used, and you won't tell me ahead of time?! Oh, that array is declared in a
- library (like internal to Curses)? Now what the hell do I do about it?
-
- This is a >big< problem on heavily-loaded machines.
-
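Unlike SIGKILL, SIGDANGER can be caught, so code you control can at least notice the warning. What follows is a minimal sketch of that, not a fix: it does nothing for third-party code or for the SIGKILL that may follow. The names LOW_PAGING_SIGNAL, install_low_paging_handler and paging_space_was_low are made up for illustration, and on systems that do not define SIGDANGER the sketch falls back to SIGUSR1 purely as a stand-in so that it compiles.

```c
#include <signal.h>

/* SIGDANGER is AIX-specific; SIGUSR1 is only a stand-in elsewhere. */
#ifdef SIGDANGER
#define LOW_PAGING_SIGNAL SIGDANGER
#else
#define LOW_PAGING_SIGNAL SIGUSR1
#endif

/* Set by the handler, polled and cleared by the main program. */
static volatile sig_atomic_t paging_space_low = 0;

/* Do as little as possible in the handler: just record the event. */
static void on_low_paging(int signo)
{
    (void)signo;
    paging_space_low = 1;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_low_paging_handler(void)
{
    if (signal(LOW_PAGING_SIGNAL, on_low_paging) == SIG_ERR)
        return -1;
    return 0;
}

/*
 * Returns 1 if the warning has arrived since the last call, else 0.
 * The flag is cleared on read so each warning is reported once.
 */
int paging_space_was_low(void)
{
    int was_low = paging_space_low;

    paging_space_low = 0;
    return was_low;
}
```

A program would call install_low_paging_handler() once at startup and check paging_space_was_low() at safe points in its main loop, for instance to finish the transaction in progress and release memory it can do without. Whether that buys enough time before a SIGKILL arrives is not something this sketch can promise.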
- --
- Karl Denninger Inet: kdenning@hq.videocart.com
- VideOcart Inc. Voice: (312) 987-5022
-