- Xref: sparky comp.unix.pc-clone.32bit:1020 biz.sco.general:5066
- Newsgroups: comp.unix.pc-clone.32bit,comp.unix.i386,biz.sco.general
- Path: sparky!uunet!mcsun!news.funet.fi!funic!nntp.hut.fi!nntp!Petri.Wessman
- From: Petri.Wessman@hut.fi (Petri Wessman)
- Subject: [SCO] execvp(2) seems to fail on shell scripts!
- Message-ID: <PETRI.WESSMAN.93Jan11182858@lk-hp-21.hut.fi>
- Sender: usenet@nntp.hut.fi (Usenet pseudouser id)
- Nntp-Posting-Host: lk-hp-21.hut.fi
- Reply-To: Petri.Wessman@inter.fi
- Organization: Inter Marketing Oy, Finland
- Distribution: comp
- Date: 11 Jan 93 18:28:58
- Lines: 51
-
- We've encountered a strange phenomenon with SCO Unix 3.2.4. We have an
- init-like program that keeps our other customer software running, and
- now it is failing mysteriously. The problem occurs only on SCO, not
- on AIX 3.1 or NCR Unix, our other currently supported platforms.
-
- The program basically does a fork + exec for each program it is
- monitoring, and blocks in wait(). If a child dies, it is restarted.
- The programs that it starts are /bin/sh scripts, and this seems to be
- the root of the problem. Things work fine for a while, but after an
- unknown interval of time (usually the next day), *something* goes
- wrong. After this point, it can't start a single one of the
- programs. The exec() call appears to succeed (it doesn't return an
- error), but the script is never executed (we've tested N+1 variations),
- and wait() reports "exited with status 0" for the child that was "started".
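-
- For anyone trying to reproduce this, here is a minimal C sketch of the
- fork + exec + wait loop described above. It is an assumption about the
- structure, not our actual code: it supervises a single program with
- waitpid(), and the names start_child() and supervise() are hypothetical.
- The max_starts bound exists only so the sketch terminates.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that runs `file` via execvp().
 * Returns the child's pid in the parent, or -1 if fork() failed. */
static pid_t start_child(const char *file, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(file, argv);
        /* execvp() returns only on failure: report it and exit with 127
         * so the parent can tell "exec failed" apart from a normal run. */
        fprintf(stderr, "execvp %s: %s\n", file, strerror(errno));
        _exit(127);
    }
    return pid;
}

/* Hypothetical supervisor loop: start the program, block until it dies,
 * then start it again. A real init-like program would loop forever;
 * max_starts bounds the loop here. Returns the number of children
 * started, or -1 if fork() or waitpid() fails. */
int supervise(const char *file, char *const argv[], int max_starts)
{
    int starts = 0;

    while (starts < max_starts) {
        pid_t pid = start_child(file, argv);
        if (pid < 0)
            return -1;
        starts++;

        int status;
        pid_t done;
        /* Retry the wait if a signal interrupts it. */
        while ((done = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
            continue;
        if (done < 0)
            return -1;
    }
    return starts;
}
```

- For example, calling supervise("true", argv, 3) with argv = {"true", NULL}
- starts true(1) three times and returns 3. With several monitored programs,
- the loop would call wait() instead and map the returned pid back to the
- program that needs restarting.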
-
- Here's the strange(r) part: when this has happened, if we replace one
- of the scripts with a binary executable, it starts up fine! We tried a
- binary that just printed its arguments and env and then slept, and
- everything looked fine... the arguments weren't mangled or anything,
- which was our initial suspicion. When we put the shell script back,
- it was glitch time again.
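-
- One way to narrow down the script-versus-binary difference, offered as an
- untested diagnostic rather than a fix: POSIX specifies that when execve()
- rejects a file with ENOEXEC, execvp() itself falls back to running it with
- the shell, and a kernel that understands "#!" runs the named interpreter
- directly. Exec'ing /bin/sh explicitly with the script as its argument takes
- both of those paths out of the picture. The helper name run_script_via_sh()
- is hypothetical, and the sketch passes no extra arguments to the script.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical diagnostic: run `script` by exec'ing /bin/sh directly,
 * bypassing execvp()'s PATH search and its ENOEXEC-to-shell fallback.
 * Returns the raw wait status, or -1 if fork() or waitpid() fails. */
int run_script_via_sh(const char *script)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *argv[] = { "sh", (char *)script, NULL };
        execv("/bin/sh", argv);
        /* Only reached if /bin/sh itself could not be executed. */
        fprintf(stderr, "execv /bin/sh: %s\n", strerror(errno));
        _exit(127);
    }

    int status;
    pid_t done;
    /* Retry the wait if a signal interrupts it. */
    while ((done = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
        continue;
    return done < 0 ? -1 : status;
}
```

- If the script runs correctly this way while the execvp() version still
- fails, the problem is more likely in how the script is located or
- dispatched than in the shell itself; if both fail, the shell becomes the
- more likely suspect.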
-
- It looks like a kernel bug, but I truly don't know for certain. What I *do*
- know is:
-
- a) The system works fine for a while, and then goes into this
- crazy state. If the init-clone is killed and restarted,
- everything is ok again (for a while).
-
- b) The exec arguments seem to be fine (and if they weren't, exec()
- should return an error code). It never does; exec() always
- seems to succeed. We use execvp(), by the way.
-
- c) The script that was "executed" by exec() returns at once with
- exit status 0. No commands whatsoever are run from the script
- itself.
-
- d) This occurs only on SCO (both 3.2.2 and 3.2.4). Nothing like
- this has been encountered on other platforms.
-
- e) The problem occurs only for shell scripts. Binary executables
- work fine.
-
- f) This is annoying *as hell*! :-( We're forced to restart
- programs by hand via a modem link for customers.
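-
- Points (b) and (c) both hinge on what wait() actually reports. A hedged
- sketch, assuming nothing about our real code: decode the status word with
- the standard W* macros, so that a genuine exit 0, an exec failure (the 127
- the child's error path below uses by convention) and death by a signal can
- be told apart. describe_status() and run_status() are hypothetical names.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Render a wait() status word as text, distinguishing a normal exit
 * (with its exit code) from termination by a signal. */
void describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status))
        snprintf(buf, len, "exited with status %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else
        snprintf(buf, len, "other status 0x%x", (unsigned)status);
}

/* Hypothetical helper: run `file` via execvp() and return the raw wait
 * status, or -1 if fork() or waitpid() fails. The child exits with 127
 * if execvp() returns, so an exec failure never looks like status 0. */
int run_status(const char *file, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(file, argv);
        fprintf(stderr, "execvp %s: %s\n", file, strerror(errno));
        _exit(127);
    }

    int status;
    pid_t done;
    /* Retry the wait if a signal interrupts it. */
    while ((done = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
        continue;
    return done < 0 ? -1 : status;
}
```

- Logging the describe_status() text on every restart would at least show
- whether the scripts really exit with 0 or are ending some other way.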
-
-
- Any and all help would be appreciated! Hasn't anyone encountered this
- before?
-
- //Petri
-