- Path: sparky!uunet!cis.ohio-state.edu!magnus.acs.ohio-state.edu!usenet.ins.cwru.edu!agate!dog.ee.lbl.gov!horse.ee.lbl.gov!torek
- From: torek@horse.ee.lbl.gov (Chris Torek)
- Newsgroups: comp.unix.wizards
- Subject: Re: Changing the owner of a process
- Date: 24 Nov 1992 03:20:20 GMT
- Organization: Lawrence Berkeley Laboratory, Berkeley
- Lines: 155
- Message-ID: <27630@dog.ee.lbl.gov>
- References: <1992Nov21.022833.24351@exlog.com>
- Reply-To: torek@horse.ee.lbl.gov (Chris Torek)
- NNTP-Posting-Host: 128.3.112.15
- Keywords: process ownership
-
- In article <1992Nov21.022833.24351@exlog.com> mcdowell@exlogcorp.exlog.com
- (Steve McDowell) writes:
- >Calm down Chris, there's no need for a personal exchange here. You don't
- >know me or what my background is, so be very careful before you presume
- >the wrong things.
-
- True (although the `exlog.com' is a bit of a giveaway :-) ). Perhaps I
- was overly grouchy and irritable---moving tends to do that.... I could
- have included a smiley or something.
-
- >>... I will not try to tell you how to sell commercial systems.
-
- >But you don't understand, I wish you would -- we really do need the help....
-
- Well, if you insist :-)
-
- >Or, in the case that my original article was in response to, you have a user
- >who fancies himself a "system programmer". I'm a big believer in the
- >condom approach to end-user computing -- let a user stroke himself all
- >he wants to but don't let his results impregnate the integrety of my
- >operating system.
-
- One can easily take this too far. For instance, it is possible on many
- (most? all?) Unix machines to open /dev/mem and alter the actual kernel
- code. (I did this once on a live multi-user VAX, to patch a bug in the
- old 4.1BSD CMU IPC. Quite an experience. :-) ) If someone replaces
- the kernel code, recovery is difficult---maybe not impossible (take a
- look at the techniques people use in CORE WARS), but certainly not
- worthwhile to everyone.
-
- >... If there's a situation that can be recovered from, then recover.
-
- This is a *very* hard problem in general. Some researchers argue that
- fault tolerance is in fact *too* hard, and that a `reboot and recover
- work-in-progress' approach is more tractable.
-
- What we do in practice (in both research and, in my previous
- incarnation at Univ. of MD, systems support) is try to anticipate
- `likely failures' and insert recovery code for these. In this case
- (per-UID process counts), the failure is unlikely, and the recovery
- code will mostly be `dead weight'. The benefit just does not match
- the cost. Note that costs and benefits differ for others; people
- who buy Tandem systems to run banks are willing to spend the extra
- money for dual or triple redundant systems, and willing to take
- speed reduction factors of 2 or more, to achieve higher reliability.
-
- Other things, like the `timeout table overflow' panic, are symptoms
- not of *errors* but of *overload*, and in this case I would be happy
- to replace it with something more effective (if I could just think
- of something simple and reliable...).
-
- There are about 140 panic's in the 4.4-alpha `kern' directory right
- now. Of these, I only see two or three that I think should `never' be
- there; the rest represent sanity checks that make sure the rest of the
- kernel is behaving in the manner expected. Errors in device drivers,
- file systems, and so forth can trigger them, but presumably those
- installing such drivers or file systems will prefer to debug these.
- Note that many of these are under `#ifdef DIAGNOSTIC': if you believe
- the system works, you can simply turn them off entirely. (This is not
- the same as `handling the situation'!)
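-
- As a rough illustration of that last point, here is a minimal user-space
- sketch of a check that compiles away unless DIAGNOSTIC is defined. The
- names KCHECK, sketch_panic, and file_release are invented for this
- example and are not taken from the 4.4BSD sources.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the kernel's panic(). */
void
sketch_panic(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

#ifdef DIAGNOSTIC
#define	KCHECK(cond, msg)	do { if (!(cond)) sketch_panic(msg); } while (0)
#else
#define	KCHECK(cond, msg)	do { } while (0)	/* check compiled out */
#endif

struct file {
	int	f_count;	/* reference count */
};

/* Drop one reference and return the new count. */
int
file_release(struct file *fp)
{
	KCHECK(fp->f_count > 0, "file_release: count <= 0");
	return (--fp->f_count);
}
```

- Built without -DDIAGNOSTIC, file_release() will happily drive the count
- negative: the check is gone, but nothing has been `handled'.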
-
- >Of course, for your purposes developing an operating system is a means
- >unto itself. You don't have to worry about irrelevant things like
- >"customers" and "applications".
-
- True to some extent, anyway. Our systems have to be reliable enough
- to keep us working, but not so reliable as to put us out of a job. :-)
-
- Just for fun, let us consider several panic's and possible alternatives,
- all from /sys/kern/kern_descrip.c.
-
-         dup2(p, uap, retval)
-         ...
-                 if (new >= fdp->fd_nfiles) {
-                         if (error = fdalloc(p, new, &i))
-                                 return (error);
-                         if (new != i)
-                                 panic("dup2: fdalloc");
-                 } else if ...
-
- Now, fdalloc's job is to allocate the first free file descriptor
- greater than or equal to the given one (arg 2; i holds the result).
- fdp->fd_nfiles is the limit on valid descriptors: all are in
- [0..fd_nfiles). Since new >= fd_nfiles, we must (by definition) get
- back the desired descriptor. If new != i, what went wrong?
-
- - maybe fd_nfiles is wrong.
- - maybe fdalloc() is broken.
- - maybe someone snuck in and allocated a descriptor while we were
-   not watching (i.e., there is a race).
-
- Each of these appears to deserve different treatment. If fd_nfiles was
- wrong, just fix it and continue. If fdalloc() is broken, there is not
- much we can do here; we could return an error (EINTERNAL) to say that
- the system is broken, but we will never be able to get anything done
- that way. If there is a race, we could try again and maybe win the
- race---but we have to be careful not to loop forever in this case.
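-
- The retry case might look something like the following sketch. The
- names alloc_exact, MAX_TRIES, fd_alloc_fn, and toy_fdalloc are
- hypothetical; toy_fdalloc merely simulates losing a race a given number
- of times, and EAGAIN is an arbitrary choice of error for giving up.

```c
#include <errno.h>

#define	MAX_TRIES	3	/* hypothetical bound on retries */

/* Hypothetical allocator type: yields first free descriptor >= want. */
typedef int (*fd_alloc_fn)(int want, int *result);

int races_left;		/* knob: how many times the toy "loses the race" */

/* Toy allocator: hands back want + 1 while races_left > 0, else want. */
int
toy_fdalloc(int want, int *result)
{
	if (races_left > 0) {
		races_left--;
		*result = want + 1;
	} else
		*result = want;
	return (0);
}

/*
 * Try to obtain exactly descriptor `want'.  On a mismatch, retry,
 * but only MAX_TRIES times so that we cannot loop forever.
 */
int
alloc_exact(fd_alloc_fn alloc, int want)
{
	int error, got, tries;

	for (tries = 0; tries < MAX_TRIES; tries++) {
		if ((error = alloc(want, &got)) != 0)
			return (error);
		if (got == want)
			return (0);
		/* A real kernel would have to release `got' here. */
	}
	return (EAGAIN);
}
```

- Note that this only addresses the race; a wrong fd_nfiles or a broken
- fdalloc() would simply exhaust the retries and come back as an error.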
-
-         fstat()
-         ...
-                 switch (fp->f_type) {
-                 ...
-                 default:
-                         panic("fstat");
-                 }
-
- Since the only known `type's are vnodes and sockets, something is
- definitely wrong. Perhaps someone overwrote the original type. In
- this case we really have no way to recover it. I would argue that
- the right cure for this panic is to remove the distinctions between
- vnodes and sockets (this is now possible with the VFS, or would be
- with a few minor tweaks), so that instead of switching on the type,
- we just call through the appropriate pointer:
-
-         struct vnode *vp;
-         vp = fp->f_vp;
-         vp->v_ops->vo_stat(p, vp, &ub);
-
- In this case, if any of the pointers are overwritten, we will probably
- just crash immediately; but at least we will have written less code. :-)
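-
- For the curious, a self-contained user-space sketch of that style of
- dispatch follows. Everything in it (struct obj, struct obj_ops, oo_stat,
- struct stat_sketch, obj_fstat) is made up for illustration; it is not
- the 4.4BSD vnode interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct stat. */
struct stat_sketch {
	int	st_kind;	/* 1 = file, 2 = socket in this toy */
};

struct obj;

/* Per-type operations vector; one entry is enough for the sketch. */
struct obj_ops {
	int	(*oo_stat)(struct obj *, struct stat_sketch *);
};

struct obj {
	const struct obj_ops *o_ops;
};

int
file_stat(struct obj *op, struct stat_sketch *sb)
{
	(void)op;
	sb->st_kind = 1;
	return (0);
}

int
sock_stat(struct obj *op, struct stat_sketch *sb)
{
	(void)op;
	sb->st_kind = 2;
	return (0);
}

const struct obj_ops file_ops = { file_stat };
const struct obj_ops sock_ops = { sock_stat };

/*
 * No switch on a type field: just call through the pointer.
 * The NULL tests catch only null pointers; a pointer overwritten
 * with garbage will still crash.
 */
int
obj_fstat(struct obj *op, struct stat_sketch *sb)
{
	if (op == NULL || op->o_ops == NULL || op->o_ops->oo_stat == NULL)
		return (EBADF);
	return (op->o_ops->oo_stat(op, sb));
}
```

- Adding a new kind of object then means supplying another obj_ops
- table, with no `default:' case left to panic in.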
-
-         closef()
-         ...
-                 if (fp->f_count < 0)
-                         panic("closef: count < 0");
-
- Those who ran 4.2BSD on the VAX when it first came out have some
- experience here. This panic did not appear in the original code.
- Instead, the count actually did go negative, due to the ability to
- longjmp() out of a close when (e.g.) closing a tty with a SIGALRM
- pending. When that happened, the kernel would eventually trash a file
- system. The panic prevented that, and gave the clues needed to
- diagnose the problem. Since then I have not seen this panic occur;
- its purpose is to act as a `firewall'. Nonetheless, how would one
- fix this? The count could be recovered, but only by scanning all
- descriptor tables for all processes---and in any case it is probably
- too late: the underlying close routine may have been called already,
- so the network connection to a peer (if that is what this represents)
- is already gone, with no way to get it back.
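-
- To give a sense of what `scanning all descriptor tables' would involve,
- here is a toy model. The names struct proc_sketch, NOFILE_SKETCH, and
- recount_refs are invented; the sketch also assumes that every reference
- lives in some descriptor table, which a real kernel need not guarantee.

```c
#include <stddef.h>

#define	NOFILE_SKETCH	8	/* hypothetical per-process table size */

struct file {
	int	f_count;	/* reference count, possibly corrupted */
};

struct proc_sketch {
	struct file *p_ofile[NOFILE_SKETCH];	/* NULL = unused slot */
};

/*
 * Count the descriptor slots, over all processes, that point at fp.
 * Cost is proportional to (processes * slots) for each file checked.
 */
int
recount_refs(const struct proc_sketch *procs, size_t nprocs,
    const struct file *fp)
{
	size_t i, j;
	int n = 0;

	for (i = 0; i < nprocs; i++)
		for (j = 0; j < NOFILE_SKETCH; j++)
			if (procs[i].p_ofile[j] == fp)
				n++;
	return (n);
}
```

- Even with the count rebuilt this way, nothing undoes a close routine
- that has already run.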
-
- In all these cases, whenever a count could be wrong, the pointer that
- points to it could be wrong instead. How do we tell the difference?
- Should we test every pointer before following it, in case it would
- cause a trap? Or perhaps try to recover from within the trap handler?
- (This is *not* straightforward on many machines, and both methods have
- a high cost.)
-
- So, all in all, the 4.4BSD line generally panics when a consistency
- check fails, because that makes it debuggable. The system reboots in a
- few minutes (fsck is quicker than it was in 4.3BSD), and after saving
- the evidence (the vmunix and vmcore files) the machine is back up and
- running.
- --
- In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 510 486 5427)
- Berkeley, CA Domain: torek@ee.lbl.gov
-