- Path: sparky!uunet!munnari.oz.au!nutmeg!pjm
- Newsgroups: comp.realtime
- Subject: Re: Where to stick the watchdog?
- Message-ID: <1992Sep4.085016.3717@darwin.ntu.edu.au>
- From: pjm@nutmeg.cs.ntu.edu.au (Phil Maker)
- Date: 4 Sep 92 08:50:16 +0900
- Distribution: world
- Organization: Computer Science, Northern Territory University, Australia
- Keywords: Watchdogs
- Nntp-Posting-Host: nutmeg.ntu.edu.au
- Originator: pjm@nutmeg
- Lines: 67
-
- > Recently I had an argument (discussion) with some co-workers regarding where
- > is an appropriate place to kick a watchdog timer inside an application program.
- >
- > Their side was 'stick it in the lowest level routine in a non-kernel based
- > application, or stick it in the null task in a real-time kernel.'
- >
- > I have problems with either solution.
- > [ Lots of reasonable problems deleted ]
-
- My suggestion for kicking watchdogs would be a little more complicated.
- (And I don't like watchdogs anyway, but ... (and I don't like complicated
- solutions either, but ...).)
-
- First let's determine what errors we would like to catch in the best of
- all possible worlds.
-
- * Processor interrupts disabled or kernel crashed so that some
- processes cannot be executed.
-
- Use a low priority timer interrupt to start a low priority
- process which actually kicks the dog. The dog is then kicked only
- if interrupts are enabled and the kernel is still scheduling even
- its lowest priority work, which provides a better test of the
- kernel and general health of the system.
- Of course timing deadlines come into this, but they are
- easy to solve, aren't they? (:-)
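-
- A minimal sketch of that arrangement in C, not tied to any real
- kernel or board. The names timer_tick_isr, kicker_task_step and
- hw_watchdog_kick are made up for illustration, and the hardware
- kick is simulated by a counter so the logic can be tried on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware hook: on a real board this would write the
   watchdog's reload register. Here it only counts kicks. */
static uint32_t kick_count = 0;
static void hw_watchdog_kick(void) { kick_count++; }

/* Set by the timer interrupt, consumed by the low priority task. */
static volatile bool kick_requested = false;

/* Low priority timer interrupt handler. It does NOT kick the dog
   itself; it only asks the kicker task to do so. */
void timer_tick_isr(void)
{
    kick_requested = true;
}

/* Body of the low priority task, called whenever the scheduler gets
   around to running it. Returns true if the dog was kicked. */
bool kicker_task_step(void)
{
    if (!kick_requested)
        return false;          /* no tick seen: interrupts may be dead */
    kick_requested = false;
    hw_watchdog_kick();
    return true;
}

/* Test/diagnostic accessor for the simulated hardware. */
uint32_t watchdog_kick_count(void)
{
    return kick_count;
}
```

- The point of splitting the work is that a kick needs both halves:
- the interrupt must fire and the low priority task must get CPU time.
- If either stops, the kicks stop and the dog bites.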
-
- * Code or data store trashed by cosmic rays, software errors and hardware
- features.
-
- Have a self test process that checksums the code every whatever
- picoseconds. The data space can have its integrity checked by
- having a data OK function provided by each module (or class).
- This is very useful for testing and debugging, and if you have
- the space in the product you could actually check some
- of the data structures on the fly to see that they
- match the designer's beliefs.
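-
- Roughly what I have in mind, as a sketch only. The additive checksum
- stands in for whatever check you prefer (a CRC would be the usual
- choice), and rx_ring, rx_ring_ok and self_test_pass are hypothetical
- names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simple additive checksum over a memory region. Used here only to
   keep the sketch short; a CRC catches more kinds of damage. */
uint32_t region_checksum(const uint8_t *start, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += start[i];
    return sum;
}

/* Each module supplies a function that returns true while its data
   still satisfies the designer's invariants. */
typedef bool (*data_ok_fn)(void);

/* Hypothetical example module: a ring buffer whose indices must
   always stay inside the buffer. */
#define RING_SIZE 8u
struct ring {
    uint32_t head;
    uint32_t tail;
    uint8_t  buf[RING_SIZE];
};
struct ring rx_ring = { 0, 0, { 0 } };

bool rx_ring_ok(void)
{
    return rx_ring.head < RING_SIZE && rx_ring.tail < RING_SIZE;
}

/* One pass of the self test process: verify the code image against
   its expected checksum, then ask every module if its data is OK. */
bool self_test_pass(const uint8_t *code, size_t code_len,
                    uint32_t expected_sum,
                    const data_ok_fn *checks, size_t n_checks)
{
    if (region_checksum(code, code_len) != expected_sum)
        return false;
    for (size_t i = 0; i < n_checks; i++) {
        if (!checks[i]())
            return false;
    }
    return true;
}
```

- The same data OK functions can be called from test harnesses and
- debug builds, which is where they tend to pay for themselves first.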
-
- * Checking that the system is actually working, i.e. making progress
- towards the goals of the system.
-
- For example, for every hard real time event put a counter in the
- system and increment it each time the event is handled. If these
- counters do NOT increase at the appropriate frequency then
- something is wrong. The test for these increases would be done
- in the watchdog process. Intermittent events require the use of
- flags etc. to determine whether a counter should currently be
- increasing, but the method still applies.
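-
- A sketch of the counter scheme, with the usual caveats. The struct
- layout and the names progress_tick, progress_ok and all_progress_ok
- are invented for illustration; the "expected" flag is the flag
- mentioned above for intermittent events.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per monitored event. */
struct progress {
    volatile uint32_t count;   /* incremented by the event handler      */
    uint32_t last_seen;        /* count at the previous watchdog check  */
    uint32_t min_per_check;    /* minimum increase expected per period  */
    volatile bool expected;    /* false while an intermittent event is
                                  legitimately idle                     */
};

/* Called from the event's handler each time the event is serviced. */
void progress_tick(struct progress *p)
{
    p->count++;
}

/* Called from the watchdog process once per check period. */
bool progress_ok(struct progress *p)
{
    uint32_t now = p->count;
    uint32_t delta = now - p->last_seen;  /* unsigned, so wrap is harmless */
    p->last_seen = now;
    if (!p->expected)
        return true;                      /* idle: no progress required */
    return delta >= p->min_per_check;
}

/* Check every entry; the dog should be kicked only if this is true. */
bool all_progress_ok(struct progress *table, size_t n)
{
    bool ok = true;
    for (size_t i = 0; i < n; i++) {
        if (!progress_ok(&table[i]))
            ok = false;   /* keep going so every last_seen is refreshed */
    }
    return ok;
}
```

- The watchdog process would call all_progress_ok() once per period
- and simply decline to kick the dog when it returns false.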
-
- Final note: any rational engineer would of course implement only a small
- subset of the above. Error detection and correction is
- one of the most difficult areas of real time systems.
- The best approach is to avoid the errors in the first place
- and have outside safety systems. (Safety, not reliability.)
-
-
- Phil Maker
-
- N.T. University
- Darwin, Australia.