NetNews Usenet Archive 1992 #19

Internet Message Format | 1992-09-03 | 2.7 KB

Path: sparky!uunet!munnari.oz.au!nutmeg!pjm
Newsgroups: comp.realtime
Subject: Re: Where to stick the watchdog?
Message-ID: <1992Sep4.085016.3717@darwin.ntu.edu.au>
From: pjm@nutmeg.cs.ntu.edu.au (Phil Maker)
Date: 4 Sep 92 08:50:16 +0900
Distribution: world
Organization: Computer Science, Northern Territory University, Australia
Keywords: Watchdogs
Nntp-Posting-Host: nutmeg.ntu.edu.au
Originator: pjm@nutmeg
Lines: 67

> Recently I had an argument (discussion) with some co-workers regarding where
> is an appropriate place to kick a watchdog timer inside an application program.
>
> Their side was 'stick it in the lowest level routine in a non-kernal based
> application, or stick it in the null task in a real-time kernal.'
>
> I have problems with either solution.
> [ Lots of reasonable problems deleted ]

My suggestion for kicking watchdogs would be a little more complicated.
(And I don't like watchdogs anyway, but ... (And I don't like
complicated solutions, but ...))

First, let's determine what errors we would like to catch in the best
of all possible worlds.

* Processor interrupts disabled, or kernel crashed so that some
  processes cannot be executed.

  Use a low priority timer interrupt to start a low priority process
  which actually kicks the dog. (This provides a better test of the
  kernel and the general health of the system.) Of course timing
  deadlines come into this, but they are easy to solve, aren't
  they? (:-)

* Code or data store trashed by cosmic rays, software errors and
  hardware features.

  Have a self-test process that checksums the code every whatever
  picoseconds. The data space can have its integrity checked by
  having a "data OK" function provided by each module (or class).
  This is very useful for testing and debugging, and if you have the
  space in the product you could actually check some of the data
  structures on the fly to see that they meet the designer's beliefs.

* Checking that the system is actually working, i.e. making progress
  towards the goals of the system.

  For example, have every hard real-time event increment a counter in
  the system. If these counters do NOT increase at the appropriate
  frequency then something is wrong. The test for these increases
  would be done in the watchdog process. Intermittent events require
  the use of flags, etc., to determine whether progress should be
  expected, but the method still applies.

Final note: any rational engineer would of course implement a small
subset of the above. Error detection and correction is one of the most
difficult areas of real-time systems. The best approach is to avoid
the errors in the first place and have outside safety systems. (Not
reliability.)

Phil Maker
N.T. University
Darwin, Australia.
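
The progress-counter check described in the post could be sketched roughly as below. This is only an illustration under stated assumptions: every name (progress_register, progress_tick, progress_expect, watchdog_check, region_checksum) is hypothetical, the additive checksum is a stand-in chosen for brevity rather than anything the post specifies, and the code that actually kicks the hardware watchdog is left out because it is platform-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SOURCES 8   /* assumed upper bound on monitored activities */

/* One entry per monitored activity (hypothetical layout). */
struct progress_source {
    volatile uint32_t count;  /* incremented by the real-time event handler */
    uint32_t last_seen;       /* value observed at the previous check */
    bool expected;            /* flag: should this source be progressing? */
};

static struct progress_source sources[MAX_SOURCES];
static size_t num_sources;

/* Register a new progress source and return its index. */
size_t progress_register(void)
{
    assert(num_sources < MAX_SOURCES);
    sources[num_sources] = (struct progress_source){0, 0, true};
    return num_sources++;
}

/* Called from each hard real-time event to record progress. */
void progress_tick(size_t id)
{
    sources[id].count++;
}

/* For intermittent events: say whether progress is currently expected. */
void progress_expect(size_t id, bool expected)
{
    sources[id].expected = expected;
    sources[id].last_seen = sources[id].count;  /* restart the window */
}

/* Simple additive checksum over a code or data region (assumption: a
 * real product would more likely use a CRC). */
uint32_t region_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    while (len--)
        sum += *p++;
    return sum;
}

/* Called once per watchdog period from the low priority process.
 * Returns true only if every source that is expected to progress has
 * advanced since the previous call; the caller kicks the dog on true. */
bool watchdog_check(void)
{
    bool healthy = true;
    for (size_t i = 0; i < num_sources; i++) {
        uint32_t now = sources[i].count;
        if (sources[i].expected && now == sources[i].last_seen)
            healthy = false;
        sources[i].last_seen = now;
    }
    return healthy;
}
```

In such a scheme the low priority process would call watchdog_check() once per period, compare region_checksum() over the code image against a value stored at build time, and kick the dog only when both pass. A stalled event handler then shows up as a counter that stops moving, and the dog is left to bite.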