- Path: sparky!uunet!europa.eng.gtefsd.com!emory!swrinde!sdd.hp.com!hplabs!ucbvax!U.WASHINGTON.EDU!DEREK
- From: DEREK@U.WASHINGTON.EDU
- Newsgroups: comp.os.vms
- Subject: Re: Warning: bug check messages
- Message-ID: <8CDED0373BBF23A12E@MAX.U.WASHINGTON.EDU>
- Date: 27 Jan 93 00:41:00 GMT
- Sender: usenet@ucbvax.BERKELEY.EDU
- Organization: The Internet
- Lines: 106
-
- > We are running VMS 5.5-2 on a VAX 6610. When a privileged user,
- >inside mail, says
- >
- >Mail>show forwarding/user=*
- >
- >we get the correct response listing the approximately 30 users who have
- >set forwarding. But the execution of the instruction causes bug check
- >mail messages from VAXsimPLUS; apparently it has something to do with
- >generating stuff coming across the network.
- > We reported this as a hardware problem; after dialing in they
- >asked us to call it in as a software problem. Software has never seen
- >this. The only way they could solve it would be to change a sysgen
- >parameter which allows a crash to occur from non-fatal errors; we find
- >this to be unacceptable on our busy machine.
- > Has anyone had any experience with this?
- >Brendan Welch, UMass/Lowell, W1LPG, welchb@woods.ulowell.edu
-
- YES! I have!
-
- QUICKLY! Although you *could* set the SYSGEN parameter BUGCHECKFATAL to 1,
- that will, as Carl points out, force all "continuable" bug checks to
- produce a system crash. In this case, that may NOT be desired.
-
- As for the statement: "Software has never seen this." RUBBISH! :)
- I TOLD them about it, nearly two years ago!
-
- The source of your problem is, I believe, one (or more) corrupt record(s)
- in the file VMSMAIL_PROFILE.DATA. To fix it, I would first determine how
- many records are corrupt. (A description of the kind of corruption I
- mean appears later in this posting.)
-
- Let's see, you could start by producing a list of the users who have mail
- profile records. (I know, this can be large.) Then, execute the SHOW
- command once for each user in your list. The ones which produce bugchecks
- need to be repaired.
-
- The simplest "repair" is to REMOVE their record. Not nice, but...
-
- Now for some (humorous) history. (At least, I can laugh about it *now*.)
-
- As I recall, some version of VMS (was it 5.2?) shipped with BUGCHECKFATAL
- set to 1. (We had it for field testing, but I think it stayed that way
- after we got the "real" release.) Anyway, this made the problem VERY
- obvious to us. See, some of our users were on BITNET mailing lists. It
- happened that the MAIL profile records for some of them got corrupted.
- When mail was sent to any of these users, the system would crash. Now,
- we use this wonderful e-mail software package, called PMDF (produced by
- Innosoft International, Inc.) which is started "automatically" by the
- system startup procedure. Guess what happened when PMDF got a file for
- one of these "corrupt users"?
-
- Well, OBVIOUSLY the system kept crashing every time PMDF started. The
- thing you were *supposed* to guess was that I got woken up in the dead
- of night because the operators couldn't get the system to boot! :)
-
- (NOTE: It is NOT my intent to disparage PMDF and/or Innosoft in ANY way
- whatsoever. PMDF is a superior product, and Ned, Kevin, etc. are
- very nice folks!)
-
- To get the system up, of course, I did a conversational boot and set
- BUGCHECKFATAL to 0. This worked -- for a while. Someone else ran
- AUTOGEN, and it decided to reset BUGCHECKFATAL back to 1! Yes, I got
- another call in the wee hours of the morning.
-
- Now, back to the obligatory hack, er technical content. :)
- The corruption. How to describe this. The corrupted records we saw
- were each missing a single byte. The profile
- records are variable length, maximum 2048 bytes. The notion, apparently,
- is to make the records as short as possible. Thus, if you don't have
- a "personal name", no space is wasted in the file to store that fact.
-
- Now, this is off the top of my head, and someone else doubtless knows
- the format better than I, but basically the first 31 bytes are the username.
- After this, there are "item blocks" of information. These blocks look
- something like:
-
- [item-code][item-length][item...]
-   2 bytes     2 bytes    n bytes
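
To make the layout above concrete, here is a minimal Python sketch that builds and walks a record of that shape. It is an illustration under stated assumptions, not the actual VMSMAIL_PROFILE.DATA format: the little-endian byte order (natural for a VAX), the space padding of the username, and the item codes 1 and 2 are all hypothetical. Only the 31-byte username, the 2-byte code, the 2-byte length, and the 2048-byte maximum come from the description in this post.

```python
import struct

USERNAME_LEN = 31      # per the post: the first 31 bytes are the username
MAX_RECORD_LEN = 2048  # per the post: maximum record length


def pack_record(username, items):
    """Build a record: padded username, then [code][length][data] blocks.

    Assumption: space padding and little-endian 16-bit fields.
    """
    name = username.encode("ascii")
    if len(name) > USERNAME_LEN:
        raise ValueError("username longer than 31 bytes")
    record = name.ljust(USERNAME_LEN)
    for code, data in items:
        record += struct.pack("<HH", code, len(data)) + data
    if len(record) > MAX_RECORD_LEN:
        raise ValueError("record exceeds 2048 bytes")
    return record


def parse_record(record):
    """Return (username, [(code, data), ...]); raise ValueError if malformed."""
    username = record[:USERNAME_LEN].decode("ascii").rstrip()
    items = []
    pos = USERNAME_LEN
    while pos < len(record):
        if pos + 4 > len(record):
            raise ValueError(f"truncated item header at offset {pos}")
        code, length = struct.unpack_from("<HH", record, pos)
        pos += 4
        if pos + length > len(record):
            raise ValueError(f"item {code} overruns record at offset {pos}")
        items.append((code, record[pos:pos + length]))
        pos += length
    return username, items


# Hypothetical item codes: 1 = personal name, 2 = forwarding address.
rec = pack_record("SMITH", [(1, b"J. Smith"), (2, b"smith@example.edu")])
name, items = parse_record(rec)
```

Because the item blocks carry no padding, a user with no personal name simply has no item with that code, which matches the "no space is wasted" remark above.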
-
- Now, what happened to us is that some of the records were missing bytes
- in an item length field. (I think that was the one.) Anyway, this caused
- MAIL to mis-parse the information. If the data you were trying to change
- was in the record AFTER the corruption, the system would crash. If it
- was BEFORE the corruption, you wouldn't notice any problems.
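
The before/after behaviour can be reproduced with a small Python sketch. This is a simulation under assumptions, not MAIL's real parser: the byte order, the padding, the item codes, and the sample data are hypothetical, and the bounds checks here stand in for whatever MAIL actually does when it mis-parses. It drops one byte from the length field of the second item and then walks the record.

```python
import struct

USERNAME_LEN = 31  # per the post; everything else below is assumed


def make_record(username, items):
    """Padded username followed by [code][length][data] blocks."""
    body = b"".join(struct.pack("<HH", c, len(d)) + d for c, d in items)
    return username.encode("ascii").ljust(USERNAME_LEN) + body


def walk_items(record):
    """Return (items, error); error is None if the walk ended cleanly."""
    items, pos = [], USERNAME_LEN
    while pos < len(record):
        if pos + 4 > len(record):
            return items, f"truncated header at offset {pos}"
        code, length = struct.unpack_from("<HH", record, pos)
        if pos + 4 + length > len(record):
            return items, f"item overruns record at offset {pos}"
        items.append((code, record[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return items, None


good = make_record("SMITH", [(1, b"first"), (2, b"second"), (3, b"third")])

# Remove the low byte of the second item's length field.
cut = USERNAME_LEN + 4 + len(b"first") + 2
bad = good[:cut] + good[cut + 1:]

good_items, good_err = walk_items(good)
bad_items, bad_err = walk_items(bad)
```

In the damaged copy the first item still reads back intact, while the second item's length is assembled from one leftover length byte plus the first data byte, so everything from that point on is misread. That is the same split described above: data before the corruption looks fine, data after it does not.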
-
- Now, for the sake of clarity and completeness, the system would only crash
- if BUGCHECKFATAL was set to 1. If BUGCHECKFATAL is set to 0, it is a
- continuable bugcheck.
-
- There are OTHER ways to notice the problem. Users may complain that MAIL
- won't remember their SETtings. You may recall a thread on this topic from
- last year. For example, the user may be able to set a personal name, but
- it is "lost" when they exit MAIL. And, yes, this CAN produce the onerous
- incorrect new mail count problem. :(
-
- Did I leave anything out? Contact me if you need more information/help.
-
- -Derek S. Haining
- University Computing Services
- University of Washington
- Seattle, Washington 98195
- (206) 543-5579
-
- DEREK@MAX.BITNET
- DEREK@MAX.U.WASHINGTON.EDU
-
-