- Path: sparky!uunet!news.claremont.edu!nntp-server.caltech.edu!SOL1.GPS.CALTECH.EDU!CARL
- From: carl@SOL1.GPS.CALTECH.EDU (Carl J Lydick)
- Newsgroups: comp.os.vms
- Subject: Re: HELP!!! Security problem for gurus. [Directories]
- Date: 10 Jan 1993 10:33:50 GMT
- Organization: HST Wide Field/Planetary Camera
- Lines: 90
- Distribution: world
- Message-ID: <1iou2eINN372@gap.caltech.edu>
- References: <1imh53INNg7a@gap.caltech.edu>,<208009@zl2tnm.gen.nz>
- Reply-To: carl@SOL1.GPS.CALTECH.EDU
- NNTP-Posting-Host: sol1.gps.caltech.edu
-
- In article <208009@zl2tnm.gen.nz>, don@zl2tnm.gen.nz (Don Stokes) writes:
- >Calm down, Carl, you're hurting my ears.
-
- Sorry about that.
-
- >What's very important here is that there is a big difference between
- >corrupt memory and a corrupt system disk. In the former case, you shut
- >down NOW and reboot. When you come up, the problem has almost certainly
- >gone away, at least until next time.
-
- Well, *THAT* problem has gone away, but a memory corruption can cause
- corruption of the data on disk.
-
- >If you have a live (if sick) system, you have a chance of finding out what
- >went wrong. That is *very* important -- the glitch might have taken out
- >more than a part of the system disk, and you need the error log and stuff
- >to diagnose it. You need to know what got stepped on so you can fix it.
-
- Agreed. That's one of the many uses of standalone BACKUP. Shut the system
- down *NOW*, before you run the chance of rendering the disk unreadable.
-
- >Most importantly, you need the *choice* to be able to fix the problem as
- >soon as possible, or possibly effect a temporary repair to maintain
- >production.
-
- And how certain can you be that you've *REALLY* fixed the problem? Sure, you
- can track down and repair the symptoms, but do you really want to trust a
- system that's been running for some time with either the system disk or memory
- corrupted?
-
- >Sure, there are cases where you should shut down Right Now
- >and boot standalone backup, but in my experience (ignoring total failure,
- >in which case you don't have a choice) these are extremely rare. A knee-
- >jerk "something's wrong, let's die" is the *wrong* way to deal with
- >discovery of disk problems. There's not much you can do to diagnose a
- >disk problem from the >>> prompt, unless you feel like keying in disk
- >controller instructions by hand.
-
- True. I'm assuming that you've got at least two disks on the system. Shut
- down, do a standalone BACKUP of the non-system disk, restore it from your last
- known good image BACKUP of the system disk, and THEN try to diagnose the
- problems with the system disk.
-
- >It needs to be up to the system manager to decide how serious the problem
- >is.
-
- And if the system manager isn't there 24 hours a day, 7 days a week, 52 weeks a
- year? I once managed a 780 that had its floating point processor fail. At
- the time that happened, I was involved with some stuff that didn't use floating
- point. A user who *WAS* doing lots of floating point work went through several
- hours and about a dozen crashes of his program with machine checks before he
- decided to notify me of the problem.
-
- >It needs to be possible for the system manager to make that decision.
-
- For sufficiently minor problems, yes; once the problem gets severe enough, you
- don't want to risk having the corrupted system running any longer than it takes
- it to notice that it IS corrupted.
-
- >In some cases it can be fixed quickly and simply. Others are going to
- >require that you kick the users off the system and get on with fixing it.
-
- That's why not all BUGCHECKs are fatal. Some are continuable.
-
-
- >> It might also result in two people writing to the same blocks allocated to two
- >> separate files. Result: Neither file is valid. You mean to tell me that
- >> you've got customers who are so goddamned stupid they'd stand for that sort of
- >> bullshit?
- >
- >No. I have customers who consider me competent enough to make informed
- >decisions to provide them with the best possible service, including data
- >integrity, security and system availability. I would object quite strongly
- >to having that decision taken away from me by an overly paranoid exception
- >handler. To me, it's far better to have a good exception reporting
- >facility (and VMS has to have one of the better ones out there) that helps
- >me make that decision than it is to have the decision made for me.
-
- Fine. Then perhaps we're simply arguing about just how severe the problem must
- be before the system takes matters out of our hands. It might be useful if VMS
- had more than two classes of BUGCHECKs, and allowed the system manager to set a
- SYSGEN parameter that specified the lowest class that was considered fatal.
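
To make that suggestion concrete, here's a rough sketch of what such a threshold might look like. This is purely illustrative and is not how VMS works: the class names, the `FATAL_THRESHOLD` setting (standing in for the hypothetical SYSGEN parameter), and the `handle_bugcheck` function are all invented for the example.

```python
from enum import IntEnum


class BugcheckClass(IntEnum):
    """Hypothetical severity classes, ordered from least to most severe."""
    INFORMATIONAL = 0
    CONTINUABLE = 1
    SERIOUS = 2
    CRITICAL = 3


# Stand-in for the proposed SYSGEN parameter: the lowest class that the
# system manager wants treated as fatal. The name is an assumption.
FATAL_THRESHOLD = BugcheckClass.SERIOUS


def handle_bugcheck(severity, threshold=FATAL_THRESHOLD):
    """Return 'halt' if the bugcheck is at or above the threshold,
    otherwise 'continue' (log it and keep running)."""
    if severity >= threshold:
        return "halt"
    return "continue"


if __name__ == "__main__":
    for cls in BugcheckClass:
        print(cls.name, "->", handle_bugcheck(cls))
```

A site that values availability over caution could raise the threshold to `CRITICAL`; one that would rather go down than risk the disk could lower it to `CONTINUABLE`.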
- --------------------------------------------------------------------------------
- Carl J Lydick | INTERnet: CARL@SOL1.GPS.CALTECH.EDU | NSI/HEPnet: SOL1::CARL
-
- Disclaimer: Hey, I understand VAXen and VMS. That's what I get paid for. My
- understanding of astronomy is purely at the amateur level (or below). So
- unless what I'm saying is directly related to VAX/VMS, don't hold me or my
- organization responsible for it. If it IS related to VAX/VMS, you can try to
- hold me responsible for it, but my organization had nothing to do with it.
-