NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-10 | 5.1 KB

Path: sparky!uunet!news.claremont.edu!nntp-server.caltech.edu!SOL1.GPS.CALTECH.EDU!CARL
From: carl@SOL1.GPS.CALTECH.EDU (Carl J Lydick)
Newsgroups: comp.os.vms
Subject: Re: HELP!!! Security problem for gurus. [Directories]
Date: 10 Jan 1993 10:33:50 GMT
Organization: HST Wide Field/Planetary Camera
Lines: 90
Distribution: world
Message-ID: <1iou2eINN372@gap.caltech.edu>
References: <1imh53INNg7a@gap.caltech.edu>,<208009@zl2tnm.gen.nz>
Reply-To: carl@SOL1.GPS.CALTECH.EDU
NNTP-Posting-Host: sol1.gps.caltech.edu

In article <208009@zl2tnm.gen.nz>, don@zl2tnm.gen.nz (Don Stokes) writes:
>Calm down, Carl, you're hurting my ears.

Sorry about that.

>What's very important here is that there is a big difference between
>corrupt memory and a corrupt system disk.  In the former case, you shut
>down NOW and reboot.  When you come up, the problem has almost certainly
>gone away, at least until next time.

Well, *THAT* problem has gone away, but a memory corruption can cause
corruption of the data on disk.

>If you have a live (if sick) system, you have a chance of finding out what
>went wrong.  That is *very* important -- the glitch might have taken out
>more than a part of the system disk, and you need the error log and stuff
>to diagnose it.  You need to know what got stepped on so you can fix it.

Agreed.  That's one of the many uses of standalone BACKUP.  Shut the system
down *NOW*, before you run the chance of rendering the disk unreadable.

>Most importantly, you need the *choice* to be able to fix the problem as
>soon as possible, or possibly effect a temporary repair to maintain
>production.

And how certain can you be that you've *REALLY* fixed the problem?  Sure, you
can track down and repair the symptoms, but do you really want to trust a
system that's been running for some time with either the system disk or memory
corrupted?

>Sure, there are cases where you should shut down Right Now
>and boot standalone backup, but in my experience (ignoring total failure,
>in which case you don't have a choice) these are extremely rare.  A
>knee-jerk "something's wrong, let's die" is the *wrong* way to deal with
>discovery of disk problems.  There's not much you can do to diagnose a
>disk problem from the >>> prompt, unless you feel like keying in disk
>controller instructions by hand.

True.  I'm assuming that you've got at least two disks on the system.  Shut
down, do a standalone BACKUP of the non-system disk, restore it from your last
known good image BACKUP of the system disk, and THEN try to diagnose the
problems with the system disk.

>It needs to be up to the system manager to decide how serious the problem
>is.

And if the system manager isn't there 24 hours a day, 7 days a week, 52 weeks a
year?  I once managed a 780 that had its floating point processor fail.  At
the time that happened, I was involved with some stuff that didn't use floating
point.  It took a user who *WAS* doing lots of floating point work several
hours and about a dozen crashes of his program with machine checks for him to
decide to notify me of the problem.

>It needs to be possible for the system manager to make that decision.

For sufficiently minor problems, yes; once the problem gets severe enough, you
don't want to risk having the corrupted system running any longer than it takes
it to notice that it IS corrupted.

>In some cases it can be fixed quickly and simply.  Others are going to
>require that you kick the users off the system and get on with fixing it.

That's why not all BUGCHECKs are fatal.  Some are continuable.

>> It might also result in two people writing to the same blocks allocated to two
>> separate files.  Result:  Neither file is valid.  You mean to tell me that
>> you've got customers who are so goddamned stupid they'd stand for that sort of
>> bullshit?
>
>No.  I have customers who consider me competent enough to make informed
>decisions to provide them with the best possible service, including data
>integrity, security and system availability.  I would object quite strongly
>to having that decision taken away from me by an overly paranoid exception
>handler.  To me, it's far better to have a good exception reporting
>facility (and VMS has to have one of the better ones out there) that helps
>me make that decision than it is to have the decision made for me.

Fine.  Then perhaps we're simply arguing about just how severe the problem must
be before the system takes matters out of our hands.  It might be useful if VMS
had more than two classes of BUGCHECKs, and allowed the system manager to set a
SYSGEN parameter that specified the lowest class that was considered fatal.
--------------------------------------------------------------------------------
Carl J Lydick | INTERnet: CARL@SOL1.GPS.CALTECH.EDU | NSI/HEPnet: SOL1::CARL

Disclaimer:  Hey, I understand VAXen and VMS.  That's what I get paid for.  My
understanding of astronomy is purely at the amateur level (or below).  So
unless what I'm saying is directly related to VAX/VMS, don't hold me or my
organization responsible for it.  If it IS related to VAX/VMS, you can try to
hold me responsible for it, but my organization had nothing to do with it.
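
The closing suggestion in the post (more than two bugcheck classes, plus a
manager-settable threshold for the lowest class treated as fatal) might be
sketched roughly as follows.  This is an illustration only, written in Python
rather than anything VMS-specific: the class names, the is_fatal helper and
the LOWEST_FATAL_CLASS setting are all hypothetical, and nothing here
describes an existing VMS interface.

```python
from enum import IntEnum


class BugcheckClass(IntEnum):
    """Hypothetical severity classes, ordered from least to most severe.

    The post only says "more than two classes"; these four names are
    invented for illustration.
    """
    INFORMATIONAL = 0
    CONTINUABLE = 1
    SERIOUS = 2
    CRITICAL = 3


def is_fatal(bugcheck_class, lowest_fatal_class):
    """Return True if a bugcheck of this class should bring the system down.

    lowest_fatal_class stands in for the proposed manager-settable
    parameter: every class at or above it is treated as fatal.
    """
    return bugcheck_class >= lowest_fatal_class


# Hypothetical site policy: SERIOUS and CRITICAL halt the system,
# anything milder is reported and execution continues.
LOWEST_FATAL_CLASS = BugcheckClass.SERIOUS

if __name__ == "__main__":
    for cls in BugcheckClass:
        action = "halt" if is_fatal(cls, LOWEST_FATAL_CLASS) else "continue"
        print(f"{cls.name}: {action}")
```

Raising the threshold leaves more of the decision with the system manager;
lowering it makes the system quicker to take matters out of the manager's
hands, which is the trade-off the two posters are arguing about.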