- Xref: sparky vmsnet.misc:997 comp.sys.dec:5872
- Path: sparky!uunet!ferkel.ucsb.edu!taco!gatech!darwin.sura.net!haven.umd.edu!umd5!umdsp.umd.edu!BLEAU
- From: bleau@umdsp.umd.edu
- Newsgroups: vmsnet.misc,comp.sys.dec
- Subject: VS3100-76 SCSI disk locks up
- Message-ID: <16921@umd5.umd.edu>
- Date: 10 Nov 92 21:28:51 GMT
- Sender: news@umd5.umd.edu
- Reply-To: bleau@umdsp.umd.edu
- Organization: University of Maryland Physics Dept., College Park, MD
- Lines: 77
-
- Hello, DEC workstation users. I have a problem that's been bothering me for
- some time, and I can't seem to get a handle on it. First, the configuration.
- I have a VAXstation 3100 model 76, 32MB memory, with an RZ5x internal disk
- (200MB+, I forget just which model number), a TZ30, an external hard disk
- (1.2GB), and an external rewritable optical disk. The two external disks were
- packaged for us by American Digital Systems; they're sold under the name
- MasterDisk.
-
- Now for the problem. Normally everything works great, but when I do a lot of
- transfers of data between the external hard disk and the optical disk (same
- SCSI bus) the system slows down and eventually locks up. Everyone accessing
- the external hard disk gets hung, detached jobs eventually hang, and I start
- getting calls from everyone and his brother asking what is wrong. The only way
- out of the situation I can find is to (shudder) press Halt and boot the system
- from scratch. Not a nice situation, as you can imagine.
-
- This has shown up in only two applications so far: backup and archiving. Both
- times there is data being copied from the external hard disk to the optical
- disk. It has never happened (I'm not saying it can't, though) on transfers
- between the internal (system) disk and the optical disk. Backing up I have
- control over. Archiving user data files, however, is done under user control,
- not mine, and one user has already brought the system to its knees several times.
-
- Don't do a lot of data transfers, you may say. That means not doing backups
- onto fast media, and backing up onto 12 TK50s just doesn't cut it. Also, one
- reason we got this system is to distribute data to other sites. This problem,
- then, hurts us in a big way. Finally, we have a dedicated line going into this
- system to receive data, and the system is supposed to be up 100%, or we miss
- some data. So having it die in the middle of an incoming data transfer is very
- messy.
-
- Now you know the outlines of the problem. I'll spare you the details, as there
- are over 1000 lines produced by ANAL/ERROR for just the 24 hours preceding the
- last crash. There are, however, a few lines in the error log that I haven't
- seen anywhere else, so I'll include the English-text lines output by ANAL/ERROR
- in case anybody recognizes them. If you need more info on these specific error log
- entries, email me and I'll be happy to send them to you, but they shouldn't be
- posted. Here are those few tidbits:
-
- SCSI BUS PHASE ERROR
- PHASE CHANGE TIMEOUT DURING DATA IN
- SCSI BUS PHASE ERROR
- TIMEOUT WAITING FOR PHASE INTERRUPT
- BUS BUSY
- BUS RESET INITIATED
- CHECK CONDITION
- UNIT ATTENTION
- POWER ON OR RESET OCCURRED
- PHASE MATCH
-
- One more fact (here's the kicker): recently this same thing happened on a
- VAXstation 4000 model 60, with an identically configured (except for SCSI ID
- numbers) external hard disk and optical disk. So while it's been observed for
- the most part on the VS3100, it is not limited to that system. Same situation
- on the VS4000: I was doing a backup and the system hung. Fortunately, though,
- BACKUP on the VS4000 detected a large error count (or whatever it does) and
- stopped, asking me to specify QUIT or RESTART. It was getting hairy, as the
- optical drive is the _only_ backup device I have on the VS4000! I had to keep
- powering the drive off then back on and telling BACKUP to RESTART before it
- would complete the saveset (it took 6 sides!).
-
- I've thought about it being a design problem with the American Digital Systems
- disk subsystem, but if that were the case I should see other errors in normal
- operations (like bad blocks?), shouldn't I? I don't see any. I've also
- thought, based upon the wording of the message output by ANAL/ERROR, that it
- might be a design problem with DEC's SCSI controller on the VS3100-76 (hence my
- cross-posting this to comp.sys.dec), but then why would it also show up on the
- VS4000-90? And why only now, after several months of operation?
-
- Anything you can say to shed light on this will be appreciated. In the
- meantime I'm limping along with minimal backups and keeping a watchful eye on
- the error count, and trying to find some workaround. Thanks.
-
- Larry Bleau
- University of Maryland
- bleau@umdsp.umd.edu
- 301-405-6223
-