NetNews Usenet Archive 1993 #3

Internet Message Format | 1993-01-28 | 3.8 KB

Path: sparky!uunet!enterpoop.mit.edu!spool.mu.edu!nigel.msen.com!dan
From: dan@msen.com (Dan and Karen Sugalski)
Newsgroups: comp.os.rsts
Subject: Re: Exabyte Backup Failures
Date: 27 Jan 1993 02:25:56 GMT
Organization: Msen, Inc. -- Ann Arbor, Michigan
Lines: 76
Distribution: inet
Message-ID: <1k4rs4INN5tv@nigel.msen.com>
References: <1993Jan25.123329.11700@syma.sussex.ac.uk>
NNTP-Posting-Host: garnet.msen.com
X-Newsreader: TIN [version 1.1 PL8]

Stephen Carter (stevedc@syma.sussex.ac.uk) wrote:
: We are running RSTS 9.7. We have had exabytes since I don't know how
: long, and have been VERY happy with the overall functionality.
: In July 1992 we swapped two 512Mb Fuji's for Two 2Gb Seagate ST42400
: Scsi discs hanging off a CMD Scsi controller. The users' data did
: expand (naturally!) but currently stands at (laboriously calculated)
: 2,146,696,704 bytes.
: When we backup now, the run fails (after 13 hours) with the following
: |
: |
: ?Error reading Backup set
: ?Data error on device
: ?Error reading Backup set
: ?Data error on device
: ?Unexpected error 14 in RSTRMS
: |
: On the face of it, with these data volumes it may appear to be a badly
: handled end-of-reel situation, so before folk flame me, I HAVE seen a real
: end-of-reel situation on the exabyte, and that was handled properly
: (Please mount volume 2 of Backup set etc etc)
: What is it?

Well, this sounds suspiciously like a problem that I've encountered on a
lot of customers' machines--namely that RSTS' TMSCP handler has pretty
poor error recovery. While I've never encountered an error in the middle
of a backup, the most anyone's ever dumped to the thing are two RA81s
(~750 meg of data, not counting free space left on the drives).

What seems to happen is that somewhere between the tape drive and the
TMSCP handler, a packet gets lost, and the two get out of sync, never
again to talk.
(We have to power the drive and computer off--a reboot doesn't seem to
be enough.) The easiest way we've found to trigger it is to do a restore
when there's a load on the machine (100% guaranteed to cause a failure).
The restore works OK and the data gets restored, but the tape drive will
not talk to the computer any more. Symptoms include: Data Error on Device
errors, Magtape Select errors, Device Hung errors, and, my favorite,
complete machine lockups. (The latter can be cured by either inserting a
tape or ejecting the tape that's already in. Things pick up where they
left off, no harm done.)

You might try upgrading to 10.1. (Skip 10.0 unless you're ready to apply
several dozen patches, including one that's not documented anywhere, but
is lurking on the upgrade tape.) The TMSCP handling is reportedly better.
I don't have any experience with the drives on it, though, as I don't
know anyone with one of the tape drives who is using it.

Alternatively, you may want to try putting a pause of some sort between
the BACKUP commands (maybe the big dump of data is causing a problem, and
giving the drive a chance to flush buffers and such will help). If you're
doing the verify after each backup, then *don't*! Do both backups first,
then verify the tape.

Finally, check to see if there are any batch jobs that might be running
in a different queue. (The bigger backup may be overlapping something
else.) Also consider doing a SET SYSTEM/NOLOGINS in the com file before
the backup starts. (You may want to shut down the other queues, as well
as any other jobs that may be running.)

And, even more finally, back up the non-system disk *first*. It's been my
experience that, once TMSCP loses it, the backup/restore job is immortal.
(Looks to have pending async IO--prio/runburst go to 128/127 but doesn't
die.) $SHUTUP won't be able to bring the system down, so you'll have to
power off the machine. Dismount the big drive before you do--one fewer
drive to clean...

.Sigs? We don' need no steenkin' .sigs!

					Dan