- Path: sparky!uunet!charon.amdahl.com!pacbell.com!sgiblab!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!ucbvax!U.WASHINGTON.EDU!DEREK
- From: DEREK@U.WASHINGTON.EDU
- Newsgroups: comp.os.vms
- Subject: Plans for mixed Alpha/VAX cluster... WARNING!
- Message-ID: <C4A431AEA25FE48CFF@MAX.U.WASHINGTON.EDU>
- Date: 17 Nov 92 01:19:00 GMT
- Sender: usenet@ucbvax.BERKELEY.EDU
- Organization: The Internet
- Lines: 146
-
- OK, folks, I have been meaning to write about the following problem for
- quite some time, but just hadn't gotten around to it. A posting by
- Don Stokes (don@vuw.ac.nz), however, catalyzed me into action! :)
-
- Last May, Andy posted the following note. (I don't know who Andy is either.
- He only signed with his first name.)
-
- >From: IN%"INFO-VAX@SRI.COM" 18-MAY-1992 03:25
- >Subj: Upgrade VMS 5.0-2 to 5.5 failure...
- >
- >Say... *I* had problems upgrading from VMS V5.0-2 to VMS 5.5... I got some
- >kind of error saying it couldn't find the "CHGSYSPAR" program... this was
- >right at the end of phase 1, when it was setting the system parameters to
- >reboot and start phase 2. Very frustrating... I wound up having to install
- >VMS as opposed to upgrading, then had to restore & reinstall pertinent files...
- >
- >Andy
-
- Well, I had an experience with this problem, too!
-
- About two years ago, we ran into a weird problem during a VMS upgrade to V5.4.
- Things were proceeding "swimmingly" until we reached phase 2, at which point
- it bombed out with a "file not found" error on the file "CHGSYSPAR.EXE".
- The folks who were doing the upgrade tried it again, and it failed again with
- the same error. They made a cursory examination of the KITINSTAL procedure,
- and decided that we were somehow missing the image. So they restored the
- file from the distribution media, and tried again. It failed again. They
- went back to the KITINSTAL and became very puzzled, because they discovered
- that the image was RETRIEVED from the VMSnnn.A saveset! "Obviously", the
- upgrade procedure had a logic flaw in it. (It had just finished deleting 1100
- or so files, and it looked like it had deleted the file it had just restored.)
- So they unpacked the .A saveset, modified KITINSTAL.COM to copy CHGSYSPAR
- from a copy saved specifically for this purpose, re-created the .A saveset,
- and completed the upgrade. Whew!
-
- I started looking at the problem the following Monday. This was quite a
- puzzle. I researched the problem for quite some time. Colorado was stumped.
- No one had reported such problems before. I even spoke with the engineer who
- was then responsible for VMSINSTAL. He hadn't heard of this problem, either.
- Well, we just wrote it off as random "cosmic ray" (AKA an "alpha" particle) :)
- which struck multiple times. :)
-
- It wasn't until about 6 months later that I solved this problem.
-
- So, what does Don's posting have to do with any of this? Good question!
- (Anyone else appreciate James Burke's "Connections", etc., series?)
-
- Don wrote:
-
- >... [VMS$COMMON] is used by practically nothing at all. I
- >smashed my system disk up a while ago, and made a fairly hairy recovery.
- >Several days later I discovered that [VMS$COMMON] hadn't been re-created,
- >but the system hadn't noticed. Obviously, [SYS0.SYSCOMMON] was present,
- >and that's all that mattered.
-
- My guess is that Don did not perform an IMAGE restore of his system disk,
- but that doesn't really matter for my point. However, a non-IMAGE restore
- COULD produce the problem that both we and Andy experienced.
-
- So, what happened some six months after our problem? Well, I decided to
- do an "ANALYZE /DISK /NOREPAIR" on the system disk of the system which had
- had the problem. In doing so, I discovered an EXTRA SYSCOMMON.DIR! In
- going back through our logs, I discovered that a little more than three
- months BEFORE our upgrade, several images had "disappeared" from the
- SYS$SYSTEM directory. To correct the problem, without shutting down the
- system, someone had performed a BACKUP "file restore" of the directory
- tree [SYS0.SYSCOMMON]. This created a SYSCOMMON.DIR which was NOT
- linked with the VMS$COMMON directory, which sounds like the situation
- Don says he had (has).
-
- (As for why the files started disappearing in the first place, I seem to
- recall tracking it back to an installation of RDB the day before.)
-
- So, why is this a problem, and how would it cause a problem for the upgrade?
- Well, an upgrade of VMS uses the SYSF root as a temporary holding place for
- some of the new images. In order to keep track of both the old and new images,
- the KITINSTAL procedure creates, and uses, several names for the various
- directories. These are:
-
-   "old_sysexe"     old [SYSEXE] in SYS$SPECIFIC:, such as
-                    DUA0:[SYS0.SYSEXE]
-
-   "new_sysexe"     new [SYSEXE] in "new" SYS$SPECIFIC:,
-                    i.e., DUA0:[SYSF.SYSEXE]
-
-   "clroot_sysexe"  [SYSEXE] directory in the CLUSTER common root, if
-                    such exists. You might expect this to be defined
-                    as the translation of SYS$COMMON:[SYSEXE],
-                    DUA0:[SYS0.SYSCOMMON.SYSEXE], but it is actually
-                    defined as DUA0:[VMS$COMMON.SYSEXE]!
-
- Using these symbols, the procedure first deletes any "old" copies of specific
- upgrade-related files, such as CHGSYSPAR.EXE, from the "old_sysexe" and
- "clroot_sysexe" directories. Then, it copies new files from the distribution
- to both the "new_sysexe" and "clroot_sysexe" directories.
-
- Now, IF SYS$COMMON is the same directory as [VMS$COMMON], everything is OK.
- Even if it isn't, things would be OK except for one thing: KITINSTAL invokes the
- procedure VMS$UPG_SYSPARFILES.COM which, naively, tries to run the new
- CHGSYSPAR image out of SYS$SYSTEM, and NOT out of "new_sysexe"! Granted,
- it "should" work, but I still insist that this is a logic error. After all,
- SYS$SYSTEM is defined to point to the OLD system, which is under demolition.
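The delete/copy/run sequence described above can be sketched as a toy Python model. This is not VMS code and not the actual KITINSTAL logic: directories are plain sets, a correctly linked SYSCOMMON.DIR is modelled as the same set object as [VMS$COMMON.SYSEXE], and the stray directory is assumed not to contain CHGSYSPAR.EXE. The `restored` flag is a hypothetical stand-in for restoring the image to SYS$SYSTEM: before the upgrade; the post doesn't say exactly where the file was restored, so the model assumes it landed in the first element of the SYS$SYSTEM search list (SYS$SPECIFIC).

```python
"""Toy model (not VMS code) of the CHGSYSPAR upgrade failure.

Directories are Python sets of file names. Two directory entries that
point at the same directory file are modelled as the same set object.
"""

from typing import Dict, List, Set

IMAGE = "CHGSYSPAR.EXE"


def build_disk(syscommon_linked: bool) -> Dict[str, Set[str]]:
    """Return a map from directory spec to its contents.

    If syscommon_linked is True, [SYS0.SYSCOMMON.SYSEXE] and
    [VMS$COMMON.SYSEXE] share one set (correctly linked directory).
    Otherwise [SYS0.SYSCOMMON] is a separate stray copy, as left by a
    file-level BACKUP restore. Assumption: the stray copy lacks IMAGE.
    """
    vms_common_sysexe: Set[str] = set()
    syscommon_sysexe = vms_common_sysexe if syscommon_linked else set()
    return {
        "[SYS0.SYSEXE]": set(),                       # SYS$SPECIFIC
        "[SYS0.SYSCOMMON.SYSEXE]": syscommon_sysexe,  # SYS$COMMON
        "[VMS$COMMON.SYSEXE]": vms_common_sysexe,
        "[SYSF.SYSEXE]": set(),                       # temporary new root
    }


def upgrade(syscommon_linked: bool, restored: bool = False) -> bool:
    """Return True if CHGSYSPAR.EXE is found through SYS$SYSTEM."""
    disk = build_disk(syscommon_linked)
    old_sysexe = "[SYS0.SYSEXE]"
    new_sysexe = "[SYSF.SYSEXE]"
    clroot_sysexe = "[VMS$COMMON.SYSEXE]"  # not SYS$COMMON:[SYSEXE]
    # SYS$SYSTEM as a search list: SYS$SPECIFIC:[SYSEXE], SYS$COMMON:[SYSEXE]
    sys_system: List[str] = ["[SYS0.SYSEXE]", "[SYS0.SYSCOMMON.SYSEXE]"]

    if restored:
        # Assumption: a restore "to SYS$SYSTEM" lands in the first
        # element of the search list, which is also old_sysexe.
        disk[sys_system[0]].add(IMAGE)

    # Step 1: delete old copies of the upgrade image.
    for d in (old_sysexe, clroot_sysexe):
        disk[d].discard(IMAGE)
    # Step 2: copy the new image to the new root and the cluster common root.
    for d in (new_sysexe, clroot_sysexe):
        disk[d].add(IMAGE)
    # Step 3: VMS$UPG_SYSPARFILES.COM runs CHGSYSPAR out of SYS$SYSTEM.
    return any(IMAGE in disk[d] for d in sys_system)


for linked in (True, False):
    print(f"linked={linked}: found={upgrade(linked)}, "
          f"after restore={upgrade(linked, restored=True)}")
```

Run as written, it prints `linked=True: found=True, after restore=True` and `linked=False: found=False, after restore=False`. Only the stray-directory case fails, and restoring the image beforehand doesn't help, because step 1 deletes it again.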
-
- Now that you have all heard my tale of woe, you should all be asking
- "why did the upgrade succeed if the directories were so messed up?"
- After all, SYSCOMMON.DIR still has all of the OLD images and files left in
- it -- doesn't it?
-
- Yes, it does. So didn't this cause a problem for us right after we completed
- the upgrade? One would expect at least a FEW ident and cld-image mismatches.
- It turns out that one of the last things done by the upgrade is to execute a
- command like:
-
- $ SET FILE /ENTER=[SYS0]SYSCOMMON.DIR VMS$COMMON.DIR
-
- This pretty effectively "loses" our old, "bad" SYSCOMMON directory, and
- "fixes up" everything for the new version.
-
- Except... for the lost files, which, in our case, used up almost 300,000
- blocks of space. Fortunately, we are using a large disk, so we weren't
- adversely impacted by this.
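A similar toy model shows why re-entering SYSCOMMON.DIR leaves the old files stranded: once no directory entry points at the stray directory file, nothing under it can be reached from the master file directory. The file IDs and file names below are hypothetical, the walk is a plain depth-first search, and this only roughly illustrates the kind of "lost" file that ANALYZE /DISK reports. It is not how that utility is implemented, and it models only the end state of the /ENTER step, not its exact DCL semantics.

```python
"""Toy model (not VMS code): how re-entering SYSCOMMON.DIR strands files.

Each file has a made-up file ID (FID). A directory maps entry names to
FIDs. Files that cannot be reached from the master directory [000000]
are treated as "lost".
"""

from typing import Dict, Set

Directory = Dict[str, int]


def reachable(dirs: Dict[int, Directory], root: int) -> Set[int]:
    """Depth-first walk returning every FID reachable from root."""
    seen: Set[int] = set()
    stack = [root]
    while stack:
        fid = stack.pop()
        if fid in seen:
            continue
        seen.add(fid)
        # Non-directory files have no entry in dirs, so they add nothing.
        stack.extend(dirs.get(fid, {}).values())
    return seen


# Hypothetical FIDs for the directories and files involved.
MFD, VMS_COMMON, SYS0, STRAY_SYSCOMMON = 4, 10, 11, 12
OLD_IMAGE, NEW_IMAGE = 20, 21

dirs: Dict[int, Directory] = {
    MFD: {"VMS$COMMON.DIR": VMS_COMMON, "SYS0.DIR": SYS0},
    SYS0: {"SYSCOMMON.DIR": STRAY_SYSCOMMON},  # not linked to VMS$COMMON
    VMS_COMMON: {"NEW.EXE": NEW_IMAGE},
    STRAY_SYSCOMMON: {"OLD.EXE": OLD_IMAGE},
}
all_fids = {MFD, VMS_COMMON, SYS0, STRAY_SYSCOMMON, OLD_IMAGE, NEW_IMAGE}

# End state of something like
#   $ SET FILE /ENTER=[SYS0]SYSCOMMON.DIR VMS$COMMON.DIR
# assuming the old SYSCOMMON.DIR entry ends up replaced by the new one.
dirs[SYS0]["SYSCOMMON.DIR"] = VMS_COMMON

lost = all_fids - reachable(dirs, MFD)
print(sorted(lost))  # [12, 20]
```

The stray directory (FID 12) and everything under it (FID 20) are still allocated on disk but no longer reachable, which is how roughly 300,000 blocks could quietly go missing.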
-
- Yes, we ran for 3 months with an "incorrect" directory structure, and you
- can too ... but, don't expect an easy upgrade if you do.
-
- And, yes, after I discovered the "root" :) of the problem I contacted the
- engineer. He effectively said "Oh, yeah. That could be a problem." I
- suggested that he modify the KITINSTAL procedure to CHECK for, and notify
- the user of, an invalid directory structure. Nonetheless, no change has
- yet been made. Furthermore, as of VMS V5.5, the VMS$UPG_SYSPARFILES
- procedure STILL invokes CHGSYSPAR out of SYS$SYSTEM.
-
- -Derek S. Haining
- University Computing Services
- University of Washington
- Seattle, Washington 98195
- (206) 543-5579
-
- DEREK@MAX.BITNET
- DEREK@MAX.U.WASHINGTON.EDU
-
-
- Questions? Comments? Always happy to receive them -- even if they ARE
- flames. It lets me know that SOMEONE bothered to read what I so painstakingly
- wrote! :)
-
-