- Newsgroups: vmsnet.sysmgt
- Path: sparky!uunet!elroy.jpl.nasa.gov!ncar!uchinews!iitmax!draughn
- From: draughn@iitmax.iit.edu (Mark Draughn)
- Subject: Need Help With Disk Rebuild Problem.
- Message-ID: <1992Jul30.004447.1273@iitmax.iit.edu>
- Organization: Illinois Institute of Technology
- Date: Thu, 30 Jul 92 00:44:47 GMT
- Lines: 71
-
- I'm having a system performance problem that I hope someone can help
- me with.
-
- We're running VMS 5.4-2.
-
- We have about 9000 users on our VAX cluster. All of the user
- directories are on one volume set containing about 200000 files using
- 2.5 GB, which is served to the cluster.
-
- THE PROBLEM: After a crash, when the volume set is mounted, the
- automatic rebuild takes about an hour. This is a major pain.
-
- I've got things set up so that this volume is only rebuilt if the
- server crashes. If one of the satellites crashes, we live with the
- corruption. This is still a problem because a server crash will hang
- all user file activity for an hour during the rebuild. Even a simple
- shutdown-and-reboot cycle on the server can take an hour if any of the
- systems in the cluster has crashed since the last reboot.
-
- SOME SPECULATION:
-
- A little experimentation suggests that the basic disk rebuild is fast,
- but that updating disk quotas is very slow. I think this is because
- disk quota entries are cached but not sorted or indexed.
-
- The quota rebuild process uses the ACP-QIO interface to rebuild the
- quota file. Each time it updates a quota entry, the ACP (actually the
- XQP) has to do a sequential search for the right record. Since all
- quota entries are being accessed, the caching doesn't help. The
- result is that the quota rebuilding process is quadratic in the number
- of quota entries.
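
The quadratic growth speculated about above can be illustrated with a toy model. This is only a sketch, not VMS code: it assumes each quota update scans an unsorted list of records from the top, and the function names are hypothetical. It compares that against a keyed lookup.

```python
def sequential_update_cost(uics):
    """Count record comparisons when every update scans the file from the top.

    Assumption: records are unsorted and unindexed, as speculated in the post.
    """
    quota_file = list(uics)          # records in file order
    comparisons = 0
    for uic in quota_file:           # one update per quota entry
        for record in quota_file:    # sequential search for the matching record
            comparisons += 1
            if record == uic:
                break
    return comparisons


def indexed_update_cost(uics):
    """Count lookups when records are found through a hash index instead."""
    index = {uic: position for position, uic in enumerate(uics)}
    lookups = 0
    for uic in index:
        _ = index[uic]               # one keyed lookup per update
        lookups += 1
    return lookups
```

Under this model, n distinct entries cost n(n+1)/2 comparisons with sequential search. If the quota file had one entry per user (an assumption; the post gives only "about 9000 users"), that would be 9000 * 9001 / 2 = 40,504,500 comparisons, against 9000 keyed lookups.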
-
- Just for the heck of it, I wrote a program that scans the INDEXF.SYS
- files and gathers all the quota information, then rewrites the quota
- file directly.
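
The idea behind such a program can be sketched as a single pass that totals usage per owner. This is an illustration only: reading INDEXF.SYS and writing the quota file are not shown, and the `(owner_uic, blocks)` pairs and the function name `tally_usage` are hypothetical stand-ins for whatever the real file headers provide.

```python
from collections import defaultdict


def tally_usage(file_headers):
    """Sum blocks per owner UIC in one pass over the file headers.

    file_headers: iterable of (owner_uic, blocks) pairs (hypothetical format).
    """
    usage = defaultdict(int)
    for owner_uic, blocks in file_headers:
        usage[owner_uic] += blocks   # keyed update, no search of earlier entries
    return dict(usage)


# Example with made-up headers:
sample = [("[100,1]", 10), ("[100,2]", 5), ("[100,1]", 3)]
totals = tally_usage(sample)
```

Each header is touched once, so the work grows linearly with the number of files rather than with the square of the number of quota entries.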
-
- The good news is that it only takes 5 minutes.
-
- The bad news is that the quota cache is not invalidated when the quota
- file is opened. I can't even update the cached entries using the
- ACP-QIO interface because I don't know which UICs are cached. The
- only way I know to invalidate the cache is to disable quotas then
- re-enable them. However, in order to preserve the accuracy of the
- disk quota usage information (which is, after all, the purpose of
- rebuilding the quota file) this has to be done on all nodes while the
- volume set is locked. The ACP Control function FIB$C_DSA_QUOTA only
- works on the current node---other nodes in the cluster will still have
- quotas enabled. (I suppose I could make it work by starting DECnet
- servers on the other nodes to do the disk quota operations, but this
- seems excessive.)
-
- ----------
-
- So. Does anybody else have this slow rebuild problem? How do you
- deal with it? I can't believe we're the only site with thousands of
- disk quota entries on one volume set.
-
- I would love to be told that I'm ignorant of an obvious solution.
- (Disabling quotas, reducing the number of users, breaking up the
- volume set, or rebuilding off-line are not possible.) Does DEC plan
- to fix this performance problem? Better yet, has DEC fixed this in
- VMS 5.5?
-
- Any suggestions?
-
- Thanks.
- --
-
- Mark Draughn | <draughn@iitmax.iit.edu> or <SYSMARK@IITVAX> on BITNET
- ----------------+ Academic Computing Center, Illinois Institute of Technology
- +1 312 567 5962 | 10 W. 31st Street, Chicago, Illinois 60616
-