NetNews Usenet Archive 1992 #16

Text File | 1992-07-29 | 3.3 KB | 81 lines

Newsgroups: vmsnet.sysmgt
Path: sparky!uunet!elroy.jpl.nasa.gov!ncar!uchinews!iitmax!draughn
From: draughn@iitmax.iit.edu (Mark Draughn)
Subject: Need Help With Disk Rebuild Problem.
Message-ID: <1992Jul30.004447.1273@iitmax.iit.edu>
Organization: Illinois Institute of Technology
Date: Thu, 30 Jul 92 00:44:47 GMT
Lines: 71

I'm having a system performance problem that I hope someone can help me with.

We're running VMS 5.4-2. We have about 9000 users on our VAX cluster. All of the user directories are on one volume set containing about 200000 files using 2.5 GB, which is served to the cluster.

THE PROBLEM:

After a crash, when the volume set is mounted, the automatic rebuild takes about an hour. This is a major pain.

I've got things set up so that this volume is only rebuilt if the server crashes. If one of the satellites crashes, we live with the corruption. This is still a problem because a server crash will hang all user file activity for an hour during the rebuild. Even a simple shutdown-and-reboot cycle on the server can take an hour if any of the systems in the cluster has crashed since the last reboot.

SOME SPECULATION:

A little experimentation suggests that the basic disk rebuild is fast, but that updating disk quotas is very slow. I think this is because disk quota entries are cached but not sorted or indexed. The quota rebuild process uses the ACP-QIO interface to rebuild the quota file. Each time it updates a quota entry, the ACP (actually the XQP) has to do a sequential search for the right record. Since all quota entries are being accessed, the caching doesn't help. The result is that the quota rebuilding process is quadratic in the number of quota entries.

Just for the heck of it, I wrote a program that scans the INDEXF.SYS files and gathers all the quota information, then rewrites the quota file directly. The good news is that it only takes 5 minutes. The bad news is that the quota cache is not invalidated when the quota file is opened.
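
To make the speculation above concrete, here is a rough model in Python of the two approaches: updating each quota entry through a sequential search of the record list, versus gathering all usage in one pass and rewriting the records directly. This is only a sketch of the cost argument. It is not VMS code and not the program described above; the record layout and the function names (gather_usage, update_by_sequential_search, rewrite_directly) are made up for illustration, and the real XQP behavior is assumed, not confirmed.

```python
from collections import defaultdict


def gather_usage(files):
    """Sum blocks per owner in one pass over (uic, blocks) pairs.

    Stands in for scanning the file headers once (hypothetical layout).
    """
    usage = defaultdict(int)
    for uic, blocks in files:
        usage[uic] += blocks
    return dict(usage)


def update_by_sequential_search(quota_records, usage):
    """Update each entry by scanning the record list from the start.

    Models the assumed per-update sequential search. Returns the number
    of records examined, which grows roughly as n * n / 2.
    """
    examined = 0
    for uic, blocks in usage.items():
        for record in quota_records:
            examined += 1
            if record["uic"] == uic:
                record["used"] = blocks
                break
    return examined


def rewrite_directly(usage):
    """Build the whole record list in one pass, as a direct rewrite would."""
    return [{"uic": uic, "used": blocks} for uic, blocks in sorted(usage.items())]


if __name__ == "__main__":
    n = 1000  # small demo size; the site in question has about 9000 entries
    files = [(uic, 10) for uic in range(n) for _ in range(2)]
    usage = gather_usage(files)
    records = [{"uic": uic, "used": 0} for uic in range(n)]
    print("records examined:", update_by_sequential_search(records, usage))
    print("same result either way:", records == rewrite_directly(usage))
```

In this model, with n entries updated in file order, the search examines n(n+1)/2 records in total, which is about 40.5 million for 9000 entries, while the direct rewrite touches each record once. Whether that alone accounts for the hour is still speculation.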

I can't even update the cached entries using the ACP-QIO interface because I don't know which UICs are cached. The only way I know to invalidate the cache is to disable quotas then re-enable them. However, in order to preserve the accuracy of the disk quota usage information (which is, after all, the purpose of rebuilding the quota file) this has to be done on all nodes while the volume set is locked. The ACP Control function FIB$C_DSA_QUOTA only works on the current node---other nodes in the cluster will still have quotas enabled. (I suppose I could make it work by starting DECNET servers on the other nodes to do the disk quota operations, but this seems excessive.)

----------

So. Does anybody else have this slow rebuild problem? How do you deal with it? I can't believe we're the only site with thousands of disk quota entries on one volume set.

I would love to be told that I'm ignorant of an obvious solution. (Disabling quotas, reducing the number of users, breaking up the volume set, or rebuilding off-line are not possible.)

Does DEC plan to fix this performance problem? Better yet, has DEC fixed this in VMS 5.5?

Any suggestions?

Thanks.

-- 
Mark Draughn    | <draughn@iitmax.iit.edu> or <SYSMARK@IITVAX> on BITNET
----------------+ Academic Computing Center, Illinois Institute of Technology
+1 312 567 5962 | 10 W. 31st Street, Chicago, Illinois 60616