All Advanced Laboratory Workstations are interconnected by the NIH campus-wide network, which they use to share resources and access services such as file backup, software maintenance, applications software, online documentation, international electronic mail and news, computation and database servers, laser printers, and an international distributed file system. ALWs are particularly suitable for scientific applications requiring high-performance computing or graphics, or access to large amounts of data. The most popular such applications include medical image processing, DNA and protein sequencing and searching, statistical analysis, and molecular graphics and modeling.
The ALW System won the 1992 Best in Open Systems Solutions (BOSS) award in the Innovation in Hardware, Software, and Networking Approaches category. This award is conferred annually by the Federal Computer Conference and the Government Open Systems Solutions Council to recognize government agencies that have best applied open systems technology.
Storing files in the distributed file system rather than on the workstation's local disk offers many advantages: daily file backup, higher security, automatic software maintenance, file sharing, and user mobility. The following sections describe these in greater detail.
Automatic software maintenance
Optionally, users can exchange multi-media mail with other ALW users via the Andrew Message System (AMS). In addition to plain text, AMS multi-media mail can contain multi-font text, equations, tables, graphics, images, animations, and sound.
Our use of AFS also positions us well for migration to OSF's Distributed Computing Environment (DCE*), since the next major version of AFS (AFS 4.0) has been adopted as the distributed file system component of DCE.
Most of our workstations are configured as dataless clients: the ALW's local disk contains only that software necessary to boot the machine and get it communicating on the network. The remaining local disk space is used for swap space, temporary files, and the AFS file cache. Since all the information on the local disk can be rapidly reconstructed from data held in the distributed file system, it does not need to be backed up, and users are relieved of this burden.
AFS makes a read-only clone of the files to be backed up early each day, and this "snapshot" is copied to 8mm tape during the prime shift. We thus obtain a consistent backup of the file system without having to shut down the file servers.
The distributed file system retains the read-only clones online until the next morning. Users can therefore restore, without operator assistance, the most recently backed-up version of any file they have accidentally deleted or modified.
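The clone-then-restore idea described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not AFS code: the `Volume` class and its `make_snapshot` and `restore` methods are hypothetical names invented for this example, and an in-memory dictionary stands in for the file system.

```python
from types import MappingProxyType


class Volume:
    """Toy stand-in for a collection of user files (hypothetical, not an AFS API)."""

    def __init__(self):
        self.files = {}        # live, read-write contents: path -> data
        self.snapshot = None   # most recent read-only clone, if any

    def make_snapshot(self):
        # Freeze a copy of the current contents. Later writes to self.files
        # do not change it, so a tape dump can read it while users keep working.
        self.snapshot = MappingProxyType(dict(self.files))
        return self.snapshot

    def restore(self, path):
        # Copy one file back from the snapshot; no operator is involved.
        if self.snapshot is None or path not in self.snapshot:
            raise KeyError(f"no snapshot copy of {path}")
        self.files[path] = self.snapshot[path]


vol = Volume()
vol.files["notes.txt"] = "draft 1"
vol.make_snapshot()               # early-morning clone
vol.files["notes.txt"] = "oops"   # accidental modification later in the day
del vol.files["notes.txt"]        # or an accidental deletion
vol.restore("notes.txt")          # the user recovers the morning's version
```

Because the snapshot is a frozen copy, the backup sees one consistent state even though the live files keep changing, which is the property the paragraphs above rely on.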
AFS permits us to replicate read-only data to improve system performance and reliability. We keep two copies of the operating system and application program executables for each system type on physically separate servers. This distributes access to these files across two machines and eliminates a single point of failure: when a file server fails, only those users whose home directories reside on the failed machine are affected. Client workstations accessing replicated data dynamically switch to the remaining accessible copy.
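The switch to a remaining copy can be pictured with the following minimal Python sketch. It only illustrates the general failover pattern and makes no claim about how the AFS client implements it; `ServerDown`, `read_replicated`, `fetch`, and the server names `fs1` and `fs2` are all hypothetical.

```python
class ServerDown(Exception):
    """Raised by the toy fetch function when a server is unreachable."""


def read_replicated(path, replicas, fetch):
    """Return the file from the first replica that answers.

    `replicas` is an ordered list of server names and `fetch(server, path)`
    is a caller-supplied function; both are hypothetical stand-ins.
    """
    for server in replicas:
        try:
            return fetch(server, path)
        except ServerDown:
            continue  # this copy is unreachable, try the next one
    raise ServerDown(f"no replica of {path} is reachable")


# Two servers hold identical read-only copies; fs1 has failed.
copies = {"fs1": None, "fs2": {"/bin/ls": b"ls executable"}}


def fetch(server, path):
    if copies[server] is None:
        raise ServerDown(server)
    return copies[server][path]


data = read_replicated("/bin/ls", ["fs1", "fs2"], fetch)
```

This only works safely because the replicated data is read-only: any copy is as good as any other, so the client never has to reconcile differing versions.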
AFS can move collections of files "on-the-fly" between disks or servers. We use this capability to balance disk space utilization.
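One simple way to decide which collections to move when balancing disk space is a greedy plan, sketched below in Python. This is an assumption-laden illustration, not our actual procedure or an AFS facility: the function `plan_moves`, the server names, the collection names, and the sizes are all invented for the example.

```python
def plan_moves(servers, max_moves=100):
    """Plan moves that even out disk usage across servers.

    `servers` maps a server name to a dict of {collection name: size}.
    Returns (moves, final_state); the input is left unmodified.
    """
    state = {name: dict(vols) for name, vols in servers.items()}
    moves = []
    for _ in range(max_moves):
        used = {name: sum(vols.values()) for name, vols in state.items()}
        src = max(used, key=used.get)   # fullest server
        dst = min(used, key=used.get)   # emptiest server
        gap = used[src] - used[dst]
        # Only a collection smaller than the gap narrows it when moved.
        candidates = [(v, s) for v, s in state[src].items() if 0 < s < gap]
        if not candidates:
            break
        # Prefer the collection whose size is closest to half the gap.
        vol, _size = min(candidates, key=lambda vs: abs(vs[1] - gap / 2))
        state[dst][vol] = state[src].pop(vol)
        moves.append((vol, src, dst))
    return moves, state


servers = {
    "fs1": {"user.a": 40, "user.b": 30, "user.c": 20},
    "fs2": {"user.d": 10},
}
moves, state = plan_moves(servers)
```

Each planned move takes a collection smaller than the usage gap from the fullest server to the emptiest one, so every step strictly reduces the imbalance and the loop stops once no such move exists.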
Applications software is managed by depot, a C program developed at Carnegie Mellon University. Depot makes integrated testing and release engineering easier for system administrators, simplifies search path management, permits control of application version configuration, and can selectively copy and update individual applications on an ALW's local disk. This last feature is crucial when configuring workstations that must operate when portions of the system are inaccessible, such as machines used to restore files from backup tapes or to run network diagnostics.
The attached article, which appeared in the August 18, 1992 issue of The NIH Record, describes the major benefits of the ALW System, a few of its more demanding scientific applications, and the highly favorable reaction of several users. We have achieved these results by using innovative, state-of-the-art software and hardware, particularly for the distributed file system and for software distribution and management.
Keith Gorlen
Chief, Distributed Systems Section
Computing Facilities Branch
Division of Computer Research and Technology
National Institutes of Health
Bethesda, MD 20892
Phone: (301) 496-1111
FAX: (301) 402-2867
E-mail: kgorlen@alw.nih.gov