- Newsgroups: comp.unix.cray
- Path: sparky!uunet!stanford.edu!ames!data.nas.nasa.gov!wk42!ciotti
- From: ciotti@wk42.nas.nasa.gov (Robert B. Ciotti)
- Subject: D.C. CUG and SRFS
- Keywords: Long
- Sender: news@nas.nasa.gov (News Administrator)
- Organization: NAS, NASA Ames Research Center, Moffett Field, CA
- Date: Wed, 9 Sep 92 01:30:17 GMT
- Message-ID: <1992Sep9.013017.27969@nas.nasa.gov>
- Lines: 355
-
-
- Topic: Session Reservable File Systems (SRFS)
-
- At the upcoming CRAY User Group meeting, the topic of Session Reservable
- File Systems (SRFS) will be raised at the Operating Systems Special Interest
- Committee (OS-SIC) meeting Monday morning, September 14. SRFS is a facility
- developed at NASA that provides a resource management function for the mass
- storage requirements of running jobs.
-
- The issue to address will be whether SRFS functionality is desired at
- other CRAY sites. The committee will be preparing a recommendation
- to CRI and requests your input. If other sites *express* interest,
- SRFS may be included in a future release of UNICOS. Included in this
- message are some abstracts that detail SRFS.
-
- Our experience with SRFS has shown it to be a very useful facility
- in our disk management and administration.
-
- A more detailed paper is available by making a request to me by e-mail
- or USPS. If you will not be attending the CUG meeting, please feel
- free to proxy your voting interest to me.
-
- Bob Ciotti
-
- ----------------------------------------------------------------
- Robert B. Ciotti HSP Systems Development
- Numerical Aerodynamic Simulation N258-5 TEL (415) 604-4408
- NASA Ames Research Center FAX (415) 604-4377
- Moffett Field, CA 94035-1000 ciotti@nas.nasa.gov
- ----------------------------------------------------------------
-
-
- -------------------------------------------------------------------------------
-
- Session Reservable File Systems (SRFS)
- Bob Ciotti
- NASA Ames Research Center
- Moffett Field, CA 94035, USA
- ciotti@nas.nasa.gov
- April 4, 1992
-
- Abstract
-
- Neither UNIX nor UNICOS provides an integrated resource management
- facility to control and support the allocation of file space. Session Reserv-
- able File Systems enable resource control of file space allocation on a per
- session, per file system basis. This facility guarantees access to reserved
- space. Integrated resource management is provided through NQS so that
- jobs are scheduled as file space resources are available. Resource sharing
- between interactive users and NQS is dynamic and can be configured to
- provide equal access to critical resources (e.g. SSD). All UNIX file system
- semantics are preserved and are completely backward compatible. Load
- sensitive performance variations are eliminated by limiting file space allo-
- cation on ldcached file systems. Configuring the SSD as a ramdisk has the
- added benefit of doubling I/O throughput performance over ldcache.
-
- 1.0 Introduction
-
- Session Reservable File Systems (SRFS) was developed at the Numeri-
- cal Aerodynamic Simulation Facility (NAS) at NASA Ames Research
- Center, Moffett Field, CA. SRFS is currently in production on both of
- the supercomputers at NAS: Navier (a CRAY2 4-4/256), and Reynolds
- (a Y-MP 8-8/256 with a 256MW Solid State Device (SSD)). Computational
- Fluid Dynamics is the primary research science pursued at NAS.
- Computational chemistry is not as common, but is particularly prone to
- using large amounts of temporary file space. Frequently, the standard
- workload of jobs produces a clashing demand for file space resources.
- The principal motivation behind the development effort was to provide
- a means of resource management for file space allocation between
- competing scientific applications, whether interactive or batch (Net-
- work Queuing System - NQS).
-
- 2.0 Motivation
-
- UNIX views resources as infinite. As long as one may assume that this
- is the case, resource allocation among competing processes is not an
- issue. Once we hold that certain resources are scarce (e.g., competing
- processes frequently cause overallocation), or subject to performance
- degradation when overallocated (e.g., ldcache, main memory), a formalized
- allocation management function is in order. File space is such a
- resource. Jobs having a requirement for significant amounts of file
- space should be capable of specifying this as a requirement for execu-
- tion. Lacking this capability leads to loss of CPU time, and in the case
- of ldcache, leads to varied and poor performance. This is the primary
- motivation for the development of SRFS.
-
- 3.0 Requirements
-
- An integrated resource management function for file space is needed.
- Below is a summarized list of requirements that outlines the functional-
- ity provided through SRFS:
-
- o Ability to reserve file space per file system
- o Guaranteed access to reserved file space
- o Portability TO/FROM target machine
- o Support of all UNIX and FORTRAN I/O
- o Prevent or allow overuse of reserved space
- o Signal disallowed attempts to overuse reserved space
- o Provide system and session level monitoring
- o Integration into NQS
- o Batch and interactive co-existence
- o No significant degradation in performance.
- o Administrative control of over/under subscription
- o Logging of accounting information
- o Effecting Quotas
-
- 4.0 Design
-
- 4.1 General Outline
-
- A Session Reservable File System is an attribute that any native file
- system may possess. The system initialization (/etc/rc) files typically
- contain the necessary commands to set up the file systems as desired
- (see sadmin(1) below) after being fsck'ed and mounted. Once the sys-
- tem has been initialized, any session has access to the reservation
- mechanism (see srfs(1) and qsub(1) below). Sessions are either initi-
- ated by NQS when starting a job from an NQS queue, or by init when
- spawning an interactive log-in. The session terminates when either the
- NQS job finishes, or when the interactive user logs off. Any session,
- either interactive or NQS, may preallocate file space on a Session
- Reservable configured file system. Unused reserved space is returned to
- the system either when the session voluntarily releases it or when the
- session terminates.
-
- 4.1.1 User Interface
-
- When a session wishes to make a reservation, it makes a system call.
- The session passes the amount required and a pathname to a file or
- directory on the file system where the reservation is desired. The UID,
- GID, and ACID of the file/directory passed in the system call are used
- by the kernel in processing the request. If the process making the call is
- either UID root or SUID root, the request is processed as long as there is
- sufficient unreserved space to allocate the amount of requested space. If
- the calling process is not root, the kernel then validates the request
- against several criteria:
-
- The space allocated remains in the possession of the session until it is
- specifically released, or the session terminates.
-
- To illustrate, take for example User Smith. User Smith logs on and
- requests 100 megabytes on /scr/smith. User Smith then runs a CFD
- application and creates 95 megabytes of output in a file called OUT1.
- Smith still has 5 megabytes of guaranteed space reserved. Now Smith
- decides that OUT1 is incorrect and deletes it. Since Smith created and
- then deleted the OUT1 file, the amount of guaranteed space is back to
- the original 100 megabytes. User Smith then reruns the corrected ver-
- sion creating 90 megabytes of output. User Smith then logs out, and in
- doing so automatically returns the 10 megabytes of unused reserved
- space to the system.
-
- Any number of files may be created and deleted. Tracking is performed
- on the net balance of allocated disk blocks that are created/deleted in
- the UID, GID and ACID specified in the reservation request.
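
The accounting in the User Smith example can be traced with a small model. This is only an illustrative sketch in Python, not the kernel implementation: the class name SessionReservation and its methods are hypothetical, units are megabytes, and the behavior when usage exceeds the reservation (which depends on the mount variant) is not modeled.

```python
class SessionReservation:
    """Toy model of one session's reservation on one file system (hypothetical)."""

    def __init__(self, reserved_mb):
        self.reserved_mb = reserved_mb   # total space reserved by the session
        self.used_mb = 0                 # net space currently held by its files

    def create(self, size_mb):
        # Creating file data consumes part of the reservation.
        self.used_mb += size_mb

    def delete(self, size_mb):
        # Deleting file data gives the space back to the reservation.
        self.used_mb -= size_mb

    def guaranteed_mb(self):
        # Space still guaranteed to the session (never negative in this model).
        return max(self.reserved_mb - self.used_mb, 0)

    def end_session(self):
        # Unused reserved space goes back to the system; the files remain.
        returned = self.guaranteed_mb()
        self.reserved_mb -= returned
        return returned


smith = SessionReservation(100)        # reserve 100 MB on /scr/smith
smith.create(95)                       # write OUT1
after_out1 = smith.guaranteed_mb()
smith.delete(95)                       # delete OUT1
after_delete = smith.guaranteed_mb()
smith.create(90)                       # corrected rerun
returned = smith.end_session()         # log out
print(after_out1, after_delete, returned)   # prints: 5 100 10
```

Only the net balance matters in this model, which mirrors the statement that any number of files may be created and deleted within a session.
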
-
- 4.1.2 Administrative Controls
-
- When the administrator wants to designate a file system as Session
- Reservable, a system call is made to do so. Flexibility in how SRFS
- may be configured on different file systems is important in order to sup-
- port varied working environments and applications. There are three
- pieces of information passed to the kernel for initializing the attributes:
-
- 1. The pathname to the mount point of the file system that is to be ses-
- sion reservable.
-
- 2. A value which indicates the number of blocks to over/under sub-
- scribe the named file system.
-
- 3. One of the following four behavioral variants:
-
- o Soft Mount
- o Hard Mount
- o Soft/Restricted Mount
- o Hard/Restricted Mount
-
- SRFS may be turned on or off at any time. If turned off when one or
- more sessions have an existing reservation, all reservations are erased,
- but the processes within the sessions are otherwise unaffected. The file
- allocation mechanism then returns to a first-come first-served policy.
-
- 4.1.3 Checkpoint/Restart
-
- Sessions (jobs) that are checkpointed and killed release all of their
- unused reserved space. Any files created by the session on the reserved
- file system still remain. When the job is restarted, all SRFS allocations
- are reinitiated. If current space limitations will not allow the reinstate-
- ment, the restart will fail. Failed restarts may be tried again later when
- the required file space resources are available.
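
The restart rule can be pictured as an all-or-nothing check. The sketch below is an assumption about how such a check might look, not the UNICOS restart code: the function name try_restart, the saved_reservations mapping, and the free_space table are hypothetical.

```python
def try_restart(saved_reservations, free_space):
    """Reinstate every saved reservation, or none of them (assumed behavior)."""
    if any(free_space.get(fs, 0) < amount
           for fs, amount in saved_reservations.items()):
        return False                  # restart fails; it may be retried later
    for fs, amount in saved_reservations.items():
        free_space[fs] -= amount      # space is reserved again for the session
    return True


free_space = {"/big": 40}
first_try = try_restart({"/big": 50}, free_space)    # not enough space yet
free_space["/big"] = 60                              # space has been freed
second_try = try_restart({"/big": 50}, free_space)   # now succeeds
print(first_try, second_try, free_space["/big"])     # prints: False True 10
```
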
-
- 4.1.4 NQS Extensions
-
- NQS has been modified to provide for the notion of resource allocation
- of file space and special scheduling consideration for the jobs that
- request it. The modifications to NQS were made so as to maintain com-
- patibility between modified and unmodified versions of NQS. The ver-
- sion of NQS modified to support SRFS is completely backward
- compatible with other existing versions of NQS. Requests are processed
- by the NQS daemon which in effect makes reservations on behalf of the
- requesting session (NQS job).
-
- 4.1.5 Scheduling
-
- Because the availability of file space is continually changing, and jobs
- have varied file space requirements, NQS schedules jobs based upon
- the availability of file space and the demand for that space.
-
- The first fit SRFS scheduling mechanism has been incorporated into the
- standard FIFO scheduling mechanism. The existing site-dependent
- queue structure does not require modification or additional queues to be
- defined. An NQS job will not be started until all of its SRFS require-
- ment(s) can be satisfied. If a job's SRFS requirement(s) can be satis-
- fied, the NQS daemon obtains the file space and passes it to the job
- when first started. If the requested space is not available on the speci-
- fied file system(s), the job is held until the space becomes available. In
- the meantime the request is bypassed and the next job in the queue is
- tested for execution. If the next job does not have a reservation request,
- or the reservation request(s) can be satisfied, then that job is run next.
- This is repeated until a runnable job is found.
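
The first-fit pass described above can be sketched roughly as follows. This is a simplified illustration, not the NQS code: the Job class, the free_space dictionary, and the function name pick_next_job are hypothetical, and quota checks and job assist are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    # Requested reservations: file system -> amount (units are arbitrary here).
    requests: dict = field(default_factory=dict)


def pick_next_job(queue, free_space):
    """Return the first job in FIFO order whose requests all fit, or None."""
    for job in queue:
        fits = all(free_space.get(fs, 0) >= amount
                   for fs, amount in job.requests.items())
        if fits:
            # Obtain the space on behalf of the job before it starts.
            for fs, amount in job.requests.items():
                free_space[fs] -= amount
            queue.remove(job)
            return job
        # Otherwise the job is bypassed and stays in the queue.
    return None


queue = [Job("a", {"/big": 80}), Job("b"), Job("c", {"/big": 30})]
free_space = {"/big": 50}
first = pick_next_job(queue, free_space)     # "a" does not fit, so "b" runs
second = pick_next_job(queue, free_space)    # "a" still waits, "c" fits
print(first.name, second.name, free_space["/big"])   # prints: b c 20
```

Job "a" keeps its place at the head of the queue and is retested on every pass, so it runs as soon as enough space is free.
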
-
- There are two situations where SRFS requirements could prevent a job
- from starting:
-
- 1. The UID, GID, or ACID quota is preventing the request from suc-
- ceeding. If this is the case, the user that submitted the job is
- informed one time via mail. It is then the user's responsibility to cre-
- ate enough free quota space for the job to run. The job then remains
- in the queue until it is explicitly removed or enough quota space has
- been freed to allow the job to run.
-
- 2. The file system does not have enough free space to support the
- request. If this is the case, the NQS daemon takes note of the time
- that this event occurred. After a per-queue configurable amount of
- time (see queue time-out - /etc/srfsqto.conf below) a job will become
- assisted.
-
- 4.1.6 Job Assist
-
- Due to high demand and/or very large requests, jobs may be eligible for
- execution but unable to run because their SRFS requirements cannot be
- met. Under certain conditions (see #2 above) the NQS daemon will
- track the amount of time that jobs wait for SRFS requests. Each queue
- is configured with a threshold value in minutes waiting. Once this
- threshold value is exceeded, the NQS daemon enters a state for the
- job called assist. When a job becomes assisted, no other requests for
- file space on the file system that caused the assist will be processed
- through NQS. Attrition or data migration is then relied upon to free
- enough space for the request.
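
One way to picture the assist rule is the sketch below. It is an assumption-laden model rather than the actual daemon logic: the function names, the representation of waiting jobs as (job, file system, blocked-since) tuples, and the single threshold argument (the text describes a per-queue value) are all hypothetical.

```python
def assisted_file_systems(waiting, now_min, threshold_min):
    """Return file systems on which some job has waited past the threshold.

    waiting: iterable of (job_name, file_system, blocked_since_min) tuples.
    """
    return {fs for _job, fs, since in waiting
            if now_min - since > threshold_min}


def may_reserve(job_name, file_system, waiting, now_min, threshold_min):
    """Only an assisted job may obtain space on an assisted file system."""
    if file_system not in assisted_file_systems(waiting, now_min, threshold_min):
        return True
    return any(job == job_name and fs == file_system
               and now_min - since > threshold_min
               for job, fs, since in waiting)


waiting = [("big_job", "/big", 0)]    # blocked on /big since minute 0
early = may_reserve("small_job", "/big", waiting, now_min=30, threshold_min=60)
late = may_reserve("small_job", "/big", waiting, now_min=90, threshold_min=60)
owner = may_reserve("big_job", "/big", waiting, now_min=90, threshold_min=60)
other_fs = may_reserve("small_job", "/scr", waiting, now_min=90, threshold_min=60)
print(early, late, owner, other_fs)   # prints: True False True True
```
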
-
- 4.1.7 Temporary Directories
-
- Temporary working space for jobs is managed effectively through the
- use of the tmpdir(1) facility provided in UNICOS. However, a problem
- exists in that users do not know the name of their tmpdir directories at
- the time of the qsub(1) request. Therefore a configuration file of key-
- word names (see /etc/tmpdir.conf below) allows users to specify a key-
- word that is later mapped to a directory name. The NQS daemon will
- create a temporary directory for the user and make the directory path
- available in the user's environment variable space under the $keyword
- name.
-
- For example:
-
- qsub -lr '$BIGDIR',50MW
-
- On our system, this allocates 50 megawords to the user on the tmp-
- dir(1) managed file system /big. Users access the name of their tempo-
- rary directory on /big via the environment variable $BIGDIR, which
- evaluates to something like /big/nqs/a0001. Interactive users are also
- automatically assigned a $BIGDIR via the system shell initialization
- files (e.g., /etc/cshrc or /etc/profile). This provides script compatibility
- between interactive and batch environments.
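
The keyword mechanism amounts to a lookup followed by exporting an environment variable. The sketch below is illustrative only: the format of /etc/tmpdir.conf is not shown in this abstract, so the table is simply a Python dictionary, and the function name, the handling of plain pathnames, and the a0001-style directory name are hypothetical placeholders.

```python
import posixpath

# Hypothetical stand-in for /etc/tmpdir.conf: keyword -> managed file system.
TMPDIR_KEYWORDS = {"BIGDIR": "/big"}


def resolve_request(spec, job_id, env):
    """Parse a '$KEYWORD,amount' request and record the directory in env."""
    target, amount = spec.split(",", 1)
    if target.startswith("$"):
        keyword = target[1:]
        directory = posixpath.join(TMPDIR_KEYWORDS[keyword], "nqs", job_id)
        env[keyword] = directory      # the job later reads $BIGDIR
    else:
        directory = target            # assumed: a plain pathname needs no mapping
    return directory, amount


env = {}
directory, amount = resolve_request("$BIGDIR,50MW", "a0001", env)
print(directory, amount, env["BIGDIR"])   # prints: /big/nqs/a0001 50MW /big/nqs/a0001
```
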
-
- 5.0 Implementation
-
- 5.1 System Call Interface
-
- It was strongly desired to make the kernel and NQS changes as modular
- as possible. Further, affecting as few source files as possible was
- also desirable to reduce the effort required to maintain our source with
- constant updates and fixes coming in from CRI.
-
- The quotactl(2) system call was chosen for modification because of its
- similarity in function, and the desire not to create an additional system
- call. All interfaces for the support of user commands and administrative
- control are provided via the modified quotactl(2) system call. These are
- detailed in the man page appendix.
-
- 5.2 User Interface
-
- Due to the nature of UNIX and NQS, it was unavoidable to have two
- different user-level interfaces to SRFS.
-
- Interactively, users make requests for allocation of space via the srfs(1)
- command (see man page appendix). When a user makes a request inter-
- actively, the request returns immediately. If successful, the space is
- reserved and a positive status is returned. If the attempt fails, a negative
- status is returned and quotactl(2) will have set errno to the appropriate
- value that can be listed from the srfs(1) command using the -v com-
- mand line option. The return status allows the command to function
- within a shell script using an if test conditional branch.
-
- To make SRFS requests via NQS for batch processing, qsub(1) was
- modified to accept an additional command line argument -lr or an
- embedded shell script command #QSUB (see man page appendix).
-
- Interactive and batch jobs adjust or release their SRFS requests via the
- srfs(1) command. All informational displays are also handled by
- srfs(1).
-
- Root has the capability of adjusting the SRFS allocations of any job on
- the system via the srfs(1) command.
-
- 5.3 Administration
-
- 5.3.1 Initialization
-
- File system initialization of SRFS is handled through the sadmin(1)
- command (see man page appendix). The sadmin(1) command will typically
- appear in the system's /etc/rc start-up files after the file systems
- have been fsck'ed and mounted. Once this has been done, the SRFS
- attributes applied remain until the system is brought down or otherwise
- specifically removed.
-
- 5.3.2 NQS Administration
-
- Several configuration and information files are used and maintained by
- NQS for SRFS processing.
-
- 1. /etc/srfsqto.conf - Queue time out configuration
- 2. /etc/tmpdir.conf - Temporary directory configuration
- 3. /usr/spool/nqs/log - General SRFS information
- 4. /usr/spool/srfslog - Detailed SRFS information
-
- 8.0 Future Directions
-
- Now that qsub(1) has the capability for users to specify their file space
- usage requirements, the system can use this information for more than
- just scheduling. There is interest in somehow tying these estimates into
- an automatic mechanism for data migration.
-
- 9.0 Summary
-
- At NAS, SRFS has provided an integrated resource management tool
- for access to file space on the Cray supercomputers. SRFS allows us
- more flexibility to provide our users with better functionality and per-
- formance. SRFS retains full backward compatibility for all I/O and
- NQS.
-
- Ldcache alone is subject to severe load-sensitive performance
- variations, even with conservative cache-to-disk ratios. Ramdisk
- provides consistent performance and delivers at least twice the
- production throughput of ldcache.
-
-