- Newsgroups: comp.parallel
- Path: sparky!uunet!gatech!hubcap!fpst
- From: alan@ec.msc.edu (Alan Klietz)
- Subject: DJM - Distributed Job Manager for CM-2 and CM-5
- Message-ID: <1992Jul24.115403.8337@hubcap.clemson.edu>
- Sender: fpst@hubcap.clemson.edu (Steve Stevenson)
- Organization: Clemson University
- Date: Thu, 23 Jul 92 20:16:43 CDT
- Approved: parallel@hubcap.clemson.edu
- Lines: 110
-
- Are you:
- Frustrated with NQS?
- Tired of batch jobs?
- A frazzled manager of a CM-2 or CM-5 system?
-
- If yes, you should try the Distributed Job Manager, and boldly move out of
- the 1960s and into the modern (well, 1970s) era of parallel computing! :-)
-
-
- What is the Distributed Job Manager?
- ------------------------------------
-
- The Distributed Job Manager (DJM) is a drop-in replacement for the
- Network Queuing System (NQS) for managing batch and interactive jobs on a
- Thinking Machines Corporation CM-2 or CM-5 system.
-
- DJM is freely available software under the auspices of the GNU copyright.
-
-
- What does DJM do?
- -----------------
-
- Briefly, DJM manages the resources of a Connection Machine system. It
- allocates resources (memory, processors, time) to jobs based on a
- set of site-defined scheduling criteria. It gives system administrators
- efficiency and flexibility, and gives end users ease of use and
- productivity.
-
- The Distributed Job Manager has the following features:
-
- o Upward compatible with NQS
-
- o Full support for interactive use
-
- o Load balancing
-
- o Flexible controls and fault tolerance
-
- Compatibility: All the user-visible commands of NQS (qsub, qstat, etc.)
- are supported. Support is also provided for many of the user-visible
- extensions to NQS as proposed in the POSIX Batch Queuing draft standard
- (IEEE Std 1003.15).
-
- Interactive use: Interactive access is integral to the system. DJM
- makes no distinction between `batch' and `interactive' jobs. A job
- is a login session. Sessions can be detached from the terminal and
- then later re-attached on a different terminal.
-
- Load balancing: DJM balances job load across multiple front-ends.
- The criteria for selecting where to run jobs can be influenced by
- site policy to an arbitrary degree. Users never have to log into
- a particular front-end.
-
- Fault tolerance: A portion of the MPP machine or a front-end
- can go down without affecting other components of the system.
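-
- As an illustration of the load-balancing and fault-tolerance points
- above, here is a minimal Python sketch of policy-weighted front-end
- selection. It is not DJM's actual code or algorithm: the FrontEnd
- record, the policy_weight field, and the pick_front_end function are
- all hypothetical names chosen for this example.
-

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrontEnd:
    """Hypothetical record describing one front-end machine."""
    name: str
    up: bool                    # False if the front-end is currently down
    running_jobs: int           # jobs currently placed on this front-end
    policy_weight: float = 1.0  # assumed site-defined preference; higher = preferred


def pick_front_end(front_ends: List[FrontEnd]) -> Optional[FrontEnd]:
    """Return the available front-end with the lowest weighted load.

    Front-ends that are down are skipped, so the loss of one machine
    does not prevent jobs from being placed on the others.
    Returns None when no front-end is available.
    """
    candidates = [fe for fe in front_ends if fe.up]
    if not candidates:
        return None
    # Load after placing one more job, scaled down by the site preference.
    return min(candidates,
               key=lambda fe: (fe.running_jobs + 1) / fe.policy_weight)


front_ends = [
    FrontEnd("fe-a", up=True, running_jobs=4),
    FrontEnd("fe-b", up=True, running_jobs=6, policy_weight=2.0),
    FrontEnd("fe-c", up=False, running_jobs=0),  # down, never selected
]
chosen = pick_front_end(front_ends)
print(chosen.name if chosen else "no front-end available")
```

- In this sketch the site policy is reduced to a single weight per
- front-end; a real site policy could consider any criteria it likes.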
-
-
- What machines are supported?
- ----------------------------
-
- DJM currently supports the Thinking Machines Corporation CM-2 and
- CM-5. The package is modular to facilitate easy adaptation to other
- MPP architectures as well. You are encouraged to donate your extensions
- for support of other architectures for incorporation into future
- DJM releases.
-
-
- OK, I'd like to try it out. How do I get it?
- --------------------------------------------
-
- A beta-test release of DJM is available from the CM LIGHTNING User
- Group archives. Use anonymous FTP to connect to ec.msc.edu and
- retrieve the file /pub/LIGHTNING/djm_0.9.6.tar.Z.
-
- You should test it extensively under `live' operating conditions for
- several weeks, running it side-by-side with NQS, before you decide
- whether to switch.
-
- All documentation is included. See the file INSTALL for installation
- instructions. You need an ANSI C compiler, preferably gcc version
- 2.1 or later.
-
- NOTE: Although this version is currently in production use at
- the Minnesota Supercomputer Center and at Los Alamos, it should
- still be considered a BETA-TEST version, not a final version.
-
-
- What kind of support can I expect?
- ----------------------------------
-
- As much as you paid for: NONE! You have the source code, so be prepared
- to make changes to fix bugs or apply local hacks. I will accept bug reports.
- If you include source code patches, I will try to fold them in. If not, I
- may try to fix bugs depending on how much time I have. Basically, however,
- you are on your own.
-
-
- DISCLAIMER: This project was funded in part by the U.S. Army Research
- Office under contract no. DAAL03-89-C-0038. There are NO WARRANTIES,
- EXPRESS OR IMPLIED. The package includes software developed by
- the University of California, Berkeley and its contributors.
-
- --
- Alan E. Klietz
- Minnesota Supercomputer Center, Inc. (MSCI)
- Army High Performance Computing Research Center (AHPCRC)
- 1200 Washington Avenue South
- Minneapolis, MN 55415
- Ph: +1 612 626 1737 Internet: alan@msc.edu
-
-