.\" Source file: condor40.zip, CONDOR/doc/tech/tech.grn (archived copy dated 1989-10-16)
.nr si 3n
.he 'CONDOR TECHNICAL SUMMARY''%'
.+c
.(l C
.sz 14
CONDOR TECHNICAL SUMMARY
.)l
.(l C
Allan Bricker
and
Michael J. Litzkow
.)l
.sp .5i
.sh 1 "Introduction to the Problem"
.pp
A common computing environment consists of many workstations
connected together by a high-speed local area network.
These workstations have grown in power over the past several years,
and if viewed as an aggregate they can represent a
significant computing resource.
However, in many cases, even though these workstations are
owned by a single organization, they are dedicated to the
exclusive use of individuals.
.pp
In examining the usage patterns of the workstations,
we find it useful to identify three
.q typical
types of users.
.q "Type 1"
users are individuals who mostly use their workstations
for sending and receiving mail or preparing papers.
Theoreticians and administrative people often fall into this
category.
We identify many software development people as
.q "type 2"
users.
These people are frequently involved in the debugging cycle where
they edit software, compile, then run it possibly using some kind
of debugger.
This cycle is repeated many times during a typical working day.
Type 2 users sometimes have more computing capacity than they need,
such as when editing, but during the compilation
and debugging phases they could often use more CPU power.
Finally there are
.q "type 3"
users.
These are people who frequently do large numbers of simulations,
or combinatorial searches.
These people are almost never happy with just a workstation, because it
really isn't powerful enough to meet their needs.
Another point is that most type 1 and type 2 users leave their machines
completely idle when they are not working, while type 3 users
may keep their machines busy 24 hours a day.
.pp
.i Condor
is an attempt to make use of the idle cycles from type 1 and 2 users
to help satisfy the needs of the type 3 users.
The
.i condor
software monitors the activity on all the
participating workstations in the local network.
Those machines which are determined to be idle are placed into
a resource pool or
.q "processor bank" .
Machines are then allocated from the bank for the execution of jobs
belonging to the type 3 users.
The bank is a dynamic entity;
workstations enter the bank when they become idle,
and leave again when they get busy.
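.pp
The behaviour of the bank can be sketched in a few lines of Python.
This is only an illustration of the idea, not the condor implementation;
the class and method names (ProcessorBank, machine_idle, machine_busy,
allocate) and the machine and job names are hypothetical.
.(l
```python
class ProcessorBank:
    """Toy model of the pool of idle workstations (names are hypothetical)."""

    def __init__(self):
        self.idle = set()        # machines currently in the bank
        self.allocated = {}      # machine -> job it is hosting

    def machine_idle(self, machine):
        # A workstation enters the bank when it becomes idle.
        if machine not in self.allocated:
            self.idle.add(machine)

    def machine_busy(self, machine):
        # A workstation leaves the bank when it gets busy.
        # Any job it was hosting is handed back to the caller.
        self.idle.discard(machine)
        return self.allocated.pop(machine, None)

    def allocate(self, job):
        # Give a waiting job an idle machine, or None if the bank is empty.
        if not self.idle:
            return None
        machine = sorted(self.idle)[0]
        self.idle.remove(machine)
        self.allocated[machine] = job
        return machine


bank = ProcessorBank()
bank.machine_idle("ws1")
host = bank.allocate("sim-42")       # ws1 is taken out of the bank
evicted = bank.machine_busy("ws1")   # owner returns; the job must move on
```
.)l
.pp
In this sketch an evicted job is simply returned to the caller,
which would then have to find another idle machine for it.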
.sh 1 "Design Features"
.np
No special programming is required to
use condor.
Condor is able to run normal UNIX\**
.(f
\**UNIX is a trademark of AT&T.
.)f
programs, only requiring the user to relink, not recompile
them or change any code.
.np
The local execution environment is preserved for remotely
executing processes.
Users do not have to worry about moving data files to remote
workstations before executing programs there.
.np
The condor software is responsible for locating and allocating
idle workstations.
Condor users do not have to search for idle machines,
nor are they restricted to using machines only during a static portion
of the day.
.np
.q Owners
of workstations have complete priority over their own machines.
Workstation owners are generally happy to let somebody else compute on
their machines while they are out,
but they want their machines back promptly upon returning,
and they don't want to have to take special action to regain control.
Condor handles this automatically.
.np
Users of condor may be assured that their jobs will eventually complete.
If a user submits a job to condor which runs on somebody else's workstation,
but the job is not finished when the workstation owner returns,
the job will be checkpointed and restarted as soon as possible
on another machine.
.np
Measures have been taken to assure
owners of workstations that their file systems will
not be touched by remotely executing jobs:
each foreign job runs as an ordinary user (condor) in a changeroot'd directory.
.np
Condor does its work completely outside the kernel, and is compatible
with Berkeley 4.2 and 4.3 UNIX kernels and many of their derivatives.
You do not have to run a custom operating system to get the benefits
of condor.
.sh 1 "Limitations"
.np
Only single process jobs are supported, i.e.
the fork(2), exec(2), and similar calls are not implemented.
.np
Signals and signal handlers are not supported, i.e.
the signal(3), sigvec(2), and kill(2) calls are not implemented.
.np
Interprocess communication (IPC) calls are not supported, i.e.
the socket(2), send(2), recv(2), and similar calls are not implemented.
.np
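All file operations must be idempotent,
because a job restarted from a checkpoint repeats any file operations
it performed after that checkpoint.
Read-only and write-only file accesses work correctly,
but programs which both read and write the same file may not.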
.np
Each condor job has an associated
.q "checkpoint file"
which is approximately the size of the address space of the process.
Disk space
.b must
be available to store the checkpoint file
.b both
on the
.b submitting
and
.b remote
machines.
.np
Condor does a significant amount of work to prevent security hazards,
but some loopholes are known to exist.
One problem is that condor user jobs are supposed to do only remote system
calls, but this is impossible to guarantee.
User programs are restricted on the remote machine both by running only
as an ordinary user (condor), and by operating in a changeroot'd
directory.
Still, a sufficiently malicious and clever user could cause problems by
doing local system calls on the remote machine.
.np
A different security problem exists for owners of condor jobs who necessarily
give remotely running processes access to their own file system.
The risk can be greatly reduced by requesting that access only be granted
to a changeroot'd directory in the local file system, but that does
reduce the flexibility of file access for the condor jobs.
See condor(1) for details on how to submit jobs with such a request.
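.pp
The restriction on file operations listed above can be illustrated with
a small Python sketch.
It assumes that a job rolled back to an earlier checkpoint repeats the
file operations it performed after that checkpoint;
the helper names (read_modify_write, write_only, read_file) and the file
names are hypothetical.
.(l
```python
import os
import tempfile

def read_modify_write(path):
    # Reads and writes the same file: NOT idempotent.
    with open(path) as f:
        count = int(f.read())
    with open(path, "w") as f:
        f.write(str(count + 1))

def write_only(path, value):
    # Write-only access: repeating it leaves the same contents.
    with open(path, "w") as f:
        f.write(str(value))

def read_file(path):
    with open(path) as f:
        return f.read()

workdir = tempfile.mkdtemp()
counter = os.path.join(workdir, "counter")
result = os.path.join(workdir, "result")

write_only(counter, 0)
# The step runs once, then the job is rolled back to an earlier
# checkpoint and the same step runs a second time.
read_modify_write(counter)
read_modify_write(counter)

write_only(result, 7)
write_only(result, 7)

counter_after = read_file(counter)   # "2", although one increment was intended
result_after = read_file(result)     # "7", unaffected by the repeat
```
.)l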
.sh 1 "Overview of Condor Software"
.pp
Condor user programs do
.q "remote system calls"
back to the machine from which they were submitted.
Remote system calls provide user
programs with the illusion that they are operating in the
local environment and give the user the flexibility of running
programs written for the normal UNIX environment on condor.
Programs are converted to using
remote system calls simply by relinking with a special library.
The remote system call mechanism is described in Section 5.
.pp
Condor user programs are constructed in such a way that they
can be checkpointed and restarted at will.
This assures users that their jobs will complete, even if they are
interrupted during execution by the return of a hosting workstation's
owner.
Checkpointing is also implemented by linking with the special library.
The checkpointing mechanism is described more fully in Section 6.
.pp
Condor includes
control software consisting of two daemons which run on each
member of the condor pool, and two other daemons which run on a
single machine called the
.b "central manager" .
This software automatically locates and releases
.q "target machines"
and manages the queue of jobs waiting for condor resources.
The control software is described in Section 7.
.sh 1 "Remote System Calls"
.pp
To better understand how the condor remote system calls work,
it is appropriate to quickly review how normal UNIX system
calls work.
Figure 1 illustrates the normal UNIX system call mechanism.
The user program is linked with a standard library called the
.q "C library" .
This is true even for programs written in languages other than C.
The C library contains routines, often referred to as
.q "system call stubs" ,
which cause the actual system calls to happen.
What the stubs really do is push the system call number and
system call arguments onto the stack, then execute an instruction
which causes a trap to the kernel.
When the kernel trap handler is called, it reads the system call number
and arguments, and performs the system call on behalf of the user
program.
The trap handler will then place the system call return value in a well
known register or registers, and return control to the user program.
The system call stub then returns the result to the calling process,
completing the system call.
.(b
.br
.nr g1 960u
.nr g2 1440u
.GS C
.nr g3 \n(.f
.nr g4 \n(.s
\0
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "C Library
.sp 360u
\h'480u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "User Program
.sp 120u
\h'480u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "(Trap to Kernel)
.sp 480u
\h'480u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Kernel
.sp 720u
\h'480u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Kernel Services
.sp 1080u
\h'120u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "e.g. File System
.sp 1200u
\h'120u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
\D'l 0u 840u'\D'l 960u 0u'\D'l 0u -840u'\D'l -960u 0u'
.sp -1
.sp 240u
\D'l 0u 0u'\D'l 960u 0u'\D'l 0u 0u'\D'l -960u 0u'
.sp -1
\D'l 960u 0u'
.sp -1
.sp 360u
\D'l 960u 0u'
.sp -1
.sp 240u
\h'360u'\D'l -240u 180u'
.sp -1
\h'480u'\D'l -240u 180u'
.sp -1
.sp 180u
\h'120u'\D'l 16u -41u'\D'l 2u 27u'\D'l 25u 9u'\D'l -43u 5u'
.sp -1
.sp -180u
\h'480u'\D'l -17u 41u'\D'l -2u -27u'\D'l -25u -9u'\D'l 44u -5u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Figure 1: Normal UNIX System Calls
.sp 600u
\h'480u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.sp 600u
\D't -1u'\D's -1u'
.br
.ft \n(g3
.ps \n(g4
.GE
.)b
.pp
Figure 2 illustrates how this mechanism has been altered by condor
to implement remote system calls.
Whenever condor is executing a user program remotely, it also runs a
.q shadow
program on the initiating host.
The
.b shadow
acts as an agent for the remotely executing program in doing
system calls.
Condor user programs are linked with a special version of the C
library.
The special version contains all of the functions provided by the normal
C library, but the system call stubs have been changed to accomplish
remote system calls.
The remote system call stubs package up the system call number and
arguments and send them to the
.b shadow
using the network.
The
.b shadow ,
which is linked with the normal C library, then executes
the system call on behalf of the remotely running job in the normal
way.
The
.b shadow
then packages up the results of the system call and sends them
back to the system call stub in the special C library on the remote
machine.
The remote system call stub then returns its result to the calling procedure
which is unaware that the call was done remotely rather than locally.
Note that the
.b shadow
runs with its UID set to the owner of the remotely
running job so that it has the correct permissions into the local
file system, and the remotely running job runs with its UID set to
.q condor .
Condor is an ordinary user on the remote system, and thus has no special
privileges into that file system.
The remotely running user program runs in a
.q changeroot'd
environment to further protect the owner of the remote machine from
unwanted file system accesses by the foreign job it is hosting.
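.pp
The request and reply path described above can be sketched in Python.
This is a minimal illustration under stated assumptions, not condor's
wire protocol: the message format (a length prefix followed by JSON),
the helper names, and the two-entry SYSCALLS table are all hypothetical,
and both ends run in one process over a socket pair instead of on two machines.
.(l
```python
import json
import os
import socket
import struct
import threading

# Calls the shadow is willing to execute (a tiny, hypothetical subset).
SYSCALLS = {"getcwd": os.getcwd, "getpid": os.getpid}

def send_msg(sock, obj):
    data = json.dumps(obj).encode()
    sock.sendall(struct.pack("!I", len(data)) + data)

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the connection")
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode())

def shadow(sock):
    # Runs on the initiating machine: execute each request locally
    # and send the result back.
    while True:
        try:
            request = recv_msg(sock)
        except EOFError:
            return
        result = SYSCALLS[request["call"]](*request["args"])
        send_msg(sock, {"result": result})

def remote_syscall(sock, call, *args):
    # Stand-in for a remote system call stub: package the call and its
    # arguments, send them to the shadow, and return the shadow's reply.
    send_msg(sock, {"call": call, "args": list(args)})
    return recv_msg(sock)["result"]

job_end, shadow_end = socket.socketpair()
worker = threading.Thread(target=shadow, args=(shadow_end,))
worker.start()
cwd_seen_by_job = remote_syscall(job_end, "getcwd")
job_end.close()
worker.join()
shadow_end.close()
```
.)l
.pp
The caller of remote_syscall sees only a return value,
just as the calling procedure in a condor job is unaware that
the call was carried out elsewhere.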
.(b
.br
.nr g1 3111u
.nr g2 1727u
.GS C
.nr g3 \n(.f
.nr g4 \n(.s
\0
.sp -1
\D't 1u'
.sp -1
.sp 777u
\h'2117u'\D'l -31u 7u'\D'l 16u -12u'\D'l -5u -19u'\D'l 20u 24u'
.sp -1
.sp -345u
\h'1080u'\D'l 1037u 345u'
.sp -1
.sp -87u
\h'1080u'\D'l 31u -7u'\D'l -15u 12u'\D'l 5u 19u'\D'l -21u -24u'
.sp -1
\h'1080u'\D'l 1037u 346u'
.sp -1
.sp 260u
\h'2160u'\D'l 691u 0u'
.sp -1
.sp -346u
\h'2160u'\D'l 0u 604u'\D'l 691u 0u'\D'l 0u -604u'\D'l -691u 0u'
.sp -1
.sp 647u
\h'821u'\D'l -8u 30u'\D'l -3u -19u'\D'l -19u -3u'\D'l 30u -8u'
.sp -1
.sp 129u
\h'605u'\D'l 8u -30u'\D'l 4u 19u'\D'l 19u 4u'\D'l -31u 7u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "C Library
.sp -258u
\h'2506u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Special
.sp -344u
\h'2506u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "(UID = Condor)
.sp -517u
\h'2506u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Program
.sp -603u
\h'2506u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Condor User
.sp -690u
\h'2506u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 12
.nr g8 \n(.d
.ds g9 "Remote Machine
.sp -906u
\h'2506u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -1035u
\h'1901u'\D'l 0u 1554u'\D'l 1210u 0u'\D'l 0u -1554u'\D'l -1210u 0u'
.sp -1
\D'l 0u 1554u'\D'l 1210u 0u'\D'l 0u -1554u'\D'l -1210u 0u'
.sp -1
.sp 906u
\h'821u'\D'l -130u 129u'
.sp -1
\h'735u'\D'l -130u 129u'
.sp -1
\D't 3u'
.sp -1
.sp 259u
\h'173u'\D'l 864u 0u'
.sp -1
\h'173u'\D'l 950u 0u'
.sp -1
\D't 1u'
.sp -1
.sp 86u
\h'1066u'\D'l 0u 216u'
.sp -1
\h'893u'\D'l 0u 231u'
.sp -1
.sp 231u
\h'893u'\D'g 29u 14u 57u 15u 58u -15u 29u -29u'
.sp -1
.sp -187u
\h'979u'\D'g -86u -44u 86u -43u 87u 43u -87u 44u'
.sp -1
\h'720u'\D'g -86u -44u 86u -43u 87u 43u -87u 44u'
.sp -1
.sp 187u
\h'634u'\D'g 29u 14u 57u 15u 58u -15u 29u -29u'
.sp -1
.sp -231u
\h'634u'\D'l 0u 231u'
.sp -1
\h'807u'\D'l 0u 216u'
.sp -1
\h'547u'\D'l 0u 216u'
.sp -1
\h'375u'\D'l 0u 231u'
.sp -1
.sp 231u
\h'375u'\D'g 29u 14u 57u 15u 58u -15u 28u -29u'
.sp -1
.sp -187u
\h'461u'\D'g -86u -44u 86u -43u 86u 43u -86u 44u'
.sp -1
.sp -44u
\h'331u'\D'l 0u 216u'
.sp -1
\h'159u'\D'l 0u 231u'
.sp -1
.sp 231u
\h'159u'\D'g 29u 14u 57u 15u 58u -15u 28u -29u'
.sp -1
.sp -187u
\h'245u'\D'g -86u -44u 86u -43u 86u 43u -86u 44u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Local File System
.sp -216u
\h'605u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -604u
\h'519u'\D'l 518u 0u'
.sp -1
.sp -173u
\h'519u'\D'l 518u 0u'
.sp -1
.sp -259u
\h'519u'\D'l 0u 604u'\D'l 518u 0u'\D'l 0u -604u'\D'l -518u 0u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Kernel
.sp 518u
\h'778u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "C Library
.sp 346u
\h'778u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "(UID = User)
.sp 173u
\h'778u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Shadow
.sp 86u
\h'778u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 12
.nr g8 \n(.d
.ds g9 "Initiating Machine
.sp -130u
\h'605u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "System Call Request
.sp 86u
\h'1555u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "System Call Reply
.sp 432u
\h'1555u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Figure 2: Remote System Calls
.sp 1468u
\h'1469u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.sp 1468u
\D't -1u'\D's -1u'
.br
.ft \n(g3
.ps \n(g4
.GE
.)b
.sh 1 Checkpointing
.pp
To checkpoint a UNIX process, several things must be preserved.
The text, data, stack, and register contents are needed, as well as
information about what files are open, where they are seek'd to,
and what mode they were opened in.
The data and stack are available in a core file,
while the text is available in the original executable.
Condor gathers the information about currently open files through
the special C library.
In condor's special C library the system call stubs for
.q open ,
.q close ,
and
.q dup
not only do those things remotely, but they also record which files
are opened in what mode, and which file descriptors correspond to
which files.
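.pp
The bookkeeping done by these stubs might look roughly like the
following Python sketch.
It is an assumption-laden illustration: the FileTable class is hypothetical,
and the calls are made locally here, whereas the condor stubs perform
them remotely through the shadow.
.(l
```python
import os
import tempfile

class FileTable:
    """Records what the open, close, and dup stubs would need to remember."""

    def __init__(self):
        self.entries = {}   # file descriptor -> (path, flags)

    def open(self, path, flags, mode=0o600):
        fd = os.open(path, flags, mode)
        self.entries[fd] = (path, flags)
        return fd

    def dup(self, fd):
        new_fd = os.dup(fd)
        # The duplicate refers to the same file, opened in the same mode.
        self.entries[new_fd] = self.entries[fd]
        return new_fd

    def close(self, fd):
        os.close(fd)
        del self.entries[fd]

table = FileTable()
path = os.path.join(tempfile.mkdtemp(), "data")
fd = table.open(path, os.O_WRONLY | os.O_CREAT)
copy = table.dup(fd)
table.close(fd)
remaining = dict(table.entries)   # only the duplicate is still recorded
table.close(copy)
```
.)l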
.pp
Condor causes a running job to checkpoint by sending it a signal.
When the program is linked, a special version of
.q crt0
is included which sets up CKPT() as that signal handler.
When CKPT() is called, it updates the table of
open files by seeking each one to the current location and recording
the file position.
Next a setjmp(3) is executed to save key register contents in a global
data area, then the process sends itself a signal which results in a
core dump.
The condor software then combines the original executable file and the
core file to produce a
.q checkpoint
file (figure 3).
The checkpoint file is itself executable.
.pp
When the checkpoint file is restarted, it starts from the crt0 code
just like any UNIX executable, but again this code is special,
and it will set up the restart() routine as a signal handler with
a special signal stack, then send itself that signal.
When restart() is called, it will operate in the temporary stack area
and read the saved stack in from the checkpoint file,
reopen and reposition all files from the saved file state information,
and execute a longjmp(3) back to CKPT().
When the restart routine returns, it does so with respect to
the restored stack, and CKPT() returns to the routine which was active
at the time of the checkpoint signal, not crt0.
To the user code, checkpointing looks exactly like a signal handler
was called, and restarting from a checkpoint looks like a return from
that signal handler.
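.pp
The file-related part of checkpoint and restart can be sketched in Python.
The sketch covers only the saved file state (path, mode, and position);
it does not attempt the core dump, the stack restoration, or the
setjmp(3) and longjmp(3) steps, and it does not restore the original
descriptor numbers.
The function names checkpoint_files and restart_files are hypothetical.
.(l
```python
import os
import tempfile

def checkpoint_files(open_files):
    # open_files maps fd -> (path, flags). Record the current offset
    # of each file by seeking to the current location.
    saved = []
    for fd, (path, flags) in open_files.items():
        offset = os.lseek(fd, 0, os.SEEK_CUR)
        saved.append((path, flags, offset))
    return saved

def restart_files(saved):
    # Reopen and reposition every file from the saved state.
    reopened = {}
    for path, flags, offset in saved:
        # Do not create or truncate on restart: the file already exists.
        fd = os.open(path, flags & ~(os.O_CREAT | os.O_TRUNC))
        os.lseek(fd, offset, os.SEEK_SET)
        reopened[fd] = (path, flags)
    return reopened

path = os.path.join(tempfile.mkdtemp(), "input")
with open(path, "wb") as f:
    f.write(b"0123456789")

fd = os.open(path, os.O_RDONLY)
first = os.read(fd, 4)                       # the job has consumed b"0123"
state = checkpoint_files({fd: (path, os.O_RDONLY)})
os.close(fd)                                 # the process goes away

files = restart_files(state)                 # later, after a restart
new_fd = next(iter(files))
rest = os.read(new_fd, 6)                    # reading continues at offset 4
os.close(new_fd)
```
.)l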
.(b
.br
.nr g1 1868u
.nr g2 1727u
.GS C
.nr g3 \n(.f
.nr g4 \n(.s
\0
.sp -1
\D't 1u'
.sp -1
.sp 559u
\h'1588u'\D'g -280u -92u 280u -94u 280u 94u -280u 92u'
.sp -1
.sp 280u
\h'1308u'\D'g 93u 47u 187u 47u 187u -47u 93u -47u'
.sp -1
.sp -372u
\h'1868u'\D'l 0u 372u'
.sp -1
\h'1308u'\D'l 0u 372u'
.sp -1
.sp 466u
\h'280u'\D'g -280u -94u 280u -93u 280u 93u -280u 94u'
.sp -1
.sp 560u
\D'g 93u 47u 187u 47u 187u -47u 93u -47u'
.sp -1
.sp -654u
\D'l 0u 654u'
.sp -1
.sp -652u
\h'280u'\D'g -280u -94u 280u -93u 280u 93u -280u 94u'
.sp -1
.sp 280u
\D'g 93u 46u 187u 46u 187u -46u 93u -46u'
.sp -1
.sp -374u
\h'560u'\D'l 0u 374u'
.sp -1
\D'l 0u 374u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "text
.sp 374u
\h'280u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 234u
\D'g 93u 46u 187u 47u 187u -47u 93u -46u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "data
.sp 1073u
\h'280u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "stack
.sp 933u
\h'280u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "registers
.sp 793u
\h'280u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 933u
\D'g 93u 46u 187u 47u 187u -47u 93u -46u'
.sp -1
.sp -140u
\D'g 93u 46u 187u 47u 187u -47u 93u -46u'
.sp -1
.sp -140u
\D'g 93u 46u 187u 47u 187u -47u 93u -46u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Executable
.sp -374u
\h'280u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Core
.sp 653u
\h'280u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Checkpoint
\h'1588u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -141u
\h'560u'\D'l 0u 654u'
.sp -1
.sp -419u
\h'607u'\D'l 654u 93u'
.sp -1
.sp 93u
\h'1261u'\D'l -31u 13u'\D'l 14u -15u'\D'l -10u -19u'\D'l 27u 21u'
.sp -1
.sp 513u
\h'607u'\D'l 654u -420u'
.sp -1
.sp -420u
\h'1261u'\D'l -15u 30u'\D'l 0u -21u'\D'l -19u -8u'\D'l 34u -1u'
.sp -1
.sp 560u
\h'607u'\D'l 654u -467u'
.sp -1
.sp -467u
\h'1261u'\D'l -14u 32u'\D'l 0u -22u'\D'l -20u -7u'\D'l 34u -3u'
.sp -1
.sp 654u
\h'607u'\D'l 654u -560u'
.sp -1
.sp -560u
\h'1261u'\D'l -11u 32u'\D'l -2u -21u'\D'l -21u -6u'\D'l 34u -5u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "(incl. file info)
.sp 700u
\h'280u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Figure 3: Creating a Checkpoint File
.sp 934u
\h'1027u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.sp 934u
\D't -1u'\D's -1u'
.br
.ft \n(g3
.ps \n(g4
.GE
.)b
.sh 1 "Control Software"
.pp
Each machine in the condor pool runs two daemons, the
.b schedd
and the
.b startd .
In addition, one machine runs two other daemons called the
.b collector
and the
.b negotiator .
While the
.b collector
and the
.b negotiator
are separate processes, they
work closely together, and for purposes of this discussion can
be considered one logical process called the
.b "central manager" .
The
.b "central manager"
has the job of keeping track of which machines are idle,
and allocating those machines to other machines which have condor jobs
to run.
On each machine the
.b schedd
maintains a queue of condor jobs,
and negotiates with the
.b "central manager"
to get permission to run those jobs on remote machines.
The
.b startd
determines whether its machine is idle, and also
is responsible for starting and managing foreign jobs which it
may be hosting.
On machines running the X window system,
an additional daemon, the
.b kbdd ,
will periodically inform the
.b startd
of the keyboard and mouse
.q "idle time" .
Periodically the
.b startd
will examine its machine, and update the
.b "central manager"
on its degree of
.q idleness .
Also periodically the
.b schedd
will examine its job queue and update the
.b "central manager"
on how many jobs it wants to run and how many jobs
it is currently running (figure 4).
.(b
.br
.nr g1 3299u
.nr g2 1727u
.GS C
.nr g3 \n(.f
.nr g4 \n(.s
\0
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Central Manager
.sp 314u
\h'1650u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 393u
\h'1650u'\D'g -314u -79u 314u -79u 314u 79u -314u 79u'
.sp -1
.sp 470u
\h'393u'\D'g -314u -79u 314u -78u 314u 78u -314u 79u'
.sp -1
\h'1178u'\D'g -314u -79u 314u -78u 315u 78u -315u 79u'
.sp -1
\h'2121u'\D'g -314u -79u 314u -78u 314u 78u -314u 79u'
.sp -1
\h'2906u'\D'g -314u -79u 314u -78u 315u 78u -315u 79u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Schedd
.sp -79u
\h'1178u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Schedd
.sp -79u
\h'2121u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Startd
.sp -79u
\h'2906u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Startd
.sp -79u
\h'393u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -235u
\D'l 0u 942u'\D'l 1571u 0u'\D'l 0u -942u'\D'l -1571u 0u'
.sp -1
\h'1728u'\D'l 0u 942u'\D'l 1571u 0u'\D'l 0u -942u'\D'l -1571u 0u'
.sp -1
.sp -628u
\h'1021u'\D'l 0u 550u'\D'l 1257u 0u'\D'l 0u -550u'\D'l -1257u 0u'
.sp -1
.ft I
.ps 8
.nr g8 \n(.d
.ds g9 "machine a
.sp 157u
\h'1964u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft I
.ps 8
.nr g8 \n(.d
.ds g9 "machine b
.sp 1491u
\h'1257u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft I
.ps 8
.nr g8 \n(.d
.ds g9 "machine c
.sp 1491u
\h'2985u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 706u
\h'550u'\D'l 943u -313u'
.sp -1
.sp -313u
\h'1493u'\D'l -19u 22u'\D'l 5u -18u'\D'l -14u -10u'\D'l 28u 6u'
.sp -1
.sp 313u
\h'1178u'\D'l 393u -313u'
.sp -1
.sp -313u
\h'1571u'\D'l -10u 26u'\D'l -1u -17u'\D'l -17u -6u'\D'l 28u -3u'
.sp -1
.sp 313u
\h'1178u'\D'l 10u -27u'\D'l 2u 18u'\D'l 17u 5u'\D'l -29u 4u'
.sp -1
.sp -313u
\h'1728u'\D'l 236u 313u'
.sp -1
.sp 313u
\h'1964u'\D'l -27u -11u'\D'l 18u -1u'\D'l 6u -17u'\D'l 3u 29u'
.sp -1
.sp -313u
\h'1728u'\D'l 27u 10u'\D'l -18u 1u'\D'l -6u 17u'\D'l -3u -28u'
.sp -1
.sp -40u
\h'1925u'\D'l 824u 353u'
.sp -1
\h'1925u'\D'l 28u -3u'\D'l -15u 9u'\D'l 3u 18u'\D'l -16u -24u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Legend
.sp -314u
\h'2435u'\v'0.85n'\&\*(g9
.sp |\n(g8u
\D's 4u'\D't 1u'
.sp -1
.sp -196u
\h'2514u'\D'l 157u 0u'
.sp -1
\D's -1u'
.sp -1
.sp 78u
\h'2514u'\D'l 157u 0u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "process started by fork/exec
.sp -78u
\h'2710u'\v'0.85n'\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "communication link
\h'2710u'\v'0.85n'\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Figure 4: Condor Processes With No Jobs Running
.sp 1492u
\h'1650u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 932u
\h'1174u'\D'g -315u -78u 315u -79u 314u 79u -314u 78u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Kbdd
.sp -83u
\h'1183u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
\h'2897u'\D'g -315u -78u 315u -79u 314u 79u -314u 78u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Kbdd
.sp -83u
\h'2906u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -314u
\h'2906u'\D'l 0u 152u'
.sp -1
\h'2906u'\D'l 15u 25u'\D'l -15u -10u'\D'l -14u 10u'\D'l 14u -25u'
.sp -1
.sp 152u
\h'1178u'\D'l -549u -157u'
.sp -1
.sp -157u
\h'629u'\D'l 27u -7u'\D'l -13u 11u'\D'l 5u 17u'\D'l -19u -21u'
.sp -1
.sp 879u
\D't -1u'\D's -1u'
.br
.ft \n(g3
.ps \n(g4
.GE
.)b
.pp
At some point the
.b "central manager"
may learn that
.i "machine b"
is idle, and decide that
.i "machine c"
should execute one of its jobs remotely on
.i "machine b" .
The
.b "central manager"
will then contact the
.b schedd
on
.i "machine c"
and give it
.q permission
to run a job on
.i "machine b" .
The
.b schedd
on
.i "machine c"
will then select a job from its queue and spawn off a
.b shadow
process to run it.
The
.b shadow
will then contact the
.b startd
on
.i "machine b"
and tell it that it would
like to run a job.
If the situation on
.i "machine b"
hasn't changed since the last update to the
.b "central manager" ,
.i "machine b"
will still be idle, and will respond with an OK.
The
.b startd
on
.i "machine b"
then spawns a process called the
.b starter .
It's the
.b starter's
job to start and manage the remotely running job
(figure 5).
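.pp
The allocation step just described can be sketched in Python.
This is a toy model under stated assumptions, not the negotiation
algorithm condor uses: the CentralManager class, its method names, the
first-match policy, and the OK and BUSY replies are hypothetical.
.(l
```python
class CentralManager:
    """Toy collector/negotiator pair; all names here are hypothetical."""

    def __init__(self):
        self.idle = {}      # machine -> True if its startd reported idle
        self.waiting = {}   # machine -> jobs its schedd still wants to run

    def startd_update(self, machine, is_idle):
        self.idle[machine] = is_idle

    def schedd_update(self, machine, jobs_waiting):
        self.waiting[machine] = jobs_waiting

    def negotiate(self):
        # Pair one machine that has waiting jobs with one idle machine.
        # Returns (submitter, target), or None if no match is possible.
        for submitter in sorted(self.waiting):
            if self.waiting[submitter] == 0:
                continue
            for target in sorted(self.idle):
                if self.idle[target] and target != submitter:
                    self.idle[target] = False
                    self.waiting[submitter] -= 1
                    return submitter, target
        return None


def startd_accepts(currently_idle):
    # The startd on the target re-checks its own state when the shadow
    # calls, since things may have changed since its last update.
    return "OK" if currently_idle else "BUSY"


cm = CentralManager()
cm.startd_update("machine b", True)
cm.startd_update("machine c", False)
cm.schedd_update("machine c", 2)
match = cm.negotiate()
reply = startd_accepts(currently_idle=True)
```
.)l
.pp
Here the match grants
.i "machine c"
permission to use
.i "machine b" ;
the second check by the startd is what protects an owner who
returned after the last update.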
.(b
.br
.nr g1 3299u
.nr g2 1727u
.GS C
.nr g3 \n(.f
.nr g4 \n(.s
\0
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "communication link
.sp 235u
\h'2710u'\v'0.85n'\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "process started by fork/exec
.sp 157u
\h'2710u'\v'0.85n'\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 235u
\h'2514u'\D'l 157u 0u'
.sp -1
\D's 4u'
.sp -1
.sp -78u
\h'2514u'\D'l 157u 0u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Legend
.sp -118u
\h'2435u'\v'0.85n'\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 706u
\h'2121u'\D'l 0u 157u'
.sp -1
\h'393u'\D'l 0u 157u'
.sp -1
\D's -1u'
.sp -1
.sp 236u
\h'1807u'\D'l -25u 14u'\D'l 10u -14u'\D'l -10u -15u'\D'l 25u 15u'
.sp -1
.sp -315u
\h'707u'\D'l 28u -7u'\D'l -14u 11u'\D'l 6u 17u'\D'l -20u -21u'
.sp -1
.sp 315u
\h'1807u'\D'l -1100u -315u'
.sp -1
.sp -746u
\h'1925u'\D'l 28u -3u'\D'l -15u 9u'\D'l 3u 18u'\D'l -16u -24u'
.sp -1
\h'1925u'\D'l 824u 353u'
.sp -1
.sp 40u
\h'1728u'\D'l 27u 10u'\D'l -18u 1u'\D'l -6u 17u'\D'l -3u -28u'
.sp -1
.sp 313u
\h'1964u'\D'l -27u -11u'\D'l 18u -1u'\D'l 6u -17u'\D'l 3u 29u'
.sp -1
.sp -313u
\h'1728u'\D'l 236u 313u'
.sp -1
.sp 313u
\h'1178u'\D'l 10u -27u'\D'l 2u 18u'\D'l 17u 5u'\D'l -29u 4u'
.sp -1
.sp -313u
\h'1571u'\D'l -10u 26u'\D'l -1u -17u'\D'l -17u -6u'\D'l 28u -3u'
.sp -1
.sp 313u
\h'1178u'\D'l 393u -313u'
.sp -1
.sp -313u
\h'1493u'\D'l -19u 22u'\D'l 5u -18u'\D'l -14u -10u'\D'l 28u 6u'
.sp -1
.sp 313u
\h'550u'\D'l 943u -313u'
.sp -1
.ft I
.ps 8
.nr g8 \n(.d
.ds g9 "machine c
.sp 785u
\h'2985u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft I
.ps 8
.nr g8 \n(.d
.ds g9 "machine b
.sp 785u
\h'1257u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft I
.ps 8
.nr g8 \n(.d
.ds g9 "machine a
.sp -549u
\h'1964u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -706u
\h'1021u'\D'l 0u 550u'\D'l 1257u 0u'\D'l 0u -550u'\D'l -1257u 0u'
.sp -1
.sp 628u
\h'1728u'\D'l 0u 942u'\D'l 1571u 0u'\D'l 0u -942u'\D'l -1571u 0u'
.sp -1
\D'l 0u 942u'\D'l 1571u 0u'\D'l 0u -942u'\D'l -1571u 0u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Startd
.sp 156u
\h'393u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Startd
.sp 156u
\h'2906u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Shadow
.sp 471u
\h'2121u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Schedd
.sp 156u
\h'2121u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Schedd
.sp 156u
\h'1178u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Starter
.sp 471u
\h'393u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 235u
\h'2906u'\D'g -314u -79u 314u -78u 315u 78u -315u 79u'
.sp -1
.sp 314u
\h'2121u'\D'g -314u -78u 314u -79u 314u 79u -314u 78u'
.sp -1
.sp -314u
\h'2121u'\D'g -314u -79u 314u -78u 314u 78u -314u 79u'
.sp -1
\h'1178u'\D'g -314u -79u 314u -78u 315u 78u -315u 79u'
.sp -1
.sp 314u
\h'393u'\D'g -314u -78u 314u -79u 314u 79u -314u 78u'
.sp -1
.sp -314u
\h'393u'\D'g -314u -79u 314u -78u 314u 78u -314u 79u'
.sp -1
.sp -470u
\h'1650u'\D'g -314u -79u 314u -79u 314u 79u -314u 79u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Central Manager
.sp -79u
\h'1650u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Figure 5: Condor Processes While Starting a Job
.sp 1334u
\h'1650u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 774u
\h'1188u'\D'g -314u -78u 314u -79u 314u 79u -314u 78u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Kbdd
.sp -88u
\h'1183u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -5u
\h'2926u'\D'g -314u -78u 314u -79u 314u 79u -314u 78u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Kbdd
.sp -88u
\h'2921u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -157u
\h'2906u'\D'l 0u -152u'
.sp -1
.sp -152u
\h'2911u'\D'l 14u 25u'\D'l -14u -10u'\D'l -15u 9u'\D'l 15u -24u'
.sp -1
.sp 152u
\h'1178u'\D'l -549u -157u'
.sp -1
.sp -157u
\h'629u'\D'l 27u -7u'\D'l -13u 11u'\D'l 5u 17u'\D'l -19u -21u'
.sp -1
.sp 879u
\D't -1u'\D's -1u'
.br
.ft \n(g3
.ps \n(g4
.GE
.)b
.pp
The
.b shadow
on
.i "machine c"
will transfer the checkpoint file to the
.b starter
on
.i "machine b" .
The
.b starter
then sets a timer and spawns off the remotely running job
from
.i "machine c"
(figure 6).
The
.b shadow
on
.i "machine c"
will handle all system calls for the job.
When the
.b starter's
timer expires it
will send the user job a checkpoint signal,
causing it to save its file state and stack, then dump core.
The
.b starter
then builds a new version of the checkpoint file which
is stored temporarily on
.i "machine b" .
The
.b starter
restarts the job from the new checkpoint file, and the
cycle of execute and checkpoint continues.
At some point, either the job will finish, or
.i "machine b's"
user will
return.
If the job finishes, the job's owner is notified by mail, and the
.b starter
and
.b shadow
clean up.
If
.i "machine b"
becomes busy, the
.b startd
on
.i "machine b"
will detect that either by
noting recent activity on one of the ttys or ptys, or by the rising
load average.
When the
.b startd
on
.i "machine b"
detects this activity, it will send a
.q suspend
signal to the
.b starter ,
and the
.b starter
will temporarily suspend the user job.
The job is suspended rather than vacated at once because the owners
of machines are frequently active for only
a few seconds, then become idle again.
This would be the case, for example, if the owner were just checking for new mail.
If
.i "machine b"
remains busy for a period of about 5 minutes, the
.b startd
there will send a
.q vacate
signal to the
.b starter .
In this case, the
.b starter
will abort the user job and return the latest
checkpoint file to the
.b shadow
on
.i "machine c" .
If the job had not run long enough on
.i "machine b"
to reach a checkpoint,
the job is just aborted, and will be restarted later from the most
recent checkpoint on
.i "machine c" .
Notice that the
.b starter
checkpoints the condor user job periodically rather than waiting
until the remote workstation's owner wants it back.
Checkpointing, and in particular core dumping, is an I/O intensive
activity which we avoid doing when the hosting workstation's owner is active.
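.pp
The suspend and vacate policy can be summarized as a small state machine,
sketched here in Python.
The sketch makes several assumptions that the text does not spell out:
that a suspended job resumes when the owner goes idle again,
that the grace period is exactly 300 seconds, and that activity is
sampled periodically.
The HostedJob class and its observe method are hypothetical.
.(l
```python
VACATE_AFTER = 5 * 60   # seconds of sustained owner activity (about 5 minutes)

class HostedJob:
    """Tracks what the startd asks the starter to do with a foreign job."""

    def __init__(self):
        self.state = "running"
        self.busy_since = None

    def observe(self, now, owner_active):
        # Called periodically with the current time in seconds.
        if self.state == "vacated":
            return self.state
        if not owner_active:
            # Owner went idle again: resume the suspended job.
            self.busy_since = None
            self.state = "running"
        elif self.busy_since is None:
            # First sign of activity: suspend rather than evict.
            self.busy_since = now
            self.state = "suspended"
        elif now - self.busy_since >= VACATE_AFTER:
            # Still busy after the grace period: give the machine back.
            self.state = "vacated"
        return self.state


job = HostedJob()
trace = [
    job.observe(0, owner_active=False),
    job.observe(10, owner_active=True),    # owner checks mail
    job.observe(20, owner_active=False),   # and leaves again
    job.observe(100, owner_active=True),
    job.observe(250, owner_active=True),
    job.observe(400, owner_active=True),   # busy for 300 seconds
]
```
.)l
.pp
No checkpoint is taken at the moment of eviction in this model;
the job would continue elsewhere from the last periodic checkpoint.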
.(b
.br
.nr g1 3299u
.nr g2 1727u
.GS C
.nr g3 \n(.f
.nr g4 \n(.s
\0
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "communication link
.sp 235u
\h'2710u'\v'0.85n'\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "process started by fork/exec
.sp 157u
\h'2710u'\v'0.85n'\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 235u
\h'2514u'\D'l 157u 0u'
.sp -1
\D's 4u'
.sp -1
.sp -78u
\h'2514u'\D'l 157u 0u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Legend
.sp -118u
\h'2435u'\v'0.85n'\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 706u
\h'2121u'\D'l 0u 456u'
.sp -1
.sp 314u
\h'393u'\D'l 0u 157u'
.sp -1
.sp -314u
\h'393u'\D'l 0u 157u'
.sp -1
\D's -1u'
.sp -1
.sp -510u
\h'1925u'\D'l 28u -3u'\D'l -15u 9u'\D'l 3u 18u'\D'l -16u -24u'
.sp -1
\h'1925u'\D'l 824u 353u'
.sp -1
.sp 40u
\h'1728u'\D'l 27u 10u'\D'l -18u 1u'\D'l -6u 17u'\D'l -3u -28u'
.sp -1
.sp 313u
\h'1964u'\D'l -27u -11u'\D'l 18u -1u'\D'l 6u -17u'\D'l 3u 29u'
.sp -1
.sp -313u
\h'1728u'\D'l 236u 313u'
.sp -1
.sp 313u
\h'1178u'\D'l 10u -27u'\D'l 2u 18u'\D'l 17u 5u'\D'l -29u 4u'
.sp -1
.sp -313u
\h'1571u'\D'l -10u 26u'\D'l -1u -17u'\D'l -17u -6u'\D'l 28u -3u'
.sp -1
.sp 313u
\h'1178u'\D'l 393u -313u'
.sp -1
.sp -313u
\h'1493u'\D'l -19u 22u'\D'l 5u -18u'\D'l -14u -10u'\D'l 28u 6u'
.sp -1
.sp 313u
\h'550u'\D'l 943u -313u'
.sp -1
.ft I
.ps 8
.nr g8 \n(.d
.ds g9 "machine c
.sp 785u
\h'2985u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft I
.ps 8
.nr g8 \n(.d
.ds g9 "machine b
.sp 785u
\h'1257u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft I
.ps 8
.nr g8 \n(.d
.ds g9 "machine a
.sp -549u
\h'1964u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -706u
\h'1021u'\D'l 0u 550u'\D'l 1257u 0u'\D'l 0u -550u'\D'l -1257u 0u'
.sp -1
.sp 628u
\h'1728u'\D'l 0u 942u'\D'l 1571u 0u'\D'l 0u -942u'\D'l -1571u 0u'
.sp -1
\D'l 0u 942u'\D'l 1571u 0u'\D'l 0u -942u'\D'l -1571u 0u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Startd
.sp 156u
\h'393u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Startd
.sp 156u
\h'2906u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Shadow
.sp 785u
\h'2121u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Schedd
.sp 156u
\h'2121u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Schedd
.sp 156u
\h'1178u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "User Job
.sp 785u
\h'393u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Starter
.sp 471u
\h'393u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 235u
\h'2906u'\D'g -314u -79u 314u -78u 315u 78u -315u 79u'
.sp -1
.sp 628u
\h'2121u'\D'g -314u -78u 314u -79u 314u 79u -314u 78u'
.sp -1
.sp -628u
\h'2121u'\D'g -314u -79u 314u -78u 314u 78u -314u 79u'
.sp -1
\h'1178u'\D'g -314u -79u 314u -78u 315u 78u -315u 79u'
.sp -1
.sp 628u
\h'393u'\D'g -314u -78u 314u -79u 314u 79u -314u 78u'
.sp -1
.sp -314u
\h'393u'\D'g -314u -78u 314u -79u 314u 79u -314u 78u'
.sp -1
.sp -314u
\h'393u'\D'g -314u -79u 314u -78u 314u 78u -314u 79u'
.sp -1
.sp -470u
\h'1650u'\D'g -314u -79u 314u -79u 314u 79u -314u 79u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Central Manager
.sp -79u
\h'1650u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Figure 6: Condor Processes With One Job Running
.sp 1334u
\h'1650u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp 784u
\h'1178u'\D'g -314u -78u 314u -79u 315u 79u -315u 78u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Kbdd
.sp -93u
\h'1178u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
\h'2906u'\D'g -314u -78u 314u -79u 315u 79u -315u 78u'
.sp -1
.ft R
.ps 8
.nr g8 \n(.d
.ds g9 "Kbdd
.sp -93u
\h'2906u'\v'0.85n'\h-\w\*(g9u/2u\&\*(g9
.sp |\n(g8u
\D't 1u'
.sp -1
.sp -15u
\h'629u'\D'l 1178u 236u'
.sp -1
.sp 236u
\h'1807u'\D'l -27u 10u'\D'l 12u -13u'\D'l -6u -16u'\D'l 21u 19u'
.sp -1
.sp -236u
\h'629u'\D'l 27u -9u'\D'l -13u 12u'\D'l 7u 17u'\D'l -21u -20u'
.sp -1
.sp 236u
\h'1807u'\D'l -1100u 0u'
.sp -1
\h'707u'\D'l 25u -15u'\D'l -10u 15u'\D'l 10u 15u'\D'l -25u -15u'
.sp -1
\h'1807u'\D'l -25u 15u'\D'l 10u -15u'\D'l -10u -15u'\D'l 25u 15u'
.sp -1
.sp -393u
\h'1178u'\D'l -549u -157u'
.sp -1
.sp -157u
\h'629u'\D'l 27u -7u'\D'l -13u 11u'\D'l 5u 17u'\D'l -19u -21u'
.sp -1
\h'2906u'\D'l 0u 157u'
.sp -1
\h'2906u'\D'l 15u 25u'\D'l -15u -10u'\D'l -14u 10u'\D'l 14u -25u'
.sp -1
.sp 879u
\D't -1u'\D's -1u'
.br
.ft \n(g3
.ps \n(g4
.GE
.)b
.sh 1 "Control Expressions"
.pp
The condor control software is driven by a set of powerful
.q "control expressions" .
These expressions are read from the file
.q ~condor/condor_config
on each machine at run time.
It is often convenient for many machines of the same type to share
common control expressions, and this may be done through a fileserver.
To allow flexibility for control of individual machines, the file
.q ~condor/condor_config.local
is provided, and expressions defined there take precedence over those
defined in condor_config.
Following are examples of a few of the more important condor control
expressions with explanations.
See condor_config(5) for a detailed description of all the control expressions.
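.lp
The precedence rule for the two files can be pictured with a small
Python sketch.
The dictionaries stand in for parsed configuration files, and the
30 minute local value is a made-up example; neither the function name
nor the parsing model comes from the condor sources.

```python
# Hypothetical model of configuration precedence: entries from
# condor_config.local override entries from the shared condor_config.
def merge_config(shared, local):
    """Return the effective settings, with local entries taking precedence."""
    effective = dict(shared)   # start from the shared definitions
    effective.update(local)    # local definitions win on conflict
    return effective


# Shared file, possibly served to many machines by a fileserver.
shared = {
    "BackgroundLoad": "0.3",
    "StartIdleTime": "15 * $(MINUTE)",
}
# Per-machine overrides (illustrative value, not from the original text).
local = {
    "StartIdleTime": "30 * $(MINUTE)",
}

effective = merge_config(shared, local)
```

.lp
In this sketch the machine keeps the shared BackgroundLoad but uses its
own StartIdleTime.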
.sh 2 "Starting Foreign Jobs"
.pp
This set of expressions is used by the
.b startd
to determine when to allow
a foreign job to begin execution.
.ta 15n
.(l
BackgroundLoad = 0.3
StartIdleTime = 15 * $(MINUTE)
CPU_Idle = LoadAvg <= $(BackgroundLoad)
START : $(CPU_Idle) && KeyboardIdle > $(StartIdleTime)
.)l
.lp
This example of the START expression specifies that
to begin execution of a foreign job
the load average must be no greater than 0.3, and there must have been
no keyboard activity for more than 15 minutes.
.lp
Other expressions are used to determine when to suspend, resume, and
abort foreign jobs.
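.lp
The START example above can be transcribed into Python to show how the
macros combine.
This is only a sketch: it assumes that $(MINUTE) expands to 60 and that
KeyboardIdle is measured in seconds, and the function names are
invented for the illustration.

```python
# Sketch of the START example; the units are assumptions (see lead-in).
MINUTE = 60                      # assumed expansion of $(MINUTE)
BACKGROUND_LOAD = 0.3            # BackgroundLoad
START_IDLE_TIME = 15 * MINUTE    # StartIdleTime


def cpu_idle(load_avg):
    """CPU_Idle = LoadAvg <= $(BackgroundLoad)"""
    return load_avg <= BACKGROUND_LOAD


def start(load_avg, keyboard_idle):
    """START : $(CPU_Idle) && KeyboardIdle > $(StartIdleTime)"""
    return cpu_idle(load_avg) and keyboard_idle > START_IDLE_TIME
```

.lp
Under these assumptions a machine with a load average of 0.1 whose
keyboard has been idle for 20 minutes would accept a foreign job, while
one whose keyboard was touched 10 minutes ago would not.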
.sh 2 "Prioritizing Jobs"
.pp
The
.b schedd
must prioritize its own jobs and negotiate with the
.b "central manager"
to get permission to run them.
It uses a control expression to assign priorities to its local jobs.
.(l
PRIO : (UserPrio * 10) + $(Expanded) - (QDate / 1000000000.0)
.)l
.lp
.q UserPrio
is a number defined by the job's owner in a similar spirit to
the UNIX
.q nice
command.
.q Expanded
will be 1 if the job has already completed some execution, and
0 otherwise.
This is an issue because expanded jobs require more disk space than
unexpanded ones.
.q QDate
is the UNIX time when the job was submitted.
The constants are chosen so that
.q UserPrio
will be the major criterion,
.q Expanded
will be less important, and
.q QDate
will be the minor criterion
in determining job priority.
.q UserPrio ,
.q Expanded ,
and
.q QDate
are variables known to the
.b schedd
which it determines for each job before applying the PRIO expression.
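.lp
A short worked example shows how the constants in PRIO separate the
three criteria.
The sketch below assumes that jobs with larger PRIO values are served
first, and the timestamps are hypothetical values one day apart.

```python
# Direct transcription of the PRIO expression from the text.
def prio(user_prio, expanded, qdate):
    return (user_prio * 10) + expanded - (qdate / 1000000000.0)


DAY = 24 * 60 * 60
T0 = 624000000          # hypothetical submission time (UNIX seconds)

older = prio(0, 0, T0)               # submitted first, never run
newer = prio(0, 0, T0 + DAY)         # same user priority, one day later
expanded = prio(0, 1, T0 + DAY)      # newer, but has already run
favored = prio(1, 0, T0 + DAY)       # newer, UserPrio one step higher

# One step of UserPrio (10) outweighs Expanded (1), which in turn
# outweighs a one day difference in QDate (86400 / 1e9 = 0.0000864).
ranking = sorted(
    [("older", older), ("newer", newer),
     ("expanded", expanded), ("favored", favored)],
    key=lambda item: item[1],
    reverse=True,
)
order = [name for name, _ in ranking]
```

.lp
With these inputs the order is favored, expanded, older, newer: among
otherwise equal jobs the earlier submission ranks first.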
.sh 2 "Prioritizing Machines"
.pp
The
.b "central manager"
does not keep track of individual jobs on the member
machines.
Instead it keeps track of how many jobs a machine wants to run, and how
many it is running at any particular time.
This keeps the information that must be transmitted between the
.b schedd
and the
.b "central manager"
to a minimum.
The
.b "central manager"
prioritizes the machines which want to
run jobs; it can then give permission to the
.b schedd
on high priority
machines and let them make their own decisions about which jobs to run.
.(l
UPDATE_PRIO : Prio + Users - Running
.)l
.lp
Periodically the
.b "central manager"
will apply this expression to all of the
machines in the pool.
The priority of each machine will be incremented by the number of individual
users on that machine who have jobs in the queue, and decremented by the
number of jobs that machine is already executing remotely.
Machines which are running lots of jobs will tend to have low priorities,
and machines which have jobs to run, but can't run them, will accumulate
high priorities.
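.lp
The effect of applying UPDATE_PRIO repeatedly can be seen in a small
simulation.
The machine names and counts below are invented, and holding Users and
Running fixed across rounds is a simplification made for the
illustration.

```python
# Direct transcription of the UPDATE_PRIO expression from the text.
def update_prio(prio, users, running):
    return prio + users - running


# Hypothetical pool: "waiting" has queued jobs but none running
# remotely; "busy" already runs three jobs on other machines.
machines = {
    "waiting": {"prio": 0, "users": 2, "running": 0},
    "busy": {"prio": 0, "users": 1, "running": 3},
}

# Three periodic updates by the central manager.
for _ in range(3):
    for m in machines.values():
        m["prio"] = update_prio(m["prio"], m["users"], m["running"])
```

.lp
After three rounds the waiting machine has accumulated a priority of 6
and the busy machine has fallen to -6, which matches the tendency
described above.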
.sh 1 "Acknowledgements"
.pp
This project is based on the idea of a
.q "processor bank" ,
which was introduced by Maurice Wilkes in connection with his work on the
Cambridge Ring.\**
.(f
\**Wilkes, M. V.,
Invited Keynote Address,
10th Annual International Symposium on Computer Architecture,
June 1983.
.)f
.pp
We would like to thank Don Neuhengen and Tom Virgilio for their
pioneering work on the remote system call implementation;
Matt Mutka and Miron Livny
for first convincing us that a general checkpointing mechanism
could be practical and for ideas on how to distribute control and
prioritize the jobs;
and David DeWitt and Marvin Solomon for their continued guidance
and support throughout this project.
.pp
This research was supported by the National Science Foundation under
grants MCS81-05904 and DCR-8512862 and by a Digital Equipment Corporation
External Research Grant.
.sh 1 "Copyright Information"
.lp
Copyright 1986, 1987, 1988, 1989 University of Wisconsin
.lp
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation, and that the name of the University of
Wisconsin not be used in advertising or publicity pertaining to
distribution of the software without specific, written prior
permission. The University of Wisconsin makes no representations about
the suitability of this software for any purpose. It is provided "as
is" without express or implied warranty.
.lp
THE UNIVERSITY OF WISCONSIN DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS. IN NO EVENT SHALL THE UNIVERSITY OF WISCONSIN BE LIABLE FOR
ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.lp
.ta 10n
Authors: Allan Bricker and Michael J. Litzkow,
.br
University of Wisconsin, Computer Sciences Dept.
.sh 1 "Bibliography"
.np
Mutka, M. and Livny, M.
.q "Profiling Workstations' Available Capacity For Remote Execution" .
.i
Proceedings of Performance-87, The 12th IFIP W.G. 7.3
International Symposium on Computer Performance Modeling,
Measurement and Evaluation.
.r
Brussels, Belgium, December 1987.
.np
Litzkow, M.
.q "Remote Unix \(em Turning Idle Workstations Into Cycle Servers" .
.i
Proceedings of the Summer 1987 Usenix Conference.
.r
Phoenix, Arizona.
June 1987.
.np
Mutka, M.
.i
Sharing in a Privately Owned Workstation Environment.
.r
Ph.D. thesis,
University of Wisconsin, May 1988.
.np
Litzkow, M., Livny, M. and Mutka, M.
.q "Condor \(em A Hunter of Idle Workstations" .
.i
Proceedings of the
8th International Conference on Distributed Computing Systems.
.r
San Jose, Calif.
June 1988.
.np
Bricker, A. and Litzkow, M.
.q "Condor Installation Guide" .
May 1989.
.np
Bricker, A. and Litzkow, M.
Unix manual pages: condor_intro(1), condor(1), condor_q(1), condor_rm(1),
condor_status(1), condor_summary(1),
condor_config(5),
condor_control(8), and condor_master(8).
May 1989.