The Distributed Queueing System (DQS) can manage a large, heterogeneous collection of workstations with a high throughput of jobs. It is often desirable to track that throughput for reasons of accountability. DQS provides for accountability by recording each completed job in an accounting file and by supplying programs that analyze that file.
DQS's accounting support software consists of the programs qacct and qusage. DQS Queue Accounting (qacct) is a UNIX shell command-line interface to the analysis of DQS accounting files. DQS Queue Usage (qusage) is an X Window System graphical interface to the analysis of DQS accounting files.
We're interested in measurements of queue usage during a given interval of time. The smallest time resolution is one second -- typically, however, the interval is divided into some number of usage bins.
The queue utilization measure provides an indication of the number of queues that are running jobs at a particular moment (i.e. during a bin). It is given by the sum of the job execution times during a bin divided by the bin size (length of time of the bin). Another measure, CPU utilization, will be described in a moment.
Consider the following figure. The interval is bounded by the times denoted IStart and IStop. The interval has been divided into eight usage bins, enumerated 0 to 7, with pound symbols (#) marking the end of each usage bin. The bin size is 6.
Also illustrated are five jobs (lettered) whose begin and end times are marked with carets (^). (Note that the execution time for each job is one time unit less than you may interpret from the diagram e.g. job A's execution time is 8 time units, not 9.) Jobs B, C, and D fall completely within the interval, while jobs A and E are only partially contained.
      IStart                                    IStop
      |---------------------------------------------|
        0  #  1  #  2  #  3  #  4  #  5  #  6  #  7  #
    ^---A---^
             ^B-^
                  ^------------C----------^
                                  ^--D---^
                          ^------------E----------------^
At the time of analysis, bin 0 has seen one job (A), which ran for the entire bin (6 time units). The queue utilization during that bin is therefore 6 time units divided by a bin size of 6 time units, or 1.
Bin 1 involves job B, which lasted 3 time units. The queue utilization, then, is 3 / 6, or 0.5.
Bin 5 has seen jobs C, D, and E. During that bin, job C persisted for 6 time units, D for 5, and E for 6. Accordingly, the queue utilization during the bin is (6 + 5 + 6) / 6, or 2.83.
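The three worked examples above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `queue_utilization` is hypothetical and is not part of DQS, and the per-bin durations are simply the values read off the figure.

```python
def queue_utilization(durations_in_bin, bin_size):
    """Sum of the time each job ran inside one bin, divided by the bin size."""
    return sum(durations_in_bin) / bin_size


# Durations read off the figure above (bin size 6).
bin0 = queue_utilization([6], 6)        # job A fills bin 0
bin1 = queue_utilization([3], 6)        # job B runs 3 units in bin 1
bin5 = queue_utilization([6, 5, 6], 6)  # jobs C, D, and E in bin 5

print(round(bin0, 2), round(bin1, 2), round(bin5, 2))
```

A value above 1 (as in bin 5) simply means that more than one queue was running a job during that bin.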
The CPU utilization measurement indicates the usage of the processors in the cluster, broken down into user and system categories. (The quality of this information is entirely dependent on the facilities that your machines' operating systems provide. Documentation for the UNIX system call getrusage(2) might be a good starting point.)
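CPU usage for a bin is computed per job and then summed. For each job, the time the job ran within the bin is multiplied by the job's average CPU rate (its user or system CPU time divided by its total execution time). The sum of these products over all jobs, divided by the bin size, is the CPU usage for that bin.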
For instance, recalling the above figure, job B lasted 3 time units. Suppose the user time for that job, reported by the operating system, was 1. The CPU usage (user) is therefore (3 * (1 / 3)) / 6, or 0.17.
Bin 5 has seen jobs C, D, and E. During that bin, job C persisted for 6 time units, D for 5, and E for 6. The total time for jobs C, D, and E is 24, 7, and 31 time units, respectively. Assume that the system times for those jobs are 19, 4, and 29, respectively. The CPU usage (system) for bin 5, then, is given by:
    (6 * (19 / 24)) + (5 * (4 / 7)) + (6 * (29 / 31))
    -------------------------------------------------
                            6
or 2.2.
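The CPU utilization arithmetic can be checked with a short Python sketch. The function name `cpu_utilization` and the tuple layout are hypothetical choices made for this illustration. The numbers are the ones used in the two examples above.

```python
def cpu_utilization(jobs, bin_size):
    """CPU usage for one bin.

    jobs: iterable of (time_in_bin, total_execution_time, cpu_time) tuples,
    where cpu_time is either the user or the system time of the job.
    """
    total = 0.0
    for time_in_bin, total_time, cpu_time in jobs:
        if total_time > 0:  # skip zero-length jobs to avoid dividing by zero
            total += time_in_bin * (cpu_time / total_time)
    return total / bin_size


# Bin 1, user time: job B ran 3 units in the bin, 3 units in total, user time 1.
bin1_user = cpu_utilization([(3, 3, 1)], 6)

# Bin 5, system time: jobs C, D, and E.
bin5_system = cpu_utilization([(6, 24, 19), (5, 7, 4), (6, 31, 29)], 6)

print(round(bin1_user, 2), round(bin5_system, 2))
```

The division by each job's total execution time spreads the job's CPU time evenly over its lifetime, which is why the result is an average rather than an exact per-bin measurement.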
The top-level window presents the following options:
[Usage...] enters the Usage Options submenu, allowing the specification of parameters for the analysis of an accounting file.
[Help...] invokes the on-line help facility.
[About...] displays version, copyright and author information.
[Quit] exits the program.
The following fields are available to specify options for the analysis of the accounting file. The logic behind the matching of entries in the accounting file is roughly: "If any part of the job occurs within the specified interval (Start and Days) and fits any of the specified parameters (Queue, Host, Complex, Group, Owner, and Job), then a match has occurred."
Name of the accounting file. Default (in order of override) is the value of the environment variable DQSACCTFILE, or the name of the accounting file compiled into the DQS package.
Number of "days ago" to start the analysis. Default is 7 days.
Number of days to analyze (from Start). Default is 7 days.
Name of a queue. The name may include wildcard characters in order to perform pattern matching (as may most of these parameters). No default.
Name of a host. No default.
A queue complex specification. No default.
Name of an accounting (i.e. billing) group. No default.
Username of the owner of a job. No default.
A job identifier. Default is "*" i.e. match all jobs.
Number of seconds per bin. Default is 3600 seconds i.e. 1 hour.
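To see how the default values relate to one another, the following Python sketch derives an analysis interval and a bin count from them. It is an assumption about how the fields combine, not a description of the actual qusage code: the function name `analysis_interval` is hypothetical, and the interval is assumed to include its last second, as in the IStart/IStop figures.

```python
import time

SECONDS_PER_DAY = 86400


def analysis_interval(now, start_days_ago=7, days=7, bin_seconds=3600):
    """Return (istart, istop, number_of_bins), times in seconds since the epoch."""
    istart = now - start_days_ago * SECONDS_PER_DAY
    # Assumption: istop is the last second that belongs to the interval.
    istop = istart + days * SECONDS_PER_DAY - 1
    # Ceiling division, so a partial final bin still gets its own slot.
    nbins = -(-(istop + 1 - istart) // bin_seconds)
    return istart, istop, nbins


istart, istop, nbins = analysis_interval(int(time.time()))
print(nbins)
```

With the defaults (start 7 days ago, analyze 7 days, 3600 seconds per bin), the interval covers 604800 seconds and is divided into 168 one-hour bins.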
[Use Defaults/Remove Defaults] either inserts the aforementioned default values into the fields, or removes them.
[Accept] signifies approval of the options entered--the accounting file is analyzed and graphed.
[Help...] invokes the on-line help facility (see section Help).
[Cancel] returns to the top-level window.
[CPU/Queue] toggles between CPU and queue utilization.
[Line/Bar] toggles between a line or bar plot style.
[Grid/No Grid] toggles between a grid or no grid on the plot.
[Legend/No Legend] toggles the plot legend on and off.
[Print...] enters the Usage Print submenu, allowing the writing of the current plot to a PostScript file.
[Help...] invokes the on-line help facility (see section Help).
[Done] exits the plot, returning to the Usage Options window.
Usage Print
This window allows the specification of some options for the writing of a plot to a PostScript file.
Name of the file to which the PostScript will be written. Default is qusage.ps.
Toggles between portrait and landscape orientation.
WARNING: Existing files will be overwritten.
This window displays version, copyright and author information. Refer to this information when reporting bugs, suggesting enhancements, etc.
This chapter is intended for those who wish to understand and perhaps modify the source code for the DQS accounting programs. It is also relevant for users who are asking "Why in hell is this program not giving me what I think it should!?"
The high-level algorithm for the DQS accounting programs is as follows:
    (1.0) get user's request (i.e. interval start and stop,
          queue complex specification, job ID, etc.)
    (2.0) for each job (i.e. line in the accounting file)
    (2.1)     if job matches user's request
    (2.2)         tally job's usage into the usage bins
    (3.0) display the usage bins
Portions 1.0 and 3.0 are related to user interface and are discussed in the "qacct Internals" and "qusage Internals" sections.
Portions 2.1 and 2.2 are covered in the "Commonalities" section.
Portion 2.0 is covered by the section entitled "Accounting Files".
The DQS accounting file (act_file) contains a line for each DQS job that has completed execution. Each line contains the following fields, separated by colons.
    char     *qname;            /* name of queue */
    char     *hostname;         /* name of host */
    u_long32  master;           /* master node? (true/false) */
    char     *complex;          /* queue complex resource string */
                                /* (comma sep'd) */
    char     *group;            /* name of accounting group */
    char     *owner;            /* user name of owner */
    char     *job_name;         /* name of job (perhaps NULL) */
    char     *dqs_job_name;     /* job identifier */
    u_long32  job_number;       /* job identifier */
    u_long32  submission_time;  /* time of receipt by qmaster (in sec) */
    u_long32  start_time;       /* time execution began (in sec) */
    u_long32  end_time;         /* time execution finished (in sec) */
    u_long32  exit_status;      /* exit value returned */
    u_long32  ru_wallclock;     /* time taken to execute (in sec) */
    u_long32  ru_utime;         /* user time used */
    u_long32  ru_stime;         /* system time used */
    u_long32  ru_maxrss;        /* maximum resident set size */
    u_long32  ru_ixrss;         /* integral shared text size */
    u_long32  ru_ismrss;        /* integral shared memory size */
    u_long32  ru_idrss;         /* integral unshared data size */
    u_long32  ru_isrss;         /* integral unshared stack size */
    u_long32  ru_minflt;        /* page reclaims */
    u_long32  ru_majflt;        /* page faults */
    u_long32  ru_nswap;         /* swaps */
    u_long32  ru_inblock;       /* block input operations */
    u_long32  ru_oublock;       /* block output operations */
    u_long32  ru_msgsnd;        /* messages sent */
    u_long32  ru_msgrcv;        /* messages received */
    u_long32  ru_nsignals;      /* signals received */
    u_long32  ru_nvcsw;         /* voluntary context switches */
    u_long32  ru_nivcsw;        /* involuntary context switches */
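As an illustration of the colon-separated layout, the following Python sketch splits one accounting line into a dictionary keyed by the field names listed above. Several points are assumptions rather than documented behavior: that no field contains a colon, that numeric fields are written as decimal integers, and that every line has all 31 fields. The function name `parse_acct_line` and the sample line are made up for this example.

```python
ACCT_FIELDS = [
    "qname", "hostname", "master", "complex", "group", "owner",
    "job_name", "dqs_job_name", "job_number", "submission_time",
    "start_time", "end_time", "exit_status", "ru_wallclock",
    "ru_utime", "ru_stime", "ru_maxrss", "ru_ixrss", "ru_ismrss",
    "ru_idrss", "ru_isrss", "ru_minflt", "ru_majflt", "ru_nswap",
    "ru_inblock", "ru_oublock", "ru_msgsnd", "ru_msgrcv",
    "ru_nsignals", "ru_nvcsw", "ru_nivcsw",
]

# Fields declared as char * in the listing; all others are u_long32.
STRING_FIELDS = {
    "qname", "hostname", "complex", "group", "owner",
    "job_name", "dqs_job_name",
}


def parse_acct_line(line):
    """Split one accounting line into a dict (assumes no colons inside fields)."""
    values = line.rstrip("\n").split(":")
    if len(values) != len(ACCT_FIELDS):
        raise ValueError(f"expected {len(ACCT_FIELDS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(ACCT_FIELDS, values):
        record[name] = value if name in STRING_FIELDS else int(value)
    return record


# Made-up sample line, for illustration only (not real DQS output).
sample = ("q1:hostA:1:fast,big:physics:alice:sim:sim.42:42:100:110:170:0:60:"
          + ":".join(["0"] * 17))
record = parse_acct_line(sample)
print(record["owner"], record["end_time"] - record["start_time"])
```

Raising an error on a wrong field count is a deliberate choice for the sketch. A real reader of act_file might prefer to skip or log malformed lines instead.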
The fields whose labels begin with "ru_" contain information gathered from the UNIX system call getrusage(2). It is important to note that the quality of this information is entirely dependent on the facilities that your machines' operating systems provide. Your vendor's documentation for getrusage(2) might be a good starting point.
The DQS statistics file (stat_file) contains a line for each DQS job currently running. The statistics are repeatedly written to the statistics file at the end of an interval (usually every ten minutes, but this is adjustable). Each line contains the following fields, separated by colons.
    u_long32  now;         /* time (in secs) that stats were logged */
    char     *hostname;    /* name of host */
    char     *qname;       /* name of queue */
    u_long32  load_avg;    /* load average */
    u_long32  qty;         /* number of said queues */
    u_long32  qty_active;  /* number of said queues with active jobs */
    char     *complex;     /* queue complex resource string */
                           /* (comma sep'd) */
    char     *states;      /* One or more of (concat'ed): */
                           /*   'a' ALARM */
                           /*   'c' SUSPEND_ON_COMP */
                           /*   'd' DISABLED */
                           /*   'e' ENABLED */
                           /*   'h' HELD */
                           /*   'm' MIGRATING */
                           /*   'q' QUEUED */
                           /*   'r' RUNNING */
                           /*   's' SUSPENDED */
                           /*   't' TRANSISTING */
                           /*   'u' UNKNOWN */
                           /*   'w' WAITING */
                           /*   'x' EXITING */
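The states field packs several one-letter codes into a single string. A small Python sketch of decoding it is shown below. The letter-to-name table is copied from the listing above (spelling included), while the function name `decode_states` and the handling of unknown letters are hypothetical choices for this illustration.

```python
STATE_NAMES = {
    "a": "ALARM", "c": "SUSPEND_ON_COMP", "d": "DISABLED", "e": "ENABLED",
    "h": "HELD", "m": "MIGRATING", "q": "QUEUED", "r": "RUNNING",
    "s": "SUSPENDED", "t": "TRANSISTING", "u": "UNKNOWN", "w": "WAITING",
    "x": "EXITING",
}


def decode_states(states):
    """Expand a concatenated states string into a list of state names."""
    # Letters not in the table are kept, prefixed with '?', rather than dropped.
    return [STATE_NAMES.get(letter, "?" + letter) for letter in states]


decoded = decode_states("er")
print(decoded)
```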
The qacct and qusage programs necessarily share a good deal of functionality. Essentially, the only difference is the user interface: textual and shell command-line-based versus graphical and point-and-click-based.
This section covers the internals of the code common to both programs. Recall portions 2.X of the high-level algorithm stated above (See section Internals).
    (2.0) for each job (i.e. line in the accounting file)
    (2.1)     if job matches user's request
    (2.2)         tally job's usage into the usage bins
Portion 2.0 has been covered above (See section Accounting). Portion 2.1 is handled below in "Matching a Job to a Request" and portion 2.2 in "Calculating DQS Usage".
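The user has specified a request that constrains us to match a subset of the jobs in the accounting file. A match has occurred if both of the following are true: any part of the job occurs within the specified interval, and the job fits any of the specified parameters (queue, host, complex, group, owner, and job).
All of the request's parameters may contain wildcard characters.
If these conditions are met then the matched job is factored into the usage.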
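A minimal Python sketch of such a matching test follows. It rests on several assumptions that the text does not settle: wildcards are treated as shell-style patterns (via Python's `fnmatch`, which may differ from DQS's own pattern matching), a job is taken to occupy the times from its start up to but not including its end, and a job must fit every parameter the user actually supplied. The function name `job_matches` and the dictionary layout are hypothetical.

```python
from fnmatch import fnmatchcase


def job_matches(job, request, istart, istop):
    """Return True if the job overlaps the interval and fits the request.

    job:     dict with start_time, end_time, and the fields being matched
    request: dict mapping a field name to a wildcard pattern (or None)
    """
    # Any part of the job inside [istart, istop] counts as an overlap.
    if not (job["start_time"] <= istop and job["end_time"] > istart):
        return False
    for field, pattern in request.items():
        if pattern is None:  # parameter not specified: no constraint
            continue
        if not fnmatchcase(str(job[field]), pattern):
            return False
    return True


job_a = {"qname": "q1", "owner": "alice", "start_time": 8, "end_time": 16}
print(job_matches(job_a, {"qname": "q*", "owner": None}, 10, 56))
```

Here `job_a` uses the times of job A from the figures (8 to 16) and made-up queue and owner names. It overlaps the interval 10 to 56 and matches the pattern `q*`, so the call returns `True`.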
The sections that follow consider the following figure, reproduced from a previous section (See section Queue Utilization), only here using more specific timing details.
The interval is bounded by the times 10 and 56, denoted IStart and IStop, respectively. According to a bin size of 6, the interval has been divided into eight usage bins, enumerated 0 to 7, with pound symbols (#) marking the end of each usage bin.
Also illustrated are five jobs (lettered) whose begin and end times are below the carets (^). The execution time for each job is defined to be the job's end time minus its start time e.g. job A's execution time is 16 - 8 = 8 time units. (Note that this is one time unit less than you may interpret from the diagram's caret markers.) Jobs B, C, and D fall completely within the interval, while jobs A and E are only partially contained.
      IStart                                    IStop
      1         2         3         4         5     5
      0         0         0         0         0     6
      |---------------------------------------------|     Time Interval
        0  #  1  #  2  #  3  #  4  #  5  #  6  #  7  #    Usage Bins
    ^---A---^
    8      16
             ^B-^
             17 20
                  ^------------C----------^               Jobs
                  22                     46
                                  ^--D---^
                                  38    45
                          ^------------E----------------^
                          30                           61
The sections that follow provide an overview of the algorithm for calculating usage and an example of its use.
The basic idea for calculating a job's usage is to walk through the job a bin at a time, accruing usage in each bin. The job may begin and/or end inside of a bin, so we handle those as special cases. Here's the algorithm.
    adjust job start and stop times if either falls outside the interval
    if job begins inside a bin
        calc job's usage up to the start of the next bin
    if job has any "middle" bins
        for each middle bin
            calc job's usage for that bin
    if job ends inside a bin
        calc job's usage from the end of the previous bin
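The pseudocode above might be rendered in Python as follows. This is a sketch, not the DQS source: the function name `tally_job` is hypothetical, a job is assumed to occupy the times from its start up to but not including its stop, and the first and last bins are handled uniformly (a job that starts exactly on a bin boundary simply gets the full bin size in the first step). The example call uses the times of job E from the figure above (30 to 61) with the interval 10 to 56.

```python
def tally_job(bins, start, stop, istart, istop, bin_size):
    """Add one job's execution time to the raw (not yet divided) usage bins."""
    # Adjust start and stop times that fall outside the interval.
    start = max(start, istart)
    stop = min(stop, istop + 1)
    if stop <= start:
        return  # the job lies entirely outside the interval

    first = (start - istart) // bin_size     # bin in which the job begins
    last = (stop - 1 - istart) // bin_size   # bin in which the job ends

    if first == last:
        bins[first] += stop - start
        return

    # First bin: usage up to the start of the next bin.
    bins[first] += istart + (first + 1) * bin_size - start
    # Middle bins: the job fills each one completely.
    for b in range(first + 1, last):
        bins[b] += bin_size
    # Last bin: usage from the start of that bin to the job's stop time.
    bins[last] += stop - (istart + last * bin_size)


bins = [0] * 8
tally_job(bins, 30, 61, istart=10, istop=56, bin_size=6)
print(bins)
```

For job E this yields 4 time units in bin 3, 6 in each of bins 4 to 6, and 5 in bin 7. The values are raw totals, and dividing by the bin size is left to the reporting step.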
Job E from the above figure provides an interesting case for our usage calculation algorithm. Job E begins inside of bin 3 and proceeds through bins 4 to 7.
Let's follow the algorithm. We first adjust the job's start and stop times according to how they fall within the bounds of the interval. Job E's stop time falls outside the interval, so we adjust the stop time to IStop + 1, or 57.
The job begins inside bin 3 at time 30. The next bin (4) begins at time 34, so the usage during bin 3 is 34 - 30 = 4 units.
The "middle" bins are 4 to 6, so we add a usage of 6 (the bin size) to each of those bins.
The final bin is 7 which begins at time 52. The usage for bin 7 is therefore the job's stop time, 57, minus the bin's begin time, 52, equalling 5.
Note that the final usage values will be divided through by the bin size 6. This is done after the entire accounting file has been processed and just before reporting the usage. For example, the usage for bin 7 in the above example would be reported as 5 / 6, or 0.83.
That takes care of job E. Jobs A through D will also factor into the usage calculations--they were presumably processed prior to job E. Consider bin 5, which has seen jobs C, D, and E. During that bin, job C persisted for 6 time units, D for 5, and E for 6. Accordingly, the usage during the bin is (6 + 5 + 6) / 6, or 2.83.
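As a cross-check on the whole example, the usage of every bin can also be computed directly from the overlap between each job and each bin, instead of walking each job bin by bin. The Python sketch below takes that alternative route. The function name `usage_bins` is hypothetical, the job times are those shown in the figure, and the final bin is assumed to be cut off at IStop + 1 while still being divided by the full bin size, as in the text.

```python
def usage_bins(jobs, istart, istop, bin_size):
    """Queue utilization per bin, computed from job/bin overlaps."""
    end = istop + 1                              # first time unit past the interval
    nbins = -(-(end - istart) // bin_size)       # ceiling division
    result = []
    for k in range(nbins):
        lo = istart + k * bin_size
        hi = min(lo + bin_size, end)             # last bin may be cut short
        total = sum(max(0, min(stop, hi) - max(start, lo))
                    for start, stop in jobs)
        result.append(total / bin_size)
    return result


# (start, stop) times of jobs A to E, taken from the figure.
jobs = [(8, 16), (17, 20), (22, 46), (38, 45), (30, 61)]
utilization = usage_bins(jobs, istart=10, istop=56, bin_size=6)
print([round(u, 2) for u in utilization])
```

Bin 5 comes out as 2.83 and bin 7 as 0.83, in agreement with the values derived above.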