- Newsgroups: comp.sys.hp
- Path: sparky!uunet!usc!cs.utexas.edu!torn!watserv2.uwaterloo.ca!madmax.uwaterloo.ca!gordon
- From: gordon@madmax.uwaterloo.ca (Gordon R. Strachan)
- Subject: Re: Batchjob que for hp700
- Message-ID: <C0Fst8.EBz@watserv2.uwaterloo.ca>
- Sender: news@watserv2.uwaterloo.ca
- Organization: University of Waterloo
- References: <1ickjuINNl1f@transfer.stratus.com> <1993Jan5.203704.12846@alchemy.chem.utoronto.ca>
- Date: Wed, 6 Jan 1993 14:41:31 GMT
- Lines: 78
-
- In article <1993Jan5.203704.12846@alchemy.chem.utoronto.ca> system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) writes:
- >In article <1ickjuINNl1f@transfer.stratus.com> dean@nassau.hw.stratus.com (Dean Markarian) writes:
- >I have been running:
- >
- >batch (ftp.cs.utoronto.ca in /pub/batch.tar.Z) - works on Sun, SGI,
- > Ultrix, Apollo Domain/OS, Stardent, MIPS. This is the one I use.
- > Does not work across a network. Users can get mail when job finishes.
- > Starts/suspends jobs based on load average, time of day, etc.
- > I have disabled the job time limits since it causes the daemon
- > to hang/die, but otherwise it works fine.
- > This is also available from iworks.ecn.uiowa.edu, or I will
- > post my version with the time limits disabled if there is enough
- > demand.
- >
-
- Mike, if this is the code I ported and sent to you, then I think I have
- finally found the bug in the time limit code. I have been meaning to send this
- to you for about a month now. The problem was in the list-walking code in
- the PruneProcs function, and it is only hit on systems that have more
- processes running than the hash table size (probably why I did not catch it
- for so long). Sorry it took me so long to get back to you, but my to-do
- list has been extremely long recently. Here is the new PruneProcs function.
- Could you please try it and let me know if it fixes the bug you found?
-
- PruneProcs()
-
- {
-     struct ProcessTime *current;
-     struct ProcessTime *previous;
-     int i;
-
-     for(i = 0; i < HASHSIZE; i++)
-     {
-         current = previous = HashTable[i];
-         while(current != NULL)
-         {
-             /* entry was not seen in this round, so the process is gone */
-             if(current->LastSeen != Round)
-             {
-                 mdebug1("Removing process %d\n",current->Pid);
-                 RemoveFromParent(current->PPid,current->DataSize + current->ChildDSize);
-                 if(current == HashTable[i])
-                 {
-                     /* removing the head of the chain */
-                     HashTable[i] = current->next;
-                     free(current);
-                     current = previous = HashTable[i];
-                 }
-                 else
-                 {
-                     /* removing from the middle or end of the chain */
-                     previous->next = current->next;
-                     free(current);
-                     current = previous->next;
-                 }
-             }
-             else
-             {
-                 /* entry is still live, move on */
-                 previous = current;
-                 current = previous->next;
-             }
-         }
-     }
- }
-
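For comparison, here is a minimal sketch of the same pruning walk written with a pointer to the link being examined, which removes the need for a separate head-of-chain case. It is not the batchd source: the struct is cut down to three fields, HASHSIZE is an assumed value, and the names insert_proc and prune_stale are hypothetical. The RemoveFromParent bookkeeping from the function above is left out.

```c
#include <stdlib.h>

#define HASHSIZE 8              /* assumed size, for illustration only */

struct ProcessTime {            /* reduced, hypothetical version of the struct */
    int Pid;
    int LastSeen;
    struct ProcessTime *next;
};

static struct ProcessTime *HashTable[HASHSIZE];

/* Push a new entry on the front of its chain; returns 0, or -1 if malloc fails. */
static int insert_proc(int pid, int last_seen)
{
    struct ProcessTime *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->Pid = pid;
    p->LastSeen = last_seen;
    p->next = HashTable[pid % HASHSIZE];
    HashTable[pid % HASHSIZE] = p;
    return 0;
}

/* Free every entry whose LastSeen differs from round; returns the number freed. */
static int prune_stale(int round)
{
    int removed = 0;
    int i;

    for (i = 0; i < HASHSIZE; i++) {
        /* link points at whichever pointer currently leads to the entry:
           the table slot for the head, or the previous entry's next field */
        struct ProcessTime **link = &HashTable[i];
        while (*link != NULL) {
            struct ProcessTime *cur = *link;
            if (cur->LastSeen != round) {
                *link = cur->next;      /* unlink, head or not */
                free(cur);
                removed++;
            } else {
                link = &cur->next;      /* keep the entry and advance */
            }
        }
    }
    return removed;
}
```

The behaviour is intended to match the head/non-head version above; the only difference is that the unlink step is written once instead of twice.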
-
- There is also another bug I found in the code that handles the "program"
- statement in the batch profile. The batchd program uses the system() call to
- execute the command. But, at least on HP-UX, this causes a SIGCHLD to be
- generated. The batch daemon then reaps it but doesn't recognize the pid of
- the child, so it assumes a major error has occurred and drains the queue.
- Right now my workaround is to disable SIGCHLD before the system() call, but
- this doesn't always work.
-
- Anyway, please let me know if this fixes the bug in the resource limit code.
-
- Gordon
-
- PS
- Once again sorry I took so long.
-
-