NetNews Usenet Archive 1993 #1

Text File | 1993-01-06 | 3.1 KB | 90 lines

Newsgroups: comp.sys.hp
Path: sparky!uunet!usc!cs.utexas.edu!torn!watserv2.uwaterloo.ca!madmax.uwaterloo.ca!gordon
From: gordon@madmax.uwaterloo.ca (Gordon R. Strachan)
Subject: Re: Batchjob que for hp700
Message-ID: <C0Fst8.EBz@watserv2.uwaterloo.ca>
Sender: news@watserv2.uwaterloo.ca
Organization: University of Waterloo
References: <1ickjuINNl1f@transfer.stratus.com> <1993Jan5.203704.12846@alchemy.chem.utoronto.ca>
Date: Wed, 6 Jan 1993 14:41:31 GMT
Lines: 78

In article <1993Jan5.203704.12846@alchemy.chem.utoronto.ca> system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) writes:
>In article <1ickjuINNl1f@transfer.stratus.com> dean@nassau.hw.stratus.com (Dean Markarian) writes:
>I am been running:
>
>batch (ftp.cs.utoronto.ca in /pub/batch.tar.Z) - works on Sun, SGI,
>   Ultrix, Apollo Domain/OS, Stardent, MIPS. This is the one I use.
>   Does not work across a network. Users can get mail when job finishes.
>   Starts/suspends jobs based on load average, time of day, etc.
>   I have disabled the job time limits since it causes the daemon
>   to hang/die, but otherwise it works fine.
>   This is also available from iworks.ecn.uiowa.edu, or I will
>   post my version with the time limits disabled if there is enough
>   demand.
>

Mike, if this is the code I ported over and sent to you, then I think I
have finally found the bug in the time limit code. I have been meaning
to send this off to you for about a month now. The problem was in the
list-walking code in the PruneProcs function, and it is only hit on
systems which have more programs running than the hash table size
(probably why I didn't catch it for so long). Sorry it took me so long
to get back to you, but my to-do list has been extremely long recently.

Here is the new PruneProcs function. Could you please try it and let me
know if it fixes the bug you found?

PruneProcs()
{
    struct ProcessTime *current;
    struct ProcessTime *previous;
    int i;

    for (i = 0; i < HASHSIZE; i++) {
        current = previous = HashTable[i];
        while (current != NULL) {
            if (current->LastSeen != Round) {
                /* Entry was not seen in the current round: remove it. */
                mdebug1("Removing process %d\n", current->Pid);
                RemoveFromParent(current->PPid,
                                 current->DataSize + current->ChildDSize);
                if (current == HashTable[i]) {
                    /* Removing the head of this bucket's chain. */
                    HashTable[i] = current->next;
                    free(current);
                    current = previous = HashTable[i];
                } else {
                    /* Removing an entry further down the chain. */
                    previous->next = current->next;
                    free(current);
                    current = previous->next;
                }
            } else {
                previous = current;
                current = previous->next;
            }
        }
    }
}

There is also another bug I found, in the code which handles the
"program" statement in the batch profile. The batchd program uses the
system() call to execute the command. But, at least on HP-UX, this
causes a SIGCHLD to be generated. The batch daemon then reaps it but
doesn't recognize the pid of the child, so it assumes a major error has
occurred and drains the queue. Right now my workaround is to disable
SIGCHLD prior to the system() call, but this doesn't always work.

Anyway, please let me know if this fixes the bug in the resource limit
code.

Gordon

PS Once again, sorry I took so long.
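
Editorial sketch of the SIGCHLD workaround described above. This is not
code from batchd: run_program_blocked() is a hypothetical helper, and
it assumes a system with POSIX signal masks (sigprocmask). It blocks
SIGCHLD around system() instead of disabling the handler. One possible
reason such a workaround "doesn't always work" (a guess, not something
the post confirms) is that a SIGCHLD left pending can still be
delivered once the signal is unblocked, after system() has already
reaped the shell.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not part of batchd.
 * Runs `cmd` through system() with SIGCHLD blocked, so that a SIGCHLD
 * handler cannot run while the command is executing.
 * Returns the status from system(), or -1 on failure.
 */
int run_program_blocked(const char *cmd)
{
    sigset_t block, saved;
    int status;

    assert(cmd != NULL);

    /* Add SIGCHLD to the set of blocked signals, remembering the old mask. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &saved) == -1)
        return -1;

    /* system() waits for the shell itself, so the child has already
       been reaped by the time it returns. */
    status = system(cmd);

    /* Restore the caller's mask. A SIGCHLD that became pending in the
       meantime may be delivered at this point, so the daemon's handler
       must tolerate finding no child left to reap. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) == -1)
        return -1;

    return status;
}
```

A caller would inspect the result with WIFEXITED(status) and
WEXITSTATUS(status). Current POSIX specifies that system() itself
blocks SIGCHLD while it waits for the command; whether the HP-UX
release discussed here did so is not stated in the post.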