- Newsgroups: comp.unix.wizards
- Path: sparky!uunet!sun-barr!ames!network.ucsd.edu!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: MP CPU-time phenomena
- Message-ID: <1992Aug14.063450.25740@fcom.cc.utah.edu>
- Sender: news@fcom.cc.utah.edu
- Organization: Weber State University (Ogden, UT)
- References: <1992Aug6.214740.164@socrates.umd.edu> <15250002@hprpcd.rose.hp.com>
- Date: Fri, 14 Aug 92 06:34:50 GMT
- Lines: 65
-
- In article <15250002@hprpcd.rose.hp.com> tmilner@hprpcd.rose.hp.com (Tom Milner) writes:
- >In comp.unix.wizards, boyd@prl.dec.com (Boyd Roberts) writes:
- >
- >| In article <1992Aug6.214740.164@socrates.umd.edu>, steves@socrates.umd.edu (Steven Sonnenberg) writes:
- >| > The cpu-bound application is:
- >| >
- >| > while (1) i++;
- >| >
- >| > I am measuring process CPU utilization based on u.u_utime + u.u_stime
- >| > over the elapsed time (CPU seconds). For example, in 10 seconds
- >| > there are 20 seconds on CPU time (assuming 2 processors).
- >| >
- >| And just how is your while loop going to run on both processors at the same time?
- >There's no reason that a processor switch could not occur anywhere in
- >the code... In a 10 second window, all processors could have a chance to
- >execute the code stream.
-
- At the same time, or are you switching processors (i.e., serializing)?  I suspect
- that even if you forced a processor switch, that's all it'd be. I mean, you
- can't seriously be expecting i to increment n times as fast for n processors!
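-
- (If you want to convince yourself, something like the following -- untested,
- and assuming you have a POSIX-ish times()/sysconf() -- will show CPU time
- tracking wall time roughly 1:1 for the single loop, no matter how many
- processors the scheduler bounces it across:
-
-	#include <stdio.h>
-	#include <unistd.h>
-	#include <sys/times.h>
-
-	int
-	main()
-	{
-		struct tms start, end;
-		clock_t wall0, wall1;
-		long hz = sysconf(_SC_CLK_TCK);	/* clock ticks per second */
-		volatile long i = 0;		/* volatile so the loop isn't optimized away */
-		long n;
-
-		wall0 = times(&start);
-		for (n = 0; n < 100000000; n++)	/* the busy loop in question */
-			i++;
-		wall1 = times(&end);
-
-		printf("cpu = %.2f s, wall = %.2f s\n",
-		    (double)((end.tms_utime - start.tms_utime) +
-			     (end.tms_stime - start.tms_stime)) / hz,
-		    (double)(wall1 - wall0) / hz);
-		return 0;
-	}
-
- You'll never see cpu come out at twice wall, two processors or not.)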
-
- What you need to do to force concurrent utilization is to take a concurrent
- benchmark and compile it with the right compiler (that knows about breaking
- concurrent code into multiple threads of execution) and then run it. Even
- then, you probably aren't going to get aggregate CPU time, especially if you
- are running an SMP'able kernel (it might be possible to get aggregate time if
- you are using an ASMP box and measuring using the master... but not likely).
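-
- To get real concurrent utilization out of that kind of loop you'd have to run
- one spinner per processor.  A rough sketch, assuming a kernel-scheduled,
- pthread-style threads package (a purely user-level package won't put two
- threads on two processors at once):
-
-	#include <unistd.h>
-	#include <pthread.h>
-
-	#define NCPU	2	/* made-up number; match your box */
-
-	static void *
-	spin(void *arg)
-	{
-		volatile long i = 0;
-
-		for (;;)		/* one busy loop per thread */
-			i++;
-		/* NOTREACHED */
-		return arg;
-	}
-
-	int
-	main()
-	{
-		pthread_t tid[NCPU];
-		int n;
-
-		for (n = 0; n < NCPU; n++)
-			pthread_create(&tid[n], NULL, spin, NULL);
-		pause();		/* sit back and watch both CPUs peg */
-		return 0;
-	}
-
- With one runnable thread per processor the CPU-time sum can legitimately come
- out near 2x the elapsed time; with one thread it can't.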
-
- The only alternative to this might be to gather statistics in the scheduler...
- but then, of course, you have to subtract out your statistics gathering to
- be accurate, and then subtract out your subtract time, etc.
-
- A lot of "how much parallelization" you get is related to "how parallelizable
- is my problem" and "how good at generating parallel code are the tools I'm
- using".  The Cray FORTRAN compiler for the Y-MP, for instance, has a
- ridiculous (even more so than ANSI C) number of hints you can give it to tell
- it how to parallelize your code (like ANSI C, it isn't too smart about
- knowing precisely what a programmer is doing and how it maps to the
- architecture otherwise).
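-
- A contrived C illustration of the "how parallelizable is my problem" half
- (made-up arrays, nothing Cray-specific about it):
-
-	#define N	1000000
-
-	double a[N], b[N], c[N];
-
-	/* No dependence between iterations: a parallelizing compiler (or
-	   you, with one thread per chunk of i) can split this across
-	   processors. */
-	void
-	independent(void)
-	{
-		int i;
-
-		for (i = 0; i < N; i++)
-			c[i] = a[i] + b[i];
-	}
-
-	/* Iteration i needs the result of iteration i-1: no amount of
-	   hinting makes this run n times as fast on n processors. */
-	void
-	recurrence(void)
-	{
-		int i;
-
-		for (i = 1; i < N; i++)
-			a[i] = a[i - 1] * 0.5 + b[i];
-	}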
-
- This all goes to show you why throwing a bazillion processors at a problem
- won't necessarily solve it faster.
-
- My suggestion would be to build a model with something that is unit
- benchmarkable for multiple threads of execution (like a preemptive threads
- package... so Sun's doesn't qualify). Then build an infinitely parallelizable
- benchmark (or as close as you can get ;-), run it, and use it to model by
- linear scaling the performance you'll probably get, assuming you never
- context switch.
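-
- The "model by linear scaling" part is nothing more magic than this (numbers
- made up; assumes perfect balance and no contention, which is the whole point
- of the exercise):
-
-	#include <stdio.h>
-
-	/* t1 = your single-thread unit benchmark time, p = fraction of the
-	   work that actually parallelizes, ncpu = processors thrown at it. */
-	static double
-	predicted_time(double t1, double p, int ncpu)
-	{
-		return t1 * ((1.0 - p) + p / ncpu);
-	}
-
-	int
-	main()
-	{
-		int n;
-
-		for (n = 1; n <= 8; n *= 2)
-			printf("%d cpus: %5.2f s (p = 1.0)  %5.2f s (p = 0.95)\n",
-			    n, predicted_time(10.0, 1.0, n),
-			    predicted_time(10.0, 0.95, n));
-		return 0;
-	}
-
- The p = 1.0 column is the infinitely parallelizable ideal; the p = 0.95
- column is what a little serial work does to it.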
-
- Of course, if your actual application isn't infinitely parallelizable, you
- may get different mileage...
-
-
- Terry Lambert
- terry_lambert@gateway.novell.com
- terry@icarus.weber.edu
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- --
- -------------------------------------------------------------------------------
- terry@icarus.weber.edu
- "I have an 8 user poetic license" - me
- -------------------------------------------------------------------------------
-