- Newsgroups: comp.unix.wizards
- Path: sparky!uunet!sun-barr!ames!network.ucsd.edu!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: MP CPU-time phenomena
- Message-ID: <1992Aug14.063450.25740@fcom.cc.utah.edu>
- Sender: news@fcom.cc.utah.edu
- Organization: Weber State University (Ogden, UT)
- References: <1992Aug6.214740.164@socrates.umd.edu> <15250002@hprpcd.rose.hp.com>
- Date: Fri, 14 Aug 92 06:34:50 GMT
- Lines: 65
-
- In article <15250002@hprpcd.rose.hp.com> tmilner@hprpcd.rose.hp.com (Tom Milner) writes:
- >In comp.unix.wizards, boyd@prl.dec.com (Boyd Roberts) writes:
- >
- >| In article <1992Aug6.214740.164@socrates.umd.edu>, steves@socrates.umd.edu (Steven Sonnenberg) writes:
- >| > The cpu-bound application is:
- >| >
- >| > while (1) i++;
- >| >
- >| > I am measuring process CPU utilization based on u.u_utime + u.u_stime
- >| > over the elapsed time (CPU seconds). For example, in 10 seconds
- >| > there are 20 seconds on CPU time (assuming 2 processors).
- >| >
- >| And just how is your while loop going to run on both processors at the same time?
- >There's no reason that a processor switch could not occur anywhere in
- >the code... In a 10 second window, all processors could have a chance to
- >execute the code stream.
-
- At the same time, or are you switching processors (i.e., serializing)?  I suspect
- that even if you forced a processor switch, that's all it'd be. I mean, you
- can't seriously be expecting i to increment n times as fast for n processors!
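-
- (If you want to convince yourself, something like the following -- untested,
- and assuming you have a POSIX-ish times()/sysconf() -- will show CPU time
- tracking wall time roughly 1:1 for the single loop, no matter how many
- processors the scheduler bounces it across:
-
-	#include <stdio.h>
-	#include <unistd.h>
-	#include <sys/times.h>
-
-	int
-	main()
-	{
-		struct tms start, end;
-		clock_t wall0, wall1;
-		long hz = sysconf(_SC_CLK_TCK);	/* clock ticks per second */
-		volatile long i = 0;		/* volatile so the loop isn't optimized away */
-		long n;
-
-		wall0 = times(&start);
-		for (n = 0; n < 100000000; n++)	/* the busy loop in question */
-			i++;
-		wall1 = times(&end);
-
-		printf("cpu = %.2f s, wall = %.2f s\n",
-		    (double)((end.tms_utime - start.tms_utime) +
-			     (end.tms_stime - start.tms_stime)) / hz,
-		    (double)(wall1 - wall0) / hz);
-		return 0;
-	}
-
- You'll never see cpu come out at twice wall, two processors or not.)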
-
- What you need to do to force concurrent utilization is to take a concurrent
- benchmark and compile it with the right compiler (that knows about breaking
- concurrent code into multiple threads of execution) and then run it. Even
- then, you probably aren't going to get aggregate CPU time, especially if you
- are running an SMP'able kernel (it might be possible to get aggregate time if
- you are using an ASMP box and measuring using the master... but not likely).
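-
- To get real concurrent utilization out of that kind of loop you'd have to run
- one spinner per processor.  A rough sketch, assuming a kernel-scheduled,
- pthread-style threads package (a purely user-level package won't put two
- threads on two processors at once):
-
-	#include <unistd.h>
-	#include <pthread.h>
-
-	#define NCPU	2	/* made-up number; match your box */
-
-	static void *
-	spin(void *arg)
-	{
-		volatile long i = 0;
-
-		for (;;)		/* one busy loop per thread */
-			i++;
-		/* NOTREACHED */
-		return arg;
-	}
-
-	int
-	main()
-	{
-		pthread_t tid[NCPU];
-		int n;
-
-		for (n = 0; n < NCPU; n++)
-			pthread_create(&tid[n], NULL, spin, NULL);
-		pause();		/* sit back and watch both CPUs peg */
-		return 0;
-	}
-
- With one runnable thread per processor the CPU-time sum can legitimately come
- out near 2x the elapsed time; with one thread it can't.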
-
- The only alternative to this might be to gather statistics in the scheduler...
- but then, of course, you have to subtract out your statistics gathering to
- be accurate, and then subtract out your subtract time, etc.
-
- A lot of "how much parallelization" you get is related to "how parallelizable
- is my problem" and "how good at generating parallel code are the tools I'm
- using".  The Cray FORTRAN compiler for the Y-MP, for instance, has a
- ridiculous (even more so than ANSI C) number of hints you can give it to tell
- it how to parallelize your code (like ANSI C, it isn't too smart about
- knowing precisely what a programmer is doing and how it maps to the
- architecture otherwise).
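-
- A contrived C illustration of the "how parallelizable is my problem" half
- (made-up arrays, nothing Cray-specific about it):
-
-	#define N	1000000
-
-	double a[N], b[N], c[N];
-
-	/* No dependence between iterations: a parallelizing compiler (or
-	   you, with one thread per chunk of i) can split this across
-	   processors. */
-	void
-	independent(void)
-	{
-		int i;
-
-		for (i = 0; i < N; i++)
-			c[i] = a[i] + b[i];
-	}
-
-	/* Iteration i needs the result of iteration i-1: no amount of
-	   hinting makes this run n times as fast on n processors. */
-	void
-	recurrence(void)
-	{
-		int i;
-
-		for (i = 1; i < N; i++)
-			a[i] = a[i - 1] * 0.5 + b[i];
-	}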
-
- This all goes to show you why throwing a bazillion processors at a problem
- won't necessarily solve it faster.
-
- My suggestion would be to build a model with something that is unit
- benchmarkable for multiple threads of execution (like a preemptive threads
- package... so Sun's doesn't qualify). Then build an infinitely parallelizable
- benchmark (or as close as you can get ;-), run it, and use it to model by
- linear scaling the performance you'll probably get, assuming you never
- context switch.
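-
- The "model by linear scaling" part is nothing more magic than this (numbers
- made up; assumes perfect balance and no contention, which is the whole point
- of the exercise):
-
-	#include <stdio.h>
-
-	/* t1 = your single-thread unit benchmark time, p = fraction of the
-	   work that actually parallelizes, ncpu = processors thrown at it. */
-	static double
-	predicted_time(double t1, double p, int ncpu)
-	{
-		return t1 * ((1.0 - p) + p / ncpu);
-	}
-
-	int
-	main()
-	{
-		int n;
-
-		for (n = 1; n <= 8; n *= 2)
-			printf("%d cpus: %5.2f s (p = 1.0)  %5.2f s (p = 0.95)\n",
-			    n, predicted_time(10.0, 1.0, n),
-			    predicted_time(10.0, 0.95, n));
-		return 0;
-	}
-
- The p = 1.0 column is the infinitely parallelizable ideal; the p = 0.95
- column is what a little serial work does to it.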
-
- Of course, if your actual application isn't infinitely parallelizable, you
- may get different mileage...
-
-
- Terry Lambert
- terry_lambert@gateway.novell.com
- terry@icarus.weber.edu
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- --
- -------------------------------------------------------------------------------
- terry@icarus.weber.edu
- "I have an 8 user poetic license" - me
- -------------------------------------------------------------------------------
-