- Newsgroups: comp.benchmarks
- Path: sparky!uunet!cs.utexas.edu!sun-barr!ames!data.nas.nasa.gov!amelia.nas.nasa.gov!eugene
- From: eugene@amelia.nas.nasa.gov (Eugene N. Miya)
- Subject: [l/m 4/7/92] Performance metrics (5/28) c.be FAQ
- Keywords: who, what, where, when, why, how
- Sender: news@nas.nasa.gov (News Administrator)
- Organization: NAS Program, NASA Ames Research Center, Moffett Field, CA
- Date: Thu, 5 Nov 92 12:25:10 GMT
- Message-ID: <1992Nov5.122510.10305@nas.nasa.gov>
- Reply-To: eugene@amelia.nas.nasa.gov (Eugene N. Miya)
- Lines: 272
-
- 5 Performance Metrics <This panel>
- 6 Temporary scaffold of New FAQ material
- 7 Music to benchmark by
- 8 Benchmark types
- 9 Linpack
- 10
- 11 NIST source and .orgs
- 12 Benchmark Environments
- 13 SLALOM
- 14
- 15 12 Ways to Fool the Masses with Benchmarks
- 16 SPEC
- 17 Benchmark invalidation methods
- 18
- 19 WPI Benchmark
- 20 Equivalence
- 21 TPC
- 22
- 23
- 24
- 25 Ridiculously short benchmarks
- 26 Other miscellaneous benchmarks
- 27
- 28 References
- 1 Introduction to FAQ chain and netiquette
- 2
- 3 PERFECT
- 4
-
-
- Performance/benchmark metric terminology
-
- The usual important quote is:
- What's important is the time it takes to solve MY problem.
- This does not help the architect designing the next machine.
- It is an arrogant, closed-minded, Gestaltist statement that conflicts
- with the analytic/reductionist needs of science.
-
- Synthetic problems/benchmarks have some, if limited, value.
- We walk before we run, and we crawl before we walk. Similarly,
- right now, there is more benchmarking noise than signal.
-
- Perhaps the only, and certainly the best, measure is the second (time):
- one of the best-studied metrics; see the atomic clocks of NIST.
- It is subject to relativistic effects: time dilation under the Lorentz
- transformation. Don't laugh; this is becoming more important at the
- picosecond level.
-
- Less reliable measures include:
-
- MIP, GIP, TIP :
- MIPS, GIPS, TIPS: Million (Giga, billions; tera, trillion) Instructions
- Per Second
- : Meaningless Indicator of Performance
- : "Marketing's" Indicator of Performance
- What's an "instruction?"
- An instruction is an event. It is frequently a minute change in the
- state of a CPU (and the computer). Frequently, the instruction rate is
- treated as synonymous with the clock rate of a machine: that ignores
- instructions requiring more than one clock tick to execute.
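The gap between clock rate and instruction rate can be checked against the Group 0 counts in the sample hpm output further down this panel. This is a minimal sketch: the 6 ns clock period is an assumption (the FAQ does not state it), chosen because it is roughly consistent with the CPU seconds that hpm reports.

```python
# Counts copied from Group 0 of the sample hpm output in this FAQ.
INSTRUCTIONS = 52730
CLOCK_PERIODS = 197638

# Assumption, not stated in the FAQ: a 6 ns clock period.
CLOCK_PERIOD_S = 6e-9


def cycles_per_instruction(clock_periods, instructions):
    """Average number of clock periods spent per instruction."""
    return clock_periods / instructions


def mips(instructions, clock_periods, clock_period_s):
    """Millions of instructions per second over the measured interval."""
    elapsed_s = clock_periods * clock_period_s
    return instructions / elapsed_s / 1e6


cpi = cycles_per_instruction(CLOCK_PERIODS, INSTRUCTIONS)
rate = mips(INSTRUCTIONS, CLOCK_PERIODS, CLOCK_PERIOD_S)
clock_mhz = 1 / CLOCK_PERIOD_S / 1e6

print(f"clock rate        : {clock_mhz:.1f} MHz")
print(f"clock periods/inst: {cpi:.2f}")
print(f"MIPS              : {rate:.2f}")
```

Under that assumption the instruction rate comes out near a quarter of the clock rate, which is the point: quoting the clock as "MIPS" overstates this run several times over.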
-
- A common fallacy by naive benchmarkers is that a CPU determines the
- speed of a computation. This is frequently false. The people in
- the know these days understand
- Amdahl's "other law:"
- 1 MIPS for each 1 MB main memory at 1 MB/S transfer to disk
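The rule as stated above can be turned into a quick balance check. This is only a sketch: the function name and the sample machine are hypothetical, and the ratios are taken exactly as this FAQ words them.

```python
def amdahl_balance(mips, memory_mb, io_mb_per_s):
    """Return memory and I/O per MIPS; 1.0 means balanced under the rule above."""
    return {
        "memory_mb_per_mips": memory_mb / mips,
        "io_mb_s_per_mips": io_mb_per_s / mips,
    }


# Hypothetical machine: 40 MIPS, 16 MB of main memory, 2 MB/s to disk.
balance = amdahl_balance(mips=40, memory_mb=16, io_mb_per_s=2)
for name, ratio in balance.items():
    verdict = "short" if ratio < 1.0 else "ok"
    print(f"{name}: {ratio:.2f} ({verdict})")
```

A machine like this one is CPU-heavy: a CPU-only benchmark would flatter it, while a workload that pages or does real I/O would not.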
-
- MFLOPS, GFLOPS, TFLOPS: Million (Giga, billions; tera, trillion) Floating-Point
- Operations Per Second
- : The measure ignores non-floating-point instructions. Particularly
- bad for numeric codes transitioning from 2-D to 3-D since additional
- time is required for array address calculation, and for algorithms
- requiring big non-numeric steps like matrix transposition.
- : the original program name for Frank McMahon's Livermore Loops
- program.
- : one of the metrics used by Dongarra's LINPACK benchmark.
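How much a FLOPS figure leaves out can be seen from the same sample hpm output shown later in this panel. A minimal sketch, again assuming a 6 ns clock period (not stated in the FAQ):

```python
# Counts copied from Group 0 of the sample hpm output in this FAQ.
FP_ADDS = 246
FP_MULTIPLIES = 267
FP_RECIPROCALS = 54
INSTRUCTIONS = 52730
CLOCK_PERIODS = 197638

# Assumption, not stated in the FAQ: a 6 ns clock period.
CLOCK_PERIOD_S = 6e-9

flops = FP_ADDS + FP_MULTIPLIES + FP_RECIPROCALS
elapsed_s = CLOCK_PERIODS * CLOCK_PERIOD_S

mflops = flops / elapsed_s / 1e6
fp_share = flops / INSTRUCTIONS  # floating-point operations per instruction issued

print(f"floating-point operations: {flops}")
print(f"MFLOPS                   : {mflops:.2f}")
print(f"FP ops per instruction   : {fp_share:.2%}")
```

For this run roughly one instruction in a hundred is a floating-point operation, so the MFLOPS number describes almost none of the work the machine actually did.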
-
- LIPS, KLIPS, MLIPS: Logical "inferences" Per Second --
- from the Logic Programming community (Gabriel LISP benchmarks).
- Also available in Prolog (Evan Tick). LIPS roughly correspond to
- "calls per second" for very simple predicates.
-
- Packets Per Second: Unit of measure used by the networking, communications
- community. Sometimes useful.
- : What do they do to make packets consistent?
-
- MHz, GHz, Bits per Second, Bytes per Second, Words per Second:
- : Frequently used to mismeasure the performance of computer networks
- like Ethernet (tm). It confuses the baseband carrier frequency
- with the data transfer rate. It's not the truth, but not completely false.
- : Also sometimes called Null or Wait instructions.
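A related gap, between the nominal bit rate and what a user actually gets, shows up in simple framing arithmetic, and it also shows why packets per second needs a stated packet size. The sketch below assumes classic 10 Mbit/s Ethernet and the usual per-frame overheads (preamble, header, frame check, interframe gap); none of these figures come from this FAQ.

```python
LINE_RATE_BPS = 10_000_000  # assumed: classic 10 Mbit/s Ethernet

# Assumed per-frame overhead in bytes:
# 8 preamble/start delimiter + 14 header + 4 frame check + 12 interframe gap.
OVERHEAD_BYTES = 8 + 14 + 4 + 12


def frames_per_second(payload_bytes):
    """Maximum back-to-back frame rate for a given payload size."""
    bits_on_wire = (payload_bytes + OVERHEAD_BYTES) * 8
    return LINE_RATE_BPS / bits_on_wire


def payload_bps(payload_bytes):
    """Payload bits delivered per second at that maximum frame rate."""
    return frames_per_second(payload_bytes) * payload_bytes * 8


for size in (46, 1500):  # minimum and maximum payload sizes
    print(f"{size:5d}-byte payload: {frames_per_second(size):9.1f} packets/s, "
          f"{payload_bps(size) / 1e6:5.2f} Mbit/s of payload")
```

The same wire gives the highest packet rate with the smallest packets and the highest data rate with the largest, so either number alone can be made to look good.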
-
- TPS : Transactions per second, a metric agreed on by the Transaction
- Processing Performance Council (TPC).
- : What's a transaction?
-
- Stones : An arbitrary unit of computation based on the Whetstone
- (or Dhrystone or other *stone), which is subject to influences
- like compiler optimization or cache effects.
-
- Normalized metrics
-
- SPECmark: A normalized metric based on the performance against a
- DEC VAX-11/780. Based on a SPEC workload on a 780 under glass.
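A normalized metric of this kind is a ratio per program (reference time over measured time), combined across programs with a geometric mean. The sketch below shows only the arithmetic; the timings are hypothetical, and the real workload and reference times are defined by SPEC, not here.

```python
import math


def spec_ratio(reference_s, measured_s):
    """How many times faster than the reference machine one program ran."""
    return reference_s / measured_s


def geometric_mean(values):
    """Geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical timings in seconds: (reference machine, machine under test).
timings = [(100.0, 10.0), (200.0, 50.0), (400.0, 25.0)]

ratios = [spec_ratio(ref, measured) for ref, measured in timings]
mark = geometric_mean(ratios)

print("ratios:", ratios)
print(f"geometric mean: {mark:.2f}")
```

The geometric mean keeps one very fast program from dominating the result the way it would in an arithmetic mean, but the single number still hides the spread of the individual ratios.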
-
- Speed up:
-
- Efficiency:
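These two entries are left blank in this panel. Under the conventional definitions (an assumption here, not something the FAQ states), speedup is the one-processor time divided by the p-processor time, and efficiency is speedup divided by p. A minimal sketch with hypothetical timings:

```python
def speedup(t_one, t_p):
    """Ratio of the one-processor time to the p-processor time."""
    return t_one / t_p


def efficiency(t_one, t_p, processors):
    """Speedup per processor; 1.0 would be perfect scaling."""
    return speedup(t_one, t_p) / processors


# Hypothetical run: 100 s on one CPU, 20 s on eight CPUs.
s = speedup(100.0, 20.0)
e = efficiency(100.0, 20.0, 8)

print(f"speedup   : {s:.2f}")
print(f"efficiency: {e:.3f}")
```

Both numbers depend heavily on which one-processor time is used as the baseline (the best serial code, or the parallel code run on one CPU), so the baseline should always be stated.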
-
- Our problem isn't counting seconds (intervals or days); it's counting
- instructions, operations, and floating-point operations.
-
- Event counts like instructions or operations are best done by non-intrusive
- instruction/operation-counting hardware. This is expensive, to say the least.
- Software profilers/event counters are also sometimes useful, but they are
- subject to optimization.
-
- We need to distinguish "virtual" operations or instructions from
- real or actual instructions.
-
- Prefixes:
- kilo, mega, giga, tera, peta, exa,
- milli, micro, nano, pico, femto,
-
-
- Performance metrics are unlike conventional mathematics.
- You can't make mathematical inferences (excepting "guaranteed not to
- exceed numbers"), you can't apply all mathematical operators. The basis
- for metric theory is that for a metric space X and a metric function d()
- which maps elements of X to the real number system, then
- a)
- b)
- c) d(A + B) <= d(A) + d(B) [triangle inequality]
-
- You might have a benchmark sized for 128 elements. A program might not test
- well if it used 127 or 129 elements instead. It is not possible to
- infer or interpolate between values because of benchmarking "gotchas."
- This is especially bad when dealing with powers of two: an artifact
- of computer architecture, but sometimes also due to software (in a base-10
- world).
-
- Mathematics derives a large portion of its power because of assumptions of
- continuity. Computers are very discrete objects. What works for case n might
- not work for case n-1 or n+1 (vector architectures for instance).
- Some interesting things are learned by simply modifying the size of a benchmark
- by one (remember Kernighan and Plauger: beware off-by-1 errors).
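Changing the size by one is cheap to automate. The sketch below times a small dot-product kernel at n-1, n and n+1; the kernel and the function names are hypothetical stand-ins for a real benchmark. In an interpreted language the power-of-two effects described above will probably not show up, so treat this as the shape of the experiment rather than a demonstration of the effect.

```python
import timeit


def kernel(n):
    """Hypothetical stand-in for a benchmark kernel: a dot product of length n."""
    a = [float(i) for i in range(n)]
    b = [float(n - i) for i in range(n)]
    return sum(x * y for x, y in zip(a, b))


def time_sizes(center, repeats=5, loops=200):
    """Best-of-`repeats` seconds per call at center-1, center and center+1."""
    results = {}
    for n in (center - 1, center, center + 1):
        timer = timeit.Timer(lambda n=n: kernel(n))
        results[n] = min(timer.repeat(repeat=repeats, number=loops)) / loops
    return results


timings = time_sizes(128)
for n, seconds in timings.items():
    print(f"n = {n}: {seconds * 1e6:.2f} microseconds per call")
```

On a vector or cached machine, the same loop around a compiled kernel is where a jump at one particular size would be expected to appear.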
-
- Can you even be assured of consistent measures?
- Most benchmarks try to run their tests in standalone conditions to
- attain consistency. This is an artifact of not being able to have a
- non-intrusive measurement environment.
-
- Measurement issues:
- 1) Reproducibility: first and foremost. You must be able to reproduce
- performance.
- 2) Accuracy and precision. Tough because of human limits.
- 3) Resolution. Details sometimes count.
- 4) History (memory).
- 5)
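Reproducibility and resolution can both be checked before trusting any single number. This is a minimal sketch, assuming a software timer is all you have; the workload and function names are hypothetical, and a timer like this is itself intrusive.

```python
import statistics
import time


def measure(func, runs=10):
    """Time `func` several times and return every sample, in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Report min, median, max and the relative spread of the samples."""
    median = statistics.median(samples)
    low, high = min(samples), max(samples)
    spread = (high - low) / median if median > 0 else float("inf")
    return {"min": low, "median": median, "max": high, "spread": spread}


def workload():
    """Hypothetical stand-in for the code being measured."""
    return sum(range(100_000))


# Resolution of the clock being used, as reported by the runtime.
resolution = time.get_clock_info("perf_counter").resolution

samples = measure(workload)
summary = summarize(samples)

print(f"timer resolution: {resolution:.1e} s")
for name, value in summary.items():
    print(f"{name:7s}: {value:.6f}")
```

A large spread means the number is not yet reproducible, and a run time close to the timer resolution means the measurement is mostly noise.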
-
- Another important issue: measurement tools and environments
- What are some nice ones?
- Simple ones (non-standard) software
- Several: 'arch' name architecture,
- Cray: flotrace, hpm (hardware and software actually), others
- SGI/MIPS: gr_osview, ancillary: hinv (hardware inventory), pixie
- Convex: syspic,
- Obsolete ones: gprof, prof (the names may not vary, but the tools do;
- watch for name collisions)
-
- Other useful tools should be reported. Why? Because most people do
- not get reasonable experience with the various kinds of tools out there
- to understand their advantages, drawbacks, etc.
-
- Beware of the graphical tools. They can deceive you. All performance
- monitoring tools can deceive you. Use them carefully.
-
- Example of a good/useful tool from a 'Class A' measurement environment.
- Sample Cray Research, Inc. Hardware Performance Monitor output:
-
- hpm VERSION 1.3
-
- (c) COPYRIGHT CRAY RESEARCH, INC.
-
- UNPUBLISHED -- ALL RIGHTS RESERVED UNDER
- THE COPYRIGHT LAWS OF THE UNITED STATES
-
- STOP (called by EMPTY )
- CP: 0.001s, Wallclock: 0.038s, 0.2% of 8-CPU Machine
- HWM mem: 97679, HWM stack: 2048, Stack overflows: 0
- Group 0: CPU seconds : 0.00 CP executing : 197638
-
- Million inst/sec (MIPS) : 44.47 Instructions : 52730
- Avg. clock periods/inst : 3.75
- % CP holding issue : 42.57 CP holding issue : 84134
- Inst.buffer fetches/sec : 0.77M Inst.buf. fetches: 913
- Floating adds/sec : 0.21M F.P. adds : 246
- Floating multiplies/sec : 0.23M F.P. multiplies : 267
- Floating reciprocal/sec : 0.05M F.P. reciprocals : 54
- I/O mem. references/sec : 0.22M I/O references : 256
- CPU mem. references/sec : 14.58M CPU references : 17287
-
- Floating ops/CPU second : 0.48M
- STOP (called by EMPTY )
- CP: 0.001s, Wallclock: 0.002s, 4.2% of 8-CPU Machine
- HWM mem: 97679, HWM stack: 2048, Stack overflows: 0
-
- Group 1: CPU seconds : 0.00119 CP executing: 198071
-
- Hold issue condition % of all CPs actual # of CPs
- Waiting on semaphores : 0.14 284
- Waiting on shared registers : 0.00 0
- Waiting on A-registers/funct. units: 9.35 18520
- Waiting on S-registers/funct. units: 27.98 55418
- Waiting on V-registers : 1.35 2671
- Waiting on vector functional units : 0.00 9
- Waiting on scalar memory references: 0.56 1101
- Waiting on block memory references : 1.86 3685
- STOP (called by EMPTY )
- CP: 0.001s, Wallclock: 0.002s, 4.4% of 8-CPU Machine
- HWM mem: 97679, HWM stack: 2048, Stack overflows: 0
-
- Group 2: CPU seconds : 0.00121 CP executing : 201785
-
- Inst. buffer fetches/sec : 0.75M total fetches : 913
- fetch conflicts : 5265
- I/O memory refs/sec : 0.00M actual refs : 0
- avg conflict/ref 0.00: actual conflicts : 100
- Scalar memory refs/sec : 5.51M actual refs : 6668
- Block memory refs/sec : 8.77M actual refs : 10619
- CPU memory refs/sec : 14.28M actual refs : 17287
- avg conflict/ref 0.15: actual conflicts : 2668
- CPU memory writes/sec : 8.66M actual refs : 10479
- CPU memory reads/sec : 5.62M actual refs : 6808
- STOP (called by EMPTY )
- CP: 0.001s, Wallclock: 0.030s, 0.2% of 8-CPU Machine
- HWM mem: 97679, HWM stack: 2048, Stack overflows: 0
-
- Group 3: CPU seconds : 0.00119 CP executing: 198445
-
- (octal) type of instruction inst./CPUsec actual inst. % of all inst.
- (000-017)jump/special : 5.30M 6315 11.98
- (020-077)scalar functional unit : 33.24M 39578 75.07
- (100-137)scalar memory : 5.60M 6668 12.65
- (140-157,175)vector integer/log.: 0.01M 14 0.03
- (160-174)vector floating point : 0.00M 2 0.00
- (176-177)vector load and store : 0.12M 141 0.27
-
- type of operation ops/CPUsec actual ops avg. VL
- Vector integer&logical : 0.12M 138 9.86
- Vector floating point : 0.19M 232 116.00
- Scalar functional unit : 33.24M 39578
- =====
-
- In memoriam to Rear Adm. Grace Murray Hopper, for all the "nano seconds"
- and "pico seconds" she passed out (30 cm/1 ft copper wires or salt grains).
- She will be missed.
-
-               ^  A
-           s  / \  r
-          m  /   \  c
-         h  /     \  h
-        t  /       \  i
-       i  /         \  t
-      r  /           \  e
-     o  /             \  c
-    g  /               \  t
-   l  /                 \  u
-  A  /                   \  r
-    <_____________________>  e
-           Language
-
-