home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!think.com!yale.edu!ira.uka.de!uka!uka!news
- From: S_JUFFA@iravcl.ira.uka.de (|S| Norbert Juffa)
- Newsgroups: comp.sys.ibm.pc.hardware
- Subject: Performance comparison: Cyrix 486DLC, Intel RapidCAD, C&T 38600DX
- Date: 14 Dec 1992 13:30:29 GMT
- Organization: University of Karlsruhe (FRG) - Informatik Rechnerabt.
- Lines: 511
- Distribution: world
- Message-ID: <1gi29lINNm9p@iraul1.ira.uka.de>
- NNTP-Posting-Host: irav1.ira.uka.de
- X-News-Reader: VMS NEWS 1.23
-
- Performance Comparison Intel 386DX, Intel RapidCAD, C&T 38600DX, Cyrix 486DLC
-
- This document, containing descriptions and a benchmark comparison of the
- Intel 386DX, Intel RapidCAD, C&T 38600DX, and the Cyrix 486DLC has been
- put together for the benefit of the net.community. I believe, but cannot
- guarantee, that all data given herein is correct. If you would like to
- report errors in this document, or have suggestions for its enhancement,
- feel free to contact me at the following email address:
-
- S_JUFFA@IRAVCL.IRA.UKA.DE
-
- You can also write to me at my snail mail address:
-
- Norbert Juffa
- Wielandstr. 14
- 7500 Karlsruhe 1
- Germany
-
- Corrections pertaining to spelling or grammatical errors are also encouraged,
- especially from those net.users that are native speakers of (American) English.
-
-
- This is the second version of this document. Thanks to the people that helped
- improve it by their comments on the previous version:
-
- Alex Martelli (martelli@cadlab.sublink.org),
- Fred Dunlap (cyrix!fred@texsun.Central.Sun.COM)
-
- A special thanks for editing this article goes to:
-
- David Ruggiero (osiris@halcyon.halcyon.com)
-
-
- ---------------------------------------------------------------------------
-
- In 1992, three new CPUs were introduced into the PC marketplace. Each has the
- capability to replace the standard Intel 386DX CPU and provide higher
- performance for existing 386 systems. These new chips are the Intel RapidCAD,
- the Chips&Technologies (C&T) Super386 38600DX, and the Cyrix 486DLC. At the
- moment, all of these are only available in 33 MHz versions, but C&T and Cyrix
- have announced 40 MHz versions for delivery in early 1993. (Of course you could
- try to push existing 33 MHz chips to 40 MHz, but I do not recommend this, as my
- experience shows that operation at the higher frequency tends to be unreliable,
- with the possibility of the machine locking up unexpectedly.) While the Intel
- RapidCAD is marketed only as an end-user product (which means PC manufacturers
- will not ship systems with the RapidCAD already installed), both C&T and Cyrix
- sell their products directly to the end user as well as to PC motherboard
- manufacturers. Cyrix also manufactures the 486SLC, which is a replacement for
- the Intel/AMD 386SX CPUs. C&T has announced a 38600SX chip, but it will not
- ship before sometime in 1993, if at all.
-
- The Intel RapidCAD
- ------------------
-
- The Intel RapidCAD is, in rough terms, an Intel 486DX with the 8 kB on-chip
- cache removed and a 386 pinout. To software, it appears to be a 386DX with a
- math coprocessor, as the 486-specific instructions have been removed from its
- instruction set. It is marketed by Intel as "the ultimate coprocessor" and, as
- the name implies, its main purpose is to replace the 386DX in existing systems
- and thereby boost the performance of floating-point intensive applications such
- as CAD software, spreadsheets, and math packages (e.g. SPSS, Mathematica).
-
- RapidCAD is delivered as a set of two chips. RapidCAD-1 is a 132-pin PGA (pin
- grid array) chip that goes into the CPU socket and replaces the 386DX. It
- contains the CPU and the FPU (floating-point unit). RapidCAD-2 is a 68-pin PGA
- chip that goes into the 387 coprocessor (or EMC) socket; it contains a simple
- PLA whose only purpose it to generate the FERR signal for the motherboard
- logic: this provides full PC-compatible handling of floating point exceptions.
- Many CPU instructions execute in one clock cycle in the RapidCAD, just as on
- the Intel 486. The RapidCAD is constrained, however, by use of the standard
- 386DX bus interface, since every bus cycle takes two CPU clock cycles. This
- means that instructions may be executed by the RapidCAD faster than they can be
- fetched from memory. But as most FPU instructions take longer to execute than
- CPU instructions, they are not greatly slowed down by this slow bus interface,
- and most of them execute in the same time as in the Intel 486DX. Therefore, the
- Intel RapidCAD provides higher overall floating-point performance than any
- existing 386/387 combination. Results from the SPEC benchmarks, a common
- workstation UNIX benchmark suite, show that the RapidCAD provides 85% more
- floating point and 15% more integer performance than an Intel 386DX/387DX
- combination at the same clock frequency.
-
- Power consumption of the RapidCAD is typically 3500 mW at 33 MHz. The current
- street price of the 33 MHz version is ~300 US$.
-
- The C&T 38600DX
- ---------------
-
- The Chips&Technologies 38600DX is designed to be 100% compatible to the Intel
- 386DX CPU. Unlike AMD's Am386 CPUs (which use microcode that is identical to
- the Intel 386DX's microcode), the C&T 38600DX uses new microcode developed
- within C&T using "clean-room" techniques. C&T even included the 386DX's
- "undocumented" LOADALL386 instruction in the instruction set to provide full
- compatibility with the Intel chip.
-
- Some instructions execute faster on the 38600DX than on the 386DX. C&T also
- produces a 38605DX CPU that includes a 512-byte instruction cache and provides
- further performance increases over the Intel part. The 38605DX needs a bigger
- socket (144-pin PGA) and is therefore *not* pin-compatible with the 386DX.
-
- In my tests, I found that the 38600DX has severe problems with the CPU-
- coprocessor communication, which causes its floating-point performance to drop
- below the level provided by the Intel 386DX/Intel 387DX for most programs. This
- problem exists with all available 387 compatible coprocessors (ULSI 83C87, IIT
- 3C87, Cyrix EMC87, Cyrix 83D87, Cyrix 387+, C&T 38700, and the Intel 387DX). (A
- net.acquaintance also did tests with the 38600DX and arrived at similar
- results. He contacted C&T and they said they were aware of the problem.)
-
- Typical power consumption of the 40 MHz 38600DX is 1650 mW, which is less than
- the typical power consumption of the Intel 386DX at 33 MHz (2000 mW). The
- 38600DX currently sells for ~US$ 80 for the 33 MHz version.
-
- The Cyrix 486DLC
- ----------------
-
- The Cyrix 486DLC is the latest entry into the market of 386DX replacements. It
- features a 486SX compatible instruction set, a 1 kB on-chip cache, and a 16x16-
- bit hardware multiplier. The RISC-like execution unit of the 486DLC executes
- many instructions in a single clock cycle. The hardware multiplier multiplies
- 16-bit quantities in 3 clock cycles, as compared to 12-25 cycles on a standard
- Intel 386DX. This is especially useful in address calculations (code from non-
- optimizing compilers may contain many MUL instructions for array accesses) and
- for software floating-point arithmetic (e.g., a coprocessor emulator). The 1 kB
- cache helps the 486DLC to overcome the limitations of the 386 bus interface,
- although its hit rate averages only about 65% under normal program conditions.
-
- In existing 386 systems, DMA transfers (such as those performed by a SCSI
- controller or a sound board) may force the internal cache of the 486DLC to be
- flushed as the only means available in a 386 system to enforce consistency
- between the contents of the on-chip cache and external memory. This stems from
- the fact that the 386 bus interface was designed without provisions for an
- on-chip cache. This problem can reduce the performance of the 486DLC in systems
- that do a sizeable amount of (bus master) DMA. Cyrix has, however, defined some
- additional cache control signals for some of the 486DLC lines; they can be used
- to improve communications between the on-chip cache and a larger external cache
- or main memory and prevent this problem. Although current 386 systems ignore
- these signals (since they are not defined for the Intel 386DX), future systems
- designed with the Cyrix chip in mind may take advantage of them and thereby
- gain increased performance.
-
- The Cyrix cache is a unified data/instruction write-through type and can be
- configured as either a direct-mapped or 2-way set associative cache. It permits
- definition of up to four non-cacheable regions, which are particularly useful
- if a system has memory mapped peripherals (e.g., the Weitek math coprocessor).
- For compatibility reasons, the cache is disabled after a processor reset
- and must be enabled with the help of a small program provided by Cyrix. This
- prevents problems with BIOSes that are not "486DLC aware". (I am certain that
- future versions of the AMI BIOS and other BIOSes will take the 486DLC into
- consideration and directly support the Cyrix chip's cache.)
-
- The 486DLC will not work correctly with all math coprocessors in all
- circumstances, with protected mode multitasked environments (e.g. MS-Windows
- 386-enhanced mode) being especially critical. Using the 486DLC with the Cyrix
- EMC87, Cyrix 83D87 (chips produced prior to November 1991), and IIT 3C87, I
- have been able to completely lock up the machine due to synchronization
- problems between the CPU and the coprocessor while executing the FSAVE or
- FRSTOR instructions (which are used to save and restore the coprocessor status
- during task switches). According to Cyrix, this problem only occurs with the
- first revision of the 486DLC and is fixed on newer ones. To be on the safe
- side, the 486DLC should best be used with the Cyrix 387+ (its "Europe-only"
- name) or with the identical Cyrix 83D87 (US-bound chips manufactured after
- October 1991): these are not only are the highest performing 387 coprocessors
- on the market, but they also work properly even with the first generation
- 486DLC.
-
- If you already have a Cyrix 83D87 coprocessor and want to know whether it is
- the old or new type, I recommend you use my COMPTEST program, available as
- CTEST257.ZIP via anonymous ftp from garbo@uwasa.fi and other fine ftp servers.
- If COMPTEST reports a 387+, you either have the 387+ or the identical newer
- version of the 83D87 installed and can use any version of the Cyrix 486DLC
- without problems. If you believe that you may have problems with a 486DLC/387
- combination, I suggest you contact Cyrix technical support (1-800-FAS-MATH in
- the US).
-
- Power consumption of a 40 MHz 486DLC is typically 2800 mW. The 486DLC sells for
- ~115 US$ for the 33 MHz version, according to its German distributor.
-
- Tests and benchmarks
- --------------------
-
- HW configuration: 33.3/40 MHz motherboard with Forex chip set and AMI BIOS. 128
- kB zero- wait-state, direct mapped, write-through CPU cache with one write
- buffer, 4 bytes per cache line, and 4 clock cycles penalty for a cache line
- miss. 8 MB of main memory with an average access time of 1.6 wait states. Cyrix
- EMC87 in 387 compatibility mode as math coprocessor. (This and the Cyrix 83D87
- / 387+ are the fastest coprocessors available for use with the
- 386DX/486DLC/38600DX). Conner 3204F hard disk, 203 MB capacity, IDE interface
- (CORETEST throughput 1100 kB/s, seek time 16 ms). Diamond SpeedSTAR HiColor,
- ISA bus SVGA card using Tseng's ET4000 chip, 1 MB DRAM as frame buffer, *no*
- accelerator. The switches on the card were set for fastest reliable operation,
- with a throughput of 6500 bytes/ms at 40 MHz and 5400 bytes/ms at 33.3 MHz.
-
- SW configuration: MS-DOS 5.0, MS Windows 3.1, HyperDisk 4.32 disk cache program
- in write-back mode, using 2 MB of extended memory, 386MAX 6.01 used as memory
- manager and DPMI provider in some benchmarks. Latest Tseng (Colorview) driver
- for Windows 3.1 at 1024x768x256, using the 8514 fonts.
-
-
- For the Whetstone, Dhrystone, WINTACH, DODUC, LINPACK, LLL, and Savage
- benchmarks, *higher* numbers indicate *faster* performance.
-
- For the MAKE RTL, MAKE TRANCK, and String-Test benchmarks, *lower* numbers
- indicate *faster* performance.
-
-
- Intel C&T Intel Cyrix Cyrix
- 33.3 MHz 386DX 38600DX RapidCAD 486DLC 486DLC
- cache off cache on
- integer
-
- Whetstone [kWhets/s] 447 585 563 695 803
- Dhrystone (C) [Dhry./s] 11688 11819 12357 14150 15488
- Dhrystone (Pas) [Dhry./s]10455 10877 10751 12154 13858
- String-Test [ms] 459 453 441 347 327
- MAKE RTL [s] 51.32 47.10 46.34 43.45 39.13
- MAKE TRANCK [s] 62.42 55.47 55.37 53.64 46.12
- WINTACH [overall RPM] 4.85 4.90 5.49 5.53 6.14
-
- float
-
- DODUC [Rapidity index] 79.0 76.4 150.3 89.4 90.7
- LINPACK [MFLOPS] 0.2808 0.2707 0.4578 0.3158 0.3438
- LLL [MFLOPS] 0.3352 0.3537 0.6083 0.3816 0.4139
- Whetstone [kWhets/s] 2540 2340 3990 2908 3061
- Savage [function eval/s] 71685 53191 72464 88757 93897
-
-
- Intel C&T Intel Cyrix Cyrix
- 40.0 MHz 386DX 38600DX RapidCAD 486DLC 486DLC
- cache off cache on
- integer
-
- Whetstone [kWhets/s] 536 702 676 835 963
- Dhrystone (C) [Dhry./s] 14128 14116 14836 16987 18750
- Dhrystone (Pas) [Dhry./s]12490 13067 12890 14573 16624
- String-Test [ms] 384 377 368 289 273
- MAKE RTL [s] 43.46 40.11 39.84 37.25 33.54
- MAKE TRANCK [s] 53.00 47.59 47.07 45.36 39.00
- WINTACH [overall RPM] 5.65 5.73 6.41 6.46 7.23
-
- float
-
- DODUC [Rapidity index] 94.9 77.5 180.3 105.1 106.6
- LINPACK [MFLOPS] 0.3324 0.3260 0.5418 0.3789 0.4131
- LLL [MFLOPS] 0.4025 0.4204 0.7263 0.4562 0.4956
- Whetstone [kWhets/s] 3061 2632 4798 3505 3677
- Savage [function eval/s] 86083 49587 86957 106762 112360
-
-
- To complete the picture, I ran the CPU/FPU standard benchmarks on an Intel
- 486DX running at 33.3/40 MHz. Since the 486 machine was configured with a
- different hard disk than the 386 system, and the compilers and tools installed
- on the 386 machine were not present, the MAKE benchmarks could unfortunately
- not be included in the tests.
-
- 486DX, 256 kBytes CPU cache (write-thru), 8 MB of RAM, AMI-BIOS, Diamond
- SpeedSTAR HiColor (VIDSPEED thoughput: 6500 bytes/ms at 40 MHz and 5400
- bytes/ms at 33.3 MHz), MS-DOS 5.0, MS-Windows 3.1:
-
- integer 33.3 MHz 40 Mhz
-
- Whetstone [kWhets/s] 707 848
- Dhrystone (C) [Dhry./s] 19394 23265
- Dhrystone (Pas) [Dhry./s] 16978 20368
- String-Test [ms] 333 279
- WINTACH 8.59 10.14
-
- float
-
- DODUC [Rapidity index] 184.0 220.7
- LINPACK [MFLOPS] 0.6682 0.8204
- LLL [MFLOPS] 0.9387 1.1110
- Whetstone [kWhets/s] 5143 6195
- Savage [function eval/s] 82192 98522
-
-
- Conclusions
- -----------
-
- The Cyrix 486DLC is the 386DX replacement with the highest integer performance.
- With the internal cache enabled, integer performance of the 486DLC can be up to
- 80% higher compared with a Intel 386DX at the same clock frequency, with the
- average speed gain for integer applications being 35%. Enabling the internal
- cache provides about 5-15% more performance than with the cache disabled for
- both integer and floating-point applications. Floating-point applications are
- accelerated by about 15%-30% if the Cyrix 486DLC (with cache enabled) is used
- instead of the Intel 386DX. Compared with the Intel 486DX, the Cyrix 486DLC
- provides about 70% of the integer performance and about 50% of the floating
- point performance at the same clock frequency.
-
- The Intel RapidCAD is the 386DX replacement that provides the highest floating
- point performance. It can speed up most floating-point intensive programs by
- 60%-90% compared with the fastest Intel 386DX/math coprocessor combination; it
- provides nearly 75% of the floating-point performance of a Intel 486DX at the
- same clock frequency. Integer performance increases by an average 15% by using
- the RapidCAD instead of the standard Intel 386DX, with the maximum performance
- gain being 35%.
-
- The Chips&Technologies 38600DX has a slightly higher integer performance than
- the Intel 386DX, with the speedup ranging from 0%-30% and an average speedup on
- the order of 10%.
-
-
- Description of benchmarks
- -------------------------
- DHRYSTONE [9] is a synthetic benchmark developed by R. Weicker from Siemens in
- 1984. The frequency of operations and data types used by Dhrystone are modeled
- after statistics collected for 'typical' programs that are written in a HLL
- (high level language) such as C or Pascal that do not use floating-point
- arithmetic. Thus there is a certain distribution of global and local variables
- being used in procedures, there is a certain percentage of use of records
- (structs) or strings out of the total number of variables accessed and there is
- a certain percentage of procedure calls and if-statements out of the total
- lines of codes executed. All these percentages match the statistics used in the
- development of Dhrystone quite closely. To preserve the typical distributions,
- the measurement rules for Dhrystone forbid function inlining (a frequently used
- optimization technique optionally performed by most optimizing compilers). The
- current version of Dhrystone is 2.1, and this is the version used for my tests.
- Version 2.1 [10] differs from previous versions in that it contains additional
- code that prevents optimizing compilers from throwing out most of the code by
- dead code elimination (since Dhrystone has no input file, many expressions can
- be computed at compile time), thus artificially inflating performance numbers.
- The Dhrystone benchmark exists in equivalent Ada, Pascal and C versions, which
- are all available from netlib@ornl.gov. Dhrystone measures the time to execute
- its main loop and sets this time into relation with a fixed reference time, the
- result being a performance index given in "Dhrystones per second". I used two
- Dhrystone executables. One was compiled using the non-optimizing Turbo Pascal
- 6.0 compiler, the other one was compiled with the MS-C 7.0 compiler using the
- large memory model and maximum optimization but without the use of function
- inlining. Because the Turbo Pascal executable uses more memory operands and the
- MS-C executable uses more register-to-register operations, the speedup for the
- executables on different CPUs can be noticeably different.
-
- WINTACH is a public domain benchmark program by Texas Instruments that measures
- the speed of graphics output for four typical MS-Windows applications: a word
- processor, a spreadsheet, a CAD program, and a paint program. TI has collected
- profiles for each class of applications under 'typical' user loads. The four
- parts of the WINTACH benchmark were then modeled after these profiles, so that
- the percentage of certain GDI (graphics device interface) calls found in the
- profiles is also present in the benchmark. WINTACH determines the performance
- of each program part relative to the performance of a standard VGA card in a 20
- MHz 386DX machine. It also combines the four values into an overall relative
- performance index, which is reported in my tables. Using this single number to
- represent graphics performance is justified in this case since the four partial
- results do *not* deviate much from the average for the SVGA tested (a Diamond
- SpeedSTAR HiColor). I used a frame buffer card for the test because for these
- graphics card type, graphics performance depends solely on the performance of
- the CPU, which handles all accesses to the display memory. On an accelerated
- card, some of the low-level operations are delegated to the accelerator chip
- and the influence of the CPU speed on the graphics performance is not seen so
- clearly. For this test, I used the latest Tseng Windows 3.1 driver (Colorview)
- at a resolution of 1024x768x256 with the large 8514 fonts (WINTACH code C8).
-
- MAKE RTL is not a single program. Rather it is a build of a complete project,
- my own run-time library for Turbo Pascal 6.0. The project consists of about 200
- source files with a total size of ~650 kByte. Most of the source files are
- assembly language (.ASM) files, while there are also some Pascal (.PAS) files.
- Building the complete project using MAKE, about 200 binary files (.OBJ and
- .TPU) are produced with a combined size of ~300 kB. The programs used during
- the build are MAKE 3.5, TPC 6.01, and TASM 2.01, all by Borland, Inc. All files
- are read from and written to the hard disk, using a 2 MB write back disk cache
- provided by the HyperDisk 4.32 program. This eliminates most of the I/O
- overhead. No memory manager was installed during the test. The time reported is
- the time from starting the MAKE utility to the reappearance of the DOS prompt.
- MAKE RTL can be thought of being typical of applications that operate on a lot
- of small files, and use only integer instructions.
-
- MAKE TRANCK is a project build for a project consisting of two assembly
- language modules, two C modules and one FORTRAN module. The combined size of
- the source files is approximately 120 kBytes. All modules are compiled
- (assembled) and combined into a single program linking with both, the C and the
- FORTRAN libraries. Programs used are MAKE 3.5 by Borland and MASM 6.0, MS-C
- 7.0, MS-FORTRAN 5.0, and LINK 5.3, all by Microsoft, Inc. The C compiler and
- the linker run in protected mode and require a DPMI host to be present. 386MAX
- 6.01 by Qualitas was installed for this purpose. To minimize I/O overhead, the
- HyperDisk 4.32 disk caching program was installed, with 2 MByte of extended
- memory allocated to the cache. The modules are compiled with compiler switches
- set for maximum optimization, and most of the time for the project build is
- spent in the optimizing stages of the compilers.
-
- STRING-TEST is a simple benchmark that tests the string handling functions
- built into the Turbo Pascal language. Nearly all of the execution time is spent
- in repeated execution of the STOS, CMPS, SCAS, and MOVS instructions. The data
- and code fit into about five kBytes of memory. In cached systems, almost all
- memory accesses will be cache hits. The program was written and compiled with
- Turbo Pascal 6.0 and linked with my own run-time library, which provides much
- faster string functions than the run-time library delivered by Borland.
- Compiler switches were set for fastest execution. The time reported is the time
- to complete the whole benchmark in milliseconds, with an accuracy of +/-1
- millisecond.
-
- LLL is short for Lawrence Livermore Loops [8], a set of kernels taken from real
- floating point extensive programs. Some of these loops are vectorizable, but
- since we don't deal with vector processors here, this doesn't matter. For this
- test, LLL was adapted from the FORTRAN original in [7] to Turbo Pascal. By
- variable overlaying (similar to FORTRAN's EQUIVALENCE statement) memory
- allocation for data was reduced to 64 kB, so all data fits into a single 64 kB
- segment. The older version of LLL is used here which contains 14 loops. There
- also exists a newer, more elaborate version consisting of 24 kernels. The
- kernels in LLL exercise only multiplication and addition. The MFLOPS rate
- reported is the average of the MFLOPS rate of all 14 kernels as reported by the
- LLL program. LLL and Whetstone results (see below) are reported as returned by
- my COMPTEST test program in which they have been included as a measure of
- coprocessor/FPU performance. COMPTEST has been compiled under Turbo Pascal 6.0
- with all 'optimizations' on and using my own run-time library, which gives
- higher performance than the one included with TP 6.0. My library is available
- as TPL60N17.ZIP from garbo.uwasa.fi and ftp-sites that mirror this site. All
- floating point variables in the program where declared as DOUBLE.
-
- LINPACK [4] is a well known floating-point benchmark that also heavily
- exercises the memory system. Linpack operates on large matrices and takes up
- about 570 kB in the version used for this test. This is about the largest
- program size a pure DOS system can accommodate. Linpack was originally designed
- to estimate performance of BLAS, a library of FORTRAN subroutines that handles
- various vector and matrix operations. Note that vendors are free to supply
- optimized (e.g. assembly language versions) of BLAS. Linpack uses two routines
- from BLAS which are thought to be typical of the matrix operations used by
- BLAS. Both routines only use addition/subtraction and multiplication. The
- FORTRAN source code for Linpack can be obtained from the automated mail server
- netlib@ornl.gov. Linpack was compiled using MS FORTRAN 5.0 in the HUGE memory
- model (which can handle data structures larger than 64 kB) and with compiler
- switches set for maximum optimization. Linpack performs the same test
- repeatedly. The number reported is the maximum MFLOPS rate returned by Linpack.
- All floating point variables in the program were declared as DOUBLE. Linpack
- MFLOPS ratings for a great number of machines are contained in [5]. This
- PostScript document is also available from netlib@ornl.gov.
-
- WHETSTONE [1,2,3] is a synthetic benchmark based on statistics collected about
- the use of certain control and data structures in programs written in high
- level languages. Based on these statistics, Whetstone tries to mirror a
- 'typical' HLL program. Whetstone performance is expressed by how many
- theoretical 'whetstone' instructions are executed per second. It was originally
- implemented in ALGOL. Unlike LLL and Linpack, Whetstone not only uses addition
- and multiplication but exercises all basic arithmetic operations as well as
- some transcendental functions. Whetstone performance depends on the speed of
- the coprocessor as well as on the speed of the CPU, while LLL and Linpack place
- a heavier burden on the coprocessor/FPU. There exist an old and a new version
- of Whetstone. Note that results from the two versions can differ by as much as
- 20% for the same test configuration. For this test, the new version in Pascal
- from [2] was used. It was compiled with Turbo Pascal 6.0 and my own library
- (see above) with all 'optimizations' on. For the integer test, software fp
- arithmetic using the REAL type was utilized. Using the software arithmetic
- exercises only the CPU, in particular the execution of MOV, SHL, SHR, RCR,
- ADD, SUB, MUL, and DIV instructions. For the floating point test, the hardware
- floating-point arithmetic of the coprocessor was used and computations were
- performed using the DOUBLE type.
-
- SAVAGE tests the performance of transcendental function evaluation. It is
- basically a small loop in which the sin, cos, arctan, ln, exp, and sqrt
- functions are combined in a single expression. While sin, cos, arctan, and sqrt
- can be evaluated directly with a single 387 coprocessor instruction each, ln
- and exp need additional preprocessing for argument reduction and result
- conversion. According to [11], the Savage benchmark was devised by Bill Savage,
- and is distributed by: The Wohl Engine Company, Ltd., 8200 Shore Front Parkway,
- Rockaway Beach, NY 11693, USA. Usually, Savage is programmed to make 250,000
- passes though the loop. Here only 10,000 loops are executed for a total of
- 60,000 transcendental function evaluations. The result is expressed in function
- evaluations per second. SAVAGE source code was taken from [6] and compiled with
- Turbo Pascal 6.0 and my own run-time library (see above).
-
- DODUC [12] is a modified application program by Nhuan Doduc that is also part
- of the SPEC benchmark suite. It is a nuclear safety analysis code that
- simulates the time evolution of a thermohydraulic modelization for a nuclear
- reactor's component and the benchmark is created from the computational kernel
- of the original application. The benchmark consist of 5323 FORTRAN statement
- lines and the executable size is about 350 kBytes. Almost all processing is
- done using double precision floating-point values. There is not much array
- processing. Instead the program has an iterative structure with an abundance of
- short branches and small loops. The version used for this test was compiled
- with the highly optimizing NDP FORTRAN V3.0 compiler from Microway, which
- generates a 32-bit mode executable that runs in protected mode using a
- protected mode loader provided by Microway. The execution time of the DODUC
- program is set into relation with fixed reference times and the result is
- scaled to give a "Rapidity index".
-
-
- References
- ----------
-
- [1] Curnow, H.J.; Wichmann, B.A.: A synthetic benchmark.
- Computer Journal, Vol. 19, No. 1, 1976, pp. 43-49
- [2] Wichmann, B.A.: Validation code for the Whetstone benchmark.
- NPL Report DITC 107/88, National Physics Laboratory, UK, March 1988
- [3] Curnow, H.J.: Wither Whetstone? The Synthetic Benchmark after 15 Years.
- In: Aad van der Steen (ed.): Evaluating Supercomputers.
- London: Chapman and Hall 1990
- [4] Dongarra, J.J.: The Linpack Benchmark: An Explanation.
- In: Aad van der Steen (ed.): Evaluating Supercomputers.
- London: Chapman and Hall 1990
- [5] Dongarra, J.J.: Performance of Various Computers Using Standard Linear
- Equations Software.
- Report CS-89-85, Computer Science Department, University of Tennessee,
- March 11, 1992
- [6] Huth, N.: Dichtung und Wahrheit oder Datenblatt und Test.
- Design & Elektronik 1990, Heft 13, Seiten 105-110
- [7] Esser, R.; Kremer, F.; Schmidt, W.G.: Testrechnungen auf der IBM 3090E mit
- Vektoreinrichtung.
- Arbeitsbericht RRZK-8803, Regionales Rechenzentrum an der Universit"at zu
- K"oln, Februar 1988
- [8] McMahon, H.H.: The Livermore Fortran Kernels: A test of the numerical
- performance range.
- Technical Report UCRL-53745, Lawrence Livermore National
- Laboratory, USA, December 1986
- [9] Weicker, R.P.: Dhrystone: A Synthetic Systems Programming Benchmark.
- Communications of the ACM, Vol. 27, No. 10, October 1984, pp. 1013-1030
- [10] Weicker, R.P.: Dhrystone Benchmark: Rationale for Version 2 and
- Measurement Rules.
- SIGPLAN Notices. Vol. 23, No. 8, August 1988, pp. 49-62
- [11] FasMath 83D87 Benchmark Report. Cyrix Corporation, June 1990
- Order No. B2004
- [12] Doduc, Nhuan: Fortran Execution Time Benchmark.
- Unpublished manuscript, version 45, available from ndoduc@framentec.fr
-