- Newsgroups: comp.sys.ibm.pc.hardware
- Path: sparky!uunet!paladin.american.edu!europa.asd.contel.com!darwin.sura.net!jvnc.net!yale.edu!ira.uka.de!rz.uni-karlsruhe.de!usenet
- From: S_JUFFA@iravcl.ira.uka.de (|S| Norbert Juffa)
- Subject: What you always wanted to know about math coprocessors 3/4
- Message-ID: <1992Sep15.162827.11073@rz.uni-karlsruhe.de>
- Sender: usenet@rz.uni-karlsruhe.de (USENET News System)
- Organization: University of Karlsruhe (FRG) - Informatik Rechnerabt.
- Date: Tue, 15 Sep 1992 16:28:27 GMT
- X-News-Reader: VMS NEWS 1.23
- Lines: 933
-
- Whetstone [2,3,4] is a synthetic benchmark based upon statistics
- collected about the use of certain control and data structures
- in programs written in high level languages. Based on these
- statistics, Whetstone tries to mirror a 'typical' HLL program.
- Whetstone performance is expressed by how many theoretical
- 'whetstone' instructions are executed per second. It was
- originally implemented in ALGOL. Unlike PEAKFLOP, LLL, and
- Linpack, Whetstone not only uses addition and multiplication
- but exercises all basic arithmetic operations as well as some
- transcendental functions. Whetstone performance depends on the
- speed of the coprocessor as well as on the speed of the CPU,
- while PEAKFLOP, LLL, and Linpack place a heavier burden on the
- coprocessor/FPU. There exist an old and a new version of
- Whetstone. Note that results from the two versions can differ
- by as much as 20% for the same test configuration. For this
- test, the new version in Pascal from [3] was used. It was
- compiled with Turbo Pascal 6.0 and my own library (see above)
- with all 'optimizations' on.
-
- SAVAGE tests the performance of transcendental function
- evaluation. It is basically a small loop in which the sin,
- cos, arctan, ln, exp, and sqrt functions are combined in a
- single expression. While sin, cos, arctan, and sqrt can be
- evaluated directly with a single 387 coprocessor instruction
- each, ln and exp need additional preprocessing for argument
- reduction and result conversion. According to [14], the Savage
- benchmark was devised by Bill Savage, and is distributed by:
- The Wohl Engine Company, Ltd., 8200 Shore Front Parkway,
- Rockaway Beach, NY 11693, USA. Usually, Savage is programmed
- to make 250,000 passes through the loop. Here only 10,000 loops
- are executed for a total of 60,000 transcendental function
- evaluations. The result is expressed in function evaluations
- per second. SAVAGE source code was taken from [7] and compiled
- with Turbo Pascal 6.0 and my own run-time library (see above).
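As a rough illustration of the kind of loop SAVAGE times, here is a minimal Python sketch. It assumes the commonly quoted form of the Savage expression, a = tan(arctan(exp(ln(sqrt(a*a))))) + 1, with tan written as sin/cos so that each pass makes six function evaluations. The source from [7] is not reproduced in this posting, so details may differ; the name `savage` and its parameter are illustrative only.

```python
import math
import time


def savage(passes=10_000):
    """Run a Savage-style loop; return the final value and the call count."""
    a = 1.0
    evaluations = 0
    for _ in range(passes):
        # Six evaluations per pass: sqrt, ln, exp, arctan, sin, cos.
        # tan(x) is written as sin(x)/cos(x), as a Pascal program
        # without a built-in tan function would have to do.
        x = math.atan(math.exp(math.log(math.sqrt(a * a))))
        a = math.sin(x) / math.cos(x) + 1.0
        evaluations += 6
    return a, evaluations


if __name__ == "__main__":
    start = time.perf_counter()
    result, count = savage()
    elapsed = time.perf_counter() - start
    # With exact arithmetic the result would be passes + 1; the deviation
    # gives a rough impression of the accumulated rounding error.
    print(f"result = {result:.10f} (ideal: 10001)")
    print(f"{count / elapsed:.0f} function evaluations per second")
```

With 10,000 passes this performs the 60,000 evaluations mentioned above. The difference between the final value and 10001 is a side benefit of the benchmark: it hints at the accuracy of the transcendental functions used.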
-
-
- Benchmark results for 387 coprocessors, coprocessor emulators and
- the Intel RapidCAD and Intel 486 CPUs.
-
-
- 40 MHz          PEAKFLOP  TRNSFORM     LLL  Linpack  Whetstone    Savage
-                   MFLOPS    MFLOPS  MFLOPS   MFLOPS  kWhet/sec  Func/sec
-
- 386, EM87         0.0084    0.0080  0.0060   0.0060         31       502  ##
- 386, Franke387    0.0369    0.0295  0.0233   0.0215        164      4002  $$
- 386, TP 6 Emu     0.0316    0.0273  0.0200   0.0190        160      3794  %%
- Intel 387DX       0.9204    0.7212  0.3932   0.3211       2428     52677
- ULSI 83C87        1.2093    0.7936  0.3890   0.3120       2528     56926
- IIT 3C87          1.0196    0.7145  0.3834   0.3179       2663     58766
- IIT 3C87,4x4      1.0196    1.7244  0.3834   0.3179       2663     58766  ??
- C&T 38700         1.0722    0.7908  0.4007   0.3222       2837     74906
- Cyrix 387+        1.1305    0.8162  0.3945   0.3208       2946     80322
- Intel RapidCAD    2.2128    1.8931  0.7377   0.5432       4810     86957
- Intel 486         2.4762    2.1335  1.1110   0.8204       6195     98522
-
-
- 33.3 MHz        PEAKFLOP  TRNSFORM     LLL  Linpack  Whetstone    Savage
-                   MFLOPS    MFLOPS  MFLOPS   MFLOPS  kWhet/sec  Func/sec
-
- 386, EM87         0.0070    0.0040  0.0050   0.0050         26       418  ##
- Franke387         0.0307    0.0246  0.0194   0.0179        137      3335  $$
- 386, TP 6 Emu     0.0263    0.0227  0.0167   0.0158        133      3160  %%
- Intel 387DX       0.7647    0.6004  0.3283   0.2676       2046     43860
- ULSI 83C87        1.0097    0.6609  0.3239   0.2598       2089     47431
- IIT 3C87          0.8455    0.5957  0.3198   0.2646       2203     49020
- IIT 3C87,4X4      0.8455    1.4334  0.3198   0.2646       2203     49020  ??
- C&T 38700         0.9455    0.6907  0.3338   0.2700       2376     62565
- Cyrix 387+        0.9286    0.6806  0.3293   0.2669       2435     66890
- Cyrix 83D87        1.013       N/A   0.333    0.273       2550       N/A
- Intel RapidCAD    1.8572    1.5798  0.6072   0.4533       3953     72464
- Intel 486         2.0800    1.7779  0.9387   0.6682       5143     82192
-
-
- For comparison:
-
-                 PEAKFLOP  TRNSFORM     LLL  Linpack  Whetstone    Savage
-                   MFLOPS    MFLOPS  MFLOPS   MFLOPS  kWhet/sec  Func/sec
-
- i486DX2-66        4.1601    3.4227  1.6531   1.3010      10655    163934
- i486DX2-50        3.0589    2.6665  1.2537   0.9744       7962    123203
- i387, 20 MHz      0.2253    0.3271  0.1434   0.1171        952     21739  ++
- i387DX, 20 MHz    0.3567    0.4444  0.1484   0.1161       1034     24155  &&
- i80287, 5 MHz     0.0281    0.0310  0.0242   0.0222        150      3261  !!
- i8087,9.54 MHz    0.0636    0.0705  0.0321   0.0219        234      5782  **
-
- HW configuration for test of 387 coprocessors and Intel RapidCAD:
- System A: Motherboard with Forex chip set, 128 kB CPU Cache, 8 MB RAM
-
- HW configuration for test of 486 FPU (extra fan for 40 MHz operation):
- System B: Motherboard with SIS chip set, 256 kB CPU Cache, 8 MB RAM
-
- ## EM87 V1.2 by Ron Kimball is a public domain coprocessor emulator
- that loads as a TSR. On 80286 and 80386 systems without a
- coprocessor, it hooks the INT 7 trap that the CPU raises when it
- encounters a coprocessor instruction and emulates the instruction.
- Whetstone and Savage benchmarks for this test were compiled
- with the original TP 6.0 library, as EM87 chokes on the 387
- specific FSIN and FCOS instructions used in my own library if
- a 387 is detected. Apparently EM87 identifies itself as a 387,
- but has no support for 387 specific instructions.
- $$ Franke387 is a commercial 387 emulator that is also available in
- a shareware version. For this test, shareware version V2.4 was
- used. Unlike many other emulators, Franke387 supports all 387
- instructions. It is loaded as a device driver and uses INT 7
- to trap coprocessor instructions.
- %% These benchmarks were run using the built-in coprocessor emulators
- of the TP 6.0 and the MS FORTRAN 5.0 run-time libraries.
- ?? The 3C87 specific F4X4 instruction was used in the vector
- transformation benchmark.
- ++ Older motherboard with no chip set (discrete logic), no CPU cache,
- 16 MB RAM
- && System A, CPU cache disabled via extended set-up, turbo-switch
- set to half speed (that is, 20 MHz)
- !! 80386 @ 20 MHz / Intel 80287 @ 5 MHz, no CPU cache, 4 MB RAM.
- Due to the fast CPU used here, performance figures are somewhat
- higher than can be expected for an 80286/287 combination, except
- for the PEAKFLOP benchmark, which is basically coprocessor limited.
- ** 8086/8087 system with 640 kB RAM
-
-
- Since neither a Weitek coprocessor nor a compiler that generates
- code for the Weitek chips was available, performance data for
- the Weitek Abacus is given here according to [31,32] and scaled to
- show performance of a 33 MHz system. The benchmarks were compiled
- using highly optimizing 32-bit compilers.
-
-                       Single Prec.    Double Prec.    Double Prec.
-
-                         3167    4167    3167    4167     387     486
-
- Linpack   MFLOPS         1.8     5.0     0.8     3.2     0.4     1.6
- Whetstone kWhet/sec     7470   22700    4900   14000    3290   12300
-
- Note that for the Intel coprocessors, running programs in single
- rather than double precision doesn't provide much of a performance
- advantage, since all internal calculations are always done in
- extended precision. With Weitek coprocessors, however, performance
- nearly doubles when switching from double to single precision. For
- double precision calculations using only basic arithmetic, the
- Weitek Abacus can provide at most twice the performance of the
- respective Intel coprocessor (387/486) clocked at the same speed.
-
-
- Speed of various coprocessor instructions in clock cycles, as
- measured with my program 87TIMES. Error is +/- one clock cycle,
- except for the Intel 80287. Times for the 80287 were determined on
- a system with a 20 MHz 80386 and a 5 MHz Intel 80287. Therefore,
- times may differ from a genuine 80286/287 system, especially for
- those instructions that access an operand in memory. The times
- are stated in coprocessor clock cycles, and the faster 386 executes
- four clock cycles for each clock cycle of the 80287, so memory
- accesses may appear shorter to the coprocessor than they would in
- a genuine 80286/287 system.
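To relate the cycle counts in these tables to wall-clock time, a cycle count can simply be divided by the clock frequency in MHz, which gives microseconds. The small Python sketch below does this for two FSQRT (1.0) entries from the timing tables in this posting. The helper name `cycles_to_microseconds` is illustrative, and the 387DX is assumed to run at the 33.3 MHz used for one of the benchmark series.

```python
def cycles_to_microseconds(cycles, clock_mhz):
    """Convert a cycle count to microseconds for a given clock in MHz."""
    return cycles / clock_mhz


# FSQRT (1.0) timings taken from the tables in this posting.
fsqrt_387dx = cycles_to_microseconds(109, 33.3)  # Intel 387DX, 33.3 MHz assumed
fsqrt_80287 = cycles_to_microseconds(206, 5.0)   # Intel 80287 at 5 MHz

# CPU clock cycles that elapse during one coprocessor clock cycle in the
# 20 MHz 80386 / 5 MHz 80287 test system.
cpu_cycles_per_fpu_cycle = 20.0 / 5.0

print(f"387DX: {fsqrt_387dx:.2f} us, 80287: {fsqrt_80287:.2f} us")
print(f"CPU cycles per 80287 cycle: {cpu_cycles_per_fpu_cycle:.0f}")
```

The last figure is the reason for the caveat above: while the 80287 advances by one clock, the 386 can complete four clocks of bus activity, so operand fetches cost fewer coprocessor cycles than they would next to a 286.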
-
-
- Intel Intel Cyrix Cyrix C&T ULSI IIT Intel Intel
- i486 RapidCAD 83D87 387+ 38700 83C87 3C87 387DX 80387
-
- FLD1 4 3 14 14 14 18 24 23 26
- FLDZ 4 3 14 14 14 18 24 23 31
- FLDPI 7 8 14 15 14 18 24 38 45
- FLDLG2 7 8 14 14 14 18 24 33 45
- FLDL2T 7 8 14 14 14 19 24 38 45
- FLDL2E 7 8 14 14 14 19 24 38 45
- FLDLN2 7 8 14 14 14 19 24 38 45
- FLD ST(0) 4 4 14 14 14 14 24 20 21
- FST ST(1) 3 4 14 14 14 14 19 18 22
- FSTP ST(0) 4 4 14 14 14 15 19 19 22
- FSTP ST(1) 4 4 15 15 14 15 19 20 22
- FLD ST(1) 4 4 14 14 14 14 24 18 21
- FXCH ST(1) 4 4 14 20 14 19 24 24 27
- FILD [Word] 12 16 33 37 32 42 38 47 62
- FILD [DWord] 8 11 26 26 21 32 28 35 45
- FILD [QWord] 9 15 30 30 25 36 32 34 54
- FLD [DWord] 3 5 26 26 21 23 28 20 25
- FLD [QWord] 3 7 30 30 25 27 32 24 35
- FLD [TByte] 5 11 46 46 46 46 47 46 57
- FBLD [TByte] 83 90 66 86 106 146 197 71 278
- FIST [Word] 31 31 37 40 37 42 51 69 90
- FIST [DWord] 29 30 35 40 35 40 49 66 84
- FST [DWord] 7 7 35 37 32 40 33 37 40
- FST [QWord] 8 9 43 43 39 47 40 45 51
- FISTP [Word] 32 32 42 40 37 43 46 70 90
- FISTP [DWord] 31 31 40 40 35 41 50 67 87
- FISTP [QWord] 29 29 44 44 42 48 56 73 92
- FSTP [DWord] 8 8 38 36 32 41 35 38 43
- FSTP [QWord] 9 9 46 43 39 48 42 46 49
- FSTP [TByte] 8 8 50 45 49 50 48 53 58
- FBSTP [TByte] 170 172 98 98 114 129 218 144 533
- FINIT 17 31 15 16 15 15 16 16 25
- FCLEX 7 20 15 16 16 16 16 16 25
- FCHS 7 8 14 15 14 14 19 30 33
- FABS 5 5 14 15 14 14 19 30 33
- FXAM 12 13 14 15 14 14 19 39 43
- FTST 5 5 19 25 14 24 24 34 38
- FSTENV 67 82 125 125 124 132 124 159 165
- FLDENV 44 59 106 106 112 120 106 119 129
- FSAVE 181 169 355 355 374 361 376 469 511
- FRSTOR 130 203 358 358 385 372 371 420 456
- FSTSW [mem] 4 5 14 14 14 14 14 14 17
- FSTSW AX 3 4 12 12 11 11 11 11 14
- FSTCW [mem] 4 5 14 14 13 13 13 14 18
- FLDCW [mem] 4 11 26 26 31 32 27 32 36
- FADD ST,ST(0) 8 9 19 20 19 19 24 24 32
- FADD ST,ST(1) 9 9 19 20 19 18 24 20 32
- FADD ST(1),ST 10 10 19 20 19 18 24 24 37
- FADDP ST(1),ST 11 11 19 19 19 16 24 25 37
- FADD [DWord] 9 10 25 28 22 23 23 21 34
- FADD [QWord] 9 10 32 32 26 27 27 25 38
- FIADD [Word] 20 21 34 34 33 40 40 52 80
- FIADD [DWord] 20 21 27 28 27 30 30 37 61
- FSUB ST(1),ST 10 10 19 20 19 19 24 24 38
- FSUBR ST(1),ST 9 10 19 22 19 19 24 27 38
- FSUBRP ST(1),ST 10 10 19 19 22 20 24 25 38
- FSUB [DWord] 11 12 27 28 27 23 29 27 32
- FSUB [QWord] 11 12 32 32 31 27 33 26 44
- FISUB [Word] 21 21 34 34 34 40 40 52 80
- FISUB [DWord] 21 22 27 28 27 29 30 40 60
- FMUL ST,ST(1) 16 17 19 25 24 24 29 38 57
- FMUL ST(1),ST 16 17 19 24 24 24 29 40 62
- FMULP ST(1),ST 17 17 19 24 24 25 29 40 58
- FIMUL [Word] 22 23 40 40 37 46 46 52 80
- FIMUL [DWord] 22 23 27 28 27 36 35 45 68
- FMUL [DWord] 11 12 27 28 27 28 29 25 45
- FMUL [QWord] 14 15 32 32 31 32 33 37 61
- FDIV ST,ST(0) 73 74 26 40 59 54 54 89 100
- FDIV ST,ST(1) 73 74 36 45 59 54 54 77 100
- FDIV ST(1),ST 73 74 36 45 59 55 54 78 102
- FDIVR ST(1),ST 73 74 36 45 59 54 54 77 102
- FDIVRP ST(1),ST 73 74 36 44 59 55 54 76 106
- FIDIV [Word] 84 85 52 58 75 76 76 105 141
- FIDIV [DWord] 84 85 45 46 65 65 65 101 123
- FDIV [DWord] 73 74 45 46 63 56 59 77 101
- FDIV [QWord] 73 74 50 50 67 60 63 78 103
- FSQRT (0.0) 25 25 19 19 14 19 24 29 37
- FSQRT (1.0) 83 84 36 74 54 89 59 109 132
- FSQRT (L2T) 86 87 36 74 54 89 59 104 137
- FXTRACT (L2T) 17 17 19 19 19 28 79 53 72
- FSCALE (PI,5) 30 30 36 24 24 49 79 59 82
- FRNDINT (PI) 31 31 19 29 24 34 29 49 82
- FPREM (99,PI) 58 59 54 99 44 54 49 79 96
- FPREM1(99,PI) 90 91 54 99 44 59 54 104 121
- FCOM 5 6 15 20 19 25 19 29 32
- FCOMP 6 6 15 19 19 25 19 30 33
- FCOMPP 7 7 15 19 19 25 19 31 40
- FICOM [Word] 16 17 34 34 33 46 34 58 76
- FICOM [DWord] 16 16 21 28 21 35 23 45 57
- FCOM [DWord] 5 6 21 28 22 23 23 27 34
- FCOM [QWord] 5 8 27 32 25 27 27 31 39
- FSIN (0.0) 24 24 14 99 14 19 24 39 43
- FSIN (1.0) 310 313 114 164 144 494 219 509 596
- FSIN (PI) 88 89 118 189 64 64 214 134 152
- FSIN (LG2) 292 295 72 89 139 454 184 449 531
- FSIN (L2T) 299 302 123 179 164 469 214 454 536
- FCOS (0.0) 24 24 19 159 14 19 24 34 42
- FCOS (1.0) 302 305 84 104 139 489 214 459 547
- FCOS (PI) 88 89 154 254 64 64 224 199 232
- FCOS (LG2) 300 303 108 149 139 454 194 504 583
- FCOS (L2T) 307 310 159 239 164 469 224 509 601
- FSINCOS (0.0) 25 25 14 19 19 18 34 38 55
- FSINCOS (1.0) 353 356 124 174 254 493 419 538 636
- FSINCOS (PI) 105 106 162 263 79 68 424 228 277
- FSINCOS (LG2) 340 343 119 159 249 458 359 533 627
- FSINCOS (L2T) 347 350 168 248 274 473 424 538 646
- FPTAN (0.0) 25 25 14 19 19 18 29 38 46
- FPTAN (1.0) 266 269 119 149 184 538 309 323 396
- FPTAN (PI) 145 146 134 228 104 108 304 168 211
- FPTAN (LG2) 244 246 94 129 179 498 274 298 363
- FPTAN (L2T) 247 249 139 219 204 513 304 298 365
- FPATAN (0.0) 38 39 19 24 19 20 29 95 93
- FPATAN (1.0) 294 298 124 159 29 375 604 360 433
- FPATAN (PI) 304 308 139 188 279 360 424 375 472
- FPATAN (LG2) 290 293 128 154 269 365 379 375 448
- FPATAN (L2T) 304 308 144 189 274 359 424 375 468
- F2XM1 (0.0) 25 25 14 14 14 19 24 34 37
- F2XM1 (LN2) 209 211 89 119 169 394 284 299 348
- F2XM1 (LG2) 204 206 78 104 159 379 284 294 337
- FYL2X (1.0) 60 61 36 39 24 75 94 115 127
- FYL2X (PI) 294 297 108 163 249 450 359 395 504
- FYL2X (LG2) 311 314 108 159 249 460 339 410 518
- FYL2X (L2T) 293 296 108 164 249 439 359 390 501
- FYL2XP1 (LG2) 334 337 99 169 234 460 284 435 538
-
-
-
- 80386 + 80386 + 80386 +
- Intel Intel Franke387 TP 6.0 EM87
- 8087 80287 Emulator Emulator Emulator
-
- FSTP ST(0) | 26 54 507 358 2115
- FLD1 | 26 55 481 422 1626
- FLDZ | 21 53 480 416 1646
- FLDPI | 26 55 486 443 1626
- FLDLG2 | 26 56 486 423 1626
- FLDL2T | 26 55 486 440 1626
- FLDL2E | 26 53 486 423 1626
- FLDLN2 | 26 55 486 441 1626
- FLD ST(0) | 31 55 493 362 1851
- FST ST(1) | 26 54 489 355 1931
- FSTP ST(1) | 21 55 507 356 2116
- FLD ST(1) | 26 55 493 362 1852
- FXCH ST(1) | 21 57 497 486 2187
- FILD [Word] | 58 90 667 712 2259
- FILD [DWord] | 64 74 608 812 2164
- FILD [QWord] | 74 93 652 707 2971
- FLD [DWord] | 49 44 633 473 2077
- FLD [QWord] | 54 57 641 524 2336
- FLD [TByte] | 59 45 607 492 2063
- FBLD [TByte] | 309 310 2019 1512 17827
- FIST [Word] | 79 72 854 766 2418
- FIST [DWord] | 84 80 865 518 2325
- FST [DWord] | 89 85 686 441 2200
- FST [QWord] | 99 92 703 516 2481
- FISTP [Word] | 79 80 864 794 2620
- FISTP [DWord] | 79 81 879 541 2523
- FISTP [QWord] | 88 75 904 916 3226
- FSTP [DWord] | 89 75 713 467 2400
- FSTP [QWord] | 93 72 732 538 2678
- FSTP [TByte] | 49 21 685 467 2124
- FBSTP [TByte] | 528 472 3305 1555 27013
- FINIT | 11 10 742 641 1369
- FCLEX | 11 10 440 323 912
- FCHS | 21 54 460 354 1744
- FABS | 21 54 456 349 1738
- FXAM | 21 54 481 380 1551
- FTST | 51 75 585 386 2721
- FSTENV | 54 57 928 519 2104
- FLDENV | 48 50 1125 450 1631
- FSAVE | 214 244 1949 976 2749
- FRSTOR | 209 227 2182 657 2225
- FSTSW [mem] | 28 10 516 401 1189
- FSTSW AX | N/A 55 451 N/A N/A
- FSTCW [mem] | 28 10 506 359 1167
- FLDCW [mem] | 19 47 524 437 1584
- FADD ST,ST(0) | 86 128 643 706 2805
- FADD ST,ST(1) | 85 116 707 808 3093
- FADD ST(1),ST | 92 131 664 812 3146
- FADDP ST(1),ST | 92 129 704 799 3143
- FADD [DWord] | 105 122 874 969 3139
- FADD [QWord] | 115 122 888 1021 3396
- FIADD [Word] | 115 122 940 1211 3330
- FIADD [DWord] | 125 122 882 1297 3215
- FSUB ST(1),ST | 88 130 738 817 3156
- FSUBR ST(1),ST | 96 132 740 868 3004
- FSUBRP ST(1),ST | 99 132 733 805 3301
- FSUB [DWord] | 119 122 918 1018 3127
- FSUB [QWord] | 129 123 932 1070 3632
- FISUB [Word] | 115 123 977 1081 3802
- FISUB [DWord] | 125 125 940 980 4161
- FMUL ST,ST(1) | 145 151 810 1368 3924
- FMUL ST(1),ST | 145 151 817 1377 3962
- FMULP ST(1),ST | 148 168 840 1365 4164
- FIMUL [Word] | 132 151 1039 1517 4039
- FIMUL [DWord] | 141 151 980 1643 3976
- FMUL [DWord] | 125 123 948 1480 3445
- FMUL [QWord] | 175 192 991 1602 4416
- FDIV ST,ST(0) | 201 207 726 1536 9789
- FDIV ST,ST(1) | 203 218 808 1658 10332
- FDIV ST(1),ST | 207 214 825 1655 10342
- FDIVR ST(1),ST | 201 206 819 1806 10213
- FDIVRP ST(1),ST | 201 205 845 1803 10409
- FIDIV [Word] | 237 227 980 1779 11225
- FIDIV [DWord] | 246 227 944 1680 11572
- FDIV [DWord] | 229 226 893 1722 10577
- FDIV [QWord] | 236 227 993 1777 10829
- FSQRT (0.0) | 21 57 512 382 1755
- FSQRT (1.0) | 186 206 1106 2504 37836
- FSQRT (L2T) | 186 207 1398 2467 37925
- FXTRACT (L2T) | 51 56 726 571 3326
- FSCALE (PI,5) | 41 56 817 443 3194
- FRNDINT (PI) | 51 58 808 800 7092
- FPREM (99,PI) | 81 131 1696 941 4098
- FPREM1(99,PI) | N/A N/A 1625 N/A N/A
- FCOM | 56 75 582 483 2799
- FCOMP | 61 92 616 485 2983
- FCOMPP | 61 90 661 476 3198
- FICOM [Word] | 79 77 808 861 3654
- FICOM [DWord] | 89 77 750 964 3684
- FCOM [DWord] | 74 75 741 625 3643
- FCOM [QWord] | 74 76 754 667 3771
- FSIN (0.0) | N/A N/A 639 N/A N/A
- FSIN (1.0) | N/A N/A 4640 N/A N/A
- FSIN (PI) | N/A N/A 2488 N/A N/A
- FSIN (LG2) | N/A N/A 3911 N/A N/A
- FSIN (L2T) | N/A N/A 3767 N/A N/A
- FCOS (0.0) | N/A N/A 740 N/A N/A
- FCOS (1.0) | N/A N/A 4777 N/A N/A
- FCOS (PI) | N/A N/A 2557 N/A N/A
- FCOS (LG2) | N/A N/A 4176 N/A N/A
- FCOS (L2T) | N/A N/A 3905 N/A N/A
- FSINCOS (0.0) | N/A N/A 714 N/A N/A
- FSINCOS (1.0) | N/A N/A 6049 N/A N/A
- FSINCOS (PI) | N/A N/A 4091 N/A N/A
- FSINCOS (LG2) | N/A N/A 5640 N/A N/A
- FSINCOS (L2T) | N/A N/A 5405 N/A N/A
- FPTAN (0.0) | 41 58 752 8381 2324
- FPTAN (1.0) | 581 582 6366 10817 29824
- FPTAN (PI) | 606 587 4388 12410 2300
- FPTAN (LG2) | 516 513 5939 12502 26770
- FPTAN (L2T) | 576 586 5723 12483 2301
- FPATAN (0.0) | 41 55 616 1208 10578
- FPATAN (1.0) | 736 736 1426 13446 34208
- FPATAN (PI) | 206 207 12835 13305 46903
- FPATAN (LG2) | 756 736 12490 13319 41312
- FPATAN (L2T) | 206 204 12922 13364 50149
- F2XM1 (0.0) | 16 56 563 723 1722
- F2XM1 (LN2) | 631 624 4178 11070 33823
- F2XM1 (LG2) | 611 585 4798 11116 32163
- FYL2X (1.0) | 56 57 961 1214 4327
- FYL2X (PI) | 946 961 8987 12858 40148
- FYL2X (LG2) | 1081 1038 8933 12748 46821
- FYL2X (L2T) | 926 886 8982 12712 38986
- FYL2XP1 (LG2) | 1026 1037 10485 11867 44708
-
- The Weitek 3167 and 4167 coprocessors only implement the basic
- arithmetic functions (add, subtract, multiply, divide, square
- root) in hardware. Transcendental functions are implemented
- by means of a software library supplied by Weitek that uses
- the Weitek hardware to approximate the transcendental functions
- with polynomial and rational approximations. The clock cycle
- timings for the transcendental functions are average values,
- since execution time differs with the value of argument. The
- speed of transcendental functions for the 4167 is estimated
- based on the numbers in [31,33], from which this timing
- information has been extracted.
-
-
- Execution time for floating-point operations in clock cycles on
- Weitek coprocessors
-
-               Single Precision    Double Precision
-
-                 3167      4167      3167      4167
-
- ABS                3         2         3         2
- NEG                6         2         6         2
- ADD                6         2         6         2
- SUB                6         2         6         2
- SUBR               6         2         6         2
- MUL                6         2        10         3
- DIVR              38        17        66        31
- SQRT              60        17       118        31
- SIN              146       ~50       292      ~100
- COS              140       ~50       285      ~100
- TAN              188       ~60       340      ~110
- EXP              179       ~60       401      ~130
- LOG              171       ~60       365      ~120
- F->ASCII        1000       N/A      1700       N/A   //
- ASCII->F        1100       N/A      1800       N/A   //
-
- // rough average of the timings given for different numeric
- formats by Weitek. Note that these conversion routines
- do much more work than the FBLD and FBSTP instructions
- provided by the 80x87 coprocessors. FBLD and FBSTP are
- useful for conversion routines but quite a bit of additional
- code is needed for this purpose.
-
-
- Accuracy
-
- The IEEE-754 Standard for Binary Floating-Point Arithmetic [10,11]
- is fully implemented by Intel's 387 coprocessor [17]. Among other
- things, this means that the add, subtract, multiply, divide,
- remainder, and square root operations always deliver the 'exact'
- result. By exact it is meant that the coprocessor always delivers
- the machine number closest to the real result, which may not
- be representable exactly in the available numeric format. The
- 80387 implements the single, double, and double extended formats
- as specified in the standard as well as all functions required
- by it [17]. Note that earlier Intel coprocessors (the 8087 and
- the 80287) comply with a draft version of the standard that differs
- from the final version. These chips came out before the IEEE-754
- standard was finally accepted in 1985. As in the 80387, the basic
- arithmetic in the 8087 and the 80287 is exact in the sense that
- the computed result is always the machine number closest to the
- real result. However, there are some differences regarding certain
- operands like infinities and some operations like the remainder are
- defined differently. Some instructions have been added in the 80387,
- most notably the FSIN and FCOS operations. The argument range of
- some transcendental functions has been extended [17]. Note that the
- IEEE-754 standard says nothing about the quality of the implementation
- of transcendental functions like sin, cos, tan, arctan, log. Intel
- uses a modified CORDIC [18,19] technique to compute the transcendental
- functions. Intel claims that the maximum error in the 8087, 80287, and
- 80387 for all transcendental functions does not exceed two bits
- in the mantissa of the double extended format, which features 64
- mantissa bits for an accuracy of approximately 19 decimal places
- [22,23]. This means that at least 62 of the 64 mantissa bits
- in a transcendental function result are correct. This claim has
- been independently verified by a competing vendor [13].
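The relation between mantissa bits and decimal places quoted above follows from multiplying the bit count by log10(2). A short Python sketch of that arithmetic, with the illustrative helper name `decimal_digits`:

```python
import math


def decimal_digits(mantissa_bits):
    """Approximate number of decimal digits carried by a binary mantissa."""
    return mantissa_bits * math.log10(2)


full = decimal_digits(64)       # all 64 bits of the double extended mantissa
worst = decimal_digits(64 - 2)  # two least significant bits in error

print(f"64 bits: {full:.2f} digits, 62 bits: {worst:.2f} digits")
```

So the full 64-bit mantissa carries a little over 19 decimal digits, and a result with its last two bits in error is still good to between 18 and 19 digits.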
-
- The Weitek Abacus 3167 and 4167 are 'mostly compatible' with
- IEEE-754 [31,32,33]. They support the single precision and double
- precision numeric formats described in the standard as
- well as the four rounding modes required by it. However, due to
- the need for extremely high speed operation, some of the finer
- points of IEEE-754 have not been implemented. One of the most
- notable omissions is the missing support for denormal numbers.
- Denormals are always flushed to zero.
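To make the difference concrete, the following Python sketch contrasts gradual underflow, which is what a machine with denormal support provides, with an emulated flush-to-zero step. It works on double precision values only, and assumes that the machine running it handles denormals in the usual IEEE way. The function `flush_to_zero` is a hypothetical stand-in for the Weitek behaviour, not a description of how the chips implement it.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # smallest normalized double, 2**-1022


def flush_to_zero(x):
    """Emulate flush-to-zero: replace a denormal by a zero of the same sign."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


denormal = SMALLEST_NORMAL / 2  # representable only as a denormal

# With gradual underflow the halved value is still nonzero.
print("denormal kept:", denormal > 0.0)
# After flushing, the same value has become zero.
print("flushed to zero:", flush_to_zero(denormal) == 0.0)
```

One practical consequence: with denormals, x - y == 0 implies x == y, whereas with flush-to-zero two distinct tiny numbers can have a computed difference of zero.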
-
- The 387 clone makers claim 100% compatibility with Intel's 80387.
- So one would expect the same accuracy from their chips. For example,
- on the packaging of the IIT 3C87 it says that ".. the requirements
- of ANSI/IEEE standards are fulfilled and exceeded". Cyrix states
- that their 83D87 complies fully with the IEEE-754 standard [12].
- Cyrix ships some diagnostic software with their coprocessors.
- This includes the program IEEETEST which is based on the IEEE test
- vectors from the Ph.D. thesis of Jerome T. Coonen [9]. A test using
- the IEEE test vectors has also been included into the RUNDIAG
- program on the Intel RapidCAD diagnostic disk. Rather than performing
- random tests, the test vectors check specific cases that may
- be hard to get right. Each test vector specifies the operation
- to be performed, the operands, precision and rounding mode to be
- used, and the result (including flags set) to be expected according
- to IEEE-754. I ran IEEETEST on all the available coprocessors/FPUs.
- The Intel 486, Intel RapidCAD, Intel 387, Intel 387DX, Cyrix 83D87,
- and the Cyrix 387+ passed with no errors. The ULSI 83C87 showed
- some minor flaws in the FCOM, FDIV, FMUL, and FSCALE operations,
- getting flag errors in about 1% of the tested cases, but no
- computational errors. However, for the IIT 3C87, the IEEETEST
- program showed flag *and* some computational errors (that is, wrong
- results) for all tested operations except FXTRACT and FCHS. The Intel
- 80287 shows numerous errors, but this is not surprising, since the
- 80287 does not comply with IEEE-754 but with an earlier draft of that
- standard, so it does some things differently than required by the final
- version of the standard.
-
- Although IEEETEST is written in Turbo Pascal, the coprocessor
- emulator in the TP 6.0 library could not be tested since IEEETEST
- was compiled with the $E- switch excluding the emulator from
- program code. The public domain emulator EM87 could be tested, but
- hung in the last test which checks the implementation of the
- remainder operation. This is probably caused by some bug in the
- emulation of the FPREM instruction exercised by this test. It is
- interesting to note how the error profile of EM87 matches exactly
- that of the Intel 80287, so it can be assumed that EM87 is a very
- good emulation of the 80287. The Franke387 V2.4 emulator hung in
- the division test quite early in IEEETEST. The tests performed
- up to the division test reported several errors.
-
-
-
- Explanatory text printed at the start of the IEEETEST program:
-
- JT Coonen's 1984 UC Berkeley Ph.D. thesis centers around his
- activities as a member of the floating-point working group that
- defined the IEEE 754-1985 Standard for Binary Floating-Point
- Arithmetic. Appendix C of his thesis presents FPTEST, a Pascal
- program written by J Thomas and JT Coonen. IEEETEST is a port of
- FPTEST and runs on PCs whose math coprocessor accepts 80387
- compatible floating-point instructions.
-
- IEEETEST reads test vectors from the file TESTVECS and compares
- the answer returned by the math coprocessor with the answer listed
- in the test vector. If these answers differ an 'F' is displayed,
- otherwise a '.' is displayed. Answers can differ due to two types
- of failures: numeric failures or flag failures. Numeric failures
- occur when the computed answer has the wrong value. Flag failures
- occur when the status (invalid operation, divide by zero, underflow,
- overflow, inexact) is incorrectly identified.
-
- TESTVECS is the concatenation of unmodified versions of all the
- test vectors distributed by UC Berkeley. The test data base is
- copyrighted by UC Berkeley (1985) and is being distributed with
- their permission. FPTEST and the test data base can be obtained
- by asking for 'IEEE-754 Test Vector' from UC Berkeley, Electrical
- Engineering and Computer Science, Industrial Liaison Program,
- 479 Corey Hall, Berkeley, CA, 94720 (415)643-6687.
-
- The initial version of this test data base for the proposed IEEE
- 754 binary floating-point standard (draft 8.0) was developed for
- Zilog, Inc. and was donated to the floating-point working group
- for dissemination. Errors in or additions to the distributed data
- base should be reported to the agency of distribution, with copies
- to Zilog, Inc., 1315 Dell Avenue, Campbell, CA, 95008.
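The pass/fail bookkeeping described in the explanatory text can be sketched in a few lines of Python. This is only an illustration of the numeric-failure versus flag-failure distinction. The layout of the TESTVECS file is not described in this posting, so the `Outcome` record and the `check` function below are hypothetical, and values are compared as double precision bit patterns only.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    value: float      # numeric result of the operation
    flags: frozenset  # e.g. {"invalid", "underflow", "overflow", "inexact"}


def bits(x):
    """Bit pattern of a double, so that -0.0 and 0.0 are told apart."""
    return struct.pack(">d", x)


def check(expected, actual):
    """Return (mark, numeric_failure, flag_failure) for one test vector."""
    numeric_failure = bits(expected.value) != bits(actual.value)
    flag_failure = expected.flags != actual.flags
    mark = "F" if (numeric_failure or flag_failure) else "."
    return mark, numeric_failure, flag_failure


# Three made-up vectors: a pass, a flag failure, and a numeric failure.
vectors = [
    (Outcome(0.5, frozenset()), Outcome(0.5, frozenset())),
    (Outcome(0.5, frozenset()), Outcome(0.5, frozenset({"inexact"}))),
    (Outcome(0.0, frozenset()), Outcome(-0.0, frozenset())),
]
print("".join(check(want, got)[0] for want, got in vectors))
```

Comparing bit patterns rather than using `==` matters here because 0.0 == -0.0 is true for floating-point comparison, yet a wrong sign of zero is exactly the kind of corner case such test vectors are meant to catch.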
-
-
- IEEETEST output for Intel 80387, Intel 387DX, Intel 486, C&T 38700,
- Cyrix 83D87, Cyrix 387+, Intel RapidCAD
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 216 0 | 0 0 0 | 0 0 0
- Addition + | 3528 0 | 0 0 0 | 0 0 0
- Comparison C | 4320 0 | 0 0 0 | 0 0 0
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 4311 0 | 0 0 0 | 0 0 0
- Fraction Part F | 624 0 | 0 0 0 | 0 0 0
- Logb L | 960 0 | 0 0 0 | 0 0 0
- Multiplication * | 3978 0 | 0 0 0 | 0 0 0
- Negation - | 216 0 | 0 0 0 | 0 0 0
- Next After N | 2832 0 | 0 0 0 | 0 0 0
- Round to Integer I | 558 0 | 0 0 0 | 0 0 0
- Scalb S | 948 0 | 0 0 0 | 0 0 0
- Square Root V | 744 0 | 0 0 0 | 0 0 0
- Subtraction - | 3528 0 | 0 0 0 | 0 0 0
- Remainder % | 2984 0 | 0 0 0 | 0 0 0
- Totals | 31235 0 |
-
-
- IEEETEST output for ULSI 83C87 (manufactured 91/48)
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 216 0 | 0 0 0 | 0 0 0
- Addition + | 3528 0 | 0 0 0 | 0 0 0
- Comparison C | 4312 8 | 0 0 0 | 0 0 8
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 4250 61 | 0 0 0 | 28 28 5
- Fraction Part F | 624 0 | 0 0 0 | 0 0 0
- Logb L | 960 0 | 0 0 0 | 0 0 0
- Multiplication * | 3936 42 | 0 0 0 | 19 19 4
- Negation - | 216 0 | 0 0 0 | 0 0 0
- Next After N | 2828 4 | 0 0 0 | 0 0 4
- Round to Integer I | 558 0 | 0 0 0 | 0 0 0
- Scalb S | 930 18 | 0 0 0 | 6 6 6
- Square Root V | 744 0 | 0 0 0 | 0 0 0
- Subtraction - | 3528 0 | 0 0 0 | 0 0 0
- Remainder % | 2984 0 | 0 0 0 | 0 0 0
- Totals | 31102 133 |
-
-
- IEEETEST output for ULSI 83S87 (manufactured 92/17)
- (data kindly supplied by Bengt Ask, f89ba@efd.lth.se)
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 216 0 | 0 0 0 | 0 0 0
- Addition + | 3528 0 | 0 0 0 | 0 0 0
- Comparison C | 4320 0 | 0 0 0 | 0 0 0
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 4296 15 | 0 0 0 | 5 5 5
- Fraction Part F | 624 0 | 0 0 0 | 0 0 0
- Logb L | 960 0 | 0 0 0 | 0 0 0
- Multiplication * | 3966 12 | 0 0 0 | 4 4 4
- Negation - | 216 0 | 0 0 0 | 0 0 0
- Next After N | 2828 4 | 0 0 0 | 0 0 4
- Round to Integer I | 558 0 | 0 0 0 | 0 0 0
- Scalb S | 930 18 | 0 0 0 | 6 6 6
- Square Root V | 744 0 | 0 0 0 | 0 0 0
- Subtraction - | 3528 0 | 0 0 0 | 0 0 0
- Remainder % | 2984 0 | 0 0 0 | 0 0 0
- Totals | 31102 45 |
-
-
- IEEETEST output for IIT 3C87 (manufactured 92/20)
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 200 16 | 0 0 16 | 0 0 0
- Addition + | 3336 192 | 0 0 128 | 0 0 96
- Comparison C | 4224 96 | 0 0 96 | 0 0 0
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 4159 152 | 0 0 124 | 0 0 116
- Fraction Part F | 600 24 | 0 0 24 | 0 0 24
- Logb L | 960 0 | 0 0 0 | 0 0 0
- Multiplication * | 3702 276 | 0 0 248 | 0 0 100
- Negation - | 200 16 | 0 0 16 | 0 0 0
- Next After N | 2248 584 | 0 0 584 | 0 0 168
- Round to Integer I | 542 16 | 0 0 4 | 0 0 16
- Scalb S | 874 74 | 5 5 44 | 8 8 20
- Square Root V | 688 56 | 0 0 56 | 0 0 56
- Subtraction - | 3336 192 | 0 0 128 | 0 0 96
- Remainder % | 2844 140 | 0 0 140 | 0 0 116
- Totals | 29401 1834 |
-
-
- IEEETEST output for Intel 80287 run together with a 80386 CPU
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 216 0 | 0 0 0 | 0 0 0
- Addition + | 2886 642 | 16 16 112 | 174 174 174
- Comparison C | 0 4320 | 1324 1324 1324 |1332 1332 1332
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 3777 534 | 18 18 37 | 169 169 165
- Fraction Part F | 552 72 | 24 24 24 | 24 24 24
- Logb L | 900 60 | 12 12 12 | 20 20 20
- Multiplication * | 2944 1034 | 105 105 197 | 303 303 231
- Negation - | 216 0 | 0 0 0 | 0 0 0
- Next After N | 348 2484 | 768 768 768 | 504 504 526
- Round to Integer I | 546 12 | 0 0 0 | 4 4 4
- Scalb S | 663 285 | 45 43 26 | 102 98 46
- Square Root V | 720 24 | 4 4 4 | 8 8 8
- Subtraction - | 2886 642 | 16 16 112 | 174 174 174
- Remainder % | 708 2276 | 768 768 560 | 216 216 216
- Totals | 18850 12385 |
-
-
- IEEETEST output for EM87 coprocessor emulator run on an Intel 386 CPU
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 216 0 | 0 0 0 | 0 0 0
- Addition + | 2886 642 | 16 16 112 | 174 174 174
- Comparison C | 0 4320 | 1324 1324 1324 |1332 1332 1332
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 3777 534 | 18 18 37 | 169 169 165
- Fraction Part F | 552 72 | 24 24 24 | 24 24 24
- Logb L | 900 60 | 12 12 12 | 20 20 20
- Multiplication * | 2944 1034 | 105 105 197 | 303 303 231
- Negation - | 216 0 | 0 0 0 | 0 0 0
- Next After N | 348 2484 | 768 768 768 | 504 504 526
- Round to Integer I | 546 12 | 0 0 0 | 4 4 4
- Scalb S | 663 285 | 45 43 26 | 102 98 46
- Square Root V | 720 24 | 4 4 4 | 8 8 8
- Subtraction - | 2886 642 | 16 16 112 | 174 174 174
-
-
- To complement the checks done by IEEETEST, I wrote three short
- programs (DENORMTS, RCTRL, and PCTRL) in Turbo Pascal 6.0 that test
- the following features:
-
- 1. support for denormals in all precisions (single, double, extended)
- 2. support for the four IEEE rounding modes (up, down, nearest, chop)
- 3. support for precision control
-
- Note that passing all tests is required both for IEEE conformance and
- for compatibility with Intel's coprocessors. Precision control forces
- the results of the FADD, FSUB, FMUL, FDIV, and FSQRT instructions to
- be rounded to the specified precision (single, double, or double
- extended). This feature is provided to obtain compatibility with
- certain programming languages [17]. Specifying a lower precision
- effectively nullifies the advantages of extended precision
- intermediate results. The IEEE-754 standard for floating-point
- arithmetic demands that processors/fp-packages that cannot store the
- results of operations *directly* to single and double precision
- locations must provide precision control. The programs that test
- precision control and rounding control are designed to return a
- different result for each mode from the same sequence of operations.
- The source code of the programs can be found in appendix A. The Intel
- 8087 and 80287 were not tested with DENORMTS, since Turbo Pascal does
- not support extended precision denormals on 8087/80287 processors, so
- the denormal test would fail anyway. The 8087 and 287 do pass the
- RCTRL and PCTRL tests, though.
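-
- As an illustration of what rounding control and precision control
- do, here is a minimal sketch in Python using exact rational
- arithmetic. It is not the RCTRL/PCTRL source from appendix A; the
- names round_to_bits and mode_signature and the choice of 2/3 as test
- value are assumptions made for this example only. The exponent range
- is treated as unbounded, so denormals are not modelled.

```python
from fractions import Fraction
import math

def round_to_bits(x: Fraction, bits: int, mode: str) -> Fraction:
    """Round exact x to `bits` significant binary digits (unbounded exponent)."""
    if x == 0:
        return x
    a = abs(x)
    # Find e with 2**e <= |x| < 2**(e+1).
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    quantum = Fraction(2) ** (e - (bits - 1))  # weight of the last kept bit
    q = x / quantum                            # exact, signed
    if mode == "nearest":
        n = round(q)        # Fraction rounds halfway cases to even
    elif mode == "down":
        n = math.floor(q)   # toward minus infinity
    elif mode == "up":
        n = math.ceil(q)    # toward plus infinity
    elif mode == "chop":
        n = math.trunc(q)   # toward zero
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return n * quantum

# Significand widths of the three precision control settings.
PRECISIONS = {"single": 24, "double": 53, "extended": 64}
MODES = ("nearest", "down", "up", "chop")

def mode_signature(bits: int) -> dict:
    """Round +2/3 and -2/3 in every mode; the pair identifies the mode."""
    v = Fraction(2, 3)
    return {m: (round_to_bits(v, bits, m), round_to_bits(-v, bits, m))
            for m in MODES}

if __name__ == "__main__":
    third = Fraction(1, 3)
    for name, bits in PRECISIONS.items():
        err = third - round_to_bits(third, bits, "nearest")
        print(f"{name:9s} error of rounded 1/3: {float(err):.3e}")
    for mode, pair in mode_signature(24).items():
        print(f"{mode:8s}", [float(p) for p in pair])
```

- With 24 or 64 significand bits, the pair (+2/3, -2/3) rounds
- differently in each of the four modes, because 'chop' agrees with
- 'down' for positive values and with 'up' for negative ones. With 53
- bits, 'nearest' and 'chop' happen to coincide for this value, which
- shows why a test has to choose its operands with some care.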
-
-
- These are the results for the Intel 387, Intel 387DX, Intel 486,
- Intel RapidCAD, Cyrix 83D87, Cyrix 387+, C&T 38700, and the EM87
- emulator (on an 80386 machine):
-
- Precision Control SINGLE 1.13311278820037842E+0000
- DOUBLE 1.23456789006442125E+0000
- EXTENDED 1.23456789012337585E+0000
-
- Rounding Control NEAREST -1.23427629010100635E+0100
- DOWN -1.23427623555772409E+0100
- UP -1.23457760966801097E+0100
- CHOP -1.23397493540770643E+0100
-
- Denormal support
-
- SINGLE denormals supported
- SINGLE denormal prints as: 4.60943116855005E-0041
- Denormal should be printed as 4.60943...E-0041
-
- DOUBLE denormals supported
- DOUBLE denormal prints as: 8.75000000000016E-0311
- Denormal should be printed as 8.75...E-0311
-
- EXTENDED denormals supported
- EXTENDED denormal prints as: 1.31640625000000E-4934
- Denormal should be printed as 1.3164...E-4934
-
-
- These are the results for the ULSI 83C87:
-
- Precision Control SINGLE 1.23456789012337585E+0000
- DOUBLE 1.23456789012337585E+0000
- EXTENDED 1.23456789012337585E+0000
-
- Rounding Control NEAREST -1.23427629010100635E+0100
- DOWN -1.23427623555772409E+0100
- UP -1.23457760966801097E+0100
- CHOP -1.23397493540770643E+0100
-
- Denormal support
-
- SINGLE denormals supported
- SINGLE denormal prints as: 4.60943116855005E-0041
- Denormal should be printed as 4.60943...E-0041
-
- DOUBLE denormals supported
- DOUBLE denormal prints as: 8.75000000000016E-0311
- Denormal should be printed as 8.75...E-0311
-
- EXTENDED denormals supported
- EXTENDED denormal prints as: 1.31640625000000E-4934
- Denormal should be printed as 1.3164...E-4934
-
-
- These are the results for the IIT 3C87:
-
- Precision Control SINGLE 1.13311278820037842E+0000
- DOUBLE 1.23456789006442125E+0000
- EXTENDED 1.23456789012337585E+0000
-
- Rounding Control NEAREST -1.23427629010100635E+0100
- DOWN -1.23427623555772409E+0100
- UP -1.23457760966801097E+0100
- CHOP -1.23397493540770643E+0100
-
- Denormal support
-
- SINGLE denormals supported
- SINGLE denormal prints as: 4.60943116855005E-0041
- Denormal should be printed as 4.60943...E-0041
-
- DOUBLE denormals supported
- DOUBLE denormal prints as: 8.75000000000016E-0311
- Denormal should be printed as 8.75...E-0311
-
- EXTENDED denormals not supported
-
-
- These are the results for the TP 6.0 coprocessor emulator:
-
- Precision Control SINGLE 1.23456789012351396E+0000
- DOUBLE 1.23456789012351396E+0000
- EXTENDED 1.23456789012351396E+0000
-
- Rounding Control NEAREST -1.23457766383395931E+0100
- DOWN -1.23457766383395931E+0100
- UP -1.23457766383395931E+0100
- CHOP -1.23457766383395931E+0100
-
- Denormal support
-
- SINGLE denormals not supported
- DOUBLE denormals not supported
- EXTENDED denormals not supported
-
-
- The test results show that the IIT 3C87 does not conform to the
- IEEE-754 floating-point standard in that it does not support
- denormals in double extended precision. The ULSI 83C87 does not
- conform to that standard in that it does not support precision
- control, but uses double extended precision for all operations.
- The TP 6.0 emulator supports neither precision control nor rounding
- control, nor any denormals. In addition, its basic arithmetic
- operations do not seem to conform to the IEEE standard, as the
- results of the test programs differ from every result computed by
- a coprocessor in any mode.
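-
- The denormal values printed above can be checked against the
- format limits with a few lines of Python. This is only a sketch of
- the range check, not the DENORMTS program; the table FORMATS and the
- function in_denormal_range are names made up for this example. It
- tests the magnitude only and does not verify that the value is an
- exactly representable bit pattern.

```python
from fractions import Fraction

# format name: (exponent of the smallest normal number,
#               significand bits including the leading bit)
FORMATS = {
    "single":   (-126,   24),
    "double":   (-1022,  53),
    "extended": (-16382, 64),
}

def in_denormal_range(value: Fraction, fmt: str) -> bool:
    """True if |value| lies between the smallest denormal and the smallest normal."""
    emin, bits = FORMATS[fmt]
    smallest_normal = Fraction(2) ** emin
    smallest_denormal = Fraction(2) ** (emin - (bits - 1))
    return smallest_denormal <= abs(value) < smallest_normal

if __name__ == "__main__":
    # Values as printed in the test results above.
    samples = {
        "single":   "4.60943116855005E-41",
        "double":   "8.75000000000016E-311",
        "extended": "1.31640625000000E-4934",
    }
    for fmt, text in samples.items():
        print(f"{fmt:9s} {text:24s} denormal range: "
              f"{in_denormal_range(Fraction(text), fmt)}")
```

- Exact fractions are used because the extended precision value is
- far below the range of an ordinary double precision variable.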
-
-
- With regard to the accuracy of transcendental functions, Cyrix
- claims that the relative error of the transcendental functions
- on the 83D87 never exceeds 0.5 units in the last place (0.5 ULP)
- of the double extended format [13]. This means that the maximum
- relative error is below 2**-64, while Intel's published error
- limit is 2**-62. While Intel uses a modified CORDIC algorithm
- [18,19] to compute the transcendental functions, Cyrix uses
- rational approximations that utilize a very fast array multiplier.
- For an explanation of why this approach is superior to CORDIC with
- today's technology, see [61]. Also, Cyrix uses an internal 75 bit
- data path for the mantissa [15], so intermediate computations in
- the generation of transcendental function values will enjoy some
- additional accuracy over the 64 bits provided by the double
- extended format. Using 75 mantissa bits also provides an advantage
- over other coprocessors like the Intel 387DX and ULSI 83C87 which
- use only a 68 bit data path for the mantissa [58,59]. Note that a
- maximum relative error of 0.5 ULP for the Cyrix coprocessor does
- not mean that it returns the 'exact' result (machine number closest
- to infinitely precise result) all the time. Just consider the case
- where the infinitely precise result of a transcendental function
- falls nearly half way between two machine numbers. A relative error
- of 0.5 ULP can cause the result to be either of the numbers after
- rounding, depending on the direction of the error. But the 83D87
- should deliver results that never differ from the 'exact' result
- by more than one ULP. Please note that the claim of the relative
- error being below 0.5 ULPs is slightly exaggerated; 0.6 ULPs would
- be a more realistic error limit. Imagine that the infinitely precise
- result for some argument to a transcendental function was
- xxx..xxx1001... (where the xxx..xxx represent the first 64 bits of
- the result), but that the coprocessor computes the result as
- xxx..xxx0111 and then rounds this down to xxx..xxx0000. One ULP
- corresponds to 10000b in units of the last bit shown, so the
- relative error is (1001b-0b)/10000b = 0.5625 ULPs. I tested some of
- the transcendental functions of the Cyrix 387+ and found the
- relative error to be always below 0.6 ULPs. Cyrix also claims that
- its transcendental
- functions satisfy the monotonicity criterion [13], a claim not
- made by any of the competitors. This does not mean that the
- transcendental functions on the other 387 compatibles cannot be
- monotonic, too. Monotonicity means that for all x1 > x2, it always
- follows that f(x1) >= f(x2) for an increasing function like sin
- on [0..pi/4]. Likewise, for a decreasing function like cos on
- [0..pi/4], for all x1 > x2, it follows that f(x1) <= f(x2).
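-
- The 0.5625 ULP example and the monotonicity criterion can both be
- written down in a few lines. The following Python sketch is an
- illustration only; error_in_ulps and is_monotonic_increasing are
- hypothetical helper names, and the coarse grid used here is far too
- sparse to prove monotonicity of a real coprocessor function.

```python
from fractions import Fraction
import math

def error_in_ulps(exact_tail: int, returned_tail: int, tail_bits: int) -> Fraction:
    """Error in ULPs when both results agree in their leading bits.

    The tails are the bits that follow the last bit of the format;
    one ULP therefore equals 2**tail_bits tail units.
    """
    return Fraction(abs(exact_tail - returned_tail), 2 ** tail_bits)

def is_monotonic_increasing(f, xs) -> bool:
    """Check f(x1) >= f(x2) for all sampled x1 > x2."""
    ys = [f(x) for x in sorted(xs)]
    return all(a <= b for a, b in zip(ys, ys[1:]))

if __name__ == "__main__":
    # Exact result ...1001, coprocessor returns ...0000 after rounding.
    print(float(error_in_ulps(0b1001, 0b0000, 4)))   # 0.5625

    grid = [i * (math.pi / 4) / 1000 for i in range(1001)]
    print(is_monotonic_increasing(math.sin, grid))
    # A decreasing function is checked by negating it.
    print(is_monotonic_increasing(lambda x: -math.cos(x), grid))
```

- A meaningful monotonicity test would step through adjacent machine
- numbers instead of a coarse grid, since violations, if any, show up
- between neighbouring arguments.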
-
- The Weitek Abacus 3167 and 4167 implement only the basic arithmetic
- operations (add, subtract, negate, multiply, divide, square root)
- in hardware. Transcendental functions are provided via a software
- library provided by Weitek. For these library functions Weitek
- claims a maximum relative error of 5 ULPs [31,33] (ULP = Unit in
- the Last Place, numeric weight of the least significant mantissa
- bit). This means that the last three bits in the mantissa of a
- double precision result can be wrong. Note that the Intel 387 and
- compatible math coprocessors generate the transcendental functions
- with a small relative error with regard to the *double extended
- precision* format. Thus, when rounded to double precision, their
- function values are nearly always 'exact'. 387 type coprocessors
- have superior accuracy when compared with Weitek's coprocessors.
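-
- The arithmetic behind these two statements is short enough to spell
- out. The sketch below is an illustration under simple assumptions:
- the function names are made up, and the conversion between formats
- assumes that both results lie in the same binade, so that one double
- precision ULP equals 2**(64-53) double extended ULPs.

```python
from fractions import Fraction

EXTENDED_BITS = 64   # significand bits of the double extended format
DOUBLE_BITS = 53     # significand bits of the double precision format

def low_bits_spanned(ulps: int) -> int:
    """Number of low-order mantissa bits needed to hold an error of `ulps`."""
    return ulps.bit_length()

def extended_ulps_as_double_ulps(ulps) -> Fraction:
    """Express an error given in double extended ULPs in double precision ULPs."""
    return Fraction(ulps) / 2 ** (EXTENDED_BITS - DOUBLE_BITS)

if __name__ == "__main__":
    print(low_bits_spanned(5))                                     # 3
    print(float(extended_ulps_as_double_ulps(Fraction(1, 2))))     # 0.000244140625
```

- An error of 5 ULPs is 101b and thus spans the last three bits, while
- half a double extended ULP is a tiny fraction of a double precision
- ULP, so rounding to double precision only rarely gives a result
- other than the 'exact' one.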
-
- The test diskette distributed with early versions of the
- Cyrix 83D87 contained a program TRANCK that checks the
- accuracy of the transcendental functions in the coprocessor
- against a more precise software arithmetic [16]. I used this
- program to compare the accuracy of the transcendental functions
- on those 287/387/486 coprocessors/FPUs available to me. As TRANCK
- will not accept negative numbers as interval limits, I tested
- each function on an interval along the positive x-axis. The
- functions tested are F2XM1 (2**x-1), FSIN (sine), FCOS (cosine),
- FPTAN (tangent), FPATAN (arctangent), FYL2X (y * log2 (x)),
- and FYL2XP1 (y * log2 (x+1)). These are all the transcendental
- functions implemented on the 80387. Note that the square root
- (FSQRT) is *not* a transcendental function. For every function,
- 100,000 arguments were evaluated. The arguments were uniformly
- distributed within the interval tested. The EM87 emulator could
- not be checked with TRANCK, since the multiple precision package
- in TRANCK would always return with an error message immediately.
- However, the Franke387 emulator could be tested.
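-
- The general method of such an accuracy check (uniformly distributed
- arguments, comparison against a more precise software arithmetic,
- error reported in ULPs) can be sketched as follows. This is not
- TRANCK and makes no claim about how TRANCK works internally. It
- tests the double precision sine of the Python math library, not a
- coprocessor's double extended results, and all function names, the
- sample count, and the Taylor series reference are assumptions made
- for this example.

```python
from fractions import Fraction
import math
import random

def binade_ulp(ref: Fraction, bits: int = 53) -> Fraction:
    """Weight of the last of `bits` significand bits in the binade of ref."""
    a = abs(ref)
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1                      # now 2**e <= |ref| < 2**(e+1)
    return Fraction(2) ** (e - (bits - 1))

def sin_reference(x: Fraction) -> Fraction:
    """Taylor series of sin(x) in exact rational arithmetic, meant for |x| < 1."""
    term = total = x
    x2 = x * x
    k = 1
    # Alternating series: the truncation error is below the last term used.
    while abs(term) > Fraction(1, 2 ** 120):
        term = -term * x2 / ((2 * k) * (2 * k + 1))
        total += term
        k += 1
    return total

def max_ulp_error(f, reference, lo: float, hi: float,
                  n: int = 200, seed: int = 1) -> Fraction:
    """Largest error of f in ULPs over n uniformly distributed arguments."""
    rng = random.Random(seed)
    worst = Fraction(0)
    for _ in range(n):
        x = rng.uniform(lo, hi)
        ref = reference(Fraction(x))    # the argument is converted exactly
        if ref == 0:
            continue                    # an ULP is not defined at zero
        err = abs(Fraction(f(x)) - ref) / binade_ulp(ref)
        worst = max(worst, err)
    return worst

if __name__ == "__main__":
    worst = max_ulp_error(math.sin, sin_reference, 0.1, math.pi / 4)
    print(f"max error of math.sin: {float(worst):.4f} ULPs")
```

- The sample count is kept small so that the exact rational reference
- stays fast; a serious test would use many more arguments, as the
- 100,000 per function mentioned above.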
-
-
-