- Path: sparky!uunet!kithrup!hoptoad!pacbell.com!mips!swrinde!elroy.jpl.nasa.gov!usc!sol.ctr.columbia.edu!ira.uka.de!uka!uka!news
- From: S_JUFFA@iravcl.ira.uka.de (|S| Norbert Juffa)
- Newsgroups: comp.sys.intel
- Subject: What you always wanted to know about math coprocessors for 80x86 3/4
- Message-ID: <16tnqrINNcs1@iraul1.ira.uka.de>
- Date: 19 Aug 92 15:02:51 GMT
- Organization: University of Karlsruhe (FRG) - Informatik Rechnerabt.
- Lines: 971
- NNTP-Posting-Host: irav1.ira.uka.de
- X-News-Reader: VMS NEWS 1.23
-
-
- HW configuration for test of 387 coprocessors and Intel RapidCAD:
- System A: Motherboard with Forex chip set, 128 kB CPU Cache, 8 MB RAM
-
- HW configuration for test of 486 FPU (extra fan for 40 MHz operation):
- System B: Motherboard with SIS chip set, 256 kB CPU Cache, 8 MB RAM
-
- ## EM87 V1.2 by Ron Kimball is a public domain coprocessor emulator
- that loads as a TSR. It catches the INT 7 traps that 80286 and 80386
- systems without a coprocessor generate upon encountering coprocessor
- instructions, and then emulates those instructions.
- Whetstone and Savage benchmarks for this test were compiled
- with the original TP 6.0 library, as EM87 chokes on the
- 387-specific FSIN and FCOS instructions used in my own library if
- a 387 is detected. Obviously EM87 identifies itself as a 387,
- but has no support for 387-specific instructions.
- $$ Franke387 is a commercial 387 emulator that is also available in
- a shareware version. For this test, shareware version V2.4 was
- used. Franke387, unlike many other emulators, supports all 387
- instructions. It is loaded as a device driver and uses INT 7
- to trap coprocessor instructions.
- %% These benchmarks were run using the built-in coprocessor emulators
- of the TP 6.0 and the MS FORTRAN 5.0 run-time libraries.
- ?? The 3C87 specific F4X4 instruction was used in the vector trans-
- formation benchmark.
- ++ Older motherboard with no chip set (discrete logic), no CPU cache,
- 16 MB RAM
- && System A, CPU cache disabled via extended set-up, turbo-switch
- set to half speed (that is, 20 MHz)
- !! 80386 @ 20 MHz / Intel 80287 @ 5 MHz, no CPU cache, 4 MB RAM.
- Due to the fast CPU used here, performance figures are somewhat
- higher than can be expected for an 80286/287 combination, except
- for the PEAKFLOP benchmark, which is basically coprocessor limited.
- ** 8086/8087 system with 640 kB RAM
-
-
- Since neither a Weitek coprocessor nor a compiler that generates
- code for the Weitek chips was available, performance data for
- the Weitek Abacus are given here according to [31,32] and scaled to
- show performance of a 33 MHz system. The benchmarks were compiled
- using highly optimizing 32-bit compilers.
-
-                        Single Prec.     Double Prec.     Double Prec.
-
-                        3167    4167     3167    4167      387     486
-
- Linpack MFLOPS          1.8     5.0      0.8     3.2      0.4     1.6
- Whetstone kWhet/sec    7470   22700     4900   14000     3290   12300
-
- Note that for the Intel coprocessors, running programs in single
- rather than double precision doesn't provide much of a performance
- advantage, since all internal calculations are always done in extended
- precision. With Weitek coprocessors, however, performance nearly
- doubles when switching from double to single precision. For double
- precision calculations using only basic arithmetic, the Weitek Abacus
- can provide at most twice the performance of the respective Intel
- coprocessor (387/486) clocked at the same speed.
-
-
- Speed of various coprocessor instructions in clock cycles, as
- measured with my program 87TIMES. Error is +/- one clock cycle,
- except for the Intel 80287. Times for the 80287 were determined on
- a system with a 20 MHz 80386 and a 5 MHz Intel 80287. Therefore,
- times may differ from a genuine 80286/287 system, especially for
- those instructions that access an operand in memory. Since the
- times are stated as the number of coprocessor clock cycles used,
- the faster 386, which executes four clock cycles in the time the
- 80287 executes one, may decrease memory access times as seen
- by the coprocessor.
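
Cycle counts only become execution times once a clock rate is fixed. The following minimal Python sketch converts a few FDIV ST,ST(1) cycle counts from the tables that follow into microseconds. The 5 MHz rate for the 80287 is the one stated above; the 33 MHz rate used for the other chips is an assumption made purely for illustration, and the helper name cycles_to_microseconds is hypothetical.

```python
def cycles_to_microseconds(cycles, clock_mhz):
    """Convert a clock-cycle count to microseconds at the given clock rate."""
    # One cycle at f MHz lasts 1/f microseconds.
    return cycles / clock_mhz


# (chip, FDIV ST,ST(1) cycles from the timing tables, coprocessor clock in MHz).
# 33 MHz is an assumed clock rate; only the 5 MHz figure comes from the text.
fdiv_timings = [
    ("Intel i486", 73, 33.0),
    ("Cyrix 83D87", 36, 33.0),
    ("Intel 387DX", 78, 33.0),
    ("Intel 80287", 218, 5.0),
]

for chip, cycles, mhz in fdiv_timings:
    usec = cycles_to_microseconds(cycles, mhz)
    print(f"{chip:12s} {cycles:4d} cycles @ {mhz:4.1f} MHz = {usec:6.2f} us")
```

The same conversion applies to every row of the tables, so instructions can be compared across chips running at different clock rates.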
-
-
- Intel Intel Cyrix Cyrix ULSI IIT Intel Intel
- i486 RapidCAD 387+ 83D87 83C87 3C87 387DX 80387
-
- FLD1 | 5 7 17 17 17 22 27 35
- FLDZ | 5 7 17 17 17 22 22 29
- FLDPI | 8 9 17 17 17 22 37 45
- FLDLG2 | 8 9 17 17 17 22 37 44
- FLDL2T | 8 9 17 17 17 22 37 44
- FLDL2E | 8 9 17 17 17 22 37 44
- FLDLN2 | 8 9 17 17 17 22 37 45
- FLD ST(0) | 5 7 17 17 17 22 17 24
- FST ST(1) | 4 7 17 17 17 17 17 24
- FSTP ST(0) | 5 7 17 17 17 18 23 25
- FSTP ST(1) | 5 7 17 17 17 17 23 25
- FLD ST(1) | 5 7 17 17 17 22 17 25
- FXCH ST(1) | 5 7 17 17 17 22 22 25
- FILD [Word] | 13 16 35 36 41 46 46 65
- FILD [DWord] | 12 17 30 30 37 37 40 51
- FILD [QWord] | 13 20 40 40 47 47 45 66
- FLD [DWord] | 7 13 30 36 32 37 25 35
- FLD [QWord] | 7 15 40 44 42 47 35 45
- FLD [TByte] | 10 19 52 52 52 57 57 61
- FBLD [TByte] | 83 91 84 66 145 205 70 278
- FIST [Word] | 32 34 43 42 45 54 72 92
- FIST [DWord] | 33 35 48 44 48 57 74 91
- FST [DWord] | 11 14 44 42 49 41 46 47
- FST [QWord] | 16 18 56 54 60 53 58 60
- FISTP [Word] | 32 35 43 42 45 49 73 93
- FISTP [DWord] | 34 37 48 44 48 52 75 88
- FISTP [QWord] | 35 37 57 53 61 63 86 96
- FSTP [DWord] | 12 13 44 42 48 37 46 42
- FSTP [QWord] | 16 17 56 55 60 50 59 57
- FSTP [TByte] | 14 16 59 58 58 56 67 70
- FBSTP [TByte] | 171 175 101 98 126 216 147 535
- FINIT | 18 35 18 18 18 18 19 25
- FCLEX | 8 24 18 18 18 18 19 25
- FCHS | 8 11 17 17 17 17 31 35
- FABS | 6 8 17 17 17 17 28 31
- FXAM | 13 15 17 17 17 17 37 40
- FTST | 5 7 22 17 22 22 32 35
- FSTENV | 68 85 127 127 135 127 162 169
- FLDENV | 45 62 109 109 123 109 122 132
- FSAVE | 160 172 359 359 366 377 467 504
- FRSTOR | 131 206 361 361 369 367 424 453
- FSTSW [mem] | 4 7 16 16 17 16 17 22
- FSTSW AX | 4 7 14 14 14 14 14 17
- FSTCW [mem] | 4 7 16 16 16 16 16 22
- FLDCW [mem] | 5 14 28 28 29 29 29 34
- FADD ST,ST(0) | 8 9 22 17 17 22 27 30
- FADD ST,ST(1) | 9 10 22 17 17 22 22 34
- FADD ST(1),ST | 10 10 22 17 17 22 23 35
- FADDP ST(1),ST | 11 11 22 17 17 22 23 34
- FADD [DWord] | 9 14 30 30 33 32 31 42
- FADD [QWord] | 9 16 40 40 43 42 41 51
- FIADD [Word] | 20 21 36 36 43 43 49 77
- FIADD [DWord] | 20 25 30 30 38 38 43 65
- FSUB ST(1),ST | 10 10 22 17 17 22 23 35
- FSUBR ST(1),ST | 9 10 22 17 20 25 27 35
- FSUBRP ST(1),ST | 10 10 22 17 17 22 23 35
- FSUB [DWord] | 11 14 30 30 32 32 30 41
- FSUB [QWord] | 11 16 40 40 42 43 40 51
- FISUB [Word] | 21 21 36 36 44 43 56 77
- FISUB [DWord] | 21 25 30 30 39 38 43 65
- FMUL ST,ST(1) | 16 17 22 22 22 27 38 56
- FMUL ST(1),ST | 16 17 22 22 22 27 40 60
- FMULP ST(1),ST | 16 17 22 22 22 27 38 59
- FIMUL [Word] | 22 23 36 36 50 43 50 77
- FIMUL [DWord] | 22 25 36 36 45 38 46 73
- FMUL [DWord] | 11 14 36 36 32 38 31 48
- FMUL [QWord] | 14 16 46 46 42 48 41 72
- FDIV ST,ST(0) | 73 74 38 23 52 57 92 95
- FDIV ST,ST(1) | 73 74 42 36 52 57 78 95
- FDIV ST(1),ST | 73 74 42 36 52 57 78 99
- FDIVR ST(1),ST | 73 74 42 36 53 57 77 100
- FDIVRP ST(1),ST | 73 74 42 36 52 57 78 101
- FIDIV [Word] | 84 85 61 54 79 73 105 144
- FIDIV [DWord] | 84 85 54 47 74 68 101 129
- FDIV [DWord] | 73 74 54 48 63 62 78 100
- FDIV [QWord] | 73 74 64 57 72 72 79 113
- FSQRT (0.0) | 26 28 17 17 17 22 27 35
- FSQRT (1.0) | 83 84 72 36 87 57 112 128
- FSQRT (L2T) | 86 87 72 36 87 57 102 133
- FXTRACT (L2T) | 17 17 22 17 32 76 56 68
- FSCALE (PI,5) | 30 31 22 36 47 77 57 80
- FRNDINT (PI) | 31 31 27 19 32 27 47 74
- FPREM (99,PI) | 58 60 102 52 57 52 77 100
- FPREM1(99,PI) | 90 91 102 57 62 52 102 119
- FCOM | 5 7 17 17 27 17 27 34
- FCOMP | 6 7 17 17 27 17 28 35
- FCOMPP | 7 8 17 17 27 22 28 34
- FICOM [Word] | 16 20 36 36 49 37 61 77
- FICOM [DWord] | 18 25 30 30 44 32 48 61
- FCOM [DWord] | 7 14 30 30 33 32 31 35
- FCOM [QWord] | 7 15 40 40 43 42 41 51
- FSIN (0.0) | 25 27 97 17 17 22 37 45
- FSIN (1.0) | 310 314 162 116 492 222 512 593
- FSIN (PI) | 88 90 187 121 67 217 132 155
- FSIN (LG2) | 284 288 84 73 445 184 434 505
- FSIN (L2T) | 299 303 177 121 472 217 452 533
- FCOS (0.0) | 25 27 157 17 22 22 37 44
- FCOS (1.0) | 302 306 107 87 487 212 457 540
- FCOS (PI) | 89 92 257 151 62 222 197 230
- FCOS (LG2) | 300 304 152 106 452 192 502 584
- FCOS (L2T) | 307 311 242 156 467 222 507 598
- FSINCOS (0.0) | 26 29 17 17 22 31 41 54
- FSINCOS (1.0) | 353 357 172 126 492 416 536 637
- FSINCOS (PI) | 105 107 262 161 67 421 226 273
- FSINCOS (LG2) | 340 344 157 116 457 361 531 628
- FSINCOS (L2T) | 347 351 247 166 472 421 536 643
- FPTAN (0.0) | 26 28 17 17 22 31 36 43
- FPTAN (1.0) | 267 269 147 121 537 306 322 392
- FPTAN (PI) | 145 146 227 136 112 306 167 212
- FPTAN (LG2) | 244 246 132 91 502 276 297 363
- FPTAN (L2T) | 247 249 217 136 517 306 297 363
- FPATAN (0.0) | 39 41 27 22 22 27 97 92
- FPATAN (1.0) | 294 298 157 121 372 602 358 433
- FPATAN (PI) | 304 307 192 143 357 422 378 468
- FPATAN (LG2) | 289 293 157 126 362 382 373 447
- FPATAN (L2T) | 304 307 192 141 362 422 373 463
- F2XM1 (0.0) | 26 28 17 17 17 22 37 38
- F2XM1 (LN2) | 209 212 122 86 392 287 297 348
- F2XM1 (LG2) | 204 207 107 76 377 287 292 340
- FYL2X (1.0) | 60 60 42 36 72 92 112 127
- FYL2X (PI) | 294 297 162 111 452 357 393 497
- FYL2X (LG2) | 311 314 162 106 457 337 408 512
- FYL2X (L2T) | 293 296 162 111 437 357 393 496
- FYL2XP1 (LG2) | 334 337 167 101 462 282 433 533
-
-
-
- 80386 + 80386 + 80386 +
- Intel Intel Franke387 TP 6.0 EM87
- 8087 80287 Emulator Emulator Emulator
-
- FSTP ST(0) | 26 54 507 358 2115
- FLD1 | 26 55 481 422 1626
- FLDZ | 21 53 480 416 1646
- FLDPI | 26 55 486 443 1626
- FLDLG2 | 26 56 486 423 1626
- FLDL2T | 26 55 486 440 1626
- FLDL2E | 26 53 486 423 1626
- FLDLN2 | 26 55 486 441 1626
- FLD ST(0) | 31 55 493 362 1851
- FST ST(1) | 26 54 489 355 1931
- FSTP ST(1) | 21 55 507 356 2116
- FLD ST(1) | 26 55 493 362 1852
- FXCH ST(1) | 21 57 497 486 2187
- FILD [Word] | 58 90 667 712 2259
- FILD [DWord] | 64 74 608 812 2164
- FILD [QWord] | 74 93 652 707 2971
- FLD [DWord] | 49 44 633 473 2077
- FLD [QWord] | 54 57 641 524 2336
- FLD [TByte] | 59 45 607 492 2063
- FBLD [TByte] | 309 310 2019 1512 17827
- FIST [Word] | 79 72 854 766 2418
- FIST [DWord] | 84 80 865 518 2325
- FST [DWord] | 89 85 686 441 2200
- FST [QWord] | 99 92 703 516 2481
- FISTP [Word] | 79 80 864 794 2620
- FISTP [DWord] | 79 81 879 541 2523
- FISTP [QWord] | 88 75 904 916 3226
- FSTP [DWord] | 89 75 713 467 2400
- FSTP [QWord] | 93 72 732 538 2678
- FSTP [TByte] | 49 21 685 467 2124
- FBSTP [TByte] | 528 472 3305 1555 27013
- FINIT | 11 10 742 641 1369
- FCLEX | 11 10 440 323 912
- FCHS | 21 54 460 354 1744
- FABS | 21 54 456 349 1738
- FXAM | 21 54 481 380 1551
- FTST | 51 75 585 386 2721
- FSTENV | 54 57 928 519 2104
- FLDENV | 48 50 1125 450 1631
- FSAVE | 214 244 1949 976 2749
- FRSTOR | 209 227 2182 657 2225
- FSTSW [mem] | 28 10 516 401 1189
- FSTSW AX | N/A 55 451 N/A N/A
- FSTCW [mem] | 28 10 506 359 1167
- FLDCW [mem] | 19 47 524 437 1584
- FADD ST,ST(0) | 86 128 643 706 2805
- FADD ST,ST(1) | 85 116 707 808 3093
- FADD ST(1),ST | 92 131 664 812 3146
- FADDP ST(1),ST | 92 129 704 799 3143
- FADD [DWord] | 105 122 874 969 3139
- FADD [QWord] | 115 122 888 1021 3396
- FIADD [Word] | 115 122 940 1211 3330
- FIADD [DWord] | 125 122 882 1297 3215
- FSUB ST(1),ST | 88 130 738 817 3156
- FSUBR ST(1),ST | 96 132 740 868 3004
- FSUBRP ST(1),ST | 99 132 733 805 3301
- FSUB [DWord] | 119 122 918 1018 3127
- FSUB [QWord] | 129 123 932 1070 3632
- FISUB [Word] | 115 123 977 1081 3802
- FISUB [DWord] | 125 125 940 980 4161
- FMUL ST,ST(1) | 145 151 810 1368 3924
- FMUL ST(1),ST | 145 151 817 1377 3962
- FMULP ST(1),ST | 148 168 840 1365 4164
- FIMUL [Word] | 132 151 1039 1517 4039
- FIMUL [DWord] | 141 151 980 1643 3976
- FMUL [DWord] | 125 123 948 1480 3445
- FMUL [QWord] | 175 192 991 1602 4416
- FDIV ST,ST(0) | 201 207 726 1536 9789
- FDIV ST,ST(1) | 203 218 808 1658 10332
- FDIV ST(1),ST | 207 214 825 1655 10342
- FDIVR ST(1),ST | 201 206 819 1806 10213
- FDIVRP ST(1),ST | 201 205 845 1803 10409
- FIDIV [Word] | 237 227 980 1779 11225
- FIDIV [DWord] | 246 227 944 1680 11572
- FDIV [DWord] | 229 226 893 1722 10577
- FDIV [QWord] | 236 227 993 1777 10829
- FSQRT (0.0) | 21 57 512 382 1755
- FSQRT (1.0) | 186 206 1106 2504 37836
- FSQRT (L2T) | 186 207 1398 2467 37925
- FXTRACT (L2T) | 51 56 726 571 3326
- FSCALE (PI,5) | 41 56 817 443 3194
- FRNDINT (PI) | 51 58 808 800 7092
- FPREM (99,PI) | 81 131 1696 941 4098
- FPREM1(99,PI) | N/A N/A 1625 N/A N/A
- FCOM | 56 75 582 483 2799
- FCOMP | 61 92 616 485 2983
- FCOMPP | 61 90 661 476 3198
- FICOM [Word] | 79 77 808 861 3654
- FICOM [DWord] | 89 77 750 964 3684
- FCOM [DWord] | 74 75 741 625 3643
- FCOM [QWord] | 74 76 754 667 3771
- FSIN (0.0) | N/A N/A 639 N/A N/A
- FSIN (1.0) | N/A N/A 4640 N/A N/A
- FSIN (PI) | N/A N/A 2488 N/A N/A
- FSIN (LG2) | N/A N/A 3911 N/A N/A
- FSIN (L2T) | N/A N/A 3767 N/A N/A
- FCOS (0.0) | N/A N/A 740 N/A N/A
- FCOS (1.0) | N/A N/A 4777 N/A N/A
- FCOS (PI) | N/A N/A 2557 N/A N/A
- FCOS (LG2) | N/A N/A 4176 N/A N/A
- FCOS (L2T) | N/A N/A 3905 N/A N/A
- FSINCOS (0.0) | N/A N/A 714 N/A N/A
- FSINCOS (1.0) | N/A N/A 6049 N/A N/A
- FSINCOS (PI) | N/A N/A 4091 N/A N/A
- FSINCOS (LG2) | N/A N/A 5640 N/A N/A
- FSINCOS (L2T) | N/A N/A 5405 N/A N/A
- FPTAN (0.0) | 41 58 752 8381 2324
- FPTAN (1.0) | 581 582 6366 10817 29824
- FPTAN (PI) | 606 587 4388 12410 2300
- FPTAN (LG2) | 516 513 5939 12502 26770
- FPTAN (L2T) | 576 586 5723 12483 2301
- FPATAN (0.0) | 41 55 616 1208 10578
- FPATAN (1.0) | 736 736 1426 13446 34208
- FPATAN (PI) | 206 207 12835 13305 46903
- FPATAN (LG2) | 756 736 12490 13319 41312
- FPATAN (L2T) | 206 204 12922 13364 50149
- F2XM1 (0.0) | 16 56 563 723 1722
- F2XM1 (LN2) | 631 624 4178 11070 33823
- F2XM1 (LG2) | 611 585 4798 11116 32163
- FYL2X (1.0) | 56 57 961 1214 4327
- FYL2X (PI) | 946 961 8987 12858 40148
- FYL2X (LG2) | 1081 1038 8933 12748 46821
- FYL2X (L2T) | 926 886 8982 12712 38986
- FYL2XP1 (LG2) | 1026 1037 10485 11867 44708
-
- The Weitek 3167 and 4167 processors only implement the basic
- arithmetic functions (add, subtract, multiply, divide, square
- root) in hardware. Transcendental functions are implemented
- by means of a software library supplied by Weitek that uses
- the Weitek hardware to approximate the transcendental functions
- with polynomial and rational approximations. The clock cycle
- timings for the transcendental functions are average values,
- since execution time differs with the value of the argument. The
- speed of transcendental functions for the 4167 is estimated
- based on the numbers in [31,33], from which this timing
- information has been extracted.
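
To illustrate how a transcendental function can be built from nothing but the basic arithmetic operations, here is a minimal Python sketch of a polynomial approximation to sin evaluated with Horner's scheme. It is not Weitek's library: the Taylor coefficients and the name sin_poly are assumptions made for illustration, and a production library would more likely use minimax coefficients together with argument reduction.

```python
import math

# Taylor coefficients of sin(x)/x in powers of x*x, up to the x**11 term.
# Illustrative only; these are not the coefficients used by any vendor.
SIN_COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(6)]


def sin_poly(x):
    """Approximate sin(x) for |x| <= pi/4 using only multiplies and adds."""
    x2 = x * x
    acc = 0.0
    # Horner's scheme: one multiply and one add per coefficient.
    for c in reversed(SIN_COEFFS):
        acc = acc * x2 + c
    return acc * x


for x in (0.0, 0.5, math.pi / 4):
    print(f"x={x:.6f}  poly={sin_poly(x):.15f}  libm={math.sin(x):.15f}")
```

Each loop iteration costs one multiply and one add, which is why the cycle counts of such routines scale with the speed of the basic arithmetic hardware.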
-
-
- Execution time for floating-point operations in clock cycles on
- Weitek coprocessors
-
-                  Single Precision    Double Precision
-
-                   3167     4167       3167     4167
-
- ABS                  3        2          3        2
- NEG                  6        2          6        2
- ADD                  6        2          6        2
- SUB                  6        2          6        2
- SUBR                 6        2          6        2
- MUL                  6        2         10        3
- DIVR                38       17         66       31
- SQRT                60       17        118       31
- SIN                146      ~50        292     ~100
- COS                140      ~50        285     ~100
- TAN                188      ~60        340     ~110
- EXP                179      ~60        401     ~130
- LOG                171      ~60        365     ~120
- F->ASCII          1000      N/A       1700      N/A   //
- ASCII->F          1100      N/A       1800      N/A   //
-
- // rough average of the timings given for different numeric
- formats by Weitek. Note that these conversion routines
- do much more work than the FBLD and FBSTP instructions
- provided by the 80x87 coprocessors. FBLD and FBSTP are
- useful for conversion routines but quite a bit of additional
- code is needed for this purpose.
-
-
- Accuracy
-
- The IEEE-754 Standard for Binary Floating-Point Arithmetic [10,11]
- is fully implemented by Intel's 387 coprocessor [17]. Among other
- things, this means that the add, subtract, multiply, divide,
- remainder, and square root operations always deliver the 'exact'
- result. By exact it is meant that the coprocessor always delivers
- the machine number closest to the real result, which may not
- be representable exactly in the available numeric format. The
- 80387 implements the single, double, and double extended formats
- as specified in the standard as well as all functions required
- by it [17]. Note that earlier Intel coprocessors (the 8087 and
- the 80287) comply with a draft version of the standard that differs
- from the final version. These chips came out before the IEEE-754
- standard was finally accepted in 1985. As in the 80387, the basic
- arithmetic in the 8087 and the 80287 is exact in the sense that
- the computed result is always the machine number closest to the
- real result. However, there are some differences regarding certain
- operands like infinities, and some operations like the remainder are
- defined differently. Some instructions have been added in the 80387,
- most notably the FSIN and FCOS operations. The argument range for
- some transcendental functions has been extended [17]. Note that the
- IEEE-754 standard says nothing about the quality of the implementation
- of transcendental functions like sin, cos, tan, arctan, log. Intel
- uses a modified CORDIC [18,19] technique to compute the transcendental
- functions. Intel claims that the maximum error in the 8087, 80287, and
- 80387 for all transcendental functions does not exceed two bits
- in the mantissa of the double extended format, which features 64
- mantissa bits for an accuracy of approximately 19 decimal places
- [22,23]. This claim has been independently verified by a competing
- vendor [13]. This means that at least 62 of the 64 mantissa bits
- in a transcendental function result are correct.
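
The arithmetic behind these figures can be checked with a few lines of Python. This is only a worked calculation based on the numbers quoted above (64 mantissa bits, at most two bits in error); the variable names are chosen here for illustration.

```python
import math

MANTISSA_BITS = 64    # double extended format, as stated in the text
MAX_ERROR_BITS = 2    # Intel's claimed worst case for transcendental functions

# Each mantissa bit is worth log10(2) decimal digits.
decimal_digits = MANTISSA_BITS * math.log10(2)
correct_bits = MANTISSA_BITS - MAX_ERROR_BITS
correct_digits = correct_bits * math.log10(2)

print(f"{MANTISSA_BITS} bits are about {decimal_digits:.1f} decimal digits")
print(f"{correct_bits} correct bits are about {correct_digits:.1f} decimal digits")
```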
-
- The Weitek Abacus 3167 and 4167 are 'mostly compatible' with
- IEEE-754 [31,32,33]. They support the single precision and double
- precision numeric formats described in the standard as
- well as the four rounding modes required by it. However, due to
- the need for extremely high speed operation, some of the finer
- points of IEEE-754 have not been implemented. One of the most
- notable omissions is the missing support for denormal numbers.
- Denormals are always flushed to zero.
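
What flushing denormals to zero means can be modelled in software. The Python sketch below rounds a value to IEEE single precision via the struct module and then replaces any denormal result by zero. It is a model of the behaviour described above, not Weitek code; the function names are hypothetical, and keeping the sign of the flushed zero is an assumption.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126    # smallest positive normal single precision value


def to_single(x):
    """Round a Python float (a double) to IEEE single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_to_zero(x):
    """Model a unit without denormal support: denormal results become zero."""
    y = to_single(x)
    if y != 0.0 and abs(y) < FLT_MIN_NORMAL:
        return math.copysign(0.0, y)    # sign preservation is an assumption
    return y


tiny = 4.60943e-41    # a value in the single precision denormal range
print("with denormals:   ", to_single(tiny))
print("flushed to zero:  ", flush_to_zero(tiny))
print("normal unaffected:", flush_to_zero(1.5))
```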
-
- The 387 clone makers claim 100% compatibility with Intel's 80387.
- So one would expect the same accuracy from their chips. For example,
- on the packaging of the IIT 3C87 it says that ".. the requirements
- of ANSI/IEEE standards are fulfilled and exceeded". Cyrix states
- that their 83D87 complies fully with the IEEE-754 standard [12].
- Cyrix delivers some diagnostic software with their coprocessors.
- This includes the program IEEETEST which is based on the IEEE test
- vectors from the Ph.D. thesis of Jerome T. Coonen [9]. A test using
- the IEEE test vectors has also been included into the RUNDIAG
- program on the Intel RapidCAD diagnostic disk. Rather than performing
- random tests, the test vectors check specific cases that may
- be hard to get right. Each test vector specifies the operation
- to be performed, the operands, precision and rounding mode to be
- used, and the result (including flags set) to be expected according
- to IEEE-754. I ran IEEETEST on all the available coprocessors/FPUs.
- The Intel 486, Intel RapidCAD, Intel 387, Intel 387DX, Cyrix 83D87,
- and the Cyrix 387+ passed with no errors. The ULSI 83C87 showed
- some minor flaws in the FCOM, FDIV, FMUL, and FSCALE operations,
- getting flag errors in about 1% of the tested cases, but no
- computational errors. However, for the IIT 3C87, the IEEETEST
- program showed flag *and* some computational errors (that is, wrong
- results) for all tested operations except FXTRACT and FCHS. The Intel
- 80287 shows numerous errors, but this is not surprising, since the
- 80287 does not comply with IEEE-754 but with an earlier draft of that
- standard, so it does some things differently than required by the final
- version of the standard.
-
- Although IEEETEST is written in Turbo Pascal, the coprocessor
- emulator in the TP 6.0 library could not be tested since IEEETEST
- was compiled with the $E- switch excluding the emulator from
- program code. The public domain emulator EM87 could be tested, but
- hung in the last test which checks the implementation of the
- remainder operation. This is probably caused by some bug in the
- emulation of the FPREM instruction tested in this test. It is
- interesting to note how the error profile of EM87 matches exactly
- that of the Intel 80287, so it can be assumed that EM87 is a very
- good emulation of the 80287. The Franke387 V2.4 emulator hung in
- the division test quite early in IEEETEST. The tests performed
- up to the division test reported several errors.
-
-
-
- Explanatory text printed at the start of the IEEETEST program:
-
- JT Coonen's 1984 UC Berkeley Ph.D. thesis centers around his
- activities as a member of the floating-point working group that
- defined the IEEE 754-1985 Standard for Binary Floating-Point
- Arithmetic. Appendix C of his thesis presents FPTEST, a Pascal
- program written by J Thomas and JT Coonen. IEEETEST is a port of
- FPTEST and runs on PCs whose math coprocessor accepts 80387
- compatible floating-point instructions.
-
- IEEETEST reads test vectors from the file TESTVECS and compares
- the answer returned by the math coprocessor with the answer listed
- in the test vector. If these answers differ an 'F' is displayed,
- otherwise a '.'is displayed. Answers can differ due to two types
- of failures: numeric failures or flag failures. Numeric failures
- occur when the computed answer has the wrong value. Flag failures
- occur when the status (invalid operation, divide by zero, underflow,
- overflow, inexact) is incorrectly identified.
-
- TESTVECS is the concatenation of unmodified versions of all the
- test vectors distributed by UC Berkeley. The test data base is
- copyrighted by UC Berkeley (1985) and is being distributed with
- their permission. FPTEST and the test data base can be obtained
- by asking for 'IEEE-754 Test Vector' from UC Berkeley, Electrical
- Engineering and Computer Science, Industrial Liaison Program,
- 479 Corey Hall, Berkeley, CA, 94720 (415)643-6687.
-
- The initial version of this test data base for the proposed IEEE
- 754 binary floating-point standard (draft 8.0) was developed for
- Zilog, Inc. and was donated to the floating-point working group
- for dissemination. Errors in or additions to the distributed data
- base should be reported to the agency of distribution, with copies
- to Zilog, Inc., 1315 Dell Avenue, Campbell, CA, 95008.
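
The distinction between numeric failures and flag failures can be sketched in Python. Since Python offers no portable access to the status flags of the binary FPU, the sketch uses the decimal module as a stand-in, because it records the same five exception flags. The record layout (operation, operands, expected value, expected flags) and the name check_vector are hypothetical and do not reflect the actual TESTVECS format.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

# The five status flags named in the IEEETEST text above.
CHECKED_FLAGS = (InvalidOperation, DivisionByZero, Underflow, Overflow, Inexact)


def check_vector(op, a, b, expected_value, expected_flags, prec=7):
    """Run one hypothetical test vector and return the list of failure kinds."""
    ctx = Context(prec=prec, traps=[])    # no traps: flags are only recorded
    result = getattr(ctx, op)(Decimal(a), Decimal(b))
    raised = {flag for flag in CHECKED_FLAGS if ctx.flags[flag]}
    failures = []
    if str(result) != expected_value:
        failures.append("numeric")        # wrong value computed
    if raised != set(expected_flags):
        failures.append("flag")           # status incorrectly identified
    return failures


vectors = [
    ("divide", "1", "3", "0.3333333", [Inexact]),
    ("divide", "1", "0", "Infinity", [DivisionByZero]),
    ("add", "1", "2", "3", []),
]
# Print '.' for a pass and 'F' for a failure, as IEEETEST does.
print("".join("F" if check_vector(*v) else "." for v in vectors))
```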
-
-
- IEEETEST output for Intel 80387, Intel 387DX, Intel 486,
- Cyrix 83D87, Cyrix 387+, RapidCAD
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 216 0 | 0 0 0 | 0 0 0
- Addition + | 3528 0 | 0 0 0 | 0 0 0
- Comparison C | 4320 0 | 0 0 0 | 0 0 0
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 4311 0 | 0 0 0 | 0 0 0
- Fraction Part F | 624 0 | 0 0 0 | 0 0 0
- Logb L | 960 0 | 0 0 0 | 0 0 0
- Multiplication * | 3978 0 | 0 0 0 | 0 0 0
- Negation - | 216 0 | 0 0 0 | 0 0 0
- Next After N | 2832 0 | 0 0 0 | 0 0 0
- Round to Integer I | 558 0 | 0 0 0 | 0 0 0
- Scalb S | 948 0 | 0 0 0 | 0 0 0
- Square Root V | 744 0 | 0 0 0 | 0 0 0
- Subtraction - | 3528 0 | 0 0 0 | 0 0 0
- Remainder % | 2984 0 | 0 0 0 | 0 0 0
- Totals | 31235 0 |
-
-
- IEEETEST output for ULSI 83C87
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 216 0 | 0 0 0 | 0 0 0
- Addition + | 3528 0 | 0 0 0 | 0 0 0
- Comparison C | 4312 8 | 0 0 0 | 0 0 8
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 4250 61 | 0 0 0 | 28 28 5
- Fraction Part F | 624 0 | 0 0 0 | 0 0 0
- Logb L | 960 0 | 0 0 0 | 0 0 0
- Multiplication * | 3936 42 | 0 0 0 | 19 19 4
- Negation - | 216 0 | 0 0 0 | 0 0 0
- Next After N | 2828 4 | 0 0 0 | 0 0 4
- Round to Integer I | 558 0 | 0 0 0 | 0 0 0
- Scalb S | 930 18 | 0 0 0 | 6 6 6
- Square Root V | 744 0 | 0 0 0 | 0 0 0
- Subtraction - | 3528 0 | 0 0 0 | 0 0 0
- Remainder % | 2984 0 | 0 0 0 | 0 0 0
- Totals | 31102 133 |
-
-
- IEEETEST output for IIT 3C87
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 200 16 | 0 0 16 | 0 0 0
- Addition + | 3336 192 | 0 0 128 | 0 0 96
- Comparison C | 4224 96 | 0 0 96 | 0 0 0
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 4159 152 | 0 0 124 | 0 0 116
- Fraction Part F | 600 24 | 0 0 24 | 0 0 24
- Logb L | 960 0 | 0 0 0 | 0 0 0
- Multiplication * | 3702 276 | 0 0 248 | 0 0 100
- Negation - | 200 16 | 0 0 16 | 0 0 0
- Next After N | 2248 584 | 0 0 584 | 0 0 168
- Round to Integer I | 542 16 | 0 0 4 | 0 0 16
- Scalb S | 874 74 | 5 5 44 | 8 8 20
- Square Root V | 688 56 | 0 0 56 | 0 0 56
- Subtraction - | 3336 192 | 0 0 128 | 0 0 96
- Remainder % | 2844 140 | 0 0 140 | 0 0 116
- Totals | 29401 1834 |
-
-
- IEEETEST output for Intel 80287 run together with an 80386 CPU
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 216 0 | 0 0 0 | 0 0 0
- Addition + | 2886 642 | 16 16 112 | 174 174 174
- Comparison C | 0 4320 | 1324 1324 1324 |1332 1332 1332
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 3777 534 | 18 18 37 | 169 169 165
- Fraction Part F | 552 72 | 24 24 24 | 24 24 24
- Logb L | 900 60 | 12 12 12 | 20 20 20
- Multiplication * | 2944 1034 | 105 105 197 | 303 303 231
- Negation - | 216 0 | 0 0 0 | 0 0 0
- Next After N | 348 2484 | 768 768 768 | 504 504 526
- Round to Integer I | 546 12 | 0 0 0 | 4 4 4
- Scalb S | 663 285 | 45 43 26 | 102 98 46
- Square Root V | 720 24 | 4 4 4 | 8 8 8
- Subtraction - | 2886 642 | 16 16 112 | 174 174 174
- Remainder % | 708 2276 | 768 768 560 | 216 216 216
- Totals | 18850 12385 |
-
-
- IEEETEST output for EM87 coprocessor emulator run on an Intel 386 CPU
-
- IEEE-754 Test Vector Precisions: S=Single D=Double E=Double Extended
- | TESTS | numeric TYPE OF FAILURE flag
- Operation Code | Passed Failed | S D E | S D E
- ----------------------------------------------------------------------
- Absolute Value A | 216 0 | 0 0 0 | 0 0 0
- Addition + | 2886 642 | 16 16 112 | 174 174 174
- Comparison C | 0 4320 | 1324 1324 1324 |1332 1332 1332
- Copy Sign @ | 1488 0 | 0 0 0 | 0 0 0
- Division / | 3777 534 | 18 18 37 | 169 169 165
- Fraction Part F | 552 72 | 24 24 24 | 24 24 24
- Logb L | 900 60 | 12 12 12 | 20 20 20
- Multiplication * | 2944 1034 | 105 105 197 | 303 303 231
- Negation - | 216 0 | 0 0 0 | 0 0 0
- Next After N | 348 2484 | 768 768 768 | 504 504 526
- Round to Integer I | 546 12 | 0 0 0 | 4 4 4
- Scalb S | 663 285 | 45 43 26 | 102 98 46
- Square Root V | 720 24 | 4 4 4 | 8 8 8
- Subtraction - | 2886 642 | 16 16 112 | 174 174 174
-
-
- To complement the checks done by IEEETEST I wrote some short
- programs DENORMTS, RCTRL, PCTRL in Turbo Pascal 6.0 that test
- the following features:
-
- 1. support for denormals in all precisions (single, double, extended)
- 2. support for the four IEEE rounding modes (up, down, nearest, chop)
- 3. support for precision control
-
- Note that 1) and 2) are required for IEEE conformance, while 3)
- is required for compatibility with Intel's coprocessors. Precision
- control forces the results of the FADD, FSUB, FMUL, FDIV, and FSQRT
- instructions to be rounded to the specified precision (single, double,
- double extended). This feature is provided to obtain compatibility
- with certain programming languages [17]. By specifying lower
- precision, one effectively nullifies the advantages of extended
- precision intermediate results. The programs that test precision
- control and rounding control are designed to return a different
- result for each of the modes for the same sequence of operations.
- The source code of the programs can be found in appendix A. The
- Intel 8087 and 80287 were not tested with DENORMTS since Turbo
- Pascal does not support extended precision denormals on 8087/80287
- processors, so the denormal test fails anyway. The 8087 and 287
- pass the RCTRL and PCTRL tests, though.
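
The effect of precision control and rounding control can be illustrated without an FPU by rounding exact rational numbers to a given mantissa width. The Python sketch below is not one of the programs from appendix A; the name round_to_bits and the test value 1/10 are chosen here for illustration, and the exponent range is treated as unbounded, so overflow, underflow and denormals are not modelled.

```python
import math
from fractions import Fraction

# The four IEEE rounding modes, applied to an exactly scaled rational.
ROUNDERS = {
    "nearest": round,      # Fraction rounds halfway cases to even
    "down": math.floor,    # toward minus infinity
    "up": math.ceil,       # toward plus infinity
    "chop": math.trunc,    # toward zero
}


def round_to_bits(x, bits, mode="nearest"):
    """Round exact rational x to a mantissa of `bits` bits."""
    x = Fraction(x)
    if x == 0:
        return x
    mag = abs(x)
    # Find e such that 2**(bits-1) <= |x| / 2**e < 2**bits.
    e = mag.numerator.bit_length() - mag.denominator.bit_length() - bits
    while mag / Fraction(2) ** e >= 2 ** bits:
        e += 1
    while mag / Fraction(2) ** e < 2 ** (bits - 1):
        e -= 1
    n = ROUNDERS[mode](x / Fraction(2) ** e)
    return n * Fraction(2) ** e


x = Fraction(1, 10)
for name, bits in (("SINGLE", 24), ("DOUBLE", 53), ("EXTENDED", 64)):
    print(f"{name:9s}", float(round_to_bits(x, bits) - x))   # rounding error
for mode in ROUNDERS:
    print(f"{mode:9s}", float(round_to_bits(-x, 24, mode) + x))
```

A single rounding can only separate three of the four modes, because chop coincides with down for positive values and with up for negative ones. A test that returns a different result for each mode therefore needs a sequence of operations.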
-
-
- These are the results for the Intel 387, Intel 387DX, Intel 486,
- Intel RapidCAD, Cyrix 83D87, Cyrix 387+, and the EM87 emulator
- (on an 80386 machine)
-
- Precision Control SINGLE 1.13311278820037842E+0000
- DOUBLE 1.23456789006442125E+0000
- EXTENDED 1.23456789012337585E+0000
-
- Rounding Control NEAREST -1.23427629010100635E+0100
- DOWN -1.23427623555772409E+0100
- UP -1.23457760966801097E+0100
- CHOP -1.23397493540770643E+0100
-
- Denormal support
-
- SINGLE denormals supported
- SINGLE denormal prints as: 4.60943116855005E-0041
- Denormal should be printed as 4.60943...E-0041
-
- DOUBLE denormals supported
- DOUBLE denormal prints as: 8.75000000000016E-0311
- Denormal should be printed as 8.75...E-0311
-
- EXTENDED denormals supported
- EXTENDED denormal prints as: 1.31640625000000E-4934
- Denormal should be printed as 1.3164...E-4934
-
-
- These are the results for the ULSI 83C87
-
- Precision Control SINGLE 1.23456789012337585E+0000
- DOUBLE 1.23456789012337585E+0000
- EXTENDED 1.23456789012337585E+0000
-
- Rounding Control NEAREST -1.23427629010100635E+0100
- DOWN -1.23427623555772409E+0100
- UP -1.23457760966801097E+0100
- CHOP -1.23397493540770643E+0100
-
- Denormal support
-
- SINGLE denormals supported
- SINGLE denormal prints as: 4.60943116855005E-0041
- Denormal should be printed as 4.60943...E-0041
-
- DOUBLE denormals supported
- DOUBLE denormal prints as: 8.75000000000016E-0311
- Denormal should be printed as 8.75...E-0311
-
- EXTENDED denormals supported
- EXTENDED denormal prints as: 1.31640625000000E-4934
- Denormal should be printed as 1.3164...E-4934
-
-
- These are the results for the IIT 3C87
-
- Precision Control SINGLE 1.13311278820037842E+0000
- DOUBLE 1.23456789006442125E+0000
- EXTENDED 1.23456789012337585E+0000
-
- Rounding Control NEAREST -1.23427629010100635E+0100
- DOWN -1.23427623555772409E+0100
- UP -1.23457760966801097E+0100
- CHOP -1.23397493540770643E+0100
-
- Denormal support
-
- SINGLE denormals supported
- SINGLE denormal prints as: 4.60943116855005E-0041
- Denormal should be printed as 4.60943...E-0041
-
- DOUBLE denormals supported
- DOUBLE denormal prints as: 8.75000000000016E-0311
- Denormal should be printed as 8.75...E-0311
-
- EXTENDED denormals not supported
-
-
- These are the results for the TP 6.0 coprocessor emulator:
-
- Precision Control SINGLE 1.23456789012351396E+0000
- DOUBLE 1.23456789012351396E+0000
- EXTENDED 1.23456789012351396E+0000
-
- Rounding Control NEAREST -1.23457766383395931E+0100
- DOWN -1.23457766383395931E+0100
- UP -1.23457766383395931E+0100
- CHOP -1.23457766383395931E+0100
-
- Denormal support
-
- SINGLE denormals not supported
- DOUBLE denormals not supported
- EXTENDED denormals not supported
-
-
- The test results show that the IIT 3C87 does not conform to the
- IEEE-754 floating-point standard in that it does not support
- denormals in double extended precision. The ULSI 83C87 is not
- Intel 387 compatible in that it does not support precision control,
- but always uses double extended precision. The TP 6.0 emulator
- supports neither precision control nor rounding control, nor does it
- support any denormals. In addition, its basic arithmetic operations
- do not seem to conform to the IEEE standard, as the results of
- the test programs differ from every result computed by a
- coprocessor in any mode.
-
-
- With regard to the accuracy of transcendental functions, Cyrix
- claims that the relative error of the transcendental functions
- on the 83D87 never exceeds 0.5 units in the last place (0.5 ULP)
- of the double extended format [13]. This means that the maximum
- relative error is below 2**-64, while Intel's published error
- limit is 2**-62. While Intel uses a modified CORDIC algorithm
- [18,19] to compute the transcendental functions, Cyrix uses
- rational approximations that utilize a very fast array multiplier.
- For an explanation why this approach is superior to CORDIC with
- today's technology, see [61]. Also, Cyrix uses an internal 75 bit
- data path for the mantissa [15], so intermediate computations in
- the generation of transcendental function values will enjoy some
- additional accuracy over the 64 bits provided by the double
- extended format. Using 75 mantissa bits also provides an advantage
- over other coprocessors like the Intel 387DX and ULSI 83C87 which
- use only a 68 bit data path for the mantissa [58,59]. Note that a
- maximum relative error of 0.5 ULP for the Cyrix coprocessor does
- not mean that it returns the 'exact' result (machine number closest
- to infinitely precise result) all the time. Just consider the case
- where the infinitely precise result of a transcendental function
- falls nearly half way between two machine numbers. A relative error
- of 0.5 ULP can cause the result to be either of the numbers after
- rounding, depending on the direction of the error. But the 83D87
- should deliver results that never differ from the 'exact' result
- by more than one ULP. Cyrix also claims that its transcendental
- functions satisfy the monotonicity criterion [13], a claim not
- made by any of the competitors. Monotonicity means that for all
- x1 > x2, it always follows that f(x1) >= f(x2) for an increasing
- function like sin on [0..pi/4]. Likewise, for a decreasing
- function like cos on [0..pi/4], for all x1 > x2, it follows that
- f(x1) <= f(x2).
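-
- The monotonicity criterion can be spot-checked in software. The
- following Python sketch is only an illustration: is_monotonic is a
- hypothetical helper, it exercises the host math library in double
- precision rather than the coprocessor instructions, and it samples
- evenly spaced points. A rigorous test would have to compare the
- function values at adjacent machine numbers.
-

```python
import math

def is_monotonic(f, lo, hi, steps=1000, increasing=True):
    """Spot-check monotonicity of f on [lo, hi] at steps+1 evenly spaced points."""
    prev = f(lo)
    for i in range(1, steps + 1):
        cur = f(lo + (hi - lo) * i / steps)
        # For x1 > x2 an increasing f needs f(x1) >= f(x2);
        # a decreasing f needs f(x1) <= f(x2).
        violated = cur < prev if increasing else cur > prev
        if violated:
            return False
        prev = cur
    return True

print(is_monotonic(math.sin, 0.0, math.pi / 4))                    # sin increases here
print(is_monotonic(math.cos, 0.0, math.pi / 4, increasing=False))  # cos decreases here
```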
-
- The Weitek Abacus 3167 and 4167 implement only the basic arithmetic
- operations (add, subtract, negate, multiply, divide, square root)
- in hardware. Transcendental functions are provided via a software
- library provided by Weitek. For these library functions Weitek
- claims a maximum relative error of 5 ULPs [31,33] (ULP = Unit in
- the Last Place, numeric weight of the least significant mantissa
- bit). This means that the last three bits in the mantissa of a
- double precision result can be wrong. Note that the Intel 387 and
- compatible math coprocessors generate the transcendental functions
- with a small relative error with regard to the _extended double
- precision_ format. Thus, when rounded to double precision, their
- function values are nearly always 'exact'. 387 type coprocessors
- have superior accuracy when compared with Weitek's coprocessors.
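-
- The arithmetic behind both statements can be sketched in a few
- lines of Python. The helper names are hypothetical, and two
- simplifying assumptions are made: carry propagation is ignored
- when counting the affected bits, and the infinitely precise
- results are taken to be uniformly distributed between neighbouring
- double precision numbers. Under these assumptions an error of e
- extended precision ULPs can change the rounded double precision
- value for at most e out of every 2**(64-53) = 2048 results.
-

```python
from fractions import Fraction

EXTENDED_BITS = 64   # mantissa bits, double extended format
DOUBLE_BITS = 53     # mantissa bits, double format (including the implicit bit)

def low_bits_affected(max_ulps):
    """Width in bits of the largest error, i.e. how many trailing bits it spans."""
    return int(max_ulps).bit_length()

def misrounding_bound(ext_ulps):
    """Upper bound on the fraction of results rounded to the wrong double."""
    ext_ulps_per_double_ulp = 2 ** (EXTENDED_BITS - DOUBLE_BITS)   # 2048
    return Fraction(ext_ulps, ext_ulps_per_double_ulp)

print(low_bits_affected(5))           # bits spanned by a 5 ULP error
print(float(misrounding_bound(1)))    # 1 extended ULP of error
print(float(misrounding_bound(3)))    # 3 extended ULPs of error
```

- A 5 ULP error spans three bits, and an error of one extended
- precision ULP affects the double precision result in at most one
- case out of 2048, or about 0.05%.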
-
- The test diskette distributed with early versions of the
- Cyrix 83D87 contained a program TRANCK that checks the
- accuracy of the transcendental functions in the coprocessor
- against a more precise software arithmetic [16]. I used this
- program to compare the accuracy of the transcendental functions
- on those 287/387/486 coprocessors/FPUs available to me. As TRANCK
- will not accept negative numbers as interval limits, I tested
- each function on an interval along the positive x-axis. The
- functions tested are F2XM1 (2**x-1), FSIN (sine), FCOS (cosine),
- FPTAN (tangent), FPATAN (arctangent), FYL2X (y * log2 (x)),
- and FYL2XP1 (y * log2 (x+1)). These are all the transcendental
- functions implemented on the 80387. Note that the square root
- (FSQRT) is *not* a transcendental function. For every function,
- 100,000 arguments were evaluated. The arguments were uniformly
- distributed within the interval tested. The EM87 emulator could
- not be checked with TRANCK, since the multiple precision package
- in TRANCK would always return with an error message immediately.
- However, the Franke387 emulator could be tested.
-
-
- Test results for accuracy of transcendental functions for double
- extended precision as returned by the program TRANCK. 100,000
- trials per function.
-
- %wrong       is the percentage of results that differ from the 'exact'
-              result (infinitely precise result rounded to 64 bits).
- ULP_hi       is the number of results where the returned result was
-              greater than the 'exact' (correctly rounded) result by
-              one ULP (the numeric weight of the last mantissa bit,
-              2**-64 to 2**-63 depending on the size of the number).
- ULPs_hi      is the number of results where the returned result was
-              greater than the 'exact' result by two or more ULPs.
- ULP_lo       is the number of results where the returned result was
-              smaller than the 'exact' (correctly rounded) result by
-              one ULP (the numeric weight of the last mantissa bit,
-              2**-64 to 2**-63 depending on the size of the number).
- ULPs_lo      is the number of results where the returned result was
-              smaller than the 'exact' result by two or more ULPs.
- max ULP err  is the maximum deviation of a returned result from the
-              'exact' answer expressed in ULPs.
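-
- How such a tally can be produced is sketched below in Python. This
- is not TRANCK itself, whose internals are not described here, but
- a minimal imitation under stated assumptions: it works in double
- rather than double extended precision, it takes a 40 digit decimal
- computation as the 'infinitely precise' reference, and the names
- tally and exact_log2 are hypothetical. It needs Python 3.9 or later
- for math.ulp.
-

```python
import math
import random
from decimal import Decimal, getcontext

getcontext().prec = 40      # assumption: 40 digits stand in for "infinitely precise"
LN2 = Decimal(2).ln()

def exact_log2(x):
    """Reference log2(x): computed with 40 digits, then rounded to double."""
    return float(Decimal(x).ln() / LN2)

def tally(func, reference, args):
    """Classify the results of func against the reference, one row of the table."""
    t = {"ULP_hi": 0, "ULPs_hi": 0, "ULP_lo": 0, "ULPs_lo": 0, "max_ULP_err": 0}
    for x in args:
        exact = reference(x)
        # Signed error in units of the last place of the reference result.
        err = round((func(x) - exact) / math.ulp(exact))
        if err == 1:
            t["ULP_hi"] += 1
        elif err > 1:
            t["ULPs_hi"] += 1
        elif err == -1:
            t["ULP_lo"] += 1
        elif err < -1:
            t["ULPs_lo"] += 1
        t["max_ULP_err"] = max(t["max_ULP_err"], abs(err))
    wrong = t["ULP_hi"] + t["ULPs_hi"] + t["ULP_lo"] + t["ULPs_lo"]
    t["%wrong"] = 100.0 * wrong / len(args)
    return t

# Uniformly distributed arguments on the interval used for YL2X (with y = 1).
rng = random.Random(1992)
args = [rng.uniform(0.1, 10.0) for _ in range(1000)]
print(tally(math.log2, exact_log2, args))
```

- The %wrong column is simply the sum of the four counts divided by
- the number of trials. For example, the Franke387 SIN row gives
- (25301 + 708 + 13029 + 4) / 100,000 = 39.042%.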
-
-
- Franke387 V2.4 emulator
-                                                                  max
- funct.  interval     %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
-
- SIN     0,pi/4       39.042   25301      708   13029        4        2
- COS     0,pi/4       75.714   49827    25887       0        0        3
- TAN     0,pi/4       76.976   14230    10029   24323    28394        9
- ATAN    0,1          55.826   26028     1529   24044     4225        4
- 2XM1    0,0.5        96.717       0        0   47910    48807        5
- YL2XP1  0,sqrt(2)-1  93.007     578        9   27416    65004        8
- YL2X    0.1,10       62.252   16817     4712   37082     3641     2953
-
-
- INTEL 80287
-                                                                  max
- funct.  interval     %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
-
- SIN     0,pi/4          N/A     N/A      N/A     N/A      N/A      N/A
- COS     0,pi/4          N/A     N/A      N/A     N/A      N/A      N/A
- TAN     0,pi/4       37.001   18756      524   17405      316        2
- ATAN    0,1           9.666    6065        0    3601        0        1
- 2XM1    0,0.5        19.920       0        0   19920        0        1
- YL2XP1  0,sqrt(2)-1   7.780     868        0    6912        0        1
- YL2X    0.1,10        1.287     723        0     564        0        1
-
-
- INTEL 387
-                                                                  max
- funct.  interval     %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
-
- SIN     0,pi/4       28.872    2467        0   26392       13        2
- COS     0,pi/4       27.213   27169       35       9        0        2
- TAN     0,pi/4       10.532     441        0   10091        0        1
- ATAN    0,1           7.088    2386        0    4691        1        2
- 2XM1    0,0.5        32.024       0        0   32024        0        1
- YL2XP1  0,sqrt(2)-1  22.611    3461        0   19150        0        1
- YL2X    0.1,10       13.020    6508        0    6512        0        1
-
-
- INTEL 387DX
-                                                                  max
- funct.  interval     %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
-
- SIN     0,pi/4       28.873    2467        0   26393       13        2
- COS     0,pi/4       27.121   27090       22       9        0        2
- TAN     0,pi/4       10.711     457        0   10254        0        1
- ATAN    0,1           7.088    2386        0    4691        1        2
- 2XM1    0,0.5        32.024       0        0   32024        0        1
- YL2XP1  0,sqrt(2)-1  22.611    3461        0   19150        0        1
- YL2X    0.1,10       13.020    6508        0    6512        0        1
-
-
- ULSI 83C87
-                                                                  max
- funct.  interval     %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
-
- SIN     0,pi/4       35.530    4989        6   30238      297        2
- COS     0,pi/4       43.989   11193      675   31393      728        2
- TAN     0,pi/4       48.539   18880     1015   26349     2295        3
- ATAN    0,1          20.858      62        0   20796        0        1
- 2XM1    0,0.5        21.257       4        0   21253        0        1
- YL2XP1  0,sqrt(2)-1  27.893    9446        0   18213      234        2
- YL2X    0.1,10       13.603    9816        0    3787        0        1
-
-
- IIT 3C87
-                                                                  max
- funct.  interval     %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
-
- SIN     0,pi/4       18.650   11171        0    7479        0        1
- COS     0,pi/4        7.700    3024        0    4676        0        1
- TAN     0,pi/4       20.973    9681        0   11291        1        2
- ATAN    0,1          19.280   13186        0    6094        0        1
- 2XM1    0,0.5        25.660   17570        0    8090        0        1
- YL2XP1  0,sqrt(2)-1  45.830   23503     1896   19654      777        3
- YL2X    0.1,10       10.888    5638      357    4845       48        3
-
-
- CYRIX 83D87
-                                                                  max
- funct.  interval     %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
-
- SIN     0,pi/4        1.554    1015        0     539        0        1
- COS     0,pi/4        0.925     143        0     782        0        1
- TAN     0,pi/4        4.147     881        0    3266        0        1
- ATAN    0,1           0.656     229        0     427        0        1
- 2XM1    0,0.5         2.628    1433        0    1194        0        1
- YL2XP1  0,sqrt(2)-1   3.242     825        0    2417        0        1
- YL2X    0.1,10        0.931     256        0     675        0        1
-
-
- CYRIX 387+
-                                                                  max
- funct.  interval     %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
-
- SIN     0,pi/4        1.486     864        0     622        0        1
- COS     0,pi/4        2.072      12        0    2060        0        1
- TAN     0,pi/4        0.602      63        0     539        0        1
- ATAN    0,1           0.384      12        0     372        0        1
- 2XM1    0,0.5         1.985      27        0    1958        0        1
- YL2XP1  0,sqrt(2)-1   3.662    1705        0    1957        0        1
- YL2X    0.1,10        0.764     367        0     397        0        1
-
-
- INTEL RapidCAD, Intel 486
-                                                                  max
- funct.  interval     %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
-
- SIN     0,pi/4       16.991    1517        0   15474        0        1
- COS     0,pi/4        9.003    7603        0    1400        0        1
- TAN     0,pi/4       10.532     441        0   10091        0        1
- ATAN    0,1           7.078    2386        0    4691        1        2
- 2XM1    0,0.5        32.025       0        0   32025        0        1
- YL2XP1  0,sqrt(2)-1  21.800     533        0   21267        0        1
- YL2X    0.1,10        3.894    1879        0    2015        0        1
-
-
- The test results above indicate that none of the 80x87 compatibles
- exceeds Intel's stated error bound of 3 ULPs for the transcendental
- functions. However, some coprocessors are more accurate than others.
- Rating the coprocessors according to the accuracy of their
- transcendental functions gives the following list (highest accuracy
- first): Cyrix 387+, Cyrix 83D87, Intel 486, Intel RapidCAD, Intel
- 80287(!), Intel 387DX, Intel 80387, IIT 3C87, ULSI 83C87. The tests
- also show that the problems with excessive inaccuracy of the
- transcendental functions in early versions of the IIT coprocessors,
- with errors of up to 8 ULPs [8], have been eliminated. According to
- [56], certain problems with the FPATAN instruction on the IIT 3C87
- occurring under the UNIX version of AutoCAD were corrected in June
- 1990. The Franke387 has acceptable accuracy for the FSIN, FCOS, and
- FPATAN instructions, taking into consideration that according to its
- documentation, Franke387 uses only 64 bits of precision for the
- intermediate results, while coprocessors typically use 68 bits
- or more. However, the larger errors in the FPTAN, F2XM1, FYL2XP1,
- and especially the FYL2X operations show that the emulator doesn't
- use state of the art algorithms, which ensure an error of only a
- very few ULPs even if no extra precise intermediate results are
- available.
-