NetNews Usenet Archive 1992 #18

Internet Message Format | 1992-08-19 | 58.5 KB

Path: sparky!uunet!kithrup!hoptoad!pacbell.com!mips!swrinde!elroy.jpl.nasa.gov!usc!sol.ctr.columbia.edu!ira.uka.de!uka!uka!news
From: S_JUFFA@iravcl.ira.uka.de (|S| Norbert Juffa)
Newsgroups: comp.sys.intel
Subject: What you always wanted to know about math coprocessors for 80x86 3/4
Message-ID: <16tnqrINNcs1@iraul1.ira.uka.de>
Date: 19 Aug 92 15:02:51 GMT
Organization: University of Karlsruhe (FRG) - Informatik Rechnerabt.
Lines: 971
NNTP-Posting-Host: irav1.ira.uka.de
X-News-Reader: VMS NEWS 1.23

HW configuration for test of 387 coprocessors and Intel RapidCAD:
System A: Motherboard with Forex chip set, 128 kB CPU Cache, 8 MB RAM

HW configuration for test of 486 FPU (extra fan for 40 MHz operation):
System B: Motherboard with SIS chip set, 256 kB CPU Cache, 8 MB RAM

## EM87 V1.2 by Ron Kimball is a public domain coprocessor emulator that loads as a TSR. It relies on the INT 7 traps that 80286 and 80386 systems without a coprocessor raise whenever they encounter a coprocessor instruction; EM87 catches these traps and emulates the instruction. Whetstone and Savage benchmarks for this test were compiled with the original TP 6.0 library, as EM87 chokes on the 387-specific FSIN and FCOS instructions used in my own library if a 387 is detected. Evidently EM87 identifies itself as a 387, but has no support for 387-specific instructions.

$$ Franke387 is a commercial 387 emulator that is also available in a shareware version. For this test, shareware version V2.4 was used. Unlike many other emulators, Franke387 supports all 387 instructions. It is loaded as a device driver and uses INT 7 to trap coprocessor instructions.

%% These benchmarks were run using the built-in coprocessor emulators of the TP 6.0 and the MS FORTRAN 5.0 run-time libraries.

?? The 3C87-specific F4X4 instruction was used in the vector transformation benchmark.

++ Older motherboard with no chip set (discrete logic), no CPU cache, 16 MB RAM

&& System A, CPU cache disabled via extended set-up, turbo-switch set to half speed (that is, 20 MHz)

!! 80386 @ 20 MHz / Intel 80287 @ 5 MHz, no CPU cache, 4 MB RAM. Due to the fast CPU used here, performance figures are somewhat higher than can be expected for an 80286/287 combination, except for the PEAKFLOP benchmark, which is basically coprocessor limited.

** 8086/8087 system with 640 kB RAM

Since neither a Weitek coprocessor nor a compiler that generates code for the Weitek chips was available, performance data for the Weitek Abacus are given here according to [31,32] and scaled to show performance of a 33 MHz system. The benchmarks were compiled using highly optimizing 32-bit compilers.

                       Single Prec.     Double Prec.     Double Prec.
                       3167    4167     3167    4167      387     486
Linpack MFLOPS          1.8     5.0      0.8     3.2      0.4     1.6
Whetstone kWhet/sec    7470   22700     4900   14000     3290   12300

Note that for the Intel coprocessors, running programs in single vs. double precision doesn't provide much of a performance advantage since all internal calculations are always done in extended precision. Using Weitek coprocessors, however, performance nearly doubles when switching from double to single precision. For double precision calculations using only basic arithmetic, the Weitek Abacus can provide at most twice the performance of the respective Intel coprocessor (387/486) clocked at the same speed.

Speed of various coprocessor instructions in clock cycles, as measured with my program 87TIMES. Error is +/- one clock cycle, except for the Intel 80287. Times for the 80287 were determined on a system with a 20 MHz 80386 and a 5 MHz Intel 80287. Therefore, times may differ from a genuine 80286/287 system, especially for those instructions that access an operand in memory.
Since the times are stated as the number of coprocessor clock cycles used, the faster 386 which can execute four clock cycles where the 80287 executes one clock cycle may decrease memory access times as seen by the coprocessor. Intel Intel Cyrix Cyrix ULSI IIT Intel Intel i486 RapidCAD 387+ 83D87 83C87 3C87 387DX 80387 FLD1 | 5 7 17 17 17 22 27 35 FLDZ | 5 7 17 17 17 22 22 29 FLDPI | 8 9 17 17 17 22 37 45 FLDLG2 | 8 9 17 17 17 22 37 44 FLDL2T | 8 9 17 17 17 22 37 44 FLDL2E | 8 9 17 17 17 22 37 44 FLDLN2 | 8 9 17 17 17 22 37 45 FLD ST(0) | 5 7 17 17 17 22 17 24 FST ST(1) | 4 7 17 17 17 17 17 24 FSTP ST(0) | 5 7 17 17 17 18 23 25 FSTP ST(1) | 5 7 17 17 17 17 23 25 FLD ST(1) | 5 7 17 17 17 22 17 25 FXCH ST(1) | 5 7 17 17 17 22 22 25 FILD [Word] | 13 16 35 36 41 46 46 65 FILD [DWord] | 12 17 30 30 37 37 40 51 FILD [QWord] | 13 20 40 40 47 47 45 66 FLD [DWord] | 7 13 30 36 32 37 25 35 FLD [QWord] | 7 15 40 44 42 47 35 45 FLD [TByte] | 10 19 52 52 52 57 57 61 FBLD [TByte] | 83 91 84 66 145 205 70 278 FIST [Word] | 32 34 43 42 45 54 72 92 FIST [DWord] | 33 35 48 44 48 57 74 91 FST [DWord] | 11 14 44 42 49 41 46 47 FST [QWord] | 16 18 56 54 60 53 58 60 FISTP [Word] | 32 35 43 42 45 49 73 93 FISTP [DWord] | 34 37 48 44 48 52 75 88 FISTP [QWord] | 35 37 57 53 61 63 86 96 FSTP [DWord] | 12 13 44 42 48 37 46 42 FSTP [QWord] | 16 17 56 55 60 50 59 57 FSTP [TByte] | 14 16 59 58 58 56 67 70 FBSTP [TByte] | 171 175 101 98 126 216 147 535 FINIT | 18 35 18 18 18 18 19 25 FCLEX | 8 24 18 18 18 18 19 25 FCHS | 8 11 17 17 17 17 31 35 FABS | 6 8 17 17 17 17 28 31 FXAM | 13 15 17 17 17 17 37 40 FTST | 5 7 22 17 22 22 32 35 FSTENV | 68 85 127 127 135 127 162 169 FLDENV | 45 62 109 109 123 109 122 132 FSAVE | 160 172 359 359 366 377 467 504 FRSTOR | 131 206 361 361 369 367 424 453 FSTSW [mem] | 4 7 16 16 17 16 17 22 FSTSW AX | 4 7 14 14 14 14 14 17 FSTCW [mem] | 4 7 16 16 16 16 16 22 FLDCW [mem] | 5 14 28 28 29 29 29 34 FADD ST,ST(0) | 8 9 22 17 17 22 27 30 FADD ST,ST(1) | 9 10 22 17 17 22 
22 34 FADD ST(1),ST | 10 10 22 17 17 22 23 35 FADDP ST(1),ST | 11 11 22 17 17 22 23 34 FADD [DWord] | 9 14 30 30 33 32 31 42 FADD [QWord] | 9 16 40 40 43 42 41 51 FIADD [Word] | 20 21 36 36 43 43 49 77 FIADD [DWord] | 20 25 30 30 38 38 43 65 FSUB ST(1),ST | 10 10 22 17 17 22 23 35 FSUBR ST(1),ST | 9 10 22 17 20 25 27 35 FSUBRP ST(1),ST | 10 10 22 17 17 22 23 35 FSUB [DWord] | 11 14 30 30 32 32 30 41 FSUB [QWord] | 11 16 40 40 42 43 40 51 FISUB [Word] | 21 21 36 36 44 43 56 77 FISUB [DWord] | 21 25 30 30 39 38 43 65 FMUL ST,ST(1) | 16 17 22 22 22 27 38 56 FMUL ST(1),ST | 16 17 22 22 22 27 40 60 FMULP ST(1),ST | 16 17 22 22 22 27 38 59 FIMUL [Word] | 22 23 36 36 50 43 50 77 FIMUL [DWord] | 22 25 36 36 45 38 46 73 FMUL [DWord] | 11 14 36 36 32 38 31 48 FMUL [QWord] | 14 16 46 46 42 48 41 72 FDIV ST,ST(0) | 73 74 38 23 52 57 92 95 FDIV ST,ST(1) | 73 74 42 36 52 57 78 95 FDIV ST(1),ST | 73 74 42 36 52 57 78 99 FDIVR ST(1),ST | 73 74 42 36 53 57 77 100 FDIVRP ST(1),ST | 73 74 42 36 52 57 78 101 FIDIV [Word] | 84 85 61 54 79 73 105 144 FIDIV [DWord] | 84 85 54 47 74 68 101 129 FDIV [DWord] | 73 74 54 48 63 62 78 100 FDIV [QWord] | 73 74 64 57 72 72 79 113 FSQRT (0.0) | 26 28 17 17 17 22 27 35 FSQRT (1.0) | 83 84 72 36 87 57 112 128 FSQRT (L2T) | 86 87 72 36 87 57 102 133 FXTRACT (L2T) | 17 17 22 17 32 76 56 68 FSCALE (PI,5) | 30 31 22 36 47 77 57 80 FRNDINT (PI) | 31 31 27 19 32 27 47 74 FPREM (99,PI) | 58 60 102 52 57 52 77 100 FPREM1(99,PI) | 90 91 102 57 62 52 102 119 FCOM | 5 7 17 17 27 17 27 34 FCOMP | 6 7 17 17 27 17 28 35 FCOMPP | 7 8 17 17 27 22 28 34 FICOM [Word] | 16 20 36 36 49 37 61 77 FICOM [DWord] | 18 25 30 30 44 32 48 61 FCOM [DWord] | 7 14 30 30 33 32 31 35 FCOM [QWord] | 7 15 40 40 43 42 41 51 FSIN (0.0) | 25 27 97 17 17 22 37 45 FSIN (1.0) | 310 314 162 116 492 222 512 593 FSIN (PI) | 88 90 187 121 67 217 132 155 FSIN (LG2) | 284 288 84 73 445 184 434 505 FSIN (L2T) | 299 303 177 121 472 217 452 533 FCOS (0.0) | 25 27 157 17 22 22 37 44 FCOS (1.0) | 302 
306 107 87 487 212 457 540 FCOS (PI) | 89 92 257 151 62 222 197 230 FCOS (LG2) | 300 304 152 106 452 192 502 584 FCOS (L2T) | 307 311 242 156 467 222 507 598 FSINCOS (0.0) | 26 29 17 17 22 31 41 54 FSINCOS (1.0) | 353 357 172 126 492 416 536 637 FSINCOS (PI) | 105 107 262 161 67 421 226 273 FSINCOS (LG2) | 340 344 157 116 457 361 531 628 FSINCOS (L2T) | 347 351 247 166 472 421 536 643 FPTAN (0.0) | 26 28 17 17 22 31 36 43 FPTAN (1.0) | 267 269 147 121 537 306 322 392 FPTAN (PI) | 145 146 227 136 112 306 167 212 FPTAN (LG2) | 244 246 132 91 502 276 297 363 FPTAN (L2T) | 247 249 217 136 517 306 297 363 FPATAN (0.0) | 39 41 27 22 22 27 97 92 FPATAN (1.0) | 294 298 157 121 372 602 358 433 FPATAN (PI) | 304 307 192 143 357 422 378 468 FPATAN (LG2) | 289 293 157 126 362 382 373 447 FPATAN (L2T) | 304 307 192 141 362 422 373 463 F2XM1 (0.0) | 26 28 17 17 17 22 37 38 F2XM1 (LN2) | 209 212 122 86 392 287 297 348 F2XM1 (LG2) | 204 207 107 76 377 287 292 340 FYL2X (1.0) | 60 60 42 36 72 92 112 127 FYL2X (PI) | 294 297 162 111 452 357 393 497 FYL2X (LG2) | 311 314 162 106 457 337 408 512 FYL2X (L2T) | 293 296 162 111 437 357 393 496 FYL2XP1 (LG2) | 334 337 167 101 462 282 433 533 80386 + 80386 + 80386 + Intel Intel Franke387 TP 6.0 EM87 8087 80287 Emulator Emulator Emulator FSTP ST(0) | 26 54 507 358 2115 FLD1 | 26 55 481 422 1626 FLDZ | 21 53 480 416 1646 FLDPI | 26 55 486 443 1626 FLDLG2 | 26 56 486 423 1626 FLDL2T | 26 55 486 440 1626 FLDL2E | 26 53 486 423 1626 FLDLN2 | 26 55 486 441 1626 FLD ST(0) | 31 55 493 362 1851 FST ST(1) | 26 54 489 355 1931 FSTP ST(1) | 21 55 507 356 2116 FLD ST(1) | 26 55 493 362 1852 FXCH ST(1) | 21 57 497 486 2187 FILD [Word] | 58 90 667 712 2259 FILD [DWord] | 64 74 608 812 2164 FILD [QWord] | 74 93 652 707 2971 FLD [DWord] | 49 44 633 473 2077 FLD [QWord] | 54 57 641 524 2336 FLD [TByte] | 59 45 607 492 2063 FBLD [TByte] | 309 310 2019 1512 17827 FIST [Word] | 79 72 854 766 2418 FIST [DWord] | 84 80 865 518 2325 FST [DWord] | 89 85 686 441 
2200 FST [QWord] | 99 92 703 516 2481 FISTP [Word] | 79 80 864 794 2620 FISTP [DWord] | 79 81 879 541 2523 FISTP [QWord] | 88 75 904 916 3226 FSTP [DWord] | 89 75 713 467 2400 FSTP [QWord] | 93 72 732 538 2678 FSTP [TByte] | 49 21 685 467 2124 FBSTP [TByte] | 528 472 3305 1555 27013 FINIT | 11 10 742 641 1369 FCLEX | 11 10 440 323 912 FCHS | 21 54 460 354 1744 FABS | 21 54 456 349 1738 FXAM | 21 54 481 380 1551 FTST | 51 75 585 386 2721 FSTENV | 54 57 928 519 2104 FLDENV | 48 50 1125 450 1631 FSAVE | 214 244 1949 976 2749 FRSTOR | 209 227 2182 657 2225 FSTSW [mem] | 28 10 516 401 1189 FSTSW AX | N/A 55 451 N/A N/A FSTCW [mem] | 28 10 506 359 1167 FLDCW [mem] | 19 47 524 437 1584 FADD ST,ST(0) | 86 128 643 706 2805 FADD ST,ST(1) | 85 116 707 808 3093 FADD ST(1),ST | 92 131 664 812 3146 FADDP ST(1),ST | 92 129 704 799 3143 FADD [DWord] | 105 122 874 969 3139 FADD [QWord] | 115 122 888 1021 3396 FIADD [Word] | 115 122 940 1211 3330 FIADD [DWord] | 125 122 882 1297 3215 FSUB ST(1),ST | 88 130 738 817 3156 FSUBR ST(1),ST | 96 132 740 868 3004 FSUBRP ST(1),ST | 99 132 733 805 3301 FSUB [DWord] | 119 122 918 1018 3127 FSUB [QWord] | 129 123 932 1070 3632 FISUB [Word] | 115 123 977 1081 3802 FISUB [DWord] | 125 125 940 980 4161 FMUL ST,ST(1) | 145 151 810 1368 3924 FMUL ST(1),ST | 145 151 817 1377 3962 FMULP ST(1),ST | 148 168 840 1365 4164 FIMUL [Word] | 132 151 1039 1517 4039 FIMUL [DWord] | 141 151 980 1643 3976 FMUL [DWord] | 125 123 948 1480 3445 FMUL [QWord] | 175 192 991 1602 4416 FDIV ST,ST(0) | 201 207 726 1536 9789 FDIV ST,ST(1) | 203 218 808 1658 10332 FDIV ST(1),ST | 207 214 825 1655 10342 FDIVR ST(1),ST | 201 206 819 1806 10213 FDIVRP ST(1),ST | 201 205 845 1803 10409 FIDIV [Word] | 237 227 980 1779 11225 FIDIV [DWord] | 246 227 944 1680 11572 FDIV [DWord] | 229 226 893 1722 10577 FDIV [QWord] | 236 227 993 1777 10829 FSQRT (0.0) | 21 57 512 382 1755 FSQRT (1.0) | 186 206 1106 2504 37836 FSQRT (L2T) | 186 207 1398 2467 37925 FXTRACT (L2T) | 51 56 726 571 3326 
FSCALE (PI,5) | 41 56 817 443 3194 FRNDINT (PI) | 51 58 808 800 7092 FPREM (99,PI) | 81 131 1696 941 4098 FPREM1(99,PI) | N/A N/A 1625 N/A N/A FCOM | 56 75 582 483 2799 FCOMP | 61 92 616 485 2983 FCOMPP | 61 90 661 476 3198 FICOM [Word] | 79 77 808 861 3654 FICOM [DWord] | 89 77 750 964 3684 FCOM [DWord] | 74 75 741 625 3643 FCOM [QWord] | 74 76 754 667 3771 FSIN (0.0) | N/A N/A 639 N/A N/A FSIN (1.0) | N/A N/A 4640 N/A N/A FSIN (PI) | N/A N/A 2488 N/A N/A FSIN (LG2) | N/A N/A 3911 N/A N/A FSIN (L2T) | N/A N/A 3767 N/A N/A FCOS (0.0) | N/A N/A 740 N/A N/A FCOS (1.0) | N/A N/A 4777 N/A N/A FCOS (PI) | N/A N/A 2557 N/A N/A FCOS (LG2) | N/A N/A 4176 N/A N/A FCOS (L2T) | N/A N/A 3905 N/A N/A FSINCOS (0.0) | N/A N/A 714 N/A N/A FSINCOS (1.0) | N/A N/A 6049 N/A N/A FSINCOS (PI) | N/A N/A 4091 N/A N/A FSINCOS (LG2) | N/A N/A 5640 N/A N/A FSINCOS (L2T) | N/A N/A 5405 N/A N/A FPTAN (0.0) | 41 58 752 8381 2324 FPTAN (1.0) | 581 582 6366 10817 29824 FPTAN (PI) | 606 587 4388 12410 2300 FPTAN (LG2) | 516 513 5939 12502 26770 FPTAN (L2T) | 576 586 5723 12483 2301 FPATAN (0.0) | 41 55 616 1208 10578 FPATAN (1.0) | 736 736 1426 13446 34208 FPATAN (PI) | 206 207 12835 13305 46903 FPATAN (LG2) | 756 736 12490 13319 41312 FPATAN (L2T) | 206 204 12922 13364 50149 F2XM1 (0.0) | 16 56 563 723 1722 F2XM1 (LN2) | 631 624 4178 11070 33823 F2XM1 (LG2) | 611 585 4798 11116 32163 FYL2X (1.0) | 56 57 961 1214 4327 FYL2X (PI) | 946 961 8987 12858 40148 FYL2X (LG2) | 1081 1038 8933 12748 46821 FYL2X (L2T) | 926 886 8982 12712 38986 FYL2XP1 (LG2) | 1026 1037 10485 11867 44708 The Weitek 3167 and 4167 processors only implement the basic arithmetic functions (add, subtract, multiply, divide, square root) in hardware. Transcendental functions are implemented by means of a software library supplied by Weitek that uses the Weitek hardware to approximate the transcendental functions with polynomial and rational approximations. 
The clock cycle timings for the transcendental functions are average values, since execution time differs with the value of the argument. The speed of transcendental functions for the 4167 is estimated based on the numbers in [31,33], from which this timing information has been extracted.

Execution time for floating-point operations in clock cycles on Weitek coprocessors

             Single Precision     Double Precision
              3167     4167        3167     4167
ABS              3        2           3        2
NEG              6        2           6        2
ADD              6        2           6        2
SUB              6        2           6        2
SUBR             6        2           6        2
MUL              6        2          10        3
DIVR            38       17          66       31
SQRT            60       17         118       31
SIN            146      ~50         292     ~100
COS            140      ~50         285     ~100
TAN            188      ~60         340     ~110
EXP            179      ~60         401     ~130
LOG            171      ~60         365     ~120
F->ASCII      1000      N/A        1700      N/A   //
ASCII->F      1100      N/A        1800      N/A   //

// rough average of the timings given for different numeric formats by Weitek. Note that these conversion routines do much more work than the FBLD and FBSTP instructions provided by the 80x87 coprocessors. FBLD and FBSTP are useful for conversion routines, but quite a bit of additional code is needed for this purpose.

Accuracy

The IEEE-754 Standard for Binary Floating-Point Arithmetic [10,11] is fully implemented by Intel's 387 coprocessor [17]. Among other things, this means that the add, subtract, multiply, divide, remainder, and square root operations always deliver the 'exact' result. By exact it is meant that the coprocessor always delivers the machine number closest to the real result, which may not be representable exactly in the available numeric format. The 80387 implements the single, double, and double extended formats as specified in the standard as well as all functions required by it [17].

Note that earlier Intel coprocessors (the 8087 and the 80287) comply with a draft version of the standard that differs from the final version. These chips came out before the IEEE-754 standard was finally accepted in 1985.
As in the 80387, the basic arithmetic in the 8087 and the 80287 is exact in the sense that the computed result is always the machine number closest to the real result. However, there are some differences regarding certain operands like infinities, and some operations, like the remainder, are defined differently. Some instructions have been added in the 80387, most notably the FSIN and FCOS operations. The argument range for some transcendental functions has been extended [17].

Note that the IEEE-754 standard says nothing about the quality of the implementation of transcendental functions like sin, cos, tan, arctan, log. Intel uses a modified CORDIC [18,19] technique to compute the transcendental functions. Intel claims that the maximum error in the 8087, 80287, and 80387 for all transcendental functions does not exceed two bits in the mantissa of the double extended format, which features 64 mantissa bits for an accuracy of approximately 19 decimal places [22,23]. This claim has been independently verified by a competing vendor [13]. This means that at least 62 of the 64 mantissa bits in a transcendental function result are correct.

The Weitek Abacus 3167 and 4167 are 'mostly compatible' with IEEE-754 [31,32,33]. They support the single precision and double precision numeric formats described in the standard as well as the four rounding modes required by it. However, due to the need for extremely high speed operation, some of the finer points of IEEE-754 have not been implemented. One of the most notable omissions is the missing support for denormal numbers. Denormals are always flushed to zero.

The 387 clone makers claim 100% compatibility with Intel's 80387, so one would expect the same accuracy from their chips. For example, on the packaging of the IIT 3C87 it says that ".. the requirements of ANSI/IEEE standards are fulfilled and exceeded". Cyrix states that their 83D87 complies fully with the IEEE-754 standard [12].
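
The accuracy figures quoted above for the double extended format can be checked with a few lines of arithmetic. The following is only a sketch in Python (not a language used in this article); the function and constant names are my own illustrative choices. It converts mantissa bits to decimal digits and derives the relative error bound that corresponds to two uncertain bits.

```python
import math

MANTISSA_BITS = 64   # double extended format
UNCERTAIN_BITS = 2   # Intel's stated worst case for transcendental functions


def decimal_digits(mantissa_bits: int) -> float:
    """Number of decimal digits carried by a binary mantissa of this width."""
    return mantissa_bits * math.log10(2)


# Digits carried by the full format and by the bits that are guaranteed correct.
digits_full = decimal_digits(MANTISSA_BITS)
digits_guaranteed = decimal_digits(MANTISSA_BITS - UNCERTAIN_BITS)

# Two uncertain low-order bits correspond to a relative error bound of 2**-62.
max_relative_error = 2.0 ** -(MANTISSA_BITS - UNCERTAIN_BITS)

print(f"{digits_full:.2f} digits total, {digits_guaranteed:.2f} guaranteed")
print(f"relative error bound: {max_relative_error:.3e}")
```

The 64-bit mantissa works out to a little over 19 decimal digits, which agrees with the "approximately 19 decimal places" stated in the text.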
Cyrix ships some diagnostic software with its coprocessors. This includes the program IEEETEST, which is based on the IEEE test vectors from the Ph.D. thesis of Jerome T. Coonen [9]. A test using the IEEE test vectors has also been included in the RUNDIAG program on the Intel RapidCAD diagnostic disk. Rather than performing random tests, the test vectors check specific cases that may be hard to get right. Each test vector specifies the operation to be performed, the operands, precision and rounding mode to be used, and the result (including flags set) to be expected according to IEEE-754.

I ran IEEETEST on all the available coprocessors/FPUs. The Intel 486, Intel RapidCAD, Intel 387, Intel 387DX, Cyrix 83D87, and the Cyrix 387+ passed with no errors. The ULSI 83C87 showed some minor flaws in the FCOM, FDIV, FMUL, and FSCALE operations, getting flag errors in about 1% of the tested cases, but no computational errors. However, for the IIT 3C87, the IEEETEST program showed flag *and* some computational errors (that is, wrong results) for all tested operations except FXTRACT and FCHS. The Intel 80287 shows numerous errors, but this is not surprising, since the 80287 does not comply with IEEE-754 but with an earlier draft of that standard, so it does some things differently than required by the final version of the standard.

Although IEEETEST is written in Turbo Pascal, the coprocessor emulator in the TP 6.0 library could not be tested, since IEEETEST was compiled with the $E- switch, which excludes the emulator from the program code. The public domain emulator EM87 could be tested, but hung in the last test, which checks the implementation of the remainder operation. This is probably caused by some bug in the emulation of the FPREM instruction exercised in this test. It is interesting to note that the error profile of EM87 matches exactly that of the Intel 80287, so it can be assumed that EM87 is a very good emulation of the 80287.
The Franke387 V2.4 emulator hung in the division test quite early in IEEETEST. The tests performed up to the division test reported several errors.

Explanatory text printed at the start of the IEEETEST program:

JT Coonen's 1984 UC Berkeley Ph.D. thesis centers around his activities as a member of the floating-point working group that defined the IEEE 754-1985 Standard for Binary Floating-Point Arithmetic. Appendix C of his thesis presents FPTEST, a Pascal program written by J Thomas and JT Coonen. IEEETEST is a port of FPTEST and runs on PCs whose math coprocessor accepts 80387 compatible floating-point instructions.

IEEETEST reads test vectors from the file TESTVECS and compares the answer returned by the math coprocessor with the answer listed in the test vector. If these answers differ an 'F' is displayed, otherwise a '.' is displayed. Answers can differ due to two types of failures: numeric failures or flag failures. Numeric failures occur when the computed answer has the wrong value. Flag failures occur when the status (invalid operation, divide by zero, underflow, overflow, inexact) is incorrectly identified.

TESTVECS is the concatenation of unmodified versions of all the test vectors distributed by UC Berkeley. The test data base is copyrighted by UC Berkeley (1985) and is being distributed with their permission. FPTEST and the test data base can be obtained by asking for 'IEEE-754 Test Vector' from UC Berkeley, Electrical Engineering and Computer Science, Industrial Liaison Program, 479 Corey Hall, Berkeley, CA, 94720 (415)643-6687.

The initial version of this test data base for the proposed IEEE 754 binary floating-point standard (draft 8.0) was developed for Zilog, Inc. and was donated to the floating-point working group for dissemination. Errors in or additions to the distributed data base should be reported to the agency of distribution, with copies to Zilog, Inc., 1315 Dell Avenue, Campbell, CA, 95008.
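
The pass/fail bookkeeping described in the IEEETEST text above can be sketched as follows. This is not the IEEETEST or FPTEST source, and it does not parse the real TESTVECS format; the `Outcome` record and the helper names are hypothetical, and Python doubles stand in for the coprocessor results. It only illustrates how one vector can fail numerically, by flags, or both.

```python
import math
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Result of one operation: the value plus the exception flags raised."""
    value: float
    flags: frozenset = frozenset()


def same_value(a: float, b: float) -> bool:
    """Compare bit patterns so that +0.0 and -0.0 count as different."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return struct.pack("<d", a) == struct.pack("<d", b)


def classify(expected: Outcome, actual: Outcome):
    """Return (mark, numeric_failure, flag_failure) for one test vector."""
    numeric_failure = not same_value(expected.value, actual.value)
    flag_failure = expected.flags != actual.flags
    mark = "F" if (numeric_failure or flag_failure) else "."
    return mark, numeric_failure, flag_failure


def tally(pairs):
    """Count passed/failed vectors and both failure types over many vectors."""
    counts = {"passed": 0, "failed": 0, "numeric": 0, "flag": 0}
    for expected, actual in pairs:
        mark, numeric_failure, flag_failure = classify(expected, actual)
        counts["passed" if mark == "." else "failed"] += 1
        counts["numeric"] += numeric_failure
        counts["flag"] += flag_failure
    return counts
```

Because a single vector can show a numeric failure and a flag failure at the same time, the two failure counts may add up to more than the number of failed vectors.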

IEEETEST output for Intel 80387, Intel 387DX, Intel 486, Cyrix 83D87, Cyrix 387+, RapidCAD

IEEE-754 Test Vector    Precisions: S=Single D=Double E=Double Extended
                       |     TESTS     |        TYPE OF FAILURE
                       |               |    numeric     |      flag
Operation          Code| Passed Failed |    S    D    E |    S    D    E
------------------------------------------------------------------------
Absolute Value     A   |    216      0 |    0    0    0 |    0    0    0
Addition           +   |   3528      0 |    0    0    0 |    0    0    0
Comparison         C   |   4320      0 |    0    0    0 |    0    0    0
Copy Sign          @   |   1488      0 |    0    0    0 |    0    0    0
Division           /   |   4311      0 |    0    0    0 |    0    0    0
Fraction Part      F   |    624      0 |    0    0    0 |    0    0    0
Logb               L   |    960      0 |    0    0    0 |    0    0    0
Multiplication     *   |   3978      0 |    0    0    0 |    0    0    0
Negation           -   |    216      0 |    0    0    0 |    0    0    0
Next After         N   |   2832      0 |    0    0    0 |    0    0    0
Round to Integer   I   |    558      0 |    0    0    0 |    0    0    0
Scalb              S   |    948      0 |    0    0    0 |    0    0    0
Square Root        V   |    744      0 |    0    0    0 |    0    0    0
Subtraction        -   |   3528      0 |    0    0    0 |    0    0    0
Remainder          %   |   2984      0 |    0    0    0 |    0    0    0
Totals                 |  31235      0 |

IEEETEST output for ULSI 83C87

IEEE-754 Test Vector    Precisions: S=Single D=Double E=Double Extended
                       |     TESTS     |        TYPE OF FAILURE
                       |               |    numeric     |      flag
Operation          Code| Passed Failed |    S    D    E |    S    D    E
------------------------------------------------------------------------
Absolute Value     A   |    216      0 |    0    0    0 |    0    0    0
Addition           +   |   3528      0 |    0    0    0 |    0    0    0
Comparison         C   |   4312      8 |    0    0    0 |    0    0    8
Copy Sign          @   |   1488      0 |    0    0    0 |    0    0    0
Division           /   |   4250     61 |    0    0    0 |   28   28    5
Fraction Part      F   |    624      0 |    0    0    0 |    0    0    0
Logb               L   |    960      0 |    0    0    0 |    0    0    0
Multiplication     *   |   3936     42 |    0    0    0 |   19   19    4
Negation           -   |    216      0 |    0    0    0 |    0    0    0
Next After         N   |   2828      4 |    0    0    0 |    0    0    4
Round to Integer   I   |    558      0 |    0    0    0 |    0    0    0
Scalb              S   |    930     18 |    0    0    0 |    6    6    6
Square Root        V   |    744      0 |    0    0    0 |    0    0    0
Subtraction        -   |   3528      0 |    0    0    0 |    0    0    0
Remainder          %   |   2984      0 |    0    0    0 |    0    0    0
Totals                 |  31102    133 |

IEEETEST output for IIT 3C87

IEEE-754 Test Vector    Precisions: S=Single D=Double E=Double Extended
                       |     TESTS     |        TYPE OF FAILURE
                       |               |    numeric     |      flag
Operation          Code| Passed Failed |    S    D    E |    S    D    E
------------------------------------------------------------------------
Absolute Value     A   |    200     16 |    0    0   16 |    0    0    0
Addition           +   |   3336    192 |    0    0  128 |    0    0   96
Comparison         C   |   4224     96 |    0    0   96 |    0    0    0
Copy Sign          @   |   1488      0 |    0    0    0 |    0    0    0
Division           /   |   4159    152 |    0    0  124 |    0    0  116
Fraction Part      F   |    600     24 |    0    0   24 |    0    0   24
Logb               L   |    960      0 |    0    0    0 |    0    0    0
Multiplication     *   |   3702    276 |    0    0  248 |    0    0  100
Negation           -   |    200     16 |    0    0   16 |    0    0    0
Next After         N   |   2248    584 |    0    0  584 |    0    0  168
Round to Integer   I   |    542     16 |    0    0    4 |    0    0   16
Scalb              S   |    874     74 |    5    5   44 |    8    8   20
Square Root        V   |    688     56 |    0    0   56 |    0    0   56
Subtraction        -   |   3336    192 |    0    0  128 |    0    0   96
Remainder          %   |   2844    140 |    0    0  140 |    0    0  116
Totals                 |  29401   1834 |

IEEETEST output for Intel 80287 run together with an 80386 CPU

IEEE-754 Test Vector    Precisions: S=Single D=Double E=Double Extended
                       |     TESTS     |        TYPE OF FAILURE
                       |               |    numeric     |      flag
Operation          Code| Passed Failed |    S    D    E |    S    D    E
------------------------------------------------------------------------
Absolute Value     A   |    216      0 |    0    0    0 |    0    0    0
Addition           +   |   2886    642 |   16   16  112 |  174  174  174
Comparison         C   |      0   4320 | 1324 1324 1324 | 1332 1332 1332
Copy Sign          @   |   1488      0 |    0    0    0 |    0    0    0
Division           /   |   3777    534 |   18   18   37 |  169  169  165
Fraction Part      F   |    552     72 |   24   24   24 |   24   24   24
Logb               L   |    900     60 |   12   12   12 |   20   20   20
Multiplication     *   |   2944   1034 |  105  105  197 |  303  303  231
Negation           -   |    216      0 |    0    0    0 |    0    0    0
Next After         N   |    348   2484 |  768  768  768 |  504  504  526
Round to Integer   I   |    546     12 |    0    0    0 |    4    4    4
Scalb              S   |    663    285 |   45   43   26 |  102   98   46
Square Root        V   |    720     24 |    4    4    4 |    8    8    8
Subtraction        -   |   2886    642 |   16   16  112 |  174  174  174
Remainder          %   |    708   2276 |  768  768  560 |  216  216  216
Totals                 |  18850  12385 |

IEEETEST output for EM87 coprocessor emulator run on an Intel 386 CPU

IEEE-754 Test Vector    Precisions: S=Single D=Double E=Double Extended
                       |     TESTS     |        TYPE OF FAILURE
                       |               |    numeric     |      flag
Operation          Code| Passed Failed |    S    D    E |    S    D    E
------------------------------------------------------------------------
Absolute Value     A   |    216      0 |    0    0    0 |    0    0    0
Addition           +   |   2886    642 |   16   16  112 |  174  174  174
Comparison         C   |      0   4320 | 1324 1324 1324 | 1332 1332 1332
Copy Sign          @   |   1488      0 |    0    0    0 |    0    0    0
Division           /   |   3777    534 |   18   18   37 |  169  169  165
Fraction Part      F   |    552     72 |   24   24   24 |   24   24   24
Logb               L   |    900     60 |   12   12   12 |   20   20   20
Multiplication     *   |   2944   1034 |  105  105  197 |  303  303  231
Negation           -   |    216      0 |    0    0    0 |    0    0    0
Next After         N   |    348   2484 |  768  768  768 |  504  504  526
Round to Integer   I   |    546     12 |    0    0    0 |    4    4    4
Scalb              S   |    663    285 |   45   43   26 |  102   98   46
Square Root        V   |    720     24 |    4    4    4 |    8    8    8
Subtraction        -   |   2886    642 |   16   16  112 |  174  174  174

To complement the checks done by IEEETEST, I wrote some short programs (DENORMTS, RCTRL, PCTRL) in Turbo Pascal 6.0 that test the following features:

1. support for denormals in all precisions (single, double, extended)
2. support for the four IEEE rounding modes (up, down, nearest, chop)
3. support for precision control

Note that 1) and 2) are required for IEEE conformance, while 3) is required for compatibility with Intel's coprocessors. Precision control forces the results of the FADD, FSUB, FMUL, FDIV, and FSQRT instructions to be rounded to the specified precision (single, double, double extended). This feature is provided to obtain compatibility with certain programming languages [17]. By specifying lower precision, one effectively nullifies the advantages of extended precision intermediate results.

The programs that test precision control and rounding control are designed to return a different result for each of the modes for the same sequence of operations. The source code of the programs can be found in appendix A. The Intel 8087 and 80287 were not tested with DENORMTS since Turbo Pascal does not support extended precision denormals on 8087/80287 processors, so the denormal test fails anyway. The 8087 and 287 pass the RCTRL and PCTRL tests, though.
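
To illustrate what rounding control and precision control mean numerically, here is a small model in Python. It is not the RCTRL or PCTRL source (those are Turbo Pascal programs in appendix A), and the function name and mode strings are my own. It rounds an exact rational value to a given number of mantissa bits under each of the four IEEE rounding modes, assuming an unbounded exponent range, so overflow, underflow, and denormals are not modelled.

```python
import math
from fractions import Fraction

# Mantissa widths selected by precision control.
PRECISIONS = {"single": 24, "double": 53, "extended": 64}


def round_to_precision(x: Fraction, bits: int, mode: str) -> Fraction:
    """Round exact value x to `bits` mantissa bits (exponent unbounded)."""
    if x == 0:
        return Fraction(0)
    ax = abs(x)
    # Find e with 2**e <= |x| < 2**(e+1).
    e = ax.numerator.bit_length() - ax.denominator.bit_length()
    if Fraction(2) ** e > ax:
        e -= 1
    ulp = Fraction(2) ** (e - bits + 1)   # weight of the last mantissa bit
    scaled = x / ulp                      # exact, generally not an integer
    if mode == "nearest":
        n = round(scaled)                 # ties go to the even neighbour
    elif mode == "down":
        n = math.floor(scaled)            # toward minus infinity
    elif mode == "up":
        n = math.ceil(scaled)             # toward plus infinity
    elif mode == "chop":
        n = math.trunc(scaled)            # toward zero
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return n * ulp


if __name__ == "__main__":
    third = Fraction(1, 3)
    for name, bits in PRECISIONS.items():
        for mode in ("nearest", "down", "up", "chop"):
            r = round_to_precision(third, bits, mode)
            print(f"{name:9s}{mode:8s}{float(r - third):+.3e}")
```

A chip that ignores precision control behaves as if `bits` were always 64, and an emulator that ignores rounding control behaves as if `mode` were fixed, which is the kind of difference the test programs are designed to expose.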

These are the results for the Intel 387, Intel 387DX, Intel 486, Intel RapidCAD, Cyrix 83D87, Cyrix 387+, and the EM87 emulator (on an 80386 machine):

Precision Control
SINGLE     1.13311278820037842E+0000
DOUBLE     1.23456789006442125E+0000
EXTENDED   1.23456789012337585E+0000

Rounding Control
NEAREST   -1.23427629010100635E+0100
DOWN      -1.23427623555772409E+0100
UP        -1.23457760966801097E+0100
CHOP      -1.23397493540770643E+0100

Denormal support
SINGLE denormals supported
SINGLE denormal prints as:      4.60943116855005E-0041
Denormal should be printed as   4.60943...E-0041
DOUBLE denormals supported
DOUBLE denormal prints as:      8.75000000000016E-0311
Denormal should be printed as   8.75...E-0311
EXTENDED denormals supported
EXTENDED denormal prints as:    1.31640625000000E-4934
Denormal should be printed as   1.3164...E-4934

These are the results for the ULSI 83C87:

Precision Control
SINGLE     1.23456789012337585E+0000
DOUBLE     1.23456789012337585E+0000
EXTENDED   1.23456789012337585E+0000

Rounding Control
NEAREST   -1.23427629010100635E+0100
DOWN      -1.23427623555772409E+0100
UP        -1.23457760966801097E+0100
CHOP      -1.23397493540770643E+0100

Denormal support
SINGLE denormals supported
SINGLE denormal prints as:      4.60943116855005E-0041
Denormal should be printed as   4.60943...E-0041
DOUBLE denormals supported
DOUBLE denormal prints as:      8.75000000000016E-0311
Denormal should be printed as   8.75...E-0311
EXTENDED denormals supported
EXTENDED denormal prints as:    1.31640625000000E-4934
Denormal should be printed as   1.3164...E-4934

These are the results for the IIT 3C87:

Precision Control
SINGLE     1.13311278820037842E+0000
DOUBLE     1.23456789006442125E+0000
EXTENDED   1.23456789012337585E+0000

Rounding Control
NEAREST   -1.23427629010100635E+0100
DOWN      -1.23427623555772409E+0100
UP        -1.23457760966801097E+0100
CHOP      -1.23397493540770643E+0100

Denormal support
SINGLE denormals supported
SINGLE denormal prints as:      4.60943116855005E-0041
Denormal should be printed as   4.60943...E-0041
DOUBLE denormals supported
DOUBLE denormal prints as:      8.75000000000016E-0311
Denormal should be printed as   8.75...E-0311
EXTENDED denormals not supported

These are the results for the TP 6.0 coprocessor emulator:

Precision Control
SINGLE     1.23456789012351396E+0000
DOUBLE     1.23456789012351396E+0000
EXTENDED   1.23456789012351396E+0000

Rounding Control
NEAREST   -1.23457766383395931E+0100
DOWN      -1.23457766383395931E+0100
UP        -1.23457766383395931E+0100
CHOP      -1.23457766383395931E+0100

Denormal support
SINGLE denormals not supported
DOUBLE denormals not supported
EXTENDED denormals not supported

The test results show that the IIT 3C87 does not conform to the IEEE-754 floating-point standard in that it does not support denormals in double extended precision. The ULSI 83C87 is not Intel 387 compatible in that it does not support precision control, but always uses double extended precision. The TP 6.0 emulator supports neither precision control, rounding control, nor denormals of any kind. In addition, its basic arithmetic operations do not seem to conform to the IEEE standard, as the results of the test programs differ from the results computed by the coprocessors in every mode.

With regard to the accuracy of transcendental functions, Cyrix claims that the relative error of the transcendental functions on the 83D87 never exceeds 0.5 units in the last place (0.5 ULP) of the double extended format [13]. This means that the maximum relative error is below 2**-64, while Intel's published error limit is 2**-62. While Intel uses a modified CORDIC algorithm [18,19] to compute the transcendental functions, Cyrix uses rational approximations that utilize a very fast array multiplier. For an explanation of why this approach is superior to CORDIC with today's technology, see [61]. Also, Cyrix uses an internal 75 bit data path for the mantissa [15], so intermediate computations in the generation of transcendental function values will enjoy some additional accuracy over the 64 bits provided by the double extended format.
Using 75 mantissa bits also provides an advantage over other coprocessors like the Intel 387DX and ULSI 83C87, which use only a 68 bit data path for the mantissa [58,59].

Note that a maximum relative error of 0.5 ULP for the Cyrix coprocessor does not mean that it returns the 'exact' result (machine number closest to infinitely precise result) all the time. Just consider the case where the infinitely precise result of a transcendental function falls nearly half way between two machine numbers. A relative error of 0.5 ULP can cause the result to be either of the numbers after rounding, depending on the direction of the error. But the 83D87 should deliver results that never differ from the 'exact' result by more than one ULP.

Cyrix also claims that its transcendental functions satisfy the monotonicity criterion [13], a claim not made by any of the competitors. Monotonicity means that for all x1 > x2, it always follows that f(x1) >= f(x2) for an increasing function like sin on [0..pi/4]. Likewise, for a decreasing function like cos on [0..pi/4], for all x1 > x2, it follows that f(x1) <= f(x2).

The Weitek Abacus 3167 and 4167 implement only the basic arithmetic operations (add, subtract, negate, multiply, divide, square root) in hardware. Transcendental functions are provided via a software library supplied by Weitek. For these library functions Weitek claims a maximum relative error of 5 ULPs [31,33] (ULP = Unit in the Last Place, numeric weight of the least significant mantissa bit). This means that the last three bits in the mantissa of a double precision result can be wrong. Note that the Intel 387 and compatible math coprocessors generate the transcendental functions with a small relative error with regard to the _extended double precision_ format. Thus, when rounded to double precision, their function values are nearly always 'exact'. 387 type coprocessors therefore have superior accuracy when compared with Weitek's coprocessors.
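
The notions of ULP error and monotonicity used above can be made concrete with a short sketch. It is written in Python and works on IEEE double precision values from the host's math library, not on the 80-bit extended format of the coprocessors; all function names are illustrative assumptions of mine, and the reference sine is a plain Taylor series in exact rational arithmetic that is only adequate for small arguments such as those in [0, pi/4]. As a simplification, the ULP is taken at the computed value rather than at the exact result.

```python
import math
from fractions import Fraction


def sin_reference(x: Fraction, terms: int = 30) -> Fraction:
    """Taylor series for sin(x) in exact rational arithmetic (small |x| only)."""
    total, term = Fraction(0), x
    for k in range(terms):
        total += term
        term = -term * x * x / ((2 * k + 2) * (2 * k + 3))
    return total


def ulp_error(computed: float, reference: Fraction) -> Fraction:
    """Distance between computed and reference values in ULPs of `computed`."""
    return abs(Fraction(computed) - reference) / Fraction(math.ulp(computed))


def is_monotonic_increasing(f, xs) -> bool:
    """Check f(x1) >= f(x2) whenever x1 > x2 over the sample points xs."""
    ys = [f(x) for x in sorted(xs)]
    return all(a <= b for a, b in zip(ys, ys[1:]))


def bits_affected(ulps: int) -> int:
    """Number of low-order mantissa bits an error of `ulps` ULPs can reach."""
    return int(ulps).bit_length()


if __name__ == "__main__":
    x = 0.5
    err = ulp_error(math.sin(x), sin_reference(Fraction(x)))
    print(f"sin({x}) is off by {float(err):.3f} ULP in double precision")
    # Consecutive machine numbers are the hardest case for monotonicity.
    xs = [x]
    for _ in range(1000):
        xs.append(math.nextafter(xs[-1], math.inf))
    print("monotonic on 1000 consecutive doubles:",
          is_monotonic_increasing(math.sin, xs))
    print("bits reachable by a 5 ULP error:", bits_affected(5))
```

Sampling consecutive machine numbers, as the demonstration does, is the strictest form of the monotonicity check, because adjacent arguments produce results that differ by only a few ULPs.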
The test diskette distributed with early versions of the Cyrix 83D87 contained a program TRANCK that checks the accuracy of the transcendental functions in the coprocessor against a more precise software arithmetic [16]. I used this program to compare the accuracy of the transcendental functions on those 287/387/486 coprocessors/FPUs available to me. As TRANCK will not accept negative numbers as interval limits, I tested each function on an interval along the positive x-axis. The functions tested are F2XM1 (2**x-1), FSIN (sine), FCOS (cosine), FPTAN (tangent), FPATAN (arctangent), FYL2X (y * log2 (x)), and FYL2XP1 (y * log2 (x+1)). These are all the transcendental functions implemented on the 80387. Note that the square root (FSQRT) is *not* a transcendental function. For every function, 100,000 arguments were evaluated. The arguments were uniformly distributed within the interval tested.

The EM87 emulator could not be checked with TRANCK, since the multiple precision package in TRANCK would always return with an error message immediately. However, the Franke387 could be tested and

Test results for accuracy of transcendental functions for double extended precision as returned by the program TRANCK. 100,000 trials per function.

%wrong      is the percentage of results that differ from the 'exact' result (infinitely precise result rounded to 64 bits).
ULP_hi      is the number of results where the returned result was greater than the 'exact' (correctly rounded) result by one ULP (the numeric weight of the last mantissa bit, 2**-64 to 2**-63 depending on the size of the number).
ULPs_hi     is the number of results where the returned result was greater than the 'exact' result by two or more ULPs.
ULP_lo      is the number of results where the returned result was smaller than the 'exact' (correctly rounded) result by one ULP (the numeric weight of the last mantissa bit, 2**-64 to 2**-63 depending on the size of the number).
ULPs_lo     is the number of results where the returned result was smaller than the 'exact' result by two or more ULPs.
max ULP err is the maximum deviation of a returned result from the 'exact' answer expressed in ULPs.

Franke387 V2.4 emulator
                                                                  max
funct.  interval      %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
SIN     0,pi/4        39.042   25301      708   13029        4        2
COS     0,pi/4        75.714   49827    25887       0        0        3
TAN     0,pi/4        76.976   14230    10029   24323    28394        9
ATAN    0,1           55.826   26028     1529   24044     4225        4
2XM1    0,0.5         96.717       0        0   47910    48807        5
YL2XP1  0,sqrt(2)-1   93.007     578        9   27416    65004        8
YL2X    0.1,10        62.252   16817     4712   37082     3641     2953

INTEL 80287
                                                                  max
funct.  interval      %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
SIN     0,pi/4           N/A     N/A      N/A     N/A      N/A      N/A
COS     0,pi/4           N/A     N/A      N/A     N/A      N/A      N/A
TAN     0,pi/4        37.001   18756      524   17405      316        2
ATAN    0,1            9.666    6065        0    3601        0        1
2XM1    0,0.5         19.920       0        0   19920        0        1
YL2XP1  0,sqrt(2)-1    7.780     868        0    6912        0        1
YL2X    0.1,10         1.287     723        0     564        0        1

INTEL 387
                                                                  max
funct.  interval      %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
SIN     0,pi/4        28.872    2467        0   26392       13        2
COS     0,pi/4        27.213   27169       35       9        0        2
TAN     0,pi/4        10.532     441        0   10091        0        1
ATAN    0,1            7.088    2386        0    4691        1        2
2XM1    0,0.5         32.024       0        0   32024        0        1
YL2XP1  0,sqrt(2)-1   22.611    3461        0   19150        0        1
YL2X    0.1,10        13.020    6508        0    6512        0        1

INTEL 387DX
                                                                  max
funct.  interval      %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
SIN     0,pi/4        28.873    2467        0   26393       13        2
COS     0,pi/4        27.121   27090       22       9        0        2
TAN     0,pi/4        10.711     457        0   10254        0        1
ATAN    0,1            7.088    2386        0    4691        1        2
2XM1    0,0.5         32.024       0        0   32024        0        1
YL2XP1  0,sqrt(2)-1   22.611    3461        0   19150        0        1
YL2X    0.1,10        13.020    6508        0    6512        0        1

ULSI 83C87
                                                                  max
funct.  interval      %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
SIN     0,pi/4        35.530    4989        6   30238      297        2
COS     0,pi/4        43.989   11193      675   31393      728        2
TAN     0,pi/4        48.539   18880     1015   26349     2295        3
ATAN    0,1           20.858      62        0   20796        0        1
2XM1    0,0.5         21.257       4        0   21253        0        1
YL2XP1  0,sqrt(2)-1   27.893    9446        0   18213      234        2
YL2X    0.1,10        13.603    9816        0    3787        0        1

IIT 3C87
                                                                  max
funct.  interval      %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
SIN     0,pi/4        18.650   11171        0    7479        0        1
COS     0,pi/4         7.700    3024        0    4676        0        1
TAN     0,pi/4        20.973    9681        0   11291        1        2
ATAN    0,1           19.280   13186        0    6094        0        1
2XM1    0,0.5         25.660   17570        0    8090        0        1
YL2XP1  0,sqrt(2)-1   45.830   23503     1896   19654      777        3
YL2X    0.1,10        10.888    5638      357    4845       48        3

CYRIX 83D87
                                                                  max
funct.  interval      %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
SIN     0,pi/4         1.554    1015        0     539        0        1
COS     0,pi/4         0.925     143        0     782        0        1
TAN     0,pi/4         4.147     881        0    3266        0        1
ATAN    0,1            0.656     229        0     427        0        1
2XM1    0,0.5          2.628    1433        0    1194        0        1
YL2XP1  0,sqrt(2)-1    3.242     825        0    2417        0        1
YL2X    0.1,10         0.931     256        0     675        0        1

CYRIX 387+
                                                                  max
funct.  interval      %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
SIN     0,pi/4         1.486     864        0     622        0        1
COS     0,pi/4         2.072      12        0    2060        0        1
TAN     0,pi/4         0.602      63        0     539        0        1
ATAN    0,1            0.384      12        0     372        0        1
2XM1    0,0.5          1.985      27        0    1958        0        1
YL2XP1  0,sqrt(2)-1    3.662    1705        0    1957        0        1
YL2X    0.1,10         0.764     367        0     397        0        1

INTEL RapidCAD, Intel 486
                                                                  max
funct.  interval      %wrong  ULP_hi  ULPs_hi  ULP_lo  ULPs_lo  ULP err
SIN     0,pi/4        16.991    1517        0   15474        0        1
COS     0,pi/4         9.003    7603        0    1400        0        1
TAN     0,pi/4        10.532     441        0   10091        0        1
ATAN    0,1            7.078    2386        0    4691        1        2
2XM1    0,0.5         32.025       0        0   32025        0        1
YL2XP1  0,sqrt(2)-1   21.800     533        0   21267        0        1
YL2X    0.1,10         3.894    1879        0    2015        0        1

The test results above indicate that none of the 80x87 compatibles exceeds Intel's stated error bound of 3 ULPs for the transcendental functions. However, some coprocessors are more accurate than others. Rating the coprocessors according to the accuracy of their transcendental functions gives the following list (highest accuracy first): Cyrix 387+, Cyrix 83D87, Intel 486, Intel RapidCAD, Intel 80287(!), Intel 387DX, Intel 80387, IIT 3C87, ULSI 83C87.

The tests also show that the problems with excessive inaccuracy of the transcendental functions in early versions of the IIT coprocessors with errors of up to 8 ULPs [8] have been eliminated.
According to [56], certain problems with the FPATAN instruction on the IIT 3C87 occurring under the UNIX version of AutoCAD were corrected in June 1990.

The Franke387 has acceptable accuracy for the FSIN, FCOS, and FPATAN instructions, taking into consideration that according to its documentation, Franke387 uses only 64 bits of precision for the intermediate results, while coprocessors typically use 68 bits and more. However, the larger error in the FPTAN, F2XM1, FYL2XP1, and especially the FYL2X operations shows that the emulator doesn't use state of the art algorithms, which ensure an error of only a very few ULPs even if no extra precise intermediate results are available.
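
For readers who want to reproduce the kind of bookkeeping behind the columns %wrong, ULP_hi, ULPs_hi, ULP_lo, ULPs_lo and max ULP err in the tables above, here is a minimal sketch in Python. It is not TRANCK's actual code, which I have not seen. It assumes IEEE double precision values instead of double extended, assumes that the correctly rounded reference values are supplied by the caller (for example from a multiple precision package), and measures the error as the number of representable values between the two results. The function names ordinal and tally are invented for this example.

```python
import struct

def ordinal(x):
    """Map a non-negative finite double to an integer so that
    adjacent doubles map to adjacent integers."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

def tally(returned, exact):
    """Classify each returned value against its correctly rounded reference."""
    stats = {"ULP_hi": 0, "ULPs_hi": 0, "ULP_lo": 0, "ULPs_lo": 0,
             "max_ULP_err": 0}
    for r, e in zip(returned, exact):
        d = ordinal(r) - ordinal(e)   # signed distance in representable steps
        if d == 1:
            stats["ULP_hi"] += 1      # too large by exactly one ULP
        elif d >= 2:
            stats["ULPs_hi"] += 1     # too large by two or more ULPs
        elif d == -1:
            stats["ULP_lo"] += 1      # too small by exactly one ULP
        elif d <= -2:
            stats["ULPs_lo"] += 1     # too small by two or more ULPs
        stats["max_ULP_err"] = max(stats["max_ULP_err"], abs(d))
    wrong = (stats["ULP_hi"] + stats["ULPs_hi"]
             + stats["ULP_lo"] + stats["ULPs_lo"])
    stats["%wrong"] = 100.0 * wrong / len(returned)
    return stats

if __name__ == "__main__":
    reference = [1.0] * 5
    results = [1.0, 1.0 + 2.0**-52, 1.0 + 2.0**-51, 1.0 - 2.0**-53, 1.0]
    print(tally(results, reference))
```

With this definition %wrong is simply the sum of the four count columns divided by the number of trials, which agrees with the tables: for the Intel 387 SIN row, 2467 + 0 + 26392 + 13 = 28872 out of 100,000 trials gives 28.872.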