- +---------------------------------------------------------------------------+
- | wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors. |
- | |
- | Copyright (C) 1992,1993,1994 |
- | W. Metzenthen, 22 Parker St, Ormond, Vic 3163, |
- | Australia. E-mail billm@vaxc.cc.monash.edu.au |
- | |
- | This program is free software; you can redistribute it and/or modify |
- | it under the terms of the GNU General Public License version 2 as |
- | published by the Free Software Foundation. |
- | |
- | This program is distributed in the hope that it will be useful, |
- | but WITHOUT ANY WARRANTY; without even the implied warranty of |
- | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
- | GNU General Public License for more details. |
- | |
- | You should have received a copy of the GNU General Public License |
- | along with this program; if not, write to the Free Software |
- | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. |
- | |
- +---------------------------------------------------------------------------+
-
-
-
- wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
- which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was
- in turn based upon emu387 which was written by DJ Delorie for djgpp.
- The interface to the Linux kernel is based upon the original Linux
- math emulator by Linus Torvalds.
-
- My target FPU for wm-FPU-emu is that described in the Intel486
- Programmer's Reference Manual (1992 edition). Unfortunately, numerous
- facets of the functioning of the FPU are not well covered in the
- Reference Manual, so the information in the manual has been
- supplemented with measurements on real 80486s. However, it is simply
- not possible to be sure that all of the peculiarities of the 80486
- have been discovered, so there are always likely to be obscure
- differences between the detailed behaviour of the emulator and that
- of a real 80486.
-
- wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.
- See "Limitations" later in this file for a list of some differences.
-
- Please report bugs, etc. to me at:
- billm@vaxc.cc.monash.edu.au
- or at:
- billm@jacobi.maths.monash.edu.au
-
-
- --Bill Metzenthen
- March 1994
-
-
- ----------------------- Internals of wm-FPU-emu -----------------------
-
- Numeric algorithms:
- (1) Add, subtract, and multiply. Nothing remarkable in these.
- (2) Divide has been tuned to get reasonable performance. The algorithm
- is not the obvious one which most people seem to use, but is designed
- to take advantage of the characteristics of the 80386. I expect that
- it has been invented many times before I discovered it, but I have not
- seen it. It is based upon one of those ideas which one carries around
- for years without ever bothering to check it out.
- (3) The sqrt function has been tuned to get good performance. It is based
- upon Newton's classic method. Performance was improved by capitalizing
- upon the properties of Newton's method, and the code is once again
- structured taking account of the 80386 characteristics.
- (4) The trig, log, and exp functions are based in each case upon quasi-
- "optimal" polynomial approximations. My definition of "optimal" was
- based upon getting good accuracy with reasonable speed.
- (5) The argument reducing code for the trig functions effectively uses
- a value of pi which is accurate to more than 128 bits. As a consequence,
- the reduced argument is accurate to more than 64 bits for arguments up
- to a few pi, and for most arguments even as they approach 2^63. This is
- far superior to an 80486, which uses a value of pi which is accurate to
- only 66 bits.
-
- The code of the emulator is complicated slightly by the need to
- account for a limited form of re-entrancy. Normally, the emulator will
- emulate each FPU instruction to completion without interruption.
- However, when the emulator is accessing user memory, swapping may be
- needed. In this case the emulator may be temporarily suspended while
- disk I/O takes place. During this time another process may use the
- emulator, thereby changing some static variables (e.g. FPU_st0_ptr).
- The code which accesses user memory is confined to five files:
- fpu_entry.c
- reg_ld_str.c
- load_store.c
- get_address.c
- errors.c
-
- ----------------------- Limitations of wm-FPU-emu -----------------------
-
- There are a number of differences between the current wm-FPU-emu
- (version beta 1.11) and the 80486 FPU (apart from bugs). Some of the
- more important differences are listed below:
-
- The Roundup flag does not have much meaning for the transcendental
- functions and its 80486 value with these functions is likely to differ
- from its emulator value.
-
- In a few rare cases the Underflow flag obtained with the emulator will
- be different from that obtained with an 80486. This occurs when the
- following conditions apply simultaneously:
- (a) the operands have a higher precision than the current setting of the
- precision control (PC) flags.
- (b) the underflow exception is masked.
- (c) the magnitude of the exact result (before rounding) is less than 2^-16382.
- (d) the magnitude of the final result (after rounding) is exactly 2^-16382.
- (e) the magnitude of the exact result would be exactly 2^-16382 if the
- operands were rounded to the current precision before the arithmetic
- operation was performed.
- If all of these apply, the emulator will set the Underflow flag but a real
- 80486 will not.
-
- NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
- unsupported by the 80486. They are the Pseudo-NaNs, Pseudo-infinities,
- and Unnormals. None of these will be generated by an 80486 or by the
- emulator. Do not use them. The emulator's detailed treatment of them
- differs from that of an 80486.
-
- The emulator treats PseudoDenormals differently from an 80486. These
- numbers are in fact properly normalised numbers with the exponent
- offset by 1, and the emulator treats them as such. Unlike the 80486,
- the emulator does not generate a Denormal Operand exception for these
- numbers. The arithmetical results produced when using such a number as
- an operand are the same for the emulator and a real 80486 (apart from
- any slight precision difference for the transcendental functions).
- Neither the emulator nor an 80486 produces one of these numbers as the
- result of any arithmetic operation. An 80486 can keep one of these
- numbers in an FPU register with its identity as a PseudoDenormal, but
- the emulator will not; they are always converted to a valid number.
-
- Self-modifying code can cause the emulator to fail. An example of such
- code is:
- movl %esp,(%ebx)
- fld1
- The FPU instruction may be (usually will be) loaded into the prefetch
- queue of the CPU before the mov instruction is executed. If the
- destination of the 'movl' overlaps the FPU instruction then the bytes
- in the prefetch queue and memory will be inconsistent when the FPU
- instruction is executed. The emulator will be invoked but will not be
- able to find the instruction which caused the device-not-present
- exception. For this case, the emulator cannot emulate the behaviour of
- an 80486DX.
-
- Handling of the address size override prefix byte (0x67) has not been
- extensively tested yet. A major problem exists because using it in
- vm86 mode can cause a general protection fault. Address offsets
- greater than 0xffff appear to be illegal in vm86 mode but are quite
- acceptable (and work) in real mode. A small test program developed to
- check the addressing, and which runs successfully in real mode,
- crashes dosemu under Linux and also brings Windows down with a general
- protection fault message when run under the MS-DOS prompt of Windows
- 3.1. (The program simply reads data from a valid address).
-
-
- ----------------------- Performance of wm-FPU-emu -----------------------
-
- Speed.
- -----
-
- The speed of floating point computation with the emulator will depend
- upon instruction mix. Relative performance is best for the instructions
- which require most computation. The simple instructions are adversely
- affected by the FPU instruction trap overhead.
-
-
- Timing: Some simple timing tests have been made on the emulator functions.
- The times include load/store instructions. All times are in microseconds
- measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
- MS-DOS; the next two columns are for emulators running with the djgpp
- MS-DOS extender. The final column is for wm-FPU-emu in Linux 0.97,
- using libm4.0 (hard).
-
- function    Turbo C      djgpp 1.06      WM-emu387      wm-FPU-emu
-
- +           60.5         154.8           76.5           139.4
- -           61.1-65.5    157.3-160.8     76.2-79.5      142.9-144.7
- *           71.0         190.8           79.6           146.6
- /           61.2-75.0    261.4-266.9     75.3-91.6      142.2-158.1
-
- sin()       310.8        4692.0          319.0          398.5
- cos()       284.4        4855.2          308.0          388.7
- tan()       495.0        8807.1          394.9          504.7
- atan()      328.9        4866.4          601.1          419.5-491.9
-
- sqrt()      128.7        crashed         145.2          227.0
- log()       413.1-419.1  5103.4-5354.21  254.7-282.2    409.4-437.1
- exp()       479.1        6619.2          469.1          850.8
-
-
- The performance under Linux is improved by the use of look-ahead code.
- The following results show the improvement which is obtained under
- Linux due to the look-ahead code. Also given are the times for the
- original Linux emulator with the 4.1 'soft' lib.
-
- [ Linus' note: I changed look-ahead to be the default under linux, as
- there was no reason not to use it after I had edited it to be
- disabled during tracing ]
-
-             wm-FPU-emu with  original with
-             look-ahead       'soft' lib
- +           106.4            190.2
- -           108.6-111.6      192.4-216.2
- *           113.4            193.1
- /           108.8-124.4      700.1-706.2
-
- sin()       390.5            2642.0
- cos()       381.5            2767.4
- tan()       496.5            3153.3
- atan()      367.2-435.5      2439.4-3396.8
-
- sqrt()      195.1            4732.5
- log()       358.0-387.5      3359.2-3390.3
- exp()       619.3            4046.4
-
-
- These figures are now somewhat out-of-date. The emulator has become
- progressively slower for most functions as more of the 80486 features
- have been implemented.
-
-
- ----------------------- Accuracy of wm-FPU-emu -----------------------
-
-
- Accuracy: The following table gives the accuracy of the sqrt(), trig
- and log functions. Each function was tested at about 400 points. Ideal
- results would be 64 bits. The reduced accuracy of cos() and tan() for
- arguments greater than pi/4 can be thought of as being due to the
- precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
- accurate to 64 bits can result in a relative accuracy in cos() of about
- 64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given
- in the last column.
-
-
- Function    Tested x range           Worst result                 Turbo C
-                                      (relative bits)
-
- sqrt(x)     1 .. 2                   64.1                         63.2
- atan(x)     1e-10 .. 200             62.6                         62.8
- cos(x)      0 .. pi/2-(1e-10)        63.2 (x <= pi/4)             62.4
-                                      35.2 (x = pi/2-(1e-10))      31.9
- sin(x)      1e-10 .. pi/2            63.0                         62.8
- tan(x)      1e-10 .. pi/2-(1e-10)    62.4 (x <= pi/4)             62.1
-                                      35.2 (x = pi/2-(1e-10))      31.9
- exp(x)      0 .. 1                   63.1                         62.9
- log(x)      1+1e-6 .. 2              62.4                         62.1
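The figure of about 31 bits quoted for cos(pi/2-(1e-10)) follows from a short estimate, assuming the argument x carries an absolute error of roughly 2^-64 (the text's "accurate to 64 bits"). Near pi/2 the slope of cos has magnitude close to 1, so the absolute error in cos(x) is also about 2^-64, and the relative accuracy in bits is:

```latex
\[
\text{bits} \approx -\log_2 \frac{2^{-64}}{\lvert \cos x \rvert}
            = 64 + \log_2 \lvert \cos x \rvert ,
\qquad
x = \tfrac{\pi}{2} - 10^{-10}
\;\Rightarrow\;
\cos x = \sin(10^{-10}) \approx 10^{-10},
\]
\[
\log_2 10^{-10} = -10 \log_2 10 \approx -33.2
\quad\Longrightarrow\quad
\text{bits} \approx 64 - 33.2 \approx 31 .
\]
```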
-
-
- As of version 1.3 of the emulator, the accuracy of the basic
- arithmetic has been improved (by a small fraction of a bit). Care has
- been taken to ensure full accuracy of the rounding of the basic
- arithmetic functions (+, -, *, /, and fsqrt), and they all now produce
- results which are exact to the 64th bit (unless there are any bugs
- left). To ensure this, it was necessary to effectively get information
- of up to about 128 bits precision. The emulator now passes the
- "paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24
- bit precision numbers) when precision control is set to 24, 53 or 64
- bits, and for 'double' variables (53 bit precision numbers) when
- precision control is set to 53 bits (a properly performing FPU cannot
- pass the 'paranoia' tests for 'double' variables when precision
- control is set to 64 bits).
-
- For version 1.5, the accuracy of fprem and fprem1 has been improved.
- These functions now produce exact results. The code for reducing the
- argument for the trig functions (fsin, fcos, fptan and fsincos) has
- been improved and now effectively uses a value for pi which is
- accurate to more than 128 bits precision. As a consequence, the
- accuracy of these functions for large arguments has been dramatically
- improved (and is now very much better than an 80486 FPU). There is
- also now no degradation of accuracy for fcos and fptan for operands
- close to pi/2. Measured results are (note that the definition of
- accuracy has changed slightly from that used for the above table):
-
- Function    Tested x range           Worst result
-                                      (absolute bits)
-
- cos(x)      0 .. 9.22e+18            62.0
- sin(x)      1e-16 .. 9.22e+18        62.1
- tan(x)      1e-16 .. 9.22e+18        61.8
-
- It is possible with some effort to find very large arguments which
- give much degraded precision. For example, the integer number
- 8227740058411162616.0
- is within about 1e-7 of a multiple of pi. To find the tan (for
- example) of this number to 64 bits precision it would be necessary to
- have a value of pi which had about 150 bits precision. The FPU
- emulator computes the result to about 42.6 bits precision (the correct
- result is about -9.739715e-8). On the other hand, an 80486 FPU returns
- 0.01059, which in relative terms is hopelessly inaccurate.
-
- For arguments close to critical angles (which occur at multiples of
- pi/2) the emulator is more accurate than an 80486 FPU. For very large
- arguments, the emulator is far more accurate.
-
- ------------------------- Contributors -------------------------------
-
- A number of people have contributed to the development of the
- emulator, often by just reporting bugs, sometimes with suggested
- fixes, and a few kind people have provided me with access in one way
- or another to an 80486 machine. Contributors include (to those people
- whom I may have forgotten, please forgive me):
-
- Linus Torvalds
- Tommy.Thorn@daimi.aau.dk
- Andrew.Tridgell@anu.edu.au
- Nick Holloway, alfie@dcs.warwick.ac.uk
- Hermano Moura, moura@dcs.gla.ac.uk
- Jon Jagger, J.Jagger@scp.ac.uk
- Lennart Benschop
- Brian Gallew, geek+@CMU.EDU
- Thomas Staniszewski, ts3v+@andrew.cmu.edu
- Martin Howell, mph@plasma.apana.org.au
- M Saggaf, alsaggaf@athena.mit.edu
- Peter Barker, PETER@socpsy.sci.fau.edu
- tom@vlsivie.tuwien.ac.at
- Dan Russel, russed@rpi.edu
- Daniel Carosone, danielce@ee.mu.oz.au
- cae@jpmorgan.com
- Hamish Coleman, t933093@minyos.xx.rmit.oz.au
- Bruce Evans, bde@kralizec.zeta.org.au
- Timo Korvola, Timo.Korvola@hut.fi
- Rick Lyons, rick@razorback.brisnet.org.au
-
- ...and numerous others who responded to my request for help with
- a real 80486.
-
-