home *** CD-ROM | disk | FTP | other *** search
- Design notes for FPLIB Release 2.0
- ==================================
-
-
- NUMBER REPRESENTATION
- ---------------------
-
- The floating point format used is Motorola's "Fast Floating Point":
-
- -----------------------------------------
- | 24 bits mantissa |S| 7 bits exp |
- -----------------------------------------
-
- The mantissa has an implicit binary point just off to the left. The
- exponent is binary, stored excess-64. In other words, the actual
- mathematical exponent has 64 added, and the resulting 7-bit number is
- unsigned.
-
- Negative numbers are stored as sign-and-magnitude.
-
- All floating-point operations expect normalized numbers, and generate
- normalized numbers if the inputs are normalized. A number is
- normalized if the most significant bit is 1 (except for 0.0, which is
- stored as 32 bits of zero, never negative). Also, a nonzero mantissa
- with a biased exponent of 0 is not normalized, letting us test the low
- byte or whole long as a test for 0.0. This all means that the most
- significant bit is actually redundant, and could be left off, giving
- one extra bit of precision. Oh well.
-
- Some of these routines make use of the assumption that all arguments
- are normalized, and may generate slightly bogus results if that isn't
- true. Release 1.0 assumed that a biased exponent of zero could have
- a nonzero mantissa; that has been fixed.
-
- Some examples: 1.0 is hex 80000041 , or 0.5 * 2 ** 1 (binary 0.1,
- exponent 1 + bias of 64). -0.75 is hex C00000C0 (note the sign bit is
- inverted, but no other bit is).
-
-
- ACCURACY AND LIMITS
- -------------------
-
- The accuracy is in the 24-bit mantissa, which gives between 7.22 and
- 7.52 decimal digits of relative error. Decimal constants are given to
- 8 digits, and the compiler is trusted to convert them right (it
- generally does a reasonable job).
-
- The range is in the exponent. The smallest nonzero and largest
- positive normalized numbers are ~5.4e-20 and ~9.2e18.
-
- Rounding of ambiguous numbers uses round-to-even (see D. E. Knuth's
- Seminumerical Algorithms for a discussion on this). This means that
- if the trailing portion is exactly one half of the least significant
- bit, the lsb is forced to be the nearer even number. We often keep
- several insignificant bits to be sure this is done correctly.
-
-
- CODING TRICKS
- -------------
-
- It's quite normal in a library of this type to resort to bit-twiddling
- in the internal representation. To help this, FPLIB.H defines a C
- union, thus:
-
- union {
- unsigned long ul;
- float f;
- char sc[4];
- };
-
- The number can be looked at as a floating or integer long (for handling
- bits in the mantissa), the sign is the sign of sc[3], and the exponent
- is the rest of sc[3]. Other shortcuts are commented in the inividual
- routines.
-
-
- MATHEMATICAL INFORMATION
- ------------------------
-
- Well, this is mostly what I remember from my undergraduate days. Many
- of these functions have a precise mathematical formula or algorithm
- that will give the result, but is too slow for practical use. Instead,
- we look for an approximation that is quick to compute, and gives us
- just enough precision and no more. After all, why compute to 100
- digits accuracy when you can only represent less than 8? Some of these
- approximations look very strange, but they all have sound mathematical
- bases. The name Chebyshev springs to mind.
-
- Dedicated mathematicians have struggled for decades to bring you these
- formulae and, just as important, a predictable accuracy of better than
- 7.5 decimal digits. Because of this, we can execute a known number of
- operations instead of loop-and-test (as you would in, for example, a
- simple Newton's iteration for square root).
-
- The approximations are generally polynomials, usually avoid division, and
- are valid only over a small range of the input parameter. We use simple
- mathematical identities to extend the range. For example, the cyclic
- nature of the trigonometric functions means we only need calculate in the
- range 0..pi/2. For another example, we can do much of the square root
- calculation by the simple expedient of dividing the binary exponent by 2.
-
-
- RANDOM LIBRARY INFORMATION
- --------------------------
-
- Release 1.0 of Sozobon C had a few problems in the floating support:
-
- - Two negative floating numbers compared wrong (returned > when it
- should be <). I corrected that. However, the other routines don't
- assume the correction has been installed in the library (which it
- wasn't, at first...).
-
- - The library was not optimally ordered, and had to be scanned twice.
- I straightened that out.
-
- - Add and subtract were truncating. Rounding is generally a better idea.
-
- atof() calls sscanf(). Ick.
-
- -------------------------------------------------------------------------------
-
- These routines may be used without restriction, but I retain the copyright
- on them, and the Only True Upgrades will be those I make. If you have any
- improvements, please send them to me.
-
- March 1989
-
- David W. Brooks
- 36 Elm Street
- Medfield, MA 02052
-
- BROOKS@CSSS-A.PRIME.COM
- {uunet,mit-eddie}!csss-a.prime.com!brooks
-