4-9 MINIMIZING FLOATING-POINT ERRORS
*************************************
Practical methods for analyzing the severity of floating-point
errors can be found in the chapter: 'practical issues'.
Note that an interval/stochastic arithmetic package is not just a
diagnostic tool for FP errors, a result without an error estimation
is not very useful, as errors can never be eliminated completely
in experimental data and computation.
The chapter on 'FORTRAN pitfalls' discusses various programming
practises that may amplify floating-point (and other) errors,
and their avoidance is very important.
Using REAL*16 (QUAD PRECISION)
------------------------------
This is the most simple solution on machines that supports this
data type. REAL*16 takes more CPU time than REAL*8/REAL*4,
but introduces very small roundoff errors, and has a huge range.
Performance cost of different size floats is VERY machine dependent
(see the performance chapter).
A crude example program:
PROGRAM REALS
C ------------------------------------------------------------------
REAL*4
* X4, Y4, Z4
C ------------------------------------------------------------------
REAL*8
* X8, Y8, Z8
C ------------------------------------------------------------------
REAL*16
* X16, Y16, Z16
C ------------------------------------------------------------------
X4 = 1.0E+00
Y4 = 0.9999999E+00
Z4 = X4 - Y4
WRITE(*,*) Z4 ** 0.5
C ------------------------------------------------------------------
X8 = 1.0D+00
Y8 = 0.9999999D+00
Z8 = X8 - Y8
WRITE(*,*) Z8 ** 0.5
C ------------------------------------------------------------------
X16 = 1.0Q+00
Y16 = 0.9999999Q+00
Z16 = X16 - Y16
WRITE(*,*) Z16 ** 0.5
C ------------------------------------------------------------------
END
Normalization of equations
--------------------------
Floating-point arithmetic is best when dealing with numbers with
magnitudes of the order of 1.0, the 'representation density' is not
maximal but we are in the 'middle' of the range.
Usually you can decrease the range of numbers appearing in the
computation, by transforming the system of units, so that you get
dimensionless equations.
The diffusion equation will serve as an example, I apologize for
the horrible notation:
Ut = K * Uxx
Where the solution is U(X,T), the lowercase letters denote the
partial derivatives, and K is a constant.
Let:
L be a typical length in the problem
U0 a typical value of U
Substitute:
X' = X / L U' = U / U0
Then:
Ux = Ux' / L
Uxx = Ux'x' / (L*L)
Substitute in the original equation:
(U' * U0)t = (K / (L*L)) * (U' * U0)x'x'
((L * L) / K) U't = U'x'x'
Substitute:
T' = (K * T) / (L * L)
And you get:
U't' = U'x'x'
With: X' = X / L U' = U / U0 T' = (K * T) / (L * L)
Multi-precision arithmetic
--------------------------
That is a really bright idea, you can simulate floating-point numbers
with very large sizes, using character strings (or other data types),
and create routines for doing arithmetic on these giant numbers.
Of course such software simulated arithmetic will be slow.
By the way, the function overloading feature of Fortran 90, makes using
multi-precision arithmetic packages with existing programs easy.
Two free packages are "mpfun" and "bmp" (Brent's multiple precision),
which are available from Netlib.
Using special tricks
--------------------
A simple trick for summing a series, is sorting the numbers and adding
them in ascending order.
An example program:
PROGRAM RNDOF
C ------------------------------------------------------------------
INTEGER
* i
C ------------------------------------------------------------------
REAL
* sum
C ------------------------------------------------------------------
sum = 0.0
DO I = 1, 10000000, 1
sum = sum + 1.0 / REAL(i)
END DO
WRITE (*,*) 'Decreasing order: ', sum
C ------------------------------------------------------------------
sum = 0.0
DO I = 10000000, 1, -1
sum = sum + 1.0 / REAL(i)
END DO
WRITE (*,*) 'Increasing order: ', sum
C ------------------------------------------------------------------
END
There is no need here for sorting, as the series is monotonic.
Executing 2 * 10**7 iterations will take some CPU seconds, but
the result is very illuminating.
Another way (though not as good as doubling the precision) is
using the Kahan Summation Formula.
Suppose the series is stored in an array X(1:N)
SUM = X(1)
C = 0.0
DO J = 2, N
Y = X(J) - C
T = SUM + Y
C = (T - SUM) - Y
SUM = T
ENDDO
Yet another method is using Knuth's formula.
The recommended method is sorting and adding.
Using integers instead of floats
--------------------------------
See the chapter on integers.
Manual safeguarding
-------------------
You can check manually every dangerous arithmetic operation, you
may construct special routines for 'safe' arithmetical operations.
Hardware support
----------------
IEEE conforming FPUs can raise an exception whenever a roundoff
was performed on an arithmetical result.
You can write an exception handler that will report the exceptions,
but as the result of most operations may have to be rounded, your
program will be slowed down, and you will get huge log files.
Rational arithmetic
-------------------
We can represent (possibly with an error) every number as a quotient
of two integers, and keep along the dividend and divisor without
actually performing the division.
Return to contents page