-
- LINPACK
-
- Version 1.1
- February 22, 1985
-
- Adam Fritz
- 133 Main Street
- Afton, New York 13730
-
-
- INTRODUCTION
- ============
-
- This writeup discusses Pascal and C conversion and microcomputer
- performance measurement on representative routines from the
- LINPACK (1) library.
-
- Reference 1 discusses and lists the routines called;
-
- LINPACK - LINear systems PACKage,
-
- along with supporting routines called;
-
- BLAS - Basic Linear Algebra Subprograms.
-
- These routines comprise FORTRAN program encodings of algorithms
- intended to solve problems connected with simultaneous linear
- equations.
-
- Reference 2 presents tabulations of runtime performance for
- application of two LINPACK algorithms, viz.;
-
- SGEFA - Single precision/GEneral matrix/FActor
-
- and
-
- SGESL - Single precision/GEneral matrix/SoLve,
-
- for different precisions and different computers.
-
- The routines that accompany this writeup are listed below
- with brief discussions. Pascal, C, and corresponding FORTRAN
- routines are presented in libraries LPAK11/T, LPAK11/C, and
- LPAK10/F, respectively.
-
- CODR - Driver for SGECO and SGEFA.
- SLDR - Driver for SGEFA and SGESL.
- DIDR - Driver for SGECO, SGEFA, and SGEDI.
-
- These are test driver programs. They prompt the user for
- test system order and printout code. They call SYSGEN
- to compose a coefficient matrix and a right hand side
- vector, then call SGECO or SGEFA as appropriate, and then
- call SGESL or SGEDI with intermediate printout conditioned
- on a user prompted print flag.
-
- Copyright (C) 1985 Adam Fritz, 133 Main Street, Afton, NY 13730
-
- SYSGEN - Generate Test Matrix and Object Vector.
-
- Test system generator. This program is taken from a SICE
- test program by the author. It generates a Hilbert matrix
- and a right hand side vector comprised of the matrix
- diagonal. The test matrix is arbitrary because Reference 2
- compares runtime and not precision. This test matrix is
- symmetric rather than general and is a poorly conditioned
- system.
-
- SGECO - Single Precision General Matrix Condition.
- SGEFA - Single Precision General Matrix Factorization.
- SGESL - Single Precision General System Solution.
- SGEDI - Single Precision General Determinant and Inverse.
-
- Translated from Reference 1.
-
- OUT - Show Intermediate Results Conditioned on Print Code.
-
- Adapted from Reference 3.
-
- ISAMAX - Integer Valued Index of the Single Precision Real Array
- Element of Maximum Absolute Value.
- SASUM - Sum of Absolute Values.
- SAXPY - Single Precision Real Vector Times a Constant Plus a
- Vector.
- SCOPY - Copies Vector to Vector.
- SDOT - Single Precision Real Vector Dot Product of Two Vectors.
- SSCAL - Single Precision Real Vector Times a Constant.
- SSWAP - Swap Two Vectors.
-
- Translated from Reference 1 onto BLAS.
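
The SYSGEN description above can be sketched in C as follows. This is a minimal illustration and not the distributed SYSGEN source: the name hilbert_system, the argument list, and the leading dimension LDA are assumptions made for the example.

```c
#include <assert.h>

#define LDA 8   /* hypothetical leading dimension, not the LINPACK.H value */

/* Fill the leading n x n block of a with the Hilbert matrix and b with
   the matrix diagonal.  Indices are zero origin, so element (i,j) is
   1/(i+j+1). */
void hilbert_system(float a[][LDA], float b[], int n)
{
    int i, j;

    assert(n >= 1 && n <= LDA);
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++)
            a[i][j] = 1.0f / (float)(i + j + 1);
        b[i] = a[i][i];   /* right hand side is the diagonal */
    }
}
```

Because the Hilbert matrix is symmetric, row major and column major storage hold the same values, which is one reason it is a weak test of the general matrix routines.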
-
-
- CONVERSION
- ==========
-
- Pascal
- ------
-
- There are two important conversion issues: FORTRAN vs.
- Pascal subprogram arguments, and FORTRAN vs. Pascal storage
- allocation for multidimensional arrays. There are many changes
- that are syntactical, some that are matters of programming style,
- and a few based on expedient conformity to microcomputer runtime
- environments.
-
- Pascal is a so-called strongly typed language. If a variable
- is declared to be an array and is used as an actual argument
- to a subprogram with a formal argument that is scalar, then a
- compiler error message notes the inconsistent usage. FORTRAN has
- no such qualms. Part of the terseness and ingenuity of BLAS and
- LINPACK in FORTRAN is the way arrays and parts of arrays are
- communicated. FORTRAN uses call by reference so that when an
- element of an array is used as an actual argument the address of
- that element is presented to the subprogram. If the subprogram
- declares the corresponding formal argument to be a vector then the
- corresponding components of the actual array can be accessed. In
- Pascal such usage is that associated with pointers. The BLAS
- routines have been modified to make explicit use of pointers. The
- changes do not materially affect the routines.
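
The pointer usage described above can be sketched in C, where passing the address of an array element plays the role of the FORTRAN actual argument. The signature below is an assumption made for illustration and may differ from the SAXPY distributed in LPAK11/C; only positive increments are handled.

```c
#include <assert.h>

/* y := a*x + y over n elements, stepping through x and y by incx and
   incy.  Positive increments only, to keep the sketch short. */
void saxpy(int n, float a, const float *x, int incx, float *y, int incy)
{
    int i;

    assert(incx > 0 && incy > 0);
    for (i = 0; i < n; i++)
        y[i * incy] += a * x[i * incx];
}
```

A call such as saxpy(3, 2.0f, &x[2], 1, &y[2], 1) operates on the last three elements of two five element vectors, much as FORTRAN would when handed X(3) and Y(3).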
-
- LINPACK and BLAS, in particular, are written to conform to
- FORTRAN usage in allocating storage to multidimensional arrays by
- column or with first subscript changing fastest. Pascal, TURBO
- Pascal and MS Pascal anyway, stores by row. The foresight shown
- in design of BLAS allows this problem to be solved simply by
- changing the increment specified on calls to BLAS from LINPACK.
- The LINPACK changes are simple and the BLAS routines are altered
- to remove the partially unrolled loops for now unlikely cases of
- unity increment.
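
The role of the increment can be sketched in C. With row storage, consecutive elements of a matrix column lie one leading dimension apart, so the column is reached by passing that leading dimension as the increment. The names, the flat storage layout, and the value of LDA are assumptions made for this example.

```c
#include <assert.h>

#define LDA 4   /* hypothetical leading dimension */

/* Dot product of n elements of x and y taken with strides incx and
   incy (positive increments only). */
float sdot(int n, const float *x, int incx, const float *y, int incy)
{
    float sum = 0.0f;
    int i;

    assert(incx > 0 && incy > 0);
    for (i = 0; i < n; i++)
        sum += x[i * incx] * y[i * incy];
    return sum;
}
```

For a matrix stored by row in a flat array a, with element (i,j) at a[i*LDA + j], column j is addressed as sdot(n, &a[j], LDA, ...) and row i as sdot(n, &a[i*LDA], 1, ...). Under FORTRAN column storage the two increments would be exchanged.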
-
- There are syntactical differences between Pascal versions in
- handling pointers. TURBO uses '^Object' to denote a pointer
- while MS Pascal uses 'ADS OF Object', with similar differences
- between usage of Ptr, and Addr or Seg/Ofs with TURBO and ADS with
- MS. The routines distributed with this release are compatible
- with TURBO.
-
- C
- -
- In addition to the chief Pascal conversion issues, viz.,
- argument conventions and row major array storage, C also raises
- the zero origin indexing issue. This problem is avoided in
- Pascal by declaring dimensions [1..n]. In C, declaring a
- dimension [n] means indices from 0 to n-1. This creates a problem
- because it forces distinction between variables used as indices
- and those used as counters.
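
The index versus counter distinction can be sketched with a zero origin version of ISAMAX. The name isamax0 and the convention of returning -1 for an empty vector are assumptions made for this example; the FORTRAN original returns a one origin index.

```c
#include <assert.h>

/* Zero origin index of the element of largest absolute value among n
   elements of x taken with stride incx.  Returns -1 when n < 1. */
int isamax0(int n, const float *x, int incx)
{
    int i, imax = 0;
    float v, vmax;

    assert(incx > 0);
    if (n < 1)
        return -1;
    vmax = x[0] < 0.0f ? -x[0] : x[0];
    for (i = 1; i < n; i++) {      /* n is a counter, i is an index 0..n-1 */
        v = x[i * incx];
        if (v < 0.0f)
            v = -v;
        if (v > vmax) {
            vmax = v;
            imax = i;
        }
    }
    return imax;
}
```

A caller that uses the result as a pivot row can index with it directly, but any place where the FORTRAN code compares the result with a count needs an adjustment by one.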
-
- In addition to the routines distributed for Pascal there is
- a routine in C named Abs() which returns the float absolute value
- of its argument.
-
-
-
- PERFORMANCE
- ===========
-
- The table below is excerpted from Reference 2 showing
- performance using single precision arithmetic for a 100x100 linear
- system with various equipment and software configurations.
- Entries for SBC-200, CO16, and Heath/Zenith-100 microcomputers
- are interpolated.
-
- The microcomputer software whose performance is reported
- here does not conveniently support 100x100 arrays. The 100x100
- results are extrapolated from nxn cases in accord with Ref. 1;
-
- Time(100) = (100/n)**3 * SGEFA(n) + (100/n)**2 * SGESL(n)
-
- where n was either 75 or 50 depending, generally, upon whether
- single or full precision was employed;
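
The extrapolation can be written as a small C function: factor time scales with the cube of the order and solve time with the square. The name time100 and its arguments are assumptions made for this sketch.

```c
#include <assert.h>

/* Extrapolate timings measured at order n to order 100.
   t_sgefa and t_sgesl are the measured times for one call each. */
double time100(int n, double t_sgefa, double t_sgesl)
{
    double s;

    assert(n > 0);
    s = 100.0 / (double)n;
    return s * s * s * t_sgefa + s * s * t_sgesl;
}
```

For n = 50 the factor time is multiplied by 8 and the solve time by 4, so measured times of 1.0 second each extrapolate to 12.0 seconds.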
-
- ------------------------------------------------------------
- |                                                          |
- |           Solving a System of Linear Equations           |
- |              with LINPACK in Half Precision              |
- |                                                          |
- | computer  os/compiler              ratio   mflops   time |
- |                                                     secs |
- |                                                          |
- | Fujitsu vp-200 f77 (c dir)          0.62       20   .035 |
- | Fujitsu vp-200 f77 (r blas)         0.69       18   .039 |
- | Hitachi s-810/20 f77/hap (r blas)   0.78       16   .044 |
- | NAS 9060 w/vpf vs opt=2 (c blas)     1.5      8.4   .082 |
- | Fujitsu m-380 f77 opt=3              1.7      7.0   .098 |
- | Amdahl 5860 hsfpf h enh opt=3        2.2      5.5   .125 |
- | NAS 9060 vs opt=2                    2.4      5.2   .133 |
- | Amdahl 5860 hsfpf vs opt=3           2.4      5.1   .135 |
- | Amdahl 470 v/8 h enh opt=3           4.4      2.8   .246 |
- | Amdahl 470 v/8 vs opt=3              4.5      2.7   .254 |
- | IBM 3081 k h enh opt=3               5.1      2.4   .283 |
- | IBM 3081 k vs opt=3                  5.6      2.2   .311 |
- | IBM 3033 vs fortran                  6.3      1.9   .353 |
- | IBM 3081 d vs opt=3                  6.7      1.8   .376 |
- | IBM 4381 vs opt=3                     14      .86   .353 |
- | Harris 1000 vos 3.3 (c blas)          15      .83   .825 |
- | Harris 1000 vos 3.3 opt g             22      .57   1.21 |
- | Concept 32/8750 utx/32                23      .54   1.27 |
- | VAX 11/785 fpa vms (c blas)           23      .53   1.30 |
- | Univac 1100/81 ascii opt=zeo          24      .52   1.32 |
- | IBM 4361 vs opt=3                     29      .42   1.65 |
- | DG mv/10000 f77 opt level 2           31      .39   1.75 |
- | VAX 11/785 fpa vms                    36      .34   2.01 |
- | VAX 11/780 fpa vms (c blas)           37      .33   2.08 |
- | IBM 370/158 h opt=3                   42      .29   2.35 |
- | ND-500 fortran-500-e                  43      .27   2.58 |
- | DEC kl-20 f20                         46      .27   2.59 |
- | IBM 370/158 vs opt=3                  46      .26   2.60 |
- | Univac 1100/62 ascii opt=zeo          49      .25   2.77 |
- | ICL 2988 f77 opt=2                    50      .25   2.79 |
- | Harris 800 f77                        53      .23   2.99 |
- | VAX 11/750 fpa vms (c blas)           56      .22   3.14 |
- | IBM 4341 mg10 cs opt=3                57      .22   3.18 |
- | VAX 11/780 fpa bsd unix 4.2 f77       58      .21   3.25 |
- | VAX 11/780 fpa vms v4.2               58      .21   3.27 |
- | Honeywell 6080 y                      62      .20   3.46 |
- | Concept 32/6750 utx/32                65      .19   3.63 |
- | DG mv/8000 f77 opt level 2            69      .18   3.84 |
- | VAX 11/780 vms                        74      .17   4.13 |
- | VAX 11/750 fpa vms v3.4               88      .14   4.90 |
- | Ridge 32 f77                          88      .14   4.90 |
- | Prime 750 primos f77 v19.1            89      .14   5.00 |
- | VAX 11/750 fpa bsd unix 4.2 f77       91      .13   5.12 |
- | Prime 850 primos                      97      .13   5.41 |
- | HP 9000 series 500 fortran 1.7       125     .098   7.00 |
- | VAX 11/750 vms v3.4                  138     .089   7.71 |
- | IBM 4331 h opt=3                     140     .088   7.84 |
- | Apollo dn 460 ftn opt                145     .085   8.11 |
- | Pyramid w/o fpa f77                  151     .081   8.47 |
- | Apollo peb 4.1 (c blas)              177     .069   9.92 |
- | VAX 11/750 bsd unix 4.1 f77          204     .060   11.4 |
- | VAX 11/730 fpa vms (c blas)          205     .060   11.5 |
- | VAX 11/725 fpa vms (c blas)          205     .060   11.5 |
- | Masscomp 500 w/fp unix f77 opt       227     .054   12.7 |
- | Burroughs 6700 h                     234     .052   13.1 |
- | Prime 2250 f77                       258     .048   14.5 |
- | VAX 11/730 fpa vms                   259     .047   14.5 |
- | VAX 11/725 fpa vms                   259     .047   14.5 |
- | CRDS 6835+sky svs f77                284     .043   15.9 |
- | IBM pc-xt/370 h opt=3                303     .040   17.0 |
- | DEC ka-10 f40                        305     .040   17.1 |
- | Canaan vs                            306     .040   17.1 |
- | SUN 2+sky unix f77 opt               314     .039   17.6 |
- | Apollo peb 4.1                       334     .037   18.7 |
- | Compaq pc/8087 ms3.13 (c blas)       591     .021   33.1 |
- | CRDS 6835 svs f77                    770     .016   43.1 |
- * CO16 8mhz 8087 ms3.2                 846    .0145   47.4 *
- | Cadtrak ds1/8087 intel f77           893     .013     50 |
- | SUN 2 unix f77 opt                   966     .013   54.1 |
- | Masscomp 500 unix f77 opt           1015     .012   56.8 |
- | IBM pc/8087 ms3.1                   1071     .011     60 |
- | HP 9000 series 200 hp-ux            1196     .010     67 |
- | SUN unix f77 no opt                 1298    .0094   72.7 |
- | Tandy 2000 ms3.13                   3763    .0033    211 |
- * CO16 8mhz 8086 tp3.01a              4911    .0025    275 *
- * SBC-200 4mhz z80 cp/m2.2/ms3.2     10946   .00112    613 *
- * SBC-200 4mhz z80 cp/m2.2/tp2.0     15301   .00080    857 *
- * CO16 8mhz 8086 tp2.00b             16250   .00075    910 *
- | IBM pc ms3.1                       21875   .00056   1225 |
- * SBC-200 4mhz z80 cp/m2.2/eco-c3.1  36071   .00034   2020 *
- | APPLE iii pascal                   50232   .00024   2813 |
- * SBC-200 4mhz z80 cp/m2.2/cii1.06b  84677   .00014   4742 *
- |__________________________________________________________|
-
- where;
-
- ratio  - is to a Cray-1 S using FORTRAN coded BLAS (Ref. 2).
-          See file LINPACK.ADD for FULL precision listing.
- c blas & r blas - denote coded and rolled versions of BLAS.
- mflops - millions of floating point operations per second,
-          where there are (2/3 n**3 + 2 n**2) operations. Each
-          operation comprises a multiplication, an addition,
-          and the associated indexing and load and store
-          functions. For n = 100 there are 686667 such operations.
- time   - one call to SGEFA and one call to SGESL.
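
The mflops column follows from the operation count and the time column. A minimal C sketch of that arithmetic is given here; the function names are assumptions made for the example.

```c
#include <assert.h>

/* Operation count used in the table: 2/3 n**3 + 2 n**2. */
double linpack_ops(int n)
{
    double dn = (double)n;

    return 2.0 * dn * dn * dn / 3.0 + 2.0 * dn * dn;
}

/* Millions of operations per second for an order n solve that takes
   secs seconds (one call to SGEFA plus one call to SGESL). */
double linpack_mflops(int n, double secs)
{
    assert(secs > 0.0);
    return linpack_ops(n) / secs / 1.0e6;
}
```

For n = 100 and a time of .035 seconds this gives about 19.6, which the table reports as 20. A time of 613 seconds gives about .00112.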
-
- TURBO Pascal performs floating point arithmetic with about
- 11 decimal digits precision. This is more than single precision
- and less than double or extended precision. This difference is
- important for applicability of numerical results.
-
- The routines CODR.* are drivers for LINPACK's SGECO. The
- table below shows the raw and folded value of RCOND, reciprocal
- condition, of the matrix for different orders of the same Hilbert
- test matrix. The folded value is that computed (1.0+RCOND)-1.0.
-
- ====================================================================
-                    TURBO              |          MS Fortran
- --------------------------------------------------------------------
-            RCond           RCond      |     RCond          RCond
- order                      folded     |                    folded
- --------------------------------------------------------------------
-   3  | 1.46884015E-03  1.46884014E-03 | .14688422E-02  .14688969E-02
-   4  | 4.64608613E-05  4.64608602E-05 | .46460722E-04  .46491623E-04
-   5  | 1.44074255E-06  1.44074147E-06 | .14412633E-05  .14305115E-05
-   6  | 4.42113838E-08  4.42105375E-08 | .44548891E-07  .00000000E+01
-   7  | 1.34783286E-09  1.34605216E-09 |
-   8  | 4.08837011E-11  4.00177669E-11 |
-   9  | 1.21641596E-12  0.00000000E+00 |
- ====================================================================
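
The folded value in the table above can be sketched in C single precision. The function names are assumptions made for this example, and the volatile temporary is one way to force the sum to be rounded to float before the subtraction on compilers that would otherwise keep extra precision.

```c
#include <assert.h>

/* Folded value (1.0 + rcond) - 1.0 in single precision. */
float fold(float rcond)
{
    volatile float sum = 1.0f + rcond;   /* rounded to float here */

    return sum - 1.0f;
}

/* Nonzero when 1.0 + rcond cannot be told apart from 1.0, i.e. the
   matrix is singular to working precision. */
int singular_to_working_precision(float rcond)
{
    return fold(rcond) == 0.0f;
}
```

On a machine with IEEE 32 bit single precision, fold(1.4412633e-6f) gives about 1.4305e-6, in line with the folded MS Fortran entry for order 5, and an RCond near 4.45e-8 folds to zero as in the order 6 entry.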
-
- The test matrix is singular to working precision at order 6
- for FORTRAN and at order 9 for TURBO. This difficulty is further
- illustrated by considering RCond for Aztec C II and Eco-C;
-
- ============================================================
-                Aztec C II         |          Eco-C
- ------------------------------------------------------------
-           RCond         RCond     |    RCond        RCond
- order                   folded    |                 folded
- ------------------------------------------------------------
-   3  | 1.468846E-03  1.464844E-03 | 1.468843E-3  1.468897E-3
-   4  | 4.649139E-05  4.577637E-05 | 4.646098E-5  4.649162E-5
-   5  | 1.464302E-06  0.000000E+00 | 1.441097E-6  1.430511E-6
-   6  |                            | 4.475317E-8  0.000000E0
- ============================================================
-
-
- For both these cases a 32 bit single precision format is
- used. However, Aztec C II uses a base 256 exponent while Eco-C
- uses a base 2 exponent and hidden bit normalization. Similar
- differences arise on mini and mainframe computers using variously
- 32, 36, etc. single precision encodings with base 2, 16, etc.
- exponents. Aztec C II simply dramatizes the problem because it
- allows an encoding with as few as 17 significant bits and fails
- to provide meaningful results for a 5x5 test matrix.
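
The 17 bit figure can be checked with a short calculation. The sketch below assumes a 24 bit mantissa for the base 256 format, which is consistent with the figure quoted above but is not stated in the compiler documentation cited here; the function name is also an assumption.

```c
#include <assert.h>

/* Worst case significant bits for a mantissa of mant_bits bits that is
   normalized to a base 2**exp_shift exponent.  Normalization only
   guarantees a nonzero leading digit of exp_shift bits, so as many as
   exp_shift - 1 leading bits may be zero. */
int worst_case_bits(int mant_bits, int exp_shift)
{
    assert(exp_shift >= 1 && mant_bits >= exp_shift);
    return mant_bits - (exp_shift - 1);
}
```

With 24 mantissa bits this gives 17 bits for a base 256 exponent (exp_shift = 8), 21 bits for base 16, and the full 24 bits for base 2, where a hidden bit format stores 23 of them.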
-
-
- COMMENT
- =======
-
- It is fatuous to suggest that timing benchmarks have any
- meaning for application to this test matrix for orders beyond
- those at which working precision vanishes.
-
- The runtime and precision benchmarks presented here contrast
- favorably with published microcomputer benchmarks. LINPACK
- routines are in daily application around the world solving
- meaningful problems.
-
-
- NOTES
- =====
-
- 1. CODR, SLDR, and DIDR use a leading dimension that is
- specified in LINPACK.CON or LINPACK.H, as appropriate.
-
- 2. SGECO, SGEFA, SGESL, and SGEDI are just one part of
- LINPACK. Other routines are offered by Reference 1 that
- deal with COMPLEX variables, with symmetric and banded
- systems, and with other linear systems problems. Their
- translations are not presented here.
-
- 3. In FORTRAN it is a simple matter to edit the source files
- to declare DOUBLE PRECISION rather than REAL to get extended
- precision results. There is no corresponding facility in Pascal.
- C admits float and double.
-
- 4. FORTRAN has a COMPLEX data type. Adaptation of the routines
- to COMPLEX variables using Pascal or C requires more work.
-
- 5. ISAMAX, SASUM, SAXPY, SCOPY, SDOT, SSCAL, and SSWAP are
- those components of BLAS required to support the routines
- used here. Translations of the other BLAS routines (SNRM2,
- SROT, and SROTG) are not presented here.
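
Note 3 can be illustrated in C with a single type definition that selects the working precision. This is a sketch under stated assumptions: the type name real and this form of SASUM are hypothetical and are not taken from the distributed LINPACK.H.

```c
#include <assert.h>

typedef float real;   /* change to double for extended precision */

/* Sum of absolute values of n elements of x taken with stride incx
   (positive increments only). */
real sasum(int n, const real *x, int incx)
{
    real sum = 0;
    int i;

    assert(incx > 0);
    for (i = 0; i < n; i++)
        sum += x[i * incx] < 0 ? -x[i * incx] : x[i * incx];
    return sum;
}
```

Routines written against real recompile for double by editing the one typedef, which is close in spirit to the FORTRAN edit described in Note 3.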
-
-
-
-
-
-
-
- REFERENCES
- ==========
-
- 1. LINPACK Users' Guide
- J.J. Dongarra, C.B. Moler, J.R. Bunch, and G.W. Stewart
- Society for Industrial and Applied Mathematics
- 1979
-
- 2. Performance of Various Computers Using Standard Linear
- Equations Software in a Fortran Environment
- Technical Memorandum No. 23
- J.J. Dongarra
- Mathematics and Computer Science Division
- Argonne National Laboratory
- August 1984, Revision 2
-
- 3. Algorithm 589
- Collected Algorithms of the ACM
- ACM Transactions on Mathematical Software
- Volume 8, Number 4
- December, 1982
-
-
- NOTICES
- =======
-
- CO16 (tm) Hallock Systems Company
- SBC-200 (tm) SD Systems
- H/Z-100 (tm) Zenith Data Systems
-
- CP/M (tm) Digital Research
- MS-DOS (tm) Microsoft
-
- TURBO Pascal (tm) Borland International
- Aztec C II (C) Manx Software Systems
- Eco-C (C) Ecosoft
-