- Path: sparky!uunet!europa.asd.contel.com!darwin.sura.net!jvnc.net!netnews.upenn.edu!msuinfo!vthnw.cvm.msu.edu!eoq
- From: eoq@vthnw.cvm.msu.edu (Edward Quillen)
- Newsgroups: comp.benchmarks
- Subject: Fortran source of Livermore Loops benchmark
- Message-ID: <eoq.41.0@vthnw.cvm.msu.edu>
- Date: 6 Sep 92 01:43:52 GMT
- Sender: news@msuinfo.cl.msu.edu
- Organization: Veterinary Teaching Hospital, Michigan State Univ.
- Lines: 1636
-
- C PROGRAM MFLOPS(TAPE6=OUTPUT)
- C LATEST FILE MODIFICATION DATE: 22/DEC/86R
- C****************************************************************************
- C MEASURES CPU PERFORMANCE RANGE OF THE COMPUTER/COMPILER/COMPUTATION COMPLEX
- C****************************************************************************
- C *
- C L. L. N. L. F O R T R A N K E R N E L S: M F L O P S *
- C *
- C These kernels measure Fortran numerical computation rates for a *
- C spectrum of CPU-limited computational structures. Mathematical *
- C through-put is measured in units of millions of floating-point *
- C operations executed per second, called Megaflops/sec. *
- C *
- C This program measures a realistic CPU performance range for the *
- C Fortran programming system on a given day. The CPU performance *
- C rates depend strongly on the maturity of the Fortran compiler's *
- C ability to translate Fortran code into efficient machine code. *
- C *
- C [ The CPU hardware capability apart from compiler maturity (or *
- C availability), could be measured (or simulated) by programming the *
- C kernels in assembly or machine code directly. These measurements *
- C can also serve as a framework for tracking the maturation of the *
- C Fortran compiler during system development.] *
- C *
- C While this test evaluates the performance of a broad sampling of *
- C Fortran computations, it is not an application program and hence *
- C it is not a benchmark per se. The performance of benchmarks and *
- C even workloads, if CPU limited, could be roughly estimated by *
- C choosing appropriate weights and loop limits for each kernel (see *
- C Block Data). The LFK methodology is discussed in subroutine REPORT. *
- C The glossary and module hierarchy are documented in subr. INDEX. *
- C *
- C Use of this program is granted with the request that a copy of the *
- C results be sent to the author at the address shown below, to be *
- C added to our studies of computer performance. Please send your *
- C complete MFLOPS output file on a 5" PC/DOS diskette, if possible. *
- C Your timing results will be held as proprietary data, if so marked. *
- C In return, you will receive a copy of our latest report. *
- C *
- C *
- C F.H. McMahon L-35 *
- C Lawrence Livermore National Laboratory *
- C P.O. Box 808 *
- C Livermore, CA. 94550 *
- C *
- C *
- C (C) Copyright 1983 the Regents of the *
- C University of California. All Rights Reserved. *
- C *
- C This work was produced under the sponsorship of *
- C the U.S. Department of Energy. The Government *
- C retains certain rights therein. *
- C *
- C****************************************************************************
- C
- C
- C
- C
- C
- C
- C DIRECTIONS
- C
- C 1. We REQUIRE one test-run of the Fortran kernels as is, that is, with
- C no reprogramming. Standard product compiler directives may be
- C used for optimization as these do not constitute reprogramming.
- C
- C In addition, the vendor may, if so desired, reprogram the kernels to
- C demonstrate high performance hardware features. Kernels 13,14,23
- C are partially vectorisable and kernels 15,16,24 are vectorisable if
- C re-written. Kernels 5,11,17,19,20 are implicit computations that
- C must not be explicitly vectorised using compiler directives to
- C ignore dependencies. In any case, compiler listings of the codes
- C actually used should be returned along with the timing results.
- C
- C 2. For vector processors, we REQUIRE an ALL-scalar compilation test-run
- C to measure the basic scalar performance range of the processor.
- C
- C 3. On computers where default single precision is REAL*4 we REQUIRE an
- C additional test-run with all mantissas .GE. 47 bits. Declare all REAL
- C variables REAL*8 using one of the following declarations in each routine:
- C
- cANSI IMPLICIT DOUBLE PRECISION (A-H,O-Z)
- cIBM IMPLICIT REAL*8 (A-H,O-Z)
- c ( Then there is a redundant declaration in subrs SIGNAL and SUMO.)
- C
- C 4. On computers with Cache memories and high resolution CPU clocks we
- C REQUIRE, if feasible, another ALL-scalar test-run setting Loop= 1
- C in SIZES to test un-primed cache (as well as encached) cpu rates.
- C Increase the size of array CACHE(in subr. VALUES) from 8192 to cache size.
- C
- C 5. Installation includes verifying or changing the following:
- C
- C First : the I/O output device number= IOU assignment in MAIN.
- C Second: the definition of function SECOND for CPU time only, and
- C the value of TIC:= minimum cpu clock time(sec) in SIZES.
- C Third : the definition of function MOD2N in KERNEL
- C Fourth: the system names Komput, Kontrl, and Kompil in REPORT
- C Fifth : after checkout set Nruns=7 in SIZES for Standard Benchmark Test
- C
- C 6. Each kernel's computation is check-summed for easy validation.
- C Verify correct processing using the checksums in subroutine REPORT
- C which were computed setting MULTI= 10 in BLOCK DATA.
- C Your checksums should agree with these to the precision used, within round-off.
- C
- C 7. Verify CPU Time measurements from function SECOND by comparing the clock
- C calibration printout of total CPU time with system or real-time measures.
- C The accuracy of SECOND is also tested using the test routine VERIFY.
- C
- C 8. On computers with Virtual Storage Systems assure a working-set space
- C larger than the entire program so that page faults are negligible,
- C because we must measure the CPU-limited computation rates.
- C IT IS ALSO NECESSARY to run this test stand-alone, i.e. NO timesharing.
- C In VS Systems a series of runs is needed to show stable CPU timings.
- C
- C 9. On parallel computer systems which compile vectors or Multi-tasking
- C at the Do-loop level (Micro-tasking), parallelisation of the first
- C DO (on L) in each kernel must be prevented by using a compiler directive
- C or by setting Loop= 1. This outermost DO Loop is merely repetition
- C used to increase timing accuracy and could distort the computation
- C sample if parallelisation is based on this artificial iteration level.
- C****************************************************************************
- C
- C
- C
- c
- C/ PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24)
- C/ PARAMETER( nk= 47, nl= 3, nr= 8 )
- c
- COMMON /ALPHA/ mk,ik,ml,il,Nruns,jr, NPFS(8,3,47)
- DIMENSION FLOPS(141), TR(141), RATES(141), ID(141)
- DIMENSION LSPAN(141), WG(141), OSUM (141), TERR(141)
- c
- CLOX REAL*8 SECOND
- c
- t= SECOND(0.0)
- iou= 6
- OPEN (UNIT=6, FILE='output', STATUS='NEW')
- cLLNL call Q8EBM
- cLLNL call PFM( 0, iou)
- c
- c Record name in active linkage chain in COMMON /DEBUG/
- CALL TRACKS(' MAIN. ')
- c
- c Verify Sufficient Loop Size Versus Cpu Clock Accuracy
- CALL VERIFY( iou)
- c
- c Define control limits: Nruns(runs), Loop(time), tic,
- CALL SIZES(-1)
- c
- c
- c Run test Nruns times Cpu-limited; I/O is deferred:
- DO 1 k= 1,Nruns
- jr= k
- c Run test using one of 3 sets of DO-Loop spans:
- c Set iou Negative to suppress all I/O during Cpu timing.
- DO 1 j= 1,ml
- il= j
- tock= TICK( -iou)
- c
- CALL KERNEL
- 1 continue
- c
- c
- c
- c Report timing errors, Mflops statistics:
- DO 2 j= 1,ml
- il= j
- CALL RESULT( iou,FLOPS,TR,RATES,LSPAN,WG,OSUM,TERR,ID)
- c
- CALL REPORT( iou, mk,mk,FLOPS,TR,RATES,LSPAN,WG,OSUM,ID)
- 2 continue
- c
- CALL REPORT( iou,3*mk,mk,FLOPS,TR,RATES,LSPAN,WG,OSUM,ID)
- c
- c
- t= SECOND(0.0) - t
- WRITE( iou,9) t
- 9 FORMAT( 1H1,//,21H Version: 22/DEC/86R ,/,
- . 26H CHECK CLOCK CALIBRATION: ,/,
- . 18H Total cpu Time = ,e14.5, 5H Sec. )
- STOP
- c
- c
- c
- c
- c
- c
- c
- c
- c
- c
- c Subroutine timing of all-scalar execution on CRAY-1:
- c
- c Subroutine Time(%)
- c
- c KERNEL 43.46%
- c SUPPLY 21.82%
- c VERIFY 13.12%
- c STATS 8.83%
- c SQRT 1.84%
- c SORDID 1.21%
- c VALUES .74%
- c SUMO .47%
- c SIGNAL .34%
- c IQRANF .26%
- c STATW .17%
- c
- END
- c***********************************************
- BLOCK DATA
- C***********************************************
- C
- cANSI IMPLICIT DOUBLE PRECISION (A-H,O-Z)
- cIBM IMPLICIT REAL*8 (A-H,O-Z)
- DOUBLE PRECISION SUMS
- C
- C l1 := param-dimension governs the size of most 1-d arrays
- C l2 := param-dimension governs the size of most 2-d arrays
- C
- C ISPAN := Array of limits for DO loop control in the kernels
- C IPASS := Array of limits for multiple pass execution of each kernel
- C FLOPN := Array of floating-point operation counts for one pass thru kernel
- C WT := Array of weights to average kernel execution rates.
- C SKALE := Array of scale factors for SIGNAL data generator.
- C BIAS := Array of scale factors for SIGNAL data generator.
- C
- C MUL := Array of multipliers * FLOPN for each pass
- C WTP := Array of multipliers * WT for each pass
- C FR := Array of vectorisation fractions in REPORT
- C SUMW := Array of quartile weights in REPORT
- C IQ := Array of workload weights in REPORT
- C SUMS := Array of Verified Checksums of Kernels results: Nruns= 1 and 7.
- C
- C/ PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 )
- C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 )
- C/ PARAMETER( l14=2048, l16= 75, l416= 4*l16 , l21= 25 )
- C
- C/ PARAMETER( l1= 27, l2= 15, l1d= 2*1001 )
- C/ PARAMETER( l13= 8, l13h= 8/2, l213= 8+4, l813= 8*8 )
- C/ PARAMETER( l14= 16, l16= 15, l416= 4*15 , l21= 15)
- C
- C
- C/ PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 )
- C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 )
- C/ PARAMETER( l14= 2048, l16= 75, l416= 4*75 , l21= 25)
- C
- C/ PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24)
- C/ PARAMETER( m1= 1001-1, m2= 101-1, m7= 1001-6 )
- C
- COMMON /SPACES/ ion,j5,k2,k3,MULTI,Loop,m,kr,it,n13h,ibuf,
- 1 n,n1,n2,n13,n213,n813,n14,n16,n416,n21,nt1,nt2
- C
- COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), ticks,
- 1 FR(9), TERR1(47), SUMW(7), START,
- 2 SKALE(47), BIAS(47), WS(95), TOTAL(47), FLOPN(47),
- 3 IQ(7), NPF, NPFS1(47)
- C
- COMMON /SPACEI/ WTP(3), MUL(3), ISPAN(47,3), IPASS(47,3)
- C
- COMMON /ORDER/ index, match, NSTACK(20)
- C
- COMMON /PROOF/ SUMS(24,3,2)
- C ****************************************************************
- C
- DATA ( ISPAN(i,1), i= 1,47) /
- : 1001, 101, 1001, 1001, 1001, 64, 995, 100,
- : 101, 101, 1001, 1000, 64, 1001, 101, 75,
- : 101, 100, 101, 1000, 101, 101, 100, 1001, 23*0/
- C
- C* : l1, l2, l1, l1, l1, l13, m7, m2,
- C* : l2, l2, l1, m1, l13, l1, l2, l16,
- C* : l2, m2, l2, m1, l21, l2, m2, l1, 23*0/
- C
- DATA ( ISPAN(i,2), i= 1,47) /
- : 101, 101, 101, 101, 101, 32, 101, 100,
- : 101, 101, 101, 100, 32, 101, 101, 40,
- : 101, 100, 101, 100, 50, 101, 100, 101, 23*0/
- C
- DATA ( ISPAN(i,3), i= 1,47) /
- : 27, 15, 27, 27, 27, 8, 21, 14,
- : 15, 15, 27, 26, 8, 27, 15, 15,
- : 15, 14, 15, 26, 20, 15, 14, 27, 23*0/
- C
- DATA ( IPASS(i,1), i= 1,47) /
- : 7, 67, 9, 14, 10, 3, 4, 10, 36, 34, 11, 12,
- : 36, 2, 1, 25, 35, 2, 39, 1, 1, 11, 8, 5, 23*0/
- C
- DATA ( IPASS(i,2), i= 1,47) /
- : 40, 40, 53, 70, 55, 7, 22, 6, 21, 19, 64, 68,
- : 41, 10, 1, 27, 20, 1, 23, 8, 1, 7, 5, 31, 23*0/
- C
- DATA ( IPASS(i,3), i= 1,47) /
- : 28, 46, 37, 38, 40, 21, 20, 9, 26, 25, 46, 48,
- : 31, 8, 1, 14, 26, 2, 28, 7, 1, 8, 7, 23, 23*0/
- C
- DATA ( MUL(i), i= 1,3) / 1, 2, 8 /
- DATA ( WTP(i), i= 1,3) / 1.0, 2.0, 1.0 /
- c
- c The following flop-counts (FLOPN) are required for scalar or serial
- c execution. The scalar version defines the NECESSARY computation
- c generally, in the absence of proof to the contrary. The vector
- c or parallel executions are only credited with executing the same
- c necessary computation. If the parallel methods do more computation
- c than is necessary then the extra flops are not counted as through-put.
- c
- DATA ( FLOPN(i), i= 1,47)
- : /5., 4., 2., 2., 2., 2., 16., 36., 17., 9., 1., 1.,
- : 7., 11., 33., 7., 9., 44., 6., 26., 2., 17., 11., 1., 23*0.0/
- C
- DATA ( WT(i), i= 1,47) /
- : 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
- : 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
- : 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 23*0.0/
- C
- C/ : .08, .04, .02, .03, .03, .04, .10, .05, .04, .03, HLN
- C/ : .01, .02, .03, .02, .03, .05, .03, .20, .02, .02, HLN
- C/ : .03, .03, .04, .01, 23*0.0/ HLN
- C
- DATA ( SKALE(i), i= 1,47) /
- & 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0,
- & 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0,
- & 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0,
- & 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0,
- & 23*0.000D0 /
- C
- c : 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
- c : 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
- c : 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 23*0.0/
- C
- DATA ( BIAS(i), i= 1,47) /
- : 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
- : 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
- : 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 23*0.0/
- C
- DATA ( FR(i), i= 1,9) /
- : 0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0/
- C
- DATA ( SUMW(i), i= 1,7) /
- : 1.0, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5/
- C
- DATA ( IQ(i), i= 1,7) /
- : 1, 2, 1, 2, 1, 2, 1/
- C
- DATA START /0.0/, NPF/0/, ibuf/0/, match/0/, MULTI/10/
- C
- DATA ( SUMS(i,1,1), i= 1,24 ) /
- &.5114652693224705D+05,.5150345372943066D+03,.1000742883066623D+02,
- &.5999250595474070D+00,.4548871642388544D+04,.5229095383954675D+13,
- &.6104251075163778D+05,.1501268005627157D+06,.1189443609975085D+06,
- &.7310369784325972D+05,.3342910972650531D+08,.2907141428639174D-04,
- &.4057110454105263D+10,.2982036205992255D+10,.3943816690352311D+05,
- &.2832600000000000D+05,.1114641772903091D+04,.5165625410757306D+05,
- &.5421816960150398D+03,.3040644339317275D+08,.8289464835786202D+07,
- &.2938604376567099D+03,.3549834542446150D+05,.5000000000000000D+03/
- c
- DATA ( SUMS(i,2,1), i= 1,24 ) /
- &.5253344778938000D+03,.5150345372943066D+03,.1009741436579188D+01,
- &.5999250595474070D+00,.4589031939602131D+02,.2693280957416549D+16,
- &.6345586315772524D+03,.1501268005627157D+06,.1189443609975085D+06,
- &.7310369784325972D+05,.3433560407476162D+05,.7127569144561925D-05,
- &.2325318944820836D+10,.3045676741897511D+08,.3943816690352311D+05,
- &.3244100000000000D+05,.1114641772903091D+04,.5165625410757306D+05,
- &.5421816960150398D+03,.3126205178811007D+05,.3986531136462291D+07,
- &.2938604376567099D+03,.3549894609776936D+05,.5000000000000000D+02/
- c
- DATA ( SUMS(i,3,1), i= 1,24 ) /
- &.3855104502494983D+02,.1199847611437483D+02,.2699309089321296D+00,
- &.5999250595474070D+00,.3182615248448271D+01,.8303480073326955D+12,
- &.2845720217638848D+02,.2960543667877649D+04,.2623968460874419D+04,
- &.1651291227698377D+04,.6551161335846537D+03,.1943435981776804D-05,
- &.4755211251524563D+09,.2547733008933910D+07,.1108997288135066D+04,
- &.2577600000000000D+05,.2947368618590713D+02,.9700646212341513D+03,
- &.1268230698051747D+02,.5987713249471801D+03,.2516870081042209D+07,
- &.6109968728264795D+01,.4850340602751675D+03,.1300000000000000D+02/
- c
- DATA ( SUMS(i,1,2), i= 1,7 ) /
- &.2982036205992255D+10,.6118901630090488D+10,.9103526877478772D+10,
- &.1215176334476067D+11,.1519764492169999D+11,.1820312504465359D+11,
- &.2116750694993432D+11/
- c
- DATA ( SUMS(i,2,2), i= 1,7 ) /
- &.3045676741897511D+08,.5718526521576222D+08,.8885029941358330D+08,
- &.1174925822726987D+09,.1501582054695641D+09,.1819691693283694D+09,
- &.2130649341195080D+09/
- c
- DATA ( SUMS(i,3,2), i= 1,7 ) /
- &.2547733008933910D+07,.5131714230651644D+07,.7946120246231201D+07,
- &.1008019578807808D+08,.1269997234773526D+08,.1504905863862026D+08,
- &.1721399839433381D+08/
- END
- C
- C***********************************************
- SUBROUTINE INDEX
- C***********************************************
- C
- C MODULE PURPOSE
- C ------ -----------------------------------------------
- C IQRANF computes a vector of pseudo-random indices
- C
- C KERNEL executes 24 samples of Fortran computation
- C
- C PFM optional call to system hardware performance monitor
- C
- C REPORT prints timing results
- C
- C RESULT computes execution rates into pushdown store
- C
- C SECOND cumulative CPU time for task in seconds (MKS units)
- C
- C SENSIT sensitivity analysis of harmonic mean to 49 workloads
- C
- C SIGNAL generates a set of floating-point numbers near 1.0
- C
- C SIMD sensitivity analysis of harmonic mean to SISD/SIMD model
- C
- C SIZES test and set the loop controls before each kernel test
- C
- C SORDID simple sort
- C
- C SPACE sets memory pointers for array variables. optional.
- C
- C STATS calculates unweighted statistics
- C
- C STATW calculates weighted statistics
- C
- C SUMO check-sum with ordinal dependency
- C
- C SUPPLY initializes common blocks containing type real arrays.
- C
- C TALLY computes average and minimum Cpu timings and variances.
- C
- C TDIGIT counts lead digits followed by trailing zeroes
- C
- C TEST times, tests, and initializes each kernel test
- C
- C TICK measures timing overhead of subroutine test
- C
- C TILE computes m-tile value and corresponding index
- C
- C TRACKS,TRACKX push/pop caller's name and serial nr. in /DEBUG/
- C
- C TRAP checks that index-list values are in valid domain
- C
- C VALID compresses valid timing results
- C
- C VALUES initializes special values
- C
- C VERIFY verifies sufficient Loop size versus cpu clock accuracy
- C ------ -----------------------------------------------
- c
- c ------------ -------- -------- -------- -------- -------- --------
- c ENTRY LEVELS: 1 2 3 4 5 6
- c ------------ -------- -------- -------- -------- -------- --------
- c MAIN. SECOND
- c VERIFY SECOND
- c SIZES
- c STATS SQRT
- c TDIGIT LOG10
- c SIZES
- c
- c TICK TEST SECOND
- c SIZES
- c SUMO
- c VALUES SUPPLY SIGNAL
- c IQRANF MOD
- c VALID TRAP TRAP
- c STATS SQRT
- c IQRANF MOD
- c TRAP
- c KERNEL SPACE
- c SQRT
- c EXP
- c TEST SECOND
- c SIZES
- c SUMO
- c VALUES SUPPLY SIGNAL
- c IQRANF MOD
- c RESULT TALLY SIZES TRAP
- c PAGE
- c STATS SQRT
- c LOG10
- c
- c REPORT VALID TRAP
- c MOD
- c STATW SORDID TRAP
- c TILE
- c SQRT
- c LOG10
- c PAGE
- c TRAP
- c SENSIT VALID TRAP
- c SORDID TRAP
- c PAGE
- c STATW SORDID TRAP
- c TILE
- c SIMD VALID TRAP
- c STATW SORDID TRAP
- c TILE
- C STOP
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- c ------ ---- ------ ----- ------------------------------------
- c BASE TYPE CLASS NAME GLOSSARY
- c ------ ---- ------ ----- ------------------------------------
- c SPACE0 R Array BIAS - scale factors for SIGNAL data generator
- c SPACE0 R Array CSUM - checksums of KERNEL result arrays
- c BETA R Array CSUMS - sets of CSUM for all test runs
- c BETA R Array DOS - sets of TOTAL flops for all test runs
- c SPACE0 R Array FLOPN - flop counts for one execution pass
- c BETA R Array FOPN - sets of FLOPN for all test runs
- c SPACE0 R Array FR - vectorisation fractions; abscissa for REPORT
- c SPACES I scalar ibuf - flag enables one call to SIGNAL
- c ALPHA I scalar ik - current number of executing kernel
- c ALPHA I scalar il - selects one of three sets of loop spans
- c SPACES I scalar ion - logical I/O unit number for output
- c SPACEI I Array IPASS - Loop control limits for multiple-pass loops
- c SPACE0 I Array IQ - set of workload weights for REPORT
- c SPACEI I Array ISPAN - loop control limits for each kernel
- c SPACES I scalar it - flags timing call to TEST from TICK
- c SPACES I scalar j5 - datum in kernel 16
- c ALPHA I scalar jr - current test run number (1 thru 7)
- c SPACES I scalar k2 - counter in kernel 16
- c SPACES I scalar k3 - counter in kernel 16
- c SPACES I scalar kr - a copy of mk
- c SPACES I scalar Loop - current multiple-pass loop limit in KERNEL
- c SPACES I scalar MULTI - Multiplier used to compute Loop in SIZES
- c SPACES I scalar m - temp integer datum
- c ALPHA I scalar mk - number of kernels to evaluate .LE.24
- c ALPHA I scalar ml - maximum value of il= 3
- c SPACEI I Array MUL - multipliers * IPASS defines Loop
- c SPACES I scalar n - current DO loop limit in KERNEL
- c SPACES I scalar n1 - dimension of most 1-D arrays
- c SPACES I scalar n13 - dimension used in kernel 13
- c SPACES I scalar n13h - dimension used in kernel 13
- c SPACES I scalar n14 - dimension used in kernel 14
- c SPACES I scalar n16 - dimension used in kernel 16
- c SPACES I scalar n2 - dimension of most 2-D arrays
- c SPACES I scalar n21 - dimension used in kernel 21
- c SPACES I scalar n213 - dimension used in kernel 13
- c SPACES I scalar n416 - dimension used in kernel 16
- c SPACES I scalar n813 - dimension used in kernel 13
- c SPACE0 I scalar npf - temp integer datum
- c ALPHA I Array NPFS - sets of NPFS1 for all test runs
- c SPACE0 I Array NPFS1 - number of page-faults for each kernel
- c ALPHA I scalar Nruns - number of complete test runs
- c SPACES I scalar nt1 - total size of common -SPACE1- words
- c SPACES I scalar nt2 - total size of common -SPACE2- words
- c BETA R Array SEE - (i,1,jr,il) sets of TEST overhead times
- c BETA R Array SEE - (i,2,jr,il) sets of csums of SPACE1
- c BETA R Array SEE - (i,3,jr,il) sets of csums of SPACE2
- c SPACE0 R Array SKALE - scale factors for SIGNAL data generator
- c SPACE0 R scalar start - temp start time of each kernel
- c PROOF R Array SUMS - sets of verified checksums for all test runs
- c SPACE0 R Array SUMW - set of quartile weights for REPORT
- c SPACE0 R Array TERR1 - overhead-time errors for each kernel
- c BETA R Array TERRS - sets of TERR1 for all runs
- c BETA R scalar tic - minimum cpu clock time= resolution
- c SPACE0 R scalar ticks - average overhead time in TEST linkage
- c SPACE0 R Array TIME - net execution times for all kernels
- c BETA R Array TIMES - sets of TIME for all test runs
- c SPACE0 R Array TOTAL - total flops computed by each kernel
- c SPACE0 R Array WS - unused
- c SPACE0 R Array WT - weights for each kernel sample
- c SPACEI R Array WTP - weights for the 3 span-varying passes
- c SPACE0 R Array WW - unused
- C
- C
- c --------- -----------------------------------------------------------------
- c COMMON Usage
- c --------- -----------------------------------------------------------------
- C
- C /ALPHA /
- C VERIFY TICK TALLY SIZES RESULT REPORT KERNEL
- C MAIN.
- C /BASE1 /
- C SUPPLY
- C /BASE2 /
- C SUPPLY
- C /BASER /
- C SUPPLY
- C /BETA /
- C TICK TALLY SIZES RESULT REPORT KERNEL
- C /DEBUG /
- C TRACKS TRACKX TRAP
- C /ORDER /
- C TRACKS TRACKX TRAP
- C /PROOF /
- C RESULT BLOCKDATA
- C /SPACE0/
- C VALUES TICK TEST TALLY SUPPLY SIZES RESULT
- C REPORT KERNEL BLOCKDATA
- C /SPACE1/
- C VERIFY VALUES TICK TEST SUPPLY SPACE KERNEL
- C /SPACE2/
- C VERIFY VALUES TICK TEST SUPPLY SPACE KERNEL
- C /SPACE3/
- C VALUES
- C /SPACEI/
- C VERIFY VALUES TICK TEST SIZES RESULT REPORT
- C KERNEL BLOCKDATA
- C /SPACER/
- C VALUES TICK TEST SUPPLY SIZES KERNEL
- C /SPACES/
- C VERIFY VALUES TICK TEST SUPPLY SIZES KERNEL
- C BLOCKDATA
- c --------- -----------------------------------------------------------------
- RETURN
- END
- C
- C
- C***************************************
- SUBROUTINE IQRANF( M, Mmin,Mmax, n)
- C***********************************************************************
- C *
- c IQRANF - computes a vector of pseudo-random indices *
- c in the domain (Mmin,Mmax) *
- C *
- C M - result array , pseudo-random positive integers *
- C Mmin - input integer, lower bound for random integers *
- C Mmax - input integer, upper bound for random integers *
- C n - input integer, number of results in M. *
- C *
- C M(i)= Mmin + INT( (Mmax-Mmin) * RANF(0)) *
- C *
- c CALL IQRANF( IX, 1,1001, 30) should produce in IX: *
- c 3 674 435 415 389 54 44 790 900 282 *
- c 177 971 728 851 687 604 815 971 155 112 *
- c 877 814 779 192 619 894 544 404 496 505 ... *
- C *
- C***********************************************************************
- C
- cANSI IMPLICIT DOUBLE PRECISION (A-H,K,O-Z)
- cIBM IMPLICIT REAL*8 (A-H,K,O-Z)
- DOUBLE PRECISION dq, dp, per, dk, spin, span
- C
- dimension M(n)
- save k
- CALL TRACKS('IQRANF ')
- IF( n.LE.0 ) GO TO 73
- inset= Mmin
- span= Mmax - Mmin
- c spin= 16807.00D0
- c per= 2147483647.00D0
- spin= 16807
- per= 2147483647
- realn= n
- scale= 1.0000100D0
- q= scale*(span/realn)
- C
- dk= k
- DO 1 i= 1,n
- dp= dk*spin
- c dk= DMOD( dp, per)
- dk= dp -INT( dp/per)*per
- dq= dk*span
- M(i)= inset + ( dq/ per)
- IF( M(i).LT.Mmin .OR. M(i).GT.Mmax ) M(i)= inset + i*q
- 1 continue
- k= dk
- C
- C
- ciC double precision k, ip, iq, id
- ci inset= Mmin
- ci ispan= Mmax - Mmin
- ci ispin= 16807
- ci id= 2147483647
- ci q= (REAL(ispan)/REAL(n))*1.00001
- ciC
- ci DO 2 i= 1,n
- ci ip= k*ispin
- ci k= MOD( ip, id)
- ci iq= k*ispan
- ci M(i)= inset + ( iq/ id)
- ci IF( M(i).LT.Mmin .OR. M(i).GT.Mmax ) M(i)= inset + i*q
- ci 2 continue
- C
- CALL TRAP( M, 8H IQRANF , 1, Mmax, n)
- C
- 73 CONTINUE
- CALL TRACKX
- RETURN
- DATA k /256/
- END
- C***********************************************
- SUBROUTINE KERNEL
- C***********************************************************************
- C *
- C KERNEL executes 24 samples of Fortran computation *
- C *
- C***********************************************************************
- C *
- C L. L. N. L. F O R T R A N K E R N E L S: M F L O P S *
- C *
- C These kernels measure Fortran numerical computation *
- C rates for a spectrum of cpu-limited computational *
- C structures or benchmarks. Mathematical through-put *
- C is measured in units of millions of floating-point *
- C operations executed per second, called Megaflops/sec. *
- C *
- C Fonzi's Law: There is not now and there never will be a language *
- C in which it is the least bit difficult to write *
- C bad programs. *
- C F.H.MCMAHON 1972 *
- C***********************************************************************
- C
- C l1 := param-dimension governs the size of most 1-d arrays
- C l2 := param-dimension governs the size of most 2-d arrays
- C
- C Loop := multiple pass control to execute kernel long enough to time.
- C n := DO loop control for each kernel. Controls are set in subr. SIZES
- C
- C ******************************************************************
- C
- cANSI IMPLICIT DOUBLE PRECISION (A-H,O-Z)
- cIBM IMPLICIT REAL*8 (A-H,O-Z)
- C
- C/ PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 )
- C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 )
- C/ PARAMETER( l14=2048, l16= 75, l416= 4*l16 , l21= 25 )
- C/ PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24)
- C
- C
- C/ PARAMETER( nk= 47, nl= 3, nr= 8 )
- C
- COMMON /ALPHA/ mk,ik,ml,il,Nruns,jr, NPFS(8,3,47)
- COMMON /BETA / tic, TIMES(8,3,47), SEE(5,3,8,3),
- 1 TERRS(8,3,47), CSUMS(8,3,47),
- 2 FOPN(8,3,47), DOS(8,3,47)
- C
- COMMON /SPACES/ ion,j5,k2,k3,MULTI,Loop,m,kr,it,n13h,ibuf,
- 1 n,n1,n2,n13,n213,n813,n14,n16,n416,n21,nt1,nt2
- C
- COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33,
- 2 AR,BR,C0,CR,DI,DK,
- 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX,
- 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM
- C
- COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), ticks,
- 1 FR(9), TERR1(47), SUMW(7), START,
- 2 SKALE(47), BIAS(47), WS(95), TOTAL(47), FLOPN(47),
- 3 IQ(7), NPF, NPFS1(47)
- C
- COMMON /SPACEI/ WTP(3), MUL(3), ISPAN(47,3), IPASS(47,3)
- C
- C/ INTEGER E,F,ZONE
- C/ COMMON /ISPACE/ E(l213), F(l213),
- C/ 1 IX(l1), IR(l1), ZONE(l416)
- C/C
- C/ COMMON /SPACE1/ U(l1), V(l1), W(l1),
- C/ 1 X(l1), Y(l1), Z(l1), G(l1),
- C/ 2 DU1(l2), DU2(l2), DU3(l2), GRD(l1), DEX(l1),
- C/ 3 XI(l1), EX(l1), EX1(l1), DEX1(l1),
- C/ 4 VX(l14), XX(l14), RX(l14), RH(l14),
- C/ 5 VSP(l2), VSTP(l2), VXNE(l2), VXND(l2),
- C/ 6 VE3(l2), VLR(l2), VLIN(l2), B5(l2),
- C/ 7 PLAN(l416), D(l416), SA(l2), SB(l2)
- C/C
- C/ COMMON /SPACE2/ P(4,l813), PX(l21,l2), CX(l21,l2),
- C/ 1 VY(l2,l21), VH(l2,7), VF(l2,7), VG(l2,7), VS(l2,7),
- C/ 2 ZA(l2,7) , ZP(l2,7), ZQ(l2,7), ZR(l2,7), ZM(l2,7),
- C/ 3 ZB(l2,7) , ZU(l2,7), ZV(l2,7), ZZ(l2,7),
- C/ 4 B(l13,l13), C(l13,l13), H(l13,l13),
- C/ 5 U1(5,l2,2), U2(5,l2,2), U3(5,l2,2)
- C
- C ******************************************************************
- C
- C
- C/ PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 )
- C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 )
- C/ PARAMETER( l14= 2048, l16= 75, l416= 4*75 , l21= 25)
- C
- C
- care
- C
- INTEGER E,F,ZONE
- COMMON /ISPACE/ E(96), F(96),
- 1 IX(1001), IR(1001), ZONE(300)
- C
- COMMON /SPACE1/ U(1001), V(1001), W(1001),
- 1 X(1001), Y(1001), Z(1001), G(1001),
- 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001),
- 3 XI(1001), EX(1001), EX1(1001), DEX1(1001),
- 4 VX(1001), XX(1001), RX(1001), RH(2048),
- 5 VSP(101), VSTP(101), VXNE(101), VXND(101),
- 6 VE3(101), VLR(101), VLIN(101), B5(101),
- 7 PLAN(300), D(300), SA(101), SB(101)
- C
- COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101),
- 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7),
- 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7),
- 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7),
- 4 B(64,64), C(64,64), H(64,64),
- 5 U1(5,101,2), U2(5,101,2), U3(5,101,2)
- C
- C ******************************************************************
- C
- DIMENSION XZ(2001), ZX(2001)
- EQUIVALENCE ( XZ(1), X(1)), ( ZX(1), Z(1))
- C
- C
- C// DIMENSION E(96), F(96), U(1001), V(1001), W(1001),
- C// 1 X(1001), Y(1001), Z(1001), G(1001),
- C// 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001),
- C// 3 IX(1001), XI(1001), EX(1001), EX1(1001), DEX1(1001),
- C// 4 VX(1001), XX(1001), IR(1001), RX(1001), RH(2048),
- C// 5 VSP(101), VSTP(101), VXNE(101), VXND(101),
- C// 6 VE3(101), VLR(101), VLIN(101), B5(101),
- C// 7 PLAN(300), ZONE(300), D(300), SA(101), SB(101)
- C//C
- C// DIMENSION P(4,512), PX(25,101), CX(25,101),
- C// 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7),
- C// 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7),
- C// 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7),
- C// 4 B(64,64), C(64,64), H(64,64),
- C// 5 U1(5,101,2), U2(5,101,2), U3(5,101,2)
- C//C
- C//C ******************************************************************
- C//C
- C// COMMON /POINT/ ME,MF,MU,MV,MW,MX,MY,MZ,MG,MDU1,MDU2,MDU3,MGRD,
- C// 1 MDEX,MIX,MXI,MEX,MEX1,MDEX1,MVX,MXX,MIR,MRX,MRH,MVSP,MVSTP,
- C// 2 MVXNE,MVXND,MVE3,MVLR,MVLIN,MB5,MPLAN,MZONE,MD,MSA,MSB,
- C// 3 MP,MPX,MCX,MVY,MVH,MVF,MVG,MVS,MZA,MZP,MZQ,MZR,MZM,MZB,MZU,
- C// 4 MZV,MZZ,MB,MC,MH,MU1,MU2,MU3
- C//C
- C// POINTER (ME,E), (MF,F), (MU,U), (MV,V), (MW,W),
- C// 1 (MX,X), (MY,Y), (MZ,Z), (MG,G),
- C// 2 (MDU1,DU1),(MDU2,DU2),(MDU3,DU3),(MGRD,GRD),(MDEX,DEX),
- C// 3 (MIX,IX), (MXI,XI), (MEX,EX), (MEX1,EX1), (MDEX1,DEX1),
- C// 4 (MVX,VX), (MXX,XX), (MIR,IR), (MRX,RX), (MRH,RH),
- C// 5 (MVSP,VSP), (MVSTP,VSTP), (MVXNE,VXNE), (MVXND,VXND),
- C// 6 (MVE3,VE3), (MVLR,VLR), (MVLIN,VLIN), (MB5,B5),
- C// 7 (MPLAN,PLAN), (MZONE,ZONE), (MD,D), (MSA,SA), (MSB,SB)
- C//C
- C// POINTER (MP,P), (MPX,PX), (MCX,CX),
- C// 1 (MVY,VY), (MVH,VH), (MVF,VF), (MVG,VG), (MVS,VS),
- C// 2 (MZA,ZA), (MZP,ZP), (MZQ,ZQ), (MZR,ZR), (MZM,ZM),
- C// 3 (MZB,ZB), (MZU,ZU), (MZV,ZV), (MZZ,ZZ),
- C// 4 (MB,B), (MC,C), (MH,H),
- C// 5 (MU1,U1), (MU2,U2), (MU3,U3)
- C.. COMMON DUMMY(2000)
- C.. LOC(X) =.LOC.X
- C.. IQ8QDSP = 64*LOC(DUMMY)
- C
- C ******************************************************************
- C
- C STANDARD PRODUCT COMPILER DIRECTIVES MAY BE USED FOR OPTIMIZATION
- C
- CDIR$ VECTOR
- CLLL. OPTIMIZE LEVEL i
- CLLL. OPTION INTEGER (7)
- CLLL. OPTION ASSERT (NO HAZARD)
- CLLL. OPTION NODYNEQV
- C
- C ******************************************************************
- C BINARY MACHINES MAY USE THE AND(P,Q) FUNCTION IF AVAILABLE
- C IN PLACE OF THE FOLLOWING CONGRUENCE FUNCTION (SEE KERNEL 13, 14)
- C
- c IAND(j,k) = AND(j,k)
- CLLL. IAND(j,k) = j.INT.k
- c MOD2N(i,j)= MOD(i,j)
- MOD2N(i,j)= IAND(i,j-1)
- C i is Congruent to MOD2N(i,j) mod(j), provided j is a power of 2
- C ******************************************************************
- C
- C
- C
- C
- C
- C
- CALL TRACKS('KERNEL ')
- C
- CALL SPACE
- C
- cLLNL call PFM( 0, ion)
- CALL TEST(0)
- C
- C*******************************************************************************
- C*** KERNEL 1 HYDRO FRAGMENT
- C*******************************************************************************
- C
- DO 1 L = 1,Loop
- DO 1 k = 1,n
- 1 X(k)= Q + Y(k)*(R*ZX(k+10) + T*ZX(k+11))
- C
- C...................
- CALL TEST(1)
- C
- C*******************************************************************************
- C*** KERNEL 2 ICCG EXCERPT (INCOMPLETE CHOLESKY - CONJUGATE GRADIENT)
- C*******************************************************************************
- C
- DO 200 L= 1,Loop
- II= n
- IPNTP= 0
- 222 IPNT= IPNTP
- IPNTP= IPNTP+II
- II= II/2
- i= IPNTP
- CDIR$ IVDEP
- C
- DO 2 k= IPNT+2,IPNTP,2
- i= i+1
- 2 X(i)= X(k) - V(k)*X(k-1) - V(k+1)*X(k+1)
- IF( II.GT.1) GO TO 222
- 200 CONTINUE
- C
- C...................
- CALL TEST(2)
- C
- C*******************************************************************************
- C*** KERNEL 3 INNER PRODUCT
- C*******************************************************************************
- C
- DO 3 L= 1,Loop
- Q= 0.000D0
- DO 3 k= 1,n
- 3 Q= Q + Z(k)*X(k)
- C
- C...................
- CALL TEST(3)
- C
- C
- C
- C*******************************************************************************
- C*** KERNEL 4 BANDED LINEAR EQUATIONS
- C*******************************************************************************
- C
- m= (1001-7)/2
- DO 444 L= 1,Loop
- DO 444 k= 7,1001,m
- lw= k-6
- temp= X(k-1)
- CDIR$ IVDEP
- DO 4 j= 5,n,5
- temp = temp - XZ(lw)*Y(j)
- 4 lw= lw+1
- X(k-1)= Y(5)*temp
- 444 CONTINUE
- C
- C...................
- CALL TEST(4)
- C
- C*******************************************************************************
- C*** KERNEL 5 TRI-DIAGONAL ELIMINATION, BELOW DIAGONAL (NO VECTORS)
- C*******************************************************************************
- C
- DO 5 L = 1,Loop
- CDIR$ NOVECTOR
- DO 5 i = 2,n
- 5 X(i)= Z(i)*(Y(i) - X(i-1))
- CDIR$ VECTOR
- C
- C...................
- CALL TEST(5)
- C
- C*******************************************************************************
- C*** KERNEL 6 GENERAL LINEAR RECURRENCE EQUATIONS
- C*******************************************************************************
- C
- DO 6 L= 1,Loop
- DO 6 i= 2,n
- C W(i)= 0.0100D0 use only if overflow occurs
- DO 6 k= 1,i-1
- W(i)= W(i) + B(i,k) * W(i-k)
- 6 CONTINUE
- C
- C...................
- CALL TEST(6)
- C
- C*******************************************************************************
- C*** KERNEL 7 EQUATION OF STATE FRAGMENT
- C*******************************************************************************
- C
- DO 7 L= 1,Loop
- DO 7 k= 1,n
- X(k)= U(k ) + R*( Z(k ) + R*Y(k )) +
- . T*( U(k+3) + R*( U(k+2) + R*U(k+1)) +
- . T*( U(k+6) + R*( U(k+5) + R*U(k+4))))
- 7 CONTINUE
- C
- C...................
- CALL TEST(7)
- C
- C
- C*******************************************************************************
- C*** KERNEL 8 A.D.I. INTEGRATION
- C*******************************************************************************
- C
- DO 8 L = 1,Loop
- nl1 = 1
- nl2 = 2
- fw= 2.000D0
- DO 8 kx = 2,3
- CDIR$ IVDEP
- DO 8 ky = 2,n
- DU1(ky)=U1(kx,ky+1,nl1) - U1(kx,ky-1,nl1)
- DU2(ky)=U2(kx,ky+1,nl1) - U2(kx,ky-1,nl1)
- DU3(ky)=U3(kx,ky+1,nl1) - U3(kx,ky-1,nl1)
- U1(kx,ky,nl2)=U1(kx,ky,nl1) +A11*DU1(ky) +A12*DU2(ky) +A13*DU3(ky)
- . + SIG*(U1(kx+1,ky,nl1) -fw*U1(kx,ky,nl1) +U1(kx-1,ky,nl1))
- U2(kx,ky,nl2)=U2(kx,ky,nl1) +A21*DU1(ky) +A22*DU2(ky) +A23*DU3(ky)
- . + SIG*(U2(kx+1,ky,nl1) -fw*U2(kx,ky,nl1) +U2(kx-1,ky,nl1))
- U3(kx,ky,nl2)=U3(kx,ky,nl1) +A31*DU1(ky) +A32*DU2(ky) +A33*DU3(ky)
- . + SIG*(U3(kx+1,ky,nl1) -fw*U3(kx,ky,nl1) +U3(kx-1,ky,nl1))
- 8 CONTINUE
- C
- C...................
- CALL TEST(8)
- C
- C*******************************************************************************
- C*** KERNEL 9 INTEGRATE PREDICTORS
- C*******************************************************************************
- C
- DO 9 L = 1,Loop
- DO 9 i = 1,n
- PX( 1,i)= DM28*PX(13,i) + DM27*PX(12,i) + DM26*PX(11,i) +
- . DM25*PX(10,i) + DM24*PX( 9,i) + DM23*PX( 8,i) +
- . DM22*PX( 7,i) + C0*(PX( 5,i) + PX( 6,i))+ PX( 3,i)
- 9 CONTINUE
- C
- C...................
- CALL TEST(9)
- C
- C*******************************************************************************
- C*** KERNEL 10 DIFFERENCE PREDICTORS
- C*******************************************************************************
- C
- DO 10 L= 1,Loop
- DO 10 i= 1,n
- AR = CX(5,i)
- BR = AR - PX(5,i)
- PX(5,i) = AR
- CR = BR - PX(6,i)
- PX(6,i) = BR
- AR = CR - PX(7,i)
- PX(7,i) = CR
- BR = AR - PX(8,i)
- PX(8,i) = AR
- CR = BR - PX(9,i)
- PX(9,i) = BR
- AR = CR - PX(10,i)
- PX(10,i)= CR
- BR = AR - PX(11,i)
- PX(11,i)= AR
- CR = BR - PX(12,i)
- PX(12,i)= BR
- PX(14,i)= CR - PX(13,i)
- PX(13,i)= CR
- 10 CONTINUE
- C
- C...................
- CALL TEST(10)
- C
- C*******************************************************************************
- C*** KERNEL 11 FIRST SUM. PARTIAL SUMS. (NO VECTORS)
- C*******************************************************************************
- C
- fw= 1.000D-25
- DO 11 L = 1,Loop
- C Y(1)= Y(1) + L*fw use only if optimization eliminates L-loop.
- X(1)= Y(1)
- CDIR$ NOVECTOR
- DO 11 k = 2,n
- 11 X(k)= X(k-1) + Y(k)
- CDIR$ VECTOR
- C
- C...................
- CALL TEST(11)
- C
- C*******************************************************************************
- C*** KERNEL 12 FIRST DIFF.
- C*******************************************************************************
- C
- fw= 1.000D-25
- DO 12 L = 1,Loop
- C Y(1)= Y(1) + L*fw use only if optimization eliminates L-loop.
- DO 12 k = 1,n
- 12 X(k)= Y(k+1) - Y(k)
- C
- C...................
- CALL TEST(12)
- C
- C*******************************************************************************
- C*** KERNEL 13 2-D PIC Particle In Cell
- C*******************************************************************************
- C
- fw= 1.000D0
- DO 13 L= 1,Loop
- DO 13 ip= 1,n
- i1= P(1,ip)
- j1= P(2,ip)
- i1= 1 + MOD2N(i1,64)
- j1= 1 + MOD2N(j1,64)
- P(3,ip)= P(3,ip) + B(i1,j1)
- P(4,ip)= P(4,ip) + C(i1,j1)
- P(1,ip)= P(1,ip) + P(3,ip)
- P(2,ip)= P(2,ip) + P(4,ip)
- i2= P(1,ip)
- j2= P(2,ip)
- i2= MOD2N(i2,64)
- j2= MOD2N(j2,64)
- P(1,ip)= P(1,ip) + Y(i2+32)
- P(2,ip)= P(2,ip) + Z(j2+32)
- i2= i2 + E(i2+32)
- j2= j2 + F(j2+32)
- H(i2,j2)= H(i2,j2) + fw
- 13 CONTINUE
- C
- C...................
- CALL TEST(13)
- C
- C*******************************************************************************
- C*** KERNEL 14 1-D PIC Particle In Cell
- C*******************************************************************************
- C
- C
- fw= 1.000D0
- DO 14 L= 1,Loop
- DO 141 k= 1,n
- VX(k)= 0.0
- XX(k)= 0.0
- IX(k)= INT( GRD(k))
- XI(k)= REAL( IX(k))
- EX1(k)= EX ( IX(k))
- DEX1(k)= DEX ( IX(k))
- 141 CONTINUE
- C
- DO 142 k= 1,n
- VX(k)= VX(k) + EX1(k) + (XX(k) - XI(k))*DEX1(k)
- XX(k)= XX(k) + VX(k) + FLX
- IR(k)= XX(k)
- RX(k)= XX(k) - IR(k)
- IR(k)= MOD2N( IR(k),2048) + 1
- XX(k)= RX(k) + IR(k)
- 142 CONTINUE
- C
- DO 14 k= 1,n
- RH(IR(k) )= RH(IR(k) ) + fw - RX(k)
- RH(IR(k)+1)= RH(IR(k)+1) + RX(k)
- 14 CONTINUE
- C
- C...................
- CALL TEST(14)
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C*******************************************************************************
- C*** KERNEL 15 CASUAL FORTRAN. DEVELOPMENT VERSION.
- C*******************************************************************************
- C
- C
- C CASUAL ORDERING OF SCALAR OPERATIONS IS TYPICAL PRACTICE.
- C THIS EXAMPLE DEMONSTRATES THE NON-TRIVIAL TRANSFORMATION
- C REQUIRED TO MAP INTO AN EFFICIENT MACHINE IMPLEMENTATION.
- C
- DO 45 L = 1,Loop
- NG= 7
- NZ= n
- AR= 0.05300D0
- BR= 0.07300D0
- 15 DO 45 j = 2,NG
- DO 45 k = 2,NZ
- IF( j-NG) 31,30,30
- 30 VY(k,j)= 0.0
- GO TO 45
- 31 IF( VH(k,j+1) -VH(k,j)) 33,33,32
- 32 T= AR
- GO TO 34
- 33 T= BR
- 34 IF( VF(k,j) -VF(k-1,j)) 35,36,36
- 35 R= MAX( VH(k-1,j), VH(k-1,j+1))
- S= VF(k-1,j)
- GO TO 37
- 36 R= MAX( VH(k,j), VH(k,j+1))
- S= VF(k,j)
- 37 VY(k,j)= SQRT( VG(k,j)**2 +R*R)*T/S
- 38 IF( k-NZ) 40,39,39
- 39 VS(k,j)= 0.0
- GO TO 45
- 40 IF( VF(k,j) -VF(k,j-1)) 41,42,42
- 41 R= MAX( VG(k,j-1), VG(k+1,j-1))
- S= VF(k,j-1)
- T= BR
- GO TO 43
- 42 R= MAX( VG(k,j), VG(k+1,j))
- S= VF(k,j)
- T= AR
- 43 VS(k,j)= SQRT( VH(k,j)**2 +R*R)*T/S
- 45 CONTINUE
- C
- C...................
- CALL TEST(15)
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C
- C*******************************************************************************
- C*** KERNEL 16 MONTE CARLO SEARCH LOOP
- C*******************************************************************************
- C
- II= n/3
- LB= II+II
- k2= 0
- k3= 0
- C
- DO 485 L= 1,Loop
- m= 1
- 405 i1= m
- 410 j2= (n+n)*(m-1)+1
- DO 470 k= 1,n
- k2= k2+1
- j4= j2+k+k
- j5= ZONE(j4)
- IF( j5-n ) 420,475,450
- 415 IF( j5-n+II ) 430,425,425
- 420 IF( j5-n+LB ) 435,415,415
- 425 IF( PLAN(j5)-R) 445,480,440
- 430 IF( PLAN(j5)-S) 445,480,440
- 435 IF( PLAN(j5)-T) 445,480,440
- 440 IF( ZONE(j4-1)) 455,485,470
- 445 IF( ZONE(j4-1)) 470,485,455
- 450 k3= k3+1
- IF( D(j5)-(D(j5-1)*(T-D(j5-2))**2+(S-D(j5-3))**2
- . +(R-D(j5-4))**2)) 445,480,440
- 455 m= m+1
- IF( m-ZONE(1) ) 465,465,460
- 460 m= 1
- 465 IF( i1-m) 410,480,410
- 470 CONTINUE
- 475 CONTINUE
- 480 CONTINUE
- 485 CONTINUE
- C
- C...................
- CALL TEST(16)
- C
- C*******************************************************************************
- C*** KERNEL 17 IMPLICIT, CONDITIONAL COMPUTATION (NO VECTORS)
- C*******************************************************************************
- C
- C RECURSIVE-DOUBLING VECTOR TECHNIQUES CAN NOT BE USED
- C BECAUSE CONDITIONAL OPERATIONS APPLY TO EACH ELEMENT.
- C
- dw= 5.0000D0/3.0000D0
- fw= 1.0000D0/3.0000D0
- tw= 1.0300D0/3.0700D0
- CDIR$ NOVECTOR
- DO 62 L= 1,Loop
- i= n
- j= 1
- INK= -1
- SCALE= dw
- XNM= fw
- E6= tw
- GO TO 61
- C STEP MODEL
- 60 E6= XNM*VSP(i)+VSTP(i)
- VXNE(i)= E6
- XNM= E6
- VE3(i)= E6
- i= i+INK
- IF( i.EQ.j) GO TO 62
- 61 E3= XNM*VLR(i) +VLIN(i)
- XNEI= VXNE(i)
- VXND(i)= E6
- XNC= SCALE*E3
- C SELECT MODEL
- IF( XNM .GT.XNC) GO TO 60
- IF( XNEI.GT.XNC) GO TO 60
- C LINEAR MODEL
- VE3(i)= E3
- E6= E3+E3-XNM
- VXNE(i)= E3+E3-XNEI
- XNM= E6
- i= i+INK
- IF( i.NE.j) GO TO 61
- 62 CONTINUE
- CDIR$ VECTOR
- C
- C...................
- CALL TEST(17)
- C
- C*******************************************************************************
- C*** KERNEL 18 2-D EXPLICIT HYDRODYNAMICS FRAGMENT
- C*******************************************************************************
- C
- DO 75 L= 1,Loop
- T= 0.003700D0
- S= 0.004100D0
- KN= 6
- JN= n
- DO 70 k= 2,KN
- DO 70 j= 2,JN
- ZA(j,k)= (ZP(j-1,k+1)+ZQ(j-1,k+1)-ZP(j-1,k)-ZQ(j-1,k))
- . *(ZR(j,k)+ZR(j-1,k))/(ZM(j-1,k)+ZM(j-1,k+1))
- ZB(j,k)= (ZP(j-1,k)+ZQ(j-1,k)-ZP(j,k)-ZQ(j,k))
- . *(ZR(j,k)+ZR(j,k-1))/(ZM(j,k)+ZM(j-1,k))
- 70 CONTINUE
- C
- DO 72 k= 2,KN
- DO 72 j= 2,JN
- ZU(j,k)= ZU(j,k)+S*(ZA(j,k)*(ZZ(j,k)-ZZ(j+1,k))
- . -ZA(j-1,k) *(ZZ(j,k)-ZZ(j-1,k))
- . -ZB(j,k) *(ZZ(j,k)-ZZ(j,k-1))
- . +ZB(j,k+1) *(ZZ(j,k)-ZZ(j,k+1)))
- ZV(j,k)= ZV(j,k)+S*(ZA(j,k)*(ZR(j,k)-ZR(j+1,k))
- . -ZA(j-1,k) *(ZR(j,k)-ZR(j-1,k))
- . -ZB(j,k) *(ZR(j,k)-ZR(j,k-1))
- . +ZB(j,k+1) *(ZR(j,k)-ZR(j,k+1)))
- 72 CONTINUE
- C
- DO 75 k= 2,KN
- DO 75 j= 2,JN
- ZR(j,k)= ZR(j,k)+T*ZU(j,k)
- ZZ(j,k)= ZZ(j,k)+T*ZV(j,k)
- 75 CONTINUE
- C
- C...................
- CALL TEST(18)
- C
- C*******************************************************************************
- C*** KERNEL 19 GENERAL LINEAR RECURRENCE EQUATIONS (NO VECTORS)
- C*******************************************************************************
- C
- C IF( JR.GT.1 ) GO TO 192
- KB5I= 0
- CDIR$ NOVECTOR
- DO 194 L= 1,Loop
- DO 191 k= 1,n
- B5(k+KB5I)= SA(k) +STB5*SB(k)
- STB5= B5(k+KB5I) -STB5
- 191 CONTINUE
- C GO TO 194
- C
- 192 DO 193 i= 1,n
- k= n-i+1
- B5(k+KB5I)= SA(k) +STB5*SB(k)
- STB5= B5(k+KB5I) -STB5
- 193 CONTINUE
- 194 CONTINUE
- CDIR$ VECTOR
- C
- C...................
- CALL TEST(19)
- C
- C*******************************************************************************
- C*** KERNEL 20 DISCRETE ORDINATES TRANSPORT: RECURRENCE (NO VECTORS)
- C*******************************************************************************
- C
- dw= 0.200D0
- CDIR$ NOVECTOR
- DO 20 L= 1,Loop
- DO 20 k= 1,n
- DI= Y(k)-G(k)/( XX(k)+DK)
- DN= dw
- IF( DI.NE.0.0) DN= MAX( S,MIN( Z(k)/DI, T))
- X(k)= ((W(k)+V(k)*DN)* XX(k)+U(k))/(VX(k)+V(k)*DN)
- XX(k+1)= (X(k)- XX(k))*DN+ XX(k)
- 20 CONTINUE
- CDIR$ VECTOR
- C
- C...................
- CALL TEST(20)
- C
- C*******************************************************************************
- C*** KERNEL 21 MATRIX*MATRIX PRODUCT
- C*******************************************************************************
- C
- DO 21 L= 1,Loop
- DO 21 k= 1,25
- DO 21 i= 1,25
- DO 21 j= 1,n
- PX(i,j)= PX(i,j) +VY(i,k) * CX(k,j)
- 21 CONTINUE
- C
- C...................
- CALL TEST(21)
- C
- C
- C
- C
- C
- C
- C
- C*******************************************************************************
- C*** KERNEL 22 PLANCKIAN DISTRIBUTION
- C*******************************************************************************
- C
- C
- C EXPMAX= 234.500D0
- EXPMAX= 20.0000D0
- fw= 1.00000D0
- U(n)= 0.99000D0*EXPMAX*V(n)
- DO 22 L= 1,Loop
- DO 22 k= 1,n
- CARE IF( U(k) .LT. EXPMAX*V(k)) THEN
- Y(k)= U(k)/V(k)
- CARE ELSE
- CARE Y(k)= EXPMAX
- CARE ENDIF
- W(k)= X(k)/( EXP( Y(k)) -fw)
- 22 CONTINUE
- C...................
- CALL TEST(22)
- C
- C*******************************************************************************
- C*** KERNEL 23 2-D IMPLICIT HYDRODYNAMICS FRAGMENT
- C*******************************************************************************
- C
- fw= 0.17500D0
- DO 23 L= 1,Loop
- DO 23 j= 2,6
- DO 23 k= 2,n
- QA= ZA(k,j+1)*ZR(k,j) +ZA(k,j-1)*ZB(k,j) +
- . ZA(k+1,j)*ZU(k,j) +ZA(k-1,j)*ZV(k,j) +ZZ(k,j)
- 23 ZA(k,j)= ZA(k,j) +fw*(QA -ZA(k,j))
- C
- C...................
- CALL TEST(23)
- C
- C*******************************************************************************
- C*** KERNEL 24 FIND LOCATION OF FIRST MINIMUM IN ARRAY
- C*******************************************************************************
- C
- C X( n/2)= -1.000D+50
- X( n/2)= -1.000D+10
- DO 24 L= 1,Loop
- m= 1
- DO 24 k= 2,n
- IF( X(k).LT.X(m)) m= k
- 24 CONTINUE
- C
- C m= imin1( n,x,1) 35 nanosec./element STACKLIBE/CRAY
- C...................
- CALL TEST(24)
- C
- C*******************************************************************************
- C
- C
- IF( jr .LT. 1) jr= 1
- IF( jr .GT. 8) jr= 8-1
- IF( il .LT. 1) il= 1
- IF( il .GT. 3) il= 3
- C
- DO 999 k= 1,mk
- TIMES(jr,il,k)= TIME (k)
- TERRS(jr,il,k)= TERR1(k)
- NPFS (jr,il,k)= NPFS1(k)
- CSUMS(jr,il,k)= CSUM (k)
- DOS (jr,il,k)= TOTAL(k)
- FOPN (jr,il,k)= FLOPN(k)
- 999 continue
- C
- CALL TRACKX
- RETURN
- END
- C
- C***********************************************
- SUBROUTINE PAGE( iou)
- C***********************************************
- CALL TRACKS('PAGE ')
- WRITE(iou,1)
- 1 FORMAT(1H1)
- c 1 FORMAT(1H)
- CALL TRACKX
- RETURN
- END
- C***********************************************
- SUBROUTINE REPORT( iou, ntk,nek,FLOPS,TR,RATES,LSPAN,WG,OSUM,ID)
- C***********************************************************************
- C *
- C REPORT - Prints Statistical Evaluation Of Fortran Kernel Timings*
- C *
- C iou - Logical Output Device Number *
- C ntk - Total number of Kernels to Edit in Report *
- C nek - Number of Effective Kernels in each set to Edit *
- C FLOPS - Array: Number of Flops executed by each kernel *
- C TR - Array: Time of execution of each kernel(microsecs) *
- C RATES - Array: Rate of execution of each kernel(megaflops/sec)*
- C LSPAN - Array: Span of inner DO loop in each kernel *
- C WG - Array: Weight assigned to each kernel for statistics *
- C OSUM - Array: Checksums of the results of each kernel *
- C***********************************************************************
- c
- c REFERENCE
- c
- c F.H.McMahon, The Livermore Fortran Kernels:
- c A Computer Test Of The Numerical Performance Range,
- c Lawrence Livermore National Laboratory,
- c Livermore, California, UCRL-53745, December 1986.
- c
- c from: National Technical Information Service
- c U.S. Department of Commerce
- c 5285 Port Royal Road
- c Springfield, VA. 22161
- c
- c NOTICE
- c
- c "This report was prepared as an account
- c of work sponsored by the United States
- c Government. Neither the United States
- c nor the United States Department of
- c Energy, nor any of their employees, nor
- c any of their contractors, subcontractors,
- c or their employees, makes any warranty,
- c express or implied, or assumes any legal
- c liability or responsibility for the
- c accuracy, completeness or usefulness of
- c any information, apparatus, product or
- c process disclosed, or represents that its
- c use would not infringe privately-owned
- c rights."
- c
- c Reference to a company or product name
- c does not imply approval or recommendation
- c of the product by the University of
- c California or the U.S. Department of
- c Energy to the exclusion of others that
- c may be suitable.
- c
- c
- c Work performed under the auspices of the
- c U.S. Department of Energy by the Lawrence
- c Livermore Laboratory under contract
- c number W-7405-ENG-48.
- c
- c***********************************************************************
- c
- c Abstract
- c
- c A computer performance test that measures a realistic floating-point
- c performance range for Fortran applications is described. A variety
- c of computer performance analyses may be easily carried out using this
- c small central processing unit (cpu) test that would be infeasible or
- c too costly using complete applications as benchmarks, particularly in
- c the developmental phase of an immature computer system. The problem
- c of benchmarking numerical applications sufficiently, especially on
- c new supercomputers, is analyzed to identify several useful roles for
- c the Livermore Fortran Kernel (LFK) test. The 24 LFK contain enough
- c samples of Fortran practice to expose many specific inefficiencies in
- c the formulation of the Fortran source, in the quality of compiled cpu
- c code, and in the capability of the instruction architecture.
- c Examples show how the LFK may be used to study compiled Fortran code
- c efficiency, to test the ability of compilers to vectorize Fortran, to
- c simulate mature coding of Fortran on new computers, and to estimate
- c the effective subrange of supercomputer performance for Fortran
- c applications.
- c
- c Cpu performance measurements of several Fortran benchmarks and
- c numerical applications that correlate well with the cpu performance
- c range measured by the LFK test are presented. The numerical
- c performance metric Mflops, first introduced in 1970 in this cpu test
- c to quantify the cpu performance range of numerical applications, is
- c discussed. Analyses of the LFK performance results argue against
- c reducing the cpu performance range of supercomputers to a single
- c number. The 24 LFK measured rates show a realistic variance in
- c Fortran cpu performance that is essential data for circumspect
- c computer evaluations. Cpu performance data measured by the LFK test
- c on a number of recent computer systems are tabulated for reference.
- c
- c
- c
- c I: FORTRAN CPU PERFORMANCE ANALYSIS
- c
- c
- c These kernels measure Fortran numerical computation rates for a
- c spectrum of CPU-limited computational structures or benchmarks.
- c The kernels benchmark contains extracts or kernels from more
- c than a score of CPU-limited scientific application programs. These
- c kernels are the most important CPU time components from the
- c application programs. This benchmark may be easily extended
- c with important new kernels leaving performance statistics intact.
- c
- c The time required to convert, debug, execute and time many,
- c entire, large programs on new machines each having a new
- c implementation of Fortran, or several implementations or
- c dialects, rapidly becomes excessive. Almost all the conversion
- c costs are in segments of the programs which are irrelevant for
- c evaluation of the CPU, e.g., I/O, Fortran variations, memory
- c allocation, overlays, job control, etc. All of these
- c complexities are reduced to a single, small benchmark which uses
- c a minimum of I/O and a single level of storage. Further, the
- c computation in the kernels is the most stable part of the
- c Fortran language.
- c
- c The kernels benchmark is sufficient to determine a range of CPU
- c performance for many different computational structures in a
- c single computer run. Since the range in performance is usually
- c large, the mean has a secondary significance. To estimate the
- c
- c
- c
- c
- c
-
- +++++++++++++++++++++++++++++++++++++++++++++++++
- +++ ED (Edward Quillen) (517) 336-1293 +++
- +++ Vet Teaching Hosp. System Admin. +++
- +++ Email: quillen@cps.msu.edu +++
- +++++++++++++++++++++++++++++++++++++++++++++++++
-