NetNews Usenet Archive 1992 #20

Internet Message Format | 1992-09-07 | 63.0 KB

Path: sparky!uunet!europa.asd.contel.com!darwin.sura.net!jvnc.net!netnews.upenn.edu!msuinfo!vthnw.cvm.msu.edu!eoq
From: eoq@vthnw.cvm.msu.edu (Edward Quillen)
Newsgroups: comp.benchmarks
Subject: Fortran source of Livermore Loops benchmark
Message-ID: <eoq.41.0@vthnw.cvm.msu.edu>
Date: 6 Sep 92 01:43:52 GMT
Sender: news@msuinfo.cl.msu.edu
Organization: Veterinary Teaching Hospital, Michigan State Univ.
Lines: 1636

C     PROGRAM MFLOPS(TAPE6=OUTPUT)
C     LATEST FILE MODIFICATION DATE: 22/DEC/86R
C****************************************************************************
C MEASURES CPU PERFORMANCE RANGE OF THE COMPUTER/COMPILER/COMPUTATION COMPLEX
C****************************************************************************
C                                                                           *
C     L. L. N. L.   F O R T R A N   K E R N E L S:   M F L O P S            *
C                                                                           *
C     These kernels measure Fortran numerical computation rates for a       *
C     spectrum of CPU-limited computational structures. Mathematical        *
C     through-put is measured in units of millions of floating-point        *
C     operations executed per second, called Megaflops/sec.                 *
C                                                                           *
C     This program measures a realistic CPU performance range for the       *
C     Fortran programming system on a given day. The CPU performance        *
C     rates depend strongly on the maturity of the Fortran compiler's       *
C     ability to translate Fortran code into efficient machine code.        *
C                                                                           *
C     [ The CPU hardware capability apart from compiler maturity (or        *
C     availability), could be measured (or simulated) by programming the   *
C     kernels in assembly or machine code directly. These measurements      *
C     can also serve as a framework for tracking the maturation of the      *
C     Fortran compiler during system development.]                          *
C                                                                           *
C     While this test evaluates the performance of a broad sampling of      *
C     Fortran computations, it is not an application program and hence      *
C     it is not a benchmark per se. The performance of benchmarks and       *
C     even workloads, if CPU limited, could be roughly estimated by         *
C     choosing appropriate weights and loop limits for each kernel (see     *
C     Block Data). The LFK methodology is discussed in subroutine REPORT.   *
C     The glossary and module hierarchy are documented in subr. INDEX.      *
C                                                                           *
C     Use of this program is granted with the request that a copy of the    *
C     results be sent to the author at the address shown below, to be       *
C     added to our studies of computer performance. Please send your        *
C     complete MFLOPS output file on a 5" PC/DOS diskette, if possible.     *
C     Your timing results will be held as proprietary data, if so marked.   *
C     In return, you will receive a copy of our latest report.              *
C                                                                           *
C                                                                           *
C               F.H. McMahon L-35                                           *
C               Lawrence Livermore National Laboratory                      *
C               P.O. Box 808                                                *
C               Livermore, CA. 94550                                        *
C                                                                           *
C                                                                           *
C               (C) Copyright 1983 the Regents of the                       *
C               University of California. All Rights Reserved.              *
C                                                                           *
C               This work was produced under the sponsorship of             *
C               the U.S. Department of Energy. The Government               *
C               retains certain rights therein.                             *
C                                                                           *
C****************************************************************************
C
C
C
C
C
C
C     DIRECTIONS
C
C  1. We REQUIRE one test-run of the Fortran kernels as is, that is, with
C     no reprogramming. Standard product compiler directives may be
C     used for optimization as these do not constitute reprogramming.
C
C     In addition, the vendor may, if so desired, reprogram the kernels to
C     demonstrate high performance hardware features. Kernels 13,14,23
C     are partially vectorisable and kernels 15,16,24 are vectorisable if
C     re-written. Kernels 5,11,17,19,20 are implicit computations that
C     must not be explicitly vectorised using compiler directives to
C     ignore dependencies. In any case, compiler listings of the codes
C     actually used should be returned along with the timing results.
C
C  2. For vector processors, we REQUIRE an ALL-scalar compilation test-run
C     to measure the basic scalar performance range of the processor.
C
C  3. On computers where default single precision is REAL*4 we REQUIRE an
C     additional test-run with all mantissas.ge.47 . Declare all REAL
C     variables REAL*8 using one of the following declarations in each routine:
C
cANSI IMPLICIT DOUBLE PRECISION (A-H,O-Z)
cIBM  IMPLICIT REAL*8 (A-H,O-Z)
c     ( Then there is a redundant declaration in subrs SIGNAL and SUMO.)
C
C  4. On computers with Cache memories and high resolution CPU clocks we
C     REQUIRE, if feasible, another ALL-scalar test-run setting Loop= 1
C     in SIZES to test un-primed cache (as well as encached) cpu rates.
C     Increase the size of array CACHE(in subr. VALUES) from 8192 to cache size.
C
C  5. Installation includes verifying or changing the following:
C
C     First : the I/O output device number= IOU assignment in MAIN.
C     Second: the definition of function SECOND for CPU time only, and
C             the value of TIC:= minimum cpu clock time(sec) in SIZES.
C     Third : the definition of function MOD2N in KERNEL
C     Fourth: the system names Komput, Kontrl, and Kompil in REPORT
C     Fifth : after checkout set Nruns=7 in SIZES for Standard Benchmark Test
C
C  6. Each kernel's computation is check-summed for easy validation.
C     Verify correct processing using the checksums in subroutine REPORT
C     which were computed setting MULTI= 10 in BLOCK DATA.
C     Your checksums should compare to the precision used, within round-off.
C
C  7. Verify CPU Time measurements from function SECOND by comparing the clock
C     calibration printout of total CPU time with system or real-time measures.
C     The accuracy of SECOND is also tested using the test routine VERIFY.
C
C  8. On computers with Virtual Storage Systems assure a working-set space
C     larger than the entire program so that page faults are negligible,
C     because we must measure the CPU-limited computation rates.
C     IT IS ALSO NECESSARY to run this test stand-alone, i.e. NO timesharing.
C     In VS Systems a series of runs are needed to show stable CPU timings.
C
C  9. On parallel computer systems which compile vectors or Multi-tasking
C     at the Do-loop level (Micro-tasking) parallelisation of the first
C     DO (on L) in each kernel must be prevented by using a compiler directive
C     or by setting Loop= 1. This outermost DO Loop is merely repetition
C     used to increase timing accuracy and could distort the computation
C     sample if parallelisation is based on this artificial iteration level.
C****************************************************************************
C
C
C
c
C/    PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24)
C/    PARAMETER( nk= 47, nl= 3, nr= 8 )
c
      COMMON /ALPHA/ mk,ik,ml,il,Nruns,jr, NPFS(8,3,47)
      DIMENSION FLOPS(141), TR(141), RATES(141), ID(141)
      DIMENSION LSPAN(141), WG(141), OSUM (141), TERR(141)
c
CLOX  REAL*8 SECOND
c
      t= SECOND(0.0)
      iou= 6
      OPEN (UNIT=6, FILE='output', STATUS='NEW')
cLLNL call Q8EBM
cLLNL call PFM( 0, iou)
c
c     Record name in active linkage chain in COMMON /DEBUG/
      CALL TRACKS(' MAIN. ')
c
c     Verify Sufficient Loop Size Versus Cpu Clock Accuracy
      CALL VERIFY( iou)
c
c     Define control limits: Nruns(runs), Loop(time), tic,
      CALL SIZES(-1)
c
c
c     Run test Nruns times Cpu-limited; I/O is deferred:
      DO 1 k= 1,Nruns
      jr= k
c     Run test using one of 3 sets of DO-Loop spans:
c     Set iou Negative to suppress all I/O during Cpu timing.
      DO 1 j= 1,ml
      il= j
      tock= TICK( -iou)
c
      CALL KERNEL
    1 continue
c
c
c
c     Report timing errors, Mflops statistics:
      DO 2 j= 1,ml
      il= j
      CALL RESULT( iou,FLOPS,TR,RATES,LSPAN,WG,OSUM,TERR,ID)
c
      CALL REPORT( iou, mk,mk,FLOPS,TR,RATES,LSPAN,WG,OSUM,ID)
    2 continue
c
      CALL REPORT( iou,3*mk,mk,FLOPS,TR,RATES,LSPAN,WG,OSUM,ID)
c
c
      t= SECOND(0.0) - t
      WRITE( iou,9) t
    9 FORMAT( 1H1,//,26H Version: 22/DEC/86R      ,/,
     .               26H CHECK CLOCK CALIBRATION: ,/,
     .               18H Total cpu Time = ,e14.5, 5H Sec. )
      STOP
c
c
c
c
c
c
c
c
c
c
c     Subroutine timing of all-scalar execution on CRAY-1:
c
c     Subroutine   Time(%)
c
c     KERNEL       43.46%
c     SUPPLY       21.82%
c     VERIFY       13.12%
c     STATS         8.83%
c     SQRT          1.84%
c     SORDID        1.21%
c     VALUES         .74%
c     SUMO           .47%
c     SIGNAL         .34%
c     IQRANF         .26%
c     STATW          .17%
c
      END
c***********************************************
      BLOCK DATA
C***********************************************
C
cANSI IMPLICIT DOUBLE PRECISION (A-H,O-Z)
cIBM  IMPLICIT REAL*8 (A-H,O-Z)
      DOUBLE PRECISION SUMS
C
C     l1    := param-dimension governs the size of most 1-d arrays
C     l2    := param-dimension governs the size of most 2-d arrays
C
C     ISPAN := Array of limits for DO loop control in the kernels
C     IPASS := Array of limits for multiple pass execution of each kernel
C     FLOPN := Array of floating-point operation counts for one pass thru kernel
C     WT    := Array of weights to average kernel execution rates.
C     SKALE := Array of scale factors for SIGNAL data generator.
C     BIAS  := Array of scale factors for SIGNAL data generator.
C
C     MUL   := Array of multipliers * FLOPN for each pass
C     WTP   := Array of multipliers * WT for each pass
C     FR    := Array of vectorisation fractions in REPORT
C     SUMW  := Array of quartile weights in REPORT
C     IQ    := Array of workload weights in REPORT
C     SUMS  := Array of Verified Checksums of Kernels results: Nruns= 1 and 7.
C C/ PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 ) C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 ) C/ PARAMETER( l14=2048, l16= 75, l416= 4*l16 , l21= 25 ) C C/ PARAMETER( l1= 27, l2= 15, l1d= 2*1001 ) C/ PARAMETER( l13= 8, l13h= 8/2, l213= 8+4, l813= 8*8 ) C/ PARAMETER( l14= 16, l16= 15, l416= 4*15 , l21= 15) C C C/ PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 2048, l16= 75, l416= 4*75 , l21= 25) C C/ PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24) C/ PARAMETER( m1= 1001-1, m2= 101-1, m7= 1001-6 ) C COMMON /SPACES/ ion,j5,k2,k3,MULTI,Loop,m,kr,it,n13h,ibuf, 1 n,n1,n2,n13,n213,n813,n14,n16,n416,n21,nt1,nt2 C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), ticks, 1 FR(9), TERR1(47), SUMW(7), START, 2 SKALE(47), BIAS(47), WS(95), TOTAL(47), FLOPN(47), 3 IQ(7), NPF, NPFS1(47) C COMMON /SPACEI/ WTP(3), MUL(3), ISPAN(47,3), IPASS(47,3) C COMMON /ORDER/ index, match, NSTACK(20) C COMMON /PROOF/ SUMS(24,3,2) C **************************************************************** C DATA ( ISPAN(i,1), i= 1,47) / : 1001, 101, 1001, 1001, 1001, 64, 995, 100, : 101, 101, 1001, 1000, 64, 1001, 101, 75, : 101, 100, 101, 1000, 101, 101, 100, 1001, 23*0/ C C* : l1, l2, l1, l1, l1, l13, m7, m2, C* : l2, l2, l1, m1, l13, l1, l2, l16, C* : l2, m2, l2, m1, l21, l2, m2, l1, 23*0/ C DATA ( ISPAN(i,2), i= 1,47) / : 101, 101, 101, 101, 101, 32, 101, 100, : 101, 101, 101, 100, 32, 101, 101, 40, : 101, 100, 101, 100, 50, 101, 100, 101, 23*0/ C DATA ( ISPAN(i,3), i= 1,47) / : 27, 15, 27, 27, 27, 8, 21, 14, : 15, 15, 27, 26, 8, 27, 15, 15, : 15, 14, 15, 26, 20, 15, 14, 27, 23*0/ C DATA ( IPASS(i,1), i= 1,47) / : 7, 67, 9, 14, 10, 3, 4, 10, 36, 34, 11, 12, : 36, 2, 1, 25, 35, 2, 39, 1, 1, 11, 8, 5, 23*0/ C DATA ( IPASS(i,2), i= 1,47) / : 40, 40, 53, 70, 55, 7, 22, 6, 21, 19, 64, 68, : 41, 10, 1, 27, 20, 1, 23, 8, 1, 7, 5, 31, 23*0/ C DATA ( IPASS(i,3), i= 1,47) / : 28, 46, 37, 38, 40, 21, 20, 9, 
26, 25, 46, 48, : 31, 8, 1, 14, 26, 2, 28, 7, 1, 8, 7, 23, 23*0/ C DATA ( MUL(i), i= 1,3) / 1, 2, 8 / DATA ( WTP(i), i= 1,3) / 1.0, 2.0, 1.0 / c c The following flop-counts (FLOPN) are required for scalar or serial c execution. The scalar version defines the NECESSARY computation c generally, in the absence of proof to the contrary. The vector c or parallel executions are only credited with executing the same c necessary computation. If the parallel methods do more computation c than is necessary then the extra flops are not counted as through-put. c DATA ( FLOPN(i), i= 1,47) : /5., 4., 2., 2., 2., 2., 16., 36., 17., 9., 1., 1., : 7., 11., 33., 7., 9., 44., 6., 26., 2., 17., 11., 1., 23*0.0/ C DATA ( WT(i), i= 1,47) / : 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, : 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, : 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 23*0.0/ C C/ : .08, .04, .02, .03, .03, .04, .10, .05, .04, .03, HLN C/ : .01, .02, .03, .02, .03, .05, .03, .20, .02, .02, HLN C/ : .03, .03, .04, .01, 23*0.0/ HLN C DATA ( SKALE(i), i= 1,47) / & 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, & 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, & 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, & 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, 0.100D0, & 23*0.000D0 / C c : 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, c : 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, c : 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 23*0.0/ C DATA ( BIAS(i), i= 1,47) / : 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, : 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, : 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 23*0.0/ C DATA ( FR(i), i= 1,9) / : 0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0/ C DATA ( SUMW(i), i= 1,7) / : 1.0, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5/ C DATA ( IQ(i), i= 1,7) / : 1, 2, 1, 2, 1, 2, 1/ C DATA START /0.0/, NPF/0/, ibuf/0/, match/0/, MULTI/10/ C DATA ( SUMS(i,1,1), i= 1,24 ) / &.5114652693224705D+05,.5150345372943066D+03,.1000742883066623D+02, 
&.5999250595474070D+00,.4548871642388544D+04,.5229095383954675D+13, &.6104251075163778D+05,.1501268005627157D+06,.1189443609975085D+06, &.7310369784325972D+05,.3342910972650531D+08,.2907141428639174D-04, &.4057110454105263D+10,.2982036205992255D+10,.3943816690352311D+05, &.2832600000000000D+05,.1114641772903091D+04,.5165625410757306D+05, &.5421816960150398D+03,.3040644339317275D+08,.8289464835786202D+07, &.2938604376567099D+03,.3549834542446150D+05,.5000000000000000D+03/ c DATA ( SUMS(i,2,1), i= 1,24 ) / &.5253344778938000D+03,.5150345372943066D+03,.1009741436579188D+01, &.5999250595474070D+00,.4589031939602131D+02,.2693280957416549D+16, &.6345586315772524D+03,.1501268005627157D+06,.1189443609975085D+06, &.7310369784325972D+05,.3433560407476162D+05,.7127569144561925D-05, &.2325318944820836D+10,.3045676741897511D+08,.3943816690352311D+05, &.3244100000000000D+05,.1114641772903091D+04,.5165625410757306D+05, &.5421816960150398D+03,.3126205178811007D+05,.3986531136462291D+07, &.2938604376567099D+03,.3549894609776936D+05,.5000000000000000D+02/ c DATA ( SUMS(i,3,1), i= 1,24 ) / &.3855104502494983D+02,.1199847611437483D+02,.2699309089321296D+00, &.5999250595474070D+00,.3182615248448271D+01,.8303480073326955D+12, &.2845720217638848D+02,.2960543667877649D+04,.2623968460874419D+04, &.1651291227698377D+04,.6551161335846537D+03,.1943435981776804D-05, &.4755211251524563D+09,.2547733008933910D+07,.1108997288135066D+04, &.2577600000000000D+05,.2947368618590713D+02,.9700646212341513D+03, &.1268230698051747D+02,.5987713249471801D+03,.2516870081042209D+07, &.6109968728264795D+01,.4850340602751675D+03,.1300000000000000D+02/ c DATA ( SUMS(i,1,2), i= 1,7 ) / &.2982036205992255D+10,.6118901630090488D+10,.9103526877478772D+10, &.1215176334476067D+11,.1519764492169999D+11,.1820312504465359D+11, &.2116750694993432D+11/ c DATA ( SUMS(i,2,2), i= 1,7 ) / &.3045676741897511D+08,.5718526521576222D+08,.8885029941358330D+08, &.1174925822726987D+09,.1501582054695641D+09,.1819691693283694D+09, 
&.2130649341195080D+09/ c DATA ( SUMS(i,3,2), i= 1,7 ) / &.2547733008933910D+07,.5131714230651644D+07,.7946120246231201D+07, &.1008019578807808D+08,.1269997234773526D+08,.1504905863862026D+08, &.1721399839433381D+08/ END C C*********************************************** SUBROUTINE INDEX C*********************************************** C C MODULE PURPOSE C ------ ----------------------------------------------- C IQRANF computes a vector of pseudo-random indices C C KERNEL executes 24 samples of Fortran computation C C PFM optional call to system hardware performance monitor C C REPORT prints timing results C C RESULT computes execution rates into pushdown store C C SECOND cumulative CPU time for task in seconds (MKS units) C C SENSIT sensitivity analysis of harmonic mean to 49 workloads C C SIGNAL generates a set of floating-point numbers near 1.0 C C SIMD sensitivity analysis of harmonic mean to SISD/SIMD model C C SIZES test and set the loop controls before each kernel test C C SORDID simple sort C C SPACE sets memory pointers for array variables. optional. C C STATS calculates unweighted statistics C C STATW calculates weighted statistics C C SUMO check-sum with ordinal dependency C C SUPPLY initializes common blocks containing type real arrays. C C TALLY computes average and minimum Cpu timings and variances. C C TDIGIT counts lead digits followed by trailing zeroes C C TEST times, tests, and initializes each kernel test C C TICK measures timing overhead of subroutine test C C TILE computes m-tile value and corresponding index C C TRACKS,TRACKX push/pop caller's name and serial nr. 
in /DEBUG/ C C TRAP checks that index-list values are in valid domain C C VALID compresses valid timing results C C VALUES initializes special values C C VERIFY verifies sufficient Loop size versus cpu clock accuracy C ------ ----------------------------------------------- c c ------------ -------- -------- -------- -------- -------- -------- c ENTRY LEVELS: 1 2 3 4 5 6 c ------------ -------- -------- -------- -------- -------- -------- c MAIN. SECOND c VERIFY SECOND c SIZES c STATS SQRT c TDIGIT LOG10 c SIZES c c TICK TEST SECOND c SIZES c SUMO c VALUES SUPPLY SIGNAL c IQRANF MOD c VALID TRAP TRAP c STATS SQRT c IQRANF MOD c TRAP c KERNEL SPACE c SQRT c EXP c TEST SECOND c SIZES c SUMO c VALUES SUPPLY SIGNAL c IQRANF MOD c RESULT TALLY SIZES TRAP c PAGE c STATS SQRT c LOG10 c c REPORT VALID TRAP c MOD c STATW SORDID TRAP c TILE c SQRT c LOG10 c PAGE c TRAP c SENSIT VALID TRAP c SORDID TRAP c PAGE c STATW SORDID TRAP c TILE c SIMD VALID TRAP c STATW SORDID TRAP c TILE C STOP C C C C C C C C C C C C c ------ ---- ------ ----- ------------------------------------ c BASE TYPE CLASS NAME GLOSSARY c ------ ---- ------ ----- ------------------------------------ c SPACE0 R Array BIAS - scale factors for SIGNAL data generator c SPACE0 R Array CSUM - checksums of KERNEL result arrays c BETA R Array CSUMS - sets of CSUM for all test runs c BETA R Array DOS - sets of TOTAL flops for all test runs c SPACE0 R Array FLOPN - flop counts for one execution pass c BETA R Array FOPN - sets of FLOPN for all test runs c SPACE0 R Array FR - vectorisation fractions; abscissa for REPORT c SPACES I scalar ibuf - flag enables one call to SIGNAL c ALPHA I scalar ik - current number of executing kernel c ALPHA I scalar il - selects one of three sets of loop spans c SPACES I scalar ion - logical I/O unit number for output c SPACEI I Array IPASS - Loop control limits for multiple-pass loops c SPACE0 I Array IQ - set of workload weights for REPORT c SPACEI I Array ISPAN - loop control limits 
for each kernel c SPACES I scalar it - flags timing call to TEST from TICK c SPACES I scalar j5 - datum in kernel 16 c ALPHA I scalar jr - current test run number (1 thru 7) c SPACES I scalar k2 - counter in kernel 16 c SPACES I scalar k3 - counter in kernel 16 c SPACES I scalar kr - a copy of mk c SPACES I scalar Loop - current multiple-pass loop limit in KERNEL c SPACES I scalar MULTI - Multiplier used to compute Loop in SIZES c SPACES I scalar m - temp integer datum c ALPHA I scalar mk - number of kernels to evaluate .LE.24 c ALPHA I scalar ml - maximum value of il= 3 c SPACEI I Array MUL - multipliers * IPASS defines Loop c SPACES I scalar n - current DO loop limit in KERNEL c SPACES I scalar n1 - dimension of most 1-D arrays c SPACES I scalar n13 - dimension used in kernel 13 c SPACES I scalar n13h - dimension used in kernel 13 c SPACES I scalar n14 - dimension used in kernel 14 c SPACES I scalar n16 - dimension used in kernel 16 c SPACES I scalar n2 - dimension of most 2-D arrays c SPACES I scalar n21 - dimension used in kernel 21 c SPACES I scalar n213 - dimension used in kernel 21 c SPACES I scalar n416 - dimension used in kernel 16 c SPACES I scalar n813 - dimension used in kernel 13 c SPACE0 I scalar npf - temp integer datum c ALPHA I Array NPFS - sets of NPFS1 for all test runs c SPACE0 I Array NPFS1 - number of page-faults for each kernel c ALPHA I scalar Nruns - number of complete test runs c SPACES I scalar nt1 - total size of common -SPACE1- words c SPACES I scalar nt2 - total size of common -SPACE2- words c BETA R Array SEE - (i,1,jr,il) sets of TEST overhead times c BETA R Array SEE - (i,2,jr,il) sets of csums of SPACE1 c BETA R Array SEE - (i,3,jr,il) sets of csums of SPACE2 c SPACE0 R Array SKALE - scale factors for SIGNAL data generator c SPACE0 R scalar start - temp start time of each kernel c PROOF R Array SUMS - sets of verified checksums for all test runs c SPACE0 R Array SUMW - set of quartile weights for REPORT c SPACE0 R Array TERR1 - 
overhead-time errors for each kernel c BETA R Array TERRS - sets of TERR1 for all runs c BETA R scalar tic - minimum cpu clock time= resolution c SPACE0 R scalar ticks - average overhead time in TEST linkage c SPACE0 R Array TIME - net execution times for all kernels c BETA R Array TIMES - sets of TIME for all test runs c SPACE0 R Array TOTAL - total flops computed by each kernel c SPACE0 R Array WS - unused c SPACE0 R Array WT - weights for each kernel sample c SPACEI R Array WTP - weights for the 3 span-varying passes c SPACE0 R Array WW - unused C C c --------- ----------------------------------------------------------------- c COMMON Usage c --------- ----------------------------------------------------------------- C C /ALPHA / C VERIFY TICK TALLY SIZES RESULT REPORT KERNEL C MAIN. C /BASE1 / C SUPPLY C /BASE2 / C SUPPLY C /BASER / C SUPPLY C /BETA / C TICK TALLY SIZES RESULT REPORT KERNEL C /DEBUG / C TRACKS TRACKX TRAP C /ORDER / C TRACKS TRACKX TRAP C /PROOF / C RESULT BLOCKDATA C /SPACE0/ C VALUES TICK TEST TALLY SUPPLY SIZES RESULT C REPORT KERNEL BLOCKDATA C /SPACE1/ C VERIFY VALUES TICK TEST SUPPLY SPACE KERNEL C /SPACE2/ C VERIFY VALUES TICK TEST SUPPLY SPACE KERNEL C /SPACE3/ C VALUES C /SPACEI/ C VERIFY VALUES TICK TEST SIZES RESULT REPORT C KERNEL BLOCKDATA C /SPACER/ C VALUES TICK TEST SUPPLY SIZES KERNEL C /SPACES/ C VERIFY VALUES TICK TEST SUPPLY SIZES KERNEL C BLOCKDATA c --------- ----------------------------------------------------------------- RETURN END C C C*************************************** SUBROUTINE IQRANF( M, Mmin,Mmax, n) C*********************************************************************** C * c IQRANF - computes a vector of psuedo-random indices * c in the domain (Mmin,Mmax) * C * C M - result array , psuedo-random positive integers * C Mmin - input integer, lower bound for random integers * C Mmax - input integer, upper bound for random integers * C n - input integer, number of results in M. 
* C * C M(i)= Mmin + INT( (Mmax-Mmin) * RANF(0)) * C * c CALL IQRANF( IX, 1,1001, 30) should produce in IX: * c 3 674 435 415 389 54 44 790 900 282 * c 177 971 728 851 687 604 815 971 155 112 * c 877 814 779 192 619 894 544 404 496 505 ... * C * C*********************************************************************** C cANSI IMPLICIT DOUBLE PRECISION (A-H,K,O-Z) cIBM IMPLICIT REAL*8 (A-H,K,O-Z) DOUBLE PRECISION dq, dp, per, dk, spin, span C dimension M(n) save k CALL TRACKS('IQRANF ') IF( n.LE.0 ) GO TO 73 inset= Mmin span= Mmax - Mmin c spin= 16807.00D0 c per= 2147483647.00D0 spin= 16807 per= 2147483647 realn= n scale= 1.0000100D0 q= scale*(span/realn) C dk= k DO 1 i= 1,n dp= dk*spin c dk= DMOD( dp, per) dk= dp -INT( dp/per)*per dq= dk*span M(i)= inset + ( dq/ per) IF( M(i).LT.Mmin .OR. M(i).GT.Mmax ) M(i)= inset + i*q 1 continue k= dk C C ciC double precision k, ip, iq, id ci inset= Mmin ci ispan= Mmax - Mmin ci ispin= 16807 ci id= 2147483647 ci q= (REAL(ispan)/REAL(n))*1.00001 ciC ci DO 2 i= 1,n ci ip= k*ispin ci k= MOD( ip, id) ci iq= k*ispan ci M(i)= inset + ( iq/ id) ci IF( M(i).LT.Mmin .OR. M(i).GT.Mmax ) M(i)= inset + i*q ci 2 continue C CALL TRAP( M, 8H IQRANF , 1, Mmax, n) C 73 CONTINUE CALL TRACKX RETURN DATA k /256/ END C*********************************************** SUBROUTINE KERNEL C*********************************************************************** C * C KERNEL executes 24 samples of Fortran computation * C * C*********************************************************************** C * C L. L. N. L. F O R T R A N K E R N E L S: M F L O P S * C * C These kernels measure Fortran numerical computation * C rates for a spectrum of cpu-limited computational * C structures or benchmarks. Mathematical through-put * C is measured in units of millions of floating-point * C operations executed per second, called Megaflops/sec. 
* C * C Fonzi's Law: There is not now and there never will be a language * C in which it is the least bit difficult to write * C bad programs. * C F.H.MCMAHON 1972 * C*********************************************************************** C C l1 := param-dimension governs the size of most 1-d arrays C l2 := param-dimension governs the size of most 2-d arrays C C Loop := multiple pass control to execute kernel long enough to time. C n := DO loop control for each kernel. Controls are set in subr. SIZES C C ****************************************************************** C cANSI IMPLICIT DOUBLE PRECISION (A-H,O-Z) cIBM IMPLICIT REAL*8 (A-H,O-Z) C C/ PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 ) C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 ) C/ PARAMETER( l14=2048, l16= 75, l416= 4*l16 , l21= 25 ) C/ PARAMETER( kn= 47, kn2= 95, np= 3, ls= 3*47, krs= 24) C C C/ PARAMETER( nk= 47, nl= 3, nr= 8 ) C COMMON /ALPHA/ mk,ik,ml,il,Nruns,jr, NPFS(8,3,47) COMMON /BETA / tic, TIMES(8,3,47), SEE(5,3,8,3), 1 TERRS(8,3,47), CSUMS(8,3,47), 2 FOPN(8,3,47), DOS(8,3,47) C COMMON /SPACES/ ion,j5,k2,k3,MULTI,Loop,m,kr,it,n13h,ibuf, 1 n,n1,n2,n13,n213,n813,n14,n16,n416,n21,nt1,nt2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), ticks, 1 FR(9), TERR1(47), SUMW(7), START, 2 SKALE(47), BIAS(47), WS(95), TOTAL(47), FLOPN(47), 3 IQ(7), NPF, NPFS1(47) C COMMON /SPACEI/ WTP(3), MUL(3), ISPAN(47,3), IPASS(47,3) C C/ INTEGER E,F,ZONE C/ COMMON /ISPACE/ E(l213), F(l213), C/ 1 IX(l1), IR(l1), ZONE(l416) C/C C/ COMMON /SPACE1/ U(l1), V(l1), W(l1), C/ 1 X(l1), Y(l1), Z(l1), G(l1), C/ 2 DU1(l2), DU2(l2), DU3(l2), GRD(l1), DEX(l1), C/ 3 XI(l1), EX(l1), EX1(l1), DEX1(l1), C/ 4 VX(l14), XX(l14), RX(l14), RH(l14), C/ 5 VSP(l2), VSTP(l2), VXNE(l2), VXND(l2), C/ 6 VE3(l2), VLR(l2), VLIN(l2), B5(l2), C/ 7 
PLAN(l416), D(l416), SA(l2), SB(l2) C/C C/ COMMON /SPACE2/ P(4,l813), PX(l21,l2), CX(l21,l2), C/ 1 VY(l2,l21), VH(l2,7), VF(l2,7), VG(l2,7), VS(l2,7), C/ 2 ZA(l2,7) , ZP(l2,7), ZQ(l2,7), ZR(l2,7), ZM(l2,7), C/ 3 ZB(l2,7) , ZU(l2,7), ZV(l2,7), ZZ(l2,7), C/ 4 B(l13,l13), C(l13,l13), H(l13,l13), C/ 5 U1(5,l2,2), U2(5,l2,2), U3(5,l2,2) C C ****************************************************************** C C C/ PARAMETER( l1= 1001, l2= 101, l1d= 2*1001 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 2048, l16= 75, l416= 4*75 , l21= 25) C C care C INTEGER E,F,ZONE COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(2048), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C C ****************************************************************** C DIMENSION XZ(2001), ZX(2001) EQUIVALENCE ( XZ(1), X(1)), ( ZX(1), Z(1)) C C C// DIMENSION E(96), F(96), U(1001), V(1001), W(1001), C// 1 X(1001), Y(1001), Z(1001), G(1001), C// 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), C// 3 IX(1001), XI(1001), EX(1001), EX1(1001), DEX1(1001), C// 4 VX(1001), XX(1001), IR(1001), RX(1001), RH(2048), C// 5 VSP(101), VSTP(101), VXNE(101), VXND(101), C// 6 VE3(101), VLR(101), VLIN(101), B5(101), C// 7 PLAN(300), ZONE(300), D(300), SA(101), SB(101) C//C C// DIMENSION P(4,512), PX(25,101), CX(25,101), C// 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), C// 2 ZA(101,7) , 
ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), C// 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), C// 4 B(64,64), C(64,64), H(64,64), C// 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C//C C//C ****************************************************************** C//C C// COMMON /POINT/ ME,MF,MU,MV,MW,MX,MY,MZ,MG,MDU1,MDU2,MDU3,MGRD, C// 1 MDEX,MIX,MXI,MEX,MEX1,MDEX1,MVX,MXX,MIR,MRX,MRH,MVSP,MVSTP, C// 2 MVXNE,MVXND,MVE3,MVLR,MVLIN,MB5,MPLAN,MZONE,MD,MSA,MSB, C// 3 MP,MPX,MCX,MVY,MVH,MVF,MVG,MVS,MZA,MZP,MZQ,MZR,MZM,MZB,MZU, C// 4 MZV,MZZ,MB,MC,MH,MU1,MU2,MU3 C//C C// POINTER (ME,E), (MF,F), (MU,U), (MV,V), (MW,W), C// 1 (MX,X), (MY,Y), (MZ,Z), (MG,G), C// 2 (MDU1,DU1),(MDU2,DU2),(MDU3,DU3),(MGRD,GRD),(MDEX,DEX), C// 3 (MIX,IX), (MXI,XI), (MEX,EX), (MEX1,EX1), (MDEX1,DEX1), C// 4 (MVX,VX), (MXX,XX), (MIR,IR), (MRX,RX), (MRH,RH), C// 5 (MVSP,VSP), (MVSTP,VSTP), (MVXNE,VXNE), (MVXND,VXND), C// 6 (MVE3,VE3), (MVLR,VLR), (MVLIN,VLIN), (MB5,B5), C// 7 (MPLAN,PLAN), (MZONE,ZONE), (MD,D), (MSA,SA), (MSB,SB) C//C C// POINTER (MP,P), (MPX,PX), (MCX,CX), C// 1 (MVY,VY), (MVH,VH), (MVF,VF), (MVG,VG), (MVS,VS), C// 2 (MZA,ZA), (MZP,ZP), (MZQ,ZQ), (MZR,ZR), (MZM,ZM), C// 3 (MZB,ZB), (MZU,ZU), (MZV,ZV), (MZZ,ZZ), C// 4 (MB,B), (MC,C), (MH,H), C// 5 (MU1,U1), (MU2,U2), (MU3,U3) C.. COMMON DUMMY(2000) C.. LOC(X) =.LOC.X C.. IQ8QDSP = 64*LOC(DUMMY) C C ****************************************************************** C C STANDARD PRODUCT COMPILER DIRECTIVES MAY BE USED FOR OPTIMIZATION C CDIR$ VECTOR CLLL. OPTIMIZE LEVEL i CLLL. OPTION INTEGER (7) CLLL. OPTION ASSERT (NO HAZARD) CLLL. OPTION NODYNEQV C C ****************************************************************** C BINARY MACHINES MAY USE THE AND(P,Q) FUNCTION IF AVAILABLE C IN PLACE OF THE FOLLOWING CONGRUENCE FUNCTION (SEE KERNEL 13, 14) C c IAND(j,k) = AND(j,k) CLLL. 
IAND(j,k) = j.INT.k
c     MOD2N(i,j)= MOD(i,j)
      MOD2N(i,j)= IAND(i,j-1)
C     i is Congruent to MOD2N(i,j) mod(j)
C     ******************************************************************
C
C
C
C
C
C
      CALL TRACKS('KERNEL ')
C
      CALL SPACE
C
cLLNL call PFM( 0, ion)
      CALL TEST(0)
C
C*******************************************************************************
C***  KERNEL 1      HYDRO FRAGMENT
C*******************************************************************************
C
      DO 1 L = 1,Loop
      DO 1 k = 1,n
    1 X(k)= Q + Y(k)*(R*ZX(k+10) + T*ZX(k+11))
C
C...................
      CALL TEST(1)
C
C*******************************************************************************
C***  KERNEL 2      ICCG EXCERPT (INCOMPLETE CHOLESKY - CONJUGATE GRADIENT)
C*******************************************************************************
C
      DO 200 L= 1,Loop
      II= n
      IPNTP= 0
  222 IPNT= IPNTP
      IPNTP= IPNTP+II
      II= II/2
      i= IPNTP
CDIR$ IVDEP
C
      DO 2 k= IPNT+2,IPNTP,2
      i= i+1
    2 X(i)= X(k) - V(k)*X(k-1) - V(k+1)*X(k+1)
      IF( II.GT.1) GO TO 222
  200 CONTINUE
C
C...................
      CALL TEST(2)
C
C*******************************************************************************
C***  KERNEL 3      INNER PRODUCT
C*******************************************************************************
C
      DO 3 L= 1,Loop
      Q= 0.000D0
      DO 3 k= 1,n
    3 Q= Q + Z(k)*X(k)
C
C...................
      CALL TEST(3)
C
C
C
C*******************************************************************************
C***  KERNEL 4      BANDED LINEAR EQUATIONS
C*******************************************************************************
C
      m= (1001-7)/2
      DO 444 L= 1,Loop
      DO 444 k= 7,1001,m
      lw= k-6
      temp= X(k-1)
CDIR$ IVDEP
      DO 4 j= 5,n,5
      temp = temp - XZ(lw)*Y(j)
    4 lw= lw+1
      X(k-1)= Y(5)*temp
  444 CONTINUE
C
C...................
      CALL TEST(4)
C
C*******************************************************************************
C***  KERNEL 5      TRI-DIAGONAL ELIMINATION, BELOW DIAGONAL (NO VECTORS)
C*******************************************************************************
C
      DO 5 L = 1,Loop
CDIR$ NOVECTOR
      DO 5 i = 2,n
    5 X(i)= Z(i)*(Y(i) - X(i-1))
CDIR$ VECTOR
C
C...................
      CALL TEST(5)
C
C*******************************************************************************
C***  KERNEL 6      GENERAL LINEAR RECURRENCE EQUATIONS
C*******************************************************************************
C
      DO 6 L= 1,Loop
      DO 6 i= 2,n
C     W(i)= 0.0100D0      use only if overflow occurs
      DO 6 k= 1,i-1
      W(i)= W(i) + B(i,k) * W(i-k)
    6 CONTINUE
C
C...................
      CALL TEST(6)
C
C*******************************************************************************
C***  KERNEL 7      EQUATION OF STATE FRAGMENT
C*******************************************************************************
C
      DO 7 L= 1,Loop
      DO 7 k= 1,n
      X(k)= U(k  ) + R*( Z(k  ) + R*Y(k  )) +
     .      T*( U(k+3) + R*( U(k+2) + R*U(k+1)) +
     .      T*( U(k+6) + R*( U(k+5) + R*U(k+4))))
    7 CONTINUE
C
C...................
      CALL TEST(7)
C
C
C*******************************************************************************
C***  KERNEL 8      A.D.I. INTEGRATION
C*******************************************************************************
C
      DO 8 L = 1,Loop
      nl1 = 1
      nl2 = 2
      fw= 2.000D0
      DO 8 kx = 2,3
CDIR$ IVDEP
      DO 8 ky = 2,n
      DU1(ky)=U1(kx,ky+1,nl1) - U1(kx,ky-1,nl1)
      DU2(ky)=U2(kx,ky+1,nl1) - U2(kx,ky-1,nl1)
      DU3(ky)=U3(kx,ky+1,nl1) - U3(kx,ky-1,nl1)
      U1(kx,ky,nl2)=U1(kx,ky,nl1) +A11*DU1(ky) +A12*DU2(ky) +A13*DU3(ky)
     .     + SIG*(U1(kx+1,ky,nl1) -fw*U1(kx,ky,nl1) +U1(kx-1,ky,nl1))
      U2(kx,ky,nl2)=U2(kx,ky,nl1) +A21*DU1(ky) +A22*DU2(ky) +A23*DU3(ky)
     .     + SIG*(U2(kx+1,ky,nl1) -fw*U2(kx,ky,nl1) +U2(kx-1,ky,nl1))
      U3(kx,ky,nl2)=U3(kx,ky,nl1) +A31*DU1(ky) +A32*DU2(ky) +A33*DU3(ky)
     .     + SIG*(U3(kx+1,ky,nl1) -fw*U3(kx,ky,nl1) +U3(kx-1,ky,nl1))
    8 CONTINUE
C
C...................
      CALL TEST(8)
C
C*******************************************************************************
C***  KERNEL 9      INTEGRATE PREDICTORS
C*******************************************************************************
C
      DO 9 L = 1,Loop
      DO 9 i = 1,n
      PX( 1,i)= DM28*PX(13,i) + DM27*PX(12,i) + DM26*PX(11,i) +
     .          DM25*PX(10,i) + DM24*PX( 9,i) + DM23*PX( 8,i) +
     .          DM22*PX( 7,i) + C0*(PX( 5,i) + PX( 6,i))+ PX( 3,i)
    9 CONTINUE
C
C...................
      CALL TEST(9)
C
C*******************************************************************************
C***  KERNEL 10     DIFFERENCE PREDICTORS
C*******************************************************************************
C
      DO 10 L= 1,Loop
      DO 10 i= 1,n
      AR = CX(5,i)
      BR = AR - PX(5,i)
      PX(5,i) = AR
      CR = BR - PX(6,i)
      PX(6,i) = BR
      AR = CR - PX(7,i)
      PX(7,i) = CR
      BR = AR - PX(8,i)
      PX(8,i) = AR
      CR = BR - PX(9,i)
      PX(9,i) = BR
      AR = CR - PX(10,i)
      PX(10,i)= CR
      BR = AR - PX(11,i)
      PX(11,i)= AR
      CR = BR - PX(12,i)
      PX(12,i)= BR
      PX(14,i)= CR - PX(13,i)
      PX(13,i)= CR
   10 CONTINUE
C
C...................
      CALL TEST(10)
C
C*******************************************************************************
C***  KERNEL 11     FIRST SUM. PARTIAL SUMS. (NO VECTORS)
C*******************************************************************************
C
      fw= 1.000D-25
      DO 11 L = 1,Loop
C     Y(1)= Y(1) + L*fw   use only if optimization eliminates L-loop.
      X(1)= Y(1)
CDIR$ NOVECTOR
      DO 11 k = 2,n
   11 X(k)= X(k-1) + Y(k)
CDIR$ VECTOR
C
C...................
      CALL TEST(11)
C
C*******************************************************************************
C***  KERNEL 12     FIRST DIFF.
C*******************************************************************************
C
      fw= 1.000D-25
      DO 12 L = 1,Loop
C     Y(1)= Y(1) + L*fw   use only if optimization eliminates L-loop.
      DO 12 k = 1,n
   12 X(k)= Y(k+1) - Y(k)
C
C...................
      CALL TEST(12)
C
C*******************************************************************************
C***  KERNEL 13     2-D PIC   Particle In Cell
C*******************************************************************************
C
      fw= 1.000D0
      DO 13 L= 1,Loop
      DO 13 ip= 1,n
      i1= P(1,ip)
      j1= P(2,ip)
      i1= 1 + MOD2N(i1,64)
      j1= 1 + MOD2N(j1,64)
      P(3,ip)= P(3,ip) + B(i1,j1)
      P(4,ip)= P(4,ip) + C(i1,j1)
      P(1,ip)= P(1,ip) + P(3,ip)
      P(2,ip)= P(2,ip) + P(4,ip)
      i2= P(1,ip)
      j2= P(2,ip)
      i2= MOD2N(i2,64)
      j2= MOD2N(j2,64)
      P(1,ip)= P(1,ip) + Y(i2+32)
      P(2,ip)= P(2,ip) + Z(j2+32)
      i2= i2 + E(i2+32)
      j2= j2 + F(j2+32)
      H(i2,j2)= H(i2,j2) + fw
   13 CONTINUE
C
C...................
      CALL TEST(13)
C
C*******************************************************************************
C***  KERNEL 14     1-D PIC   Particle In Cell
C*******************************************************************************
C
C
      fw= 1.000D0
      DO 14 L= 1,Loop
      DO 141 k= 1,n
      VX(k)= 0.0
      XX(k)= 0.0
      IX(k)= INT( GRD(k))
      XI(k)= REAL( IX(k))
      EX1(k)= EX ( IX(k))
      DEX1(k)= DEX ( IX(k))
  141 CONTINUE
C
      DO 142 k= 1,n
      VX(k)= VX(k) + EX1(k) + (XX(k) - XI(k))*DEX1(k)
      XX(k)= XX(k) + VX(k) + FLX
      IR(k)= XX(k)
      RX(k)= XX(k) - IR(k)
      IR(k)= MOD2N( IR(k),2048) + 1
      XX(k)= RX(k) + IR(k)
  142 CONTINUE
C
      DO 14 k= 1,n
      RH(IR(k) )= RH(IR(k) ) + fw - RX(k)
      RH(IR(k)+1)= RH(IR(k)+1) + RX(k)
   14 CONTINUE
C
C...................
      CALL TEST(14)
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C*******************************************************************************
C***  KERNEL 15     CASUAL FORTRAN. DEVELOPMENT VERSION.
C*******************************************************************************
C
C
C     CASUAL ORDERING OF SCALAR OPERATIONS IS TYPICAL PRACTICE.
C     THIS EXAMPLE DEMONSTRATES THE NON-TRIVIAL TRANSFORMATION
C     REQUIRED TO MAP INTO AN EFFICIENT MACHINE IMPLEMENTATION.
C
      DO 45 L = 1,Loop
      NG= 7
      NZ= n
      AR= 0.05300D0
      BR= 0.07300D0
   15 DO 45 j = 2,NG
      DO 45 k = 2,NZ
      IF( j-NG) 31,30,30
   30 VY(k,j)= 0.0
      GO TO 45
   31 IF( VH(k,j+1) -VH(k,j)) 33,33,32
   32 T= AR
      GO TO 34
   33 T= BR
   34 IF( VF(k,j) -VF(k-1,j)) 35,36,36
   35 R= MAX( VH(k-1,j), VH(k-1,j+1))
      S= VF(k-1,j)
      GO TO 37
   36 R= MAX( VH(k,j), VH(k,j+1))
      S= VF(k,j)
   37 VY(k,j)= SQRT( VG(k,j)**2 +R*R)*T/S
   38 IF( k-NZ) 40,39,39
   39 VS(k,j)= 0.0
      GO TO 45
   40 IF( VF(k,j) -VF(k,j-1)) 41,42,42
   41 R= MAX( VG(k,j-1), VG(k+1,j-1))
      S= VF(k,j-1)
      T= BR
      GO TO 43
   42 R= MAX( VG(k,j), VG(k+1,j))
      S= VF(k,j)
      T= AR
   43 VS(k,j)= SQRT( VH(k,j)**2 +R*R)*T/S
   45 CONTINUE
C
C...................
      CALL TEST(15)
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C*******************************************************************************
C***  KERNEL 16     MONTE CARLO SEARCH LOOP
C*******************************************************************************
C
      II= n/3
      LB= II+II
      k2= 0
      k3= 0
C
      DO 485 L= 1,Loop
      m= 1
  405 i1= m
  410 j2= (n+n)*(m-1)+1
      DO 470 k= 1,n
      k2= k2+1
      j4= j2+k+k
      j5= ZONE(j4)
      IF( j5-n ) 420,475,450
  415 IF( j5-n+II ) 430,425,425
  420 IF( j5-n+LB ) 435,415,415
  425 IF( PLAN(j5)-R) 445,480,440
  430 IF( PLAN(j5)-S) 445,480,440
  435 IF( PLAN(j5)-T) 445,480,440
  440 IF( ZONE(j4-1)) 455,485,470
  445 IF( ZONE(j4-1)) 470,485,455
  450 k3= k3+1
      IF( D(j5)-(D(j5-1)*(T-D(j5-2))**2+(S-D(j5-3))**2
     .                  +(R-D(j5-4))**2)) 445,480,440
  455 m= m+1
      IF( m-ZONE(1) ) 465,465,460
  460 m= 1
  465 IF( i1-m) 410,480,410
  470 CONTINUE
  475 CONTINUE
  480 CONTINUE
  485 CONTINUE
C
C...................
      CALL TEST(16)
C
C*******************************************************************************
C***  KERNEL 17     IMPLICIT, CONDITIONAL COMPUTATION (NO VECTORS)
C*******************************************************************************
C
C     RECURSIVE-DOUBLING VECTOR TECHNIQUES CAN NOT BE USED
C     BECAUSE CONDITIONAL OPERATIONS APPLY TO EACH ELEMENT.
C
      dw= 5.0000D0/3.0000D0
      fw= 1.0000D0/3.0000D0
      tw= 1.0300D0/3.0700D0
CDIR$ NOVECTOR
      DO 62 L= 1,Loop
      i= n
      j= 1
      INK= -1
      SCALE= dw
      XNM= fw
      E6= tw
      GO TO 61
C     STEP MODEL
   60 E6= XNM*VSP(i)+VSTP(i)
      VXNE(i)= E6
      XNM= E6
      VE3(i)= E6
      i= i+INK
      IF( i.EQ.j) GO TO 62
   61 E3= XNM*VLR(i) +VLIN(i)
      XNEI= VXNE(i)
      VXND(i)= E6
      XNC= SCALE*E3
C     SELECT MODEL
      IF( XNM .GT.XNC) GO TO 60
      IF( XNEI.GT.XNC) GO TO 60
C     LINEAR MODEL
      VE3(i)= E3
      E6= E3+E3-XNM
      VXNE(i)= E3+E3-XNEI
      XNM= E6
      i= i+INK
      IF( i.NE.j) GO TO 61
   62 CONTINUE
CDIR$ VECTOR
C
C...................
      CALL TEST(17)
C
C*******************************************************************************
C***  KERNEL 18     2-D EXPLICIT HYDRODYNAMICS FRAGMENT
C*******************************************************************************
C
      DO 75 L= 1,Loop
      T= 0.003700D0
      S= 0.004100D0
      KN= 6
      JN= n
      DO 70 k= 2,KN
      DO 70 j= 2,JN
      ZA(j,k)= (ZP(j-1,k+1)+ZQ(j-1,k+1)-ZP(j-1,k)-ZQ(j-1,k))
     .        *(ZR(j,k)+ZR(j-1,k))/(ZM(j-1,k)+ZM(j-1,k+1))
      ZB(j,k)= (ZP(j-1,k)+ZQ(j-1,k)-ZP(j,k)-ZQ(j,k))
     .        *(ZR(j,k)+ZR(j,k-1))/(ZM(j,k)+ZM(j-1,k))
   70 CONTINUE
C
      DO 72 k= 2,KN
      DO 72 j= 2,JN
      ZU(j,k)= ZU(j,k)+S*(ZA(j,k)*(ZZ(j,k)-ZZ(j+1,k))
     .                -ZA(j-1,k) *(ZZ(j,k)-ZZ(j-1,k))
     .                -ZB(j,k)   *(ZZ(j,k)-ZZ(j,k-1))
     .                +ZB(j,k+1) *(ZZ(j,k)-ZZ(j,k+1)))
      ZV(j,k)= ZV(j,k)+S*(ZA(j,k)*(ZR(j,k)-ZR(j+1,k))
     .                -ZA(j-1,k) *(ZR(j,k)-ZR(j-1,k))
     .                -ZB(j,k)   *(ZR(j,k)-ZR(j,k-1))
     .                +ZB(j,k+1) *(ZR(j,k)-ZR(j,k+1)))
   72 CONTINUE
C
      DO 75 k= 2,KN
      DO 75 j= 2,JN
      ZR(j,k)= ZR(j,k)+T*ZU(j,k)
      ZZ(j,k)= ZZ(j,k)+T*ZV(j,k)
   75 CONTINUE
C
C...................
      CALL TEST(18)
C
C*******************************************************************************
C***  KERNEL 19     GENERAL LINEAR RECURRENCE EQUATIONS (NO VECTORS)
C*******************************************************************************
C
C     IF( JR.GT.1 ) GO TO 192
      KB5I= 0
CDIR$ NOVECTOR
      DO 194 L= 1,Loop
      DO 191 k= 1,n
      B5(k+KB5I)= SA(k) +STB5*SB(k)
      STB5= B5(k+KB5I) -STB5
  191 CONTINUE
C     GO TO 194
C
  192 DO 193 i= 1,n
      k= n-i+1
      B5(k+KB5I)= SA(k) +STB5*SB(k)
      STB5= B5(k+KB5I) -STB5
  193 CONTINUE
  194 CONTINUE
CDIR$ VECTOR
C
C...................
      CALL TEST(19)
C
C*******************************************************************************
C***  KERNEL 20     DISCRETE ORDINATES TRANSPORT: RECURRENCE (NO VECTORS)
C*******************************************************************************
C
      dw= 0.200D0
CDIR$ NOVECTOR
      DO 20 L= 1,Loop
      DO 20 k= 1,n
      DI= Y(k)-G(k)/( XX(k)+DK)
      DN= dw
      IF( DI.NE.0.0) DN= MAX( S,MIN( Z(k)/DI, T))
      X(k)= ((W(k)+V(k)*DN)* XX(k)+U(k))/(VX(k)+V(k)*DN)
      XX(k+1)= (X(k)- XX(k))*DN+ XX(k)
   20 CONTINUE
CDIR$ VECTOR
C
C...................
      CALL TEST(20)
C
C*******************************************************************************
C***  KERNEL 21     MATRIX*MATRIX PRODUCT
C*******************************************************************************
C
      DO 21 L= 1,Loop
      DO 21 k= 1,25
      DO 21 i= 1,25
      DO 21 j= 1,n
      PX(i,j)= PX(i,j) +VY(i,k) * CX(k,j)
   21 CONTINUE
C
C...................
      CALL TEST(21)
C
C
C
C
C
C
C
C*******************************************************************************
C***  KERNEL 22     PLANCKIAN DISTRIBUTION
C*******************************************************************************
C
C
C     EXPMAX= 234.500D0
      EXPMAX= 20.0000D0
      fw= 1.00000D0
      U(n)= 0.99000D0*EXPMAX*V(n)
      DO 22 L= 1,Loop
      DO 22 k= 1,n
CARE  IF( U(k) .LT. EXPMAX*V(k)) THEN
      Y(k)= U(k)/V(k)
CARE  ELSE
CARE  Y(k)= EXPMAX
CARE  ENDIF
      W(k)= X(k)/( EXP( Y(k)) -fw)
   22 CONTINUE
C...................
      CALL TEST(22)
C
C*******************************************************************************
C***  KERNEL 23     2-D IMPLICIT HYDRODYNAMICS FRAGMENT
C*******************************************************************************
C
      fw= 0.17500D0
      DO 23 L= 1,Loop
      DO 23 j= 2,6
      DO 23 k= 2,n
      QA= ZA(k,j+1)*ZR(k,j) +ZA(k,j-1)*ZB(k,j) +
     .    ZA(k+1,j)*ZU(k,j) +ZA(k-1,j)*ZV(k,j) +ZZ(k,j)
   23 ZA(k,j)= ZA(k,j) +fw*(QA -ZA(k,j))
C
C...................
      CALL TEST(23)
C
C*******************************************************************************
C***  KERNEL 24     FIND LOCATION OF FIRST MINIMUM IN ARRAY
C*******************************************************************************
C
C     X( n/2)= -1.000D+50
      X( n/2)= -1.000D+10
      DO 24 L= 1,Loop
      m= 1
      DO 24 k= 2,n
      IF( X(k).LT.X(m)) m= k
   24 CONTINUE
C
C     m= imin1( n,x,1)        35 nanosec./element STACKLIBE/CRAY
C...................
      CALL TEST(24)
C
C*******************************************************************************
C
C
      IF( jr .LT. 1) jr= 1
      IF( jr .GT. 8) jr= 8-1
      IF( il .LT. 1) il= 1
      IF( il .GT.
     .      3) il= 3
C
      DO 999 k= 1,mk
      TIMES(jr,il,k)= TIME (k)
      TERRS(jr,il,k)= TERR1(k)
      NPFS (jr,il,k)= NPFS1(k)
      CSUMS(jr,il,k)= CSUM (k)
      DOS  (jr,il,k)= TOTAL(k)
      FOPN (jr,il,k)= FLOPN(k)
  999 continue
C
      CALL TRACKX
      RETURN
      END
C
C***********************************************
      SUBROUTINE PAGE( iou)
C***********************************************
      CALL TRACKS('PAGE ')
      WRITE(iou,1)
    1 FORMAT(1H1)
c   1 FORMAT(1H)
      CALL TRACKX
      RETURN
      END
C***********************************************
      SUBROUTINE REPORT( iou, ntk,nek,FLOPS,TR,RATES,LSPAN,WG,OSUM,ID)
C***********************************************************************
C                                                                      *
C     REPORT -  Prints Statistical Evaluation Of Fortran Kernel Timings*
C                                                                      *
C     iou    -  Logical Output Device Number                           *
C     ntk    -  Total number of Kernels to Edit in Report              *
C     nek    -  Number of Effective Kernels in each set to Edit        *
C     FLOPS  -  Array: Number of Flops executed by each kernel         *
C     TR     -  Array: Time of execution of each kernel(microsecs)     *
C     RATES  -  Array: Rate of execution of each kernel(megaflops/sec) *
C     LSPAN  -  Array: Span of inner DO loop in each kernel            *
C     WG     -  Array: Weight assigned to each kernel for statistics   *
C     OSUM   -  Array: Checksums of the results of each kernel         *
C***********************************************************************
c
c     REFERENCE
c
c          F.H.McMahon, The Livermore Fortran Kernels:
c          A Computer Test Of The Numerical Performance Range,
c          Lawrence Livermore National Laboratory,
c          Livermore, California, UCRL-53745, December 1986.
c
c     from: National Technical Information Service
c          U.S. Department of Commerce
c          5285 Port Royal Road
c          Springfield, VA. 22161
c
c     NOTICE
c
c          "This report was prepared as an account
c          of work sponsored by the United States
c          Government.
c          Neither the United States
c          nor the United States Department of
c          Energy, nor any of their employees, nor
c          any of their contractors, subcontractors,
c          or their employees, makes any warranty,
c          express or implied, or assumes any legal
c          liability or responsibility for the
c          accuracy, completeness or usefulness of
c          any information, apparatus, product or
c          process disclosed, or represents that its
c          use would not infringe privately-owned
c          rights."
c
c          Reference to a company or product name
c          does not imply approval or recommendation
c          of the product by the University of
c          California or the U.S. Department of
c          Energy to the exclusion of others that
c          may be suitable.
c
c
c          Work performed under the auspices of the
c          U.S. Department of Energy by the Lawrence
c          Livermore Laboratory under contract
c          number W-7405-ENG-48.
c
c***********************************************************************
c
c                             Abstract
c
c    A computer performance test that measures a realistic floating-point
c    performance range for Fortran applications is described. A variety
c    of computer performance analyses may be easily carried out using this
c    small central processing unit (cpu) test that would be infeasible or
c    too costly using complete applications as benchmarks, particularly in
c    the developmental phase of an immature computer system. The problem
c    of benchmarking numerical applications sufficiently, especially on
c    new supercomputers, is analyzed to identify several useful roles for
c    the Livermore Fortran Kernel (LFK) test. The 24 LFK contain enough
c    samples of Fortran practice to expose many specific inefficiencies in
c    the formulation of the Fortran source, in the quality of compiled cpu
c    code, and in the capability of the instruction architecture.
c    Examples show how the LFK may be used to study compiled Fortran code
c    efficiency, to test the ability of compilers to vectorize Fortran, to
c    simulate mature coding of Fortran on new computers, and to estimate
c    the effective subrange of supercomputer performance for Fortran
c    applications.
c
c    Cpu performance measurements of several Fortran benchmarks and
c    numerical applications that correlate well with the cpu performance
c    range measured by the LFK test are presented. The numerical
c    performance metric Mflops, first introduced in 1970 in this cpu test
c    to quantify the cpu performance range of numerical applications, is
c    discussed. Analyses of the LFK performance results argue against
c    reducing the cpu performance range of supercomputers to a single
c    number. The 24 LFK measured rates show a realistic variance in
c    Fortran cpu performance that is essential data for circumspect
c    computer evaluations. Cpu performance data measured by the LFK test
c    on a number of recent computer systems are tabulated for reference.
c
c
c
c    I: FORTRAN CPU PERFORMANCE ANALYSIS
c
c
c    These kernels measure Fortran numerical computation rates for a
c    spectrum of CPU-limited computational structures or benchmarks.
c    The kernels benchmark contains extracts or kernels from more
c    than a score of CPU-limited scientific application programs. These
c    kernels are the most important CPU time components from the
c    application programs. This benchmark may be easily extended
c    with important new kernels leaving performance statistics intact.
c
c    The time required to convert, debug, execute and time many,
c    entire, large programs on new machines each having a new
c    implementation of Fortran, or several implementations or
c    dialects rapidly becomes excessive. Almost all the conversion
c    costs are in segments of the programs which are irrelevant for
c    evaluation of the CPU, e.g., I/O, Fortran variations, memory
c    allocation, overlays, job control, etc.
c    All of these
c    complexities are reduced to a single, small benchmark which uses
c    a minimum of I/O and a single level of storage. Further, the
c    computation in the kernels is the most stable part of the
c    Fortran language.
c
c    The kernels benchmark is sufficient to determine a range of CPU
c    performance for many different computational structures in a
c    single computer run. Since the range in performance is usually
c    large the mean has a secondary significance. To estimate the
c
c
c
c
c

+++++++++++++++++++++++++++++++++++++++++++++++++
+++  ED (Edward Quillen)    (517) 336-1293    +++
+++  Vet Teaching Hosp. System Admin.         +++
+++  Email: quillen@cps.msu.edu               +++
+++++++++++++++++++++++++++++++++++++++++++++++++