Programmer's ROM - The Computer Language Library

Text File | 1988-05-03 | 20.0 KB | 406 lines

From: MAILER 22-FEB-1985 13:13
To: WIS
Subj: [Netmail From: court@mitre] BACKGROUND AND DATA ON LEVEL 6 ADA TESTS

Mail-From: ARPAnet host MITRE rcvd at Fri Feb 22 13:13-PST
Date: 22 Feb 1985 15:47:17 EST (Friday)
From: Terry Courtwright <court@mitre>
Subject: BACKGROUND AND DATA ON LEVEL 6 ADA TESTS
To: wis at nosc-tecr

----BEGINNING OF FORWARDED MESSAGES----
Date: 19 Feb 1985 11:48 PST
From: WWHITAKER@USC-ECLB.ARPA
Subject: BACKGROUND AND DATA ON LEVEL 6 ADA TESTS
To: ALEXANDER@MITRE, COURT@MITRE

There is a series of very simple benchmarks called COMPA, COMPB, ..., which are used to test the validity of various assumptions that one might make about the behavior of a compiler. Probably all the implicit assumptions are valid; these tests just check that nothing has been overlooked that could severely distort detailed quantitative tests. No significance should be given to the numerical results of these tests; they just provide a framework for other tests. There is not even a pressing reason to make sure of the status (or emptiness) of the machine on which they are run, since the desired comparison is one to another, not to some absolute.

COMPA contains the minimal procedure framework surrounding 500 separate Ada declaration statements, each on a separate physical line, yielding 503 statements and 505 lines.
COMPB also has the 500 variables declared, but in one long Ada statement spread over 50 physical lines, for a total of 3 statements and 55 lines.
COMPC has the minimal framework with 500 executable assignment statements, 5 per line, yielding 503 statements and 106 lines.
COMPD also has the 500 assignment statements, but they are placed 1 per line, yielding 503 statements and 506 lines.
COMPE is like COMPD, but interjects a comment line for every assignment, yielding 503 statements and 1006 lines.
COMPF has the same lines as COMPE, but all the comments are bunched together; it also has 503 statements and 1006 lines.
COMPG has the structure of COMPE, but is twice as long to test linearity; it has 1003 statements and 2006 lines.
COMPK is a single package containing 25 very small, but unique, packages.
COMPK1-COMPK25 are the same packages as are contained in COMPK, but as separate packages.
COMPL "with"s the COMPK package, and exercises a function from each of the included 25 packages.
COMPM "with"s the 25 separate COMPKn packages and exercises a function from each, testing, by comparison to COMPL, the cost of "with"ing packages.
COMPN is a null procedure, with neither declaration nor executable statement, yielding 2 statements and 4 lines.
COMPT is like COMPZ, a minimal program, but "with"s and "use"s TEXT_IO, yielding 5 statements and 6 lines.
COMPZ is a nominal minimal program with 1 declaration and 1 assignment, yielding 3 statements and 5 lines.

A typical set of runs might yield the following information:

On a lightly loaded VAX 11/780 running the Telesoft 1.3d compiler, COMPN took a minimum of 13 clock seconds. This is presumably the time to load the compiler from disk and perform a minimal job. Other compiles might be compared by subtracting this minimum, or the value that is obtained for the loading at the time of run. With as many as 6 users on the machine the time would occasionally go to 25 seconds, but that much variation would be extreme.

COMPZ had the same minimum of 13 seconds, but longer times seemed to be more common. There may be an effect of doing that small additional work.

COMPT had the additional burden of "with"ing TEXT_IO and took an additional 5 seconds, for 18 total.

COMPB regularly took 25 seconds under the load that was yielding mostly 13 seconds for COMPN. COMPA took 47 seconds under the same conditions. This means that breaking up the 500 declarations into separate statements had an effect, but it was not proportional to the number of statements. Nor was it even proportional to the number of lines.
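
As a quick check of the proportionality remark, the ratios can be worked out from the figures quoted above. This is only a sketch in Python, not part of the original test suite; the variable and function names are illustrative, and the only inputs are the VAX clock times and the statement and line counts given in the message.

```python
# Figures quoted in the message for the lightly loaded VAX 11/780.
BASELINE_SECONDS = 13  # COMPN, the null procedure

# name: (clock seconds, Ada statements, physical lines)
RUNS = {
    "COMPA": (47, 503, 505),  # 500 declarations, one statement per line
    "COMPB": (25, 3, 55),     # same 500 variables in one long statement
}


def net_seconds(name):
    """Clock time with the compiler start-up time (COMPN) subtracted."""
    return RUNS[name][0] - BASELINE_SECONDS


time_ratio = net_seconds("COMPA") / net_seconds("COMPB")
statement_ratio = RUNS["COMPA"][1] / RUNS["COMPB"][1]
line_ratio = RUNS["COMPA"][2] / RUNS["COMPB"][2]

print(f"net time ratio  A/B: {time_ratio:.1f}")       # 34 / 12
print(f"statement ratio A/B: {statement_ratio:.1f}")  # 503 / 3
print(f"line ratio      A/B: {line_ratio:.1f}")       # 505 / 55
```

The net times differ by a factor of a little under 3, against roughly 168 for statements and roughly 9 for lines, which is the sense in which the effect is proportional to neither.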

This may be interpreted to indicate that the exact formatting of declarations, while it may produce a measurable difference in extreme cases, should not be significant for the small differences that could be found between semantically identical programs written by people with somewhat different styles, or formatted differently by machine.

COMPD, with 506 lines, compiled somewhat slower (53 seconds) than COMPC, which is compressed to 106 lines but otherwise identical. Again this shows that extreme variations in format introduce much smaller variations in compile time, for this compiler. Benchmark results should certainly not be considered significant to the 10% level, and within such limits the number of Ada statements should be the appropriate measure of compiler performance, rather than "lines", and that measure should be essentially independent of normal variations in formatting, for this compiler.

COMPE introduced 500 lines of comments into COMPD, doubling the "lines". The time to compile was 64 seconds. If one takes off the 13 seconds for a minimal program, the times of the 500-statement program without and with the 500 comment lines are 40 and 51 seconds, indicating the relative time to process comments compared to the simplest statement.

COMPF compiled in 63 seconds, within measurement error of COMPE. The grouping of comments had no effect.

COMPG was double COMPE and, after subtracting the minimal 13 seconds, its time of 115 seconds was exactly double (51 to 102), so the expectation of linearity holds in this case. This was also a fairly large Ada program as measured in "lines of code" (which we would not do), but the lines are very simple and short (half are short comments). It could be used to compute an absolute maximum on the compile speed in lines per minute. There is no way to avoid someone doing this, but the number has no meaning in an absolute sense for comparing to real programs. Whether it is of use in relative comparisons is problematic.
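
The baseline subtraction used throughout this discussion can be sketched as follows. This is an illustrative Python fragment, not part of the benchmark suite; the names are made up for the example, and the inputs are only the VAX clock times and line counts quoted in the text.

```python
BASELINE_SECONDS = 13  # COMPN on the lightly loaded VAX 11/780

CLOCK_SECONDS = {"COMPD": 53, "COMPE": 64, "COMPF": 63, "COMPG": 115}
COMPG_LINES = 2006
STATEMENTS = 500     # assignment statements in COMPD and COMPE
COMMENT_LINES = 500  # comment lines added in COMPE


def net_seconds(name):
    """Clock time with the compiler start-up time (COMPN) subtracted."""
    return CLOCK_SECONDS[name] - BASELINE_SECONDS


def lines_per_minute(lines, seconds):
    return 60.0 * lines / seconds


# Cost of one simple assignment versus one comment line.
per_statement = net_seconds("COMPD") / STATEMENTS
per_comment = (net_seconds("COMPE") - net_seconds("COMPD")) / COMMENT_LINES

# Linearity: COMPG is twice the size of COMPE.
is_linear = net_seconds("COMPG") == 2 * net_seconds("COMPE")

# Upper bound on compile speed, for what it is worth.
max_lpm = lines_per_minute(COMPG_LINES, net_seconds("COMPG"))

print(per_statement, per_comment, is_linear, round(max_lpm))
```

On these figures a comment line costs about 0.022 seconds against about 0.08 seconds for a simple assignment, COMPG comes out at exactly twice COMPE, and the lines-per-minute figure for COMPG is 1180.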

ACKER and SIEVE are very common elementary benchmarks which may have been run in every possible language and on every machine in existence. They are included to provide a very rough measure of the quality of code generated by the compilers. While the purpose of the COMP benchmarks is to measure compile-time properties, these simple measures of code performance may provide some indication of how much effort goes into the code generation. For the purposes of comparing with other languages, all Ada exception checks have been suppressed with pragma SUPPRESS. This is only advisory to the compiler and may or may not speed up the code. Runs are with as bare a machine as possible.

ACKER computes the ACKERMANN function for (3, 6).
SIEVE is the BYTE benchmark, ten iterations of calculating the prime numbers up to 16384 by the method of Eratosthenes.
IMP is a program that contains the timing runs for ACKER and SIEVE in addition to printing various information about the system. It continually changes as it evolves and as the systems differ (does it have LONG_FLOAT?) so it is not a regular benchmark. But for some systems the compile "lines per minute" is recorded to compare with the COMPG values. IMP is a simple, but by no means typical, program, and there is no claim that it represents a good test.

The tests were also run on an INTELLIMAC 7000M using TeleSoft 1.3d. The 7000M is a 12.5 MHz (1.5 MIPS!) 68K with a 330 Meg Fujitsu SMD disk and a 25 Meg fixed/25 Meg removable Amcodyne cartridge disk. We have 1 Meg of RAM w/ ECC, 8 RS232 ports and a serial printer port. The Unix is a Unisoft port of Version 7 to the 68K. It includes some "Berkeley Extensions" (e.g. Vi and Termcap).

A short series of tests was run on the IBM 3083 E1 at Billerica under Telesoft 1.3d and the CMS operating system. This gathered no further information on the compiler, but showed some things about the operating system and provided an example on a mainframe.

The Labtek WICAT is a 68000 (8 MHz) based desk-top computer running Telesoft 1.3d and the Telesoft ROS operating system. It represents a fairly good stand-alone capability as far as compile speed is concerned and therefore is a measure of how a convenient compiler system should appear to the user. It is, of course, not validated and is therefore not acceptable itself.

The tests on the Honeywell Level-6 gathered data on both the clock time and the CPU time. In a couple of cases link time was also noted, since these programs must be explicitly linked before execution.

For the Honeywell equipment "best" times were given when available. Compilation in a domain that has some previous runs can reduce the time somewhat by having available some previously instantiated units, or specs. After about 50 library files have been created, the search of the library takes longer and the compiler can slow down by a factor of 2. The compilation of IMP, the implementation test program, took 602 (367 CPU) seconds on the 95 at the end of the day, and only 280 (178) seconds in an empty domain.

COMPB did not compile, perhaps because of the large number of declarations (COMPA, which also has 500 declarations, took an anomalously long time to compile). While the machines had up to 4MB of memory, they can only address 1MB at a time, so the extra memory helps only for multiple jobs.

Some of the 75 runs were made with VIDEO turned off. This status display absorbed as much as 15-20% of the CPU in this machine. For the 75 the compiles were very high CP users; on the faster 95 the CP utilization was significantly less at the end of the tests when the domain library was large, as shown by the extraordinary run of 25 packages (spec and body) in a single compilation file (COMPK1-25).

For COMPL and COMPM the link times are also shown. In some Ada systems most of the linking is done with the compile run and is counted in that time (like Telesoft 1.3).
In others, linking is a separate program to be run before execution. When this time is significant, it should be added to the compile times for comparison of these benchmarks. Most Ada programs are composed of a number of packages and one main program; there is only one link necessary, not one per package.

----------------------------------------------------------------------------------------------------
          TELESOFT   TELESOFT   TELESOFT   TELESOFT   HONEYWELL   HONEYWELL   HONEYWELL   HONEYWELL
          1.3d       1.3d       1.3d       1.3d       DDC         DDC         DDC         DDC
          VAX        INTELLIMAC IBM        LABTEK     L-6/95+     L-6/75      L-6/45      L-6/45
          11/780     7000M      3083E1     WICAT      4MB         2MB         2MB         1.25MB
TEST      VMS        UNIX       CMS        ROS        CP   CLOCK  CP   CLOCK  CP   CLOCK  CP   CLOCK
----------------------------------------------------------------------------------------------------
These are null tests of starting the compiler. The COMPN time has been subtracted from the test time of all others reported.
COMPN     13         26         .8         11         8    26     20   46     36   71     72   310
COMPZ     13         26         .8         11         8    26     25   63
----------------------------------------------------------------------------------------------------
COMPA     34         43         3.3        35         67   83     182  218    361  466
COMPB     12         22         1.6        18         --   --
COMPC     31         39         7.0        39         35   45     87   102    160  193    210  721
COMPD     40         40         7.1        41         41   51     100  119
COMPE     51         46         7.6        43         46   59     117  119
COMPF 7.6 43 118 139
COMPG     102        92         15.3       86         94   125    232  273
max lpm   1180       1308       7867       1400            963         441
COMPK 14 30 79
COMPK1-25 511 2482
COMPL 1.0 4 17 145
Link 11 30
COMPM 4.3 24 165
Link 36 80
COMPN     (13)       (26)       (.8)       (11)       (8)  (26)   (20) (46)
COMPT     5          5          1.0        6          2    9      7    19
COMPZ     (13)       (26)       (.8)       (11)       (8)  (26)   (25) (63)
----------------------------------------------------------------------------------------------------
Lines per minute
IMP 122
Run Times
ACKER 2.7 8.3
SIEVE 2.8 7.7
-------

----END OF FORWARDED MESSAGES----

From: MAILER 27-FEB-1985 05:28
To: WIS
Subj: [Netmail From: court@mitre] LEVEL 6 REPORT

Mail-From: ARPAnet host MITRE rcvd at Wed Feb 27 05:28-PST
Date: 27 Feb 1985 8:00:22 EST (Wednesday)
From: Terry Courtwright <court@mitre>
Subject: LEVEL 6 REPORT
To: wis at nosc-tecr

----BEGINNING OF FORWARDED MESSAGES----
Date: 25 Feb 1985 07:37 PST
From: WWHITAKER@USC-ECLB.ARPA
Subject: LEVEL 6 REPORT
To: COURT@MITRE

There is a series of very simple benchmarks called COMPA, COMPB, ..., which are used to test the validity of various assumptions that one might make about the behavior of a compiler. Probably all the implicit assumptions are valid; these tests just check that nothing has been overlooked that could severely distort detailed quantitative tests. No significance should be given to the numerical results of these tests; they just provide a framework for other tests. There is not even a pressing reason to make sure of the status (or emptiness) of the machine on which they are run, since the desired comparison is one to another, not to some absolute.

COMPA contains the minimal procedure framework surrounding 500 separate Ada declaration statements, each on a separate physical line, yielding 503 statements and 505 lines.
COMPB also has the 500 variables declared, but in one long Ada statement spread over 50 physical lines, for a total of 3 statements and 55 lines.
COMPC has the minimal framework with 500 executable assignment statements, 5 per line, yielding 503 statements and 106 lines.
COMPD also has the 500 assignment statements, but they are placed 1 per line, yielding 503 statements and 506 lines.
COMPE is like COMPD, but interjects a comment line for every assignment, yielding 503 statements and 1006 lines.
COMPF has the same lines as COMPE, but all the comments are bunched together; it also has 503 statements and 1006 lines.
COMPG has the structure of COMPE, but is twice as long to test linearity; it has 1003 statements and 2006 lines.
COMPK is a single package containing 25 very small, but unique, packages.
COMPK1-COMPK25 are the same packages as are contained in COMPK, but as separate packages.
COMPL "with"s the COMPK package, and exercises a function from each of the included 25 packages.
COMPM "with"s the 25 separate COMPKn packages and exercises a function from each, testing, by comparison to COMPL, the cost of "with"ing packages.
COMPN is a null procedure, with neither declaration nor executable statement, yielding 2 statements and 4 lines.
COMPT is like COMPZ, a minimal program, but "with"s and "use"s TEXT_IO, yielding 5 statements and 6 lines.
COMPZ is a nominal minimal program with 1 declaration and 1 assignment, yielding 3 statements and 5 lines.

A typical set of runs might yield the following information:

On a lightly loaded VAX 11/780 running the Telesoft 1.3d compiler, COMPN took a minimum of 13 clock seconds. This is presumably the time to load the compiler from disk and perform a minimal job. Other compiles might be compared by subtracting this minimum, or the value that is obtained for the loading at the time of run. With as many as 6 users on the machine the time would occasionally go to 25 seconds, but that much variation would be extreme.

COMPZ had the same minimum of 13 seconds, but longer times seemed to be more common. There may be an effect of doing that small additional work.

COMPT had the additional burden of "with"ing TEXT_IO and took an additional 5 seconds, for 18 total.

COMPB regularly took 25 seconds under the load that was yielding mostly 13 seconds for COMPN. COMPA took 47 seconds under the same conditions. This means that breaking up the 500 declarations into separate statements had an effect, but it was not proportional to the number of statements. Nor was it even proportional to the number of lines.

This may be interpreted to indicate that the exact formatting of declarations, while it may produce a measurable difference in extreme cases, should not be significant for the small differences that could be found between semantically identical programs written by people with somewhat different styles, or formatted differently by machine.
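
To make the two declaration layouts concrete, here is a minimal Python sketch that generates a COMPA-like and a COMPB-like source file. The original sources are not reproduced in this message, so everything about the generated Ada is an assumption: the variable names V1..V500, the INTEGER type, and the surrounding frame (borrowed from the description of COMPZ, one declaration and one assignment).

```python
def one_per_line(count):
    """COMPA style: one declaration statement per physical line."""
    return [f"   V{i} : INTEGER;" for i in range(1, count + 1)]


def single_statement(count, per_line):
    """COMPB style: one long declaration spread over several lines."""
    names = [f"V{i}" for i in range(1, count + 1)]
    rows = [names[i:i + per_line] for i in range(0, count, per_line)]
    lines = []
    for index, row in enumerate(rows):
        text = "   " + ", ".join(row)
        # Every row but the last ends in a comma; the last closes the statement.
        text += "," if index < len(rows) - 1 else " : INTEGER;"
        lines.append(text)
    return lines


def wrap_in_procedure(name, declaration_lines):
    """Assumed frame: the COMPZ-like minimal program around the declarations."""
    return (
        [f"procedure {name} is", "   X : INTEGER;"]
        + declaration_lines
        + ["begin", "   X := 1;", f"end {name};"]
    )


def count_statements(lines):
    """Count statements as semicolons (an assumed counting rule)."""
    return sum(line.count(";") for line in lines)


compa = wrap_in_procedure("COMPA", one_per_line(500))
compb = wrap_in_procedure("COMPB", single_statement(500, per_line=10))

print(len(compa), count_statements(compa))
print(len(compb))
```

With this assumed frame the line counts come out at 505 and 55, matching the figures in the text, and the COMPA-like file has 503 semicolon-terminated statements. The statement count of the real COMPB (3) depends on details of its frame that the message does not spell out.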

COMPD, with 506 lines, compiled somewhat slower (53 seconds) than COMPC, which is compressed to 106 lines but otherwise identical. Again this shows that extreme variations in format introduce much smaller variations in compile time, for this compiler. Benchmark results should certainly not be considered significant to the 10% level, and within such limits the number of Ada statements should be the appropriate measure of compiler performance, rather than "lines", and that measure should be essentially independent of normal variations in formatting, for this compiler.

COMPE introduced 500 lines of comments into COMPD, doubling the "lines". The time to compile was 64 seconds. If one takes off the 13 seconds for a minimal program, the times of the 500-statement program without and with the 500 comment lines are 40 and 51 seconds, indicating the relative time to process comments compared to the simplest statement.

COMPF compiled in 63 seconds, within measurement error of COMPE. The grouping of comments had no effect.

COMPG was double COMPE and, after subtracting the minimal 13 seconds, its time of 115 seconds was exactly double (51 to 102), so the expectation of linearity holds in this case. This was also a fairly large Ada program as measured in "lines of code" (which we would not do), but the lines are very simple and short (half are short comments). It could be used to compute an absolute maximum on the compile speed in lines per minute. There is no way to avoid someone doing this, but the number has no meaning in an absolute sense for comparing to real programs. Whether it is of use in relative comparisons is problematic.

ACKER and SIEVE are very common elementary benchmarks which may have been run in every possible language and on every machine in existence. They are included to provide a very rough measure of the quality of code generated by the compilers.
While the purpose of the COMP benchmarks is to measure compile-time properties, these simple measures of code performance may provide some indication of how much effort goes into the code generation. For the purposes of comparing with other languages, all Ada exception checks have been suppressed with pragma SUPPRESS. This is only advisory to the compiler and may or may not speed up the code. Runs are with as bare a machine as possible.

ACKER computes the ACKERMANN function for (3, 6).
SIEVE is the BYTE benchmark, ten iterations of calculating the prime numbers up to 16384 by the method of Eratosthenes.
IMP is a program that contains the timing runs for ACKER and SIEVE in addition to printing various information about the system. It continually changes as it evolves and as the systems differ (does it have LONG_FLOAT?) so it is not a regular benchmark. But for some systems the compile "lines per minute" is recorded to compare with the COMPG values. IMP is a simple, but by no means typical, program, and there is no claim that it represents a good test.
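
For readers who have not met the two run-time benchmarks, the following Python sketch shows the computations that ACKER and SIEVE are described as performing. It is not the Ada source used in these tests: the function names are illustrative, the Ackermann function is evaluated with an explicit stack rather than by recursion, the sieve is a plain Sieve of Eratosthenes rather than the exact BYTE program, and the timings it prints are in no way comparable with the figures in the table.

```python
import time


def ackermann(m, n):
    """Ackermann function, using an explicit stack instead of recursion."""
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n += 1                  # A(0, n) = n + 1
        elif n == 0:
            stack.append(m - 1)     # A(m, 0) = A(m - 1, 1)
            n = 1
        else:
            stack.append(m - 1)     # A(m, n) = A(m - 1, A(m, n - 1))
            stack.append(m)
            n -= 1
    return n


def count_primes(limit):
    """Count the primes up to and including limit with a simple sieve."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            for multiple in range(i * i, limit + 1, i):
                flags[multiple] = False
    return sum(flags)


def timed(func, *args, repeat=1):
    """Run func(*args) repeat times; return the last result and elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        result = func(*args)
    return result, time.perf_counter() - start


acker_value, acker_seconds = timed(ackermann, 3, 6)
sieve_count, sieve_seconds = timed(count_primes, 16384, repeat=10)

print(f"ACKER (3, 6) = {acker_value} in {acker_seconds:.3f} s")
print(f"SIEVE found {sieve_count} primes, 10 iterations in {sieve_seconds:.3f} s")
```

Ackermann (3, 6) evaluates to 509, and there are 1900 primes up to 16384, so both results give a simple correctness check before any timing is taken seriously.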