Prelude for those that don't read the documentation:

Do not mail me bug reports. I can't fix them... Other opinions on the program are welcome.

I do not know if this program works on a CPU without a math co-processor (like the 486-SX).


System Benchmark "SysBench" 0.9.0
---------------------------------

(C) 1994 Henrik Harmsen
The disk IO code: (C) 1994 Kai Uwe Rommel

Contents:

1 Introduction
2 Tests
3 Copyright notice
4 Thanks
Appendix A : Todo
Appendix B : Building
Appendix C : Example results

---

1 Introduction

I thought OS/2 needed a benchmark program, so I wrote one. This program is not quite finished, and probably never will be, not by me anyway, since I'm saying goodbye to OS/2 and turning my attention to Linux. The reasons for this have not so much to do with OS/2, which is still a great OS, as they have to do with Linux. Linux is slick, super-fast, finally has drivers for my Viper card, has free TCP/IP and last but not least, Linux is Unix.

This means I am probably not going to make updates to this program, since I won't have OS/2 on my disk anymore. I'm saying probably, since I can't read the future. Maybe one day my whimsical mind will think OS/2 is more fun than Linux, who knows ? :-)

It also means that I am donating this program to anyone who is willing to continue working on it. If you think you want to continue working on this program, make sure you clearly note that this is released by you, not me. To do this, change the version number to 0.9.0xxx, where xxx are your initials. For example 0.9.0hch, which would indicate that I (Henrik C Harmsen) have made this release.

The version numbering scheme should follow that of GCC. The first number is the major release number, to be increased when major enhancements have been made to the program or it is considered out of beta. The second number is the minor release number; increase it when you have made small changes to the program. The last number should be increased when making bug-fixes only.
Take a look at the appendices for more information on what needs to be done, what's not quite finished yet, and how to re-build the program. Among other things, this document needs rewriting.

Do not send me complaints about bugs and errors, since I will have no way of fixing them...

Now, that said, let's take a look at what this program tests.


2 Tests

HANDLE WITH CARE! DO NOT BLINDLY TRUST BENCHMARK VALUES. THEY ARE ONLY GOOD IF YOU KNOW WHAT THEY ARE TESTING AND KNOW WHAT THEY ARE NOT TESTING...

The values obtained here are not useful for comparing against values obtained from other benchmark programs. Even though one of the tests, for example, measures Linpack performance and yields a value in MFLOPS, this value is not useful in comparing with values from a different benchmark program. The only exception here is the Dhrystone 2.1 value, which might possibly be compared to values from other Dhrystone 2.1 benchmarks. As a rule: Only compare values with people running this same benchmark program.

Almost all tests are adaptive in that they will first measure the approximate speed of your computer so the test will take about 10-15 seconds in total, no matter how slow or fast your computer is. The ones that are not adaptive are the floating point tests and the CPU integer tests, with the exception of the Dhrystone test.

2.1 Graphic tests

These tests test how fast the video hardware/display driver combination can pump pixels to the screen. OS/2 has long had abysmal display drivers for many cards; these tests are meant to sort out whether they really are bad, good or stink.

Most window operations use only a few key operations of the video card accelerator. Take a look at your windows, they're mostly built from filled rectangles, with some text and vertical and horizontal lines. Maybe a few bitmaps here and there (icons and such).

The PM-marks are calculated from the other values as a weighted arithmetic mean.
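
This document does not give the weights the program uses, so the following is only a sketch of how a weighted arithmetic mean such as the PM-marks could be computed. The function name weighted_mean and any weights passed to it are assumptions for illustration, not the values SysBench uses.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted arithmetic mean: sum(w[i] * v[i]) / sum(w[i]).
   Hypothetical helper; the weights SysBench really uses are not
   stated in this document. */
double weighted_mean(const double *values, const double *weights, size_t n)
{
    double weighted_sum = 0.0;
    double weight_total = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        assert(weights[i] >= 0.0);      /* negative weights make no sense here */
        weighted_sum += weights[i] * values[i];
        weight_total += weights[i];
    }
    assert(weight_total > 0.0);         /* at least one test must count */
    return weighted_sum / weight_total;
}
```

For example, the values 10.0 and 20.0 with weights 3.0 and 1.0 give 12.5, since the first value counts three times as much as the second.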

2.1.1 BitBlit S->S Copy

Tests the speed of the bitblit screen->screen copy operation. One of the most important values, since it affects how fast you can scroll text and move large windows.

2.1.2 BitBlit M->S Copy

Tests the speed of the bitblit memory->screen copy operation. This affects how fast updates of large bitmaps are, and all operations that copy data from RAM to Video RAM.

2.1.3 Filled rectangle, patterned filled rectangle.

Tests how fast the blitter can blank areas with a color or stipple pattern. When updating a window, the background is usually blanked with a single color or pattern before text or other things are drawn on it.

2.1.4 Lines

Tests the speed of line-drawing in different directions. The horizontal and vertical line drawing speed is important when drawing frames around windows and such.

2.1.5 Text render

Extremely important function for speedy updates in text editors, shell windows, word processors etc.

2.2 CPU Integer tests

The CPU tests are divided into two sections. This one tests 'integer' performance, meaning not only integer arithmetic but also every other 'normal' program that does some kind of data processing; the other one (2.3) tests floating-point performance. 99% of all applications do not use floating-point arithmetic. Those that do are usually ray-tracers, scientific and engineering types of programs etc.

The CPU-int marks are calculated as a weighted mean of the other tests.

2.2.1 Dhrystone VAX MIPS

When you read about how many MIPS a computer performs, that figure has usually been obtained by running this Dhrystone test and adjusting the result to be relative to one VAX 11/780 MIPS. That means this test does not benchmark the number of million instructions per second (MIPS) as defined by machine instructions, but rather a weighted value against the base reference of one VAX 11/780 MIPS.

This test uses very little memory, meaning it will measure the CPU performance only, not taking into account other vital parts such as memory speed.
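
The adjustment to VAX MIPS described above can be sketched as a single division. The divisor 1757 is the Dhrystones-per-second score commonly quoted for the VAX 11/780; that SysBench uses exactly this constant is an assumption, and the function name is made up for illustration.

```c
#include <assert.h>

/* Dhrystones per second commonly quoted for the VAX 11/780, the
   machine that defines "1 VAX MIPS". Assumed here; this document
   does not state the constant SysBench uses. */
#define VAX_11_780_DHRYSTONES_PER_SEC 1757.0

/* Convert a raw Dhrystone rate into VAX MIPS (hypothetical helper). */
double dhrystone_vax_mips(double dhrystones_per_sec)
{
    assert(dhrystones_per_sec >= 0.0);
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC;
}
```

Under this assumption, a machine that completes 17570 Dhrystone loops per second would be rated at 10 VAX MIPS, whatever the number of machine instructions it actually executed.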
Here is an excerpt from the sources from where I got this program:

  "Dhrystone is a short synthetic benchmark program intended to be representative for system (integer) programming. Based on published statistics on use of programming language features: see original publication in CACM 27,10 (Oct 1984). Orginally published in ADA, now mostly used in C. Version 2 (in C) published in SIGPLAN Notices 23,8 (Aug 1988), together with measurement rules. Version 1 is no longer recommended since state-of-the-art compilers can eliminate too much 'dead code' from the benchmark (However, quoted MIPS numbers are often based on version 1).

  Problems: Due to its small size (100 HLL statements, 1-1.5 KB code), the memory system outside the cache is not tested; compilers can too easily optimize for Dhrystone; string operations are somewhat over-represented.

  Recommendation: Use it for controlled experiments only; don't blindly trust single Dhrystone MIPS numbers quoted somewhere (don't do this for any benchmark)."

This test is based on the C-version of Dhrystone 2.1.

2.2.2 Hanoi

An integer program which solves the Towers of Hanoi puzzle using recursive function calls. It uses very little memory, and thus does not test memory speed.

2.2.3 Heapsort

Tests how fast your computer can sort a large array of random values using the heapsort algorithm. Tests both CPU and memory speed. The MIPS are just a measurement against some arbitrary base MIPS reference. This test uses about 1 MB of memory.

2.2.4 Sieve

Tests how fast your computer can find lots of prime numbers using the sieve of Eratosthenes, with arrays from 8 kB to 1.2 MB. The result is a weighted mean of the different speeds. Tests both CPU and memory speed.

2.3 CPU floating point tests

These tests measure how fast your computer is at floating point arithmetic. (Floating point means non-integer numbers like 2.3, 0.24 etc.)

The CPUfloat-marks are calculated as a weighted mean of the other values.

2.3.1 Linpack

This is the Linpack program (floating-point) converted to C. Results here are sensitive to cache effects and memory speed. This version tests only the rolled double precision version.

2.3.2 Flops

Estimates MFLOPS rating for specific FADD, FSUB, FMUL, and FDIV instruction mixes. Four distinct MFLOPS ratings are provided based on the FDIV weightings from 25% to 0% and using register-register operations. Works with both scalar and vector machines. Since the program tries to maximize register usage, the results are NOT sensitive to main memory speed. In this sense flops yields a peak rating. The four different values are used to get a weighted mean.

2.3.3 The Fast Fourier Transform

This program performs FFTs using the Duhamel-Hollmann method for FFTs from 32 to 262,144 points in size.

2.4 DIVE tests

DIVE means Direct Interface to video extensions. It is a library in OS/2 that gives fast access to video routines used for programming games or other very demanding graphic applications. It gives the games programmer access to the Holy Grail - a pointer to the frame buffer.

The tests here are not incorporated into the benchmark since the DIVE functionality will not actually appear until OS/2 3.0. I will describe them, nonetheless.

The DIVE-marks are calculated as a weighted mean of the other values.

2.4.1 Video bus bandwidth

This test makes a copy of the frame buffer and copies it back to the screen many times in order to measure how many bytes per second you can pump to the video RAM. On my 486-66 machine with a Diamond Viper card this amounts to about 13 MB/s! That means about 42 frames per second in 640x480x256...

2.4.2 DIVE fun

This was an entry I added since I had a few ideas on fun screen hacks you can do with DIVE. One of them is smoothly turning the screen upside down and back again. The value obtained here will be highly correlated with the Video Bus Bandwidth test.
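
The 42 frames per second quoted in 2.4.1 follows from dividing the bandwidth by the size of one full-screen frame. A small sketch of that arithmetic, assuming that 1 MB means 1,000,000 bytes and that a 256-color mode uses one byte per pixel; the function name is made up for illustration and is not part of SysBench.

```c
#include <assert.h>

/* Full-screen frames per second for a given video bus bandwidth.
   Hypothetical helper: frame size = width * height * bytes per pixel. */
double frames_per_second(double bytes_per_sec, int width, int height,
                         int bytes_per_pixel)
{
    double frame_bytes;

    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    frame_bytes = (double)width * (double)height * (double)bytes_per_pixel;
    return bytes_per_sec / frame_bytes;
}
```

With 13.0e6 bytes per second and a 640x480 screen at one byte per pixel, one frame is 307,200 bytes and the result is a little over 42 frames per second.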

2.4.3 Memory to screen copy with DIVE

DIVE has built-in routines for copying a large amount of data from RAM or Video RAM to the display with the help of a hardware blitter (if one is available), or in software. There are three such tests. The first test just blits an image to the screen. The second performs pixel-doubling, effectively doubling the size of the display. The third tests arbitrary stretching of the bitmap when displaying it on screen. If you have Warp II or OS/2 3.0 you will have seen the ability to stretch a running video clip to any size you want.

These tests are not finished yet.

2.5 Disk IO tests

These tests were programmed by Kai Uwe Rommel, although I have made a lot of changes to his source code. Thanks Kai Uwe! The tests are available as a free-standing package called diskio14.zip at ftp.cdrom.com. If there are any errors or strange behaviour in these tests then blame me, not Kai Uwe.

The tests can cover all the fixed disks in your system. There is a menu choice to change which disk to test.

The DiskIO-marks are calculated as a weighted mean of the other values.

2.5.1 Average seek time

Tests the average seek time of the currently selected disk. I have seen that this is often a bit higher than what the disk manufacturers promise... This is most likely due to different ways of testing things.

2.5.2 Disk transfer speed.

Measures how fast the disk can be read NOT using the cache.

When I first came across the diskio program by Kai Uwe, my disk performed at about 1.0 MB/s. I thought that was not very good, but perhaps acceptable. Then I started to muck around with the CMOS parameters, and by changing the IO block read delay (I think that is what it was called) the speed of the disk jumped from 1.0 to 1.5 MB/s ! Not bad, I thought. But when I upgraded to Warp II the disk performance suddenly jumped to 2.2 MB/s. This is probably due to OS/2 using multiple-sector block transfer mode.
Then finally, I changed the AT bus speed from 8.3 MHz to 11 MHz and the disk transfer speed jumped again from 2.2 to 2.6 MB/s !

The lesson is that there seems to be a lot that can be done about slow IO. Just be careful when you muck around with the CMOS parameters though, since there is a very high likelihood of making mistakes that can make the machine unusable or prone to strange errors. Usually, this is not dangerous: just reset the value to the old one and your machine should perform as before. Sometimes, though, you _can_ destroy your computer by changing values incorrectly. Be warned...

2.6 Memory speed tests

Memory speed seems to be a forgotten area when talking about the speed of a computer. You hear a lot about CPU speed and disk speed and video speed and such, but rarely of memory speed. This is wrong IMHO, since a lot of the performance of a computer has to do with memory IO. When PC Magazine measured memory speed in one of their grand tests they discovered a lot of difference between the good and bad performers.

I would like to bring this fact into focus: Memory IO speed is a vital part of the performance of your computer, even more so with faster and faster processors. A really fast RISC processor can execute as many as 40 instructions in the time of one memory read...

Of course, memory speed timing is a complex issue. How fast a memory access is depends on:

  The pattern of the access     : Random, sequential, local, global ?
  Cache                         : Primary and secondary cache size and type.
  Virtual memory                : Paging algorithm, disk IO performance.
  Motherboard memory controller : This is the key component to fast mem IO
  Speed of SIMMs                : 60, 70 or 100 ns?
  etc. etc.

These tests are also limited. They cannot tell the whole truth about the speed of your memory IO.

The Mem-marks are calculated as a weighted mean of the other values.
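
How strongly buffer size and the caches matter can be seen by timing the same read loop over buffers of different sizes. The following is a minimal sketch of such a measurement, in the spirit of the memory read test described in 2.6.2; it is not the SysBench source. The function names, the use of the standard clock() timer and the choice of 1 MB = 1,000,000 bytes are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Sum all words in a buffer. The volatile qualifier forces every word
   to be read, so an optimizing compiler cannot skip the memory traffic. */
unsigned checksum_words(const volatile unsigned *buf, size_t words)
{
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < words; i++)
        sum += buf[i];
    return sum;
}

/* Read a buffer of `bytes` bytes `passes` times and return the rate in
   MB/s, or -1.0 if allocation fails or the clock is too coarse to
   measure the run. Hypothetical helper, not SysBench code. */
double read_bandwidth_mb_per_s(size_t bytes, unsigned passes,
                               unsigned *checksum_out)
{
    size_t words = bytes / sizeof(unsigned);
    unsigned *buf;
    unsigned sum = 0;
    unsigned p;
    size_t i;
    clock_t start, stop;
    double seconds;

    assert(words > 0 && passes > 0);
    buf = malloc(words * sizeof *buf);
    if (buf == NULL)
        return -1.0;
    for (i = 0; i < words; i++)         /* touch every page before timing */
        buf[i] = (unsigned)i;

    start = clock();
    for (p = 0; p < passes; p++)
        sum += checksum_words(buf, words);
    stop = clock();
    free(buf);

    if (checksum_out != NULL)
        *checksum_out = sum;
    seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return -1.0;
    return (double)words * sizeof(unsigned) * passes / seconds / 1.0e6;
}
```

Calling read_bandwidth_mb_per_s with sizes such as 5000, 160000 and 1280000 bytes, and enough passes for each run to last a second or so, should show the rate dropping as the buffer outgrows each cache level.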

2.6.1 Memory copy

This test first allocates a chunk of memory and then reads and writes it back and forth a few times to "activate" the memory: initialize the physical pages, and read them into the caches. This is done to obtain as stable a value as possible between measurements. It also has the effect of maximizing the access speed.

Then it proceeds to copy the first half of the memory to the second and then the second half to the first. This is to diminish the strange effects you get from write-through and copy-back caches. When it says 5 kB copy, that means copying 2.5 kB back and forth.

You can clearly see the effects of your caches. As long as the access is within the cache, it is a lot faster.

There is also another factor that will make the larger (80-160 kB) values jump up and down, and that is the effect of virtual memory. The second level cache performs well on a sequential memory range, but the virtual memory will chop the memory into 4 kB pages and shuffle them around in physical memory. If you are lucky, the physical pages are sequential, but they don't have to be. When they are not, the pages are scattered around and the second level cache (which is almost always a direct-mapped cache) will have a larger probability of mapping several physical pages to the same area. Higher-associativity caches (2-way, 4-way) should help here, but that is not certain.

Again, CMOS settings can very much affect the speed of your memory access. Be sure to use as low a value as possible on the various wait state entries and make sure the whole memory is cached, not just the first 16 MB if you have more.

2.6.2 Memory read

Tested by calculating the checksum over the specified number of bytes over and over again.

2.6.3 Memory write

Tested by writing a value into all longwords of the specified amount of memory.


3 Copyright notice

There is no warranty. Use this software at your own risk.
Due to the complexity and variety of today's hardware and software which may be used to run this program, I am not responsible for any damage or loss of data caused by use of this software. It was tested and is expected to work correctly, but nobody can actually guarantee this for all circumstances. And because this software is free, you get what you pay for...

This program can be used freely for non-commercial purposes.


4 Thanks

Thanks to Kai Uwe Rommel (rommel@ars.muc.de) for supplying the disk IO benchmark code and to Al Aburto (aburto@marlin.nosc.mil) for supplying the CPU integer and CPU float benchmark code.

--
Henrik Harmsen
Email: harmsen@eritel.se


Appendix A - TODO

1 Make the CPU integer and CPU float tests adaptive to the speed of the computer.
2 DIVE: Support for bank-switched cards. Better error handling. Finish the Memory->Screen bitblit tests.
3 Graphics test: The Memory to screen bitblit copy is probably not correct for 16 and 24 bit displays.


Appendix B - Building

You need Cset++ 2.1. Cd src, run nmake. It is probably quite easy to port to emx-gcc.

Why are all the source code files named pmb_* ? Well, I first wanted to call it PMBench, as a play on WinBench, but it turned out that PC Magazine already had a PMBench program... So I changed the name to SysBench, but I did not have time to change all the 'pmb' to 'sysb'...


Appendix C - Example results

Example of a result file, when benchmarking my own system, which is:

Software:
--------------
OS/2 2.11
Diamond Viper display drivers 1.02beta running 1024x768x8

Hardware:
--------------
CPU     : 486DX2-66
Chipset : UMC
Cache   : 8 kB level 1, 256 kB copy-back level 2.
Memory  : 20 MB 70ns.
Harddisk: disk 1: Seagate 340 MB.
          disk 2: Conner CFA540A 540 MB.
Video   : Diamond Viper VLB, 2MB VRAM, 2.02 BIOS.

-------

Sysbench 0.9.0 result file created Sat Oct 22 14:31:27 1994

Graphics
  BitBlt S->S cpy    :   52.640  Mpixels/s
  BitBlt M->S cpy    :   15.581  Mpixels/s
  Filled Rectangle   :  356.366  Mpixels/s
  Pattern Fill       :   90.477  Mpixels/s
  Vertical Lines     :    6.233  Mpixels/s
  Horizontal Lines   :    9.656  Mpixels/s
  Diagonal Lines     :    7.545  Mpixels/s
  Text Render        :   18.553  Mpixels/s
  ------------------------------------------------------------
  Total              :   73.835  PM-marks

CPU integer
  Dhrystone          :   39.800  VAX 11/780 MIPS
  Hanoi              :   27.083  moves/25 usec
  Heapsort           :   19.290  MIPS
  Sieve              :   37.741  MIPS
  ------------------------------------------------------------
  Total              :   32.938  CPUint-marks

CPU float
  Linpack            :    2.535  MFLOPS
  Flops              :    3.572  MFLOPS
  Fast Fourier Tr.   :    4.291  VAX FFT's
  ------------------------------------------------------------
  Total              :    3.472  CPUfloat-marks

Direct Interface to video extensions - DIVE
  Video bus bandw.   :   --.---  MB/s  (on Warp II, this was ca. 13 MB/s)
  DIVE fun           :   --.---  fps
  M->S, DD, 1.00:1   :   --.---  fps
  M->S, DD, 2.00:1   :   --.---  fps
  M->S, DD, 2.43:1   :   --.---  fps
  ------------------------------------------------------------
  Total              :   --.---  DIVE-marks

Disk I/O - disk 2: 528 MB
  Average seek time  :   16.852  ms
  Transfer speed     :    1.990  MB/s
  ------------------------------------------------------------
  Total              :    1.465  DiskIO-marks

Memory
     5 kB copy       :   61.561  MB/s
    10 kB copy       :   49.211  MB/s
    20 kB copy       :   33.167  MB/s
    40 kB copy       :   25.707  MB/s
    80 kB copy       :   25.571  MB/s
   160 kB copy       :   17.578  MB/s
   320 kB copy       :   15.526  MB/s
   640 kB copy       :   13.385  MB/s
  1280 kB copy       :   11.941  MB/s
     5 kB read       :   70.885  MB/s
    10 kB read       :   42.156  MB/s
    20 kB read       :   42.970  MB/s
    40 kB read       :   32.170  MB/s
    80 kB read       :   31.747  MB/s
   160 kB read       :   21.777  MB/s
   320 kB read       :   19.533  MB/s
   640 kB read       :   17.150  MB/s
  1280 kB read       :   15.710  MB/s
     5 kB write      :   50.263  MB/s
    10 kB write      :   47.512  MB/s
    20 kB write      :   49.802  MB/s
    40 kB write      :   50.763  MB/s
    80 kB write      :   48.561  MB/s
   160 kB write      :   47.028  MB/s
   320 kB write      :   44.140  MB/s
   640 kB write      :   44.034  MB/s
  1280 kB write      :   42.258  MB/s
  ------------------------------------------------------------
  Total              :   28.007  Mem-marks