PC Player 1997 July

Text File | 1994-04-21 | 10.2 KB | 265 lines

--------------------------------- VBENCH -----------------------------------

Requirements:
------------
  * 80286 or higher processor
  * DOS 2.0 or higher
  * VGA-compatible graphics card

To compile you need:
  * C++ compiler -> BORLAND C++ 3.1
  * Assembler    -> TASM 3.1+
  * Linker
  * Mark Betz's HTimer class
      Where to get: CompuServe -> the Gamer's forum, Game Design library

Files Included:
--------------
In addition to this file, the following files should also be in the ZIP
file:

  * VBENCH.EXE - The executable video benchmark program.
  * VBENCH.CPP - The main C++ source module; it calls the benchmarks and
                 calculates their times.
  * BENCH.ASM  - The benchmark assembly language source module; it contains
                 all benchmark code.
  * VIDEO.ASM  - The video mode setup and buffer management code.
  * VBENCH.MAK - The make file used to compile the program.
  * VBENCH.PRJ - The Borland C++ 3.1 project file used to compile the
                 program.

Description:
-----------
The VBENCH program was developed primarily to compare different blit
(block-transfer) techniques in both mode 13h and tweaked mode (planar mode
13h with 4 pages). Hopefully, the benchmarks will help graphics programmers
choose the technique that best suits their application, based in part on
the timing results. By no means should these results be the _sole_ reason
for choosing a technique, because there are many special cases that the
benchmarks don't account for.

Usage:
-----
To use the benchmark program, just type its name at the command line:

    VBENCH

Press a key at the prompt and the tests will get underway. Depending on
your system and on the number of tests being run, this may take some time.
Once they are finished, the program exits and displays the benchmark
results on the screen.
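
The Description above contrasts mode 13h with tweaked mode. As a rough
illustration, here is a minimal standard C++ sketch of where pixel (x, y)
lives in each mode, assuming the usual 320x200, 256-colour geometry (so a
tweaked-mode page occupies 320*200/4 = 16,000 bytes of each plane, and four
pages fit in a plane's 64 KB). It also computes the index of the same pixel
in a planar ram_buffer as described under Benchmark Info, assuming each
16,000-byte block is ordered like the matching video plane. The function and
struct names are hypothetical, not VBENCH's own; map_mask is the value
(1 << plane) that would be written to the VGA Sequencer's Map Mask register
to select the plane.

```cpp
#include <cassert>

// Screen geometry shared by mode 13h and tweaked mode (320x200, 1 byte/pixel).
const unsigned kWidth = 320;
const unsigned kHeight = 200;
const unsigned kPlaneBytes = kWidth * kHeight / 4;  // 16,000 bytes per plane per page

// Mode 13h: one linear 64,000-byte bitmap starting at A000:0000.
unsigned mode13_offset(unsigned x, unsigned y) {
    assert(x < kWidth && y < kHeight);
    return y * kWidth + x;
}

// Tweaked (unchained) mode: four planes; pixel x lives in plane x % 4.
// Hypothetical helper type for this sketch.
struct TweakedAddr {
    unsigned plane;          // 0..3
    unsigned offset;         // byte offset within the plane, including the page start
    unsigned char map_mask;  // Map Mask value that enables only this plane
};

TweakedAddr tweaked_address(unsigned x, unsigned y, unsigned page) {
    assert(x < kWidth && y < kHeight && page < 4);
    TweakedAddr a;
    a.plane = x % 4;
    a.offset = page * kPlaneBytes + y * (kWidth / 4) + x / 4;
    a.map_mask = static_cast<unsigned char>(1u << a.plane);
    return a;
}

// Planar ram_buffer (assumed layout): each 16,000-byte block holds one plane.
unsigned planar_buffer_index(unsigned x, unsigned y) {
    TweakedAddr a = tweaked_address(x, y, 0);
    return a.plane * kPlaneBytes + a.offset;
}
```

For example, pixel (5, 1) is offset 325 in mode 13h, while in tweaked mode
it is byte 81 of plane 1 (Map Mask value 02h), or index 16,081 of the
planar ram_buffer.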

Benchmark Info:
---------------
 - For the Ram-to-Video and Video-to-Ram benchmarks, a 64,016-byte buffer
   named ram_buffer was used as the RAM buffer. The extra 16 bytes at the
   end are for special non-aligned accesses.

   For the Video-to-Video benchmarks, I copied from higher addresses to
   lower addresses, using an incrementing index (e.g. the 1st copy moves
   byte 4 to byte 2, the 2nd copy moves byte 5 to byte 3, etc.). I did it
   this way because it seemed to be the only way to get the average speed
   of moves. It may sound weird, but in my tests, at least on my system, I
   got much faster speeds when I copied _ahead_ by about 2 bytes than when
   I copied ahead by 32 bytes or copied 'backwards'. I'd be interested in
   hearing whether others see the same thing. All you need to do is change
   the video transfer functions in BENCH.ASM and compare those results to
   the 'backwards' moves.

 - So far, all benchmarks are 64,000-byte moves, repeated 10 times. The
   functions are not _called_ 10 times; each call _performs_ the move 10
   times. Perhaps I will change this in the future, but it makes no
   difference right now, since I am timing video speed, not function-call
   overhead. No parameters were passed to the benchmark functions, and the
   functions accessed no variables. Each function was aligned on a
   paragraph (16-byte) boundary. The program was compiled in the COMPACT
   model, so all CALLs were NEAR calls.

 - One thing to note: all current functions keep their moves optimally
   aligned for the transfer size (word or dword). Aligned moves are much
   faster than unaligned moves, and you should avoid unaligned moves as
   much as you can. On my system, I have found that unaligned moves can be
   as slow as, or even slower than, BYTE moves.

 - For the mode 13h benchmarks, ram_buffer was treated as a 64,000-pixel
   buffer set up just like the screen (a linear bitmap).

 - For the Tweaked mode benchmarks, ram_buffer was treated as though it
   were set up in a planar fashion, meaning each 16,000-byte block of the
   buffer represented a different plane. This isn't a cheat; it is simply
   the ideal setup for Tweaked mode video transfers.

 - I didn't measure interleaved writes, because I normally code my loops
   with REP MOVS instructions. An interleaved write requires a LOOP, which
   will always be slower than a REP-prefixed string instruction, since REP
   repeats the move in microcode without re-fetching instructions or
   taking a jump on every iteration. For a fair comparison of interleaved
   and non-interleaved writes, you would want _BOTH_ to use LOOP
   instructions, avoiding REP MOVS where possible.

Specific Benchmark Info:

  * Shared Benchmarks (done in both mode 13h and Tweaked mode)
      - Byte Blit   Write to the screen, using BYTE moves.
      - Word Blit   Write to the screen, using WORD moves.
      - Word Read   Read from the screen, using WORD moves.

  * Mode 13h-specific Benchmarks
      - Word Video Transfer   Video transfer (using video memory as both
                              the source and the destination), using WORD
                              moves.

  * Tweaked mode-specific Benchmarks
      - Hardware Video Transfer   Video transfer (using video memory as
                                  both the source and the destination)
                                  with the video card in Write Mode 1,
                                  using BYTE moves. Write Mode 1 gives a
                                  hardware-assisted move that copies 32
                                  bits with one MOVSB instruction; 32 bits
                                  equals 4 pixels in tweaked mode.

Adding Benchmarks:
-----------------
Adding a benchmark is fairly straightforward. The main module, VBENCH.CPP,
includes a file called VBENCH.H, which defines the benchmarks the main
program runs. VBENCH.H contains two class definitions, SharedBenchData and
BenchData. Each holds a description of the benchmark as a string, and a
pointer (or pointers) to the benchmark function(s).
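
VBENCH.H itself is not reproduced here, so the following is only a guess
at its shape: a minimal standard C++ sketch, assuming plain function
pointers and a string description. The field names are borrowed from the
entry templates in step 2 below; BenchFunc, kMaxDescription,
descriptions_fit() and run_bench() are hypothetical helpers, and the
benchmark routines are empty stand-ins for the assembly code in BENCH.ASM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Benchmarks take no parameters and return nothing.
typedef void (*BenchFunc)();

// Assumed: the 30-character limit excludes the terminating NUL.
const std::size_t kMaxDescription = 30;

struct BenchData {              // mode-specific benchmark (hypothetical layout)
    BenchFunc function_address;
    const char* bench_description;
};

struct SharedBenchData {        // benchmark run in both modes (hypothetical layout)
    BenchFunc m13_function_address;
    BenchFunc tw_function_address;
    const char* bench_description;
};

// Empty stand-ins; in the real program these are assembly routines.
void New13Blit() {}
void NewBlit13() {}
void NewBlitTw() {}

// A Tweaked mode list would look the same as this Mode 13h list.
BenchData mode13_benchmarks[] = {
    { New13Blit, "A new blit function" },
};

SharedBenchData shared_benchmarks[] = {
    { NewBlit13, NewBlitTw, "A new blit function" },
};

// Returns true if every description in the list fits the length limit.
template <typename T, std::size_t N>
bool descriptions_fit(const T (&list)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        if (std::strlen(list[i].bench_description) > kMaxDescription) {
            return false;
        }
    }
    return true;
}

// Calls one mode-specific benchmark through its pointer (timing omitted).
void run_bench(const BenchData& b) {
    assert(b.function_address != 0);
    b.function_address();
}
```

In this layout each entry is just an initializer, so adding a benchmark
means appending one brace-enclosed line to the right list; the timing
results live outside these classes.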

The only thing these classes lack is each test's timing results, which are
stored elsewhere in the program; this keeps adding benchmarks simple. To
add a benchmark test to the list, all you need to do is:

  1) Define the function prototype. There are two 'slots' for function
     prototypes: one for mode 13h prototypes, and one for Tweaked mode
     prototypes. You don't have to put the prototypes there, but doing so
     helps readability and organization.

  2) Depending on the type of benchmark you are performing, either:

     A) For mode-specific benchmarks, find the correct list (Tweaked mode
        or Mode 13h) and add another BenchData object to it. A BenchData
        object is defined like this:

            function_address,bench_description

        function_address is just the benchmark function name, without the
        parentheses (). bench_description is a string describing the
        benchmark being performed. Example:

            New13Blit,"A new blit function"

     B) For shared benchmarks (benchmarks that can be performed in both
        mode 13h and Tweaked mode), find the shared_benchmark list and add
        another SharedBenchData object to it. A SharedBenchData object is
        defined like this:

            m13_function_address,tw_function_address,bench_description

        m13_function_address is just the mode 13h benchmark function name,
        without the parentheses (), and tw_function_address is the Tweaked
        mode benchmark function name. bench_description is a description
        of the benchmark being performed. Example:

            NewBlit13,NewBlitTw,"A new blit function"

     Note: bench_description strings are limited to 30 characters!

  3) Add the file containing the benchmark functions to the project, and
     then compile away!

Guidelines for creating benchmark functions:

  1) Right now I only do a loop of ten 64,000-byte moves.
     These can be done with byte, word, or dword transfers; just be sure
     to say which kind is used in the benchmark description.

  2) I align all benchmark functions on a paragraph boundary, to make sure
     the timed function runs at optimal speed (though it might not make
     _too_much_ difference in time).

  3) All system RAM accesses are made to ram_buffer, which is located in
     the Uninitialized Far Data segment and is also paragraph aligned.

  4) No parameters are passed, and no variables are accessed from within
     the functions.

  5) Functions that need OUTs (or similar port I/O) perform them inside
     the loop. Even if an OUT is needed only once, it should still be
     included in the loop, to make sure no cheats are performed. The loop
     I mean is the outer loop, not the inner transfer loops. You can use
     the current benchmark code as a reference if you like.

Notes:
-----
  * This program can be used and distributed without any worry. I do ask,
    though, that it not be sold for profit. The benchmark can be modified,
    with these restrictions:

      1) Adding more benchmarks to the program is allowed, as long as the
         current benchmarks remain in the program.
      2) Modifying the current benchmarks is allowed ONLY if you contact
         the author of that benchmark. Right now there's only one
         programmer (me), but I hope that others will also contribute to
         this benchmark program.
      3) Modifying the _way_ in which the benchmarks are timed is allowed
         ONLY if you contact me (Dan Corritore) first.

  * Actually, I need to find some way of making sure version numbers and
    additions to the benchmarks are handled correctly, so for now, please
    don't upload any additions or changes without speaking with me (Dan
    Corritore) first.

  * I will eventually rewrite this documentation once other benchmarks are
    added, or if the benchmark program changes in any way. Any help or
    suggestions on documentation layout would be greatly appreciated, as
    I'm not the best documenter.

  * Please let me know if you see a problem with the program code or any
    mistakes, or if you think I'm going about the benchmarks totally wrong!

Email address(es):
-----------------
Dan Corritore, author of VBENCH 1.0:
  CompuServe address: 70243,1110