- Newsgroups: comp.sys.mac.programmer
- Path: sparky!uunet!sun-barr!decwrl!adobe!jholt@adobe.com
- From: jholt@adobe.com (joe holt)
- Subject: Re: Game Techniques (was: NON-QUICKDRAW GAMES)
- Message-ID: <1992Sep8.191342.15509@adobe.com>
- Sender: usenet@adobe.com (USENET NEWS)
- Organization: Adobe Systems Inc.
- References: <1992Sep8.004821.11323@adobe.com> <Bu8oyn.LD6@acsu.buffalo.edu>
- Date: Tue, 8 Sep 1992 19:13:42 GMT
- Lines: 157
-
- Nick asks some good questions. The biz about each 640-byte row taking up 1K of memory is
- a real shocker, isn't it? *All* of that left over memory! But that's the way it's done--it's
- easier in hardware to find a row when the stride is a power of two like 1K. You'll notice
- other such wastes as you go to other screen depths/configurations. Most cards do this.
- RAM is cheap, right?
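
To make the row arithmetic concrete, here is a minimal sketch in portable C. It assumes the configuration described above (640 visible bytes per row, padded to a 1K stride) plus 480 rows and one byte per pixel; the constant and function names are illustrative and not from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Figures from the post: 640 visible bytes per row, padded to a 1K stride.
   One byte per pixel and 480 rows are assumed; all names are hypothetical. */
enum { VISIBLE_BYTES = 640, ROW_BYTES = 1024, ROWS = 480 };

/* Byte offset of pixel (x, y) from the start of the frame buffer. */
static size_t pixel_offset(size_t x, size_t y)
{
    assert(x < (size_t)VISIBLE_BYTES && y < (size_t)ROWS);
    /* With a 1K stride, y * ROW_BYTES is just a shift by 10 bits. */
    return (y << 10) + x;
}

/* Bytes in each row that are never displayed. */
static size_t hidden_bytes_per_row(void)
{
    return (size_t)ROW_BYTES - (size_t)VISIBLE_BYTES;
}
```

The shift is the point: a power-of-two stride lets the row address be formed without a multiply, at the price of 384 unused bytes per row.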
-
- There are three things that slow down your small screen blasting loop:
-
-     while ( !Button() )
-         *myScrnPtr++ = 0xAA;
-
- #1 is that the Button() call here is a real cycle sink--altho' it's quite snappy
- compared to, say, CopyDeepMask() with a maskRgn. If you were to profile this code
- you'd see about 95% going to the Button() call. Here's how it compiles for me:
-
- mouse_down
- +003E 00EFECF4 *BRA.S mouse_down+0044 ; 00EFECFA | 6004
- +0040 00EFECF6 MOVE.B #$AA,(A3)+ | 16FC 00AA
- +0044 00EFECFA CLR.B -(A7) | 4227
- +0046 00EFECFC _Button ; A974 | A974
- +0048 00EFECFE TST.B (A7)+ | 4A1F
- +004A 00EFED00 BEQ.S mouse_down+0040 ; 00EFECF6 | 67F4
-
- The branch at the start is only done once to get into the while() loop, so it
- doesn't count. The button testing code, including the trap dispatch overhead and
- the actual Button() code in ROM, comes to about 300 cycles. In contrast, the *one*
- instruction which actually writes to the screen and increments myScrnPtr:
-
- +0040 00EFECF6 MOVE.B #$AA,(A3)+ | 16FC 00AA
-
- is all of 12 cycles, or just 4% of the loop. Wow. So get rid of the Button()! You
- could test the low-mem global MBState yourself, or make it a for loop. Do this and
- you'll see a blinding increase:
-
-     while ( *((char *)0x172) < 0 )    // MBState, if you didn't know...
-         *myScrnPtr++ = 0xAA;
-
- Get ready to hit that mouse button!
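
The same polling idea can be tried off a Mac with a stand-in for the low-memory byte. This is a sketch under stated assumptions: the flag is passed in as a pointer instead of being read from 0x172, a set high bit means "button up" (matching the `< 0` test above), and a `limit` parameter is added so the loop cannot run off the end of the buffer. `button_is_up` and `blast_until_click` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero while the button is up: bit 7 set makes the signed byte negative.
   On a real machine the pointer would be the MBState address; here it is
   a parameter so the code can run anywhere. */
static int button_is_up(const volatile signed char *mb_state)
{
    assert(mb_state != NULL);
    return *mb_state < 0;
}

/* Write 0xAA bytes until the button goes down or `limit` bytes are written.
   Returns the number of bytes written. */
static size_t blast_until_click(unsigned char *dst, size_t limit,
                                const volatile signed char *mb_state)
{
    size_t n = 0;

    while (n < limit && button_is_up(mb_state))
        dst[n++] = 0xAA;
    return n;
}
```

The `volatile` qualifier is there because the byte is changed behind the program's back (by an interrupt handler on the real machine), so each pass through the loop has to re-read it.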
-
-
- #2 is that you're only stuffing bytes at a time. The 680x0 is non-intuitive in that
- it does *not* take twice as long to write a word into memory, or four times as long
- to write a long. Words are just as fast as bytes. Let me repeat: whenever you store
- a byte, you can store a word just as fast. Gee, that makes sense, doesn't it?
- Whatever the reason for this quantum leap, we game programmers must take advantage
- of it. But wait. Long words are faster than two words, so that's what you've got
- to use. Change the loop above to:
-
-     long *longPtr = (long *) myScrnPtr;
-
-     while ( *((char *)0x172) < 0 )
-         *longPtr++ = 0xAAAAAAAA;
-
- Now you're talking fast. We're also getting into that gray area where it really would
- be better just to code the dang thing in assembly. For example, the compiler is likely
- to code the 0xAAAAAAAA as an immediate operand, but if it's put into a data register
- the speed almost doubles. This might do it:
-
-     long *longPtr = (long *) myScrnPtr;
-     long color = 0xAAAAAAAA;
-
-     while ( *((char *)0x172) < 0 )
-         *longPtr++ = color;
-
- Which in assembly looks like this:
-
- mouse_down
- +0042 00ECA5F8 *MOVEA.L A2,A4 | 284A
- +0044 00ECA5FA MOVE.L #$AAAAAAAA,D7 | 2E3C AAAA AAAA
- +004A 00ECA600 BRA.S mouse_down+004E ; 00ECA604 | 6002
- +004C 00ECA602 MOVE.L D7,(A4)+ | 28C7
- +004E 00ECA604 TST.B MBState | 4A38 0172
- +0052 00ECA608 BLT.S mouse_down+004C ; 00ECA602 | 6DF8
-
- The first three instructions are only done once, so they don't count. The loop itself--
- the last three instructions--is just 34 cycles for four bytes, against about 312 cycles
- for one byte in the original version. That's about 37x faster per byte!
-
-
- #3 is nit-picking, but it will really speed things up, too. It's pretty silly to test
- for a mouse down between *every* long. Do it between every ten or so:
-
-     while ( *((char *)0x172) < 0 ) {
-         *longPtr++ = color;
-         *longPtr++ = color;
-         *longPtr++ = color;
-         *longPtr++ = color;
-         *longPtr++ = color;
-         *longPtr++ = color;
-         *longPtr++ = color;
-         *longPtr++ = color;
-         *longPtr++ = color;
-         *longPtr++ = color;
-     }
-
- The previous version required 34 cycles per long (12 for the write, 22 for the button
- test), or 340 cycles for ten longs. Here, however, the time for ten longs has been cut
- down to 142 cycles (120 for the writes, 22 for the button test). Let's go to the big
- board and see what our totals are so far: nearly 90x (90x!) faster than the original Button()/
- single-byte loop. Jeez, you'd think we could just keep going and eventually fill the
- entire screen in just a few cycles, right?
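
The running totals are easier to check as cycles per byte written. The sketch below only redoes the arithmetic with the cycle counts quoted in this message (about 300 for the Button() test, 22 for the MBState test, 12 per write); the helper name is hypothetical and nothing here measures real hardware.

```c
#include <assert.h>

/* Cycle counts quoted in the post (approximate figures, not measurements). */
enum {
    BUTTON_TEST_CYCLES  = 300, /* Button() trap dispatch plus ROM code */
    MBSTATE_TEST_CYCLES = 22,  /* TST.B MBState plus the branch */
    WRITE_CYCLES        = 12   /* one MOVE.B or MOVE.L to (An)+ */
};

/* Cycles spent per byte written, for a loop that does `writes_per_test`
   stores of `bytes_per_write` bytes between two button tests. */
static double cycles_per_byte(int test_cycles, int write_cycles,
                              int writes_per_test, int bytes_per_write)
{
    assert(writes_per_test > 0 && bytes_per_write > 0);
    return (double)(test_cycles + write_cycles * writes_per_test)
         / (double)(writes_per_test * bytes_per_write);
}
```

With these inputs the byte loop costs 312 cycles per byte, the long loop 8.5, and the unrolled loop 3.55, which gives ratios of roughly 36.7 and 87.9, close to the round figures quoted above.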
-
- Well, no. We've sorta reached the end of the speed-up line. These three tricks are
- at the heart of every fast memory operation:
-
- 1. find the fastest instruction that moves the most bytes,
- 2. keep everything in registers, and
- 3. unroll the loops.
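
The three rules above can be sketched as one portable C helper. This is an illustration and not the code from this message: `fill_longs_unrolled` is a hypothetical name, `uint32_t` stands in for a 68K `long`, and a cleanup loop is added for counts that are not a multiple of ten. Whether the locals really end up in registers is left to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `count` 32-bit words at `dst` with `color`, ten stores per loop test. */
static void fill_longs_unrolled(uint32_t *dst, size_t count, uint32_t color)
{
    size_t blocks = count / 10;   /* full groups of ten stores */
    size_t rest   = count % 10;   /* leftover words after the last group */

    assert(dst != NULL || count == 0);

    while (blocks--) {
        dst[0] = color; dst[1] = color; dst[2] = color; dst[3] = color;
        dst[4] = color; dst[5] = color; dst[6] = color; dst[7] = color;
        dst[8] = color; dst[9] = color;
        dst += 10;
    }
    while (rest--)                /* cleanup loop, one word at a time */
        *dst++ = color;
}
```

A call such as `fill_longs_unrolled(buf, 160, 0xAAAAAAAA)` would cover one 640-byte row in sixteen passes of the unrolled loop, with nothing left for the cleanup loop.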
-
- Oops. There is a #4.
-
-
- #4. Only write to what you need to write to. This goes back to the observation at the
- beginning of the message: there're 384 bytes per scan line (in Nick's configuration)
- that are unseen (does this remind anyone of the old Apple II text screen holes?). So
- why write to them?! Change the loops to for loops: one for 640/4 = 160 longs per scan
- line, and one for 480 scan lines. See how fast you can fill the screen. Change the
- color on each fill and let it rip. You'll be amazed!
-
- Here's my final version of the thing:
-
-     register int x, y;
-     register long color = 0;
-     char mmuMode;
-
-     // baseAddr and rowBytes (the screen's base address and bytes per row) are set up elsewhere
-
-     mmuMode = true32b;                  // ask for 32-bit addressing
-     SwapMMUMode( &mmuMode );            // mmuMode now holds the previous mode
-
-     while ( *((char *)0x172) < 0 ) {    // until the mouse button goes down
-         register long *rowBase = (long *) baseAddr;
-
-         for ( y = 480; y; --y ) {
-             register long *longPtr = rowBase;
-
-             for ( x = 640/4/10; x; --x ) {    // 16 passes of ten longs = 640 bytes
-                 *longPtr++ = color;
-                 *longPtr++ = color;
-                 *longPtr++ = color;
-                 *longPtr++ = color;
-                 *longPtr++ = color;
-                 *longPtr++ = color;
-                 *longPtr++ = color;
-                 *longPtr++ = color;
-                 *longPtr++ = color;
-                 *longPtr++ = color;
-             }
-             rowBase = (long *) ((char *)rowBase + rowBytes);    // skip the unseen bytes
-         }
-         color += 0x01010101;            // next color
-     }
-
-     SwapMMUMode( &mmuMode );            // restore the previous mode
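
For readers without the hardware, the row-by-row fill can be tried on an ordinary memory buffer. The sketch below is an assumption-laden stand-in, not the routine above: `fill_visible` and its parameters are hypothetical, `uint32_t` replaces the 68K `long`, the buffer is assumed to be 4-byte aligned, and there is no MMU switching or button polling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill only the visible part of each row and leave the padding alone.
   `base` stands in for the frame buffer address, `row_bytes` for its stride. */
static void fill_visible(uint8_t *base, size_t row_bytes,
                         size_t visible_bytes, size_t rows, uint32_t color)
{
    size_t x, y;

    /* Same restrictions as the unrolled loop: whole groups of ten longs. */
    assert(visible_bytes % (4 * 10) == 0 && visible_bytes <= row_bytes);
    assert(row_bytes % 4 == 0);

    for (y = 0; y < rows; ++y) {
        /* Start of row y; assumes `base` is 4-byte aligned. */
        uint32_t *p = (uint32_t *)(void *)(base + y * row_bytes);

        for (x = visible_bytes / 4 / 10; x; --x) {
            p[0] = color; p[1] = color; p[2] = color; p[3] = color;
            p[4] = color; p[5] = color; p[6] = color; p[7] = color;
            p[8] = color; p[9] = color;
            p += 10;
        }
    }
}
```

Calling it with `row_bytes` of 1024, `visible_bytes` of 640 and 480 rows mirrors the configuration discussed here: 160 longs are written per row and the 384 padding bytes are never touched.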
-
-
- Have fun!
-
- /joe
-