home *** CD-ROM | disk | FTP | other *** search
-
- 3DGame Engine, by: Jason Freund and Gabe Dalbec.
-
-
-
-
- INTRO
-
- This paper describes the algorithm used to create the demo, the current
- problems with it, and possiblities for future expansion. It's intended for
- people who might want to join a team to develop this into a PD game (see
- rot.invite file) or people curious about the technical aspects of the
- program.
-
-
-
- CURRENT WORK
-
- I haven't put much time into this program lately because my partner is
- in Mexico for a couple of months and because of school. But there are two
- issues I'm trying to fix now.
-
- First, the perspective problem, described below. The first thing you
- notice about the demo is that walls get stretched.
-
- The second problem is with using the 5th bitplane. Right now, the
- v-flipping the top half of the screen to create the bottom is done with the
- copper. First, I need to rewrite the fast-to-chip routine to draw the
- bottom half of the screen. This will slow down the program a little, but
- it's necessary if I want objects on the floor (and I do). I also need to
- modify this fast-to-chip routine to draw into the fifth bitplane instead of
- allowing the blitter to do it (as described below). The fifth bitplane is
- also a nuisance with my custom brush routines. My partner wrote his own
- brush compression and optimized assembly custom blitter routines so that
- graphics are compressed in memory. My fake fifth bitplane technique isn't
- compatible with this. So before I can display objects into the lower 16
- color palette, I need to modify the blitting routines.
-
-
-
-
- OVERVIEW OF THE ALGORITHM
-
- This program doesn't use any texture mapping or ray tracing. Everything
- is faked. There are many limitations on the engine which allow us to fake
- things. First, walls are all equal-length and equal height. They can only be
- spaced multiples of 1 wall length apart. You cannot see through walls.
- Walls must be perpendicular to each other. The whole dungeon is vertically
- symmetrical. These assumptions allow us to generate each frame in "screen"
- space -- by looking at the start and endpoints of the tops of each wall face.
-
- To reduce the number of rotations per frame, the dungeon data is in
- third normal form. That means we have a list of (X,Y,Z) points representing
- endpoints of wall faces and a list of edges that reference this list of
- points. We also create a list of midpoints from the list of edges. The Z
- values of each point (or height of the walls) are a +constant (ie 0.6).
- This way each edge represents the top line of a wall. All you need is this
- top edge to compute the "zbuffer".
-
- Each frame is defined by a 1 line "z-buffer": typedef struct {
- short height;
- short bitmap;
- short offset;
- short junk; /* to make structure 2^3 long */ } mapline; mapline
- zbuffer[SCREENWIDTH];
-
- Now, for each 1 pixel column on the screen, you have:
-
- 1) height == how far from the top of the screen to start drawing the column.
- You draw from height, down to the center of the screen. 2) bitmap == which
- source bitmap number to draw from at that column. 3) offset == How far in
- the source bitmap to go to grab the column to draw.
-
- For each frame you generate a new "zbuffer" by examining every point
- between the start and endpoint of every edge (that you can at least
- partially see) in the dungeon. First, set height=MIDDLEOFSCREEN for every
- column in the zbuffer. I use integer Breshenham's to generate every point
- between the endpoints of the line segment. Then for each point on the line,
- check to see if the Y coord is higher than the corresponding
- zbuffer[i].height. If it is, then that point represents a column that will
- be in front of everything else and you should update every field in
- zbuffer[i]. When you've done this for every edge, your zbuffer will contain
- everything you need to describe the scene.
-
- Once you generate the zbuffer, just loop through the array, yank _column
- from _bitmap and draw it from _height. Since you know how high the wall is,
- you can calculate how much to shrink or expand that source bitmap column and
- skip or redraw rows of the column right in the loop that copies it to the
- screen.
-
-
- COPY-OUT
-
- The "copy-out" routine is the biggest bottleneck in the program. The
- "copy-out" routine copies a column from the source bitmap to the screen,
- shrinking or expanding it as it copies. I will spend some time writing
- about this aspect of the program since it is the most important part.
- To begin with, it is the most optimized assembly we've ever written. It
- uses lookup tables for every calculation where using a lookup table will
- save a cycle. This is the routine where we made huge changes to our code to
- implement schemes that would remove 1 cycle from the innermost loop. We
- tried about a half dozen schemes before we came up with our final version.
-
- First, our bitmaps are preconverted to 1 big chunky plane rather than 4
- small bitplanes; the first ulong represents the first 8 pixels.
- Consequently, when you want to display a pixel, you do shifts, ands, and ors
- a longword at a time and get the bits for all 4 bitplanes at once. These
- calculations are all done in FAST memory and rendered onto a FAST memory
- screen-sized area. Then, when you're all done rendering the screen, you copy
- it all to planar-mode CHIP memory so you can see it.
-
- It's convenient that the 3000's memory and bus are all four bytes wide
- because that gives us all the colors we could ask for: 16. But we're
- getting 32 colors for almost free. The only cost is blit-clearing a fifth
- bitplane (half the size of the screen), 20% bus contention with denise chip
- which is in competition for bus time to display the picture, and expanding
- our copy-out loop by one to write a 1 or 0 into the fifth plane according to
- which palette is being used in that column. It turns out that all these
- costs don't add up to more than about 5% slowdown. So we took it. The
- "source bitmap number" field of our mapline is also an index into an array
- that tells us which palette (0 or 1) the bitmap uses. Our "copy-out" routine
- sets the appropriate bit in the fifth bitplane of the destination screen
- based on this lookup without taking any time away from the longword logical
- operations on the source bitmap. We used to employ the blitter to fill the
- fifth bitplane in one fell swoop instead of filling it byte-by-byte withe
- the CPU. This was done by drawing vertical lines everywhere the scene
- changed palette. Then we'd fill the screen, telling the blitter to change
- fill modes every time it hit a line. Not only was this slower, but we got a
- slight flickering when we drew the fifth bitplane with blitlines and a
- blitfill due to the fact that we were drawing to the screen at 2 different
- times. But when we started to fill the fifth bitplane inside the copy-out
- loop, the flickering was fixed.
-
- One of the copy-out methods we experimented with was a pipelined engine.
- I am describing it because it would be useful for people who want to modify
- our engine to run on the 500 which doesn't have a fast CPU. You can use the
- CPU to generate the zbuffer while the blitter clears the screen. Then the
- CPU would perform the copy-out. Then the Blitter would do the Vflip while
- the CPU rotated the points for the next frame, and so on. We rejected this
- and other schemes because a 68030 CPU is faster than the blitter for
- everything; any parallelism would not speed up the pipeline. In fact, after
- everything was optimized for the CPU, we didn't need the blitter at all.
-
-
-
- PROBLEMS:
-
- One problem we found is that at the corners of the walls, a few pixels
- from a wall behind would overlap the wall in front. This ocurred because
- we're not using any painter's algorithm to traverse the list of edges. At
- the corners where both walls at a corner are at the same height, there is no
- way to judge which is actually in front. Another problem we had is that
- corners didn't exactly line up because we were using breshenhams to
- calculate points along a line segment.
-
- Both of these problems were fixed by multiplying every Y value by 32
- before calculating points between the endpoints of the edge and then
- dividing by 32 after the zbuffer is generated. Be sure to check for -Y
- values before lsl/r #5's to preserve the sign of Y.
-
- Another problem that we still have is with the perspective of the walls.
- Flat walls or walls far away are fine. But walls that are close to you get
- smeared. This is because we don't calculate any real perspective. We just
- use a linear mapping based on the ratio of the width of the rotated wall to
- the width of the flat, original source bitmap to calculate offsets to the
- source bitmap. Walls that are at a steep angle and close to you end up
- drawing from a very small portion from the source bitmap.
-
- One solution that we tried was to generate real perspective based on a
- recursive algorithm. We already have a list of midpoints for each wall face
- that we use to detect wall collisions. If we rotate these midpoints, we can
- find the ratio between the width on the screen of the close half of the wall
- to the width of the far half of the wall. This midpoint will draw from the
- center of the source bitmap. Then we just recursively subdide each half of
- the wall, keeping track of the new corresponding center on the source bitmap
- as we go, building an array of source bitmap offsets. This is slow, but
- could be precomputed and simulated with a lookup table that just takes in a
- ratio to lookup offsets for the largest possible wall. Smaller walls could
- then use a linear mapping from this large wall. Unfortunately, we still
- haven't been able to get the recursive routine to terminate correctly.
-
-
-
-
- SPEEDUPS:
-
- Our first speedup was to only draw the top half of the screen and use
- the copper to reflect the top half onto the bottom half. You can also get a
- cute floor and ceiling for free out of this by changing the background
- colors at certain rows. This speedup makes the program a good 35% faster
- than drawing all the way down. If this is ever turned into a game with
- objects and monsters, this will obviously have to be re-done with the CPU
- (or blitter for the 500) because vertically symmetrical monsters would be a
- little annoying. But even without using the copper, this speedup should be
- well worth cost of losing non-symmetrical walls.
-
- The list of (X,Y) coordinates representing the endpoints of wall faces
- is sorted by X coordinate. Wall collisions are handled by checking every
- point in the maze. If any point is inside the box you are about to move
- into, an X and/or Y component of your movement is set to 0 so that you slide
- along the edge of a wall. To make the check against every point fast for
- large dungeons, we sort the list of points by X coordinate and only check
- the sublist until the X coordinate is out of the range of your box.
-
-
-
-
- AREAS FOR FUTURE EXPANSION
-
- 2 STORY BUILDINGS
-
- All you need to do is keep a second list of edges representing wall
- faces on the second story. The only difference is that you have to build
- the FAST mem off-screen in two stages. First build a mapline and copy-out
- the second story. Then build a mapline and copy-out the first story. Then
- copy the whole thing to CHIP memory, as usual. You probably don't want too
- many 2 story buildings -- so the first stage should be very quick compared
- to the second.
-
- WINDOWS
-
- Windows would be somewhat difficult to implement in our engine. If it
- were possible to see through a window, through a window, through a window,
- then you would need 4 zbuffers. Painter's algorithm is not necessary. As
- you build your zbuffer, when you find a wall closer than what was previously
- in the zbuffer for that column, instead of just overwriting what was
- previously in there with the new value, you simply shift every zbuffer down
- one and then write the new value in the top-most zbuffer. Without using any
- extra time, you now have zbuffers for every depth and can basically draw the
- screen left-to-right all at once.
-
- DOORS:
-
- Doors would be easy as long as they scroll horizontally as they open so
- that you still only need 1 zbuffer. Anything fancier than plain
- horizontally scrolling doors would enable you to see more than one wall at a
- given column which is not possible.
-
- OBJECTS/MONSTERS:
-
- This would also be very easy. Have a second set of (X,Y) points that
- represent objects. Just blit objects onto the screen after you've finished
- everything else. The zbuffer can easily tell you how much of an object is
- obscured by a wall so that you can clip the appropriate amount when you blit.
-
- A500 COMPATABILITY:
-
- This program was not meant for machines without a fast CPU or a math
- chip. We don't have access to a 500 and we have no particular desire to see
- it run fast on one. We'd be happy to turn our sourcecode over to anyone who
- can read C and can program real well on the 500.
-
- Some things that would have to be done are:
-
- 1) Shrink the screensize. I think one-quarter screen would be small enough.
- Our engine code is modular enough to make adjusting the screen size trivial.
- 2) Rewrite several routines to use integer math. I'm sure eurodemo coders
- have a lot of experience there. Integer math may even help the A3000
- version. 3) Vflip screen with blitter instead of CPU. Use parallel
- processing where-ever possible (see COPY-OUT, above). 4) The two 16 color
- palette trick is done with the CPU. It would have to be redone using
- vertical lines and 1 blitter fill. Actually, this is the way we originally
- did it, so the code is already there -- just commented out. Of course,
- using the blitter, as explained in COPY-OUT, above, will cause flickering
- because you are drawing once for the top four plains and then again later
- for the fifth plane. This could be fixed, however, with doublebuffering
- which you'll probably need anyway on the 500.
-