Anatomy of a 3D Games Accelerator


In the following 'anatomy of a 3D games accelerator', three essential areas will be discussed: speed, image quality and price. A well-engineered 3D graphics accelerator for the consumer market must first accelerate the frame rate of 3D games, and second, provide added image quality. The goal is to deliver the highest performance at the lowest cost. Speed and image quality both contribute to realism and a high level of game interactivity. However, the ultimate in both cannot be achieved without a dramatic increase in the price of the product. In order to be considered a viable solution for the Home PC, a graphics accelerator must offer 3D texture-mapping and great performance for Windows, digital video and DOS applications, as well as an upgrade path for hardware multimedia add-ons.

When evaluating a graphics accelerator for the emerging consumer market, the following questions need to be answered: Does the accelerator correctly satisfy the key elements required by the consumer? Does the accelerator deliver high frame rates? What price will consumers be willing to pay for a 3D game board? Does the 3D accelerator provide a solution for all application areas essential to the consumer user?

1) SPEED -
High frame rates
The consumer expects the next generation of 3D game accelerators to provide compelling differences in graphics over what they can achieve on today's processors. The key aspect, usually, in making a computer game more compelling is speed, which means high frame rate. Image quality is also an important factor, but only as long as it does not compromise the speed of the game. A case in point: the entire computer gaming world played DOOM™ with "Graphics Details" set to LOW, simply to gain a higher frame rate.
A graphics engine's 3D speed is sometimes rated in millions of texels per second (Mtexels/second) or in polygons per second. This refers to the number of textured pixels or the number of 3D polygons the graphics chip is capable of rendering each second. In order to conduct a fair comparison of 3D chips based on their quoted Mtexels/second, the type of texels used to calculate the score must be clearly and completely identified. For example, Matrox Mystique® is capable of rendering 25 million true perspective-correct, shaded, transparent, 16-bit expanded from CLUT 4 or CLUT 8 texels per second. Each of the elements used to identify the texels has an effect on speed or image quality, and needs to be taken into consideration.
A more "real-life", or visually perceptive, evaluation of a 3D chip's capabilities is the number of frames per second (fps) it can render in an application. The ultimate game play is achieved at 30+ fps, but consumers will be satisfied if the graphics board can deliver 20 fps and good image quality. The speed of a 3D game depends on many things, but the parts most taxing to system performance are the 3D geometry and the rendering. The geometry consists of the calculations performed to determine an object's position and color on the screen, and the rendering refers to the action of drawing that object on the screen. By off-loading the rendering portion of the 3D game, the graphics accelerator frees the CPU to devote more processing power to other aspects, resulting in an increase in frame rates and much higher quality images than if the calculations were done in software. However, once the 3D rendering is off-loaded to the graphics accelerator, a slow CPU can become the performance bottleneck.
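As a rough illustration of how a quoted fill rate relates to frame rate, the sketch below divides texels per second by the texels drawn per frame. The 25 Mtexels/second figure comes from the text above; the function name and the `depth_complexity` parameter (the average number of times each screen pixel is drawn per frame) are assumptions made for illustration. The result is only a ceiling set by fill rate, and ignores geometry, CPU and bus limits.

```python
def fill_rate_fps_ceiling(texels_per_second, width, height, depth_complexity=1.0):
    """Upper bound on frames per second imposed by fill rate alone."""
    texels_per_frame = width * height * depth_complexity
    return texels_per_second / texels_per_frame

# 640 x 400 screen, every pixel textured exactly once per frame
fps_single = fill_rate_fps_ceiling(25_000_000, 640, 400)
# Same screen, but each pixel drawn three times on average (assumed overdraw)
fps_overdraw = fill_rate_fps_ceiling(25_000_000, 640, 400, depth_complexity=3.0)

print(round(fps_single, 1), round(fps_overdraw, 1))
```

Under these assumptions, overdraw alone brings the ceiling from roughly 98 fps down to roughly 33 fps, which is why the type of texel and the scene being drawn matter when comparing quoted figures.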
Some techniques, such as PCI Bus Mastering, can also increase the overall performance of the graphics accelerator. This technique and others will be discussed in this paper.

2) QUALITY -
Greater detail adds realism
Consumer and professional 3D users have different needs. Consumers will expect to play fast texture-mapped games with a low-cost 2MB graphics accelerator, while professional 3D users, such as CAD designers, will require high-resolution Gouraud shaded rendering in real-time which can only be delivered by higher-end graphics accelerators. The more detailed the graphics, the more compelling the 3D game will be to the consumer. Greater detail is achieved in a number of ways: the fundamentals being high resolution, high color depth, and perspective-correct texture mapping. Without these key elements, a game's quality will not satisfy the consumer. Other elements can also be used by the manufacturer as additional ways to enhance quality, but are only viable when they do not have a negative impact on overall performance. The body of this white paper will explicitly discuss each 3D feature, and the direct effect of implementing each one on the price/performance of the 3D accelerator.

3) PRICE/PERFORMANCE -
All-in-one solution below $300
The price of a 3D consumer graphics board should not exceed the $300 limit if it is aimed at the mass market. In order to achieve this price point without sacrificing the speed/quality requirement, important decisions must be made at design time. For the 3D chip to be at a suitable price, the number of gates, or amount of circuitry, must be kept to a minimum. The higher the gate count, the larger the chip and therefore the higher the cost. The design of the 3D chip consists of carefully weighing which features add value to the product and how their implementation will affect speed and cost.
Furthermore, it is not only a question of which 3D functions to integrate in the chip but also what other application areas must be adequately accelerated. The consumer PC is used for many different purposes besides 3D game play: edutainment, home productivity, video playback, and Internet surfing. In order to satisfy this demanding market, the graphics chip must also include a fast 2D GUI, video, and DOS engine as well as have a memory and video upgrade path.
Three general categories of consumer 3D accelerators are being introduced in the market. The first one is an upgrade to the installed graphics accelerator to provide a high-performance pass-through solution. This solution is two-tiered, and therefore expensive, which means it will not be accepted by the integrators or OEMs, and will not be shipped in high volumes. The second category is high-performance 3D accelerators which also offer average 2D and video performance. While these solutions might have a strong presence in the retail channels, they are too expensive for OEMs and their overall performance is too limiting to Integrators. Finally, the third category is a range of products which offer a good balance between price and performance, integrating strong acceleration for 3D, 2D and video, while staying within the price limits of consumers. These boards, such as the Matrox Mystique, are ideal solutions for the Integrator and OEM markets, as well as the retail market, and will achieve high volume shipments.

What is 3D?

3D graphics is the graphical representation of a scene or object along three axes of reference: height, width and depth to make it look more realistic. This technique, which tricks the PC user into seeing a 3D image on a flat screen, is increasingly popular in the PC entertainment market, adding a more realistic and interactive aspect to graphical applications.
To display 3D animations, an object is first represented as a set of points, or vertices, in a 3D coordinate system (x, y, z axes). The vertices of this object, which may be a car, a plane or a complete 3D world, are stored in system RAM and completely define the object. In order to display this object on the flat 2D monitor, the object must then be rendered.
Rendering is the act of calculating, on a per-pixel basis, the color and positioning information which will fool the viewer into perceiving depth on the 2D screen. Rendering fills in all of the points on the surface of the object that previously were stored only as a set of vertices.
In this way a solid object, shaded for 3D effect, is drawn on the monitor's screen. In order to render the object, it is necessary to determine the color and position information mentioned previously. To do this efficiently, the vertices of the object are segmented into triangles, and these triangles (sets of three vertices) are then passed down the 3D pipeline one at a time as shown below.

3D Pipeline Diagram

Following the diagram, this is the 3D process:

3D Object Triangularize - segmentation of 3D object into triangles (set of three vertices)

Transformation - processing any translation, rotation or zooming that is required

Clipping - this eliminates any portions of the object that fall outside of the 'window' of the viewer's line of sight

Lighting - depending on where light sources in the world are positioned relative to the object, different intensity values must be calculated to display shading or shadowing information

Map to Screen - the triangularized, transformed and clipped object must then be mapped to the 2D screen. This is done using perspective algorithms, which make an object appear smaller when it is farther away and larger when it is closer

Draw triangle (rendering) - All of the above steps contribute to provide the color and position information that is necessary to render one triangle in the scene. The actual rendering process is the most time intensive portion of the pipeline as the processor is then dealing on a per pixel basis instead of only with sparse vertices. There are many ways to render an image; Gouraud Shading and Texture mapping are the most commonly used in mainstream applications, while Phong Shading, Ray tracing and other techniques might only be used for high end animation and illustration.
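The geometry stages listed above can be sketched for a single triangle as follows. This is a minimal illustration, not the method of any particular game or chip: the function names, the focal length, the 640 x 400 screen and the near-plane value are all assumptions, lighting is left out, and clipping is reduced to rejecting a triangle that has any vertex behind the viewer.

```python
import math

def rotate_y(vertex, angle):
    """Transformation: rotate a vertex around the vertical (y) axis."""
    x, y, z = vertex
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def translate(vertex, offset):
    """Transformation: move a vertex by an (x, y, z) offset."""
    return tuple(v + o for v, o in zip(vertex, offset))

def project(vertex, focal_length, screen_w, screen_h):
    """Map to screen: dividing by z makes distant vertices land nearer the centre."""
    x, y, z = vertex
    return (screen_w / 2 + focal_length * x / z,
            screen_h / 2 - focal_length * y / z)

def triangle_to_screen(triangle, angle, offset,
                       focal_length=256, screen=(640, 400), near=0.1):
    moved = [translate(rotate_y(v, angle), offset) for v in triangle]
    # Crude stand-in for clipping: drop the triangle if any vertex is too close
    if any(v[2] < near for v in moved):
        return None
    return [project(v, focal_length, *screen) for v in moved]

tri = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
near_tri = triangle_to_screen(tri, 0.0, (0.0, 0.0, 4.0))
far_tri = triangle_to_screen(tri, 0.0, (0.0, 0.0, 8.0))
print(near_tri, far_tri)
```

Doubling the distance halves the on-screen width of the triangle. The three screen-space vertices are what the rendering stage then fills in pixel by pixel.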

Achieving high 3D performance and quality without increasing the price

A number of features can be implemented in the graphics chip by the hardware manufacturer to improve the 3D quality. While some are essential, others are used for additional special effects or improvements in image quality. Some features can be implemented with little additional circuitry, therefore not impacting the overall cost of the graphics accelerator, but improving performance and quality. Others, however, might improve quality to the detriment of speed and price. The following is a discussion of various 3D features, and their impact on performance and cost for the graphics accelerator.

Bus mastering
Implementing PCI Bus Mastering within the graphics chip is essential to achieving high frame rates. Two levels of bus mastering can be supported in typical 3D graphics engines: bus mastering of texture fetches and bus mastering of command lists. The bus mastering of texture fetches allows the graphics engine to retrieve source texture maps directly from system memory, without requiring the host CPU to be involved in the process. Without a bus master chip, the CPU's bandwidth would be greatly affected as source textures are copied from system memory to frame buffer memory. The second level of bus mastering allows the graphics engine to process rendering commands in parallel with the host CPU. For example, polygon information which has been computed by the CPU is sent to the graphics engine to be rendered, and while it is being rendered the host starts calculating the next frame of polygon information. This translates into a large performance gain over rendering engines which do not support this level of bus mastering, because the graphics processor and CPU process information at the same time. This feature, which is implemented in the MGA-1064SG, is also supported by today's standard 3D APIs.
Of the engines that support bus mastering, there are two different implementations: the basic bus master, and the scatter-gather bus master. A basic bus master is capable of operating asynchronously with the host, but it can only do so for short periods of time, before it must interrupt the host and ask for direction. In data-intensive operations like 3D, this can minimize the advantages of bus mastering. By contrast, a scatter-gather bus master, as implemented in the Matrox Mystique, is able to operate almost independently from the host, therefore achieving the full performance benefits illustrated below.
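The benefit of letting the CPU and the graphics engine work at the same time can be estimated with simple arithmetic. The sketch below is an idealized model, and the millisecond figures are made-up values chosen only for illustration: without overlap the two workloads add up, while with full overlap the frame time is set by whichever of the two is slower.

```python
def frame_time_serial(cpu_ms, render_ms):
    """CPU waits for the renderer: the two times add up."""
    return cpu_ms + render_ms

def frame_time_overlapped(cpu_ms, render_ms):
    """CPU prepares the next frame while the current one renders."""
    return max(cpu_ms, render_ms)

def fps(frame_ms):
    return 1000.0 / frame_ms

# Hypothetical workload: 25 ms of geometry on the CPU, 25 ms of rendering
serial_fps = fps(frame_time_serial(25, 25))
overlapped_fps = fps(frame_time_overlapped(25, 25))
# With a slow CPU (60 ms of geometry) the CPU becomes the bottleneck
slow_cpu_fps = fps(frame_time_overlapped(60, 25))

print(serial_fps, overlapped_fps, round(slow_cpu_fps, 1))
```

In this model the overlapped case doubles the frame rate when both sides are balanced, and gains little once one side dominates.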

Resolutions
Due to different limitations of the operating system or the graphics accelerator, most games have traditionally been developed and played at low resolutions, such as 320 x 200, in order to achieve high performance. Increasing the resolution means adding pixels to be displayed on the screen, which places more demand on the graphics sub-system. Some games developed in 320 x 200 could be played at 640 x 400, but the extra pixels were simply a replication of the existing ones, making the image blocky and pixelated. With today's new standards in software and access to new fast hardware accelerators such as the Matrox Mystique, developers can now include more unique pixel information in each scene, therefore increasing the level of graphics detail at higher resolutions of 640 x 400 or 800 x 600. Gamers can then play these games in high resolutions without any performance loss. Support for higher resolutions is therefore important in a 3D games accelerator.

Color depth
Support for higher color depth also increases the level of detail needed to make the scene more realistic. The more colors used in a scene, the more detailed and realistic it looks, but the more calculations are needed to determine the color of each pixel. With the new generation of 3D game accelerators, which support higher color depths without dramatic performance loss, developers can now use more colors in each scene. By using 16-bit (65K) or 24-bit (16.7M) colors instead of the traditional 8-bit (256 colors), developers can add more quality to the image. In the case of 3D games, for example, using these extra colors makes the scenes much richer and more life-like.
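Resolution and color depth together determine how much of the board's memory the displayed image consumes. The sketch below works this out for the 2MB board discussed in this paper. The use of two full-screen buffers (one displayed, one being drawn) is an assumption made for illustration, as are the function names.

```python
BOARD_BYTES = 2 * 1024 * 1024  # the 2MB board discussed in this paper

def frame_buffer_bytes(width, height, bits_per_pixel, buffers=1):
    """Bytes needed for `buffers` full-screen images at the given mode."""
    return width * height * bits_per_pixel // 8 * buffers

def off_screen_bytes(width, height, bits_per_pixel, buffers=2,
                     board_bytes=BOARD_BYTES):
    """Memory left over (e.g. for textures) after the display buffers."""
    return board_bytes - frame_buffer_bytes(width, height, bits_per_pixel, buffers)

for mode in [(320, 200, 8), (640, 400, 16), (800, 600, 16)]:
    print(mode, frame_buffer_bytes(*mode, buffers=2), off_screen_bytes(*mode))
```

Under these assumptions, moving from 640 x 400 to 800 x 600 in 16-bit color shrinks the memory left over from about 1MB to well under 200KB.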

Perspective-correct texture mapping
In real life, objects have details which allow us to recognize them. For example, an object made of wood is granular, or has lines going through it, while steel is smooth and shimmering. In games, this detail is called a texture. 3D objects or scenes are made to look more realistic by applying two-dimensional textures onto the objects.
In the real world, our perspective or view point relative to an object changes as our position changes. For example, when a character is walking along the side of a house, the house will have a different perspective as the position of the character changes in relation to the position of the house. In order to create the same optical experience in a 3D game, texture maps must be continually "corrected" to fit the player's changing perspective in relation to the object. If the texture mapping is not perspective-correct, the image will be visually incorrect and filled with artifacts. Some 3D accelerators perform non-perspective-correct texture mapping in hardware. This trade-off might translate into a slight cost saving compared with solutions that implement perspective-correct texture mapping; however, the resulting graphics contain unacceptable rendering errors, which heavily degrade image quality. Most new 3D game accelerators, like the Matrox Mystique, are powered by processors which perform true perspective correction at full rendering speed. This process is complicated, and requires a number of additional "gates", or circuits, in hardware, but it is the main criterion for quality game play.
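The difference between the two approaches can be seen on a single texture coordinate. The sketch below uses the standard textbook formulation, which is not necessarily how any particular chip implements it: the uncorrected version interpolates the coordinate `u` directly across the screen, while the corrected version interpolates `u/z` and `1/z` and divides per pixel. The function names and sample values are assumptions for illustration.

```python
def affine_u(u0, u1, t):
    """Uncorrected: interpolate the texture coordinate linearly on screen."""
    return u0 + (u1 - u0) * t

def perspective_u(u0, z0, u1, z1, t):
    """Corrected: interpolate u/z and 1/z linearly, then divide per pixel."""
    u_over_z = affine_u(u0 / z0, u1 / z1, t)
    one_over_z = affine_u(1.0 / z0, 1.0 / z1, t)
    return u_over_z / one_over_z

# An edge running from a near vertex (z = 1) to a far vertex (z = 4),
# sampled halfway across the screen
wrong = affine_u(0.0, 1.0, 0.5)
right = perspective_u(0.0, 1.0, 1.0, 4.0, 0.5)
print(wrong, right)
```

Halfway across the screen the uncorrected version reads from the middle of the texture, while the corrected version reads from only one fifth of the way in, because the near part of the surface occupies more of the screen. The extra divide per pixel is the work that the additional gates pay for.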

Lighting
Lighting is applied on objects to accentuate their curves, or create a genuine ambiance in the scene. The ability to apply lighting or shading to a texture mapped object, therefore, provides a great deal of realism to a 3D scene. Software rendering is limited in the amount of lighting it can perform while maintaining a reasonable frame rate. A key advantage of performing hardware 3D rendering is the ability of the graphics accelerator to apply lighting to polygons while maintaining full rendering speed.
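A minimal sketch of lighting a texture-mapped surface is given below. It uses a simple diffuse model in which brightness depends on the angle between the surface normal and the direction to the light. The model, the ambient term of 0.2 and the function names are assumptions for illustration, and both vectors are assumed to be of unit length.

```python
def diffuse_intensity(normal, to_light, ambient=0.2):
    """Brightness from 'ambient' (facing away) up to 1.0 (facing the light)."""
    dot = sum(n * l for n, l in zip(normal, to_light))
    return min(1.0, ambient + (1.0 - ambient) * max(0.0, dot))

def shade_texel(rgb, intensity):
    """Darken a texel's color by the lighting intensity."""
    return tuple(int(c * intensity) for c in rgb)

lit = diffuse_intensity((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
unlit = diffuse_intensity((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
print(shade_texel((200, 100, 50), lit), shade_texel((200, 100, 50), 0.5))
```

In software, the multiply in `shade_texel` has to be repeated for every pixel drawn, which is the cost that hardware rendering absorbs.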

Texture transparency
The technique of texture transparency is similar to "chroma keying" in video. Including complicated objects, such as a tree, in a 3D scene is a challenge for the software developer. The developer must be able to map the tree on a transparent polygon so that the background of the scene can be seen through the branches. Objects like trees may not be essential, but they significantly improve the overall quality of a scene. Without texture transparency these objects are typically left out or roughly simplified to fit the needs of the developers. The MGA-1064SG chip supports texture transparency, allowing developers to add a higher level of detail while maintaining high performance. This technique can be implemented without a large amount of extra circuitry, therefore not resulting in a price increase.
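In the spirit of the chroma-keying comparison above, texture transparency can be sketched as skipping every texel that matches a reserved key color. The key color chosen here and the function name are hypothetical, and the sketch works on one row of pixels at a time.

```python
KEY_COLOR = (255, 0, 255)  # hypothetical color reserved to mean "transparent"

def draw_keyed_row(dest_row, texture_row, key=KEY_COLOR):
    """Copy texels over the background, leaving it visible where texel == key."""
    return [d if t == key else t for d, t in zip(dest_row, texture_row)]

sky = (0, 0, 200)
leaf = (0, 150, 0)
background = [sky, sky, sky, sky]
tree_texture = [KEY_COLOR, leaf, KEY_COLOR, leaf]
print(draw_keyed_row(background, tree_texture))
```

The tree can therefore be drawn on a simple rectangular polygon while the sky still shows between its branches.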

Hardware z-buffering
The use of a z-buffer is necessary when two objects intersect each other. The z-buffer determines which portions of the intersecting objects get drawn and which do not. However, many software developers do not use a z-buffer for all objects in the scene, since the z-buffer takes up space in the off-screen memory, which could be used instead to store extra source textures for greater detail. For this reason, graphics chips such as the MGA-1064SG provide an optional z-buffer, which lets the developer decide whether to use the off-screen memory for z-buffering or texture storage. If a game using a z-buffer, such as id's "Quake"™, is played on a graphics accelerator which does not provide a hardware z-buffer, the game will not run, or will run at very low frame rates, since all z-buffering will need to be done in software.
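The per-pixel test performed by a z-buffer can be sketched as follows. This is a software illustration with assumed names, in which a smaller z value means closer to the viewer: a pixel is written only if it is nearer than whatever has already been drawn at that position.

```python
def make_buffers(width, height, background=0):
    """Create a color buffer and a depth buffer initialised to 'infinitely far'."""
    color = [[background] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    return color, depth

def plot(color, depth, x, y, z, value):
    """Write the pixel only if it is nearer than what is already stored."""
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = value
        return True
    return False

color, depth = make_buffers(4, 2)
plot(color, depth, 1, 0, 5.0, "near object")
drawn = plot(color, depth, 1, 0, 9.0, "far object")  # hidden, so rejected
print(color[0][1], drawn)
```

Note that the depth buffer holds one value for every screen pixel, which is the memory cost weighed against texture storage in the text above.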

Palettized textures
The most taxing element of 3D games on the graphics frame buffer is the storage of source textures in off-screen memory. Each time a new scene is created, all source textures for that specific scene need to be loaded into off-screen memory to be used by the graphics chip. Because a 3D game accelerator generally has 2MB of memory, the amount of off-screen memory available to store these textures is limited. The number of textures that can be used is therefore restricted, reducing the amount of detail and quality of the scene. However, developers can use palettized textures, a method which assigns a Color Look-Up Table (or CLUT) to each texture in the scene. This technique allows the developer to use a smaller number of colors for each texture instead of using normal 16-bit color values (65K colors). This smaller color format (CLUT) requires less space than 65K colors would, which means more textures can be held in memory to add detail to a scene. Most graphics accelerators, however, do not support palettized textures, which means the information can only be stored in 16-bit color format in the frame buffer, using up all of the available off-screen memory. In that case, either the extra textures have to be stored in and retrieved from system memory, resulting in a serious hit on performance, or textures have to be dropped from the scene in order to maintain performance. The Matrox Mystique, however, provides full support for palettized textures, allowing developers to create very detailed scenes with two to four times as many textures. It also gives games a significant performance boost, because they do not rely on the speed of the system to convert the information to 16-bit colors.

The use of palettized textures in 3D games
Most new 3D games are developed in 16-bit or 65K colors, allowing for a high level of image quality and detail. Although each texture used to create a scene might only be made up of 200 or 300 colors, with most graphics accelerators it will need to be stored as 16-bit information, therefore taking the space of 65K colors in the off-screen memory. In order to assign smaller numbers of colors to the textures and save space in off-screen memory, developers create CLUTs for each texture. Very simple textures, such as bricks, can be developed with CLUT-4 information, which is a range of 16 colors and takes up a quarter of the space of 65K color information in the off-screen memory. More detailed textures can be developed using CLUT-8 information, or 256 colors, allowing twice as many textures to be stored in the same amount of memory compared to 65K color information. The MGA-1064SG is the only processor in the market today to support both CLUT-4 and CLUT-8 formats, and is capable of storing a different CLUT for each texture, meaning a different set of 256 colors (or 16 colors) can be used for each texture in the scene. This means developers can use more textures per scene, adding detail, without taking a performance hit as they would with other graphics accelerators. In addition, this feature is fully supported by the Microsoft® Direct3D™ API, which means all developers who create 3D applications under Windows 95 can take advantage of it.
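The savings described above follow directly from the number of bits stored per texel. The sketch below compares the three formats and shows how indices are expanded back to 16-bit colors through the look-up table. The 64 x 64 texture size, the 16-bit CLUT entry size, the order of the two 4-bit indices within a byte and the function names are assumptions for illustration, not a description of the MGA-1064SG's actual memory layout.

```python
def texture_bytes(width, height, bits_per_texel, clut_entries=0, clut_entry_bits=16):
    """Storage for the texel data plus its color look-up table, if any."""
    texel_bytes = width * height * bits_per_texel // 8
    clut_bytes = clut_entries * clut_entry_bits // 8
    return texel_bytes + clut_bytes

def expand_clut8(indices, clut):
    """One byte per texel: each byte is an index into a 256-entry table."""
    return [clut[i] for i in indices]

def expand_clut4(packed, clut):
    """Two texels per byte (high nibble first, by assumption), 16-entry table."""
    out = []
    for byte in packed:
        out.append(clut[byte >> 4])
        out.append(clut[byte & 0x0F])
    return out

direct_16 = texture_bytes(64, 64, 16)                 # no look-up table
clut8 = texture_bytes(64, 64, 8, clut_entries=256)
clut4 = texture_bytes(64, 64, 4, clut_entries=16)
print(direct_16, clut8, clut4)

brick_clut = [0x0000, 0xA145, 0xC618] + [0x0000] * 13  # hypothetical 16-bit colors
print(expand_clut4([0x12, 0x21], brick_clut))
```

Once the small table is counted, the saving is slightly less than the ideal factor of two or four, but the table is a fixed cost that becomes negligible for larger textures.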


Texture mapping methods
As explained earlier, texture mapping is a data-intensive operation which consists of warping a bitmap onto a 3D object or polygon to add more detail, thus enhancing realism. The original bitmap used as the texture to be mapped is also called the source texture. There are several ways to map textures onto a 3D object with perspective correction:

· Point sampling:
The most common way to map a texture to a given polygon is through a method called point sampling. This method allows the graphics engine to approximate the color value of a given pixel on the resulting texture map by replicating the value of the closest existing pixel on the source texture. Point sampling provides very good results when used in conjunction with tile-based MIP mapping (see below), and maintains high performance levels at a low cost.

· Filtering:
In some cases source textures will need a considerable amount of warping, which might lead to some pixel blockiness. Although this blockiness is mostly visible in scenes with little motion, some vendors might decide to use a technique called bilinear filtering, which can be employed to "blur" the textured pixels, making them appear smoother. Bilinear filtering of textures is similar to the filtering used for digital video: four source texel values are read, and their color values are then blended together with weightings based on proximity. The resulting value is used for the texel to be drawn. While this technique is useful, the resulting quality cannot be compared to that achieved by using high resolution source textures. These larger source textures, however, use a larger portion of the available off-screen memory, and therefore can only be used effectively with a graphics accelerator which supports some type of palettized textures, such as the Matrox Mystique. Graphics accelerators without support for palettized textures will have to scale down the textures to store them, and apply filtering to map them onto polygons, resulting in poor quality rendering.

· MIP mapping:
Another way to improve the quality of the 3D texture mapped scene is to use a method called MIP-mapping. The more alterations made to a texture to "fit" an object, the less it will resemble the source texture. One way to avoid this severe deviation from the original texture is to create three copies, or MIP levels, of the same source texture, in different sizes. MIP-mapping can be implemented in three ways:
a) Tile-based MIP-mapping:
Depending on the size of the polygon, the application will determine which MIP-level is the closest in size, and provide this MIP-level to the graphics accelerator to be used as the source texture for that polygon. Tile-based MIP-mapping, as supported by the MGA-1064SG, does not require extra circuitry, since it is programmed in software by the game developer. It results in better overall quality, while its negative effect on performance and cost is minimal.

b) Per-Pixel MIP mapping:
The graphics accelerator calculates, on a per-pixel basis, which MIP level provides the best source. In this way, the graphics accelerator can use different MIP levels on the same tile, accommodating a change of size in the polygon being drawn. When performed in hardware, it will either result in a significant hit on performance or, if implemented to be effective in speed, will result in a dramatic increase in cost.

c) Tri-linear MIP-mapping:
The graphics accelerator reads source pixels from two different MIP levels, performs bilinear interpolation within each MIP level, and then interpolates between the two resulting values to calculate the final pixel. This requires a lot of bandwidth, because two source texture maps need to be read simultaneously. When performed in hardware, it will either result in a significant hit on performance or, if implemented to be effective in speed, will result in a dramatic increase in cost.
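The texture mapping methods described in this section can be compared in a few lines of code. The sketch below is a software illustration only: textures are small grids of single brightness values rather than full colors, coordinates `u` and `v` run from 0 to 1, and all function names are assumptions. It shows point sampling, bilinear filtering, the level selection used by tile-based MIP mapping, and the blend between two levels used by tri-linear MIP mapping.

```python
def point_sample(texture, u, v):
    """Point sampling: replicate the single closest source texel."""
    h, w = len(texture), len(texture[0])
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return texture[y][x]

def bilinear_sample(texture, u, v):
    """Bilinear filtering: blend four neighbouring texels, weighted by proximity."""
    h, w = len(texture), len(texture[0])
    fx = min(max(u * w - 0.5, 0.0), w - 1.0)
    fy = min(max(v * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(fx), int(fy)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = fx - x0, fy - y0
    top = texture[y0][x0] * (1 - tx) + texture[y0][x1] * tx
    bottom = texture[y1][x0] * (1 - tx) + texture[y1][x1] * tx
    return top * (1 - ty) + bottom * ty

def choose_mip_level(mip_sizes, polygon_size):
    """Tile-based MIP mapping: pick the level closest in size to the polygon."""
    return min(range(len(mip_sizes)), key=lambda i: abs(mip_sizes[i] - polygon_size))

def trilinear_sample(mip_a, mip_b, u, v, blend):
    """Tri-linear: bilinear sample from two MIP levels, then blend the results."""
    a = bilinear_sample(mip_a, u, v)
    b = bilinear_sample(mip_b, u, v)
    return a * (1 - blend) + b * blend

big = [[0, 100], [100, 200]]   # a 2 x 2 source texture
small = [[50]]                 # the next, smaller MIP level
print(point_sample(big, 0.9, 0.9), bilinear_sample(big, 0.5, 0.5))
print(choose_mip_level([64, 32, 16], 30), trilinear_sample(big, small, 0.5, 0.5, 0.5))
```

Point sampling reads one texel per pixel, bilinear filtering reads four, and tri-linear reads eight from two different maps, which illustrates why the later methods place so much more demand on memory bandwidth.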

Using video as texture maps
Some 3D chips, such as the MGA-1064SG, also incorporate video playback capabilities, such as scaling and color-space conversion. This allows them to store a video clip in the frame buffer as it is being decompressed, where the 3D graphics processor can then use it as it would a source texture and apply it onto a 3D polygon. This cannot be performed by a graphics processor that does not incorporate color-space conversion.
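Color-space conversion is needed because decompressed video is typically stored as brightness and color-difference values rather than as red, green and blue. The sketch below shows one common form of the conversion, using the full-range coefficients from the JPEG/JFIF convention. Which exact variant a given chip applies is not stated in this paper, so the coefficients and function names here are assumptions.

```python
def clamp8(value):
    """Round and limit a value to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(value))))

def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range YCbCr pixel to RGB (JFIF-style coefficients)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return (clamp8(r), clamp8(g), clamp8(b))

print(ycbcr_to_rgb(128, 128, 128), ycbcr_to_rgb(255, 128, 128))
```

Once each video pixel has been converted in this way, the frame is an ordinary RGB bitmap and can be treated like any other source texture.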

Fogging
In order to maintain high levels of performance, developers created "tricks" to reduce the amount of rendering needed for a given scene. One of these tricks is called fogging, and is mostly used in landscape scenes, such as those in flight simulators. Fogging allows the developer to "hide" the background of a scene behind a layer of "fog" by mixing the textures' color values with a single color such as white. Some graphics chips support fogging in hardware, which allows the developer to use this trick. A similar visual effect can also be achieved through depth-cued lighting tricks, a feature which is supported in hardware by the MGA-1064SG.
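Fogging can be sketched as a blend between a pixel's own color and the fog color, with the blend weight growing with distance. The linear ramp between a start and an end distance used below is one simple choice, assumed here for illustration, as are the function names.

```python
def fog_factor(z, fog_start, fog_end):
    """0.0 = no fog (near), 1.0 = fully hidden in fog (far)."""
    if z <= fog_start:
        return 0.0
    if z >= fog_end:
        return 1.0
    return (z - fog_start) / (fog_end - fog_start)

def apply_fog(rgb, z, fog_start, fog_end, fog_color=(255, 255, 255)):
    """Mix the pixel color with the fog color according to its distance."""
    f = fog_factor(z, fog_start, fog_end)
    return tuple(int(round(c * (1 - f) + fc * f)) for c, fc in zip(rgb, fog_color))

hill = (55, 155, 255)
print(apply_fog(hill, 5, 10, 20), apply_fog(hill, 15, 10, 20), apply_fog(hill, 25, 10, 20))
```

Anything beyond the end distance comes out as pure fog color, so the developer does not need to draw detailed scenery there.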

Blending
Blending is a visual effect which allows the developer to "mix" two textures together to apply them on the same object. Different levels of blending can be implemented to create visual effects. The simplest method, which is supported by the MGA-1064SG, is called screen door or "stippling". It gives the impression of "seeing through" an object by writing only some of the pixels making up that object. For example, the developer would decide that an object should be 50% transparent. The graphics accelerator would therefore draw the background image, and then write only every second pixel of the "transparent" object. This approach is easy to implement in hardware and delivers reasonable quality at a low cost.
By contrast, true alpha blending is a data-intensive operation, which involves reading the values of two source textures and performing the perspective calculations on both textures simultaneously. This effect is very taxing on performance, especially with a low-bandwidth frame buffer, and costly to implement in an effective way. The resulting effect does not warrant the loss in performance and therefore is not essential for 3D games.
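The two approaches can be contrasted in a short sketch. Stippling only decides, per pixel, whether to write the object or leave the background, whereas alpha blending reads both colors and computes a weighted mix for every pixel. The checkerboard pattern (alternating the skipped pixels on each row) and the function names are assumptions for illustration.

```python
def stipple_row(dest_row, src_row, y):
    """50% screen door: write the object only where (x + y) is even."""
    return [s if (x + y) % 2 == 0 else d
            for x, (d, s) in enumerate(zip(dest_row, src_row))]

def alpha_blend(dest, src, alpha):
    """True blending: weighted mix of object and background colors."""
    return tuple(int(round(s * alpha + d * (1 - alpha))) for d, s in zip(dest, src))

background = ["bg"] * 4
window = ["glass"] * 4
print(stipple_row(background, window, 0))
print(stipple_row(background, window, 1))
print(alpha_blend((0, 0, 0), (200, 100, 50), 0.5))
```

Stippling needs no reads from the frame buffer and no arithmetic, which is why it is inexpensive in hardware. Alpha blending needs a read, two multiplies per color channel and a write for every pixel.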

Anti-aliasing and other rendering effects
High-end 3D application users rely on techniques to improve the quality of the graphics, such as anti-aliasing, Phong shading and Ray-Tracing. These techniques, however, if performed in hardware, are extremely taxing on performance, and require a large amount of dedicated circuitry which raises the cost of the graphics accelerator beyond the $1,000 range. Given the price/performance requirement of the gamer, anti-aliasing is not necessary or realistic in a 3D game accelerator.

Standards vs proprietary 3D techniques
The 3D market has been slow in growing because until recently it lacked a standardized interface, or Application Programming Interface (API), to allow all software and hardware to work together seamlessly. To answer this lack of standardization, several software vendors have designed different APIs over the last two years. Under Windows 95, Microsoft's Direct3D™ has become the predominant standard API in the industry, adopted by the majority of developers.
However, graphics chip manufacturers such as Matrox have also designed their own proprietary APIs to let software developers write directly to their hardware, allowing them to take full advantage of any feature available. While some developers might choose to write different versions of their applications to take advantage of different graphics accelerators, most of them will develop a Direct3D version of the same application, in order to have access to the entire installed base of Direct3D-compliant hardware. For this reason, Matrox has designed its MGA-1064SG processor for the Matrox Mystique to be as close to the Direct3D specifications as possible, allowing the standard Direct3D version of the applications to be fully optimized when running with the Matrox hardware.
Some graphics engines use proprietary architectures, such as rational quadratic patches (Nvidia) and infinite plane support (NEC), claiming to deliver better quality and performance. While these techniques are interesting in theory, they entail complicated re-compiling of the applications, which is time consuming for the developers. In fact, Microsoft has officially stated that it does not support non-traditional polygon setting in Direct3D. This means that unless games are written directly to these controllers, they will not take advantage of the special features built into them. It can be assumed that not a lot of developers will choose to do so. By contrast, Matrox's 3D architecture is based on the existing traditional software architectures, using triangle polygons, which ensures full compatibility and ease of porting for developers.

Summary
In the end, a good 3D graphics accelerator aimed at satisfying the demanding consumer market must offer a wealth of functionality at the right price. The majority of this paper supports the fact that high frame rate is essential to a 3D game player. The appropriate mix of features, including resolution, color depth and perspective-correct texture mapping, should also be implemented as long as frame rate is not affected negatively. A compelling 3D engine for the consumer market would be one on which 3D games run at or above 20 fps at a resolution of 640 x 400 in 16-bit color. Also to be noted are the other application areas that need to be accelerated in order for a 3D graphics accelerator to be a viable consumer solution. Aside from excellent acceleration for 3D games, high performance for Windows, video and DOS, as well as a multimedia upgrade path, at a price below $300, will all be part of the decision-making process of the home buyer.


Copyright © 1996 Matrox Graphics Inc. All rights reserved.