Instancing Sample
Microsoft DirectX 9.0 SDK Update (October 2004)

This sample demonstrates the instancing feature available with Microsoft DirectX 9.0c. A vs_3_0-capable device is required for this feature. The sample also shows alternative ways to achieve results similar to hardware instancing on adapters that do not support vs_3_0. The shader instancing technique shows the benefits of efficient batching of primitives.

Note: On graphics hardware that does not support vs_3_0, the sample runs on the reference device.

Path

Source: (SDK root)\Samples\C++\Direct3D\Instancing
Executable: (SDK root)\Samples\C++\Direct3D\Bin\Instancing.exe

How the Sample Works

The sample demonstrates four different rendering techniques to achieve the same result: to render many nearly identical boxes (objects with small numbers of polygons) in the scene. The boxes differ by their position and color.

The user can vary the number of boxes in the scene between one and 1000. Use the sample to monitor performance as the number of boxes increases. As you change the number of boxes, the vertex and index buffer resources are recreated by the sample application's OnCreateBuffers and OnDestroyBuffers methods.

Technique 1: Hardware Instancing

Hardware instancing requires a vs_3_0-capable device. The instance-specific data is stored in a second vertex buffer. The rendering is implemented in the sample application's OnRenderHWInstancing method. IDirect3DDevice9::SetStreamSourceFreq is called twice: once with D3DSTREAMSOURCE_INDEXEDDATA to specify the number of boxes to draw, and once with D3DSTREAMSOURCE_INSTANCEDATA to specify the frequency of the instance data (in this case, a frequency of one).

Drawing the scene is accomplished as follows:

pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST, 0, 0, 4 * 6, 0, 6 * 2 ); 
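The stream frequency values described above can be sketched as follows. This is a minimal illustration, not the sample's code: the helper MakeStreamFrequencies and its struct are hypothetical, the two flag constants are redeclared locally (mirroring their values in d3d9types.h) so the block builds without the SDK headers, and the use of stream 0 for geometry and stream 1 for instance data is an assumption.

```cpp
#include <cstdint>

// Local stand-ins mirroring the d3d9types.h flag values, so this sketch
// builds without the DirectX SDK headers.
constexpr std::uint32_t kIndexedData  = 1u << 30;  // D3DSTREAMSOURCE_INDEXEDDATA
constexpr std::uint32_t kInstanceData = 2u << 30;  // D3DSTREAMSOURCE_INSTANCEDATA

// Hypothetical helper type: the two values passed to SetStreamSourceFreq.
struct StreamFrequencies {
    std::uint32_t geometryStream;  // assumed stream 0: box geometry, drawn numBoxes times
    std::uint32_t instanceStream;  // assumed stream 1: advances one element per instance
};

inline StreamFrequencies MakeStreamFrequencies(std::uint32_t numBoxes) {
    return StreamFrequencies{ kIndexedData | numBoxes, kInstanceData | 1u };
}

// Intended use with a real device (shown as comments, not compiled here):
//   StreamFrequencies f = MakeStreamFrequencies(numBoxes);
//   pd3dDevice->SetStreamSourceFreq(0, f.geometryStream);
//   pd3dDevice->SetStreamSourceFreq(1, f.instanceStream);
//   pd3dDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 4 * 6, 0, 6 * 2);
//   pd3dDevice->SetStreamSourceFreq(0, 1);  // restore the default frequency
//   pd3dDevice->SetStreamSourceFreq(1, 1);
```

Note that the draw call still describes a single box (4 * 6 vertices, 6 * 2 triangles); the instance count travels in the stream 0 frequency value.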

Technique 2: Shader Instancing with Draw Call Batching

This is the most efficient technique that does not rely on hardware instancing. One call to IDirect3DDevice9::DrawIndexedPrimitive draws multiple boxes at once. The instance data is stored in a system memory array, which is copied into the vertex shader's constants at draw time. Because vs_2_0 guarantees only 256 constants, the entire array of instance data cannot fit at once. Each box needs two float4 constants (one for position and one for color), so this sample's .fx file batch-processes at most 120 boxes (240 constants) per draw call. Rendering is performed in the sample's OnRenderShaderInstancing method.

int nRenderBoxes = min( nRemainingBoxes, g_nNumBatchInstance ); 
... 
pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST, 0, 0, nRenderBoxes * 4 * 6, 0, nRenderBoxes * 6 * 2 ); 
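The elided loop around these two lines can be sketched roughly as below. This is an illustration under stated assumptions rather than the sample's implementation: PlanBatches and BatchDraw are hypothetical names, and instead of calling the device the sketch only records the NumVertices and PrimitiveCount arguments each batched draw call would receive.

```cpp
#include <algorithm>
#include <vector>

// Geometry per box, matching the 4 * 6 and 6 * 2 factors in the draw calls.
constexpr int kVerticesPerBox  = 4 * 6;  // 6 faces, 4 vertices each
constexpr int kTrianglesPerBox = 6 * 2;  // 6 faces, 2 triangles each

// Hypothetical record of the arguments one batched draw call would receive.
struct BatchDraw {
    int numBoxes;
    int numVertices;     // NumVertices argument
    int primitiveCount;  // PrimitiveCount argument
};

// Splits numBoxes into batches of at most batchSize boxes.
inline std::vector<BatchDraw> PlanBatches(int numBoxes, int batchSize) {
    std::vector<BatchDraw> draws;
    if (batchSize <= 0) return draws;  // nothing sensible to plan
    int remaining = numBoxes;
    while (remaining > 0) {
        const int n = std::min(remaining, batchSize);
        // In the real loop, the constants for these n boxes are uploaded here,
        // followed by one DrawIndexedPrimitive call.
        draws.push_back({ n, n * kVerticesPerBox, n * kTrianglesPerBox });
        remaining -= n;
    }
    return draws;
}
```

For example, PlanBatches(1000, 120) yields nine draw calls: eight full batches of 120 boxes and a final batch of 40.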

This method would also work on a vs_1_1 part, although not as efficiently because vs_1_1 only guarantees 96 constants. This means that batching is nearly three times more efficient on vs_2_0 hardware than on vs_1_1 hardware, but even on vs_1_1 hardware, batching still outperforms no batching.
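The effect of the constant limit on batch size can be estimated with a little arithmetic. The sketch below is hypothetical: the helper names are invented for illustration, and the number of constants reserved for non-instance data (such as transformation matrices) is an assumed parameter, not a figure taken from the sample.

```cpp
// Hypothetical helper: boxes that fit in one batch given a constant budget.
// reservedConstants is an assumption standing in for constants used by
// anything other than per-box data (for example, matrices).
inline int MaxBoxesPerBatch(int guaranteedConstants, int reservedConstants,
                            int constantsPerBox) {
    if (constantsPerBox <= 0 || reservedConstants >= guaranteedConstants) return 0;
    return (guaranteedConstants - reservedConstants) / constantsPerBox;
}

// Draw calls needed to render numBoxes with the given batch size (rounds up).
inline int DrawCallsNeeded(int numBoxes, int boxesPerBatch) {
    if (boxesPerBatch <= 0) return 0;
    return (numBoxes + boxesPerBatch - 1) / boxesPerBatch;
}
```

With two constants per box and an assumed 16 reserved constants, a 256-constant budget allows 120 boxes per batch and a 96-constant budget allows 40, so 1000 boxes need 9 draw calls versus 25.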

Technique 3: Constants Instancing without Draw Call Batching

This technique resembles Technique 2 with a batch size of 1. It renders one box at a time by first setting the box's position and color and then calling IDirect3DDevice9::DrawIndexedPrimitive. The technique is implemented in the sample's OnRenderConstantsInstancing method.
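The per-box pattern used by OnRenderConstantsInstancing can be sketched as a loop that updates constants and then draws. This is a minimal sketch, not the sample's code: BoxInstance, RenderOneBoxPerCall, and the two callbacks are hypothetical stand-ins for the shader-constant update and the DrawIndexedPrimitive call, so the block compiles without the SDK.

```cpp
#include <vector>

// Hypothetical per-box data: one float4 for position, one float4 for color.
struct BoxInstance {
    float position[4];
    float color[4];
};

// Calls setConstants and then drawBox once for every box, and returns the
// number of draw calls issued (always equal to the number of boxes).
template <typename SetConstantsFn, typename DrawFn>
int RenderOneBoxPerCall(const std::vector<BoxInstance>& boxes,
                        SetConstantsFn setConstants, DrawFn drawBox) {
    int drawCalls = 0;
    for (const BoxInstance& box : boxes) {
        setConstants(box);  // stand-in for uploading position and color
        drawBox();          // stand-in for one DrawIndexedPrimitive call
        ++drawCalls;
    }
    return drawCalls;
}
```

Because the draw-call count grows linearly with the number of boxes, the CPU cost of this approach grows with it.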

Technique 4: Stream Instancing

As in Technique 1, this technique uses a second vertex buffer to store instance data. Instead of setting vertex shader constants to update the position and color for each box, this technique changes the offset in the instance stream buffer and draws one box at a time. The technique is implemented in the sample's OnRenderStreamInstancing method.
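Changing the offset into the instance stream amounts to computing a byte offset for each box. The sketch below assumes a hypothetical instance layout (BoxInstance) and helper name; the sample's actual instance structure may differ. On a real device the offset would be passed as the OffsetInBytes argument of IDirect3DDevice9::SetStreamSource.

```cpp
#include <cstddef>

// Hypothetical instance layout: one float4 position and one float4 color.
struct BoxInstance {
    float position[4];
    float color[4];
};
static_assert(sizeof(BoxInstance) == 32, "sketch assumes a tightly packed 32-byte instance");

// Byte offset of the given box within the instance vertex buffer.
inline std::size_t InstanceOffsetInBytes(std::size_t boxIndex) {
    return boxIndex * sizeof(BoxInstance);
}

// Intended use with a real device (shown as comments, not compiled here;
// pInstanceVB and stream index 1 are assumptions):
//   for (UINT i = 0; i < numBoxes; ++i) {
//       pd3dDevice->SetStreamSource(1, pInstanceVB,
//                                   (UINT)InstanceOffsetInBytes(i), sizeof(BoxInstance));
//       pd3dDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 4 * 6, 0, 6 * 2);
//   }
```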

CPU- vs. GPU-bound

When the sample renders a large number of boxes, only Techniques 1 and 2 are graphics processing unit (GPU)-bound, while Techniques 3 and 4 waste many CPU cycles by making numerous calls per frame to IDirect3DDevice9::DrawIndexedPrimitive. To demonstrate the benefits of not being CPU-bound in a real-world application, the sample offers a Goal slider control that simulates CPU resources consumed by secondary tasks such as game logic, artificial intelligence, or physics modeling.

If the sample determines that there is time left to do so before the next frame, it consumes remaining CPU resources according to the position of the Goal slider control. The Remaining for logic statistic displays the percentage of CPU cycles that are being spent on the simulated non-rendering algorithms such as game logic or physics. The higher this percentage is, the more efficiently the CPU is being used for rendering. The Goal settings are implemented in the sample's OnTimer method.
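One plausible way to compute a statistic like Remaining for logic is the share of each frame's time spent in the simulated non-rendering work. The function below is a hypothetical illustration of that ratio, not the sample's actual calculation; its name and parameters are assumptions.

```cpp
#include <algorithm>

// Hypothetical: percentage of a frame's duration spent on simulated logic.
// frameSeconds is the total frame time; logicSeconds is the part of it
// consumed by the simulated non-rendering work.
inline double RemainingForLogicPercent(double frameSeconds, double logicSeconds) {
    if (frameSeconds <= 0.0) return 0.0;  // avoid dividing by zero
    const double clamped = std::min(std::max(logicSeconds, 0.0), frameSeconds);
    return 100.0 * clamped / frameSeconds;
}
```

Under this definition, a technique that spends less CPU time issuing draw calls leaves a larger share of the frame for logic, which is what a higher percentage indicates.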

If the scene is loaded with 1000 boxes, the efficient techniques have twice the CPU efficiency of the CPU-bound techniques. Using Techniques 3 and 4 to render many instances will often starve other tasks of available CPU resources.

The following table summarizes hardware and memory requirements for different rendering techniques:

Rendering Technique   Requires vs_3_0 Card   CPU-bound   Requires Additional Vertex Buffer   Requires Additional Shader Constant Buffer
Hardware Instancing   Yes                    No          Yes                                 No
Shader Instancing     No                     No          No                                  Yes
Constants Instancing  No                     Yes         No                                  Yes
Stream Instancing     No                     Yes         Yes                                 No

© 2004 Microsoft Corporation. All rights reserved.