This sample demonstrates the instancing feature available with Microsoft DirectX 9.0c. A vs_3_0-capable device is required for this feature. The sample also shows alternate ways of achieving results similar to hardware instancing on adapters that do not support vs_3_0. The shader instancing technique shows the benefits of efficient batching of primitives.
Source: (SDK root)\Samples\C++\Direct3D\Instancing
Executable: (SDK root)\Samples\C++\Direct3D\Bin\Instancing.exe
The sample demonstrates four different rendering techniques to achieve the same result: to render many nearly identical boxes (objects with small numbers of polygons) in the scene. The boxes differ by their position and color.
The user can vary the number of boxes in the scene between one and 1000. Use the sample to monitor performance as the number of boxes increases. As you change the number of boxes, the vertex and index buffer resources are recreated by the sample application's OnCreateBuffers and OnDestroyBuffers methods.
Technique 1, hardware instancing, requires a vs_3_0-capable device. The instance-specific data is stored in a second vertex buffer. The rendering is implemented in the sample application's OnRenderHWInstancing method. IDirect3DDevice9::SetStreamSourceFreq is used with D3DSTREAMSOURCE_INDEXEDDATA to specify the number of boxes and with D3DSTREAMSOURCE_INSTANCEDATA to specify the frequency of the instance data (in this case, frequency equals one).
Drawing the scene is accomplished as follows:
pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST, 0, 0, 4 * 6, 0, 6 * 2 );
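To make the pairing of the two streams concrete, the following is a CPU-side model of the fetch, not Direct3D code and not taken from the sample. It covers only the frequency-one case described above and, for simplicity, walks the box vertices directly instead of going through an index buffer. The struct layouts and the name ExpandInstances are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layouts for illustration; the sample's real vertex formats may differ.
struct BoxVertex    { float x, y, z; };                     // stream 0: shared box geometry
struct InstanceData { float px, py, pz; unsigned color; };  // stream 1: one element per box
struct ShaderInput  { BoxVertex vertex; InstanceData instance; };

// Models the fetch for the frequency-one case: the geometry stream is replayed
// once per instance, and the instance stream advances one element per instance.
std::vector<ShaderInput> ExpandInstances(const std::vector<BoxVertex>& geometry,
                                         const std::vector<InstanceData>& instances,
                                         std::size_t instanceCount)
{
    assert(instanceCount <= instances.size());
    std::vector<ShaderInput> inputs;
    inputs.reserve(instanceCount * geometry.size());
    for (std::size_t i = 0; i < instanceCount; ++i)
        for (const BoxVertex& v : geometry)
            inputs.push_back(ShaderInput{v, instances[i]});
    return inputs;
}
```

With 24 box vertices and three instances, this model produces 72 shader inputs, and every vertex of the second box carries the second instance element. On the device, that expansion happens in hardware from a single draw call.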
Technique 2, shader instancing, is the most efficient technique that does not use the hardware to perform the instancing. One call to IDirect3DDevice9::DrawIndexedPrimitive handles multiple boxes at once. The instance data is stored in a system memory array, which is then copied into the vertex shader's constants at draw time. Since vs_2_0 only guarantees 256 constants, it is not possible for the entire array of instance data to fit at once. This sample's .fx file can only batch-process 120 boxes at a time, which takes 240 constants (one float4 is required for box position, and one float4 is required for box color). Rendering is performed in the sample's OnRenderShaderInstancing method.
int nRenderBoxes = min( nRemainingBoxes, g_nNumBatchInstance ); ... pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST, 0, 0, nRenderBoxes * 4 * 6, 0, nRenderBoxes * 6 * 2 );
This method would also work on a vs_1_1 part, although not as efficiently because vs_1_1 only guarantees 96 constants. This means that batching is nearly three times more efficient on vs_2_0 hardware than on vs_1_1 hardware, but even on vs_1_1 hardware, batching still outperforms no batching.
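The batching arithmetic above can be sketched in plain C++. This is an illustration rather than the sample's code: Batch, MaxBoxesPerBatch, and PlanBatches are hypothetical names, and the per-box counts (24 vertices, 12 triangles, two float4 constants) are the ones given in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Per-box figures taken from the text above.
const int kVerticesPerBox  = 4 * 6;  // 6 faces, 4 vertices each
const int kTrianglesPerBox = 6 * 2;  // 6 faces, 2 triangles each
const int kConstantsPerBox = 2;      // one float4 position + one float4 color

// Arguments one batched DrawIndexedPrimitive call would need.
struct Batch { int boxes; int numVertices; int primitiveCount; };

// Upper bound on boxes per batch if every guaranteed constant held instance data.
int MaxBoxesPerBatch(int guaranteedConstants)
{
    assert(guaranteedConstants >= 0);
    return guaranteedConstants / kConstantsPerBox;
}

// Splits totalBoxes into batches of at most batchSize boxes, mirroring
// the min( nRemainingBoxes, g_nNumBatchInstance ) step shown above.
std::vector<Batch> PlanBatches(int totalBoxes, int batchSize)
{
    assert(totalBoxes >= 0 && batchSize > 0);
    std::vector<Batch> batches;
    int remaining = totalBoxes;
    while (remaining > 0) {
        int n = std::min(remaining, batchSize);
        batches.push_back(Batch{n, n * kVerticesPerBox, n * kTrianglesPerBox});
        remaining -= n;
    }
    return batches;
}
```

For 1000 boxes and a batch size of 120, PlanBatches yields nine batches, the last holding 40 boxes. MaxBoxesPerBatch gives an upper bound of 128 boxes for 256 constants and 48 for 96 constants, a ratio of about 2.7, which is where the "nearly three times" figure comes from. The sample's batch size of 120 sits below the 128 bound.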
Technique 3, constants instancing, is similar to Technique 2 with a batch size of one. It renders one box at a time, first setting the box's position and color as vertex shader constants and then calling IDirect3DDevice9::DrawIndexedPrimitive. The technique is implemented in the sample's OnRenderConstantsInstancing method.
Technique 4, stream instancing, uses a second vertex buffer to store instance data, as in Technique 1. Instead of setting vertex shader constants to update the position and color for each box, this technique changes the offset in the instance stream buffer and draws one box at a time. The technique is implemented in the sample's OnRenderStreamInstancing method.
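As a rough illustration of "changing the offset in the instance stream buffer", the sketch below computes the byte offset at which each box's instance data would start. Such an offset is what would be passed as the OffsetInBytes argument of IDirect3DDevice9::SetStreamSource before each draw. The BoxInstance layout and the function name are hypothetical and are not taken from the sample.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-box record; the sample's actual instance layout may differ.
struct BoxInstance {
    float         position[4];
    unsigned char color[4];
};

// Returns, for each box, the byte offset of its record within a tightly
// packed instance buffer: box i starts at i * sizeof(BoxInstance).
std::vector<std::size_t> InstanceStreamOffsets(std::size_t boxCount)
{
    std::vector<std::size_t> offsets;
    offsets.reserve(boxCount);
    for (std::size_t i = 0; i < boxCount; ++i)
        offsets.push_back(i * sizeof(BoxInstance));
    return offsets;
}
```

Each offset corresponds to one stream rebind and one draw call, so the number of API calls grows linearly with the number of boxes.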
When the sample renders a large number of boxes, only Techniques 1 and 2 are graphics processing unit (GPU)-bound, while Techniques 3 and 4 waste many CPU cycles by making numerous calls per frame to IDirect3DDevice9::DrawIndexedPrimitive. To demonstrate the benefits of not being CPU-bound in a real-world application, the sample offers a Goal slider control that simulates CPU resources consumed by secondary tasks such as game logic, artificial intelligence, or physics modeling.
If the sample determines that there is time left to do so before the next frame, it consumes remaining CPU resources according to the position of the Goal slider control. The Remaining for logic statistic displays the percentage of CPU cycles that are being spent on the simulated non-rendering algorithms such as game logic or physics. The higher this percentage is, the more efficiently the CPU is being used for rendering. The Goal settings are implemented in the sample's OnTimer method.
If the scene is loaded with 1000 boxes, the efficient techniques have twice the CPU efficiency of the CPU-bound techniques. Using Techniques 3 and 4 to render many instances will often starve other tasks of available CPU resources.
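The difference in CPU load follows largely from the number of draw calls each technique issues per frame. The sketch below counts them under the descriptions given earlier: one call for hardware instancing, one call per batch for shader instancing, and one call per box for the other two. The enum and function names are hypothetical, and the default batch size of 120 is the figure quoted for this sample's .fx file.

```cpp
#include <cassert>

enum class Technique {
    HardwareInstancing,
    ShaderInstancing,
    ConstantsInstancing,
    StreamInstancing
};

// Estimated DrawIndexedPrimitive calls per frame for a given box count.
int DrawCallsPerFrame(Technique technique, int boxCount, int shaderBatchSize = 120)
{
    assert(boxCount >= 0 && shaderBatchSize > 0);
    switch (technique) {
    case Technique::HardwareInstancing:
        return boxCount > 0 ? 1 : 0;                               // one instanced draw
    case Technique::ShaderInstancing:
        return (boxCount + shaderBatchSize - 1) / shaderBatchSize; // one draw per batch
    case Technique::ConstantsInstancing:
    case Technique::StreamInstancing:
        return boxCount;                                           // one draw per box
    }
    return 0;
}
```

For 1000 boxes this gives 1, 9, 1000, and 1000 calls respectively. The count says nothing about the cost of each call, so it explains the trend rather than predicting the measured frame times.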
The following table summarizes hardware and memory requirements for different rendering techniques:
Rendering Technique | Requires vs_3_0 Card | CPU-bound | Requires Additional Vertex Buffer | Requires Additional Shader Constant Buffer |
---|---|---|---|---|
Hardware Instancing | Yes | No | Yes | No |
Shader Instancing | No | No | No | Yes |
Constants Instancing | No | Yes | No | Yes |
Stream Instancing | No | Yes | Yes | No |