SHPRTVertex Sample

Description
This sample demonstrates how to use PRTEngine, a precomputed radiance transfer (PRT) simulator that uses low-order spherical harmonics (SH). The sample also demonstrates how to use the simulator's results to achieve dynamic light transport in a dynamic lighting environment with a vs_1_1 vertex shader.



Path
Source: Samples\Managed\Direct3D\PRTPerVertex
Executable: Samples\Managed\Direct3D\Bin\csPRTPerVertex.exe


Why is this sample interesting?
Precomputed radiance transfer (PRT) using low-order spherical harmonic (SH) basis functions has a number of advantages over typical diffuse (N dot L) lighting. Area light sources and global effects such as inter-reflections, soft shadows, self-shadowing, and subsurface scattering can be rendered in real time after a precomputed light transport simulation. Clustered principal component analysis (CPCA) allows the results of the per-vertex simulator to be compressed so the shader does not need as many constants or as much per-vertex data.

Overview of PRT
The basic idea is to first run a PRT simulator offline as part of the art content creation process and save the compressed results for later real-time use. The PRT simulator models global effects that would typically be very difficult to compute in real time. The real-time engine approximates the lights using SH basis functions and sums the approximated light vectors into a single set of SH coefficients describing a low-frequency approximation of the entire lighting environment. It then uses a vertex shader to arrive at the vertex's diffuse color by combining the compressed simulator results and the approximated lighting environment. Since the offline simulator has already done the work of computing the inter-reflections, soft shadows, and other global effects, this technique is both visually impressive and high performing.

How does the sample work?
The sample performs both the offline and real-time portions of PRT. The startup dialog box asks the user which step to perform. The user can run the offline simulator or view a mesh using previously saved results from the PRT simulator. The offline step would typically be done in a separate tool, but this sample does both in the same executable.

How does this sample differ from SHPRTPixel sample?
Unlike the SHPRTPixel sample, this sample stores the PCA weights in a vertex stream instead of textures. It also uses multiple clusters, so a per-vertex cluster ID is needed because each vertex may belong to a different cluster.

Step 1: Offline
The first step is to run the offline per-vertex PRT simulator in the function D3DXSHPRTSimulation(). It takes in a number of parameters to control the operation of the simulator, an array of meshes, and an array of D3DXSHMATERIAL structures. Note that there is one material per mesh, so each mesh is assumed to be homogeneous. The simulator's input parameters and the members of the SH material structure are explained extensively by the sample dialog's tooltips. Note that if you pass in more than one mesh they need to be pre-translated into the same coordinate space; this sample handles just a single input mesh for simplicity.

Fortunately, most of the simulator input parameters do not affect how the results are used. One parameter that does, however, is the "order of SH approximation" parameter. This controls what order of SH basis functions is used to approximate transferred radiance. The math behind spherical harmonics is rather involved, but there are a number of useful resources available on the Internet. For example, "Precomputed Radiance Transfer for Real-Time Rendering in Dynamic, Low-Frequency Lighting Environments" by Peter-Pike Sloan, Jan Kautz, and John Snyder, SIGGRAPH 2002, is a good explanation of PRT, and for a more graphics-developer-friendly introduction to spherical harmonics see "Spherical Harmonic Lighting: The Gritty Details" by Robin Green, GDC 2003.

In addition to the order parameter, the spectral parameter also affects the results. If spectral is on, there will be 3 color channels: red, green, and blue. However, sometimes it's useful to work with just one channel (shadows, for example). Note that with non-spectral results you simply use the red channel when calling the D3DX SH functions, as the other channels are optional.

The simulator runs for some period of time (typically minutes) depending on the complexity of the meshes, the number of rays, and other settings. The output is a GraphicsStream, which contains an internal header and an array of floats for each vertex of the mesh.

The floats for each vertex are called radiance transfer vectors. These transfer vectors can be used by a shader to transform source radiance into exit radiance. However, since there are order^2 transfer coefficients per channel, with spectral results and order 6 there would be 3*36, or 108, scalars per vertex. Fortunately, you can compress this data using an algorithm called CPCA. The number of coefficients per vertex is reduced to the number of PCA vectors, and this number does not need to be large: 4 or 8 usually yields good results. So with 8 PCA vectors and order 6, instead of 108 coefficients per vertex only 8 coefficients per vertex are needed. The number of PCA vectors must be less than order^2. For more detail about the math behind CPCA, see "Clustered Principal Components for Precomputed Radiance Transfer" by Peter-Pike Sloan, Jesse Hall, John Hart, and John Snyder, SIGGRAPH 2003.
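
The savings can be sketched with some simple arithmetic (illustrative Python, not the sample's C# code; the function names are hypothetical):

```python
# Uncompressed PRT data: order^2 transfer coefficients per color channel.
def transfer_scalars_per_vertex(order, spectral=True):
    channels = 3 if spectral else 1
    return channels * order * order

# After CPCA: one weight per PCA vector (plus a per-vertex cluster ID).
def cpca_scalars_per_vertex(num_pca_vectors):
    return num_pca_vectors

print(transfer_scalars_per_vertex(6))  # spectral, order 6 -> 108
print(cpca_scalars_per_vertex(8))      # -> 8
```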

Step 2: Real time
The equation to render compressed PRT data is:

  Rp = (Mk dot L') + Σ j=1..N [ wpj * (Bkj dot L') ]

Note: to see how this equation is derived from a generic rendering equation, see the DirectX documentation.

where:

  • Rp is a single channel of exit radiance at vertex p and is evaluated at every vertex on the mesh.
  • Mk is the mean for cluster k. A cluster is simply some number of vertices that share the same mean vector. This is an order^2 vector of coefficients.
  • k is the cluster ID for vertex p.
  • L' is the approximation of the source radiance into the SH basis functions. This is an order^2 vector of coefficients.
  • j sums over the number of PCA vectors.
  • N is the number of PCA vectors.
  • wpj is the jth PCA weight for point p. This is a single coefficient.
  • Bkj is the jth PCA basis vector for cluster k. This is an order^2 vector of coefficients.
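
The exit radiance computation defined by these terms can be sketched numerically for one channel (illustrative Python, not the sample's C#/HLSL; all names are hypothetical):

```python
def dot(a, b):
    """Dot product of two coefficient vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def exit_radiance(cluster_id, pca_weights, means, bases, source_radiance):
    """Rp = (Mk dot L') + sum over j of wpj * (Bkj dot L').

    means[k]        : order^2 mean vector Mk for cluster k
    bases[k][j]     : order^2 PCA basis vector Bkj of cluster k
    pca_weights[j]  : per-vertex PCA weight wpj
    source_radiance : order^2 approximation L' of the lighting environment
    """
    k = cluster_id
    r = dot(means[k], source_radiance)
    for j, w in enumerate(pca_weights):
        r += w * dot(bases[k][j], source_radiance)
    return r

# Toy order-1 example: one coefficient per vector, two PCA vectors.
print(exit_radiance(0, [0.5, 1.0], [[2.0]], [[[1.0], [3.0]]], [1.0]))  # -> 5.5
```

In the real sample the (Mk dot L') and (Bkj dot L') terms are computed on the CPU and only the per-vertex weighted sum runs in the vertex shader.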

For the real time step, the sample simply collects all of the data needed for this equation and passes the appropriate data to a vertex shader that implements this equation.

How to implement the equation
Now let's look at the details of how the sample collects, stores, and uses the data to execute the above equation in the vertex shader.

In CMyD3DApplication::LoadMesh(), the sample loads the mesh as usual but since CPCA needs (m_dwNumPCAVectors + 1) scalars per vertex, it clones the mesh with a mesh decl that provides room to store this data. The "+ 1" comes from the fact that the vertex shader needs to know an index into an array of cluster data. The sample uses D3DDECLUSAGE_BLENDWEIGHT to store the CPCA data, but this semantic is arbitrary and is chosen because skinning and PRT do not work together. The sample defines BLENDWEIGHT[0] as a float1 and uses it to store an index into a constant array, and BLENDWEIGHT[1] through BLENDWEIGHT[6] are float4s. So the sample can store up to 24 PCA weights in the vertex buffer.

Next the sample reads the simulator's SH PRT results from a file and puts this data back into a GraphicsStream. Then in CMyD3DApplication::CompressData(), it calls D3DXSHPRTCompress() to apply CPCA using some number of PCA vectors and some number of clusters. The output is a GraphicsStream (called pCPCABuffer) that contains the data needed for the CPCA formula above. Note that this sample loads and saves an uncompressed SH PRT buffer; it would be more efficient to compress and then save the buffer so that compression does not have to be done at initialization. The sample doesn't do this because it allows the developer to change the number of PCA vectors and clusters without running the simulator again.

Then the sample extracts CPCA data from pCPCABuffer by first calling D3DXSHPRTCompExtractClusterIDs() to get the cluster IDs for the vertices. This function writes to an array of UINTs where the cluster ID for vertex N will be at puClusterIDs[N]. The sample then uses this array to calculate an array offset for each vertex and stores the offsets in the vertex buffer. The offset is used by the vertex shader to allow it to index directly to the data for the current vertex's cluster. This offset is simply the cluster ID * the stride of the constant array filled with CPCA data.
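
The offset computation can be sketched as follows (illustrative Python; the stride is measured in shader constant registers, and the names are hypothetical):

```python
# For each vertex, stored offset = cluster ID * constant-array stride,
# so the vertex shader can index straight to its cluster's data.
def cluster_offsets(cluster_ids, stride):
    return [cid * stride for cid in cluster_ids]

print(cluster_offsets([0, 2, 1], 7))  # -> [0, 14, 7]
```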

Then the sample calls D3DXSHPRTCompExtractToMesh() with D3DDECLUSAGE_BLENDWEIGHT, 1 to tell the function to store the per-vertex PCA weights in the mesh starting at the semantic BLENDWEIGHT[1], continuing with BLENDWEIGHT[2] and so on until all the PCA weights have been written. Since the app defined BLENDWEIGHT indices 1 through 6 to be float4s, if there are 20 PCA vectors this function will write to BLENDWEIGHT indices 1 through 5. Note that these vertex elements do not need to be float4s; they only need to be signed.
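
The packing arithmetic behind this can be sketched as (illustrative Python; the helper name is hypothetical):

```python
# Number of float4 BLENDWEIGHT elements needed to hold the PCA weights,
# packed four at a time starting at BLENDWEIGHT[1].
def blendweight_float4s_used(num_pca_vectors):
    return (num_pca_vectors + 3) // 4

print(blendweight_float4s_used(20))  # BLENDWEIGHT[1]..[5] -> 5
print(blendweight_float4s_used(24))  # BLENDWEIGHT[1]..[6] -> 6
```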

As the equation shows, to calculate the exit radiance in the shader you'll need not only the per-vertex compressed transfer vectors but also your lighting environment (also called source radiance) approximated using SH basis functions. D3DX provides a number of functions to help make this step easy:

  • D3DXSHEvalDirectionalLight()
  • D3DXSHEvalSphericalLight()
  • D3DXSHEvalConeLight()
  • D3DXSHEvalHemisphereLight()
  • D3DXSHProjectCubeMap()
Just use one of these functions to get an array of order^2 floats per channel for each light. Then simply add these arrays together using D3DXSHAdd() to arrive at a single set of order^2 SH coefficients per channel that describe the scene's source radiance, which the equation labels as L'. Note that these functions take the light direction in object space so you will typically have to transform the light's direction by the inverse of the world matrix.
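
The accumulation step can be sketched numerically (an illustrative Python stand-in for what D3DXSHAdd() does for one channel, not the D3DX API itself):

```python
# Element-wise sum of per-light SH coefficient arrays for one channel.
def sh_add(*coefficient_arrays):
    return [sum(vals) for vals in zip(*coefficient_arrays)]

# Two hypothetical order-2 (4-coefficient) lights:
light_a = [1.0, 0.0, 0.5, 0.0]
light_b = [0.5, 0.5, 0.0, 0.0]
print(sh_add(light_a, light_b))  # -> [1.5, 0.5, 0.5, 0.0]
```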

The last piece of data the sample needs from pCPCABuffer is the cluster mean (M) and the PCA basis vectors (B). The sample stores this data in a large array of floats so that when the lights change it can reevaluate the lights and perform the M dot L' and B dot L' calculations. To do this it simply calls D3DXSHPRTCompExtractBasis(), which extracts the basis one cluster at a time. Each cluster's basis consists of a mean and PCA basis vectors. So the size of the array, m_aClusterBases, needed to store all of the cluster bases is:

NumClusters * (NumPCAVectors+1) * (order^2 * NumChannels)

Note that the "+1" is to store the cluster mean. Also note that since both (Mk dot L') and (Bkj dot L') are constant, the sample calculates these values on the CPU and passes them as constants into the vertex shader, and since wpj changes for each vertex the sample stores this per-vertex data in the vertex buffer.
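
The size formula works out as follows (illustrative Python; the helper name is hypothetical):

```python
# m_aClusterBases size: NumClusters * (NumPCAVectors + 1) * (order^2 * NumChannels).
# The "+ 1" row holds the cluster mean.
def cluster_bases_size(num_clusters, num_pca_vectors, order, num_channels=3):
    return num_clusters * (num_pca_vectors + 1) * (order * order * num_channels)

print(cluster_bases_size(1, 8, 6))  # 1 * 9 * 108 -> 972
```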

Finally, CMyD3DApplication::CompressData() calls another helper function, CMyD3DApplication::EvalLightsAndSetConstants(), which evaluates the lights as described above using D3DXSHEvalDirectionalLight() and D3DXSHAdd(), and then calls CMyD3DApplication::SetShaderConstants(). This function uses the m_aClusterBases array and the source radiance to perform the M dot L' and B dot L' calculations from the above equation and stores the results in another, smaller float array, m_fClusteredPCA, of size:

NumClusters * (4 + MaxNumChannels * NumPCAVectors)

This array is passed directly to the vertex shader with Effect.SetValue(). Note that the vertex shader uses float4 registers, each of which holds 4 floats, so on the vertex shader side the array is of size:

NumClusters * (1 + MaxNumChannels * (NumPCAVectors / 4) )

Since the sample restricts NumPCAVectors to a multiple of 4, this results in an integer. Also note that evaluating the lights and calculating and setting the constant table is fast enough to be done per frame, but as an optimization the sample only does this when the lights are moved.
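
The two sizes can be checked against each other (illustrative Python; the helper names are hypothetical). With 1 cluster, 3 channels, and 8 PCA vectors, the 28 CPU-side floats pack exactly into 7 shader-side float4 registers:

```python
# CPU side: floats per cluster = 4 + MaxNumChannels * NumPCAVectors.
def cpu_side_floats(num_clusters, num_channels, num_pca_vectors):
    return num_clusters * (4 + num_channels * num_pca_vectors)

# Shader side: float4 registers per cluster = 1 + MaxNumChannels * (NumPCAVectors / 4).
def shader_side_float4s(num_clusters, num_channels, num_pca_vectors):
    assert num_pca_vectors % 4 == 0  # the sample restricts this to a multiple of 4
    return num_clusters * (1 + num_channels * (num_pca_vectors // 4))

print(cpu_side_floats(1, 3, 8))      # 1 * (4 + 24) -> 28 floats
print(shader_side_float4s(1, 3, 8))  # 1 * (1 + 6)  -> 7 float4s (= 28 floats)
```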

Now that the sample has extracted all the data it needs, the sample can render the scene using SH PRT with CPCA. The render loop uses the SHPRTVertex.fx technique called "PrecomputedSHLighting" and renders the scene. This technique uses a vertex shader called "SHPRTDiffuseVS" which implements the exit radiance formula above.

What are the limitations of PRT?
PRT has limitations because the transfer vectors are precomputed. The relative spatial relationships of the precomputed scene cannot change. In other words, a mesh can be rotated, translated, or scaled, since those rigid operations do not change the transfer vectors, but if the mesh is deformed or skinned the results will be inaccurate. The same logic applies to scenes. For example, if you pass a scene of 3 meshes to the simulator, the real-time engine could rotate, translate, or scale them all as one, but could not rotate a single mesh independently of the others without getting inaccurate results.

Another major limitation of this technique is that it requires the mesh to be highly tessellated for accurate results, since it operates at the vertex level. The SHPRTPixel sample operates at the texel level, so it does not depend on the mesh being highly tessellated.

However one advantage the vertex based technique has over the texture based technique is that it can run on vs_1_1 hardware while the texture technique requires ps_2_0 hardware.

As a side note, since this technique uses low-order spherical harmonics, the lighting environment is assumed to be low frequency.

Also note that if you mix meshes that have subsurface scattering with ones that do not, you will likely need to scale the transfer coefficients for the subsurface-scattered mesh, since they are around 3x darker. With a single mesh you can simply scale the projected light coefficients instead. You can scale the transfer coefficients by calling D3DXSHPRTGetRawDataPointer() and scaling the data before compressing it.

Media
Uffizi, St. Peter's, Grace, Galileo, and RNL Light Probe Images Copyright (C) 1999 Paul Debevec, used with permission.
For additional images and resources, see http://www.debevec.org/Probes/
The images are made available in Greg Ward's RADIANCE high dynamic range image format, http://radsite.lbl.gov/radiance/
The images were converted to the vertical cross format using HDR Shop: http://www.debevec.org/HDRShop/



Copyright (c) Microsoft Corporation. All rights reserved.