SHPRTVertex Sample
Why is this sample interesting?

Precomputed radiance transfer (PRT) using low-order spherical harmonic (SH) basis functions has a number of advantages over typical diffuse (N dot L) lighting. Area light sources and global effects such as interreflections, soft shadows, self-shadowing, and subsurface scattering can be rendered in real time after a precomputed light transport simulation. Clustered principal component analysis (CPCA) compresses the results of the per-vertex simulator so that the shader does not need as many constants or as much per-vertex data.
Overview of PRT
How does the sample work?
How does this sample differ from the SHPRTPixel sample?
Step 1: Offline

Fortunately, most of the simulator input parameters do not affect how the results are used. One parameter that does, however, is the "order of SH approximation" parameter, which controls what order of SH basis functions is used to approximate transferred radiance. The math behind spherical harmonics is rather involved, but there are a number of useful resources available on the Internet. For example, "Precomputed Radiance Transfer for Real-Time Rendering in Dynamic, Low-Frequency Lighting Environments" by Peter-Pike Sloan, Jan Kautz, and John Snyder, SIGGRAPH 2002, is a good explanation of PRT, and for a more graphics-developer-friendly introduction to spherical harmonics see "Spherical Harmonic Lighting: The Gritty Details" by Robin Green, GDC 2003.

In addition to the order parameter, the spectral parameter also affects the results. If spectral is on, there will be 3 color channels: red, green, and blue. However, sometimes it's useful to work with just one channel (shadows, for example). Note that in the non-spectral case you simply use the red channel when calling the D3DX SH functions, as the other channels are optional.

The simulator runs for some period of time (minutes) depending on the complexity of the meshes, the number of rays, and other settings. The output is a GraphicsStream which contains an internal header and an array of floats for each vertex of the mesh. The floats for each vertex are called radiance transfer vectors, and a shader can use them to transform source radiance into exit radiance. However, since there are order^2 transfer coefficients per channel, with spectral on and order 6 there would be 3*36, or 108, scalars per vertex. Fortunately, you can compress this data using an algorithm called CPCA.
CPCA reduces the number of coefficients per vertex to the number of PCA vectors, and this number does not need to be large for good results; 4 or 8 usually suffices. For example, with 8 PCA vectors and order 6, instead of 108 coefficients per vertex only 8 are needed. The number of PCA vectors must be less than order^2. For more detail about the math behind CPCA, see "Clustered Principal Components for Precomputed Radiance Transfer" by Peter-Pike Sloan, Jesse Hall, John Hart, and John Snyder, SIGGRAPH 2003.
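As a concrete check of these sizes, here is a minimal sketch (function names are hypothetical, not part of the sample) of the per-vertex scalar counts before and after CPCA:

```cpp
#include <cassert>
#include <cstddef>

// Uncompressed SH PRT transfer vectors: order^2 coefficients per channel,
// 3 channels when spectral is on, 1 channel otherwise.
std::size_t UncompressedCoeffCount(std::size_t order, bool spectral)
{
    const std::size_t channels = spectral ? 3 : 1;
    return channels * order * order;
}

// After CPCA, each vertex stores only its PCA projection weights
// (plus one cluster index, handled separately in the vertex buffer).
std::size_t CompressedCoeffCount(std::size_t numPCAVectors)
{
    return numPCAVectors;
}
```

With spectral on and order 6 this gives the 108 scalars mentioned above, versus 8 after compression with 8 PCA vectors.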
Step 2: Real time
The exit radiance R at a vertex p belonging to cluster k is:

    R = (Mk dot L') + sum over j of wpj * (Bkj dot L')

where:

- L' is the source radiance (the lighting environment) approximated with SH basis functions
- Mk is the mean transfer vector of cluster k
- Bkj is the j-th PCA basis vector of cluster k
- wpj is the j-th PCA weight for vertex p

Note: to see how this equation is derived from a generic rendering equation, see the DirectX documentation.
For the real time step, the sample simply collects all of the data needed for this equation and passes the appropriate data to a vertex shader that implements this equation.
How to implement the equation

In CMyD3DApplication::LoadMesh(), the sample loads the mesh as usual, but since CPCA needs (m_dwNumPCAVectors + 1) scalars per vertex, it clones the mesh with a mesh declaration that provides room to store this data. The "+ 1" comes from the fact that the vertex shader needs to know an index into an array of cluster data. The sample uses D3DDECLUSAGE_BLENDWEIGHT to store the CPCA data, but this semantic is arbitrary; it was chosen because skinning and PRT do not work together. The sample defines BLENDWEIGHT[0] as a float1 and uses it to store an index into a constant array, and BLENDWEIGHT[1] through BLENDWEIGHT[6] as float4s, so the sample can store up to 24 PCA weights in the vertex buffer.

Next, the sample reads the simulator's SH PRT results from a file and puts this data back into a GraphicsStream. Then, in CMyD3DApplication::CompressData(), it calls D3DXSHPRTCompress() to apply CPCA using some number of PCA vectors and some number of clusters. The output is a GraphicsStream (called pCPCABuffer) that contains the data needed for the CPCA formula above. Note that this sample loads and saves an uncompressed SH PRT buffer; it would be more efficient to compress the buffer before saving it so that compression does not have to be done upon init. The sample doesn't do this because it allows the developer to select the number of PCA vectors and clusters without running the simulator again.

The sample then extracts CPCA data from pCPCABuffer by first calling D3DXSHPRTCompExtractClusterIDs() to get the cluster IDs for the vertices. This function writes to an array of UINTs where the cluster ID for vertex N is at puClusterIDs[N]. The sample then uses this array to calculate an array offset for each vertex and stores the offsets in the vertex buffer. The vertex shader uses this offset to index directly to the data for the current vertex's cluster. The offset is simply the cluster ID times the stride of the constant array filled with CPCA data.
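The offset computation can be sketched as follows. This is a hedged CPU-side illustration, not the sample's code: the function name is hypothetical, and the stride is assumed to be measured in float4 registers of per-cluster CPCA data, matching the shader-side array size described later in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Turn the cluster IDs extracted by D3DXSHPRTCompExtractClusterIDs() into
// the per-vertex offsets stored in BLENDWEIGHT[0]. The offset lets the
// vertex shader index straight into the constant array:
// offset = clusterID * stride.
std::vector<float> ClusterIDsToOffsets(const std::vector<unsigned>& clusterIDs,
                                       std::size_t numPCAVectors,
                                       std::size_t numChannels)
{
    // Per-cluster stride in float4 registers: one float4 holding the
    // (mean dot light) results, then numChannels * (numPCAVectors / 4)
    // float4s of (basis dot light) results.
    const std::size_t stride = 1 + numChannels * (numPCAVectors / 4);
    std::vector<float> offsets(clusterIDs.size());
    for (std::size_t v = 0; v < clusterIDs.size(); ++v)
        offsets[v] = static_cast<float>(clusterIDs[v] * stride);
    return offsets;
}
```

For example, with 8 PCA vectors and 3 channels the stride is 7 registers, so a vertex in cluster 2 stores an offset of 14.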
Then the sample calls D3DXSHPRTCompExtractToMesh() with D3DDECLUSAGE_BLENDWEIGHT, 1 to tell the function to store the per-vertex PCA weights in the mesh starting at the semantic BLENDWEIGHT[1], continuing with BLENDWEIGHT[2] and so on until all the PCA weights have been written. Since the app defined BLENDWEIGHT indices 1 through 6 to be float4s, if there are 20 PCA vectors this function writes to BLENDWEIGHT indices 1 through 5. Note that these vertex elements do not need to be float4s; they only need to be signed. As the equation shows, to calculate the exit radiance in the shader you need not only the per-vertex compressed transfer vectors but also your lighting environment (also called source radiance) approximated using SH basis functions. D3DX provides a number of functions, such as D3DXSHEvalDirectionalLight() and D3DXSHAdd(), to make this step easy.
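The mapping from PCA vector count to occupied float4 semantics is simple ceiling division, sketched here with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Number of float4 BLENDWEIGHT semantics (starting at index 1) needed to
// hold the per-vertex PCA weights written by D3DXSHPRTCompExtractToMesh().
// With float4s at BLENDWEIGHT[1]..BLENDWEIGHT[6], the sample can hold up
// to 24 weights; 20 PCA vectors fill indices 1 through 5.
std::size_t Float4SlotsForPCAWeights(std::size_t numPCAVectors)
{
    return (numPCAVectors + 3) / 4; // ceiling division by 4
}
```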
The last pieces of data the sample needs from pCPCABuffer are the cluster means (M) and the PCA basis vectors (B). The sample stores this data in a large array of floats so that when the lights change it can reevaluate the lights and perform the M dot L and B dot L calculations. To do this it simply calls D3DXSHPRTCompExtractBasis(), which extracts the basis one cluster at a time. Each cluster's basis consists of a mean and PCA basis vectors, so the size of the array, m_aClusterBases, needed to store all of the cluster bases is:
NumClusters * (NumPCAVectors+1) * (order^2 * NumChannels)
Note that the "+1" is to store the cluster mean. Also note that since both (Mk dot L') and (Bkj dot L') are constant across a cluster, the sample calculates these values on the CPU and passes them as constants into the vertex shader; since wpj changes for each vertex, the sample stores this per-vertex data in the vertex buffer. Finally, CMyD3DApplication::CompressData() calls another helper function, CMyD3DApplication::EvalLightsAndSetConstants(), which evaluates the lights as described above using D3DXSHEvalDirectionalLight() and D3DXSHAdd() and calls CMyD3DApplication::SetShaderConstants(). This function uses the m_aClusterBases array and the source radiance to perform the M dot L' and B dot L' calculations from the equation above and stores the results in another, smaller float array, m_fClusteredPCA, of size:
NumClusters * (4 + MaxNumChannels * NumPCAVectors)
This array is passed directly to the vertex shader with Effect.SetValue(). Note that the vertex shader uses float4 since each register can hold 4 floats, so on the vertex shader side the array is of size:
NumClusters * (1 + MaxNumChannels * (NumPCAVectors / 4) )
Since MaxNumPCAVectors is restricted to a multiple of 4, this results in an integer. Also note that evaluating the lights and calculating and setting the constant table is fast enough to be done per frame, but as an optimization the sample only does this when the lights are moved. Now that the sample has extracted all the data it needs, it can render the scene using SH PRT with CPCA. The render loop uses the SHPRTVertex.fx technique called "PrecomputedSHLighting" to render the scene. This technique uses a vertex shader called "SHPRTDiffuseVS" which implements the exit radiance formula above.
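The two halves of this pipeline can be illustrated with a CPU-side sketch. This is an assumption-laden illustration, not the sample's actual code: the function names are hypothetical, each cluster basis is assumed to be laid out as the mean followed by the PCA vectors with channel-major coefficients, and the exit-radiance loop mirrors what SHPRTDiffuseVS does per vertex on the GPU. Only the array sizes come from the formulas above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static float Dot(const float* a, const float* b, std::size_t n)
{
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

// CPU side (what SetShaderConstants() computes): per cluster, the
// (Mk dot L') results padded to 4 floats, then numChannels * numPCA
// (Bkj dot L') results. clusterBases is sized
// numClusters * (numPCA + 1) * (order^2 * numChannels); sourceRadiance (L')
// holds order^2 coefficients per channel.
std::vector<float> ComputeClusteredConstants(const std::vector<float>& clusterBases,
                                             const std::vector<float>& sourceRadiance,
                                             std::size_t numClusters, std::size_t numPCA,
                                             std::size_t order, std::size_t numChannels)
{
    const std::size_t o2 = order * order;
    const std::size_t vecLen = o2 * numChannels;
    std::vector<float> result(numClusters * (4 + numChannels * numPCA), 0.0f);
    for (std::size_t c = 0; c < numClusters; ++c) {
        const float* basis = &clusterBases[c * (numPCA + 1) * vecLen];
        float* dst = &result[c * (4 + numChannels * numPCA)];
        for (std::size_t ch = 0; ch < numChannels; ++ch) {
            // Mk dot L' for this channel (the first vector is the mean)
            dst[ch] = Dot(basis + ch * o2, &sourceRadiance[ch * o2], o2);
            for (std::size_t j = 0; j < numPCA; ++j) {
                const float* pca = basis + (j + 1) * vecLen;
                // Bkj dot L' for this channel
                dst[4 + ch * numPCA + j] = Dot(pca + ch * o2, &sourceRadiance[ch * o2], o2);
            }
        }
    }
    return result;
}

// Per-vertex side (what SHPRTDiffuseVS evaluates):
// R = (Mk dot L') + sum_j wpj * (Bkj dot L'), per channel.
// clusterOffset is the vertex's BLENDWEIGHT[0] value, here taken in floats.
void ExitRadiance(const std::vector<float>& constants, std::size_t clusterOffset,
                  const std::vector<float>& pcaWeights, // wpj
                  std::size_t numChannels, float* rgbOut)
{
    const std::size_t numPCA = pcaWeights.size();
    for (std::size_t ch = 0; ch < numChannels; ++ch) {
        float r = constants[clusterOffset + ch]; // Mk dot L'
        for (std::size_t j = 0; j < numPCA; ++j)
            r += pcaWeights[j] * constants[clusterOffset + 4 + ch * numPCA + j];
        rgbOut[ch] = r;
    }
}
```

Splitting the work this way is the point of the technique: the dot products against L' are shared by every vertex in a cluster, so only the small weight sum runs per vertex.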
What are the limitations of PRT?

A major limitation of this technique is that it requires the mesh to be highly tessellated for accurate results, since it operates per vertex. The SHPRTPixel sample operates per texel, so it does not depend on the mesh being tessellated. However, one advantage the vertex-based technique has over the texture-based technique is that it can run on vs_1_1 hardware, while the texture technique requires ps_2_0 hardware. As a side note, since this technique uses low-order spherical harmonics, the lighting environment is assumed to be low frequency.

Also note that if you mix meshes that have subsurface scattering with ones that do not, you will likely need to scale the transfer coefficients for the subsurface-scattered mesh, since they are around 3x darker. With a single mesh you can simply scale the projected light coefficients. You can scale the transfer coefficients by calling D3DXSHPRTGetRawDataPointer() and scaling the data before compressing it.