Render States


The new XNA 4.0 render state system works well. It imposes the “explicitly set all state” paradigm, but makes setting every state easy and efficient. Moreover, if a state is set to the value that is already current, no CPU-GPU communication occurs. Consider this example:

for (int i = 0; i < 100000; i++)
{
    SystemInformation.Device.BlendState = BlendState.Opaque;
    SystemInformation.Device.DepthStencilState = DepthStencilState.None;
    SystemInformation.Device.BlendState = BlendState.Opaque;
    SystemInformation.Device.DepthStencilState = DepthStencilState.Default;
}

This runs at 700 frames per second.

for (int i = 0; i < 100000; i++)
{
    SystemInformation.Device.BlendState = BlendState.Opaque;
    SystemInformation.Device.DepthStencilState = DepthStencilState.None;
    SystemInformation.Device.BlendState = BlendState.AlphaBlend;
    SystemInformation.Device.DepthStencilState = DepthStencilState.Default;
}

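This code runs at 8 frames per second. The only change from the first loop is that the second BlendState assignment now uses AlphaBlend, so the blend state really changes twice per iteration.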
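The filtering behind these numbers can be sketched in a few lines. This is only a minimal model, assuming the check amounts to comparing the new object against the last one set; FakeState, FilteringDevice and RedundantStateDemo are hypothetical names, not XNA types, and the real XNA internals may differ.

```csharp
using System;

// Hypothetical stand-in for an immutable state object such as BlendState.Opaque.
public sealed class FakeState
{
    public string Name { get; private set; }
    public FakeState(string name) { Name = name; }
}

// Hypothetical device that forwards a state only when it differs from the current one.
public sealed class FilteringDevice
{
    private FakeState current;

    // Number of assignments that were not filtered out,
    // i.e. the ones that would actually reach the GPU.
    public int Submissions { get; private set; }

    public FakeState State
    {
        get { return current; }
        set
        {
            // Same object as the active one: nothing to send.
            if (ReferenceEquals(current, value)) return;
            current = value;
            Submissions++;
        }
    }
}

public static class RedundantStateDemo
{
    // Assigns the two given states in turn, `iterations` times, and
    // returns how many assignments passed the filter.
    public static int Run(FakeState first, FakeState second, int iterations)
    {
        var device = new FilteringDevice();
        for (int i = 0; i < iterations; i++)
        {
            device.State = first;
            device.State = second;
        }
        return device.Submissions;
    }
}
```

When the same object is passed for both arguments, only the very first assignment gets through, whatever the iteration count. With two different objects, every assignment gets through.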

Render States in XNA Final Engine


Most XNA Final Engine users will never need to set a render state, and the XNA render state objects already work well as they are, so no wrapper will be implemented.
FX render states won’t be used because they are problematic and slow. The XNA states are also more intuitive, except for the sampler states. However, for the sake of performance, even the sampler states will be set using XNA.

Shawn Hargreaves said:
"There are some bugs in XNA 4.0 with how we handle parameter validation for effects that change states after those effects have been cloned. You can work around these issues by just not setting any renderstates from your .fx files, and doing all your state setting from C# instead (which will also be a little faster than using .fx to set states)."

Render States in DX9 and DX10


It is important to know how DX9 and DX10 work, because DX9 GPUs may run a little slower with the default render state system. Moreover, XNA 4.0 was designed to work only on DX10 GPUs when Hi-Def mode is used, so DX9 GPU performance may not have been a design consideration (I am not sure, because Reach mode uses the same system).

Charybdis summarizes the render state management of both APIs:

“In the days of Direct3D 9, the runtime would shadow all the state information to implement Get() operations. This overhead was avoided by using a PURE device, but that also meant that the runtime could no longer reject redundant state settings so it was up to the application to do this efficiently. For all hardware, there is a significant validation overhead to ensure that a combination of state is valid for the GPU and won't cause it to hang the hardware or have some other bad result, and with Direct3D 9's "a hundred little toggles" model of state there was a lot of possible combinations of changes. The Direct3D 9 runtime implemented "dirty flags" to block the state changes into a smaller subset but ultimately any change of state required a revalidation to be done by the driver, typically the next time a draw made it through the runtime's command queue to the driver mode (which itself was batched to minimize the user/kernel mode transition cost).

The best practice for Direct3D 9 is to always use the PURE device, but make sure your application filters out redundant state and does any shadowing needed for 'get' operations.

All of this motivated the major design changes in Direct3D 10/11 around state management (See http://msdn.microsoft.com/en-us/library/bb205071.aspx). States are managed as explicit object sets to keep the number of things to check at runtime to a minimum, and the GPU is able to cache GPU-specific equivalent state objects at the driver level for faster state changes. All the costly validation is handled at creation time for the state objects (and most other resources). The Direct3D 10/11 runtime does a little work at creation time to try to coalesce redundant state objects because there is a limited number of slots in the driver’s GPU state cache (the API supports up to 4096 ‘live’ state objects for each type), and it is trivial to reject redundant state sets when given the same object as is currently active. The challenge with Direct3D 10/11 is not dealing with redundant state or avoiding get operations. It is to understand the full combinations of state in your application to ensure good reuse and to front-load the creation of all these state objects you need for the application.

Which particular sort order is optimal for your draw operations will vary by scenario, API, and other factors. The ideal is to have all given state/shader/texture combinations set only once, and all geometry that needs those settings be drawn in as few draw calls as possible. This ideal is not easy to achieve, so you have to use various heuristics to find the best approximation that suits your scene. For Direct3D 10.x/11, state objects eliminate basic state changes as a major source of slowdown, so you can focus on sorting by shaders, resources, etc. For Direct3D 9, eliminating small draw batch sizes remains the primary way to get good performance.”
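The ideal described in the last paragraph of the quote, each state/shader/texture combination set only once, can be approximated by sorting draw items before submitting them. The following is only a sketch under stated assumptions: DrawItem and DrawSorting are hypothetical names, strings stand in for real state objects, effects and textures, and the state-then-shader-then-texture order is an arbitrary choice, since the quote notes that the best order depends on the scenario.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical draw item: the strings stand in for real state objects, effects and textures.
public sealed class DrawItem
{
    public string State { get; private set; }
    public string Shader { get; private set; }
    public string Texture { get; private set; }

    public DrawItem(string state, string shader, string texture)
    {
        State = state;
        Shader = shader;
        Texture = texture;
    }
}

public static class DrawSorting
{
    // Orders items so that equal state/shader/texture combinations end up adjacent.
    public static List<DrawItem> SortByState(IEnumerable<DrawItem> items)
    {
        return items
            .OrderBy(d => d.State, StringComparer.Ordinal)
            .ThenBy(d => d.Shader, StringComparer.Ordinal)
            .ThenBy(d => d.Texture, StringComparer.Ordinal)
            .ToList();
    }

    // Counts how often the combination changes while walking the list in order.
    // The first item always counts as one change.
    public static int CountChanges(IList<DrawItem> items)
    {
        int changes = 0;
        DrawItem previous = null;
        foreach (DrawItem item in items)
        {
            if (previous == null
                || previous.State != item.State
                || previous.Shader != item.Shader
                || previous.Texture != item.Texture)
            {
                changes++;
            }
            previous = item;
        }
        return changes;
    }
}
```

Comparing CountChanges on a list before and after SortByState gives a rough measure of how many state switches the sort saves for a given scene.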

Michael Brown (a colleague) uses Hi-Def mode on his DX9 GPU and found that the XNA 4.0 render state setting is not efficient there, so he uses the FX render state setting instead. I believe it is far more important to be DX10 and Xbox 360 friendly, and I will continue using the XNA 4.0 alternative. However, if DX9 GPU performance is critical, you may need to evaluate the FX render state setting.

References


http://blogs.msdn.com/b/shawnhar/archive/2010/04/02/state-objects-in-xna-game-studio-4-0.aspx
http://forums.create.msdn.com/forums/p/72882/445994.aspx
Shanon Drone, Windows to Reality - Getting the Most out of Direct3D 10 Graphics in your Games, Gamefest 2007.


Last edited Jan 22, 2013 at 6:59 PM by jischneider, version 22
