Profiling rendering issues

The Profiler can quickly narrow down which of the two devices in the Rendering Pipeline is the bottleneck: the CPU or the GPU. We must examine both the CPU Usage and GPU Usage Areas of the Profiler window, as comparing them tells us which device is working the hardest.

The following screenshot shows Profiler data for a CPU-bound application. The test involved creating thousands of simple cube objects with no batching or shadowing techniques applied. This produced an extremely large number of draw calls (around 32,000) for the CPU to generate commands for, while giving the GPU relatively little work to do because the objects being rendered are so simple:

This example shows that the CPU's Rendering task is consuming a large amount of time (around 25 milliseconds per frame), whereas the GPU is processing for less than 4 milliseconds, indicating that the bottleneck resides in the CPU. Note that this profiling test was performed against a standalone app, not within the Editor. We now know that our rendering is CPU-bound and can begin to apply some CPU-saving performance improvements (being careful not to introduce rendering bottlenecks elsewhere by doing so).
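The comparison above amounts to a simple rule of thumb, sketched below in Python. This is not Unity code: the function name `classify_bottleneck` and the 1.5x margin are hypothetical choices made for illustration, and the timings are the approximate figures quoted for this test.

```python
def classify_bottleneck(cpu_ms: float, gpu_ms: float, margin: float = 1.5) -> str:
    """Report which device dominates the frame time.

    `margin` is an assumed threshold: one device must take at least
    `margin` times as long as the other before we call it the bottleneck.
    """
    if cpu_ms >= gpu_ms * margin:
        return "CPU-bound"
    if gpu_ms >= cpu_ms * margin:
        return "GPU-bound"
    # Timings are too close to decide from totals alone.
    return "unclear"


# Approximate per-frame timings from the cube test described above.
result = classify_bottleneck(cpu_ms=25.0, gpu_ms=4.0)
print(result)
```

When the two totals are close, a rule like this cannot decide on its own, and the breakdown of the CPU time has to be inspected instead.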

Meanwhile, profiling a GPU-bound application via the Profiler is a little trickier. This time, the test involves creating a simple object requiring minimal draw calls, but using a very expensive shader that samples a texture thousands of times to create an absurd amount of activity in the backend.

To perform fair GPU-bound profiling tests, disable vertical sync by setting Edit | Project Settings | Quality | Other | V Sync Count to Don't Sync; otherwise, it is likely to pollute the data.

The following screenshot shows Profiler data for this test when it is run in a standalone application:

As we can see in the preceding screenshot, the rendering task of the CPU Usage Area matches closely with the total rendering costs of the GPU Usage Area. We can also see that the CPU and GPU time costs at the bottom of the image are relatively similar (about 29 milliseconds each). This is somewhat confusing, as we seem to be bottlenecked equally in both devices, whereas we would expect the GPU to be working much harder than the CPU.

However, if we drill down into the Breakdown View of the CPU Usage Area using Hierarchy Mode, we will see that most of the CPU time is spent on the task labeled Gfx.WaitForPresent. This is the time the CPU spends idle while it waits for the GPU to finish the current frame. Hence, we are in fact bottlenecked by the GPU, despite appearing to be bound by both. Even if Multithreaded Rendering is enabled, the CPU must still wait for the Rendering Pipeline to finish before it can begin the next frame.

Gfx.WaitForPresent is also used to signal that the CPU is waiting on Vertical Sync to complete, hence the need to disable it for this test.
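To make the reasoning concrete, the following Python sketch subtracts the time spent in Gfx.WaitForPresent from the CPU total before comparing the two devices. It is an illustration only: the helper `effective_cpu_ms` is hypothetical, and the 22 ms wait figure is an assumed value standing in for "most of the CPU time", since the exact number is not given here. It also assumes vertical sync is disabled, so that the wait is attributable to the GPU.

```python
def effective_cpu_ms(cpu_total_ms: float, wait_for_present_ms: float) -> float:
    """CPU time spent on real work, excluding time stalled waiting on the GPU."""
    return max(cpu_total_ms - wait_for_present_ms, 0.0)


cpu_total_ms = 29.0   # CPU total reported for the frame (approximate)
gpu_total_ms = 29.0   # GPU total reported for the frame (approximate)
wait_ms = 22.0        # ASSUMED time inside Gfx.WaitForPresent, not a measured value

busy_ms = effective_cpu_ms(cpu_total_ms, wait_ms)
bound = "GPU-bound" if gpu_total_ms > busy_ms else "CPU-bound"
print(f"CPU busy: {busy_ms:.1f} ms, GPU: {gpu_total_ms:.1f} ms -> {bound}")
```

Once the idle time is removed, the CPU's share of real work falls well below the GPU's, which matches the conclusion drawn from the Hierarchy Mode breakdown.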