In this recipe we will explore using a compute shader to gather characteristics from the source image and output them to a buffer. The characteristic that we will determine is the image's luminance histogram, that is, how many texels within the texture have each luminance value (mapped from 0.0-1.0 to 0-255).
We will also cover how to retrieve the data from the GPU and load it into an array that is accessible from the CPU.
We'll begin with the HLSL code necessary to calculate the histogram.
The input is a Texture2D SRV; however, this time our output UAV will be an RWByteAddressBuffer.
Texture2D<float4> input : register(t0);
RWByteAddressBuffer outputByteBuffer : register(u0);
#define THREADSX 32
#define THREADSY 32
// used for RGB/sRGB color models
#define LUMINANCE_RGB float3(0.2125, 0.7154, 0.0721)
#define LUMINANCE(_V) dot(_V.rgb, LUMINANCE_RGB)
// Calculate the luminance histogram of the input
// Output to outputByteBuffer
[numthreads(THREADSX, THREADSY, 1)]
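void HistogramCS(uint groupIndex : SV_GroupIndex,
    uint3 groupId : SV_GroupID,
    uint3 groupThreadId : SV_GroupThreadID,
    uint3 dispatchThreadId : SV_DispatchThreadID)
{
    // Skip threads that fall outside the texture; out-of-bounds
    // reads return 0 and would otherwise be counted as black
    uint width, height;
    input.GetDimensions(width, height);
    if (dispatchThreadId.x >= width || dispatchThreadId.y >= height)
        return;

    float4 sample = input[dispatchThreadId.xy];
    // Calculate the relative luminance (and map to 0-255)
    float luminance = LUMINANCE(sample.xyz) * 255.0;
    // Addressable as bytes, x4 to store 32-bit integers
    // Atomic increment of value at address.
    outputByteBuffer.InterlockedAdd((uint)luminance * 4, 1);
}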
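To check the GPU output, it can help to have a CPU reference that applies the same mapping as the shader. The following is a minimal sketch only: the class name `LuminanceHistogramReference` and the interleaved `float[]` pixel layout are assumptions made for illustration, not part of the recipe.

```csharp
using System;

public static class LuminanceHistogramReference
{
    // Same weights as LUMINANCE_RGB in the shader
    const float Wr = 0.2125f, Wg = 0.7154f, Wb = 0.0721f;

    // rgb holds interleaved R, G, B values in the 0.0-1.0 range
    // (hypothetical layout chosen for this sketch)
    public static int[] Compute(float[] rgb)
    {
        var bins = new int[256];
        for (int i = 0; i + 2 < rgb.Length; i += 3)
        {
            float luminance =
                (rgb[i] * Wr + rgb[i + 1] * Wg + rgb[i + 2] * Wb) * 255.0f;
            // Truncate like the (uint) cast in the shader; the clamp is
            // a defensive addition for values outside 0.0-1.0
            int bin = Math.Min(255, Math.Max(0, (int)luminance));
            bins[bin]++;
        }
        return bins;
    }

    // Byte offset of a bin within the raw buffer:
    // four bytes per 32-bit counter
    public static int ByteOffset(int bin)
    {
        return bin * 4;
    }
}
```

Comparing the array returned by `Compute` with the values read back from the GPU is one way to spot mistakes in addressing or dispatch sizes.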
var histogramResult = new SharpDX.Direct3D11.Buffer(device,
    new BufferDescription
    {
        BindFlags = BindFlags.UnorderedAccess,
        CpuAccessFlags = CpuAccessFlags.None,
        OptionFlags = ResourceOptionFlags.BufferAllowRawViews,
        Usage = ResourceUsage.Default,
        SizeInBytes = 256 * 4,
        StructureByteStride = 4
    });
histogramResult.DebugName = "Histogram Result";
var histogramUAV = CreateBufferUAV(device, histogramResult);

// Create resource that can be read from the CPU for
// retrieving the histogram results
var cpuReadDesc = histogramResult.Description;
cpuReadDesc.OptionFlags = ResourceOptionFlags.None;
cpuReadDesc.BindFlags = BindFlags.None;
cpuReadDesc.CpuAccessFlags = CpuAccessFlags.Read;
cpuReadDesc.Usage = ResourceUsage.Staging;
var histogramCPU = new Buffer(device, cpuReadDesc);
histogramCPU.DebugName = "Histogram Result (CPU)";
The CreateBufferUAV function used above is defined as follows.
public static UnorderedAccessView CreateBufferUAV(
    SharpDX.Direct3D11.Device device,
    SharpDX.Direct3D11.Buffer buffer)
{
    UnorderedAccessViewDescription uavDesc =
        new UnorderedAccessViewDescription
        {
            Dimension = UnorderedAccessViewDimension.Buffer,
            Buffer = new UnorderedAccessViewDescription.BufferResource
            {
                FirstElement = 0
            }
        };
    // If a raw buffer
    if ((buffer.Description.OptionFlags &
        ResourceOptionFlags.BufferAllowRawViews) ==
        ResourceOptionFlags.BufferAllowRawViews)
    {
        // A raw buffer requires R32_Typeless
        uavDesc.Format = Format.R32_Typeless;
        uavDesc.Buffer.Flags = UnorderedAccessViewBufferFlags.Raw;
        uavDesc.Buffer.ElementCount =
            buffer.Description.SizeInBytes / 4;
    }
    // else if a structured buffer
    else if ((buffer.Description.OptionFlags &
        ResourceOptionFlags.BufferStructured) ==
        ResourceOptionFlags.BufferStructured)
    {
        uavDesc.Format = Format.Unknown;
        uavDesc.Buffer.ElementCount =
            buffer.Description.SizeInBytes /
            buffer.Description.StructureByteStride;
    }
    else
    {
        throw new ArgumentException(
            "Buffer must be raw or structured", "buffer");
    }
    // Create the UAV for this buffer
    return new UnorderedAccessView(device, buffer, uavDesc);
}
Next, we compile and dispatch the HistogramCS shader code.
// Firstly clear the target UAV otherwise the value will
// accumulate between calls.
context.ClearUnorderedAccessView(histogramUAV, Int4.Zero);

// Load the image to process (this could be any compatible
// SRV).
var srcTextureSRV = ShaderResourceView.FromFile(device, "Village.png");
var srcTexture = srcTextureSRV.ResourceAs<Texture2D>();
var desc = srcTexture.Description;

// Compile the shaders
using (var bytecode = ShaderBytecode.Compile(hlslCode, "HistogramCS", "cs_5_0"))
using (var cs = new ComputeShader(device, bytecode))
{
    // The source resource is the original image
    context.ComputeShader.SetShaderResource(0, srcTextureSRV);
    // The destination resource is the histogramResult
    context.ComputeShader.SetUnorderedAccessView(0, histogramUAV);
    // Run the histogram shader
    context.ComputeShader.Set(cs);
    // One thread group per 32x32 block of texels; the divisors
    // must match THREADSX and THREADSY in the shader
    context.Dispatch(
        (int)Math.Ceiling(desc.Width / 32.0),
        (int)Math.Ceiling(desc.Height / 32.0),
        1);
    // Set the compute shader stage SRV and UAV to null
    context.ComputeShader.SetShaderResource(0, null);
    context.ComputeShader.SetUnorderedAccessView(0, null);
    ...SNIP
}
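The thread group counts passed to Dispatch follow from the image size and the [numthreads] declaration in the shader: each dimension is divided by the group size and rounded up so that partial blocks at the edges are still covered. A small sketch of that arithmetic, where the helper name `DispatchMath.GroupCount` is hypothetical:

```csharp
public static class DispatchMath
{
    // Number of thread groups needed to cover 'texels' along one
    // dimension when each group processes 'threadsPerGroup' texels.
    // Integer ceiling division avoids a round trip through double.
    public static int GroupCount(int texels, int threadsPerGroup)
    {
        return (texels + threadsPerGroup - 1) / threadsPerGroup;
    }
}
```

For a shader declared with 32 x 32 threads per group, an assumed 1920 x 1080 image would need `GroupCount(1920, 32)` by `GroupCount(1080, 32)` groups, with the last row of groups only partially covering the image.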
// Copy the result into our CPU accessible resource
context.CopyResource(histogramResult, histogramCPU);
// Retrieve histogram from GPU into int array
try
{
    var databox = context.MapSubresource(histogramCPU, 0,
        MapMode.Read, SharpDX.Direct3D11.MapFlags.None);
    int[] intArray = new int[databox.RowPitch / sizeof(int)];
    System.Runtime.InteropServices.Marshal.Copy(
        databox.DataPointer, intArray, 0, intArray.Length);
    // intArray now contains the histogram data,
    // alternatively access databox.DataPointer directly.
    // MapSubresource has a number of overloads, one of
    // which provides a DataStream.
}
finally
{
    // We must unmap the subresource so it can be used
    // within the graphics pipeline again
    context.UnmapSubresource(histogramCPU, 0);
}
The result of running the HistogramCS compute shader over the Village.png image is shown in the following chart:
We have already covered the calculation of the relative luminance itself; however, we now map the normalized luminance value to the 0-255 range. To determine the luminance histogram, we count how many texels there are within the source image at each relative luminance level.
We have done this by mapping an unstructured (raw) buffer to a byte address UAV as the output of the histogram shader. We then use the intrinsic InterlockedAdd method of the UAV to increment the appropriate index within the buffer for each texel based on its relative luminance. For example, a luminance of 255 (white) will result in the equivalent of output[255]++;, and a relative luminance of 127 (gray) will result in output[127]++;.
We have created a reusable function to create the UAV from a buffer. This simply determines whether the buffer is a structured or raw buffer, and creates the UAV description accordingly with the appropriate format and element count. The element count is the buffer size divided by the relevant byte stride: the size of a uint (4 bytes) for a raw buffer, or the value of the buffer.Description.StructureByteStride property for a structured buffer.
The interlocked methods on the RWByteAddressBuffer UAV allow multiple threads to safely update the same location in the buffer. Without them, concurrent read-modify-write operations on the same address from different threads would race and counts would be lost. The available interlocked operations are InterlockedAdd, InterlockedAnd, InterlockedCompareExchange, InterlockedCompareStore, InterlockedExchange, InterlockedMax, InterlockedMin, InterlockedOr, and InterlockedXor.
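The role of the atomic increment can be illustrated on the CPU, where many worker threads updating shared counters face the same problem. The sketch below is an analogy only, not GPU code: it uses `System.Threading.Interlocked` with `Parallel.For`, and the class name `AtomicHistogramDemo` and the precomputed `byte[]` luminance input are assumptions for illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class AtomicHistogramDemo
{
    // Builds a 256-bin histogram from precomputed 0-255 luminance
    // values, with every iteration free to run on a different thread.
    public static int[] Build(byte[] luminance)
    {
        var bins = new int[256];
        Parallel.For(0, luminance.Length, i =>
        {
            // Atomic read-modify-write, playing the same role as
            // InterlockedAdd(address, 1) in the shader. A plain
            // bins[...]++ here could lose increments when two
            // threads hit the same bin at the same time.
            Interlocked.Increment(ref bins[luminance[i]]);
        });
        return bins;
    }
}
```

Because every increment is atomic, the bin totals always sum to the number of input values regardless of how the work is scheduled.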
Once we have executed the shader function, we copy the result from the GPU histogramResult buffer into the histogramCPU resource that is accessible from the CPU. In order to be able to read the resource from the CPU, we have created the resource with the following settings:
cpuReadDesc.CpuAccessFlags = CpuAccessFlags.Read;
cpuReadDesc.Usage = ResourceUsage.Staging;
Once the result has been copied to the CPU accessible resource, we can map it to a system memory location and read the data for whatever purpose we need. Transferring data from the GPU to the CPU is slow, and mapping the subresource can stall until the GPU has finished the work that produces it. In C#, copying the mapped data into a managed array (as done above with Marshal.Copy) adds an extra memory copy, which can be avoided by reading from the mapped pointer directly. If the resource is correctly protected from further use, the actual reading of the mapped data could occur on another thread, but care must be taken: the device context is not thread-safe, so unmapping the resource must be synchronized with any other use of the context.
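As a rough sketch of reading the mapped data in place rather than copying it into a managed array first, the counters can be read one at a time through the pointer. The class `MappedHistogramReader` is hypothetical; in the recipe the pointer would be `databox.DataPointer`, and it is only valid until the subresource is unmapped.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MappedHistogramReader
{
    // Returns the index of the bin with the highest count, reading
    // 32-bit counters directly from unmanaged memory at 'pointer'.
    public static int PeakBin(IntPtr pointer, int binCount)
    {
        int peak = 0;
        int peakValue = int.MinValue;
        for (int i = 0; i < binCount; i++)
        {
            int value = Marshal.ReadInt32(pointer, i * sizeof(int));
            if (value > peakValue)
            {
                peakValue = value;
                peak = i;
            }
        }
        return peak;
    }

    // Sums all counters; for a complete histogram this should equal
    // the number of texels that were processed.
    public static long Total(IntPtr pointer, int binCount)
    {
        long total = 0;
        for (int i = 0; i < binCount; i++)
        {
            total += Marshal.ReadInt32(pointer, i * sizeof(int));
        }
        return total;
    }
}
```

Whether this is faster than a single bulk copy depends on how much of the data is needed; for 256 counters either approach is cheap compared with the GPU to CPU transfer itself.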
It might be tempting to try to use the group-shared memory for the histogram calculation; however, our threads potentially need to write to any address and a thread is only allowed to write to its own region of the group memory without synchronization. Any thread synchronization would most likely defeat any potential performance gains. Reading from the same location in shared memory across multiple threads is allowed.