Calculating an image's luminance histogram

In this recipe we will use a compute shader to gather characteristics of a source image and write them to a buffer. The characteristic we will determine is the image's luminance histogram, that is, how many texels in the texture have each luminance value (mapped from 0.0-1.0 to 0-255).

We will also cover how to retrieve the data from the GPU and load it into an array that is accessible from the CPU.

How to do it…

We'll begin with the HLSL code necessary to calculate the histogram.

  1. The input continues to be a Texture2D SRV; however, this time our output UAV will be an RWByteAddressBuffer.
    Texture2D<float4> input : register(t0);
    RWByteAddressBuffer outputByteBuffer : register(u0);
    #define THREADSX 32
    #define THREADSY 32
    // used for RGB/sRGB color models
    #define LUMINANCE_RGB float3(0.2125, 0.7154, 0.0721)
    #define LUMINANCE(_V) dot(_V.rgb, LUMINANCE_RGB)
  2. Our actual compute shader is quite simple:
    // Calculate the luminance histogram of the input
    // Output to outputByteBuffer
    [numthreads(THREADSX, THREADSY, 1)]
    void HistogramCS(uint groupIndex : SV_GroupIndex,
        uint3 groupId : SV_GroupID,
        uint3 groupThreadId : SV_GroupThreadID,
        uint3 dispatchThreadId : SV_DispatchThreadID)
    {
        float4 sample = input[dispatchThreadId.xy];
        // Calculate the Relative luminance (and map to 0-255)
        float luminance = LUMINANCE(sample.xyz) * 255.0;
        
        // Addressable as bytes, x4 to store 32-bit integers
        // Atomic increment of value at address.
        outputByteBuffer.InterlockedAdd((uint)luminance * 4, 
            1);
    }
  3. In order to interact with this compute shader, we need to prepare a buffer to store the results. Note that we also create a second buffer that is accessible from the CPU. The two properties that make this second buffer readable by the CPU are CpuAccessFlags.Read and ResourceUsage.Staging.
    var histogramResult = new SharpDX.Direct3D11.Buffer(device, new BufferDescription
    {
        BindFlags = BindFlags.UnorderedAccess,
        CpuAccessFlags = CpuAccessFlags.None,
        OptionFlags = ResourceOptionFlags.BufferAllowRawViews,
        Usage = ResourceUsage.Default,
        SizeInBytes = 256 * 4,
        StructureByteStride = 4
    });
    histogramResult.DebugName = "Histogram Result";
    
    var histogramUAV = CreateBufferUAV(device, histogramResult);
    // Create resource that can be read from the CPU for 
    // retrieving the histogram results
    var cpuReadDesc = histogramResult.Description;
    cpuReadDesc.OptionFlags = ResourceOptionFlags.None;
    cpuReadDesc.BindFlags = BindFlags.None;
    cpuReadDesc.CpuAccessFlags = CpuAccessFlags.Read;
    cpuReadDesc.Usage = ResourceUsage.Staging;
    var histogramCPU = new SharpDX.Direct3D11.Buffer(device, cpuReadDesc);
    histogramCPU.DebugName = "Histogram Result (CPU)";
  4. We will wrap the logic to create the buffer's UAV into a reusable function called CreateBufferUAV.
    public static UnorderedAccessView CreateBufferUAV(SharpDX.Direct3D11.Device device, SharpDX.Direct3D11.Buffer buffer)
    {
        UnorderedAccessViewDescription uavDesc = new UnorderedAccessViewDescription
        {
            Dimension = UnorderedAccessViewDimension.Buffer,
            Buffer = new UnorderedAccessViewDescription
                .BufferResource { FirstElement = 0 }
        };
        // If a raw buffer
        if ((buffer.Description.OptionFlags & 
            ResourceOptionFlags.BufferAllowRawViews) == 
            ResourceOptionFlags.BufferAllowRawViews)
        {
            // A raw buffer requires R32_Typeless
            uavDesc.Format = Format.R32_Typeless;
            uavDesc.Buffer.Flags = 
                UnorderedAccessViewBufferFlags.Raw;
            uavDesc.Buffer.ElementCount = 
                buffer.Description.SizeInBytes / 4;
        }
        // else if a structured buffer
        else if ((buffer.Description.OptionFlags & 
            ResourceOptionFlags.BufferStructured) == 
            ResourceOptionFlags.BufferStructured)
        {
            uavDesc.Format = Format.Unknown;
            uavDesc.Buffer.ElementCount = 
                buffer.Description.SizeInBytes / 
                buffer.Description.StructureByteStride;
        } else { 
            throw new ArgumentException("Buffer must be raw or structured", "buffer"); 
        }
        // Create the UAV for this buffer
        return new UnorderedAccessView(device, buffer, 
            uavDesc);
    }
  5. With the output resources in place, we can continue to load the image, and run with the previous HistogramCS shader code.
    // Firstly clear the target UAV otherwise the value will 
    // accumulate between calls.
    context.ClearUnorderedAccessView(histogramUAV, Int4.Zero);
    // Load the image to process (this could be any compatible 
    // SRV).
    var srcTextureSRV = ShaderResourceView.FromFile(device, 
        "Village.png");
    var srcTexture = srcTextureSRV.ResourceAs<Texture2D>();
    var desc = srcTexture.Description;
    // Compile the shaders
    using (var bytecode = ShaderBytecode.Compile(hlslCode, 
        "HistogramCS", "cs_5_0"))
    using (var cs = new ComputeShader(device, bytecode))
    {
        // The source resource is the original image
        context.ComputeShader.SetShaderResource(0, 
            srcTextureSRV);
        // The destination resource is the histogramResult
        context.ComputeShader.SetUnorderedAccessView(0,
            histogramUAV);
        // Run the histogram shader
        context.ComputeShader.Set(cs);
        // The divisors must match THREADSX and THREADSY (32 x 32)
        context.Dispatch((int)Math.Ceiling(desc.Width / 32.0), 
            (int)Math.Ceiling(desc.Height / 32.0), 1);
    
        // Set the compute shader stage SRV and UAV to null
        context.ComputeShader.SetShaderResource(0, null);
        context.ComputeShader.SetUnorderedAccessView(0, null);
    ...SNIP
    }
  6. Lastly, we copy the result into our CPU accessible resource and then load this into an array.
    // Copy the result into our CPU accessible resource
    context.CopyResource(histogramResult, histogramCPU);
    // Retrieve histogram from GPU into int array
    try
    {
        var databox = context.MapSubresource(histogramCPU, 0, 
            MapMode.Read, SharpDX.Direct3D11.MapFlags.None);
        // One 32-bit counter per luminance level (256 in total)
        int[] intArray = new int[
            histogramCPU.Description.SizeInBytes / sizeof(int)];
        System.Runtime.InteropServices.Marshal.Copy(
            databox.DataPointer, intArray, 0, intArray.Length);
        // intArray now contains the histogram data, 
        // alternatively access databox.DataPointer directly.
        // MapSubresource has a number of overloads; one 
        // provides a DataStream.
    }
    finally
    {
        // We must unmap the subresource so it can be used
        // within the graphics pipeline again
        context.UnmapSubresource(histogramCPU, 0);
    }
  7. The result of running the HistogramCS compute shader over the Village.png image is shown in the following chart:
    Luminance histogram result exported to a chart
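
The number of thread groups passed to Dispatch in step 5 depends on the thread group size declared in the shader. The following is a minimal sketch of that calculation using integer ceiling division. The DispatchSize class and its members are hypothetical names introduced for illustration, and the 32 x 32 group size is assumed to match THREADSX and THREADSY.

```csharp
using System;

// Hypothetical helper, not part of SharpDX or the recipe's code.
public static class DispatchSize
{
    // Assumed to match THREADSX and THREADSY in the HLSL code.
    public const int ThreadsX = 32;
    public const int ThreadsY = 32;

    // Integer ceiling division: the smallest number of groups whose
    // threads cover 'size' texels along one axis.
    public static int GroupCount(int size, int threadsPerGroup)
    {
        if (size < 0)
            throw new ArgumentOutOfRangeException("size");
        if (threadsPerGroup <= 0)
            throw new ArgumentOutOfRangeException("threadsPerGroup");
        return (size + threadsPerGroup - 1) / threadsPerGroup;
    }
}
```

With this helper the dispatch would read context.Dispatch(DispatchSize.GroupCount(desc.Width, DispatchSize.ThreadsX), DispatchSize.GroupCount(desc.Height, DispatchSize.ThreadsY), 1). When the image size is not a multiple of the group size, some threads fall outside the texture. Out-of-bounds texture reads return zero in Direct3D 11, so those threads would most likely be counted as black texels unless the shader skips them.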

How it works…

We have already covered the calculation of the relative luminance itself; however, we now map the normalized luminance value to the 0-255 range. To determine the luminance histogram, we count how many texels there are within the source image at each relative luminance level.

We do this by binding a raw (unstructured) buffer as a byte address UAV that receives the output of the histogram shader. We then use the UAV's intrinsic InterlockedAdd method to increment the appropriate 32-bit counter within the buffer for each texel, based on its relative luminance. Because the buffer is addressed in bytes, the counter for luminance level n is located at byte offset n * 4. For example, a luminance of 255 (white) results in the equivalent of output[255]++;, and a relative luminance of 127 (gray) results in output[127]++;.
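
To check the GPU result, the same counting can be done on the CPU. The following is a minimal sketch, not the recipe's own code: the LuminanceHistogramReference class and the interleaved float[] input layout are assumptions made for illustration, and Interlocked.Increment stands in for the shader's InterlockedAdd.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical CPU reference for comparison with the shader output.
public static class LuminanceHistogramReference
{
    // Same weights as LUMINANCE_RGB in the shader.
    const float WeightR = 0.2125f, WeightG = 0.7154f, WeightB = 0.0721f;

    // Maps a normalized RGB texel to a bin index, truncating like the
    // (uint) cast in the shader. The clamp is an addition of this
    // sketch to guard against input outside 0.0-1.0.
    public static int Bin(float r, float g, float b)
    {
        float luminance = (r * WeightR + g * WeightG + b * WeightB) * 255.0f;
        return Math.Min(255, Math.Max(0, (int)luminance));
    }

    // rgb is assumed to hold interleaved r, g, b values, three per texel.
    public static int[] Compute(float[] rgb)
    {
        var histogram = new int[256];
        Parallel.For(0, rgb.Length / 3, i =>
        {
            int bin = Bin(rgb[i * 3], rgb[i * 3 + 1], rgb[i * 3 + 2]);
            // Atomic increment: the CPU analogue of
            // outputByteBuffer.InterlockedAdd(bin * 4, 1)
            Interlocked.Increment(ref histogram[bin]);
        });
        return histogram;
    }
}
```

The counts in the returned array should sum to the number of texels processed, which is also a quick sanity check for the array read back from the GPU.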

Note

The more threads there are, the more often their interlocked operations collide on the same address. By processing several pixels within a single thread, we can reduce the number of threads required, although this needs to be balanced with having enough threads to make effective use of the available hardware.

We have created a reusable function to create the UAV from a buffer. It determines whether the buffer is raw or structured and creates the UAV description accordingly, calculating the element count from the buffer size and the relevant byte stride (the size of a uint for a raw buffer, or the value of the buffer.Description.StructureByteStride property for a structured buffer).
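
The element count arithmetic used by CreateBufferUAV can be isolated as follows. This is only a sketch of the two divisions; the UavElementCount class and its method names are hypothetical and do not exist in SharpDX.

```csharp
using System;

// Hypothetical helper that mirrors the two branches of CreateBufferUAV.
public static class UavElementCount
{
    // A raw view addresses the buffer as 32-bit (4-byte) elements.
    public static int ForRawBuffer(int sizeInBytes)
    {
        return sizeInBytes / 4;
    }

    // A structured view uses the stride declared on the buffer.
    public static int ForStructuredBuffer(int sizeInBytes, int structureByteStride)
    {
        if (structureByteStride <= 0)
            throw new ArgumentOutOfRangeException("structureByteStride");
        return sizeInBytes / structureByteStride;
    }
}
```

For the histogram buffer in this recipe (256 * 4 bytes, raw), the element count is 256, one element per luminance level.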

The interlocked methods on the RWByteAddressBuffer UAV allow multiple threads to safely update the same address in the buffer. Usually, a compute shader thread only writes to addresses reserved for that thread. The range of interlocked operations includes Add, And, CompareExchange, CompareStore, Exchange, Max, Min, Or, and Xor (exposed as methods such as InterlockedAdd and InterlockedMax).

Once we have executed the shader function, we copy the result from the GPU histogramResult buffer into the histogramCPU resource that is accessible from the CPU. In order to be able to read the resource from the CPU, we have created the resource with the following settings:

cpuReadDesc.CpuAccessFlags = CpuAccessFlags.Read;
cpuReadDesc.Usage = ResourceUsage.Staging;

Once the result has been copied to the CPU-accessible resource, we can map it to a system memory location and read the data for whatever purpose we need. Transferring data from the GPU to the CPU is slow, and mapping the subresource can stall until the GPU has finished with it. In C#, careless handling can add further overhead in the form of an extra memory copy, for example, copying the mapped data into a managed array instead of reading it in place. If the resource is correctly protected from further use, the mapped data could be read on another thread, but care must be taken: the unmapping of the resource must be done in a thread-safe manner for the device context.
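
The two ways of reading mapped memory mentioned above (copying into a managed array, or reading in place) can be sketched without a GPU. In the following example the MappedMemoryReader class is a hypothetical name, and a block from Marshal.AllocHGlobal stands in for databox.DataPointer; only the Marshal calls themselves are real .NET APIs.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; 'source' plays the role of databox.DataPointer.
public static class MappedMemoryReader
{
    // Copies 'count' 32-bit integers into a new managed array.
    // This is the extra memory copy referred to in the text.
    public static int[] CopyToArray(IntPtr source, int count)
    {
        var result = new int[count];
        Marshal.Copy(source, result, 0, count);
        return result;
    }

    // Reads a single counter in place, without an intermediate array.
    public static int ReadBin(IntPtr source, int bin)
    {
        return Marshal.ReadInt32(source, bin * sizeof(int));
    }

    // Fills unmanaged memory with known values so the sketch can run
    // without a device; a real caller would pass the mapped pointer.
    public static int[] Demo()
    {
        const int bins = 256;
        IntPtr fake = Marshal.AllocHGlobal(bins * sizeof(int));
        try
        {
            for (int i = 0; i < bins; i++)
                Marshal.WriteInt32(fake, i * sizeof(int), i * 2);
            return CopyToArray(fake, bins);
        }
        finally
        {
            // Analogous to UnmapSubresource: always release the memory
            Marshal.FreeHGlobal(fake);
        }
    }
}
```

Reading in place avoids the copy but is only valid while the resource remains mapped, so any value needed after UnmapSubresource has to be copied out first.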

There's more…

It might be tempting to try to use the group-shared memory for the histogram calculation; however, our threads potentially need to write to any address and a thread is only allowed to write to its own region of the group memory without synchronization. Any thread synchronization would most likely defeat any potential performance gains. Reading from the same location in shared memory across multiple threads is allowed.
