Chapter 6: Physically Based Rendering Using the glTF2 Shading Model

This chapter will cover the integration of Physically Based Rendering (PBR) into your graphics pipeline. We use the GL Transmission Format 2.0 (glTF 2.0) shading model as an example. PBR is not a single specific technique but rather a set of concepts, such as using measured surface values and realistic shading models, to accurately represent real-world materials. Adding PBR to your graphics application, or retrofitting an existing rendering engine with PBR, can be challenging because multiple large pieces have to be completed and working together before a correct image can be rendered.

Our goal here is to show how to implement all these steps from scratch. Some of these steps, such as precomputing irradiance maps or bidirectional reflectance distribution function (BRDF) lookup tables (LUTs), require additional tools to be written. We are not going to use any third-party tools here and will show how to implement the entire skeleton of a PBR pipeline from the ground up. Some pre-calculations can be done using general-purpose graphics processing unit (GPGPU) techniques and compute shaders, which will be covered here as well. We assume our readers have some basic understanding of PBR. For those who wish to acquire this knowledge, make sure you read the free book Physically Based Rendering: From Theory To Implementation by Matt Pharr, Wenzel Jakob, and Greg Humphreys, available online at http://www.pbr-book.org/.

In this chapter, we will learn the following recipes:

  • Simplifying Vulkan initialization and frame composition
  • Initializing compute shaders in Vulkan
  • Using descriptor indexing and texture arrays in Vulkan
  • Using descriptor indexing in Vulkan to render an ImGui user interface (UI)
  • Generating textures in Vulkan using compute shaders
  • Implementing computed meshes in Vulkan
  • Precomputing BRDF LUTs
  • Precomputing irradiance maps and diffuse convolution
  • Implementing the glTF2 shading model

Technical requirements

Here is what it takes to run the code from this chapter on your Linux or Windows PC. You will need a graphics processing unit (GPU) with recent drivers supporting OpenGL 4.6 and Vulkan 1.2. The source code can be downloaded from https://github.com/PacktPublishing/3D-Graphics-Rendering-Cookbook. To run the demo applications from this chapter, you are advised to download and unpack the entire Amazon Lumberyard Bistro dataset from the McGuire Computer Graphics Archive, at http://casual-effects.com/data/index.html. Of course, you can use smaller meshes if you cannot download the 2.4 gigabyte (GB) package.

Simplifying Vulkan initialization and frame composition

Before jumping into this chapter, let's learn how to generalize Vulkan application initialization for all of our remaining demos and how to extract common parts of the frame composition code.

How to do it...

The Graphics Library Framework (GLFW) window creation and Vulkan rendering surface initialization are performed in the initVulkanApp function. Let's take a closer look:

  1. A Resolution structure can be passed as an optional parameter:

    struct Resolution {

      uint32_t width  = 0;

      uint32_t height = 0;

    };

    GLFWwindow* initVulkanApp(int width, int height,
      Resolution* outResolution = nullptr)

    {

  2. In our examples, we always use the glslang compiler and the volk Vulkan meta-loader library. If anything goes wrong during initialization, we terminate the application:

      glslang_initialize_process();

      volkInitialize();

      if (!glfwInit() || !glfwVulkanSupported())

        exit(EXIT_FAILURE);

  3. As in Chapter 3, Getting Started with OpenGL and Vulkan, we tell GLFW we do not need any graphics application programming interface (API) context:

      glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);

      glfwWindowHint(GLFW_RESIZABLE, GL_FALSE);

  4. If the outResolution argument is not null, we store the detected window resolution there for further use. The detectResolution() function is shown in the following code snippet; if the width or height values passed to it are negative, they are treated as percentages of the screen's width and height:

      if (outResolution) {
        *outResolution = detectResolution(width, height);
        width  = outResolution->width;
        height = outResolution->height;
      }

  5. Once we have the detected window resolution, we create a GLFW window. If an application requires a different window title, this is the place to set it. If the window creation fails, we terminate the GLFW library and application:

      GLFWwindow* result = glfwCreateWindow(    width, height, "VulkanApp", nullptr, nullptr);

      if (!result) {

        glfwTerminate();

        exit(EXIT_FAILURE);

      }

      return result;

    }

Let's take a look at the detectResolution() function. The actual resolution detection happens in glfwGetVideoMode(). For our purposes, we use the parameters of the "primary" monitor. In multi-display configurations, we should properly determine which monitor displays our application; however, that goes beyond the scope of this book. The video-mode information for the selected monitor gives us the screen dimensions in pixels. If the provided width or height values are positive, they are used directly; negative values are treated as a percentage of the screen:

Resolution detectResolution(int width, int height) {

  GLFWmonitor* monitor = glfwGetPrimaryMonitor();

  if (glfwGetError(nullptr)) exit(EXIT_FAILURE);

  const GLFWvidmode* info = glfwGetVideoMode(monitor);

  const uint32_t W = width >= 0 ?
    width : (uint32_t)(info->width * width / -100);
  const uint32_t H = height >= 0 ?
    height : (uint32_t)(info->height * height / -100);

  return Resolution{ .width = W, .height = H };

}
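Here is a small usage sketch of the functions above; the variable names and the printf() call are our own illustration, not part of the framework. Passing negative values requests a window sized as a percentage of the primary monitor:

    Resolution res;
    // Create a window covering 80% of the screen in each dimension;
    // the detected size is returned through the last parameter.
    GLFWwindow* window = initVulkanApp(-80, -80, &res);
    printf("Window size: %u x %u\n", res.width, res.height);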

To render and present a single frame on the screen, we implement the drawFrame() function, which contains the common frame-composition code refactored from the previous chapters.

  1. Two std::function callbacks, updateBuffersFunc() and composeFrameFunc(), are used to encapsulate all application-specific rendering code. The drawFrame() function is used in all the remaining Vulkan demos as the main frame composer; a short usage sketch follows the listing below:

    bool drawFrame(VulkanRenderDevice& vkDev,
      const std::function<void(uint32_t)>& updateBuffersFunc,
      const std::function<void(VkCommandBuffer, uint32_t)>&
        composeFrameFunc)

    {

  2. Before we can render anything on the screen, we should acquire a framebuffer image from the swapchain. When the next image in the swapchain is not ready to be rendered, we return false. This is not a fatal error but an indication that no frame has been rendered yet. The calling code decides what to do with the result. One example of such a reaction can be skipping the frames-per-second (FPS) counter update:

      uint32_t imageIndex = 0;

      VkResult result = vkAcquireNextImageKHR(
        vkDev.device, vkDev.swapchain, 0,
        vkDev.semaphore, VK_NULL_HANDLE, &imageIndex);

      if (result != VK_SUCCESS) return false;

  3. The global command pool is reset to allow the filling of the command buffers anew:

      VK_CHECK( vkResetCommandPool(    vkDev.device, vkDev.commandPool, 0) );

  4. The updateBuffersFunc() callback is invoked to update all the internal buffers for different renderers. Revisit the Organizing Vulkan frame rendering code recipe from Chapter 4, Adding User Interaction and Productivity Tools for a discussion of frame composition. This can be done in a more effective way—for example, by using a dedicated transfer queue and without waiting for all the GPU transfers to complete. For the purpose of this book, we deliberately choose code simplicity over performance. A command buffer, corresponding to the selected swapchain image, is acquired:

      updateBuffersFunc(imageIndex);

      VkCommandBuffer commandBuffer =    vkDev.commandBuffers[imageIndex];

      const VkCommandBufferBeginInfo bi = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        .pNext = nullptr,
        .flags = VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT,
        .pInheritanceInfo = nullptr
      };

      VK_CHECK(vkBeginCommandBuffer(commandBuffer, &bi));

  5. After we have started recording to the command buffer, the composeFrameFunc() callback is invoked to write the command buffer's contents from different renderers. There is a large potential for optimizations here because Vulkan provides a primary-secondary command buffer separation, which can be used to record secondary buffers from multiple central processing unit (CPU) threads. Once all the renderers have contributed to the command buffer, we stop recording:

      composeFrameFunc(commandBuffer, imageIndex);

      VK_CHECK(vkEndCommandBuffer(commandBuffer));

    Next comes the submission of the recorded command buffer to a GPU graphics queue. The code is identical to that in Chapter 3, Getting Started with OpenGL and Vulkan:

      const VkPipelineStageFlags waitStages[] = {    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT   };

      const VkSubmitInfo si = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .pNext = nullptr,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &vkDev.semaphore,
        .pWaitDstStageMask = waitStages,
        .commandBufferCount = 1,
        .pCommandBuffers = &vkDev.commandBuffers[imageIndex],
        .signalSemaphoreCount = 1,
        .pSignalSemaphores = &vkDev.renderSemaphore
      };

      VK_CHECK(vkQueueSubmit(    vkDev.graphicsQueue, 1, &si, nullptr));

  6. After submitting the command buffer to the graphics queue, the swapchain is presented to the screen:

      const VkPresentInfoKHR pi = {
        .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
        .pNext = nullptr,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &vkDev.renderSemaphore,
        .swapchainCount = 1,
        .pSwapchains = &vkDev.swapchain,
        .pImageIndices = &imageIndex
      };

      VK_CHECK(vkQueuePresentKHR(    vkDev.graphicsQueue, &pi));

  7. The final call to vkDeviceWaitIdle() blocks the CPU until the GPU has finished the frame, so frames never overlap and per-frame resources can be reused safely. This is the simplest possible synchronization, at the cost of some performance:

      VK_CHECK(vkDeviceWaitIdle(vkDev.device));

      return true;

    }
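Here is the usage sketch promised above. It shows how a demo might call drawFrame() from its main loop; the lambda bodies are placeholders for application-specific renderer calls:

    while (!glfwWindowShouldClose(window)) {
      glfwPollEvents();
      const bool frameRendered = drawFrame(vkDev,
        [](uint32_t imageIndex) {
          // update uniform/storage buffers for this swapchain image
        },
        [](VkCommandBuffer commandBuffer, uint32_t imageIndex) {
          // let each renderer record its commands here
        });
      if (!frameRendered) {
        // the swapchain image was not ready; e.g., skip the FPS update
      }
    }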

More sophisticated synchronization schemes with multiple in-flight frames can help to gain performance. However, those are beyond the scope of this book.

Initializing compute shaders in Vulkan

Up until now, we used only graphics-capable command queues on a Vulkan device. This time, we have to find device queues that are also capable of GPGPU computations. In Vulkan, such queues allow execution of compute shaders, which can read from and write to buffers used in the graphics rendering pipeline. For example, in Chapter 10, Advanced Rendering Techniques and Optimizations, we will show how to implement a GPU frustum culling technique by modifying the indirect rendering buffer introduced in the Indirect rendering in Vulkan recipe from Chapter 5, Working with Geometry Data.

Getting ready

The first thing we need to do to start using compute shaders is to revisit the render device initialization covered in Chapter 3, Getting Started with OpenGL and Vulkan. Check out the Initializing Vulkan instances and graphical device recipe before moving forward.

How to do it...

We add the code to search for a compute-capable device queue and to create a separate command buffer for compute shader workloads. Since the graphics hardware may not provide a separate command queue for arbitrary computations, our device and queue initialization logic must somehow remember if the graphics and compute queues are the same.

Note

On the other hand, using separate Vulkan queues for graphics and compute tasks enables the underlying Vulkan implementation to reduce the amount of work done on the GPU by making decisions on the device about what sort of work is generated and how it is generated. This is especially important when dealing with GPU-generated commands. Check out the post New: Vulkan Device Generated Commands by Christoph Kubisch from NVIDIA, at https://developer.nvidia.com/blog/new-vulkan-device-generated-commands/.

Let's learn how to do it.

  1. As a starter, we declare new GPGPU-related fields in the VulkanRenderDevice class. The first one is a flag that signals whether this device supports compute shaders. Although the compute shader support is guaranteed to be true by the Vulkan specification if the device is capable of graphics operations, we show how to use this flag in our code:

      bool useCompute = false;

  2. The next two fields hold the index and handle of an internal queue for compute shaders' execution. If the device does not support a dedicated compute queue, the values of the computeFamily and the graphicsFamily fields are equal:

      uint32_t computeFamily;

      VkQueue computeQueue;

    Since we may want to use more than one device queue, we have to store indices and handles for each of those. This is needed because VkBuffer objects are bound to the device queue at creation time. For example, to use a vertex buffer generated by a compute shader in a graphics pipeline, we have to allocate this VkBuffer object explicitly, specifying the list of queues from which this buffer may be accessed. Later in this recipe, we introduce the createSharedBuffer() routine that explicitly uses these stored queue indices.

    Note

    Buffers are created with a sharing mode, controlling how they can be accessed from queues. Buffers created using VK_SHARING_MODE_EXCLUSIVE must only be accessed by queues in the queue family that has ownership of the resource. Buffers created using VK_SHARING_MODE_CONCURRENT must only be accessed by queues from the queue families specified through the queueFamilyIndexCount and pQueueFamilyIndices members of the corresponding …CreateInfo structures. Concurrent sharing mode may result in lower performance compared to exclusive mode. Refer to the Vulkan specifications for more details, at https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkSharingMode.html.

  3. The lists of initialized queue indices and appropriate queue handles are stored in two dynamic arrays:

      std::vector<uint32_t> deviceQueueIndices;

      std::vector<VkQueue> deviceQueues;

  4. Finally, we need a command buffer and a command buffer pool to create and run compute shader instances:

      VkCommandBuffer computeCommandBuffer;

      VkCommandPool computeCommandPool;

Now, let's learn how to initialize a rendering device capable of running compute shaders.

  1. To avoid breaking previous demos, we proceed step by step and introduce the new initVulkanRenderDeviceWithCompute() routine:

    bool initVulkanRenderDeviceWithCompute(
      VulkanInstance& vk, VulkanRenderDevice& vkDev,
      uint32_t width, uint32_t height,
      VkPhysicalDeviceFeatures deviceFeatures)

    {

      vkDev.framebufferWidth = width;

      vkDev.framebufferHeight = height;

  2. After finding the physical device and graphics queue, we also search for the compute-capable queue. This code will find a combined graphics plus compute queue even on devices that support a separate compute queue, as the combined queue tends to have a lower index. For simplicity, we use this approach throughout the book:

      VK_CHECK(findSuitablePhysicalDevice(    vk.instance, &isDeviceSuitable,    &vkDev.physicalDevice));

      vkDev.graphicsFamily = findQueueFamilies(    vkDev.physicalDevice, VK_QUEUE_GRAPHICS_BIT);

      vkDev.computeFamily = findQueueFamilies(    vkDev.physicalDevice, VK_QUEUE_COMPUTE_BIT);

  3. To initialize both queues, or a single one if these queues are the same, we call a new function, createDeviceWithCompute(), as illustrated in the following code snippet:

      VK_CHECK(createDeviceWithCompute(    vkDev.physicalDevice, deviceFeatures,    vkDev.graphicsFamily, vkDev.computeFamily,    &vkDev.device));

  4. Next, we save unique queue indices for later use in the createSharedBuffer() routine:

      vkDev.deviceQueueIndices.push_back(    vkDev.graphicsFamily);

      if (vkDev.graphicsFamily != vkDev.computeFamily)

        vkDev.deviceQueueIndices.push_back(      vkDev.computeFamily);

  5. After saving queue indices, we acquire the graphics and compute queue handles:

      vkGetDeviceQueue(vkDev.device, vkDev.graphicsFamily,    0, &vkDev.graphicsQueue);

      if (!vkDev.graphicsQueue) exit(EXIT_FAILURE);

      vkGetDeviceQueue(vkDev.device, vkDev.computeFamily,    0, &vkDev.computeQueue);

      if (!vkDev.computeQueue) exit(EXIT_FAILURE);

  6. After initializing the queues, we create everything related to the swapchain. A few lines of the following code snippet are also identical to those in the rendering device initialization procedure described earlier in the Initializing Vulkan instances and graphical devices recipe in Chapter 3, Getting Started with OpenGL and Vulkan:

      VkBool32 presentSupported = 0;

      vkGetPhysicalDeviceSurfaceSupportKHR(    vkDev.physicalDevice, vkDev.graphicsFamily,    vk.surface, &presentSupported);

      if (!presentSupported) exit(EXIT_FAILURE);

      VK_CHECK(createSwapchain(vkDev.device,    vkDev.physicalDevice, vk.surface,     vkDev.graphicsFamily,    width, height, vkDev.swapchain));

      const size_t imageCount = createSwapchainImages(    vkDev.device, vkDev.swapchain,    vkDev.swapchainImages,    vkDev.swapchainImageViews);

      vkDev.commandBuffers.resize(imageCount);

  7. The rendering synchronization primitives and a command buffer with a command pool are also created the same way as in the Indirect rendering in Vulkan recipe of Chapter 5, Working with Geometry Data:

      VK_CHECK(createSemaphore(    vkDev.device, &vkDev.semaphore));

      VK_CHECK(createSemaphore(    vkDev.device, &vkDev.renderSemaphore));

  8. For each swapchain image, we allocate a command buffer from a shared graphics command pool, just as in the initVulkanRenderDevice() function:

      const VkCommandPoolCreateInfo cpi1 = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
        .flags = 0,
        .queueFamilyIndex = vkDev.graphicsFamily
      };

      VK_CHECK(vkCreateCommandPool(vkDev.device, &cpi1,    nullptr, &vkDev.commandPool));

      const VkCommandBufferAllocateInfo ai1 = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
        .pNext = nullptr,
        .commandPool = vkDev.commandPool,
        .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
        .commandBufferCount = static_cast<uint32_t>(
          vkDev.swapchainImages.size())
      };

      VK_CHECK(vkAllocateCommandBuffers(    vkDev.device, &ai1, &vkDev.commandBuffers[0]));

  9. Next, we create a single command pool for the compute queue:

      const VkCommandPoolCreateInfo cpi2 = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
        .pNext = nullptr,
        .flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT,
        .queueFamilyIndex = vkDev.computeFamily
      };

      VK_CHECK(vkCreateCommandPool(vkDev.device, &cpi2,    nullptr, &vkDev.computeCommandPool));

  10. Using the created command pool, we allocate the command buffer for compute shaders:

      const VkCommandBufferAllocateInfo ai2 = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
        .pNext = nullptr,
        .commandPool = vkDev.computeCommandPool,
        .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
        .commandBufferCount = 1,
      };

      VK_CHECK(vkAllocateCommandBuffers(    vkDev.device, &ai2, &vkDev.computeCommandBuffer));

  11. At the end, we raise a flag saying that this device supports compute shader execution:

      vkDev.useCompute = true;

      return true;

    }

The initVulkanRenderDeviceWithCompute() routine written in the preceding code uses the createDeviceWithCompute() helper function to create a compatible Vulkan device. Let's see how this can be implemented.

  1. The function takes in the graphics and compute queue indices we want to use:

    VkResult createDeviceWithCompute(
      VkPhysicalDevice physicalDevice,
      VkPhysicalDeviceFeatures deviceFeatures,
      uint32_t graphicsFamily, uint32_t computeFamily,
      VkDevice* device)

    {

      const std::vector<const char*> extensions = {    VK_KHR_SWAPCHAIN_EXTENSION_NAME   };

  2. If we use a single queue, we can call the old device initialization routine:

      if (graphicsFamily == computeFamily)

        return createDevice(physicalDevice,      deviceFeatures, graphicsFamily, device);

  3. For a case of two distinct queues, we fill in two individual VkDeviceQueueCreateInfo structures. Each of these queues has a default execution priority:

      const float queuePriorities[2] = { 0.f, 0.f };

  4. The graphics queue creation structure refers to the graphics queue family index:

      const VkDeviceQueueCreateInfo qciGfx = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
        .pNext = nullptr,
        .flags = 0,
        .queueFamilyIndex = graphicsFamily,
        .queueCount = 1,
        .pQueuePriorities = &queuePriorities[0]
      };

  5. The compute queue creation structure is similar and uses the compute queue family index:

      const VkDeviceQueueCreateInfo qciComp = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
        .pNext = nullptr,
        .flags = 0,
        .queueFamilyIndex = computeFamily,
        .queueCount = 1,
        .pQueuePriorities = &queuePriorities[1]
      };

  6. Both queue creation structures should be stored in an array for further use:

      const VkDeviceQueueCreateInfo qci[] =    { qciGfx, qciComp };

  7. The device creation structure now uses two references to the graphics and compute queues:

      const VkDeviceCreateInfo ci = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
        .pNext = nullptr,
        .flags = 0,
        .queueCreateInfoCount = 2,
        .pQueueCreateInfos = qci,
        .enabledLayerCount = 0,
        .ppEnabledLayerNames = nullptr,
        .enabledExtensionCount = uint32_t(extensions.size()),
        .ppEnabledExtensionNames = extensions.data(),
        .pEnabledFeatures = &deviceFeatures
      };
      return vkCreateDevice(
        physicalDevice, &ci, nullptr, device);

    }

To read the results of compute shaders and store them, we need to create shared VkBuffer instances using the following steps:

  1. The createSharedBuffer() routine is analogous to createBuffer() but it explicitly enumerates the command queues:

    bool createSharedBuffer(
      VulkanRenderDevice& vkDev, VkDeviceSize size,
      VkBufferUsageFlags usage,
      VkMemoryPropertyFlags properties,
      VkBuffer& buffer, VkDeviceMemory& bufferMemory)

    {

  2. If we have a single queue for graphics and compute, we delegate all the work to our old createBuffer() routine:

      const size_t familyCount =    vkDev.deviceQueueIndices.size();

      if (familyCount < 2u)

        return createBuffer(vkDev.device,      vkDev.physicalDevice, size,      usage, properties, buffer, bufferMemory);

  3. Inside the buffer creation structure, we should designate this buffer as being accessible from multiple command queues and pass a list of all the respective queue indices:

      const VkBufferCreateInfo bufferInfo = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
        .pNext = nullptr,
        .flags = 0,
        .size = size,
        .usage = usage,
        .sharingMode = (familyCount > 1u) ?
          VK_SHARING_MODE_CONCURRENT : VK_SHARING_MODE_EXCLUSIVE,
        .queueFamilyIndexCount = static_cast<uint32_t>(familyCount),
        .pQueueFamilyIndices = (familyCount > 1u) ?
          vkDev.deviceQueueIndices.data() : nullptr
      };

  4. The buffer itself is created, but no memory is associated with it yet:

      VK_CHECK(vkCreateBuffer(    vkDev.device, &bufferInfo, nullptr, &buffer));

  5. The rest of the code allocates memory with specified parameters, just as in the createBuffer() routine. To do this, we ask the Vulkan implementation which memory-block properties we should use for this buffer:

      VkMemoryRequirements memRequirements;

      vkGetBufferMemoryRequirements(    vkDev.device, buffer, &memRequirements);

  6. In the allocation structure, we specify the physical buffer size and the exact memory heap type:

      const VkMemoryAllocateInfo allocInfo = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = nullptr,
        .allocationSize = memRequirements.size,
        .memoryTypeIndex = findMemoryType(vkDev.physicalDevice,
          memRequirements.memoryTypeBits, properties)
      };

  7. Memory allocation and buffer binding conclude this routine:

      VK_CHECK(vkAllocateMemory(vkDev.device, &allocInfo,    nullptr, &bufferMemory));

      vkBindBufferMemory(    vkDev.device, buffer, bufferMemory, 0);

      return true;

    }
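As a usage sketch, a buffer that a compute shader writes and the graphics pipeline later consumes as vertex data could be created as follows; the buffer size and usage flags are illustrative assumptions:

      VkBuffer computedBuffer = VK_NULL_HANDLE;
      VkDeviceMemory computedBufferMemory = VK_NULL_HANDLE;
      const VkDeviceSize computedBufferSize = 1024 * 1024;
      createSharedBuffer(vkDev, computedBufferSize,
        VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
          VK_BUFFER_USAGE_VERTEX_BUFFER_BIT,
        VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
        computedBuffer, computedBufferMemory);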

To execute compute shaders, we require a pipeline object, just as in the case with the graphics rendering. Let's write a function to create a Vulkan compute pipeline object.

  1. As usual, we fill in the Vulkan creation structure for the pipeline. The compute pipeline contains a VK_SHADER_STAGE_COMPUTE_BIT single shader stage and an attached Vulkan shader module:

    VkResult createComputePipeline(
      VkDevice device, VkShaderModule computeShader,
      VkPipelineLayout pipelineLayout, VkPipeline* pipeline)

    {

      VkComputePipelineCreateInfo computePipelineCreateInfo = {
        .sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
        .pNext = nullptr,
        .flags = 0,
        .stage = {
          .sType =
            VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
          .pNext = nullptr,
          .flags = 0,
          .stage = VK_SHADER_STAGE_COMPUTE_BIT,
          .module = computeShader,

  2. For the purpose of simplicity, all our compute shaders must have main() as their entry point:

          .pName = "main",
          .pSpecializationInfo = nullptr
        },

  3. Most of the parameters are set to default and zero values. The only required field is the pipeline layout object:

        .layout = pipelineLayout,
        .basePipelineHandle = 0,
        .basePipelineIndex  = 0
      };
      return vkCreateComputePipelines(device, 0, 1,
        &computePipelineCreateInfo, nullptr, pipeline);

    }

The pipeline layout is created using the same function as for the graphics part. It is worth mentioning that the compute shader compilation process is the same as for other shader stages.
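As a sketch of that step, assuming a compiled compute shader module and a descriptor set layout are already available, the layout and pipeline can also be built with raw Vulkan calls; the variable names here are ours:

      const VkPipelineLayoutCreateInfo layoutInfo = {
        .sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout
      };
      VkPipelineLayout pipelineLayout = VK_NULL_HANDLE;
      VK_CHECK(vkCreatePipelineLayout(
        vkDev.device, &layoutInfo, nullptr, &pipelineLayout));
      VkPipeline computePipeline = VK_NULL_HANDLE;
      VK_CHECK(createComputePipeline(vkDev.device,
        computeShaderModule, pipelineLayout, &computePipeline));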

We can now begin using the shaders after device initialization. The descriptor set creation process is the same as with the graphics-related descriptor sets, but the execution of compute shaders requires the insertion of new commands into the command buffer.

  1. The function that we now implement shows how to execute a compute shader workload given the prepared pipeline and descriptor set objects. The xSize, ySize, and zSize parameters are the numbers of local workgroups to dispatch in the X, Y, and Z dimensions:

    bool executeComputeShader(
      VulkanRenderDevice& vkDev,
      VkPipeline pipeline, VkPipelineLayout pipelineLayout,
      VkDescriptorSet ds,
      uint32_t xSize, uint32_t ySize, uint32_t zSize)

    {

  2. As with the graphics work items, we begin filling the command buffer:

      VkCommandBuffer commandBuffer =    vkDev.computeCommandBuffer;

      VkCommandBufferBeginInfo commandBufferBeginInfo = {
        VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO, 0,
        VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT, 0
      };

      VK_CHECK(vkBeginCommandBuffer(    commandBuffer, &commandBufferBeginInfo));

  3. To execute a compute shader, we should first bind the pipeline and the descriptor set object, and then emit the vkCmdDispatch() command with the required execution range:

      vkCmdBindPipeline(commandBuffer,     VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);

      vkCmdBindDescriptorSets(commandBuffer,    VK_PIPELINE_BIND_POINT_COMPUTE, pipelineLayout,    0, 1, &ds, 0, 0);

      vkCmdDispatch(commandBuffer, xSize, ySize, zSize);

  4. Before the CPU can read back data written to a buffer by a compute shader, we have to insert a memory barrier. More Vulkan synchronization details can be found in this tutorial: https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples. The code is illustrated here:

    VkMemoryBarrier readoutBarrier = {
      .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
      .pNext = nullptr,
      .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
      .dstAccessMask = VK_ACCESS_HOST_READ_BIT
    };

    vkCmdPipelineBarrier(commandBuffer,
      VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
      VK_PIPELINE_STAGE_HOST_BIT,
      0, 1, &readoutBarrier, 0, nullptr, 0, nullptr);

  5. After adding all the commands, we complete the recording of the command buffer:

      VK_CHECK(vkEndCommandBuffer(commandBuffer));

  6. We immediately submit this command buffer to the queue:

      VkSubmitInfo submitInfo = {
        VK_STRUCTURE_TYPE_SUBMIT_INFO,
        0, 0, 0, 0, 1, &commandBuffer, 0, 0
      };

      VK_CHECK(vkQueueSubmit(    vkDev.computeQueue, 1, &submitInfo, 0));

  7. To synchronize buffers between computations and rendering, we should wait for the compute shader completion. Here, let's do it in a simple blocking way, just by waiting until the GPU finishes its work:

      VK_CHECK(vkQueueWaitIdle(vkDev.computeQueue));

      return true;

    }

We omit the descriptor set creation process here because it depends on what kind of data we want to access in the compute shader. The next recipe shows how to write compute shaders to generate images and vertex buffer contents, which is where a descriptor set will be required.
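As a quick usage sketch, assuming the pipeline, its layout, and a descriptor set have already been created, dispatching a compute shader with local_size_x = 32 over N buffer elements could look like this (the names are placeholders):

      const uint32_t N = 100000;      // number of elements to process
      const uint32_t localSizeX = 32; // must match the shader's local_size_x
      executeComputeShader(vkDev, computePipeline, computePipelineLayout,
        computeDescriptorSet, (N + localSizeX - 1) / localSizeX, 1, 1);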

There's more...

We are going to use the Vulkan compute shaders functionality later in this chapter in the following recipes: Implementing computed meshes in Vulkan, Generating textures in Vulkan using compute shaders, Precomputing BRDF LUTs, and Precomputing irradiance maps and diffuse convolution.

Using descriptor indexing and texture arrays in Vulkan

Before we dive deep into the glTF and PBR implementation code, let's look at some lower-level functionality that will be required to minimize the number of Vulkan descriptor sets in applications that use lots of materials with multiple textures. Descriptor indexing is an extremely useful feature recently added to Vulkan 1.2 and, at the time of writing this book, is already supported on some devices. It allows us to create unbounded descriptor sets and use non-uniform dynamic indexing to access textures inside them. This way, materials can be stored in shader storage buffers and each one can reference all the required textures using integer identifiers (IDs). These IDs can be fetched from a shader storage buffer object (SSBO) and are directly used to index into an appropriate descriptor set that contains all the textures required by our application. Vulkan descriptor indexing is rather similar to the OpenGL bindless textures mechanism and significantly simplifies managing descriptor sets in Vulkan. Let's check out how to use this feature.

Getting ready

The source code for this recipe can be found in Chapter6/VK02_DescriptorIndexing. All the textures we used are stored in the data/explosion folder.

How to do it...

Before we can use the descriptor indexing feature in Vulkan, we need to enable it during the Vulkan device initialization. This process is a little bit verbose, but we will go through it once to show the basic principles. Let's take a look at new fragments of the initVulkan() function to see how it is done.

  1. After the window surface is created, we should construct an instance of the VkPhysicalDeviceDescriptorIndexingFeaturesEXT structure to enable non-uniform image array indexing and a variable descriptor set count:

    VkPhysicalDeviceDescriptorIndexingFeaturesEXT
      physicalDeviceDescriptorIndexingFeatures = {
      .sType =
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_INDEXING_FEATURES_EXT,
      .shaderSampledImageArrayNonUniformIndexing = VK_TRUE,
      .descriptorBindingVariableDescriptorCount = VK_TRUE,
      .runtimeDescriptorArray = VK_TRUE,
    };

  2. The required feature should be enabled using VkPhysicalDeviceFeatures:

    const VkPhysicalDeviceFeatures deviceFeatures = {  .shaderSampledImageArrayDynamicIndexing = VK_TRUE };

  3. After that, both structures can be used to construct VkPhysicalDeviceFeatures2:

    const VkPhysicalDeviceFeatures2 deviceFeatures2 = {
      .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2,
      .pNext = &physicalDeviceDescriptorIndexingFeatures,
      .features = deviceFeatures
    };

  4. The VkPhysicalDeviceFeatures2 instance should be passed into our initVulkanRenderDevice2() Vulkan initialization helper implemented in shared/UtilsVulkan.cpp:

    if (!initVulkanRenderDevice2(vk, vkDev,      kScreenWidth, kScreenHeight,      isDeviceSuitable, deviceFeatures2))

      exit(EXIT_FAILURE);

The initialization process for this extension is similar to how we initialized the Vulkan indirect rendering extension in the Indirect rendering in Vulkan recipe from Chapter 5, Working with Geometry Data. The only difference here is that the descriptor indexing feature was added into Vulkan 1.2, hence the different VkPhysicalDeviceFeatures2 structure and a separate initialization function.

Once we have a proper Vulkan device initialized, we can implement a simple flipbook animation using descriptor indexing. Our example application uses three different explosion animations released by Unity Technologies under the liberal Creative Commons (CC0) license (https://blogs.unity3d.com/2016/11/28/free-vfx-image-sequences-flipbooks). Let's look at the steps.

  1. First, let's get prepared to load three different explosions. Each explosion is stored as a separate flipbook and contains 100 frames defined as kNumFlipbookFrames:

    std::vector<std::string> textureFiles;

    for (uint32_t j = 0; j < 3; j++) {

      for (uint32_t i = 0; i != kNumFlipbookFrames; i++) {

        char fname[1024];

        snprintf(fname, sizeof(fname),      "data/explosion/explosion%02u-frame%03u.tga",      j, i+1);

        textureFiles.push_back(fname);

      }

    }

  2. We implemented a VulkanQuadRenderer helper class to render textured quads using Vulkan. We construct it from the list of texture filenames in the following way. To avoid having to deal with any kind of synchronization, we fill a separate SSBO with data for each swapchain image. This is far from optimal, but it keeps the code in this book much simpler:

    quadRenderer =

      std::make_unique<VulkanQuadRenderer>(    vkDev, textureFiles);

    for (size_t i = 0; i < vkDev.swapchainImages.size();     i++)

      fillQuadsBuffer(vkDev, *quadRenderer.get(), i);

  3. Before we finish the initialization process, we should construct our rendering layers—one for clearing the screen and another to present the rendered image. If you forget how our Vulkan frame composition works, check out the Organizing Vulkan frame rendering code recipe from Chapter 4, Adding User Interaction and Productivity Tools:

    VulkanImage nullTexture = {
      .image = VK_NULL_HANDLE,
      .imageView = VK_NULL_HANDLE
    };
    clear = std::make_unique<VulkanClear>(vkDev, nullTexture);
    finish = std::make_unique<VulkanFinish>(vkDev, nullTexture);

For all the implementation details of VulkanQuadRenderer, check out the shared/vkRenderers/VulkanQuadRenderer.cpp file, which contains mostly Vulkan descriptors initialization and texture-loading code—all its parts were extensively covered in the previous chapters. We will skip it here in the book text and focus on the actual demo application logic and OpenGL Shading Language (GLSL) shaders.

The Chapter6/VK02_DescriptorIndexing application renders an animated explosion every time a user clicks somewhere in the window. Multiple explosions, each using a different flipbook, can be rendered simultaneously. Let's see how to implement them using the following steps.

  1. First, we need a data structure to store the state of a single flipbook animation. Here, position defines the position of an animation in the window, startTime marks the timestamp when this animation was started, textureIndex is the index of the current texture inside the flipbook, and flipbookOffset points to the beginning of the current flipbook in the big array of textures we loaded earlier. We are going to store all active animations in a collection:

    struct AnimationState {

      vec2 position = vec2(0);

      double startTime = 0;

      uint32_t textureIndex = 0;

      uint32_t flipbookOffset = 0;

    };

    std::vector<AnimationState> animations;

  2. Here is the animation update logic, placed in a separate function. The current texture index is updated for each animation based on its start time, and finished animations are removed as we iterate. A swap-and-pop removal would reorder the animations and cause ugly Z-fighting, where explosions suddenly pop in front of each other, so we use a straightforward removal via erase():

    void updateAnimations() {
      for (size_t i = 0; i < animations.size();) {
        auto& anim = animations[i];
        anim.textureIndex = anim.flipbookOffset +
          (uint32_t)(kAnimationFPS * (glfwGetTime() - anim.startTime));
        if (anim.textureIndex - anim.flipbookOffset >
            kNumFlipbookFrames)
          animations.erase(animations.begin() + i);
        else
          i++;
      }
    }

  3. The final touch to the C++ part of our application is the mouse click handling callback that spawns a new animated explosion at the cursor position. The flipbook starting offset is selected randomly for each explosion:

    glfwSetMouseButtonCallback(window,
      [](GLFWwindow* window, int button, int action, int mods) {
        if (button == GLFW_MOUSE_BUTTON_LEFT &&
            action == GLFW_PRESS) {
          float mx =
            (mouseX / vkDev.framebufferWidth) * 2.0f - 1.0f;
          float my =
            (mouseY / vkDev.framebufferHeight) * 2.0f - 1.0f;
          animations.push_back(AnimationState{
            .position = vec2(mx, my),
            .startTime = glfwGetTime(),
            .textureIndex = 0,
            .flipbookOffset =
              kNumFlipbookFrames * (uint32_t)(rand() % 3)
          });
        }
      });

The chapter06/VK02_texture_array.vert vertex shader to render our textured rectangles is presented next.

  1. The programmable-vertex-fetch technique takes care of the vertices stored as ImDrawVert structures inside the SSBO:

    layout(location = 0) out vec2 out_uv;

    layout(location = 1) flat out uint out_texIndex;

    struct ImDrawVert {

      float x, y, z, u, v;

    };

    layout(binding = 1) readonly buffer SBO {

      ImDrawVert data[];

    } sbo;

  2. The geometry buffer is constant for all rendered rectangles, so we pass a vec2 position to shift the origin of our quad on the screen. The textureIndex field corresponds to the same field in AnimationState. It is easy to pass this information as a Vulkan push constant:

    layout(push_constant) uniform uPushConstant {

      vec2 position;

      uint textureIndex;

    } pc;

  3. Fetch the data from the buffers, apply the position to calculate the resulting value of gl_Position, and we are done here:

    void main() {

      uint idx = gl_VertexIndex;

      ImDrawVert v = sbo.data[idx];

      out_uv = vec2(v.u, v.v);

      out_texIndex = pc.textureIndex;

      gl_Position =    vec4(vec2(v.x, v.y) + pc.position, 0.0, 1.0);

    }

The chapter06/VK02_texture_array.frag fragment shader is trivial and uses the non-uniform descriptor indexing feature. Let's take a look.

  1. First, we have to enable the GL_EXT_nonuniform_qualifier extension to be able to use the texture index value. Note the unbounded array of textures:

    #version 460

    #extension GL_EXT_nonuniform_qualifier : require

    layout (binding = 2) uniform sampler2D textures[];

    layout (location = 0) in vec2 in_uv;

    layout (location = 1) flat in uint in_texIndex;

    layout (location = 0) out vec4 outFragColor;

  2. The nonuniformEXT type qualifier can be used to assert that a variable or expression is not dynamically uniform:

    void main() {

      outFragColor = texture(    textures[nonuniformEXT(in_texIndex)], in_uv);

    }

We can now run our application. Click a few times in the window to see something similar to this:

Figure 6.1 – Animated explosions using descriptor indexing and texture arrays in Vulkan

There's more...

While this example passes a texture index into a shader as a push constant, making it uniform, with the GL_EXT_nonuniform_qualifier extension it is possible to store texture indices inside Vulkan buffers in a completely dynamic way. In Chapter 7, Graphics Rendering Pipeline, we will build a material system based around this Vulkan extension, similar to how the bindless textures mechanism in OpenGL is deployed.

In the next recipe, we will show one more useful application of texture arrays.

Using descriptor indexing in Vulkan to render an ImGui UI

Another extremely useful application of descriptor indexing is the ability to trivially render multiple textures in ImGui. Up until now, our ImGui renderer was able to use only one single font texture and there was no possibility to render any static images in our UI. To allow backward compatibility with Chapter 4, Adding User Interaction and Productivity Tools, and Chapter 5, Working with Geometry Data, we add a new constructor to the ImGuiRenderer class and modify the addImGuiItem() method in the shared/vkRenderers/VulkanImGui.cpp file. We provide a thorough discussion of the required changes here because, to the best of our knowledge, there is no small down-to-earth tutorial on using multiple textures in the Vulkan ImGui renderer.

Getting ready

Check the previous Using descriptor indexing and texture arrays in Vulkan recipe, to learn how to initialize the descriptor indexing feature.

How to do it...

Let's start with the description of source code changes.

  1. First, we declare a list of used external textures as a field in the ImGuiRenderer structure:

      std::vector<VulkanTexture> extTextures_;

  2. The new constructor of the ImGuiRenderer class takes a list of Vulkan texture handles as a parameter:

    ImGuiRenderer::ImGuiRenderer(  VulkanRenderDevice& vkDev,  const std::vector<VulkanTexture>& textures)

    : RendererBase(vkDev, VulkanImage())

    , extTextures_(textures)

  3. The only parts of the constructor that are different is the descriptor set and pipeline layout initialization. The render pass, framebuffers, and uniform buffers are created in a standard way:

      if (!createColorAndDepthRenderPass(vkDev, false,
            &renderPass_, RenderPassCreateInfo()) ||
          !createColorAndDepthFramebuffers(vkDev, renderPass_,
            VK_NULL_HANDLE, swapchainFramebuffers_) ||
          !createUniformBuffers(vkDev, sizeof(mat4)) ||

  4. The descriptor pool now has to accommodate the font and all the external textures:

          !createDescriptorPool(vkDev, 1, 2,
            1 + textures.size(), &descriptorPool_) ||

  5. To create a descriptor set, we use the following new function:

          !createMultiDescriptorSet(vkDev) ||

  6. The pipeline for our renewed ImGui renderer must allow a single push constant that we use to pass the texture index of a rendered element:

          !createPipelineLayoutWithConstants(vkDev.device,
            descriptorSetLayout_, &pipelineLayout_,
            0, sizeof(uint32_t)) ||

  7. The graphics pipeline uses the vertex shader from Chapter 4, Adding User Interaction and Productivity Tools, and the new fragment shader is described here:

          !createGraphicsPipeline(vkDev, renderPass_, pipelineLayout_,
            { "data/shaders/chapter04/imgui.vert",
              "data/shaders/chapter06/imgui_multi.frag" },
            &graphicsPipeline_,
            VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST,
            true, true, true))

  8. If we fail to create any of the Vulkan objects, we display a small error message. Further diagnostics are displayed in the validation layer's output:

      {

        printf("ImGuiRenderer: pipeline creation failed\n");

        exit(EXIT_FAILURE);

      }

This concludes our list of changes to the constructor code. The descriptor set creation code is similar to that of the VulkanQuadRenderer::createDescriptorSet() function, but since we skipped the implementation details at the beginning of this recipe, we describe the complete ImGuiRenderer::createMultiDescriptorSet() method here.

  1. The method starts with a description of buffer bindings. The difference here is that we might have more than one texture, so we specify this explicitly by asking for the extTextures_ array size:

    bool ImGuiRenderer::createMultiDescriptorSet(  VulkanRenderDevice& vkDev)

    {

      const std::array<VkDescriptorSetLayoutBinding, 4> bindings = {
        descriptorSetLayoutBinding(0,
          VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
          VK_SHADER_STAGE_VERTEX_BIT),
        descriptorSetLayoutBinding(1,
          VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
          VK_SHADER_STAGE_VERTEX_BIT),
        descriptorSetLayoutBinding(2,
          VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
          VK_SHADER_STAGE_VERTEX_BIT),
        descriptorSetLayoutBinding(3,
          VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
          VK_SHADER_STAGE_FRAGMENT_BIT,
          1 + extTextures_.size())
      };

  2. To create the descriptor set layout, we fill in the creation structure, which references our array of bindings:

      const VkDescriptorSetLayoutCreateInfo layoutInfo = {
        .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
        .pNext = nullptr,
        .flags = 0,
        .bindingCount = static_cast<uint32_t>(bindings.size()),
        .pBindings = bindings.data()
      };

      VK_CHECK(vkCreateDescriptorSetLayout(vkDev.device,    &layoutInfo, nullptr, &descriptorSetLayout_));

  3. As usual, we allocate descriptor sets for each of the swapchain images:

      std::vector<VkDescriptorSetLayout> layouts(    vkDev.swapchainImages.size(),    descriptorSetLayout_);

      const VkDescriptorSetAllocateInfo allocInfo = {
        .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
        .pNext = nullptr,
        .descriptorPool = descriptorPool_,
        .descriptorSetCount = static_cast<uint32_t>(
          vkDev.swapchainImages.size()),
        .pSetLayouts = layouts.data()
      };

      descriptorSets_.resize(    vkDev.swapchainImages.size());

      VK_CHECK(vkAllocateDescriptorSets(vkDev.device,    &allocInfo, descriptorSets_.data()));

  4. Now, we should create image information structures for all of the used textures. The first texture is the ImGui font loaded in the constructor:

      std::vector<VkDescriptorImageInfo> textureDescriptors = {
        { fontSampler_, font_.imageView,
          VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL }
      };

  5. For each of the external textures, we create an image information structure. For simplicity, we assume all of our textures are in the shader-optimal layout:

      for (size_t i = 0; i < extTextures_.size(); i++)
        textureDescriptors.push_back({
          .sampler = extTextures_[i].sampler,
          .imageView = extTextures_[i].image.imageView,
          .imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
        });

  6. Now, we should proceed to updating each of the created descriptor sets:

      for (size_t i = 0; i < vkDev.swapchainImages.size();       i++) {

        VkDescriptorSet ds = descriptorSets_[i];

  7. The three buffers with uniform data, item indices, and vertices are the same as in the createDescriptorSet() function:

        const VkDescriptorBufferInfo bufferInfo1 =      { uniformBuffers_[i], 0, sizeof(mat4) };

        const VkDescriptorBufferInfo bufferInfo2 =      { storageBuffer_[i], 0, ImGuiVtxBufferSize };

        const VkDescriptorBufferInfo bufferInfo3 =      { storageBuffer_[i], ImGuiVtxBufferSize,        ImGuiIdxBufferSize };

  8. The parameters to vkUpdateDescriptorSets() include the buffer information structures at the beginning:

        const std::array<VkWriteDescriptorSet, 4>      descriptorWrites = {

          bufferWriteDescriptorSet(ds, &bufferInfo1,  0,        VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER),

          bufferWriteDescriptorSet(ds, &bufferInfo2, 1,        VK_DESCRIPTOR_TYPE_STORAGE_BUFFER),

          bufferWriteDescriptorSet(ds, &bufferInfo3, 2,        VK_DESCRIPTOR_TYPE_STORAGE_BUFFER),

  9. The last binding in our descriptor sets is the indexed texture array. We explicitly state that this binding is indexed and refers to an array of texture handles in the textureDescriptors variable:

          VkWriteDescriptorSet {
            .sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
            .dstSet = descriptorSets_[i],
            .dstBinding = 3,
            .dstArrayElement = 0,
            .descriptorCount = static_cast<uint32_t>(
              1 + extTextures_.size()),
            .descriptorType =
              VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
            .pImageInfo = textureDescriptors.data()
          },

      };

  10. After updating each descriptor set, the function quietly returns:

        vkUpdateDescriptorSets(vkDev.device,      static_cast<uint32_t>(descriptorWrites.size()),      descriptorWrites.data(), 0, nullptr);

      }

      return true;

    }

  11. The only thing we modify in ImGuiRenderer::addImGuiItem() is the passing of the texture index. We should insert the following snippet between the vkCmdSetScissor() and vkCmdDraw() calls. If our external textures array is not empty, we extract the texture ID and pass it to our fragment shader using the push constants mechanism:

        if (textures.size()) {
          uint32_t texture = (uint32_t)(intptr_t)pcmd->TextureId;
          vkCmdPushConstants(commandBuffer, pipelineLayout,
            VK_SHADER_STAGE_FRAGMENT_BIT, 0,
            sizeof(uint32_t), (const void*)&texture);
        }

Finally, we implement a new fragment shader in data/shaders/chapter06/imgui_multi.frag, because the vertex shader remains the same.

  1. The fragment shader takes the UV texture coordinates and an optional item color as input. The only output is the fragment color:

    #version 460

    #extension GL_EXT_nonuniform_qualifier : require

    layout(location = 0) in vec2 uv;

    layout(location = 1) in vec4 color;

    layout(location = 0) out vec4 outColor;

    layout(binding = 3) uniform sampler2D textures[];

  2. Since we are rendering different UI elements using the same shader, we pass the texture index as a push constant:

    layout(push_constant) uniform pushBlock {
      uint index;
    } pushConsts;

  3. The main() function is slightly more complicated than the one in Chapter 4, Adding User Interaction and Productivity Tools. Here, we "decode" the passed texture index and decide how to interpret this texture's contents before outputting the fragment color. The higher 16 bits of the texture index indicate whether the fetched texture value should be interpreted as a color or as the depth buffer value:

    void main() {

      const uint kDepthTextureMask = 0xFFFF;

      uint texType =   (pushConsts.index >> 16) & kDepthTextureMask;

  4. The actual texture index in the textures array is stored in the lower 16 bits. After selecting the texture index, we sample the texture:

      uint tex = pushConsts.index & kDepthTextureMask;

      vec4 value = texture(    textures[nonuniformEXT(tex)], uv);

  5. If the texture type is a standard font texture or a red-green-blue (RGB) image, we multiply by the color and return. Otherwise, if the texture contains depth values, we output a grayscale value:

      outColor = (texType == 0) ?   (color * value) : vec4(value.rrr, 1.0);

    }

Let's look at the C++ counterpart of the code. The hypothetical usage of this new ImGui renderer's functionality can be wrapped in the following helper function. It accepts a window title and a texture ID, which is simply the index in the ImGuiRenderer::extTextures_ array passed to the constructor at creation time.

  1. The function creates a default window and fills the entire window with the texture's content. We get the minimum and maximum boundaries:

    void imguiTextureWindow(  const char* Title, uint32_t texId)

    {

      ImGui::Begin(Title, nullptr);

      ImVec2 vMin = ImGui::GetWindowContentRegionMin();

      ImVec2 vMax = ImGui::GetWindowContentRegionMax();

  2. The ImGui::Image() call creates a rectangular texture item, which is added to the draw list:

      ImGui::Image((void*)(intptr_t)texId,
        ImVec2(vMax.x - vMin.x, vMax.y - vMin.y));

      ImGui::End();

    }

  3. If we need to display the contents of a color buffer, we call the helper with a plain texture index. If the texture contains depth data, we additionally set the upper 16 bits of the index so that the fragment shader interprets the fetched values as depth:

      imguiTextureWindow("Some title", textureID);
      imguiTextureWindow("Some depth buffer",
        textureID | 0xFFFF0000);

In the subsequent chapters, we will show how to use this ability to display intermediate buffers for debugging purposes.

Generating textures in Vulkan using compute shaders

Now that we can initialize and use compute shaders, it is time to give a few examples of how to use these. Let's start with some basic procedural texture generation. In this recipe, we implement a small program to display animated textures whose pixel values are calculated in real time inside our custom compute shader. To add even more value to this recipe, we will port a GLSL shader from https://www.shadertoy.com to our Vulkan compute shader.

Getting ready

The compute pipeline creation code and Vulkan application initialization are the same as in the Initializing compute shaders in Vulkan recipe. Make sure you read this before proceeding further. To use and display the generated texture, we need a textured quad renderer. Its complete source code can be found in shared/vkRenderers/VulkanSingleQuad.cpp. We will not focus on its internals here because, at this point, it should be easy for you to implement such a renderer on your own using the material of the previous chapters. One of the simplest ways to do so would be to modify the ModelRenderer class from shared/vkRenderers/VulkanModelRenderer.cpp and fill the appropriate index and vertex buffers in the class constructor.

The original Industrial Complex shader that we are going to use here to generate a Vulkan texture was created by Gary "Shane" Warne (https://www.shadertoy.com/user/Shane) and can be downloaded from ShaderToy at https://www.shadertoy.com/view/MtdSWS.

How to do it...

Let's start by discussing the process of writing a texture-generating GLSL compute shader. The simplest shader to generate a red-green-blue-alpha (RGBA) image without using any input data outputs an image by using the gl_GlobalInvocationID built-in variable to know which pixel to output. This maps directly to how ShaderToy shaders operate, thus we can transform them into a compute shader just by adding some input and output (I/O) parameters and layout modifiers specific to compute shaders and Vulkan. Let's take a look at a minimalistic compute shader that creates a red-green gradient texture.

  1. As in all other compute shaders, one mandatory line at the beginning tells the driver how to distribute the workload on the GPU. In our case, we are processing tiles of 16x16 pixels:

    layout (local_size_x = 16, local_size_y = 16) in;

  2. The only buffer binding that we need to specify is the output image. This is the first time we have used the image2D image type in this book. Here, it means that the result variable is a two-dimensional (2D) array whose elements are nothing else but pixels of an image. The writeonly layout qualifier instructs the compiler to assume we will not read from this image in the shader:

    layout (binding = 0, rgba8) uniform   writeonly image2D result;

  3. The GLSL compute shading language provides a set of helper functions to retrieve various image attributes. We use the built-in imageSize() function to determine the size of an image in pixels:

    void main()

    {

      ivec2 dim = imageSize(result);

  4. The gl_GlobalInvocationID built-in variable tells us which global element of our compute grid we are processing. To convert this value into 2D image coordinates, we divide it by the image size. As we are dealing with 2D textures, only the x and y components matter. The calling code on the C++ side issues a vkCmdDispatch() with enough workgroups in X and Y to cover the entire output image; a dispatch sketch follows the shader listing:

      vec2 uv = vec2(gl_GlobalInvocationID.xy) / dim;

  5. The actual real work we do in this shader is to call the imageStore() GLSL function:

      imageStore(result, ivec2(gl_GlobalInvocationID.xy),    vec4(uv, 0.0, 1.0));

    }
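On the C++ side, one way to dispatch this shader so that a W x H output image is fully covered is to divide the image size by the 16x16 workgroup size, rounding up. This sketch reuses executeComputeShader() from the previous recipe; the pipeline and descriptor set names are placeholders:

      const uint32_t w = 1280, h = 720;   // output texture size
      executeComputeShader(vkDev, texturePipeline, texturePipelineLayout,
        textureDescriptorSet, (w + 15) / 16, (h + 15) / 16, 1);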

Now, the preceding example is rather limited, and all you get is a red-and-green gradient image. Let's change it a little bit to use the actual shader code from ShaderToy. The compute shader that renders a Vulkan version of the Industrial Complex shader from ShaderToy, available via the following Uniform Resource Locator (URL), https://shadertoy.com/view/MtdSWS, can be found in the shaders/chapter06/VK03_compute_texture.comp file.

  1. First, let's copy the entire original ShaderToy GLSL code into our new compute shader. There is a function called mainImage() in there that is declared as follows:

    void mainImage(out vec4 fragColor, in vec2 fragCoord)

  2. We should replace it with a function that returns a vec4 color instead of storing it in the output parameter:

    vec4 mainImage(in vec2 fragCoord)

    Don't forget to add an appropriate return statement at the end.

  3. Now, let's change the main() function of our compute shader to invoke mainImage() properly. It is a pretty neat trick:

    void main()

    {

      ivec2 dim = imageSize(result);

      vec2 uv = vec2(gl_GlobalInvocationID.xy) / dim;

      imageStore(result, ivec2(gl_GlobalInvocationID.xy),    mainImage(uv*dim));

    }

  4. There is still one issue that needs to be resolved before we can run this code. The ShaderToy code uses two custom input variables: iTime for the elapsed time and iResolution, which contains the size of the resulting image. To avoid any search and replace in the original GLSL code, we mimic these variables, one as a push constant and the other with a hardcoded value for simplicity (a hedged C++ sketch of updating the push constant follows this list):

    layout(push_constant) uniform uPushConstant {

      float time;

    } pc;

    vec2 iResolution = vec2( 1280.0, 720.0 );

    float iTime = pc.time;

    Important note

    The GLSL imageSize() function can be used to obtain the iResolution value based on the actual size of our texture. We leave this as an exercise for the reader.

  5. The C++ code is rather short and consists of invoking the aforementioned compute shader, inserting a Vulkan pipeline barrier, and rendering a texture quad. The pipeline barrier that ensures the compute shader finishes before texture sampling happens can be created in the following way:

    void insertComputedImageBarrier(  VkCommandBuffer commandBuffer, VkImage image)

    {

      const VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_GENERAL,
        .newLayout = VK_IMAGE_LAYOUT_GENERAL,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }
      };

      vkCmdPipelineBarrier(commandBuffer,
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
        VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        0, 0, nullptr, 0, nullptr, 1, &barrier);

    }
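For completeness, here is a hedged sketch of how the time push constant mentioned in step 4 can be updated from C++ before the dispatch is recorded. The pipelineLayout handle is assumed to have been created with a push constant range for the compute stage, and glfwGetTime() is used only as one possible time source; this is an illustration rather than the book's exact code:

    // Hedged sketch: write the elapsed time into the push constant block
    // declared in the compute shader (a single float at offset 0).
    const float elapsedTime = static_cast<float>(glfwGetTime());

    vkCmdPushConstants(commandBuffer, pipelineLayout,
      VK_SHADER_STAGE_COMPUTE_BIT, 0, sizeof(float), &elapsedTime);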

The running application should render an image like the one shown in the following screenshot, which is similar to the output of https://www.shadertoy.com/view/MtdSWS:

Figure 6.2 – Using compute shaders to generate textures


In the next recipe, we will continue learning the Vulkan compute pipeline and implement a mesh-generation compute shader.

Implementing computed meshes in Vulkan

In the Initializing compute shaders in Vulkan recipe, we learned how to initialize the compute pipeline in Vulkan. We are going to need it in this chapter to implement a BRDF precomputation tool for our PBR pipeline. But before that, let's learn a few simple and interesting ways to use compute shaders in Vulkan and combine this feature with mesh geometry generation on the GPU.

We are going to run a compute shader to create triangulated geometry of a three-dimensional (3D) torus knot shape with different P and Q parameters.

Important note

A torus knot is a special kind of knot that lies on the surface of an unknotted torus in 3D space. Each torus knot is specified by a pair of p and q coprime integers. You can read more on this at https://en.wikipedia.org/wiki/Torus_knot.

The data produced by the compute shader is stored in a shader storage buffer and used in a vertex shader in a typical programmable-vertex-fetch way. To make the results more visually pleasing, we will implement real-time morphing between two different torus knots controllable from an ImGui widget. Let's get started.

Getting ready

The source code for this example is located in Chapter6/VK04_ComputeMesh.

How to do it...

The application consists of three different parts: the C++ part, which drives the UI and Vulkan commands, the mesh-generation compute shader, and the rendering pipeline with simple vertex and fragment shaders. The C++ part in Chapter6/VK04_ComputeMesh/src/main.cpp is rather short, so let's tackle this first.

  1. We store a queue of P-Q pairs that defines the order of morphing. The queue always has at least two elements, which define the current and the next torus knot. We also store a morphCoef floating-point value that is the 0...1 morphing factor between these two pairs in the queue. The mesh is regenerated every frame, and the morphing coefficient is increased until it reaches 1.0. At this point, we either stop morphing or, in case there are more than two elements in the queue, remove the front element, reset morphCoef back to 0, and repeat (a hedged sketch of this per-frame update appears right after this list). The animationSpeed value defines how fast one torus knot morphs into another:

    std::deque<std::pair<uint32_t, uint32_t>> morphQueue =  { { 5, 8 }, { 5, 8 } };

    float morphCoef = 0.0f;

    float animationSpeed = 1.0f;

  2. Two global constants define the tessellation level of a torus knot. Feel free to play around with them:

    const uint32_t numU = 1024;

    const uint32_t numV = 1024;

  3. Another global declaration is the structure to pass data into the compute shader inside a uniform buffer. Note two sets of P and Q parameters here:

    struct MeshUniformBuffer {

      float time;

      uint32_t numU;

      uint32_t numV;

      float minU, maxU;

      float minV, maxV;

      uint32_t p1, p2;

      uint32_t q1, q2;

      float morph;

    } ubo;

  4. Regardless of the P and Q parameter values, we have a single order in which we should traverse vertices to produce torus knot triangles. The generateIndices() function prepares index buffer data for this purpose:

    void generateIndices(uint32_t* indices) {

      for (uint32_t j = 0 ; j < numV - 1 ; j++) {

        for (uint32_t i = 0 ; i < numU - 1 ; i++) {

          uint32_t offset = (j * (numU - 1) + i) * 6;

          uint32_t i1 = (j + 0) * numU + (i + 0);
          uint32_t i2 = (j + 0) * numU + (i + 1);
          uint32_t i3 = (j + 1) * numU + (i + 1);
          uint32_t i4 = (j + 1) * numU + (i + 0);

          indices[offset + 0] = i1;
          indices[offset + 1] = i2;
          indices[offset + 2] = i4;
          indices[offset + 3] = i2;
          indices[offset + 4] = i3;
          indices[offset + 5] = i4;

        }

      }

    }
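As promised in step 1, here is a hedged C++ sketch of how the morphing state might be advanced once per frame. It uses the morphQueue, morphCoef, and animationSpeed globals declared above; the deltaSeconds parameter and the function name are assumptions made for illustration, not the book's exact code:

    // Hedged sketch: advance the 0...1 morphing factor and pop finished
    // P-Q pairs from the front of the queue.
    void updateMorphState(float deltaSeconds)
    {
      morphCoef += animationSpeed * deltaSeconds;
      if (morphCoef >= 1.0f)
      {
        if (morphQueue.size() > 2)
        {
          // More knots are queued: drop the finished pair and restart.
          morphQueue.pop_front();
          morphCoef = 0.0f;
        }
        else
        {
          // Nothing else queued: clamp and keep the final knot.
          morphCoef = 1.0f;
        }
      }
      // The current and next pairs then go into the uniform buffer:
      // ubo.p1 = morphQueue[0].first;  ubo.q1 = morphQueue[0].second;
      // ubo.p2 = morphQueue[1].first;  ubo.q2 = morphQueue[1].second;
      // ubo.morph = morphCoef;
    }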

The rest of our C++ initialization lives in the initMesh() function. It allocates all the necessary buffers, uploads the index data to the GPU, loads the compute shaders for texture and mesh generation, and creates two model renderers: one for a textured mesh and another for a colored one.

  1. First, we should allocate storage for our generated indices data. To make things simpler, we do not use triangle strips, so it is always 6 indices for each quad defined by the UV mapping:

    void initMesh() {

      std::vector<uint32_t> indicesGen(    (numU - 1) * (numV - 1) * 6);

          generateIndices(indicesGen.data());

  2. Compute all the necessary sizes for our GPU buffer. 12 floats are necessary to store three vec4 components per vertex. The actual data structure is defined only in GLSL and can be found in data/shaders/chapter06/mesh_common.inc:

      uint32_t vertexBufferSize =    12 * sizeof(float) * numU * numV;

      uint32_t indexBufferSize =    6 * sizeof(uint32_t) * (numU-1) * (numV-1);

      uint32_t bufferSize =    vertexBufferSize + indexBufferSize;

  3. Load both compute shaders. The grid size for texture generation is fixed at 1024x1024. The grid size for the mesh can be tweaked using numU and numV:

      imgGen = std::make_unique<ComputedImage>(vkDev,    "data/shaders/chapter06/VK04_compute_texture.comp",    1024, 1024, false);

      meshGen = std::make_unique<ComputedVertexBuffer>(    vkDev,    "data/shaders/chapter06/VK04_compute_mesh.comp",    indexBufferSize, sizeof(MeshUniformBuffer),    12 * sizeof(float), numU * numV);

  4. Use a staging buffer to upload indices data into the GPU memory:

      VkBuffer stagingBuffer;

      VkDeviceMemory stagingBufferMemory;

      createBuffer(vkDev.device, vkDev.physicalDevice,    bufferSize,    VK_BUFFER_USAGE_TRANSFER_SRC_BIT,    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |      VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,    stagingBuffer, stagingBufferMemory);

      void* data = nullptr;

      vkMapMemory(vkDev.device, stagingBufferMemory, 0,    bufferSize, 0, &data);

      memcpy((void*)((uint8_t*)data + vertexBufferSize),    indicesGen.data(), indexBufferSize);

      vkUnmapMemory(vkDev.device, stagingBufferMemory);

      copyBuffer(vkDev, stagingBuffer,    meshGen->computedBuffer, bufferSize);

    Note

    More examples of staging buffers can be found in the Using texture data in Vulkan recipe from Chapter 3, Getting Started with OpenGL and Vulkan.

  5. Since indices are static, we do not require the staging buffer anymore, so it can be deallocated right here:

      vkDestroyBuffer(    vkDev.device, stagingBuffer, nullptr);

      vkFreeMemory(    vkDev.device, stagingBufferMemory, nullptr);

  6. Fill the Vulkan command buffer for our computed mesh generator, submit it for execution, and wait for the results:

      meshGen->fillComputeCommandBuffer();

      meshGen->submit();

      vkDeviceWaitIdle(vkDev.device);

Last but not least, let's create two model renderers.

  1. The first one will draw the generated mesh geometry textured with an image produced by a compute shader, as described in the previous Generating textures in Vulkan using compute shaders recipe:

      std::vector<const char*> shaders =    { "data/shaders/chapter06/VK04_render.vert",      "data/shaders/chapter06/VK04_render.frag" };

      mesh = std::make_unique<ModelRenderer>(vkDev, true,    meshGen->computedBuffer, meshGen->computedMemory,    vertexBufferSize, indexBufferSize,    imgGen->computed, imgGen->computedImageSampler,    shaders, (uint32_t)sizeof(mat4), true);

  2. The second ModelRenderer object will apply only a solid color with some simple lighting. The only difference is in the set of shaders:

      std::vector<const char*> shadersColor =    {"data/shaders/chapter06/VK04_render.vert",     "data/shaders/chapter06/VK04_render_color.frag"};

      meshColor = std::make_unique<ModelRenderer>(vkDev,    true, meshGen->computedBuffer,    meshGen->computedMemory,    vertexBufferSize, indexBufferSize,    imgGen->computed, imgGen->computedImageSampler,    shadersColor, (uint32_t)sizeof(mat4),    true, mesh->getDepthTexture(), false);

    }  

Now, we need our chapter06/VK04_compute_mesh.comp mesh-generation compute shader.

  1. The compute shader outputs vertex data into the buffer, filling in one VertexData structure per vertex:

    #version 440

    layout (local_size_x = 2, local_size_y = 1, local_size_z   = 1) in;

    struct VertexData {

      vec4 pos, tc, norm;

    };

    layout (binding = 0) buffer VertexBuffer {

      VertexData vertices[];

    } vbo;

  2. A bunch of uniforms come from C++. They correspond to the MeshUniformBuffer structure mentioned earlier:

    layout (binding = 1) uniform UniformBuffer {

      float time;

      uint  numU,  numV;

      float minU, maxU, minV, maxV;

      uint P1, P2, Q1, Q2;

      float morph;

    } ubo;

  3. The heart of our mesh-generation algorithm is the torusKnot() function, which uses the following parametrization to triangulate a torus knot:

    x = r * cos(u)

    y = r * sin(u)

    z = -sin(v)

  4. The torusKnot() function is rather long and is implemented directly from the aforementioned parametrization. Feel free to play with the baseRadius, segmentRadius, and tubeRadius values:

    VertexData torusKnot(vec2 uv, vec2 pq) {

      const float p = pq.x;

      const float q = pq.y;

      const float baseRadius    = 5.0;

      const float segmentRadius = 3.0;

      const float tubeRadius    = 0.5;

      float ct = cos(uv.x);

      float st = sin(uv.x);

      float qp = q / p;

      float qps = qp * segmentRadius;

      float arg = uv.x * qp;

      float sqp = sin(arg);

      float cqp = cos(arg);

      float BSQP = baseRadius + segmentRadius * cqp;

      float dxdt = -qps * sqp * ct - st * BSQP;

      float dydt = -qps * sqp * st + ct * BSQP;

      float dzdt =  qps * cqp;

      vec3 r =    vec3(BSQP * ct, BSQP * st, segmentRadius * sqp);

      vec3 drdt = vec3(dxdt, dydt, dzdt);

      vec3 v1 = normalize(cross(r, drdt));

      vec3 v2 = normalize(cross(v1, drdt));

      float cv = cos(uv.y);

      float sv = sin(uv.y);

      VertexData res;

      res.pos = vec4(r+tubeRadius*(v1 * sv + v2 * cv), 1);

      res.norm = vec4(cross(v1 * cv - v2 * sv, drdt ), 0);

      return res;

    }

  5. We run this compute shader every frame, so instead of generating a static set of vertices, we can pre-transform them to make the mesh look like it is rotating. Here are a couple of helper functions to compute the appropriate rotation matrices:

    mat3 rotY(float angle) {

      float c = cos(angle), s = sin(angle);

      return mat3(c, 0, -s, 0, 1, 0, s, 0, c);

    }

    mat3 rotZ(float angle) {

      float c = cos(angle), s = sin(angle);

      return mat3(c, -s, 0, s, c, 0, 0, 0, 1);

    }

Using the aforementioned helpers, the main() function of our compute shader is now straightforward, and the only interesting thing worth mentioning here is the real-time morphing that blends two torus knots with different P and Q parameters. This is pretty easy because the total number of vertices always remains the same. Let's take a closer look.

  1. First, the two sets of UV coordinates for parametrization need to be computed:

    void main() {

      uint index = gl_GlobalInvocationID.x;

      vec2 numUV = vec2(ubo.numU, ubo.numV);

      vec2 ij = vec2(float(index / ubo.numU),                 float(index % ubo.numU));

      const vec2 maxUV1 =    2.0 * 3.1415926 * vec2(ubo.P1, 1.0);

      vec2 uv1 = ij * maxUV1 / (numUV - vec2(1));

      const vec2 maxUV2 =    2.0 * 3.1415926 * vec2(ubo.P2, 1.0);

      vec2 uv2 = ij * maxUV2 / (numUV - vec2(1));

    Note

    Refer to the https://en.wikipedia.org/wiki/Torus_knot Wikipedia page for additional explanation of the math details.

  2. Compute the model matrix for our mesh by combining two rotation matrices:

      mat3 modelMatrix =    rotY(0.5 * ubo.time) * rotZ(0.5 * ubo.time);

  3. Compute two vertex positions for two different torus knots:

      VertexData v1 =    torusKnot(uv1, vec2(ubo.P1, ubo.Q1));

      VertexData v2 =    torusKnot(uv2, vec2(ubo.P2, ubo.Q2));

  4. Do a linear blend between them using the ubo.morph coefficient. We need to blend only the position and the normal vector:

      vec3 pos = mix(v1.pos.xyz, v2.pos.xyz, ubo.morph);

      vec3 norm =    mix(v1.norm.xyz, v2.norm.xyz, ubo.morph);

  5. Fill in the resulting VertexData structure and store it in the output buffer:

      VertexData vtx;

      vtx.pos  = vec4(modelMatrix * pos, 1);

      vtx.tc   = vec4(ij / numUV, 0, 0);

      vtx.norm = vec4(modelMatrix * norm, 0);

      vbo.vertices[index] = vtx;

    }

Both the vertex and fragment shaders used to render this mesh are trivial and can be found in chapter06/VK04_render.vert, chapter06/VK04_render.frag, and chapter06/VK04_render_color.frag. Feel free to take a look yourself, as we are not going to copy and paste them here.

The demo application will produce a variety of torus knots similar to the one shown in the following screenshot. Each time you select a new pair of P-Q parameters from the UI, the morphing animation will kick in and transform one knot into another. Checking the Use colored mesh box will apply colors to the mesh instead of a computed texture:

Figure 6.3 – Computed mesh with real-time animation


There's more...

In this recipe, all the synchronization between the mesh-generation process and rendering was done using vkDeviceWaitIdle(), essentially making these two processes completely serial and inefficient. While this is acceptable for the purpose of showing a single feature in a standalone demo app, in a real-world application a fine-grained synchronization would be desirable to allow mesh generation and rendering to run—at least partially—in parallel. Check out the guide on Vulkan synchronization from Khronos for useful insights on how to do this: https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples.
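As a starting point, a hedged sketch of such a barrier is shown below. It assumes the compute dispatch and the draw call are recorded into the same command buffer and that the vertex shader reads the generated vertices from a storage buffer, as in this recipe; the function name is ours, not the book's code:

    // Hedged sketch: make the vertex shader wait only for the compute
    // writes to the generated vertex buffer, instead of using a full
    // vkDeviceWaitIdle().
    void insertComputedBufferBarrier(
      VkCommandBuffer commandBuffer, VkBuffer buffer)
    {
      const VkBufferMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,  // compute writes
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,   // vertex shader reads
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .buffer = buffer,
        .offset = 0,
        .size = VK_WHOLE_SIZE
      };

      vkCmdPipelineBarrier(commandBuffer,
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
        VK_PIPELINE_STAGE_VERTEX_SHADER_BIT,
        0, 0, nullptr, 1, &barrier, 0, nullptr);
    }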

Now, let's switch back to the main topic of this chapter and learn how to precompute BRDF LUTs for PBR rendering using compute shaders.

Precomputing BRDF LUTs

In the previous recipes, we learned how to initialize compute pipelines in Vulkan and demonstrated the basic functionality of compute shaders. Let's switch gears back to PBR and learn how to precompute the Smith GGX BRDF LUT. To render a PBR image, we have to evaluate the BRDF at each point based on surface properties and viewing direction. This is computationally expensive, and many real-time implementations, including the reference glTF-Sample-Viewer implementation from Khronos, use precalculated tables of some sort to find the BRDF value based on surface roughness and viewing direction. A BRDF LUT can be stored as a 2D texture where the x axis corresponds to the dot product between the surface normal vector and the viewing direction, and the y axis corresponds to the 0...1 surface roughness. Each texel stores two 16-bit floating-point values—namely, a scale and bias to F0, which is the specular reflectance at normal incidence.

Important note

In this recipe, we focus purely on details of a minimalistic implementation and do not touch on any math behind it. For those interested in the math behind this approach, check out the Environment BRDF section from the Real Shading in Unreal Engine 4 presentation by Brian Karis at https://cdn2.unrealengine.com/Resources/files/2013SiggraphPresentationsNotes-26915738.pdf.

We are going to use Vulkan to calculate this texture on the GPU and implement a compute shader to do this.

Getting ready

It would be helpful to revisit the compute pipeline creation from the Initializing compute shaders in Vulkan recipe. Our implementation is based on https://github.com/SaschaWillems/Vulkan-glTF-PBR/blob/master/data/shaders/genbrdflut.frag, which runs the same computations in a fragment shader. Our compute shader can be found in data/shaders/chapter06/VK01_BRDF_LUT.comp.

How to do it...

Before we look into the shader code, let's implement a generic class to process data arrays on the GPU.

To manipulate data buffers on the GPU and use the data, we need three basic operations: upload data from the host memory into a GPU buffer, download data from a GPU buffer to the host memory, and run a compute shader workload on that buffer. The data uploading and downloading process consists of mapping the GPU memory to the host address space and then using memcpy() to transfer buffer contents.

  1. Our uploadBufferData() function uses the vkMapMemory() and vkUnmapMemory() Vulkan API calls to map and unmap memory:

    void uploadBufferData(const VulkanRenderDevice& vkDev,  VkDeviceMemory& bufferMemory,  VkDeviceSize deviceOffset,  const void* data, const size_t dataSize)

    {  

      void* mappedData = nullptr;

      vkMapMemory(vkDev.device, bufferMemory,    deviceOffset, dataSize, 0, &mappedData);

      memcpy(mappedData, data, dataSize);

      vkUnmapMemory(vkDev.device, bufferMemory);

    }

  2. The downloadBufferData() function is quite similar to the preceding code. The only difference is the copying "direction"—we read from the mapped memory and store it on our app's heap:

    void downloadBufferData(VulkanRenderDevice& vkDev,  VkDeviceMemory& bufferMemory,  VkDeviceSize deviceOffset,  void* outData, const size_t dataSize)

    {

      void* mappedData = nullptr;

      vkMapMemory(vkDev.device, bufferMemory,    deviceOffset, dataSize, 0, &mappedData);

      memcpy(outData, mappedData, dataSize);

      vkUnmapMemory(vkDev.device, bufferMemory);

    }

    Note

    The downloadBufferData() function should be called only after a corresponding memory barrier was executed in the compute command queue. Make sure to read the Initializing compute shaders in Vulkan recipe and the source code of executeComputeShader() for more details.

We now have two helper functions to move data around from the CPU to GPU buffers, and vice versa. Recall that in the Initializing compute shaders in Vulkan recipe, we implemented the executeComputeShader() function, which starts the compute pipeline. Let's focus on data manipulation on the GPU.

  1. Using the executeComputeShader() function, we can implement the ComputeBase class, which neatly hides the details of Vulkan devices, pipelines, and descriptor sets:

    class ComputeBase {

      void uploadInput(uint32_t offset,    void* inData, uint32_t byteCount) {

        uploadBufferData(vkDev, inBufferMemory, offset,      inData, byteCount);

      }

      void downloadOutput(uint32_t offset,    void* outData, uint32_t byteCount) {

        downloadBufferData(vkDev, outBufferMemory,      offset, outData, byteCount);

      }

  2. To immediately execute a GPU compute workload, the execute() method is provided:

      bool execute(uint32_t xSize, uint32_t ySize,    uint32_t zSize) {

        return executeComputeShader(vkDev, pipeline,      pipelineLayout, descriptorSet,      xSize, ySize, zSize);

      }

  3. Early in the constructor, we allocate I/O buffers. Those are shared between the compute and graphics queues:

    ComputeBase::ComputeBase(VulkanRenderDevice& vkDev,  const char* shaderName, uint32_t inputSize,  uint32_t outputSize)

    : vkDev(vkDev)

    {

      createSharedBuffer(vkDev, inputSize,    VK_BUFFER_USAGE_STORAGE_BUFFER_BIT,    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |      VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,    inBuffer, inBufferMemory);

  4. To simplify our code, we allocate both buffers as host-visible. If the output buffer is needed only for rendering purposes, host visibility and coherence can be disabled:

      createSharedBuffer(vkDev, outputSize,    VK_BUFFER_USAGE_STORAGE_BUFFER_BIT,    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |      VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,    outBuffer, outBufferMemory);

  5. This class uses a single compute shader to process the input buffer and write the output buffer:

      ShaderModule s;

      createShaderModule(vkDev.device, &s, shaderName);

  6. We use a descriptor set and pipeline creation routines similar to the ones from the previous recipe:

      createComputeDescriptorSetLayout(    vkDev.device, &dsLayout);

      createPipelineLayout(    vkDev.device, dsLayout, &pipelineLayout);

      createComputePipeline(vkDev.device, s.shaderModule,    pipelineLayout, &pipeline);

      createComputeDescriptorSet(vkDev.device, dsLayout);

  7. Finally, we dispose of the unused compute shader module:

      vkDestroyShaderModule(    vkDev.device, s.shaderModule, nullptr);

    }

As you might suspect, the longest method is the dreaded descriptor set creation function, createComputeDescriptorSet(). Let's take a closer look at the steps.

  1. Fortunately, we only have two buffers, so the descriptor pool creation is relatively simple:

    bool ComputeBase::createComputeDescriptorSet(  VkDevice device,  VkDescriptorSetLayout descriptorSetLayout)

    {

      VkDescriptorPoolSize descriptorPoolSize = {    VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, 2   };

      VkDescriptorPoolCreateInfo descriptorPoolCreateInfo = {    VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,    0, 0, 1, 1, &descriptorPoolSize   };

      VK_CHECK(vkCreateDescriptorPool(device,    &descriptorPoolCreateInfo, 0, &descriptorPool));

  2. The descriptor set creation is also straightforward, since we only need one set for the computation:

      VkDescriptorSetAllocateInfo     descriptorSetAllocateInfo = {      VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,      0, descriptorPool, 1, &descriptorSetLayout   };

      VK_CHECK(vkAllocateDescriptorSets(device,    &descriptorSetAllocateInfo, &descriptorSet));

  3. The I/O buffer handles are bound to the descriptor set. The buffer information structures are as simple as possible:

      VkDescriptorBufferInfo inBufferInfo =    { inBuffer, 0, VK_WHOLE_SIZE };

      VkDescriptorBufferInfo outBufferInfo =    { outBuffer, 0, VK_WHOLE_SIZE };

  4. The descriptor set update parameters refer to both buffers. After updating descriptor sets, we can return successfully:

      VkWriteDescriptorSet writeDescriptorSet[2] = {    { VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET, 0,      descriptorSet, 0, 0, 1,      VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, 0,      &inBufferInfo, 0},

       { VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET, 0,     descriptorSet, 1, 0, 1,      VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, 0,     &outBufferInfo, 0}  };

      vkUpdateDescriptorSets(    device, 2, writeDescriptorSet, 0, 0);

      return true;

    }

The preceding ComputeBase class is used directly in our Chapter6/VK01_BRDF_LUT tool. The entire computational heavy lifting to precalculate a BRDF LUT is done in a compute shader. Let's look inside the data/shaders/chapter06/VK01_BRDF_LUT.comp GLSL code:

  1. To break down our work into smaller pieces, we start from the preamble and the main() function of the BRDF LUT calculation shader. The preamble first sets the compute shader dispatching parameters. In our case, a single point of the texture is calculated by one GPU worker. The number of Monte Carlo trials for numeric integration is declared as a constant:

    layout (local_size_x = 1, local_size_y = 1,  local_size_z = 1) in;

    layout (constant_id = 0)  const uint NUM_SAMPLES = 1024u;

    layout (set = 0, binding = 0)  buffer SRC { float data[]; } src;

    layout (set = 0, binding = 1)  buffer DST { float data[]; } dst;

  2. We use a fixed width and height for the I/O buffer layouts. Last, but not least, PI is the only "physical" constant we use:

    const uint BRDF_W = 256;

    const uint BRDF_H = 256;

    const float PI = 3.1415926536;

  3. The main() function just wraps the BRDF() function call. First, we convert the global worker ID into normalized UV coordinates:

    void main() {

      vec2 uv;

      uv.x = float(gl_GlobalInvocationID.x) / float(BRDF_W);

      uv.y = float(gl_GlobalInvocationID.y) / float(BRDF_H);

  4. The BRDF() function does all the actual work. The calculated value is put into the 2D array:

      vec2 v = BRDF(uv.x, 1.0 - uv.y);

      uint offset = gl_GlobalInvocationID.y * BRDF_W +                gl_GlobalInvocationID.x;

      dst.data[offset * 2 + 0] = v.x;

      dst.data[offset * 2 + 1] = v.y;

    }

Now that we have described some mandatory compute shader parts, we can see how the BRDF LUT items are calculated. Technically, we calculate the integral value over the hemisphere using the Monte Carlo integration procedure. Let's look at the steps.

  1. To generate random directions in the hemisphere, we use so-called Hammersley points calculated by the following function:

    vec2 hammersley2d(uint i, uint N) {

      uint bits = (i << 16u) | (i >> 16u);

      bits = ((bits&0x55555555u) << 1u) |         ((bits&0xAAAAAAAAu) >> 1u);

      bits = ((bits&0x33333333u) << 2u) |         ((bits&0xCCCCCCCCu) >> 2u);

      bits = ((bits&0x0F0F0F0Fu) << 4u) |         ((bits&0xF0F0F0F0u) >> 4u);

      bits = ((bits&0x00FF00FFu) << 8u) |         ((bits&0xFF00FF00u) >> 8u);

      float rdi = float(bits) * 2.3283064365386963e-10;

      return vec2(float(i) /float(N), rdi);

    }

    Important note

    The code is based on the following post: http://holger.dammertz.org/stuff/notes_HammersleyOnHemisphere.html. The bit-shifting magic used here and in many other applications is thoroughly described in Henry S. Warren, Jr.'s book Hacker's Delight. Interested readers may also look up the Van der Corput sequence to see why it can be used to generate random directions on the hemisphere.

  2. We also need some kind of a pseudo-random number generator. We use the output array indices as input and pass them through another magic set of formulas:

    float random(vec2 co) {

      float a = 12.9898;

      float b = 78.233;

      float c = 43758.5453;

      float dt= dot(co.xy ,vec2(a,b));

      float sn= mod(dt, PI);

      return fract(sin(sn) * c);

    }

    Note

    Check out this link to find some useful details about this code: http://byteblacksmith.com/improvements-to-the-canonical-one-liner-glsl-rand-for-opengl-es-2-0/.

  3. Let's take a look at how importance sampling is implemented according to the paper Real Shading in Unreal Engine 4 by Brian Karis. Check out the fourth page of the following document: https://cdn2.unrealengine.com/Resources/files/2013SiggraphPresentationsNotes-26915738.pdf. This function maps a 2D point to a hemisphere with spread based on surface roughness:

    vec3 importanceSample_GGX(  vec2 Xi, float roughness, vec3 normal)

    {

      float alpha = roughness * roughness;

      float phi =    2.0 * PI * Xi.x + random(normal.xz) * 0.1;

      float cosTheta =    sqrt((1.0-Xi.y)/(1.0+(alpha*alpha-1.0)*Xi.y));

      float sinTheta = sqrt(1.0 - cosTheta * cosTheta);

      vec3 H = vec3(    sinTheta*cos(phi), sinTheta*sin(phi), cosTheta);

      vec3 up = abs(normal.z) < 0.999 ?    vec3(0.0, 0.0, 1.0) : vec3(1.0, 0.0, 0.0);

      vec3 tangentX = normalize(cross(up, normal));

      vec3 tangentY = normalize(cross(normal, tangentX));

      return normalize(    tangentX*H.x + tangentY*H.y + normal*H.z);

    }

  4. There's one more utility function required to calculate BRDF—the geometric shadowing function:

    float G_SchlicksmithGGX(  float dotNL, float dotNV, float roughness)

    {

      float k = (roughness * roughness) / 2.0;

      float GL = dotNL / (dotNL * (1.0 - k) + k);

      float GV = dotNV / (dotNV * (1.0 - k) + k);

      return GL * GV;

    }

  5. The value of BRDF is calculated in the following way, using all of the preceding code. The NUM_SAMPLES number of Monte Carlo trials was set earlier to be 1024:

    vec2 BRDF(float NoV, float roughness)

    {

      const vec3 N = vec3(0.0, 0.0, 1.0);

      vec3 V = vec3(sqrt(1.0 - NoV*NoV), 0.0, NoV);

      vec2 LUT = vec2(0.0);

      for(uint i = 0u; i < NUM_SAMPLES; i++) {

        vec2 Xi = hammersley2d(i, NUM_SAMPLES);

        vec3 H = importanceSample_GGX(Xi, roughness, N);

        vec3 L = 2.0 * dot(V, H) * H - V;

        float dotNL = max(dot(N, L), 0.0);

        float dotNV = max(dot(N, V), 0.0);

        float dotVH = max(dot(V, H), 0.0);

        float dotNH = max(dot(H, N), 0.0);

        if (dotNL > 0.0) {

          float G =        G_SchlicksmithGGX(dotNL, dotNV, roughness);

          float G_Vis = (G * dotVH) / (dotNH * dotNV);

          float Fc = pow(1.0 - dotVH, 5.0);

          LUT += vec2((1.0 - Fc) * G_Vis, Fc * G_Vis);

        }

      }

      return LUT / float(NUM_SAMPLES);

    }

The C++ part of the project is trivial: it runs the compute shader and saves the results into the data/brdfLUT.ktx file using the OpenGL Image (GLI) library. You can use Pico Pixel (https://pixelandpolygon.com) to view the generated image.
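Here is a hedged C++ sketch of what that driver code might look like, built on top of the ComputeBase class from this recipe. The constructor arguments, the tiny placeholder input buffer, and the omitted conversion to 16-bit floats and KTX writing via GLI are assumptions made for illustration, not the book's verbatim code; vkDev is assumed to be an already-initialized VulkanRenderDevice:

    // Hedged sketch of the BRDF LUT tool's main body.
    const uint32_t brdfW = 256, brdfH = 256;
    const uint32_t bufferSize = 2 * sizeof(float) * brdfW * brdfH;

    // The shader reads nothing from the input buffer, so a tiny
    // placeholder size is used here (an assumption, not the book's code).
    ComputeBase calc(vkDev,
      "data/shaders/chapter06/VK01_BRDF_LUT.comp",
      sizeof(float), bufferSize);

    // One 1x1x1 workgroup per LUT texel, as declared in the shader.
    if (!calc.execute(brdfW, brdfH, 1))
      exit(EXIT_FAILURE);

    std::vector<float> lutData(brdfW * brdfH * 2);
    calc.downloadOutput(0, lutData.data(), bufferSize);

    // The float pairs are then converted to 16-bit half floats and saved
    // into data/brdfLUT.ktx with the GLI library; that part is omitted.

The resulting LUT should look like the image shown in the following screenshot: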

Figure 6.4 – BRDF LUT


This concludes the BRDF LUT tool description. We will need yet another tool to calculate an irradiance cubemap from an environment cube map, which we will cover next.

There's more...

The method described previously can be used to precompute BRDF LUTs using high-quality Monte Carlo integration and store them as textures. Dependent texture fetches can be expensive on some mobile platforms. There is an interesting runtime approximation used in Unreal Engine that does not rely on any precomputation, as described in https://www.unrealengine.com/en-US/blog/physically-based-shading-on-mobile. Here is the GLSL source code:

vec3 EnvBRDFApprox(  vec3 specularColor, float roughness, float NoV )

{

  const vec4 c0 = vec4(-1, -0.0275, -0.572, 0.022);

  const vec4 c1 = vec4( 1, 0.0425, 1.04, -0.04);

  vec4 r = roughness * c0 + c1;

  float a004 =    min( r.x * r.x, exp2(-9.28 * NoV) ) * r.x + r.y;

  vec2 AB = vec2( -1.04, 1.04 ) * a004 + r.zw;

  return specularColor * AB.x + AB.y;

}

Precomputing irradiance maps and diffuse convolution

The second part of the split sum approximation necessary to calculate the glTF2 physically based shading model comes from the irradiance cube map, which is precalculated by convolving the input environment cube map with the GGX distribution of our shading model.

Getting ready

Check out the source code for this recipe in Chapter6/Util01_FilterEnvmap. If you want to dive deep into the math theory behind these computations, make sure you read Brian Karis's paper at https://cdn2.unrealengine.com/Resources/files/2013SiggraphPresentationsNotes-26915738.pdf.

How to do it...

This code is written for simplicity rather than for speed or precision, so it does not use importance sampling and convolves the input cube map using simple Monte Carlo integration and the Hammersley sequence to generate uniformly distributed 2D points on an equirectangular projection of our input cube map.

The source code can be found in the Chapter6/Util01_FilterEnvmap/src/main.cpp file. Let's quickly go through the steps to cover the entire process.

  1. We need a C++ function to calculate the Van der Corput sequence, as described in Henry S. Warren, Jr.'s Hacker's Delight. Similar code was used in the GLSL shader in the previous recipe:

    float radicalInverse_VdC(uint32_t bits) {

      bits = (bits << 16u) | (bits >> 16u);

      bits = ((bits&0x55555555u) << 1u) |         ((bits&0xAAAAAAAAu) >> 1u);

      bits = ((bits&0x33333333u) << 2u) |         ((bits&0xCCCCCCCCu) >> 2u);

      bits = ((bits&0x0F0F0F0Fu) << 4u) |         ((bits&0xF0F0F0F0u) >> 4u);

      bits = ((bits&0x00FF00FFu) << 8u) |         ((bits&0xFF00FF00u) >> 8u);

      return float(bits) * 2.3283064365386963e-10f;

    }

  2. By definition of the Hammersley point set, the i-th point can be generated using the following function, as described in http://holger.dammertz.org/stuff/notes_HammersleyOnHemisphere.html:

    vec2 hammersley2d(uint32_t i, uint32_t N) {

      return vec2( float(i)/float(N),               radicalInverse_VdC(i) );

    }

Using this random points generator, we can finally convolve the cube map. For simplicity, our code supports only equirectangular projections where the width is twice the height of the image. Here are the steps.

  1. First, we resize the input environment cube map into a smaller image sized dstW x dstH:

    void convolveDiffuse(const vec3* data,  int srcW, int srcH, int dstW, int dstH,  vec3* output, int numMonteCarloSamples)

    {

      assert(srcW == 2 * srcH);

      if (srcW != 2 * srcH) return;

      std::vector<vec3> tmp(dstW * dstH);

      stbir_resize_float_generic(    reinterpret_cast<const float*>(data), srcW, srcH,    0,    reinterpret_cast<float*>(tmp.data()), dstW, dstH,    0, 3, STBIR_ALPHA_CHANNEL_NONE, 0,    STBIR_EDGE_CLAMP, STBIR_FILTER_CUBICBSPLINE,    STBIR_COLORSPACE_LINEAR, nullptr);

      const vec3* scratch = tmp.data();

      srcW = dstW;

      srcH = dstH;

  2. Then, we iterate over every pixel of the output cube map. We calculate two vectors, V1 and V2. The first vector, V1, is the direction to the current pixel of the output cube map. The second one, V2, is the direction to a randomly selected pixel of the input cube map:

      for (int y = 0; y != dstH; y++)

      {

        const float theta1 =      float(y) / float(dstH) * Math::PI;

        for (int x = 0; x != dstW; x++)

        {

          const float phi1 =        float(x) / float(dstW) * Math::TWOPI;

          const vec3 V1 = vec3(sin(theta1) * cos(phi1),        sin(theta1) * sin(phi1), cos(theta1));

          vec3 color = vec3(0.0f);

          float weight = 0.0f;

          for (int i = 0; i != numMonteCarloSamples; i++)

          {

            const vec2 h =          hammersley2d(i, numMonteCarloSamples);

            const int x1 = int(floor(h.x * srcW));

            const int y1 = int(floor(h.y * srcH));

            const float theta2 =          float(y1) / float(srcH) * Math::PI;

            const float phi2 =          float(x1) / float(srcW) * Math::TWOPI;

            const vec3 V2 = vec3(sin(theta2) * cos(phi2),          sin(theta2) * sin(phi2), cos(theta2));

  3. We use the dot product between V1 and V2 to convolve the values of the input cube map. This is done according to the implementation of PrefilterEnvMap() from the following paper: https://cdn2.unrealengine.com/Resources/files/2013SiggraphPresentationsNotes-26915738.pdf. To speed up our CPU-based implementation, we sacrifice some precision by replacing NdotL > 0 from the original paper with 0.01f. The output value is renormalized using the sum of all NdotL weights:

            const float NdotL =          std::max(0.0f, glm::dot(V1, V2));

            if (NdotL > 0.01f) {

              color += scratch[y1 * srcW + x1] * NdotL;

              weight += NdotL;

            }

          }

          output[y * dstW + x] = color / weight;

        }

      }

    }

The remaining part of the code is purely mechanical work, such as loading the cube map image from the file, invoking the convolveDiffuse() function, and saving the result using the STB library. Let's check out the results of prefiltering for the input image shown in the following screenshot:

Figure 6.5 – Environment cube map


The convolved image should look like this:

Figure 6.6 – Prefiltered environment cube map using diffuse convolution


There's one more fly in the ointment of the approximations already mentioned in this recipe. Technically, we should have a separate convolution for each different BRDF. This is, however, not practical in terms of storage, memory, and performance on mobile. It is wrong but good enough.

We now have all supplementary parts in place to render a PBR image. In the next Implementing the glTF2 shading model recipe, we are going to put everything together into a simple application to render a physically based glTF2 3D model.

There's more...

Paul Bourke created a set of tools and a great resource explaining how to convert cube maps between different formats. Make sure to check it out at http://paulbourke.net/panorama/cubemaps/index.html.

Implementing the glTF2 shading model

This recipe will cover how to integrate PBR into your graphics pipeline. Since the topic of PBR rendering is vast, we focus on a minimalistic implementation just to guide you and get you started. In the book text here, we focus on the GLSL shader code for the PBR shading model and use OpenGL to keep things simple. However, the source code bundle for this book contains a relatively small Vulkan implementation that reuses the same GLSL code. Indeed, rendering a physically based image is nothing more than running a fancy pixel shader with a set of textures.

Getting ready

It is recommended to read about glTF 2.0 before you proceed with this recipe. A lightweight introduction to the glTF 2.0 shading model can be found at https://github.com/KhronosGroup/glTF-Sample-Viewer/tree/glTF-WebGL-PBR.

The C++ source code for this recipe is in the Chapter6/GL01_PBR folder. The GLSL shader code responsible for PBR calculations can be found in data/shaders/chapter06/PBR.sp.

How to do it...

Before we dive deep into the GLSL code, we'll look at how the input data is set up from the C++ side. We are going to use the Damaged Helmet 3D model provided by Khronos. You can find the glTF file here: deps/src/glTF-Sample-Models/2.0/DamagedHelmet/glTF/DamagedHelmet.gltf. Let's get started.

  1. After loading the model vertices using AssImp, we need to load all the textures corresponding to our 3D model:

    GLTexture texAO(GL_TEXTURE_2D,  "DamagedHelmet/glTF/Default_AO.jpg");

    GLTexture texEmissive(GL_TEXTURE_2D,  "DamagedHelmet/glTF/Default_emissive.jpg");

    GLTexture texAlbedo(GL_TEXTURE_2D,  "DamagedHelmet/glTF/Default_albedo.jpg");

    GLTexture texMeR(GL_TEXTURE_2D,  "DamagedHelmet/glTF/Default_metalRoughness.jpg");

    GLTexture texNormal(GL_TEXTURE_2D,  "DamagedHelmet/glTF/Default_normal.jpg");

  2. Textures are bound to OpenGL binding points, starting from 0:

    const GLuint textures[] = { texAO.getHandle(),  texEmissive.getHandle(), texAlbedo.getHandle(),  texMeR.getHandle(), texNormal.getHandle() };

    glBindTextures(  0, sizeof(textures)/sizeof(GLuint), textures);

  3. The environment cube map and convolved irradiance map are loaded and bound to the OpenGL context:

    GLTexture envMap(GL_TEXTURE_CUBE_MAP,  "data/piazza_bologni_1k.hdr");

    GLTexture envMapIrradiance(GL_TEXTURE_CUBE_MAP,  "data/piazza_bologni_1k_irradiance.hdr");

    const GLuint envMaps[] = {  envMap.getHandle(), envMapIrradiance.getHandle() };

    glBindTextures(5, 2, envMaps);

    Check the previous Precomputing irradiance maps and diffuse convolution recipe for details of where it came from.

  4. The BRDF LUT is loaded. Check out the Precomputing BRDF LUTs recipe for all the precalculation details:

    GLTexture brdfLUT(GL_TEXTURE_2D, "data/brdfLUT.ktx");

    glBindTextureUnit(7, brdfLUT.getHandle());

Everything else is just mesh rendering, similar to how it was done in the previous chapter. Let's skip the rest of the C++ code and focus on the GLSL shaders. There are two shaders used to render our PBR model in OpenGL: GL01_PBR.vert and GL01_PBR.frag. The vertex shader does nothing interesting: it uses programmable vertex pulling to read vertex data from the SSBO and passes it further down the graphics pipeline. The fragment shader does the actual work. Let's take a look.

  1. As usual, we receive per-frame data from the CPU side through a uniform buffer. Texture coordinates, the normal vector, and the fragment's world position are obtained from the vertex shader:

    #version 460 core

    layout(std140, binding = 0) uniform PerFrameData {

      mat4 view;

      mat4 proj;

      vec4 cameraPos;

    };

    layout (location=0) in vec2 tc;

    layout (location=1) in vec3 normal;

    layout (location=2) in vec3 worldPos;

    layout (location=0) out vec4 out_FragColor;

  2. The five textures we loaded in C++ are bound to the 0...4 OpenGL binding points:

    layout (binding = 0) uniform sampler2D texAO;

    layout (binding = 1) uniform sampler2D texEmissive;

    layout (binding = 2) uniform sampler2D texAlbedo;

    layout (binding = 3)  uniform sampler2D texMetalRoughness;

    layout (binding = 4) uniform sampler2D texNormal;

  3. The environment cube map, the prefiltered cube map, and the BRDF LUT are here:

    layout (binding = 5) uniform samplerCube texEnvMap;

    layout (binding = 6)  uniform samplerCube texEnvMapIrradiance;

    layout (binding = 7) uniform sampler2D texBRDF_LUT;

  4. Now, we include a GLSL source file containing a set of PBR calculation routines that are shared between our OpenGL and Vulkan implementations. It does all the heavy lifting, and we will come back to this file in a moment:

    #include <data/shaders/chapter06/PBR.sp>

  5. The main() function starts by fetching all the necessary texture data required for our shading model. Here come ambient occlusion, emissive color, albedo, metallic factor, and roughness. The last two values are packed into a single texture:

    void main() {

      vec4 Kao = texture(texAO, tc);

      vec4 Ke  = texture(texEmissive, tc);

      vec4 Kd  = texture(texAlbedo, tc);

      vec2 MeR = texture(texMetalRoughness, tc).yz;

  6. To calculate the proper normal mapping effect according to the normal map, we evaluate the normal vector per pixel. We do this in world space. The perturbNormal() function calculates the tangent space per pixel using the derivatives of the texture coordinates, and it is implemented in chapter06/PBR.sp. Make sure you check it out. If you want to disable normal mapping and use only per-vertex normals, just comment out the second line here:

      vec3 n = normalize(normal);

      n = perturbNormal(n,    normalize(cameraPos.xyz - worldPos), tc);

  7. Let's fill in the PBRInfo structure, which encapsulates multiple inputs used by the various functions in the PBR shading equation:

      PBRInfo pbrInputs;

      vec3 color = calculatePBRInputsMetallicRoughness(    Kd, n, cameraPos.xyz, worldPos, pbrInputs);

  8. For this demo application, we use only one hardcoded directional light source—(-1, -1, -1). Let's calculate the lighting coming from it:

      color += calculatePBRLightContribution( pbrInputs,    normalize(vec3(-1.0, -1.0, -1.0)), vec3(1.0) );

  9. Now, we should multiply the color by the ambient occlusion factor. Use 1.0 in case there is no ambient occlusion texture available:

      color = color * ( Kao.r < 0.01 ? 1.0 : Kao.r );

  10. Add the emissive color contribution. Make sure the input emissive texture is converted into the linear color space before use. Convert the resulting color back into the standard RGB (sRGB) color space before writing it into the framebuffer:

      color = pow(    SRGBtoLINEAR(Ke).rgb + color, vec3(1.0/2.2) );

      out_FragColor = vec4(color, 1.0);

    }

Let's take a look at the calculations that happen inside chapter06/PBR.sp. Our implementation is based on the reference implementation of glTF 2.0 Sample Viewer from Khronos, which you can find at https://github.com/KhronosGroup/glTF-Sample-Viewer/tree/glTF-WebGL-PBR.

  1. First of all, here is the PBRInfo structure that holds various input parameters for our shading model:

    struct PBRInfo {
      // cos angle between normal and light direction
      float NdotL;
      // cos angle between normal and view direction
      float NdotV;
      // cos angle between normal and half vector
      float NdotH;
      // cos angle between light dir and half vector
      float LdotH;
      // cos angle between view dir and half vector
      float VdotH;
      // roughness value (input to shader)
      float perceptualRoughness;
      // full reflectance color
      vec3 reflectance0;
      // reflectance color at grazing angle
      vec3 reflectance90;
      // remapped linear roughness
      float alphaRoughness;
      // contribution from diffuse lighting
      vec3 diffuseColor;
      // contribution from specular lighting
      vec3 specularColor;
      // normal at surface point
      vec3 n;
      // vector from surface point to camera
      vec3 v;
    };

  2. The sRGB-to-linear color space conversion routine is implemented this way. It is a rough approximation, done for simplicity:

    vec4 SRGBtoLINEAR(vec4 srgbIn) {

      vec3 linOut = pow(srgbIn.xyz,vec3(2.2));

      return vec4(linOut, srgbIn.a);

    }

  3. Here is the calculation of the lighting contribution from an image-based lighting (IBL) source:

    vec3 getIBLContribution(  PBRInfo pbrInputs, vec3 n, vec3 reflection)

    {

      float mipCount =    float(textureQueryLevels(texEnvMap));

      float lod =    pbrInputs.perceptualRoughness * mipCount;

  4. Retrieve a scale and bias to F0 from the BRDF LUT:

      vec2 brdfSamplePoint = clamp(vec2(pbrInputs.NdotV,    1.0-pbrInputs.perceptualRoughness),    vec2(0.0), vec2(1.0));

      vec3 brdf =    textureLod(texBRDF_LUT, brdfSamplePoint, 0).rgb;

  5. This code is reused by both the OpenGL and Vulkan implementations. Convert the cube map coordinates into the Vulkan coordinate space:

    #ifdef VULKAN
      vec3 cm = vec3(-1.0, -1.0, 1.0);
    #else
      vec3 cm = vec3(1.0);
    #endif

  6. Fetch values from the cube maps. No conversion to the linear color space is required since High Dynamic Range (HDR) cube maps are already linear. Besides that, we can directly add diffuse and specular because our precalculated BRDF LUT already takes care of energy conservation:

      vec3 diffuseLight = texture(texEnvMapIrradiance, n.xyz * cm).rgb;

      vec3 specularLight = textureLod(texEnvMap,    reflection.xyz * cm, lod).rgb;

      vec3 diffuse =    diffuseLight * pbrInputs.diffuseColor;

      vec3 specular = specularLight *    (pbrInputs.specularColor * brdf.x + brdf.y);

      return diffuse + specular;

    }

Now, let's go through all the helper functions that are necessary to calculate different parts of the rendering equation.

  1. The diffuseBurley() function implements the diffuse term from the Physically Based Shading at Disney paper by Brent Burley, found at http://blog.selfshadow.com/publications/s2012-shading-course/burley/s2012_pbs_disney_brdf_notes_v3.pdf:

    vec3 diffuseBurley(PBRInfo pbrInputs) {

      float f90 = 2.0 * pbrInputs.LdotH *    pbrInputs.LdotH * pbrInputs.alphaRoughness - 0.5;

      return (pbrInputs.diffuseColor / M_PI) *    (1.0 + f90 * pow((1.0 - pbrInputs.NdotL), 5.0)) *    (1.0 + f90 * pow((1.0 - pbrInputs.NdotV), 5.0));

    }

  2. The next function models the Fresnel reflectance term of the rendering equation, also known as the F term:

    vec3 specularReflection(PBRInfo pbrInputs) {

      return pbrInputs.reflectance0 +    (pbrInputs.reflectance90 - pbrInputs.reflectance0)   * pow(clamp(1.0 - pbrInputs.VdotH, 0.0, 1.0), 5.0);

    }

  3. The geometricOcclusion() function calculates the specular geometric attenuation G, where materials with a higher roughness will reflect less light back to the viewer:

    float geometricOcclusion(PBRInfo pbrInputs) {

      float NdotL = pbrInputs.NdotL;

      float NdotV = pbrInputs.NdotV;

      float rSqr =    pbrInputs.alphaRoughness *    pbrInputs.alphaRoughness;

      float attenuationL = 2.0 * NdotL /    (NdotL + sqrt(rSqr + (1.0 - rSqr) *      (NdotL * NdotL)));

      float attenuationV = 2.0 * NdotV /    (NdotV + sqrt(rSqr + (1.0 - rSqr) *      (NdotV * NdotV)));

      return attenuationL * attenuationV;

    }

  4. The following function models the distribution of microfacet normals D across the area being drawn:

    float microfacetDistribution(PBRInfo pbrInputs) {

      float roughnessSq =    pbrInputs.alphaRoughness *    pbrInputs.alphaRoughness;

      float f = (pbrInputs.NdotH * roughnessSq -    pbrInputs.NdotH) * pbrInputs.NdotH + 1.0;

      return roughnessSq / (M_PI * f * f);

    }

    This implementation is from Average Irregularity Representation of a Rough Surface for Ray Reflection by T. S. Trowbridge and K. P. Reitz.

Before we can calculate the light contribution from a light source, we need to fill in the fields of the PBRInfo structure. The following function does this.

  1. As it is supposed to be in glTF 2.0, roughness is stored in the green channel, while metallic is stored in the blue channel. This layout intentionally reserves the red channel for optional occlusion map data:

    vec3 calculatePBRInputsMetallicRoughness(  vec4 albedo, vec3 normal, vec3 cameraPos,  vec3 worldPos, out PBRInfo pbrInputs)

    {

      float perceptualRoughness = 1.0;

      float metallic = 1.0;

      vec4 mrSample = texture(texMetalRoughness, tc);

      perceptualRoughness =    mrSample.g * perceptualRoughness;

      metallic = mrSample.b * metallic;

      perceptualRoughness =    clamp(perceptualRoughness, 0.04, 1.0);

      metallic = clamp(metallic, 0.0, 1.0);

  2. Roughness is authored as perceptual roughness; by convention, we convert this to material roughness by squaring the perceptual roughness. The albedo may be defined from a base texture or a flat color. Let's compute the specular reflectance in the following way:

      float alphaRoughness =    perceptualRoughness * perceptualRoughness;

      vec4 baseColor = albedo;

      vec3 f0 = vec3(0.04);

      vec3 diffuseColor =    baseColor.rgb * (vec3(1.0) - f0);

      diffuseColor *= 1.0 - metallic;

      vec3 specularColor =    mix(f0, baseColor.rgb, metallic);

      float reflectance = max(max(specularColor.r,    specularColor.g), specularColor.b);

  3. For a typical incident reflectance range between 4% and 100%, we set the grazing reflectance to 100% for a typical Fresnel effect. For a very low reflectance range on highly diffused objects, below 4%, we incrementally reduce the grazing reflectance to 0%:

      float reflectance90 =    clamp(reflectance * 25.0, 0.0, 1.0);

      vec3 specularEnvironmentR0 = specularColor.rgb;

      vec3 specularEnvironmentR90 =    vec3(1.0, 1.0, 1.0) * reflectance90;

      vec3 n = normalize(normal);

      vec3 v = normalize(cameraPos - worldPos);

      vec3 reflection = -normalize(reflect(v, n));

  4. Finally, we should fill in the PBRInfo structure with precalculated values. It will be reused to calculate the contribution of each individual light in the scene:

      pbrInputs.NdotV = clamp(abs(dot(n, v)), 0.001, 1.0);

      pbrInputs.perceptualRoughness = perceptualRoughness;

      pbrInputs.reflectance0 = specularEnvironmentR0;

      pbrInputs.reflectance90 = specularEnvironmentR90;

      pbrInputs.alphaRoughness = alphaRoughness;

      pbrInputs.diffuseColor = diffuseColor;

      pbrInputs.specularColor = specularColor;

      pbrInputs.n = n;

      pbrInputs.v = v;

  5. Calculate the lighting contribution from an IBL source using the getIBLContribution() function:

      vec3 color = getIBLContribution(    pbrInputs, n, reflection);

      return color;

    }

The lighting contribution from a single light source can be calculated in the following way using the precalculated values from PBRInfo.

  1. Here, l is the vector from the surface point to the light source, and h is the half vector between l and v:

    vec3 calculatePBRLightContribution(  inout PBRInfo pbrInputs, vec3 lightDirection,  vec3 lightColor)

    {

      vec3 n = pbrInputs.n;

      vec3 v = pbrInputs.v;

      vec3 l = normalize(lightDirection);

      vec3 h = normalize(l + v);  

      float NdotV = pbrInputs.NdotV;

      float NdotL = clamp(dot(n, l), 0.001, 1.0);

      float NdotH = clamp(dot(n, h), 0.0, 1.0);

      float LdotH = clamp(dot(l, h), 0.0, 1.0);

      float VdotH = clamp(dot(v, h), 0.0, 1.0);

      pbrInputs.NdotL = NdotL;

      pbrInputs.NdotH = NdotH;

      pbrInputs.LdotH = LdotH;

      pbrInputs.VdotH = VdotH;

  2. Calculate the shading terms for the microfacet specular shading model using the helper functions described earlier in this recipe:

      vec3 F = specularReflection(pbrInputs);

      float G = geometricOcclusion(pbrInputs);

      float D = microfacetDistribution(pbrInputs);

  3. Here is the calculation of the analytical lighting contribution:

      vec3 diffuseContrib =    (1.0 - F) * diffuseBurley(pbrInputs);

      vec3 specContrib =    F * G * D / (4.0 * NdotL * NdotV);

  4. Obtain the final intensity as reflectance (BRDF) scaled by the energy of the light using the cosine law:

      vec3 color = NdotL * lightColor *    (diffuseContrib + specContrib);

      return color;

    }

The resulting demo application should render an image like the one shown in the following screenshot. Try also using different PBR glTF 2.0 models:

Figure 6.7 – PBR of the Damaged Helmet glTF 2.0 model


There's more...

We've also made a Vulkan version of this app that reuses the same PBR calculation code from PBR.sp. This can be found in Chapter06/VK05_PBR.

The whole area of PBR is vast, and it is possible only to scratch its surface on these half-a-hundred pages. In real life, much more complicated PBR implementations can be created that are built on the requirements of content production pipelines. For an endless source of inspiration for what can be done, we recommend looking into the Unreal Engine source code, which is available for free on GitHub at https://github.com/EpicGames/UnrealEngine/tree/release/Engine/Shaders/Private.
