This chapter will cover the integration of Physically Based Rendering (PBR) into your graphics pipeline. We use the Graphics Language Transmission Format 2.0 (glTF 2.0) shading model as an example. PBR is not a single specific technique but rather a set of concepts, like using measured surface values and realistic shading models, to accurately represent real-world materials. Adding PBR to your graphics application, or retrofitting an existing rendering engine with it, can be challenging because several large pieces must be implemented and working together before a single correct image can be rendered.
Our goal here is to show how to implement all these steps from scratch. Some of these steps, such as precomputing irradiance maps or bidirectional reflectance distribution function (BRDF) lookup tables (LUTs), require additional tools to be written. We are not going to use any third-party tools here and will show how to implement the entire skeleton of a PBR pipeline from the ground up. Some pre-calculations can be done using general-purpose graphics processing unit (GPGPU) techniques and compute shaders, which will be covered here as well. We assume our readers have some basic understanding of PBR. For those who wish to acquire this knowledge, make sure you read the free book Physically Based Rendering: From Theory To Implementation by Matt Pharr, Wenzel Jakob, and Greg Humphreys, available online at http://www.pbr-book.org/.
In this chapter, we will learn the following recipes:
Here is what it takes to run the code from this chapter on your Linux or Windows PC. You will need a graphics processing unit (GPU) with recent drivers supporting OpenGL 4.6 and Vulkan 1.2. The source code can be downloaded from https://github.com/PacktPublishing/3D-Graphics-Rendering-Cookbook. To run the demo applications from this chapter, you are advised to download and unpack the entire Amazon Lumberyard Bistro dataset from the McGuire Computer Graphics Archive, at http://casual-effects.com/data/index.html. Of course, you can use smaller meshes if you cannot download the 2.4 gigabyte (GB) package.
Before jumping into this chapter, let's learn how to generalize Vulkan application initialization for all of our remaining demos and how to extract common parts of the frame composition code.
The Graphics Library Framework (GLFW) window creation and Vulkan rendering surface initialization are performed in the initVulkanApp function. Let's take a closer look:
struct Resolution {
uint32_t width = 0;
uint32_t height = 0;
};
GLFWwindow* initVulkanApp( int width, int height, Resolution* resolution = nullptr)
{
glslang_initialize_process();
volkInitialize();
if (!glfwInit() || !glfwVulkanSupported())
exit(EXIT_FAILURE);
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_RESIZABLE, GLFW_FALSE);
if (resolution) {
*resolution = detectResolution(width, height);
width = resolution->width;
height = resolution->height;
}
GLFWwindow* result = glfwCreateWindow( width, height, "VulkanApp", nullptr, nullptr);
if (!result) {
glfwTerminate();
exit(EXIT_FAILURE);
}
return result;
}
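A hypothetical usage of initVulkanApp() might look as follows; the negative width and height values request a fraction of the screen size, as explained next:
Resolution resolution;
// Request a window covering 95% x 90% of the primary monitor
GLFWwindow* window = initVulkanApp(-95, -90, &resolution);
// ... initialize Vulkan using resolution.width and resolution.height ...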
Let's take a look at the detectResolution() function. The actual resolution detection happens in glfwGetVideoMode(). For our purposes, we get the parameters of the "primary" monitor. In multi-display configurations, we should properly determine which monitor displays our application; however, this goes beyond the scope of this book. The video-mode information for the selected monitor provides us with the screen dimensions in pixels. If the provided width or height values are positive, they are used directly. Negative values are treated as a percentage of the screen:
Resolution detectResolution(int width, int height) {
GLFWmonitor* monitor = glfwGetPrimaryMonitor();
if (glfwGetError(nullptr)) exit(EXIT_FAILURE);
const GLFWvidmode* info = glfwGetVideoMode(monitor);
const uint32_t W = width >= 0 ? width : (uint32_t)(info->width * width / -100);
const uint32_t H = height >= 0 ? height : (uint32_t)(info->height * height / -100);
return Resolution{ .width = W, .height = H };
}
To render and present a single frame on the screen, we implement the drawFrame() function, which contains the common frame-composition code refactored from the previous chapters.
bool drawFrame(VulkanRenderDevice& vkDev, const std::function<void(uint32_t)>& updateBuffersFunc, const std::function<void(VkCommandBuffer, uint32_t)>& composeFrameFunc)
{
uint32_t imageIndex = 0;
VkResult result = vkAcquireNextImageKHR( vkDev.device, vkDev.swapchain, 0, vkDev.semaphore, VK_NULL_HANDLE, &imageIndex);
if (result != VK_SUCCESS) return false;
VK_CHECK( vkResetCommandPool( vkDev.device, vkDev.commandPool, 0) );
updateBuffersFunc(imageIndex);
VkCommandBuffer commandBuffer = vkDev.commandBuffers[imageIndex];
const VkCommandBufferBeginInfo bi = { .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO, .pNext = nullptr, .flags = VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, .pInheritanceInfo = nullptr };
VK_CHECK(vkBeginCommandBuffer(commandBuffer, &bi));
composeFrameFunc(commandBuffer, imageIndex);
VK_CHECK(vkEndCommandBuffer(commandBuffer));
Next comes the submission of the recorded command buffer to a GPU graphics queue. The code is identical to that in Chapter 3, Getting Started with OpenGL and Vulkan:
const VkPipelineStageFlags waitStages[] = { VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
const VkSubmitInfo si = { .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO, .pNext = nullptr, .waitSemaphoreCount = 1, .pWaitSemaphores = &vkDev.semaphore, .pWaitDstStageMask = waitStages, .commandBufferCount = 1, .pCommandBuffers = &vkDev.commandBuffers[imageIndex], .signalSemaphoreCount = 1, .pSignalSemaphores = &vkDev.renderSemaphore };
VK_CHECK(vkQueueSubmit( vkDev.graphicsQueue, 1, &si, nullptr));
const VkPresentInfoKHR pi = { .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR, .pNext = nullptr, .waitSemaphoreCount = 1, .pWaitSemaphores = &vkDev.renderSemaphore, .swapchainCount = 1, .pSwapchains = &vkDev.swapchain, .pImageIndices = &imageIndex };
VK_CHECK(vkQueuePresentKHR( vkDev.graphicsQueue, &pi));
VK_CHECK(vkDeviceWaitIdle(vkDev.device));
return true;
}
More sophisticated synchronization schemes with multiple in-flight frames can help to gain performance. However, those are beyond the scope of this book.
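To see how these pieces fit together, here is a hedged sketch of a typical main loop that calls drawFrame() with two lambdas; the lambda bodies are placeholders for application-specific code:
while (!glfwWindowShouldClose(window)) {
  glfwPollEvents();
  drawFrame(vkDev,
    [](uint32_t imageIndex) {
      // update uniform/storage buffers for this swapchain image,
      // e.g. upload the current model-view-projection matrix
    },
    [](VkCommandBuffer cmd, uint32_t imageIndex) {
      // record rendering commands for this swapchain image,
      // e.g. clear, draw the scene, render the UI
    });
}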
Up until now, we used only graphics-capable command queues on a Vulkan device. This time, we have to find device queues that are also capable of GPGPU computations. In Vulkan, such queues allow execution of compute shaders, which can read from and write to buffers used in the graphics rendering pipeline. For example, in Chapter 10, Advanced Rendering Techniques and Optimizations, we will show how to implement a GPU frustum culling technique by modifying the indirect rendering buffer introduced in the Indirect rendering in Vulkan recipe from Chapter 5, Working with Geometry Data.
The first thing we need to do to start using compute shaders is to revisit the render device initialization covered in Chapter 3, Getting Started with OpenGL and Vulkan. Check out the Initializing Vulkan instances and graphical device recipe before moving forward.
We add the code to search for a compute-capable device queue and to create a separate command buffer for compute shader workloads. Since the graphics hardware may not provide a separate command queue for arbitrary computations, our device and queue initialization logic must remember whether the graphics and compute queues are the same.
Note
On the other hand, using separate Vulkan queues for graphics and compute tasks enables the underlying Vulkan implementation to reduce the amount of work done on the GPU by letting the device decide what kind of work is generated and how. This is especially important when dealing with GPU-generated commands. Check out the post New: Vulkan Device Generated Commands by Christoph Kubisch from NVIDIA, at https://developer.nvidia.com/blog/new-vulkan-device-generated-commands/.
Let's learn how to do it.
bool useCompute = false;
uint32_t computeFamily;
VkQueue computeQueue;
Since we may want to use more than one device queue, we have to store indices and handles for each of those. This is needed because the set of queue families that may access a VkBuffer is specified at buffer creation time. For example, to use a vertex buffer generated by a compute shader in a graphics pipeline, we have to allocate this VkBuffer object explicitly, specifying the list of queue families from which this buffer may be accessed. Later in this recipe, we introduce the createSharedBuffer() routine that explicitly uses these stored queue indices.
Note
Buffers are created with a sharing mode, controlling how they can be accessed from queues. Buffers created using VK_SHARING_MODE_EXCLUSIVE must only be accessed by queues in the queue family that has ownership of the resource. Buffers created using VK_SHARING_MODE_CONCURRENT must only be accessed by queues from the queue families specified through the queueFamilyIndexCount and pQueueFamilyIndices members of the corresponding …CreateInfo structures. Concurrent sharing mode may result in lower performance compared to exclusive mode. Refer to the Vulkan specifications for more details, at https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkSharingMode.html.
std::vector<uint32_t> deviceQueueIndices;
std::vector<VkQueue> deviceQueues;
VkCommandBuffer computeCommandBuffer;
VkCommandPool computeCommandPool;
Now, let's learn how to initialize a rendering device capable of running compute shaders.
bool initVulkanRenderDeviceWithCompute( VulkanInstance& vk, VulkanRenderDevice& vkDev, uint32_t width, uint32_t height, VkPhysicalDeviceFeatures deviceFeatures)
{
vkDev.framebufferWidth = width;
vkDev.framebufferHeight = height;
VK_CHECK(findSuitablePhysicalDevice( vk.instance, &isDeviceSuitable, &vkDev.physicalDevice));
vkDev.graphicsFamily = findQueueFamilies( vkDev.physicalDevice, VK_QUEUE_GRAPHICS_BIT);
vkDev.computeFamily = findQueueFamilies( vkDev.physicalDevice, VK_QUEUE_COMPUTE_BIT);
VK_CHECK(createDeviceWithCompute( vkDev.physicalDevice, deviceFeatures, vkDev.graphicsFamily, vkDev.computeFamily, &vkDev.device));
vkDev.deviceQueueIndices.push_back( vkDev.graphicsFamily);
if (vkDev.graphicsFamily != vkDev.computeFamily)
vkDev.deviceQueueIndices.push_back( vkDev.computeFamily);
vkGetDeviceQueue(vkDev.device, vkDev.graphicsFamily, 0, &vkDev.graphicsQueue);
if (!vkDev.graphicsQueue) exit(EXIT_FAILURE);
vkGetDeviceQueue(vkDev.device, vkDev.computeFamily, 0, &vkDev.computeQueue);
if (!vkDev.computeQueue) exit(EXIT_FAILURE);
VkBool32 presentSupported = 0;
vkGetPhysicalDeviceSurfaceSupportKHR( vkDev.physicalDevice, vkDev.graphicsFamily, vk.surface, &presentSupported);
if (!presentSupported) exit(EXIT_FAILURE);
VK_CHECK(createSwapchain(vkDev.device, vkDev.physicalDevice, vk.surface, vkDev.graphicsFamily, width, height, vkDev.swapchain));
const size_t imageCount = createSwapchainImages( vkDev.device, vkDev.swapchain, vkDev.swapchainImages, vkDev.swapchainImageViews);
vkDev.commandBuffers.resize(imageCount);
VK_CHECK(createSemaphore( vkDev.device, &vkDev.semaphore));
VK_CHECK(createSemaphore( vkDev.device, &vkDev.renderSemaphore));
const VkCommandPoolCreateInfo cpi1 = { .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO, .flags = 0, .queueFamilyIndex = vkDev.graphicsFamily };
VK_CHECK(vkCreateCommandPool(vkDev.device, &cpi1, nullptr, &vkDev.commandPool));
const VkCommandBufferAllocateInfo ai1 = { .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, .pNext = nullptr, .commandPool = vkDev.commandPool, .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY, .commandBufferCount = static_cast<uint32_t>( vkDev.swapchainImages.size()) };
VK_CHECK(vkAllocateCommandBuffers( vkDev.device, &ai1, &vkDev.commandBuffers[0]));
const VkCommandPoolCreateInfo cpi2 = { .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO, .pNext = nullptr, .flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT, .queueFamilyIndex = vkDev.computeFamily };
VK_CHECK(vkCreateCommandPool(vkDev.device, &cpi2, nullptr, &vkDev.computeCommandPool));
const VkCommandBufferAllocateInfo ai2 = { .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, .pNext = nullptr, .commandPool = vkDev.computeCommandPool, .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY, .commandBufferCount = 1, };
VK_CHECK(vkAllocateCommandBuffers( vkDev.device, &ai2, &vkDev.computeCommandBuffer));
vkDev.useCompute = true;
return true;
}
The initVulkanRenderDeviceWithCompute() routine written in the preceding code uses the createDeviceWithCompute() helper function to create a compatible Vulkan device. Let's see how this can be implemented.
VkResult createDeviceWithCompute( VkPhysicalDevice physicalDevice, VkPhysicalDeviceFeatures deviceFeatures, uint32_t graphicsFamily, uint32_t computeFamily, VkDevice* device)
{
const std::vector<const char*> extensions = { VK_KHR_SWAPCHAIN_EXTENSION_NAME };
if (graphicsFamily == computeFamily)
return createDevice(physicalDevice, deviceFeatures, graphicsFamily, device);
const float queuePriorities[2] = { 0.f, 0.f };
const VkDeviceQueueCreateInfo qciGfx = { .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO, .pNext = nullptr, .flags = 0, .queueFamilyIndex = graphicsFamily, .queueCount = 1, .pQueuePriorities = &queuePriorities[0] };
const VkDeviceQueueCreateInfo qciComp = { .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO, .pNext = nullptr, .flags = 0, .queueFamilyIndex = computeFamily, .queueCount = 1, .pQueuePriorities = &queuePriorities[1] };
const VkDeviceQueueCreateInfo qci[] = { qciGfx, qciComp };
const VkDeviceCreateInfo ci = { .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO, .pNext = nullptr, .flags = 0, .queueCreateInfoCount = 2, .pQueueCreateInfos = qci, .enabledLayerCount = 0, .ppEnabledLayerNames = nullptr, .enabledExtensionCount = uint32_t(extensions.size()), .ppEnabledExtensionNames = extensions.data(), .pEnabledFeatures = &deviceFeatures };
return vkCreateDevice( physicalDevice, &ci, nullptr, device);
}
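A helper such as findQueueFamilies(), used earlier to locate the graphics and compute families, can be implemented by scanning the queue family properties of the physical device. The following is a minimal sketch compatible with the calls shown above, not necessarily identical to the framework's implementation:
// Returns the index of the first queue family supporting the desired flags
uint32_t findQueueFamilies(VkPhysicalDevice device, VkQueueFlags desiredFlags)
{
  uint32_t familyCount = 0;
  vkGetPhysicalDeviceQueueFamilyProperties(device, &familyCount, nullptr);
  std::vector<VkQueueFamilyProperties> families(familyCount);
  vkGetPhysicalDeviceQueueFamilyProperties(device, &familyCount, families.data());
  for (uint32_t i = 0; i != familyCount; i++)
    if (families[i].queueCount > 0 && (families[i].queueFlags & desiredFlags))
      return i;
  return 0;
}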
To read the results of compute shaders and store them, we need to create shared VkBuffer instances using the following steps:
bool createSharedBuffer( VulkanRenderDevice& vkDev, VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory)
{
const size_t familyCount = vkDev.deviceQueueIndices.size();
if (familyCount < 2u)
return createBuffer(vkDev.device, vkDev.physicalDevice, size, usage, properties, buffer, bufferMemory);
const VkBufferCreateInfo bufferInfo = { .sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO, .pNext = nullptr, .flags = 0, .size = size, .usage = usage, .sharingMode = (familyCount > 1u) ? VK_SHARING_MODE_CONCURRENT : VK_SHARING_MODE_EXCLUSIVE, .queueFamilyIndexCount = static_cast<uint32_t>(familyCount), .pQueueFamilyIndices = (familyCount > 1u) ? vkDev.deviceQueueIndices.data() : nullptr
};
VK_CHECK(vkCreateBuffer( vkDev.device, &bufferInfo, nullptr, &buffer));
VkMemoryRequirements memRequirements;
vkGetBufferMemoryRequirements( vkDev.device, buffer, &memRequirements);
const VkMemoryAllocateInfo allocInfo = { .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO, .pNext = nullptr, .allocationSize = memRequirements.size, .memoryTypeIndex = findMemoryType( vkDev.physicalDevice, memRequirements.memoryTypeBits, properties)
};
VK_CHECK(vkAllocateMemory(vkDev.device, &allocInfo, nullptr, &bufferMemory));
vkBindBufferMemory( vkDev.device, buffer, bufferMemory, 0);
return true;
}
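For example, a buffer that a compute shader fills with geometry and that the graphics pipeline later reads could be allocated as follows; bufferSize and the usage flags here are illustrative and depend on how the buffer is consumed:
VkBuffer computedBuffer;
VkDeviceMemory computedMemory;
createSharedBuffer(vkDev, bufferSize,
  VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_INDEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
  VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
  computedBuffer, computedMemory);
If the graphics and compute queue families differ, the buffer is created with VK_SHARING_MODE_CONCURRENT, as shown in the code above; otherwise, the regular createBuffer() path is taken.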
To execute compute shaders, we require a pipeline object, just as in the case with the graphics rendering. Let's write a function to create a Vulkan compute pipeline object.
VkResult createComputePipeline( VkDevice device, VkShaderModule computeShader, VkPipelineLayout pipelineLayout, VkPipeline* pipeline)
{
VkComputePipelineCreateInfo computePipelineCreateInfo = { .sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO, .pNext = nullptr, .flags = 0, .stage = { .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO, .pNext = nullptr, .flags = 0, .stage = VK_SHADER_STAGE_COMPUTE_BIT, .module = computeShader,
.pName = "main", .pSpecializationInfo = nullptr },
.layout = pipelineLayout, .basePipelineHandle = 0, .basePipelineIndex = 0 };
return vkCreateComputePipelines(device, 0, 1, &computePipelineCreateInfo, nullptr, pipeline);
}
The pipeline layout is created using the same function as for the graphics part. It is worth mentioning that the compute shader compilation process is the same as for other shader stages.
We can now begin using the shaders after device initialization. The descriptor set creation process is the same as with the graphics-related descriptor sets, but the execution of compute shaders requires the insertion of new commands into the command buffer.
bool executeComputeShader( VulkanRenderDevice& vkDev, VkPipeline pipeline, VkPipelineLayout pipelineLayout, VkDescriptorSet ds, uint32_t xSize, uint32_t ySize, uint32_t zSize)
{
VkCommandBuffer commandBuffer = vkDev.computeCommandBuffer;
VkCommandBufferBeginInfo commandBufferBeginInfo = { VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO, 0, VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT, 0 };
VK_CHECK(vkBeginCommandBuffer( commandBuffer, &commandBufferBeginInfo));
vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipelineLayout, 0, 1, &ds, 0, 0);
vkCmdDispatch(commandBuffer, xSize, ySize, zSize);
VkMemoryBarrier readoutBarrier = { .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER, .pNext = nullptr, .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT, .dstAccessMask = VK_ACCESS_HOST_READ_BIT };
vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_HOST_BIT, 0, 1, &readoutBarrier, 0, nullptr, 0, nullptr);
VK_CHECK(vkEndCommandBuffer(commandBuffer));
VkSubmitInfo submitInfo = { VK_STRUCTURE_TYPE_SUBMIT_INFO, 0, 0, 0, 0, 1, &commandBuffer, 0, 0 };
VK_CHECK(vkQueueSubmit( vkDev.computeQueue, 1, &submitInfo, 0));
VK_CHECK(vkQueueWaitIdle(vkDev.computeQueue));
return true;
}
We omit the descriptor set creation process here because it depends on what kind of data we want to access in the compute shader. The next recipe shows how to write compute shaders to generate images and vertex buffer contents, which is where a descriptor set will be required.
We are going to use the Vulkan compute shaders functionality later in this chapter in the following recipes: Implementing computed meshes in Vulkan, Generating textures in Vulkan using compute shaders, Precomputing BRDF LUTs, and Precomputing irradiance maps and diffuse convolution.
Before we dive deep into the glTF and PBR implementation code, let's look at some lower-level functionality that will be required to minimize the number of Vulkan descriptor sets in applications that use lots of materials with multiple textures. Descriptor indexing is an extremely useful feature recently added to Vulkan 1.2 and, at the time of writing this book, is already supported on some devices. It allows us to create unbounded descriptor sets and use non-uniform dynamic indexing to access textures inside them. This way, materials can be stored in shader storage buffers and each one can reference all the required textures using integer identifiers (IDs). These IDs can be fetched from a shader storage buffer object (SSBO) and are directly used to index into an appropriate descriptor set that contains all the textures required by our application. Vulkan descriptor indexing is rather similar to the OpenGL bindless textures mechanism and significantly simplifies managing descriptor sets in Vulkan. Let's check out how to use this feature.
The source code for this recipe can be found in Chapter6/VK02_DescriptorIndexing. All the textures we used are stored in the data/explosion folder.
Before we can use the descriptor indexing feature in Vulkan, we need to enable it during the Vulkan device initialization. This process is a little bit verbose, but we will go through it once to show the basic principles. Let's take a look at new fragments of the initVulkan() function to see how it is done.
VkPhysicalDeviceDescriptorIndexingFeaturesEXT physicalDeviceDescriptorIndexingFeatures = { .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_INDEXING_FEATURES_EXT, .shaderSampledImageArrayNonUniformIndexing = VK_TRUE, .descriptorBindingVariableDescriptorCount = VK_TRUE, .runtimeDescriptorArray = VK_TRUE };
const VkPhysicalDeviceFeatures deviceFeatures = { .shaderSampledImageArrayDynamicIndexing = VK_TRUE };
const VkPhysicalDeviceFeatures2 deviceFeatures2 = { .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2, .pNext = &physicalDeviceDescriptorIndexingFeatures, .features = deviceFeatures };
if (!initVulkanRenderDevice2(vk, vkDev, kScreenWidth, kScreenHeight, isDeviceSuitable, deviceFeatures2))
exit(EXIT_FAILURE);
The initialization process for this extension is similar to how we initialized the Vulkan indirect rendering extension in the Indirect rendering in Vulkan recipe from Chapter 5, Working with Geometry Data. The only difference here is that the descriptor indexing feature was added into Vulkan 1.2, hence the different VkPhysicalDeviceFeatures2 structure and a separate initialization function.
Once we have a proper Vulkan device initialized, we can implement a simple flipbook animation using descriptor indexing. Our example application uses three different explosion animations released by Unity Technologies under the liberal Creative Commons (CC0) license (https://blogs.unity3d.com/2016/11/28/free-vfx-image-sequences-flipbooks). Let's look at the steps.
std::vector<std::string> textureFiles;
for (uint32_t j = 0; j < 3; j++) {
for (uint32_t i = 0; i != kNumFlipbookFrames; i++) {
char fname[1024];
snprintf(fname, sizeof(fname), "data/explosion/explosion%02u-frame%03u.tga", j, i+1);
textureFiles.push_back(fname);
}
}
quadRenderer =
std::make_unique<VulkanQuadRenderer>( vkDev, textureFiles);
for (size_t i = 0; i < vkDev.swapchainImages.size(); i++)
fillQuadsBuffer(vkDev, *quadRenderer.get(), i);
VulkanImage nullTexture = { .image = VK_NULL_HANDLE, .imageView = VK_NULL_HANDLE };
clear = std::make_unique<VulkanClear>( vkDev, nullTexture);
finish = std::make_unique<VulkanFinish>( vkDev, nullTexture);
For all the implementation details of VulkanQuadRenderer, check out the shared/vkRenderers/VulkanQuadRenderer.cpp file, which contains mostly Vulkan descriptors initialization and texture-loading code—all its parts were extensively covered in the previous chapters. We will skip it here in the book text and focus on the actual demo application logic and OpenGL Shading Language (GLSL) shaders.
The Chapter6/VK02_DescriptorIndexing application renders an animated explosion every time a user clicks somewhere in the window. Multiple explosions, each using a different flipbook, can be rendered simultaneously. Let's see how to implement them using the following steps.
struct AnimationState {
vec2 position = vec2(0);
double startTime = 0;
uint32_t textureIndex = 0;
uint32_t flipbookOffset = 0;
};
std::vector<AnimationState> animations;
void updateAnimations() {
for (size_t i = 0; i < animations.size();) {
auto& anim = animations[i];
anim.textureIndex = anim.flipbookOffset + (uint32_t)(kAnimationFPS * ((glfwGetTime() - anim.startTime)));
if (anim.textureIndex - anim.flipbookOffset > kNumFlipbookFrames)
animations.erase(animations.begin() + i);
else i++;
  }
}
glfwSetMouseButtonCallback(window, [](GLFWwindow* window, int button, int action, int mods) {
if (button == GLFW_MOUSE_BUTTON_LEFT && action == GLFW_PRESS) {
float mx = (mouseX/vkDev.framebufferWidth )*2.0f - 1.0f;
float my = (mouseY/vkDev.framebufferHeight)*2.0f - 1.0f;
animations.push_back(AnimationState{ .position = vec2(mx, my), .startTime = glfwGetTime(), .textureIndex = 0, .flipbookOffset = kNumFlipbookFrames * (uint32_t)(rand() % 3) });
}
});
The chapter06/VK02_texture_array.vert vertex shader to render our textured rectangles is presented next.
layout(location = 0) out vec2 out_uv;
layout(location = 1) flat out uint out_texIndex;
struct ImDrawVert {
float x, y, z, u, v;
};
layout(binding = 1) readonly buffer SBO {
ImDrawVert data[];
} sbo;
layout(push_constant) uniform uPushConstant {
vec2 position;
uint textureIndex;
} pc;
void main() {
uint idx = gl_VertexIndex;
ImDrawVert v = sbo.data[idx];
out_uv = vec2(v.u, v.v);
out_texIndex = pc.textureIndex;
gl_Position = vec4(vec2(v.x, v.y) + pc.position, 0.0, 1.0);
}
The chapter06/VK02_texture_array.frag fragment shader is trivial and uses the non-uniform descriptor indexing feature. Let's take a look.
#version 460
#extension GL_EXT_nonuniform_qualifier : require
layout (binding = 2) uniform sampler2D textures[];
layout (location = 0) in vec2 in_uv;
layout (location = 1) flat in uint in_texIndex;
layout (location = 0) out vec4 outFragColor;
void main() {
outFragColor = texture( textures[nonuniformEXT(in_texIndex)], in_uv);
}
We can now run our application. Click a few times in the window to see something similar to this:
Figure 6.1 – Animated explosions using descriptor indexing and texture arrays in Vulkan
While this example passes a texture index into a shader as a push constant, making it uniform, with the GL_EXT_nonuniform_qualifier extension it is possible to store texture indices inside Vulkan buffers in a completely dynamic way. In Chapter 7, Graphics Rendering Pipeline, we will build a material system based around this Vulkan extension, similar to how the bindless textures mechanism in OpenGL is deployed.
In the next recipe, we will show one more useful application of texture arrays.
Another extremely useful application of descriptor indexing is the ability to trivially render multiple textures in ImGui. Up until now, our ImGui renderer was able to use only one single font texture and there was no possibility to render any static images in our UI. To allow backward compatibility with Chapter 4, Adding User Interaction and Productivity Tools, and Chapter 5, Working with Geometry Data, we add a new constructor to the ImGuiRenderer class and modify the addImGuiItem() method in the shared/vkRenderers/VulkanImGui.cpp file. We provide a thorough discussion of the required changes here because, to the best of our knowledge, there is no small down-to-earth tutorial on using multiple textures in the Vulkan ImGui renderer.
Check the previous Using descriptor indexing and texture arrays in Vulkan recipe, to learn how to initialize the descriptor indexing feature.
Let's start with the description of source code changes.
std::vector<VulkanTexture> extTextures_;
ImGuiRenderer::ImGuiRenderer( VulkanRenderDevice& vkDev, const std::vector<VulkanTexture>& textures)
: RendererBase(vkDev, VulkanImage())
, extTextures_(textures)
{
if (!createColorAndDepthRenderPass(vkDev, false, &renderPass_, RenderPassCreateInfo()) || !createColorAndDepthFramebuffers(vkDev, renderPass_, VK_NULL_HANDLE, swapchainFramebuffers_) || !createUniformBuffers(vkDev, sizeof(mat4)) ||
!createDescriptorPool( vkDev, 1, 2, 1 + textures.size(), &descriptorPool_) ||
!createMultiDescriptorSet(vkDev) ||
!createPipelineLayoutWithConstants(vkDev.device, descriptorSetLayout_, &pipelineLayout_, 0, sizeof(uint32_t)) ||
!createGraphicsPipeline(vkDev, renderPass_, pipelineLayout_, { "data/shaders/chapter04/imgui.vert", "data/shaders/chapter06/imgui_multi.frag" }, &graphicsPipeline_, VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST, true, true, true))
{
printf( "ImGuiRenderer: pipeline creation failed ");
exit(EXIT_FAILURE);
}
This concludes our list of changes to the constructor code. The descriptor set creation code is similar to that of the VulkanQuadRenderer::createDescriptorSet() function, but since we skipped the implementation details at the beginning of this recipe, we describe the complete ImGuiRenderer::createMultiDescriptorSet() method here.
bool ImGuiRenderer::createMultiDescriptorSet( VulkanRenderDevice& vkDev)
{
const std::array<VkDescriptorSetLayoutBinding, 4> bindings = {
descriptorSetLayoutBinding(0, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, VK_SHADER_STAGE_VERTEX_BIT),
descriptorSetLayoutBinding(1, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, VK_SHADER_STAGE_VERTEX_BIT),
descriptorSetLayoutBinding(2, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, VK_SHADER_STAGE_VERTEX_BIT),
descriptorSetLayoutBinding(3, VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, VK_SHADER_STAGE_FRAGMENT_BIT, 1 + extTextures_.size())
};
const VkDescriptorSetLayoutCreateInfo layoutInfo = { .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO, .pNext = nullptr, .flags = 0, .bindingCount = static_cast<uint32_t>(bindings.size()), .pBindings = bindings.data() };
VK_CHECK(vkCreateDescriptorSetLayout(vkDev.device, &layoutInfo, nullptr, &descriptorSetLayout_));
std::vector<VkDescriptorSetLayout> layouts( vkDev.swapchainImages.size(), descriptorSetLayout_);
const VkDescriptorSetAllocateInfo allocInfo = { .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO, .pNext = nullptr, .descriptorPool = descriptorPool_, .descriptorSetCount = static_cast<uint32_t>( vkDev.swapchainImages.size()), .pSetLayouts = layouts.data() };
descriptorSets_.resize( vkDev.swapchainImages.size());
VK_CHECK(vkAllocateDescriptorSets(vkDev.device, &allocInfo, descriptorSets_.data()));
std::vector<VkDescriptorImageInfo> textureDescriptors = { { fontSampler_, font_.imageView, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL } };
for (size_t i = 0; i < extTextures_.size(); i++) textureDescriptors.push_back({ .sampler = extTextures_[i].sampler, .imageView = extTextures_[i].image.imageView, .imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL });
for (size_t i = 0; i < vkDev.swapchainImages.size(); i++) {
VkDescriptorSet ds = descriptorSets_[i];
const VkDescriptorBufferInfo bufferInfo1 = { uniformBuffers_[i], 0, sizeof(mat4) };
const VkDescriptorBufferInfo bufferInfo2 = { storageBuffer_[i], 0, ImGuiVtxBufferSize };
const VkDescriptorBufferInfo bufferInfo3 = { storageBuffer_[i], ImGuiVtxBufferSize, ImGuiIdxBufferSize };
const std::array<VkWriteDescriptorSet, 4> descriptorWrites = {
bufferWriteDescriptorSet(ds, &bufferInfo1, 0, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER),
bufferWriteDescriptorSet(ds, &bufferInfo2, 1, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER),
bufferWriteDescriptorSet(ds, &bufferInfo3, 2, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER),
VkWriteDescriptorSet { .sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET, .dstSet = descriptorSets_[i], .dstBinding = 3, .dstArrayElement = 0, .descriptorCount = static_cast<uint32_t>( 1 + extTextures_.size()), .descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, .pImageInfo = textureDescriptors.data() },
};
vkUpdateDescriptorSets(vkDev.device, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr);
}
return true;
}
Inside the addImGuiItem() method, if any external textures were provided, the texture index stored in ImGui's ImTextureID field is passed to the fragment shader as a push constant:
if (textures.size()) {
uint32_t texture = (uint32_t)(intptr_t)pcmd->TextureId;
vkCmdPushConstants(commandBuffer, pipelineLayout, VK_SHADER_STAGE_FRAGMENT_BIT, 0, sizeof(uint32_t), (const void*)&texture);
}
Finally, we implement a new fragment shader in data/shaders/chapter06/imgui_multi.frag, because the vertex shader remains the same.
#version 460
#extension GL_EXT_nonuniform_qualifier : require
layout(location = 0) in vec2 uv;
layout(location = 1) in vec4 color;
layout(location = 0) out vec4 outColor;
layout(binding = 3) uniform sampler2D textures[];
layout(push_constant) uniform pushBlock { uint index; } pushConsts;
void main() {
const uint kDepthTextureMask = 0xFFFF;
uint texType = (pushConsts.index >> 16) & kDepthTextureMask;
uint tex = pushConsts.index & kDepthTextureMask;
vec4 value = texture( textures[nonuniformEXT(tex)], uv);
outColor = (texType == 0) ? (color * value) : vec4(value.rrr, 1.0);
}
Let's look at the C++ counterpart of the code. The hypothetical usage of this new ImGui renderer's functionality can be wrapped in the following helper function. It accepts a window title and a texture ID, which is simply the index in the ImGuiRenderer::extTextures_ array passed to the constructor at creation time.
void imguiTextureWindow( const char* Title, uint32_t texId)
{
ImGui::Begin(Title, nullptr);
ImVec2 vMin = ImGui::GetWindowContentRegionMin();
ImVec2 vMax = ImGui::GetWindowContentRegionMax();
ImGui::Image( (void*)(intptr_t)texId, ImVec2(vMax.x - vMin.x, vMax.y - vMin.y));
ImGui::End();
}
imguiTextureWindow("Some title", textureID);
imguiTextureWindow("Some depth buffer", textureID | 0xFFFF);
In the subsequent chapters, we will show how to use this ability to display intermediate buffers for debugging purposes.
Now that we can initialize and use compute shaders, it is time to give a few examples of how to use these. Let's start with some basic procedural texture generation. In this recipe, we implement a small program to display animated textures whose pixel values are calculated in real time inside our custom compute shader. To add even more value to this recipe, we will port a GLSL shader from https://www.shadertoy.com to our Vulkan compute shader.
The compute pipeline creation code and Vulkan application initialization are the same as in the Initializing compute shaders in Vulkan recipe. Make sure you read this before proceeding further. To use and display the generated texture, we need a textured quad renderer. Its complete source code can be found in shared/vkRenderers/VulkanSingleQuad.cpp. We will not focus on its internals here because, at this point, it should be easy for you to implement such a renderer on your own using the material of the previous chapters. One of the simplest ways to do so would be to modify the ModelRenderer class from shared/vkRenderers/VulkanModelRenderer.cpp and fill the appropriate index and vertex buffers in the class constructor.
The original Industrial Complex shader that we are going to use here to generate a Vulkan texture was created by Gary "Shane" Warne (https://www.shadertoy.com/user/Shane) and can be downloaded from ShaderToy at https://www.shadertoy.com/view/MtdSWS.
Let's start by discussing the process of writing a texture-generating GLSL compute shader. The simplest shader to generate a red-green-blue-alpha (RGBA) image without using any input data outputs an image by using the gl_GlobalInvocationID built-in variable to know which pixel to output. This maps directly to how ShaderToy shaders operate, thus we can transform them into a compute shader just by adding some input and output (I/O) parameters and layout modifiers specific to compute shaders and Vulkan. Let's take a look at a minimalistic compute shader that creates a red-green gradient texture.
layout (local_size_x = 16, local_size_y = 16) in;
layout (binding = 0, rgba8) uniform writeonly image2D result;
void main()
{
ivec2 dim = imageSize(result);
vec2 uv = vec2(gl_GlobalInvocationID.xy) / dim;
imageStore(result, ivec2(gl_GlobalInvocationID.xy), vec4(uv, 0.0, 1.0));
}
Now, the preceding example is rather limited, and all you get is a red-and-green gradient image. Let's change it a little bit to use the actual shader code from ShaderToy. The compute shader that renders a Vulkan version of the Industrial Complex shader from ShaderToy, available via the following Uniform Resource Locator (URL), https://shadertoy.com/view/MtdSWS, can be found in the shaders/chapter06/VK03_compute_texture.comp file.
The original ShaderToy entry point has the signature void mainImage(out vec4 fragColor, in vec2 fragCoord). To call it from our compute shader, we change it to return the resulting color directly:
vec4 mainImage(in vec2 fragCoord)
Don't forget to add an appropriate return statement at the end.
The compute shader's main() function then simply wraps mainImage() as follows:
void main()
{
ivec2 dim = imageSize(result);
vec2 uv = vec2(gl_GlobalInvocationID.xy) / dim;
imageStore(result, ivec2(gl_GlobalInvocationID.xy), mainImage(uv*dim));
}
The iTime and iResolution values expected by the ShaderToy code are emulated with a push constant holding the current time and a hardcoded resolution:
layout(push_constant) uniform uPushConstant {
float time;
} pc;
vec2 iResolution = vec2( 1280.0, 720.0 );
float iTime = pc.time;
Important note
The GLSL imageSize() function can be used to obtain the iResolution value based on the actual size of our texture. We leave this as an exercise for the reader.
To make sure the compute shader has finished writing the texture before the fragment shader samples it, we insert an image memory barrier after the compute dispatch:
void insertComputedImageBarrier( VkCommandBuffer commandBuffer, VkImage image)
{
const VkImageMemoryBarrier barrier = { .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER, .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT, .dstAccessMask = VK_ACCESS_SHADER_READ_BIT, .oldLayout = VK_IMAGE_LAYOUT_GENERAL, .newLayout = VK_IMAGE_LAYOUT_GENERAL, .image = image, .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 } };
vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier);
}
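A possible way to record the texture-generation work looks like this, with the barrier inserted right after the dispatch. This is a sketch: computedImage, texWidth, and texHeight are placeholders for the application's actual texture handle and size:
// The 16x16 workgroup size matches the local_size declared in the shader
vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipelineLayout, 0, 1, &ds, 0, nullptr);
vkCmdDispatch(commandBuffer, texWidth / 16, texHeight / 16, 1);
insertComputedImageBarrier(commandBuffer, computedImage);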
The running application should render an image like the one shown in the following screenshot, which is similar to the output of https://www.shadertoy.com/view/MtdSWS:
Figure 6.2 – Using compute shaders to generate textures
In the next recipe, we will continue learning the Vulkan compute pipeline and implement a mesh-generation compute shader.
In the Initializing compute shaders in Vulkan recipe, we learned how to initialize the compute pipeline in Vulkan. We are going to need it in this chapter to implement a BRDF precomputation tool for our PBR pipeline. But before that, let's learn a few simple and interesting ways to use compute shaders in Vulkan and combine this feature with mesh geometry generation on the GPU.
We are going to run a compute shader to create triangulated geometry of a three-dimensional (3D) torus knot shape with different P and Q parameters.
Important note
A torus knot is a special kind of knot that lies on the surface of an unknotted torus in 3D space. Each torus knot is specified by a pair of p and q coprime integers. You can read more on this at https://en.wikipedia.org/wiki/Torus_knot.
The data produced by the compute shader is stored in a shader storage buffer and used in a vertex shader in a typical programmable-vertex-fetch way. To make the results more visually pleasing, we will implement real-time morphing between two different torus knots controllable from an ImGui widget. Let's get started.
The source code for this example is located in Chapter6/VK04_ComputeMesh.
The application consists of three different parts: the C++ part, which drives the UI and Vulkan commands, the mesh-generation compute shader, and the rendering pipeline with simple vertex and fragment shaders. The C++ part in Chapter6/VK04_ComputeMesh/src/main.cpp is rather short, so let's tackle this first.
std::deque<std::pair<uint32_t, uint32_t>> morphQueue = { { 5, 8 }, { 5, 8 } };
float morphCoef = 0.0f;
float animationSpeed = 1.0f;
const uint32_t numU = 1024;
const uint32_t numV = 1024;
struct MeshUniformBuffer {
float time;
uint32_t numU;
uint32_t numV;
float minU, maxU;
float minV, maxV;
uint32_t p1, p2;
uint32_t q1, q2;
float morph;
} ubo;
void generateIndices(uint32_t* indices) {
for (uint32_t j = 0 ; j < numV - 1 ; j++) {
for (uint32_t i = 0 ; i < numU - 1 ; i++) {
uint32_t offset = (j * (numU - 1) + i) * 6;
uint32_t i1 = (j + 0) * numU + (i + 0); uint32_t i2 = (j + 0) * numU + (i + 1); uint32_t i3 = (j + 1) * numU + (i + 1); uint32_t i4 = (j + 1) * numU + (i + 0);
indices[offset + 0] = i1; indices[offset + 1] = i2; indices[offset + 2] = i4; indices[offset + 3] = i2; indices[offset + 4] = i3; indices[offset + 5] = i4;
}
}
}
Besides that, our C++ initialization part is in the initMesh() function. It allocates all the necessary buffers, uploads the index data to the GPU, loads the compute shaders for texture and mesh generation, and creates two model renderers, one for a textured mesh and another for a colored one.
void initMesh() {
std::vector<uint32_t> indicesGen( (numU - 1) * (numV - 1) * 6);
generateIndices(indicesGen.data());
uint32_t vertexBufferSize = 12 * sizeof(float) * numU * numV;
uint32_t indexBufferSize = 6 * sizeof(uint32_t) * (numU-1) * (numV-1);
uint32_t bufferSize = vertexBufferSize + indexBufferSize;
imgGen = std::make_unique<ComputedImage>(vkDev, "data/shaders/chapter06/VK04_compute_texture.comp", 1024, 1024, false);
meshGen = std::make_unique<ComputedVertexBuffer>( vkDev, "data/shaders/chapter06/VK04_compute_mesh.comp", indexBufferSize, sizeof(MeshUniformBuffer), 12 * sizeof(float), numU * numV);
VkBuffer stagingBuffer;
VkDeviceMemory stagingBufferMemory;
createBuffer(vkDev.device, vkDev.physicalDevice, bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);
void* data = nullptr;
vkMapMemory(vkDev.device, stagingBufferMemory, 0, bufferSize, 0, &data);
memcpy((void*)((uint8_t*)data + vertexBufferSize), indicesGen.data(), indexBufferSize);
vkUnmapMemory(vkDev.device, stagingBufferMemory);
copyBuffer(vkDev, stagingBuffer, meshGen->computedBuffer, bufferSize);
Note
More examples of staging buffers can be found in the Using texture data in Vulkan recipe from Chapter 3, Getting Started with OpenGL and Vulkan.
vkDestroyBuffer( vkDev.device, stagingBuffer, nullptr);
vkFreeMemory( vkDev.device, stagingBufferMemory, nullptr);
meshGen->fillComputeCommandBuffer();
meshGen->submit();
vkDeviceWaitIdle(vkDev.device);
Last but not least, let's create two model renderers.
std::vector<const char*> shaders = { "data/shaders/chapter06/VK04_render.vert", "data/shaders/chapter06/VK04_render.frag" };
mesh = std::make_unique<ModelRenderer>(vkDev, true, meshGen->computedBuffer, meshGen->computedMemory, vertexBufferSize, indexBufferSize, imgGen->computed, imgGen->computedImageSampler, shaders, (uint32_t)sizeof(mat4), true);
std::vector<const char*> shadersColor = {"data/shaders/chapter06/VK04_render.vert", "data/shaders/chapter06/VK04_render_color.frag"};
meshColor = std::make_unique<ModelRenderer>(vkDev, true, meshGen->computedBuffer, meshGen->computedMemory, vertexBufferSize, indexBufferSize, imgGen->computed, imgGen->computedImageSampler, shadersColor, (uint32_t)sizeof(mat4), true, mesh->getDepthTexture(), false);
}
Now, we need our chapter06/VK04_compute_mesh.comp mesh-generation compute shader.
#version 440
layout (local_size_x = 2, local_size_y = 1, local_size_z = 1) in;
struct VertexData {
vec4 pos, tc, norm;
};
layout (binding = 0) buffer VertexBuffer {
VertexData vertices[];
} vbo;
layout (binding = 1) uniform UniformBuffer {
float time;
uint numU, numV;
float minU, maxU, minV, maxV;
uint P1, P2, Q1, Q2;
float morph;
} ubo;
The torus knot geometry is built from a parametrization of the form x = r * cos(u), y = r * sin(u), z = -sin(v). The torusKnot() function evaluates this parametrization together with its derivatives to produce vertex positions and normals:
VertexData torusKnot(vec2 uv, vec2 pq) {
const float p = pq.x;
const float q = pq.y;
const float baseRadius = 5.0;
const float segmentRadius = 3.0;
const float tubeRadius = 0.5;
float ct = cos(uv.x);
float st = sin(uv.x);
float qp = q / p;
float qps = qp * segmentRadius;
float arg = uv.x * qp;
float sqp = sin(arg);
float cqp = cos(arg);
float BSQP = baseRadius + segmentRadius * cqp;
float dxdt = -qps * sqp * ct - st * BSQP;
float dydt = -qps * sqp * st + ct * BSQP;
float dzdt = qps * cqp;
vec3 r = vec3(BSQP * ct, BSQP * st, segmentRadius * sqp);
vec3 drdt = vec3(dxdt, dydt, dzdt);
vec3 v1 = normalize(cross(r, drdt));
vec3 v2 = normalize(cross(v1, drdt));
float cv = cos(uv.y);
float sv = sin(uv.y);
VertexData res;
res.pos = vec4(r+tubeRadius*(v1 * sv + v2 * cv), 1);
res.norm = vec4(cross(v1 * cv - v2 * sv, drdt ), 0);
return res;
}
mat3 rotY(float angle) {
float c = cos(angle), s = sin(angle);
return mat3(c, 0, -s, 0, 1, 0, s, 0, c);
}
mat3 rotZ(float angle) {
float c = cos(angle), s = sin(angle);
return mat3(c, -s, 0, s, c, 0, 0, 0, 1);
}
Using the aforementioned helpers, the main() function of our compute shader is now straightforward, and the only interesting thing worth mentioning here is the real-time morphing that blends two torus knots with different P and Q parameters. This is pretty easy because the total number of vertices always remains the same. Let's take a closer look.
void main() {
uint index = gl_GlobalInvocationID.x;
vec2 numUV = vec2(ubo.numU, ubo.numV);
vec2 ij = vec2(float(index / ubo.numU), float(index % ubo.numU));
const vec2 maxUV1 = 2.0 * 3.1415926 * vec2(ubo.P1, 1.0);
vec2 uv1 = ij * maxUV1 / (numUV - vec2(1));
const vec2 maxUV2 = 2.0 * 3.1415926 * vec2(ubo.P2, 1.0);
vec2 uv2 = ij * maxUV2 / (numUV - vec2(1));
Note
Refer to the https://en.wikipedia.org/wiki/Torus_knot Wikipedia page for additional explanation of the math details.
mat3 modelMatrix = rotY(0.5 * ubo.time) * rotZ(0.5 * ubo.time);
VertexData v1 = torusKnot(uv1, vec2(ubo.P1, ubo.Q1));
VertexData v2 = torusKnot(uv2, vec2(ubo.P2, ubo.Q2));
vec3 pos = mix(v1.pos.xyz, v2.pos.xyz, ubo.morph);
vec3 norm = mix(v1.norm.xyz, v2.norm.xyz, ubo.morph);
VertexData vtx;
vtx.pos = vec4(modelMatrix * pos, 1);
vtx.tc = vec4(ij / numUV, 0, 0);
vtx.norm = vec4(modelMatrix * norm, 0);
vbo.vertices[index] = vtx;
}
Both the vertex and fragment shaders used to render this mesh are trivial and can be found in chapter06/VK04_render.vert, chapter06/VK04_render.frag, and chapter06/VK04_render_color.frag. Feel free to take a look yourself, as we are not going to copy and paste them here.
The demo application will produce a variety of torus knots similar to the one shown in the following screenshot. Each time you select a new pair of P-Q parameters from the UI, the morphing animation will kick in and transform one knot into another. Checking the Use colored mesh box will apply colors to the mesh instead of a computed texture:
Figure 6.3 – Computed mesh with real-time animation
In this recipe, all the synchronization between the mesh-generation process and rendering was done using vkDeviceWaitIdle(), essentially making these two processes completely serial and inefficient. While this is acceptable for the purpose of showing a single feature in a standalone demo app, in a real-world application a fine-grained synchronization would be desirable to allow mesh generation and rendering to run—at least partially—in parallel. Check out the guide on Vulkan synchronization from Khronos for useful insights on how to do this: https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples.
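As a starting point, and assuming the generated vertex buffer is only accessed from a single queue family, the vkDeviceWaitIdle() call between generation and rendering could be replaced with a buffer memory barrier that makes the compute shader writes visible to the vertex shader fetching the vertices. This is a sketch, not the demo's actual code:
// Wait for compute-shader writes to the generated vertex buffer before
// the vertex shader reads it via programmable vertex fetch
const VkBufferMemoryBarrier vertexBarrier = {
  .sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
  .pNext = nullptr,
  .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
  .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
  .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
  .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
  .buffer = meshGen->computedBuffer,
  .offset = 0,
  .size = VK_WHOLE_SIZE };
vkCmdPipelineBarrier(commandBuffer,
  VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_VERTEX_SHADER_BIT,
  0, 0, nullptr, 1, &vertexBarrier, 0, nullptr);
When the compute and graphics work is submitted to different queue families, a semaphore or a queue family ownership transfer is required instead, which is exactly what the linked synchronization examples demonstrate.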
Now, let's switch back to the main topic of this chapter and learn how to precompute BRDF LUTs for PBR rendering using compute shaders.
In the previous recipes, we learned how to initialize compute pipelines in Vulkan and demonstrated the basic functionality of compute shaders. Let's switch gears back to PBR and learn how to precompute the Smith GGX BRDF LUT. To render a PBR image, we have to evaluate the BRDF at each point based on surface properties and viewing direction. This is computationally expensive, and many real-time implementations, including the reference glTF-Sample-Viewer implementation from Khronos, use precalculated tables of some sort to find the BRDF value based on surface roughness and viewing direction. A BRDF LUT can be stored as a 2D texture where the x axis corresponds to the dot product between the surface normal vector and the viewing direction, and the y axis corresponds to the surface roughness in the 0...1 range. Each texel stores two 16-bit floating-point values: a scale and a bias to F0, which is the specular reflectance at normal incidence.
Important note
In this recipe, we focus purely on details of a minimalistic implementation and do not touch on any math behind it. For those interested in the math behind this approach, check out the Environment BRDF section from the Real Shading in Unreal Engine 4 presentation by Brian Karis at https://cdn2.unrealengine.com/Resources/files/2013SiggraphPresentationsNotes-26915738.pdf.
We are going to use Vulkan to calculate this texture on the GPU and implement a compute shader to do this.
It would be helpful to revisit the compute pipeline creation from the Initializing compute shaders in Vulkan recipe. Our implementation is based on https://github.com/SaschaWillems/Vulkan-glTF-PBR/blob/master/data/shaders/genbrdflut.frag, which runs the same computations in a fragment shader. Our compute shader can be found in data/shaders/chapter06/VK01_BRDF_LUT.comp.
Before we look into the shader code, let's implement a generic class to process data arrays on the GPU.
To manipulate data buffers on the GPU and use the data, we need three basic operations: upload data from the host memory into a GPU buffer, download data from a GPU buffer to the host memory, and run a compute shader workload on that buffer. The data uploading and downloading process consists of mapping the GPU memory to the host address space and then using memcpy() to transfer buffer contents.
void uploadBufferData(const VulkanRenderDevice& vkDev, VkDeviceMemory& bufferMemory, VkDeviceSize deviceOffset, const void* data, const size_t dataSize)
{
void* mappedData = nullptr;
vkMapMemory(vkDev.device, bufferMemory, deviceOffset, dataSize, 0, &mappedData);
memcpy(mappedData, data, dataSize);
vkUnmapMemory(vkDev.device, bufferMemory);
}
void downloadBufferData(VulkanRenderDevice& vkDev, VkDeviceMemory& bufferMemory, VkDeviceSize deviceOffset, void* outData, const size_t dataSize)
{
void* mappedData = nullptr;
vkMapMemory(vkDev.device, bufferMemory, deviceOffset, dataSize, 0, &mappedData);
memcpy(outData, mappedData, dataSize);
vkUnmapMemory(vkDev.device, bufferMemory);
}
Note
The downloadBufferData() function should be called only after a corresponding memory barrier was executed in the compute command queue. Make sure to read the Initializing compute shaders in Vulkan recipe and the source code of executeComputeShader() for more details.
We now have two helper functions to move data around from the CPU to GPU buffers, and vice versa. Recall that in the Initializing compute shaders in Vulkan recipe, we implemented the executeComputeShader() function, which starts the compute pipeline. Let's focus on data manipulation on the GPU.
class ComputeBase {
void uploadInput(uint32_t offset, void* inData, uint32_t byteCount) {
uploadBufferData(vkDev, inBufferMemory, offset, inData, byteCount);
}
void downloadOutput(uint32_t offset, void* outData, uint32_t byteCount) {
downloadBufferData(vkDev, outBufferMemory, offset, outData, byteCount);
}
bool execute(uint32_t xSize, uint32_t ySize, uint32_t zSize) {
return executeComputeShader(vkDev, pipeline, pipelineLayout, descriptorSet, xSize, ySize, zSize);
}
ComputeBase::ComputeBase(VulkanRenderDevice& vkDev, const char* shaderName, uint32_t inputSize, uint32_t outputSize)
: vkDev(vkDev)
{
createSharedBuffer(vkDev, inputSize, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, inBuffer, inBufferMemory);
createSharedBuffer(vkDev, outputSize, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, outBuffer, outBufferMemory);
ShaderModule s;
createShaderModule(vkDev.device, &s, shaderName);
createComputeDescriptorSetLayout( vkDev.device, &dsLayout);
createPipelineLayout( vkDev.device, dsLayout, &pipelineLayout);
createComputePipeline(vkDev.device, s.shaderModule, pipelineLayout, &pipeline);
createComputeDescriptorSet(vkDev.device, dsLayout);
vkDestroyShaderModule( vkDev.device, s.shaderModule, nullptr);
}
As we might suspect, the longest method is the dreaded descriptor set creation function, createComputeDescriptorSet(). Let's take a closer look at the steps.
bool ComputeBase::createComputeDescriptorSet( VkDevice device, VkDescriptorSetLayout descriptorSetLayout)
{
VkDescriptorPoolSize descriptorPoolSize = { VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, 2 };
VkDescriptorPoolCreateInfo descriptorPoolCreateInfo = { VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO, 0, 0, 1, 1, &descriptorPoolSize };
VK_CHECK(vkCreateDescriptorPool(device, &descriptorPoolCreateInfo, 0, &descriptorPool));
VkDescriptorSetAllocateInfo descriptorSetAllocateInfo = { VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO, 0, descriptorPool, 1, &descriptorSetLayout };
VK_CHECK(vkAllocateDescriptorSets(device, &descriptorSetAllocateInfo, &descriptorSet));
VkDescriptorBufferInfo inBufferInfo = { inBuffer, 0, VK_WHOLE_SIZE };
VkDescriptorBufferInfo outBufferInfo = { outBuffer, 0, VK_WHOLE_SIZE };
VkWriteDescriptorSet writeDescriptorSet[2] = { { VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET, 0, descriptorSet, 0, 0, 1, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, 0, &inBufferInfo, 0},
{ VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET, 0, descriptorSet, 1, 0, 1, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, 0, &outBufferInfo, 0} };
vkUpdateDescriptorSets( device, 2, writeDescriptorSet, 0, 0);
return true;
}
The preceding ComputeBase class is used directly in our Chapter6/VK01_BRDF_LUT tool. The entire computational heavy lifting to precalculate a BRDF LUT is done in a compute shader. Let's look inside the data/chapter06/VK01_BRDF_LUT.comp GLSL code:
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout (constant_id = 0) const uint NUM_SAMPLES = 1024u;
layout (set = 0, binding = 0) buffer SRC { float data[]; } src;
layout (set = 0, binding = 1) buffer DST { float data[]; } dst;
const uint BRDF_W = 256;
const uint BRDF_H = 256;
const float PI = 3.1415926536;
void main() {
vec2 uv;
uv.x = float(gl_GlobalInvocationID.x) / float(BRDF_W);
uv.y = float(gl_GlobalInvocationID.y) / float(BRDF_H);
vec2 v = BRDF(uv.x, 1.0 - uv.y);
uint offset = gl_GlobalInvocationID.y * BRDF_W + gl_GlobalInvocationID.x;
dst.data[offset * 2 + 0] = v.x;
dst.data[offset * 2 + 1] = v.y;
}
Now that we have described some mandatory compute shader parts, we can see how the BRDF LUT items are calculated. Technically, we calculate the integral value over the hemisphere using the Monte Carlo integration procedure. Let's look at the steps.
vec2 hammersley2d(uint i, uint N) {
uint bits = (i << 16u) | (i >> 16u);
bits = ((bits&0x55555555u) << 1u) | ((bits&0xAAAAAAAAu) >> 1u);
bits = ((bits&0x33333333u) << 2u) | ((bits&0xCCCCCCCCu) >> 2u);
bits = ((bits&0x0F0F0F0Fu) << 4u) | ((bits&0xF0F0F0F0u) >> 4u);
bits = ((bits&0x00FF00FFu) << 8u) | ((bits&0xFF00FF00u) >> 8u);
float rdi = float(bits) * 2.3283064365386963e-10;
return vec2(float(i) /float(N), rdi);
}
Important note
The code is based on the following post: http://holger.dammertz.org/stuff/notes_HammersleyOnHemisphere.html. The bit-shifting magic for this and many other applications is thoroughly described in Henry S. Warren, Jr.'s book Hacker's Delight. Interested readers may also look up the Van der Corput sequence to see why this can be used to generate random directions on the hemisphere.
float random(vec2 co) {
float a = 12.9898;
float b = 78.233;
float c = 43758.5453;
float dt= dot(co.xy ,vec2(a,b));
float sn= mod(dt, PI);
return fract(sin(sn) * c);
}
Note
Check out this link to find some useful details about this code: http://byteblacksmith.com/improvements-to-the-canonical-one-liner-glsl-rand-for-opengl-es-2-0/.
vec3 importanceSample_GGX( vec2 Xi, float roughness, vec3 normal)
{
float alpha = roughness * roughness;
float phi = 2.0 * PI * Xi.x + random(normal.xz) * 0.1;
float cosTheta = sqrt((1.0-Xi.y)/(1.0+(alpha*alpha-1.0)*Xi.y));
float sinTheta = sqrt(1.0 - cosTheta * cosTheta);
vec3 H = vec3( sinTheta*cos(phi), sinTheta*sin(phi), cosTheta);
vec3 up = abs(normal.z) < 0.999 ? vec3(0.0, 0.0, 1.0) : vec3(1.0, 0.0, 0.0);
vec3 tangentX = normalize(cross(up, normal));
vec3 tangentY = normalize(cross(normal, tangentX));
return normalize( tangentX*H.x + tangentY*H.y + normal*H.z);
}
float G_SchlicksmithGGX( float dotNL, float dotNV, float roughness)
{
float k = (roughness * roughness) / 2.0;
float GL = dotNL / (dotNL * (1.0 - k) + k);
float GV = dotNV / (dotNV * (1.0 - k) + k);
return GL * GV;
}
vec2 BRDF(float NoV, float roughness)
{
const vec3 N = vec3(0.0, 0.0, 1.0);
vec3 V = vec3(sqrt(1.0 - NoV*NoV), 0.0, NoV);
vec2 LUT = vec2(0.0);
for(uint i = 0u; i < NUM_SAMPLES; i++) {
vec2 Xi = hammersley2d(i, NUM_SAMPLES);
vec3 H = importanceSample_GGX(Xi, roughness, N);
vec3 L = 2.0 * dot(V, H) * H - V;
float dotNL = max(dot(N, L), 0.0);
float dotNV = max(dot(N, V), 0.0);
float dotVH = max(dot(V, H), 0.0);
float dotNH = max(dot(H, N), 0.0);
if (dotNL > 0.0) {
float G = G_SchlicksmithGGX(dotNL, dotNV, roughness);
float G_Vis = (G * dotVH) / (dotNH * dotNV);
float Fc = pow(1.0 - dotVH, 5.0);
LUT += vec2((1.0 - Fc) * G_Vis, Fc * G_Vis);
}
}
return LUT / float(NUM_SAMPLES);
}
The C++ part of the project is trivial: it just runs the compute shader and saves the results into the data/brdfLUT.ktx file using the OpenGL Image (GLI) library; a hypothetical sketch of this driver follows.
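This sketch is an illustration rather than the exact tool code: it reuses the ComputeBase class from this recipe, hardcodes the 256x256 LUT size declared in the shader, and passes a small placeholder size for the unused input buffer, so the real implementation may differ in details.
// Dispatch the LUT computation, download the results, convert them to
// 16-bit floats, and save a two-channel KTX texture
constexpr uint32_t kBrdfW = 256;
constexpr uint32_t kBrdfH = 256;
constexpr uint32_t kNumFloats = kBrdfW * kBrdfH * 2;
ComputeBase cb(vkDev, "data/shaders/chapter06/VK01_BRDF_LUT.comp", 4 * sizeof(float), kNumFloats * sizeof(float));
if (!cb.execute(kBrdfW, kBrdfH, 1)) exit(EXIT_FAILURE);
std::vector<float> lutData(kNumFloats);
cb.downloadOutput(0, lutData.data(), kNumFloats * sizeof(float));
gli::texture2d lutTexture(gli::FORMAT_RG16_SFLOAT_PACK16, gli::extent2d(kBrdfW, kBrdfH), 1);
uint32_t* texels = reinterpret_cast<uint32_t*>(lutTexture.data());
for (uint32_t i = 0; i != kBrdfW * kBrdfH; i++)
  texels[i] = glm::packHalf2x16(glm::vec2(lutData[2 * i + 0], lutData[2 * i + 1]));
gli::save_ktx(lutTexture, "data/brdfLUT.ktx");
You can use Pico Pixel (https://pixelandpolygon.com) to view the generated image. It should look like the image shown in the following screenshot: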
Figure 6.4 – BRDF LUT
This concludes the BRDF LUT tool description. We will need yet another tool to calculate an irradiance cubemap from an environment cube map, which we will cover next.
The method described previously can be used to precompute BRDF LUTs using high-quality Monte Carlo integration and store them as textures. Dependent texture fetches can be expensive on some mobile platforms. There is an interesting runtime approximation used in Unreal Engine that does not rely on any precomputation, as described in https://www.unrealengine.com/en-US/blog/physically-based-shading-on-mobile. Here is the GLSL source code:
vec3 EnvBRDFApprox( vec3 specularColor, float roughness, float NoV )
{
const vec4 c0 = vec4(-1, -0.0275, -0.572, 0.022);
const vec4 c1 = vec4( 1, 0.0425, 1.04, -0.04);
vec4 r = roughness * c0 + c1;
float a004 = min( r.x * r.x, exp2(-9.28 * NoV) ) * r.x + r.y;
vec2 AB = vec2( -1.04, 1.04 ) * a004 + r.zw;
return specularColor * AB.x + AB.y;
}
The second piece of precomputed data needed to evaluate the glTF 2.0 physically based shading model is the irradiance cube map, which is precalculated by convolving the input environment cube map with a cosine-weighted diffuse kernel.
Check out the source code for this recipe in Chapter6/Util01_FilterEnvmap. If you want to dive deep into the math theory behind these computations, make sure you read Brian Karis's paper at https://cdn2.unrealengine.com/Resources/files/2013SiggraphPresentationsNotes-26915738.pdf.
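In a nutshell, Karis's split sum approximation factors the Monte Carlo estimate of the specular reflectance integral into two sums that can be precomputed independently, one stored in a prefiltered environment map and the other in the BRDF LUT from the previous recipe, while the diffuse lighting is handled by the irradiance map produced by this tool:
$$\frac{1}{N}\sum_{k=1}^{N}\frac{L_i(\mathbf{l}_k)\,f(\mathbf{l}_k,\mathbf{v})\,\cos\theta_{l_k}}{p(\mathbf{l}_k,\mathbf{v})}\;\approx\;\left(\frac{1}{N}\sum_{k=1}^{N}L_i(\mathbf{l}_k)\right)\left(\frac{1}{N}\sum_{k=1}^{N}\frac{f(\mathbf{l}_k,\mathbf{v})\,\cos\theta_{l_k}}{p(\mathbf{l}_k,\mathbf{v})}\right)$$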
This code is written for simplicity rather than for speed or precision. It does not use importance sampling; instead, it convolves the input cube map with straightforward Monte Carlo integration, using the Hammersley sequence to generate uniformly distributed 2D points on the equirectangular projection of the input cube map.
The source code can be found in the Chapter6/Util01_FilterEnvmap/src/main.cpp file. Let's quickly go through the steps to cover the entire process.
float radicalInverse_VdC(uint32_t bits) {
bits = (bits << 16u) | (bits >> 16u);
bits = ((bits&0x55555555u) << 1u) | ((bits&0xAAAAAAAAu) >> 1u);
bits = ((bits&0x33333333u) << 2u) | ((bits&0xCCCCCCCCu) >> 2u);
bits = ((bits&0x0F0F0F0Fu) << 4u) | ((bits&0xF0F0F0F0u) >> 4u);
bits = ((bits&0x00FF00FFu) << 8u) | ((bits&0xFF00FF00u) >> 8u);
return float(bits) * 2.3283064365386963e-10f;
}
vec2 hammersley2d(uint32_t i, uint32_t N) {
return vec2( float(i)/float(N), radicalInverse_VdC(i) );
}
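As a quick, purely illustrative sanity check that is not part of the tool, printing the first few points shows the characteristic Van der Corput bit-reversal pattern in the second coordinate. The snippet assumes the two functions above and GLM's vec2 type are visible in the same translation unit:
#include <cstdint>
#include <cstdio>
void printHammersleySamples()
{
for (uint32_t i = 0; i != 8; i++) {
const vec2 p = hammersley2d(i, 8);
printf("(%.3f, %.3f)\n", p.x, p.y);
}
// expected second coordinates:
// 0.000, 0.500, 0.250, 0.750, 0.125, 0.625, 0.375, 0.875
}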
Using this quasi-random point generator, we can finally convolve the cube map. For simplicity, our code supports only equirectangular projections where the width is twice the height of the image. Here is the convolution routine.
void convolveDiffuse(const vec3* data, int srcW, int srcH, int dstW, int dstH, vec3* output, int numMonteCarloSamples)
{
assert(srcW == 2 * srcH);
if (srcW != 2 * srcH) return;
std::vector<vec3> tmp(dstW * dstH);
stbir_resize_float_generic(
reinterpret_cast<const float*>(data), srcW, srcH, 0,
reinterpret_cast<float*>(tmp.data()), dstW, dstH, 0,
3, STBIR_ALPHA_CHANNEL_NONE, 0, STBIR_EDGE_CLAMP,
STBIR_FILTER_CUBICBSPLINE, STBIR_COLORSPACE_LINEAR, nullptr);
const vec3* scratch = tmp.data();
srcW = dstW;
srcH = dstH;
for (int y = 0; y != dstH; y++)
{
const float theta1 = float(y) / float(dstH) * Math::PI;
for (int x = 0; x != dstW; x++)
{
const float phi1 = float(x) / float(dstW) * Math::TWOPI;
const vec3 V1 = vec3(sin(theta1) * cos(phi1), sin(theta1) * sin(phi1), cos(theta1));
vec3 color = vec3(0.0f);
float weight = 0.0f;
for (int i = 0; i != numMonteCarloSamples; i++)
{
const vec2 h = hammersley2d(i, numMonteCarloSamples);
const int x1 = int(floor(h.x * srcW));
const int y1 = int(floor(h.y * srcH));
const float theta2 = float(y1) / float(srcH) * Math::PI;
const float phi2 = float(x1) / float(srcW) * Math::TWOPI;
const vec3 V2 = vec3(sin(theta2) * cos(phi2), sin(theta2) * sin(phi2), cos(theta2));
const float NdotL = std::max(0.0f, glm::dot(V1, V2));
if (NdotL > 0.01f) {
color += scratch[y1 * srcW + x1] * NdotL;
weight += NdotL;
}
}
output[y * dstW + x] = color / weight;
}
}
}
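To show how convolveDiffuse() might be driven, here is a minimal, hypothetical example that loads an equirectangular HDR image with stb_image, convolves it, and writes the result with stb_image_write. It assumes convolveDiffuse() from above is visible in the same translation unit; the output resolution and sample count are illustrative assumptions rather than the exact values used by the tool.
#include <vector>
#include <glm/glm.hpp>
#include "stb_image.h"       // stbi_loadf(), stbi_image_free()
#include "stb_image_write.h" // stbi_write_hdr()
using glm::vec3;
int main()
{
int w, h, comp;
// load the 2:1 equirectangular environment map as 3-channel float data
float* img = stbi_loadf("data/piazza_bologni_1k.hdr", &w, &h, &comp, 3);
if (!img) return 1;
// the output is also a 2:1 equirectangular map; the size is just an example
const int dstW = 256;
const int dstH = 128;
std::vector<vec3> out(dstW * dstH);
convolveDiffuse(reinterpret_cast<const vec3*>(img), w, h, dstW, dstH, out.data(), 1024);
stbi_write_hdr("data/piazza_bologni_1k_irradiance.hdr", dstW, dstH, 3, reinterpret_cast<const float*>(out.data()));
stbi_image_free(img);
return 0;
}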
The remaining part of the real tool is purely mechanical work along the same lines: loading the cube map image from a file, invoking the convolveDiffuse() function, and saving the result using the STB library. Let's check out the results of prefiltering for the input image shown in the following screenshot:
Figure 6.5 – Environment cube map
The convolved image should look like this:
Figure 6.6 – Prefiltered environment cube map using diffuse convolution
There is one more fly in the ointment regarding the approximations mentioned in this recipe. Technically, we should compute a separate convolution for each different BRDF. This is, however, not practical in terms of storage, memory, and performance, especially on mobile. It is not strictly correct, but it is good enough in practice.
We now have all supplementary parts in place to render a PBR image. In the next Implementing the glTF2 shading model recipe, we are going to put everything together into a simple application to render a physically based glTF2 3D model.
Paul Bourke created a set of tools and a great resource explaining how to convert cube maps between different formats. Make sure to check it out at http://paulbourke.net/panorama/cubemaps/index.html.
This recipe covers how to integrate PBR into your graphics pipeline. Since the topic of PBR rendering is vast, we focus on a minimalistic implementation that is just enough to guide you and get you started. In the book text, we concentrate on the GLSL shader code for the PBR shading model and use OpenGL to keep things simple. However, the source code bundle for this book contains a relatively small Vulkan implementation that reuses the same GLSL code. Indeed, rendering a physically based image is nothing more than running a fancy pixel shader with a set of textures.
It is recommended to read about glTF 2.0 before you proceed with this recipe. A lightweight introduction to the glTF 2.0 shading model can be found at https://github.com/KhronosGroup/glTF-Sample-Viewer/tree/glTF-WebGL-PBR.
The C++ source code for this recipe is in the Chapter6/GL01_PBR folder. The GLSL shader code responsible for PBR calculations can be found in data/shaders/chapter06/PBR.sp.
Before we dive deep into the GLSL code, we'll look at how the input data is set up from the C++ side. We are going to use the Damaged Helmet 3D model provided by Khronos. You can find the glTF file here: deps/src/glTF-Sample-Models/2.0/DamagedHelmet/glTF/DamagedHelmet.gltf. Let's get started.
GLTexture texAO(GL_TEXTURE_2D, "DamagedHelmet/glTF/Default_AO.jpg");
GLTexture texEmissive(GL_TEXTURE_2D, "DamagedHelmet/glTF/Default_emissive.jpg");
GLTexture texAlbedo(GL_TEXTURE_2D, "DamagedHelmet/glTF/Default_albedo.jpg");
GLTexture texMeR(GL_TEXTURE_2D, "DamagedHelmet/glTF/Default_metalRoughness.jpg");
GLTexture texNormal(GL_TEXTURE_2D, "DamagedHelmet/glTF/Default_normal.jpg");
const GLuint textures[] = { texAO.getHandle(), texEmissive.getHandle(), texAlbedo.getHandle(), texMeR.getHandle(), texNormal.getHandle() };
glBindTextures( 0, sizeof(textures)/sizeof(GLuint), textures);
GLTexture envMap(GL_TEXTURE_CUBE_MAP, "data/piazza_bologni_1k.hdr");
GLTexture envMapIrradiance(GL_TEXTURE_CUBE_MAP, "data/piazza_bologni_1k_irradiance.hdr");
const GLuint envMaps[] = { envMap.getHandle(), envMapIrradiance.getHandle() };
glBindTextures(5, 2, envMaps);
Check the previous Precomputing irradiance maps and diffuse convolution recipe for details on how the irradiance map was generated. Next, load the precomputed BRDF LUT and bind it to texture unit 7:
GLTexture brdfLUT(GL_TEXTURE_2D, "data/brdfLUT.ktx");
glBindTextureUnit(7, brdfLUT.getHandle());
Everything else is just mesh rendering, similar to how it was done in the previous chapter. Let's skip the rest of the C++ code and focus on the GLSL shaders. There are two shaders used to render our PBR model in OpenGL: GL01_PBR.vert and GL01_PBR.frag. The vertex shader does nothing interesting: it uses programmable vertex pulling to read vertex data from an SSBO and passes it further down the graphics pipeline. The fragment shader does the actual work. Let's take a look at it.
#version 460 core
layout(std140, binding = 0) uniform PerFrameData {
mat4 view;
mat4 proj;
vec4 cameraPos;
};
layout (location=0) in vec2 tc;
layout (location=1) in vec3 normal;
layout (location=2) in vec3 worldPos;
layout (location=0) out vec4 out_FragColor;
layout (binding = 0) uniform sampler2D texAO;
layout (binding = 1) uniform sampler2D texEmissive;
layout (binding = 2) uniform sampler2D texAlbedo;
layout (binding = 3) uniform sampler2D texMetalRoughness;
layout (binding = 4) uniform sampler2D texNormal;
layout (binding = 5) uniform samplerCube texEnvMap;
layout (binding = 6) uniform samplerCube texEnvMapIrradiance;
layout (binding = 7) uniform sampler2D texBRDF_LUT;
#include <data/shaders/chapter06/PBR.sp>
void main() {
vec4 Kao = texture(texAO, tc);
vec4 Ke = texture(texEmissive, tc);
vec4 Kd = texture(texAlbedo, tc);
vec2 MeR = texture(texMetalRoughness, tc).yz;
vec3 n = normalize(normal);
n = perturbNormal(n, normalize(cameraPos.xyz - worldPos), tc);
PBRInfo pbrInputs;
vec3 color = calculatePBRInputsMetallicRoughness( Kd, n, cameraPos.xyz, worldPos, pbrInputs);
color += calculatePBRLightContribution( pbrInputs, normalize(vec3(-1.0, -1.0, -1.0)), vec3(1.0) );
color = color * ( Kao.r < 0.01 ? 1.0 : Kao.r );
color = pow( SRGBtoLINEAR(Ke).rgb + color, vec3(1.0/2.2) );
out_FragColor = vec4(color, 1.0);
}
Let's take a look at the calculations that happen inside chapter06/PBR.sp. Our implementation is based on the reference implementation of glTF 2.0 Sample Viewer from Khronos, which you can find at https://github.com/KhronosGroup/glTF-Sample-Viewer/tree/glTF-WebGL-PBR.
struct PBRInfo {
float NdotL;               // cos angle between normal and light direction
float NdotV;               // cos angle between normal and view direction
float NdotH;               // cos angle between normal and half vector
float LdotH;               // cos angle between light dir and half vector
float VdotH;               // cos angle between view dir and half vector
float perceptualRoughness; // roughness value (input to shader)
vec3 reflectance0;         // full reflectance color
vec3 reflectance90;        // reflectance color at grazing angle
float alphaRoughness;      // remapped linear roughness
vec3 diffuseColor;         // contribution from diffuse lighting
vec3 specularColor;        // contribution from specular lighting
vec3 n;                    // normal at surface point
vec3 v;                    // vector from surface point to camera
};
vec4 SRGBtoLINEAR(vec4 srgbIn) {
vec3 linOut = pow(srgbIn.xyz,vec3(2.2));
return vec4(linOut, srgbIn.a);
}
vec3 getIBLContribution( PBRInfo pbrInputs, vec3 n, vec3 reflection)
{
float mipCount = float(textureQueryLevels(texEnvMap));
float lod = pbrInputs.perceptualRoughness * mipCount;
vec2 brdfSamplePoint = clamp(vec2(pbrInputs.NdotV, 1.0-pbrInputs.perceptualRoughness), vec2(0.0), vec2(1.0));
vec3 brdf = textureLod(texBRDF_LUT, brdfSamplePoint, 0).rgb;
#ifdef VULKAN
vec3 cm = vec3(-1.0, -1.0, 1.0);
#else
vec3 cm = vec3(1.0);
#endif
vec3 diffuseLight = texture(texEnvMapIrradiance, n.xyz * cm).rgb;
vec3 specularLight = textureLod(texEnvMap, reflection.xyz * cm, lod).rgb;
vec3 diffuse = diffuseLight * pbrInputs.diffuseColor;
vec3 specular = specularLight * (pbrInputs.specularColor * brdf.x + brdf.y);
return diffuse + specular;
}
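In other words, the specular part of the image-based lighting contribution is evaluated as follows, where F0 corresponds to pbrInputs.specularColor, the prefiltered radiance comes from texEnvMap sampled along the reflection vector at a roughness-dependent mip level, and the scale and bias are the red and green channels of the BRDF LUT; the diffuse part is simply the irradiance map sample multiplied by the diffuse color:
$$L_{specular}\approx\mathrm{prefiltered}(\mathbf{r},\,roughness)\,\bigl(F_0\cdot scale+bias\bigr),\qquad(scale,\,bias)=\mathrm{LUT}(\mathbf{n}\cdot\mathbf{v},\,roughness)$$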
Now, let's go through all the helper functions that are necessary to calculate different parts of the rendering equation.
vec3 diffuseBurley(PBRInfo pbrInputs) {
float f90 = 2.0 * pbrInputs.LdotH * pbrInputs.LdotH * pbrInputs.alphaRoughness - 0.5;
return (pbrInputs.diffuseColor / M_PI) * (1.0 + f90 * pow((1.0 - pbrInputs.NdotL), 5.0)) * (1.0 + f90 * pow((1.0 - pbrInputs.NdotV), 5.0));
}
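This is the Disney (Burley) diffuse term. Note that the code's f90 variable actually stores FD90 - 1, which is why the expression ends with - 0.5; with that substitution, and with alpha standing for the code's alphaRoughness, the function matches the usual formulation:
$$f_{diffuse}=\frac{c_{diff}}{\pi}\Bigl(1+(F_{D90}-1)(1-\mathbf{n}\cdot\mathbf{l})^{5}\Bigr)\Bigl(1+(F_{D90}-1)(1-\mathbf{n}\cdot\mathbf{v})^{5}\Bigr),\qquad F_{D90}=0.5+2\,\alpha\,(\mathbf{l}\cdot\mathbf{h})^{2}$$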
vec3 specularReflection(PBRInfo pbrInputs) {
return pbrInputs.reflectance0 + (pbrInputs.reflectance90 - pbrInputs.reflectance0) * pow(clamp(1.0 - pbrInputs.VdotH, 0.0, 1.0), 5.0);
}
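This is the Schlick approximation of the Fresnel term, with reflectance0 and reflectance90 playing the roles of F0 and F90:
$$F=F_0+(F_{90}-F_0)\,(1-\mathbf{v}\cdot\mathbf{h})^{5}$$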
float geometricOcclusion(PBRInfo pbrInputs) {
float NdotL = pbrInputs.NdotL;
float NdotV = pbrInputs.NdotV;
float rSqr = pbrInputs.alphaRoughness * pbrInputs.alphaRoughness;
float attenuationL = 2.0 * NdotL / (NdotL + sqrt(rSqr + (1.0 - rSqr) * (NdotL * NdotL)));
float attenuationV = 2.0 * NdotV / (NdotV + sqrt(rSqr + (1.0 - rSqr) * (NdotV * NdotV)));
return attenuationL * attenuationV;
}
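This is a separable Smith-style geometric masking-shadowing term for GGX, evaluated once for the light direction and once for the view direction, with alpha being alphaRoughness:
$$G_1(\mathbf{x})=\frac{2\,(\mathbf{n}\cdot\mathbf{x})}{(\mathbf{n}\cdot\mathbf{x})+\sqrt{\alpha^{2}+(1-\alpha^{2})(\mathbf{n}\cdot\mathbf{x})^{2}}},\qquad G=G_1(\mathbf{l})\,G_1(\mathbf{v})$$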
float microfacetDistribution(PBRInfo pbrInputs) {
float roughnessSq = pbrInputs.alphaRoughness * pbrInputs.alphaRoughness;
float f = (pbrInputs.NdotH * roughnessSq - pbrInputs.NdotH) * pbrInputs.NdotH + 1.0;
return roughnessSq / (M_PI * f * f);
}
This implementation is from Average Irregularity Representation of a Rough Surface for Ray Reflection by T. S. Trowbridge and K. P. Reitz.
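For reference, the function evaluates the GGX (Trowbridge-Reitz) normal distribution:
$$D(\mathbf{h})=\frac{\alpha^{2}}{\pi\,\bigl((\mathbf{n}\cdot\mathbf{h})^{2}(\alpha^{2}-1)+1\bigr)^{2}}$$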
Before we can calculate the light contribution from a light source, we need to fill in the fields of the PBRInfo structure. The following function does this.
vec3 calculatePBRInputsMetallicRoughness( vec4 albedo, vec3 normal, vec3 cameraPos, vec3 worldPos, out PBRInfo pbrInputs)
{
float perceptualRoughness = 1.0;
float metallic = 1.0;
vec4 mrSample = texture(texMetalRoughness, tc);
perceptualRoughness = mrSample.g * perceptualRoughness;
metallic = mrSample.b * metallic;
perceptualRoughness = clamp(perceptualRoughness, 0.04, 1.0);
metallic = clamp(metallic, 0.0, 1.0);
float alphaRoughness = perceptualRoughness * perceptualRoughness;
vec4 baseColor = albedo;
vec3 f0 = vec3(0.04);
vec3 diffuseColor = baseColor.rgb * (vec3(1.0) - f0);
diffuseColor *= 1.0 - metallic;
vec3 specularColor = mix(f0, baseColor.rgb, metallic);
float reflectance = max(max(specularColor.r, specularColor.g), specularColor.b);
float reflectance90 = clamp(reflectance * 25.0, 0.0, 1.0);
vec3 specularEnvironmentR0 = specularColor.rgb;
vec3 specularEnvironmentR90 = vec3(1.0, 1.0, 1.0) * reflectance90;
vec3 n = normalize(normal);
vec3 v = normalize(cameraPos - worldPos);
vec3 reflection = -normalize(reflect(v, n));
pbrInputs.NdotV = clamp(abs(dot(n, v)), 0.001, 1.0);
pbrInputs.perceptualRoughness = perceptualRoughness;
pbrInputs.reflectance0 = specularEnvironmentR0;
pbrInputs.reflectance90 = specularEnvironmentR90;
pbrInputs.alphaRoughness = alphaRoughness;
pbrInputs.diffuseColor = diffuseColor;
pbrInputs.specularColor = specularColor;
pbrInputs.n = n;
pbrInputs.v = v;
vec3 color = getIBLContribution( pbrInputs, n, reflection);
return color;
}
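The conversion from the metallic-roughness inputs to the diffuse and specular colors above follows the glTF 2.0 specification, with a fixed dielectric reflectance of 0.04:
$$c_{diff}=baseColor\cdot(1-0.04)\cdot(1-metallic),\qquad F_0=\mathrm{mix}(0.04,\,baseColor,\,metallic),\qquad\alpha=roughness^{2}$$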
The lighting contribution from a single light source can be calculated in the following way using the precalculated values from PBRInfo.
vec3 calculatePBRLightContribution( inout PBRInfo pbrInputs, vec3 lightDirection, vec3 lightColor)
{
vec3 n = pbrInputs.n;
vec3 v = pbrInputs.v;
vec3 l = normalize(lightDirection);
vec3 h = normalize(l + v);
float NdotV = pbrInputs.NdotV;
float NdotL = clamp(dot(n, l), 0.001, 1.0);
float NdotH = clamp(dot(n, h), 0.0, 1.0);
float LdotH = clamp(dot(l, h), 0.0, 1.0);
float VdotH = clamp(dot(v, h), 0.0, 1.0);
pbrInputs.NdotL = NdotL;
pbrInputs.NdotH = NdotH;
pbrInputs.LdotH = LdotH;
pbrInputs.VdotH = VdotH;
vec3 F = specularReflection(pbrInputs);
float G = geometricOcclusion(pbrInputs);
float D = microfacetDistribution(pbrInputs);
vec3 diffuseContrib = (1.0 - F) * diffuseBurley(pbrInputs);
vec3 specContrib = F * G * D / (4.0 * NdotL * NdotV);
vec3 color = NdotL * lightColor * (diffuseContrib + specContrib);
return color;
}
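Putting the helper terms together, the specular contribution computed above is the standard Cook-Torrance microfacet BRDF, while the diffuse contribution is the Burley term weighted by (1 - F); both are then scaled by n.l and the light color:
$$f_{specular}=\frac{D\,F\,G}{4\,(\mathbf{n}\cdot\mathbf{l})(\mathbf{n}\cdot\mathbf{v})},\qquad L_o=(\mathbf{n}\cdot\mathbf{l})\,L_{light}\,\bigl(f_{diffuse}+f_{specular}\bigr)$$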
The resulting demo application should render an image like the one shown in the following screenshot. Try also using different PBR glTF 2.0 models:
Figure 6.7 – PBR of the Damaged Helmet glTF 2.0 model
We have also made a Vulkan version of this app that reuses the same PBR calculation code from PBR.sp. It can be found in Chapter6/VK05_PBR.
The whole area of PBR is vast, and it is only possible to scratch its surface in these fifty-odd pages. In real life, much more complicated PBR implementations can be built around the requirements of content-production pipelines. For an endless source of inspiration for what can be done, we recommend looking into the Unreal Engine source code, which is available for free on GitHub at https://github.com/EpicGames/UnrealEngine/tree/release/Engine/Shaders/Private.