Chapter 3. Queues and Commands


What You’ll Learn in This Chapter

• What a queue is and how to use it

• How to create commands and send them to Vulkan

• How to ensure that a device has finished processing your work


Vulkan devices expose multiple queues that perform work. In this chapter, we discuss the various queue types and explain how to submit work to them in the form of command buffers. We also show how to instruct a queue to complete all of the work you’ve sent it.

Device Queues

Each device in Vulkan has one or more queues. The queue is the part of the device that actually performs work. It can be thought of as a subdevice that exposes a subset of the device’s functionality. In some implementations, each queue may even be a physically separate part of the system.

Queues are grouped into one or more queue families, each containing one or more queues. Queues within a single family are essentially identical. Their capabilities are the same, their performance level and access to system resources are the same, and there is no cost (beyond synchronization) to transferring work between them. If a device contains multiple cores that have the same capabilities but differ in performance, access to memory, or some other factor that might mean they can’t operate identically, it may expose them in separate families that otherwise appear identical.

As discussed in Chapter 1, “Overview of Vulkan,” you can query the properties of each of a physical device’s queue families by calling vkGetPhysicalDeviceQueueFamilyProperties(). This function writes the properties of each queue family into an array of VkQueueFamilyProperties structures that you hand it.
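As a rough illustration of how those properties are typically consumed, the sketch below picks the first queue family whose capability flags include everything requested. So that it compiles without the Vulkan SDK, it uses a hypothetical stand-in struct, FamilyProps, that mirrors only the queueFlags and queueCount members of VkQueueFamilyProperties, and it copies the numeric values of three VkQueueFlagBits constants. The names findQueueFamily() and kNoFamily are illustrative and not part of Vulkan. In real code you would pass the array filled in by vkGetPhysicalDeviceQueueFamilyProperties() instead.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for the two VkQueueFamilyProperties members this sketch reads.
// (Hypothetical type; real code would use VkQueueFamilyProperties directly.)
struct FamilyProps {
    uint32_t queueFlags;  // bitmask of capability bits
    uint32_t queueCount;  // number of queues in the family
};

// Numeric values copied from VkQueueFlagBits in the Vulkan headers.
constexpr uint32_t kGraphicsBit = 0x1;  // VK_QUEUE_GRAPHICS_BIT
constexpr uint32_t kComputeBit  = 0x2;  // VK_QUEUE_COMPUTE_BIT
constexpr uint32_t kTransferBit = 0x4;  // VK_QUEUE_TRANSFER_BIT

// Sentinel returned when no family matches (illustrative, not a Vulkan name).
constexpr uint32_t kNoFamily = UINT32_MAX;

// Returns the index of the first family that supports every bit in
// `required` and exposes at least one queue, or kNoFamily if none does.
uint32_t findQueueFamily(const std::vector<FamilyProps>& families,
                         uint32_t required)
{
    for (uint32_t i = 0; i < families.size(); ++i) {
        const FamilyProps& f = families[i];
        if (f.queueCount > 0 && (f.queueFlags & required) == required) {
            return i;
        }
    }
    return kNoFamily;
}
```

The index returned here is the value you would later place in queueFamilyIndex when requesting queues from the device.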

The number and type of queues that you wish to use must be specified when you create the device. As you saw in Chapter 1, “Overview of Vulkan,” the VkDeviceCreateInfo structure that you pass to vkCreateDevice() contains the queueCreateInfoCount and pQueueCreateInfos members. Chapter 1, “Overview of Vulkan,” glossed over them, but now it’s time to fill them in. The queueCreateInfoCount member contains the number of VkDeviceQueueCreateInfo structures stored in the array pointed to by pQueueCreateInfos. The definition of the VkDeviceQueueCreateInfo structure is

typedef struct VkDeviceQueueCreateInfo {
    VkStructureType             sType;
    const void*                 pNext;
    VkDeviceQueueCreateFlags    flags;
    uint32_t                    queueFamilyIndex;
    uint32_t                    queueCount;
    const float*                pQueuePriorities;
} VkDeviceQueueCreateInfo;

As with most Vulkan structures, the sType field is the structure type, which in this case should be VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO, and the pNext field is used for extensions and should be set to nullptr when none are used. The flags field contains flags controlling queue construction, but no flag is defined for use in the current version of Vulkan, so this field should be set to zero.

The fields of interest here are queueFamilyIndex and queueCount. The queueFamilyIndex field specifies the family from which you want to allocate queues, and the queueCount field specifies the number of queues to allocate from that family. To allocate queues from multiple families, simply pass an array of more than one VkDeviceQueueCreateInfo structure in the pQueueCreateInfos member of the VkDeviceCreateInfo structure.
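One way to assemble these requests before filling in the real structures is sketched below. It assumes, in line with the core specification's rule for ordinary queues, that each family should appear at most once in the array, and that pQueuePriorities must point to queueCount values between 0.0 and 1.0. QueueRequest and requestQueues() are hypothetical names used only for illustration. Each QueueRequest maps onto one VkDeviceQueueCreateInfo: familyIndex becomes queueFamilyIndex, priorities.size() becomes queueCount, and priorities.data() becomes pQueuePriorities.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical planning record, one per queue family.
struct QueueRequest {
    uint32_t familyIndex;
    std::vector<float> priorities;  // one normalized priority per queue
};

// Requests `count` more queues from `family`, all with the same priority.
// A request for a family that is already in the plan is merged into the
// existing entry so that each family appears only once.
void requestQueues(std::vector<QueueRequest>& plan,
                   uint32_t family, uint32_t count, float priority)
{
    assert(priority >= 0.0f && priority <= 1.0f);
    for (QueueRequest& r : plan) {
        if (r.familyIndex == family) {
            r.priorities.insert(r.priorities.end(), count, priority);
            return;
        }
    }
    plan.push_back(QueueRequest{family, std::vector<float>(count, priority)});
}
```

Keeping the priorities in a vector owned by the plan also means the array that pQueuePriorities points to stays alive until vkCreateDevice() has been called.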

The queues are constructed when the device is created. For this reason, we don’t create queues, but obtain them from the device. To do this, call vkGetDeviceQueue():

void vkGetDeviceQueue (
    VkDevice                            device,
    uint32_t                            queueFamilyIndex,
    uint32_t                            queueIndex,
    VkQueue*                            pQueue);

The vkGetDeviceQueue() function takes as arguments the device from which you want to obtain the queue, the family index, and the index of the queue within that family. These are specified in device, queueFamilyIndex, and queueIndex, respectively. The pQueue parameter points to the VkQueue handle that is to be filled with the handle to the queue. queueFamilyIndex and queueIndex must refer to a queue that was initialized when the device was created. If they do, a queue handle is placed into the variable pointed to by pQueue; otherwise, this variable is set to VK_NULL_HANDLE.

Creating Command Buffers

The primary purpose of a queue is to process work on behalf of your application. Work is represented as a sequence of commands that are recorded into command buffers. Your application will create command buffers containing the work it needs to do and submit them to one of the queues for execution. Before you can record any commands, you need to create a command buffer. Command buffers themselves are not created directly, but allocated from pools. To create a pool, call vkCreateCommandPool(), whose prototype is

VkResult vkCreateCommandPool (
    VkDevice                            device,
    const VkCommandPoolCreateInfo*      pCreateInfo,
    const VkAllocationCallbacks*        pAllocator,
    VkCommandPool*                      pCommandPool);

As with most Vulkan object creation functions, the first parameter, device, is the handle to the device that will own the new pool object, and a description of the pool is passed via a structure, a pointer to which is placed in pCreateInfo. This structure is an instance of VkCommandPoolCreateInfo, the definition of which is

typedef struct VkCommandPoolCreateInfo {
    VkStructureType             sType;
    const void*                 pNext;
    VkCommandPoolCreateFlags    flags;
    uint32_t                    queueFamilyIndex;
} VkCommandPoolCreateInfo;

As with most Vulkan structures, the first two fields, sType and pNext, contain the structure type and a pointer to another structure containing more information about the pool to be created. Here, we’ll set sType to VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO and, because we’re not passing any extra information, set pNext to nullptr.

The flags field contains flags that determine the behavior of the pool and the command buffers that are allocated from it. These are members of the VkCommandPoolCreateFlagBits enumeration, and there are currently two flags defined for use here.

• Setting the VK_COMMAND_POOL_CREATE_TRANSIENT_BIT indicates that command buffers taken from the pool will be short-lived and returned to the pool shortly after use. Not setting this bit suggests to Vulkan that you might keep the command buffers around for some time.

• Setting the VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT allows individual command buffers to be reused by resetting them or restarting them. (Don’t worry, we’ll cover that in a moment.) If this bit is not specified, then only the pool itself can be reset, which implicitly recycles all of the command buffers allocated from it.

Each of these bits may add some overhead to the work done by a Vulkan implementation to track the resources or otherwise alter its allocation strategy. For example, setting VK_COMMAND_POOL_CREATE_TRANSIENT_BIT may cause a Vulkan implementation to employ a more advanced allocation strategy for the pool in order to avoid fragmentation as command buffers are frequently allocated and then returned to it. Setting VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT may cause the implementation to track the reset status of each command buffer rather than simply track it at the pool level.

In this case, we’re actually going to set both bits. This gives us the most flexibility, possibly at the expense of some performance in cases where we could have managed command buffers in bulk.

Finally, the queueFamilyIndex field of VkCommandPoolCreateInfo specifies the family of queues to which command buffers allocated from this pool will be submitted. This is necessary because even where two queues on a device have the same capabilities and support the same set of commands, issuing a particular command to one queue may work differently from issuing that same command to another queue.

The pAllocator parameter is used for application-managed host memory allocations, which are covered in Chapter 2, “Memory and Resources.” Assuming successful creation of the command pool, its handle will be written into the variable pointed to by pCommandPool, and vkCreateCommandPool() will return VK_SUCCESS.

Once we have a pool from which to allocate command buffers, we can grab new command buffers by calling vkAllocateCommandBuffers(), which is defined as

VkResult vkAllocateCommandBuffers (
    VkDevice                                   device,
    const VkCommandBufferAllocateInfo*         pAllocateInfo,
    VkCommandBuffer*                           pCommandBuffers);

The device used to allocate the command buffers is passed in device, and the remaining parameters describing the command buffers to allocate are passed in an instance of the VkCommandBufferAllocateInfo structure, the address of which is passed in pAllocateInfo. The definition of VkCommandBufferAllocateInfo is

typedef struct VkCommandBufferAllocateInfo {
    VkStructureType         sType;
    const void*             pNext;
    VkCommandPool           commandPool;
    VkCommandBufferLevel    level;
    uint32_t                commandBufferCount;
} VkCommandBufferAllocateInfo;

The sType field should be set to VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, and as we’re using only the core feature set here, we set the pNext field to nullptr. A handle to the command pool that we created earlier is placed into the commandPool field.

The level field specifies the level of the command buffers that we want to allocate. It can be set to either VK_COMMAND_BUFFER_LEVEL_PRIMARY or VK_COMMAND_BUFFER_LEVEL_SECONDARY. Vulkan allows primary command buffers to call secondary command buffers. For our first few examples, we will use only primary-level command buffers. We’ll cover secondary-level command buffers later in the book.

Finally, commandBufferCount specifies the number of command buffers that we want to allocate from the pool. Note that we don’t tell Vulkan anything about the length or size of the command buffers we’re creating. The internal data structures representing device commands will generally vary too greatly for any unit of measurement, such as bytes or commands, to make much sense. Vulkan will manage the command buffer memory for you.

If vkAllocateCommandBuffers() is successful, it will return VK_SUCCESS and place the handles to the allocated command buffers in the array pointed to by pCommandBuffers. This array should be big enough to hold all the handles. Of course, if you want to allocate only a single command buffer, you can point this at a regular VkCommandBuffer handle.

To free command buffers, we use the vkFreeCommandBuffers() command, which is declared as

void vkFreeCommandBuffers (
    VkDevice                       device,
    VkCommandPool                  commandPool,
    uint32_t                       commandBufferCount,
    const VkCommandBuffer*         pCommandBuffers);

The device parameter is the device that owns the pool from which the command buffers were allocated. commandPool is a handle to that pool, commandBufferCount is the number of command buffers to free, and pCommandBuffers is a pointer to an array of commandBufferCount handles to the command buffers to free. Note that freeing a command buffer doesn’t necessarily free all of the resources associated with it but returns them to the pool from which they were allocated.

To free all of the resources used by a command pool and all of the command buffers allocated from it, call vkDestroyCommandPool(), the prototype of which is

void vkDestroyCommandPool (
    VkDevice                               device,
    VkCommandPool                          commandPool,
    const VkAllocationCallbacks*           pAllocator);

The device that owns the command pool is passed in the device parameter, and a handle to the command pool to destroy is passed in commandPool. A pointer to a host memory allocation structure compatible with the one used to create the pool is passed in pAllocator. This parameter should be nullptr if the pAllocator parameter to vkCreateCommandPool() was also nullptr.

There is no need to explicitly free all of the command buffers allocated from a pool before destroying the pool. The command buffers allocated from the pool are all freed as a part of destroying the pool and freeing its resources. Care should be taken, however, that no command buffers allocated from the pool are still executing or queued for execution on the device when vkDestroyCommandPool() is called.

Recording Commands

Commands are recorded into command buffers using Vulkan command functions, all of which take a command buffer handle as their first parameter. Access to the command buffer must be externally synchronized, meaning that it is the responsibility of your application to ensure that no two threads attempt to record commands into the same command buffer at the same time. However, the following is perfectly acceptable:

• One thread can record commands into multiple command buffers by simply calling command buffer functions on different command buffers in succession.

• Two or more threads can participate in building a single command buffer, so long as the application can guarantee that no two of them are ever executing a command buffer building function concurrently.

One of the key design principles of Vulkan is to enable efficient multithreading. To achieve this, it is important that your application’s threads do not block each other’s execution by, for example, taking a mutex to protect a shared resource. For this reason, it’s best to have one or more command buffers for each thread rather than to try sharing one. Furthermore, because command buffers are allocated from pools, you can create a command pool for each thread, allowing command buffers to be allocated by your worker threads from their respective pools without interacting.

Before you can start recording commands into a command buffer, however, you have to begin the command buffer, which resets it to an initial state. To do this, call vkBeginCommandBuffer(), the prototype of which is

VkResult vkBeginCommandBuffer (
    VkCommandBuffer                        commandBuffer,
    const VkCommandBufferBeginInfo*        pBeginInfo);

The command buffer to begin recording is passed in commandBuffer, and the parameters that are used in recording this command buffer are passed through a pointer to a VkCommandBufferBeginInfo structure specified in pBeginInfo. The definition of VkCommandBufferBeginInfo is

typedef struct VkCommandBufferBeginInfo {
    VkStructureType                        sType;
    const void*                            pNext;
    VkCommandBufferUsageFlags              flags;
    const VkCommandBufferInheritanceInfo*  pInheritanceInfo;
} VkCommandBufferBeginInfo;

The sType field of VkCommandBufferBeginInfo should be set to VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO, and pNext should be set to nullptr. The flags field is used to tell Vulkan how the command buffer will be used. This should be a bitwise combination of some of the members of the VkCommandBufferUsageFlagBits enumeration, which include the following:

• VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT means that the command buffer will be recorded, executed only once, and then destroyed or recycled.

• VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT means that the command buffer will be used inside a renderpass and is valid only for secondary command buffers. The flag is ignored when you create a primary command buffer, which is what we will cover in this chapter. Renderpasses and secondary command buffers are covered in more detail in Chapter 13, “Multipass Rendering.”

• VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT means that the command buffer might be executed or pending execution more than once.

For our purposes, it’s safe to set flags to zero, which means that we might execute the command buffer more than once but not simultaneously and that we’re not creating a secondary command buffer.

The pInheritanceInfo member of VkCommandBufferBeginInfo is used when beginning a secondary command buffer to define which states are inherited from the primary command buffer that will call it. For primary command buffers, this pointer is ignored. We’ll cover the content of the VkCommandBufferInheritanceInfo structure when we introduce secondary command buffers in Chapter 13, “Multipass Rendering.”

Now it’s time to create our first command. Back in Chapter 2, “Memory and Resources,” you learned about buffers, images, and memory. The vkCmdCopyBuffer() command is used to copy data between two buffer objects. Its prototype is

void vkCmdCopyBuffer (
    VkCommandBuffer                  commandBuffer,
    VkBuffer                         srcBuffer,
    VkBuffer                         dstBuffer,
    uint32_t                         regionCount,
    const VkBufferCopy*              pRegions);

This is the general form of all Vulkan commands. The first parameter, commandBuffer, is the command buffer to which the command is appended. The srcBuffer and dstBuffer parameters specify the buffer objects to be used as the source and destination of the copy, respectively. Finally, an array of regions is passed to the function. The number of regions is specified in regionCount, and the address of the array of regions is specified in pRegions. Each region is represented as an instance of the VkBufferCopy structure, the definition of which is

typedef struct VkBufferCopy {
    VkDeviceSize    srcOffset;
    VkDeviceSize    dstOffset;
    VkDeviceSize    size;
} VkBufferCopy;

Each element of the array simply contains the source and destination offsets and the size of each region to be copied in srcOffset, dstOffset, and size, respectively. When the command is executed, for each region in pRegions, size bytes of data will be copied from srcOffset in srcBuffer to dstOffset in dstBuffer. The offsets are also measured in bytes.
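The copy semantics just described can be modeled entirely on the host, which may help when reasoning about offsets before involving a device. The sketch below is only such a model: CopyRegion is a hypothetical stand-in with the same three members as VkBufferCopy (with VkDeviceSize represented as a 64-bit unsigned integer), the two buffers are plain byte vectors, and applyCopyRegions() is an illustrative name. The bounds assertions reflect our assumption that every region must lie entirely inside both buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in with the same members as VkBufferCopy (hypothetical type).
struct CopyRegion {
    uint64_t srcOffset;  // byte offset into the source buffer
    uint64_t dstOffset;  // byte offset into the destination buffer
    uint64_t size;       // number of bytes to copy
};

// Host-side model of the copy: for each region, `size` bytes are copied
// from src starting at srcOffset to dst starting at dstOffset.
void applyCopyRegions(const std::vector<uint8_t>& src,
                      std::vector<uint8_t>& dst,
                      const std::vector<CopyRegion>& regions)
{
    for (const CopyRegion& r : regions) {
        // Each region must fit inside both buffers.
        assert(r.srcOffset + r.size <= src.size());
        assert(r.dstOffset + r.size <= dst.size());
        for (uint64_t i = 0; i < r.size; ++i) {
            dst[static_cast<size_t>(r.dstOffset + i)] =
                src[static_cast<size_t>(r.srcOffset + i)];
        }
    }
}
```

Bytes of the destination that no region covers are left untouched, just as they would be by the real command.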

One thing that is fundamental to the operation of Vulkan is that the commands are not executed as soon as they are called. Rather, they are simply added to the end of the specified command buffer. If you are copying data to or from a region of memory that is visible to the host (i.e., it’s mapped), then you need to be sure of several things:

• Ensure that the data is in the source region before the command is executed by the device.

• Ensure that the data in the source region is valid until after the command has been executed on the device.

• Ensure that you don’t read the destination data until after the command has been executed on the device.

The first of these is perhaps the most interesting. In particular, it means that you can build the command buffer containing the copy command before putting the source data in memory. So long as the source data is in the right place before the command buffer is executed, things will work out.
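To make the record-now, execute-later behavior concrete, here is a toy model that involves no Vulkan at all. ToyCommandBuffer is a hypothetical type in which recording merely appends a deferred action and execute() plays the role of the device running the submitted work. It is a sketch of the idea only and says nothing about how a real implementation stores commands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy model only: a "command buffer" is a list of deferred actions.
struct ToyCommandBuffer {
    std::vector<std::function<void()>> commands;

    // Records a copy of `count` bytes from src to dst. No data moves yet;
    // the vectors are captured by reference and read only in execute().
    void recordCopy(const std::vector<uint8_t>& src,
                    std::vector<uint8_t>& dst, size_t count)
    {
        commands.push_back([&src, &dst, count] {
            assert(count <= src.size() && count <= dst.size());
            for (size_t i = 0; i < count; ++i) {
                dst[i] = src[i];
            }
        });
    }

    // Stands in for the device executing the recorded commands in order.
    void execute() const
    {
        for (const std::function<void()>& cmd : commands) {
            cmd();
        }
    }
};
```

In this model you can call recordCopy() first, fill the source vector afterward, and then call execute(); the destination receives the data written after recording. The source and destination must also outlive the call to execute(), which parallels the requirement that the source region remain valid until the device has executed the command.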

Listing 3.1 demonstrates how to use vkCmdCopyBuffer() to copy a section of data from one buffer to another. The command buffer to perform the copy with is passed in the cmdBuffer parameter; the source and destination buffers are passed in the srcBuffer and dstBuffer parameters, respectively; and the offsets of the data within them are passed in the srcOffset and dstOffset parameters. The function packs these parameters, along with the size of the copy, into a VkBufferCopy structure and calls vkCmdCopyBuffer() to perform the copy operation.

Listing 3.1: Example of Using vkCmdCopyBuffer()

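void CopyDataBetweenBuffers(VkCommandBuffer cmdBuffer,
                            VkBuffer srcBuffer, VkDeviceSize srcOffset,
                            VkBuffer dstBuffer, VkDeviceSize dstOffset,
                            VkDeviceSize size)
{
   // Describe a single region: size bytes from srcOffset to dstOffset.
   const VkBufferCopy copyRegion =
   {
       srcOffset, dstOffset, size
   };

   // Append the copy to cmdBuffer; it runs only after the buffer is submitted.
   vkCmdCopyBuffer(cmdBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
}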

Remember that srcOffset and dstOffset are relative to the start of the source and destination buffers, respectively, but that each of those buffers could be bound to memory at different offsets and could potentially be bound to the same memory object. Therefore, if one of the memory objects is mapped, the offset within the memory object is the offset at which the buffer object is bound to it plus the offset you pass to vkCmdCopyBuffer().
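The offset arithmetic in the preceding paragraph can be written down as a small helper. The sketch below assumes bindOffset is the offset at which the buffer was bound to its memory object and copyOffset is the srcOffset or dstOffset given to vkCmdCopyBuffer(). The second function additionally assumes the memory was mapped starting at mapOffset within the memory object, so that the mapped pointer corresponds to that offset. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset, relative to the start of the memory object, of the data
// that a copy region touches: binding offset plus the offset in the copy.
uint64_t offsetInMemoryObject(uint64_t bindOffset, uint64_t copyOffset)
{
    return bindOffset + copyOffset;
}

// Host address of that data, given a pointer obtained by mapping the
// memory object starting at `mapOffset`. The data must lie at or beyond
// the start of the mapped range.
uint8_t* hostPointer(uint8_t* mapped, uint64_t mapOffset,
                     uint64_t bindOffset, uint64_t copyOffset)
{
    const uint64_t offset = offsetInMemoryObject(bindOffset, copyOffset);
    assert(offset >= mapOffset);
    return mapped + (offset - mapOffset);
}
```

For example, a buffer bound at offset 256 and copied from srcOffset 64 reads data that begins 320 bytes into the memory object.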

Before the command buffer is ready to be sent to the device for execution, we must tell Vulkan that we’re done recording commands into it. To do this, we call vkEndCommandBuffer(), the prototype of which is

VkResult vkEndCommandBuffer (
    VkCommandBuffer                       commandBuffer);

The vkEndCommandBuffer() function takes only a single parameter, commandBuffer, which is the command buffer to end recording. After vkEndCommandBuffer() is executed on a command buffer, Vulkan finishes any final work it needs to do to get the command buffer ready for execution.

Recycling Command Buffers

In many applications, a similar sequence of commands is used to render all or part of each frame. Therefore, it is likely that you will record similar command buffers over and over. Using the commands introduced so far, you would call vkAllocateCommandBuffers() to grab one or more command buffer handles, record commands into the command buffers, and then call vkFreeCommandBuffers() to return the command buffers to their respective pools. This is a relatively heavyweight operation, and if you know that you will reuse a command buffer for similar work many times in a row, it may be more efficient to reset the command buffer. This effectively puts the command buffer back into its original state but does not necessarily interact with the pool at all. Therefore, if the command buffer dynamically allocates resources from the pool as it grows, it can hang on to those resources and avoid the cost of reallocation the second and subsequent times it’s rebuilt. To reset a command buffer, call vkResetCommandBuffer(), the prototype of which is

VkResult vkResetCommandBuffer (
    VkCommandBuffer                    commandBuffer,
    VkCommandBufferResetFlags          flags);

The command buffer to reset is passed in commandBuffer. flags specifies additional operations to perform while resetting the command buffer. The only flag defined for use here is VK_COMMAND_BUFFER_RESET_RELEASE_RESOURCES_BIT. If this bit is set, then resources allocated from the pool by the command buffer are returned to the pool. Even with this bit set, it’s probably still more efficient to call vkResetCommandBuffer() than it is to free and reallocate a new command buffer.

It’s also possible to reset all the command buffers allocated from a pool in one shot. To do this, call vkResetCommandPool(), the prototype of which is

VkResult vkResetCommandPool (
    VkDevice                           device,
    VkCommandPool                      commandPool,
    VkCommandPoolResetFlags            flags);

The device that owns the command pool is specified in device, and the pool to reset is specified in commandPool. Just as with vkResetCommandBuffer(), the flags parameter specifies additional action to be taken as part of resetting the pool. Again, the only flag defined for use here is VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT. When this bit is set, any resources dynamically allocated by the pool are freed as part of the reset operation.

Command buffers allocated from the pool are not freed by vkResetCommandPool(), but all reenter their initial state as if they had been freshly allocated. vkResetCommandPool() is typically used at the end of a frame to return a batch of reusable command buffers to their pool rather than resetting each command buffer individually.

Care should be taken to try to keep the complexity of command buffers consistent over their multiple uses if they are reset without returning resources to the pool. As a command buffer grows, it may allocate resources dynamically from the pool, and the command pool may allocate resources from a systemwide pool. The amount of resources that a command buffer may consume is essentially unbounded, because there is no hard-wired limit to the number of commands you can place in a single command buffer. If your application uses a mix of very small and very large command buffers, it’s possible that eventually all command buffers will grow as large as the most complex command buffers.

To avoid this scenario, either periodically specify the VK_COMMAND_BUFFER_RESET_RELEASE_RESOURCES_BIT or VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT when resetting command buffers or their pools, respectively, or try to ensure that the same command buffers are always used in the same way—either short, simple command buffers or long, complex command buffers. Avoid mixing use cases.
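One way to act on the word “periodically” is to count resets and request a release every so often. The sketch below is a minimal, hypothetical policy object: ResetPolicy and nextResetFlags() are illustrative names, the period is an arbitrary tuning knob, and the constant copies the numeric value of VK_COMMAND_BUFFER_RESET_RELEASE_RESOURCES_BIT from the Vulkan headers so that the example compiles without them.

```cpp
#include <cstdint>

// Numeric value copied from the Vulkan headers.
constexpr uint32_t kReleaseResourcesBit = 0x1;  // VK_COMMAND_BUFFER_RESET_RELEASE_RESOURCES_BIT

// Hypothetical helper: asks for resources to be released on every
// `period`-th reset and for a plain reset otherwise. A period of zero
// means resources are never released.
class ResetPolicy {
public:
    explicit ResetPolicy(uint32_t period) : period_(period), count_(0) {}

    // Returns the flags to pass to vkResetCommandBuffer() for the next reset.
    uint32_t nextResetFlags()
    {
        ++count_;
        if (period_ != 0 && count_ % period_ == 0) {
            return kReleaseResourcesBit;
        }
        return 0;
    }

private:
    uint32_t period_;  // release resources every `period_` resets
    uint32_t count_;   // number of resets requested so far
};
```

The same counting approach would apply to vkResetCommandPool() with VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT. What period works well, if any, depends on how much your command buffers vary in size.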

Submission of Commands

To execute the command buffer on the device, we need to submit it to one of the device’s queues. To do this, call vkQueueSubmit(), the prototype of which is

VkResult vkQueueSubmit (
    VkQueue                        queue,
    uint32_t                       submitCount,
    const VkSubmitInfo*            pSubmits,
    VkFence                        fence);

This command can submit one or more command buffers to the device for execution. The queue parameter specifies the device queue to which to send the command buffer. Access to the queue must be externally synchronized. All of the command buffers to submit were allocated from a pool, and that pool must have been created with respect to one of the device’s queue families. This is the queueFamilyIndex member of the VkCommandPoolCreateInfo structure passed to vkCreateCommandPool(). queue must be a member of that family.

The number of submissions is specified in submitCount, and an array of structures describing each of the submissions is specified in pSubmits. Each submission is represented by an instance of the VkSubmitInfo structure, the definition of which is

typedef struct VkSubmitInfo {
    VkStructureType                sType;
    const void*                    pNext;
    uint32_t                       waitSemaphoreCount;
    const VkSemaphore*             pWaitSemaphores;
    const VkPipelineStageFlags*    pWaitDstStageMask;
    uint32_t                       commandBufferCount;
    const VkCommandBuffer*         pCommandBuffers;
    uint32_t                       signalSemaphoreCount;
    const VkSemaphore*             pSignalSemaphores;
} VkSubmitInfo;

The sType field of VkSubmitInfo should be set to VK_STRUCTURE_TYPE_SUBMIT_INFO, and pNext should be set to nullptr. Each VkSubmitInfo structure can represent multiple command buffers that are to be executed by the device.

Each set of command buffers can be wrapped in a set of semaphores upon which to wait before beginning execution and can signal one or more semaphores when they complete execution. A semaphore is a type of synchronization primitive that allows work executed by different queues to be scheduled and coordinated correctly. We will cover semaphores along with other synchronization primitives in Chapter 11, “Synchronization.” For now, we’re not going to use these fields, so waitSemaphoreCount and signalSemaphoreCount can be set to zero, and pWaitSemaphores, pWaitDstStageMask, and pSignalSemaphores can all be set to nullptr.

The command buffers we want to execute are placed in an array, and its address is passed in pCommandBuffers. The number of command buffers to execute (the length of the pCommandBuffers array) is specified in commandBufferCount. At some time after the vkQueueSubmit() command is called, the commands in the command buffers begin executing on the device. Commands submitted to different queues on the same device (or to queues belonging to different devices) may execute in parallel. vkQueueSubmit() returns as soon as the specified command buffers have been scheduled, possibly long before they’ve even begun executing.

The fence parameter to vkQueueSubmit() is a handle to a fence object, which can be used to wait for completion of the commands executed by this submission. A fence is another type of synchronization primitive that we will cover in Chapter 11, “Synchronization.” For now, we’ll set fence to VK_NULL_HANDLE. Until we cover fences, we can wait for all work submitted to a queue to complete by calling vkQueueWaitIdle(). Its prototype is

VkResult vkQueueWaitIdle (
    VkQueue                          queue);

The only parameter to vkQueueWaitIdle(), queue, is the queue upon which to wait. When vkQueueWaitIdle() returns, all command buffers submitted to queue are guaranteed to have completed execution. A shortcut to wait for all commands submitted to all queues on a single device to have completed is to call vkDeviceWaitIdle(). Its prototype is

VkResult vkDeviceWaitIdle (
    VkDevice                         device);

Calling vkQueueWaitIdle() or vkDeviceWaitIdle() is really not recommended, as they fully flush any work on the queue or device and are very heavyweight operations. Neither should be called in any performance-critical part of your application. Suitable use cases include just before shutting down the application or when reinitializing application subsystems such as thread management, memory management, and so on, where there is likely to be a substantial pause anyway.

Summary

This chapter introduced you to command buffers, which are the mechanisms by which commands are communicated by your application to the Vulkan device. We introduced our first Vulkan command and showed you how to ask the device to execute work for you.

We discussed how to send the command buffers to the Vulkan device for execution by submitting them to the queue. You saw how to ensure that all work submitted to a queue or to a device has finished executing. Although we glossed over a number of important topics, such as how to call one command buffer from another and how to implement fine-grained synchronization between the host and the device and between queues on the device, these topics will be discussed in upcoming chapters.
