Chapter 9: Working with Scene Graphs

In this chapter, we add a few more touches to our scene-graph code to allow dynamic scene graphs. Besides that, we show how to handle custom components using an example with a physics simulation library, Bullet. Using these techniques, we can extend our scene-graph implementation even further and cover many real-world use cases.

This chapter covers the following recipes:

  • Deleting nodes and merging scene graphs
  • Finalizing the scene-converter tool
  • Implementing lightweight rendering queues in Open Graphics Library (OpenGL)
  • Working with shape lists in Vulkan
  • Adding Bullet physics to a graphics application

Technical requirements

To run the recipes from this chapter, you will need to use a computer with a video card supporting OpenGL 4.6 with ARB_bindless_texture and Vulkan 1.2. Read Chapter 1, Establishing a Build Environment, if you want to learn how to build demonstration applications from this book.

This chapter relies on the geometry-loading code explained in Chapter 7, Graphics Rendering Pipeline, so make sure you read it before proceeding any further.

You can find the code files for this chapter in the book's GitHub repository.

Deleting nodes and merging scene graphs

Our scene-management routines from Chapter 7, Graphics Rendering Pipeline, are incomplete without a few more operations:

  • Deleting scene nodes
  • Merging multiple scenes into one (in our case, Lumberyard Bistro's exterior and interior objects)
  • Merging material and mesh data lists
  • Merging multiple static meshes with the same material into a large single mesh

These operations are crucial to the success of our final demo application since, without a combined list of render operations, we cannot easily construct auxiliary draw lists for shadow casters and transparent objects. Frustum culling is significantly easier to do while dealing with a single list of renderable items. Mesh merging is also essential for optimization purposes. The original Lumberyard Bistro scene contains a big tree in the backyard, where small orange and green leaves comprise almost two-thirds of the total draw call count of the entire scene—that is, roughly 18,000 out of 27,000 objects.

This recipe describes the deleteSceneNodes() and mergeScene() routines used for scene-graph manipulations. Together with the next recipe, Finalizing the scene-converter tool, these functions complete our scene data-conversion and preprocessing tool, which we started back in Chapter 7, Graphics Rendering Pipeline.

Let's recall Chapter 7, Graphics Rendering Pipeline, where we packed all the scene data into a continuous array wrapped in std::vector for convenience so that the data was directly usable by the Graphics Processing Unit (GPU). Here, we use Standard Template Library's (STL's) partitioning algorithms to keep everything tightly packed while deleting and merging scene nodes.

To start, let's implement a utility function to delete a collection of items from an array. It may seem superfluous to implement a routine that removes an entire collection of nodes at once, but even in the simplest case of deleting a single node, we also need to remove all of the node's children.

The idea is to move all the nodes marked for deletion to the end of the array using the std::stable_partition() algorithm. After moving the nodes, we only need to resize the container. The following diagram clarifies the deletion process:

Figure 9.1 – Deleting nodes from a scene graph


On the left, we have an initial scene graph with some nodes marked for deletion (ch1, ch4, ch5, and ch6). In the middle, we have a linear list of nodes, with arrows indicating inter-node references. On the right, we have a reorganized node list and the resulting scene graph.

How to do it...

Let's take a look at how to implement our utility functions.

  1. The eraseSelected() routine is generic enough to delete any kind of items from an array, so we use template arguments:

    template <class T, class Index = int>
    void eraseSelected(std::vector<T>& v,
      const std::vector<Index>& selection)
    {
      v.resize(std::distance(v.begin(),
        std::stable_partition(v.begin(), v.end(),
          [&selection, &v](const T& item) {
            return !std::binary_search(
              selection.begin(), selection.end(),
              static_cast<Index>(
                static_cast<const T*>(&item) - &v[0]));
          })));
    }


    The function "chops off" the elements moved to the end of the vector by using vector::resize(). The exact number of items to retain is calculated as a distance from the start of the array to the iterator returned by the stable_partition() function. The std::stable_partition() algorithm takes a lambda function that checks whether the element should be moved to the end of the array. In our case, this lambda function checks whether the item is in the selection container passed as an argument. The "usual" way to find an item index in the array is to use std::distance() and std::find(), but we can also resort to good old pointer arithmetic, as the container is tightly packed.

  2. Now that we have our workhorse, the eraseSelected() routine, we can implement scene-node deletion. When we want to delete one node, all its children must also be marked for deletion. We collect all the nodes to be deleted with the following recursive routine. To do this, we iterate all the children, as we did in Chapter 7, Graphics Rendering Pipeline, while traversing a scene graph. Each iterated index is added to the array:

    void collectNodesToDelete(const Scene& scene,
      int node, std::vector<uint32_t>& nodes)
    {
      for (int n = scene.hierarchy_[node].firstChild_;
           n != -1; n = scene.hierarchy_[n].nextSibling_) {
        addUniqueIdx(nodes, n);
        collectNodesToDelete(scene, n, nodes);
      }
    }


  3. The addUniqueIdx() helper routine avoids adding items twice:

    void addUniqueIdx(std::vector<uint32_t>& v, int index) {
      if (!std::binary_search(v.begin(), v.end(), index))
        v.push_back(index);
    }


    One subtle requirement here is that the array is sorted. When all the children come strictly after their parents, this is not a problem. Otherwise, std::find() should be used, which naturally increases the runtime cost of the algorithm.

  4. Our deleteSceneNodes() deletion routine starts by adding all child nodes to the list of nodes to delete. To keep track of moved nodes, we create a linear list of node indices, nodes, starting at 0:

    void deleteSceneNodes(Scene& scene,
      const std::vector<uint32_t>& nodesToDelete)
    {
      auto indicesToDelete = nodesToDelete;
      for (auto i: nodesToDelete)
        collectNodesToDelete(scene, i, indicesToDelete);
      std::vector<uint32_t> nodes(scene.hierarchy_.size());
      std::iota(nodes.begin(), nodes.end(), 0);

  5. Afterward, we remember the source node count and remove all the indices from our linear index list. To fix the child node indices, we create a linear mapping table from old node indices to the new ones:

      auto oldSize = nodes.size();

      eraseSelected(nodes, indicesToDelete);

      std::vector<uint32_t> newIndices(oldSize, -1);

      for (uint32_t i = 0; i < nodes.size(); i++)

        newIndices[nodes[i]] = i;

  6. Before deleting nodes from the hierarchy array, we remap all node indices. The following lambda modifies a single Hierarchy item by finding the non-null node in the newIndices container:

      auto nodeMover = [&scene, &newIndices](Hierarchy& h)
      {
        return Hierarchy {
          .parent_ = (h.parent_ != -1) ?
            (int)newIndices[h.parent_] : -1,
          .firstChild_ = findLastNonDeletedItem(
            scene, newIndices, h.firstChild_),
          .nextSibling_ = findLastNonDeletedItem(
            scene, newIndices, h.nextSibling_),
          .lastSibling_ = findLastNonDeletedItem(
            scene, newIndices, h.lastSibling_)
        };
      };

  7. The std::transform() algorithm modifies all the nodes in the hierarchy. After fixing node indices, we are ready to actually delete data. Three calls to eraseSelected() throw away the unused hierarchy and transformation items:

      std::transform(scene.hierarchy_.begin(),
        scene.hierarchy_.end(), scene.hierarchy_.begin(),
        nodeMover);
      eraseSelected(scene.hierarchy_, indicesToDelete);
      eraseSelected(scene.localTransform_, indicesToDelete);
      eraseSelected(scene.globalTransform_, indicesToDelete);

  8. Finally, we need to adjust the indices in mesh, material, and name maps. For this, we use the shiftMapIndices() function shown here:

      shiftMapIndices(scene.meshes_, newIndices);
      shiftMapIndices(scene.materialForNode_, newIndices);
      shiftMapIndices(scene.nameForNode_, newIndices);
    }

  9. The search for node replacement used during the node-index shifting is implemented recursively. The findLastNonDeletedItem() function returns a deleted node replacement index:

    int findLastNonDeletedItem(const Scene& scene,
      const std::vector<uint32_t>& newIndices, int node)
    {
      if (node == -1) return -1;
      return (newIndices[node] == -1) ?
        findLastNonDeletedItem(scene, newIndices,
          scene.hierarchy_[node].nextSibling_) :
        newIndices[node];
    }


    If the input is empty, no replacement is necessary. If we have no replacement for the node, we recurse to the next sibling of the deleted node.

  10. The last function replaces the pair::second value in each map's item:

    void shiftMapIndices(
      std::unordered_map<uint32_t, uint32_t>& items,
      const std::vector<uint32_t>& newIndices)
    {
      std::unordered_map<uint32_t, uint32_t> newItems;
      for (const auto& m: items) {
        int newIndex = (int)newIndices[m.first];
        if (newIndex != -1)
          newItems[newIndex] = m.second;
      }
      items = newItems;
    }


The deleteSceneNodes() routine allows us to compress and optimize a scene graph while merging multiple meshes with the same material. Now, we need a method to combine multiple meshes into one and delete scene nodes referring to merged meshes. The merging of mesh data requires only the index-data modification. Let's look at the steps involved:

  1. The mergeScene() function uses two functions, the first one calculating the number of merged indices. We remember the starting vertex offset for all of the meshes. The loop shifts all the indices in individual mesh blocks of the meshData.indexData_ array. Also, for each Mesh object, a new minVtxOffset value is assigned to the vertex-data offset field. The return value is the difference between the original and merged index count. This difference is also the offset to the point where the merged index data starts:

    static uint32_t shiftMeshIndices(MeshData& meshData,
      const std::vector<uint32_t>& meshesToMerge)
    {
      auto minVtxOffset =
        std::numeric_limits<uint32_t>::max();
      for (auto i: meshesToMerge)
        minVtxOffset = std::min(
          meshData.meshes_[i].vertexOffset, minVtxOffset);
      uint32_t mergeCount = 0;
      for (auto i: meshesToMerge) {
        auto& m = meshData.meshes_[i];
        const uint32_t delta = m.vertexOffset - minVtxOffset;
        const auto idxCount = m.getLODIndicesCount(0);
        for (auto ii = 0u; ii < idxCount; ii++)
          meshData.indexData_[m.indexOffset + ii] += delta;
        m.vertexOffset = minVtxOffset;
        mergeCount += idxCount;
      }
      return meshData.indexData_.size() - mergeCount;
    }


  2. The mergeIndexArray() function copies indices for each mesh into the newIndices array:

    static void mergeIndexArray(MeshData& md,
      const std::vector<uint32_t>& meshesToMerge,
      std::map<uint32_t, uint32_t>& oldToNew)
    {
      std::vector<uint32_t> newIndices(md.indexData_.size());
      uint32_t copyOffset = 0;
      uint32_t mergeOffset = shiftMeshIndices(md, meshesToMerge);

    For each mesh, we decide where to copy its index data. The copyOffset value is used for meshes that are not merged, and the mergeOffset value starts at the beginning of the merged index data returned by the shiftMeshIndices() function.

  3. Two variables contain mesh indices of the merged mesh and the copied mesh. We iterate all the meshes to check whether the current one needs to be merged:

      const auto mergedMeshIndex =
        md.meshes_.size() - meshesToMerge.size();
      auto newIndex = 0u;
      for (uint32_t midx = 0; midx < md.meshes_.size(); midx++) {
        const bool shouldMerge = std::binary_search(
          meshesToMerge.begin(), meshesToMerge.end(), midx);

  4. Each index is stored in an old-to-new correspondence map:

        oldToNew[midx] = shouldMerge ? mergedMeshIndex : newIndex;
        newIndex += shouldMerge ? 0 : 1;

  5. The offset of the index block for this mesh is modified, so first calculate the source offset for the index data:

        auto& mesh = md.meshes_[midx];

        auto idxCount = mesh.getLODIndicesCount(0);

        const auto start = md.indexData_.begin() + mesh.indexOffset;

        mesh.indexOffset = copyOffset;

  6. We choose between two offsets and copy index data from the original array to the output. The new index array is copied into the mesh data structure:

        const auto offsetPtr = shouldMerge ? &mergeOffset : &copyOffset;
        std::copy(start, start + idxCount,
          newIndices.begin() + *offsetPtr);
        *offsetPtr += idxCount;
      }
      md.indexData_ = newIndices;

  7. One last step in the merge process is the creation of a merged mesh. Copy the first of the merged mesh descriptors and assign new level-of-detail (LOD) offsets:

      Mesh lastMesh = md.meshes_[meshesToMerge[0]];
      lastMesh.indexOffset = copyOffset;
      lastMesh.lodOffset[0] = copyOffset;
      lastMesh.lodOffset[1] = mergeOffset;
      lastMesh.lodCount = 1;
      md.meshes_.push_back(lastMesh);
    }


The mergeScene() routine omits a couple of important things. First, we merge only the finest LOD level. For our purposes, this is sufficient because our scene contains a large number of simple (one- to two-triangle) meshes with only a single LOD. Second, we assume that the merged meshes have the same transformation. This is also the case for our test scene, but if correct transformations are necessary, all the vertices should be transformed into the global coordinate system and then transformed back into the local coordinates of the node where we place the resulting merged mesh. Let's take a look at the implementation:

  1. To avoid string comparisons, convert material names to their indices in the material name array:

    void mergeScene(Scene& scene, MeshData& meshData,
      const std::string& materialName)
    {
      int oldMaterial = (int)std::distance(
        std::begin(scene.materialNames_),
        std::find(std::begin(scene.materialNames_),
          std::end(scene.materialNames_), materialName));

  2. When you have the material index, collect all the scene nodes that will be deleted:

      std::vector<uint32_t> toDelete;
      for (uint32_t i = 0; i < scene.hierarchy_.size(); i++) {
        if (scene.meshes_.contains(i) &&
            scene.materialForNode_.contains(i) &&
            (scene.materialForNode_.at(i) == oldMaterial))
          toDelete.push_back(i);
      }

  3. The number of meshes to be merged is the same as the number of deleted scene nodes (in our scene, at least), so convert scene-node indices into mesh indices:

      std::vector<uint32_t> meshesToMerge(toDelete.size());
      std::transform(toDelete.begin(), toDelete.end(),
        meshesToMerge.begin(),
        [&scene](auto i) { return scene.meshes_.at(i); });

  4. An essential part of this code merges index data and assigns changed mesh indices to scene nodes:

      std::map<uint32_t, uint32_t> oldToNew;

      mergeIndexArray(meshData, meshesToMerge, oldToNew);

      for (auto& n: scene.meshes_)

        n.second = oldToNew[n.second];

  5. Finally, cut out all the merged meshes and attach a new node containing the merged meshes to the scene graph:

      eraseSelected(meshData.meshes_, meshesToMerge);

      int newNode = addNode(scene, 0, 1);

      scene.meshes_[newNode] = meshData.meshes_.size()-1;

      scene.materialForNode_[newNode] = (uint32_t)oldMaterial;

      deleteSceneNodes(scene, toDelete);
    }


The mergeScene() function is used in the scene-converter tool. Let's jump to the next recipe to learn how to merge multiple meshes in the Lumberyard Bistro scene.

Finalizing the scene-converter tool

While the scene-node deletion routine from the previous recipe is useful for implementing interactive editors, we still need to automatically optimize our Lumberyard Bistro scene geometry. Here, we provide a few helper routines to merge multiple scenes into one. These routines and the code from the previous recipe allow us to complete the scene data-conversion tool that we started in Chapter 7, Graphics Rendering Pipeline.

Getting ready

Make sure you read the previous recipe, Deleting nodes and merging scene graphs.

The source code for this recipe is part of the Chapter7/SceneConverter tool implemented in Chapter 7, Graphics Rendering Pipeline. Start exploring from the mergeBistro() function and follow this recipe's text.

How to do it...

The first routine we need is the merging of multiple meshes into a single contiguous array. Since each MeshData structure contains an array of triangle indices and an interleaved array of vertex attributes, the merging procedure consists of copying input MeshData instances to a single array and modifying index-data offsets. Let's look at the steps:

  1. The mergeMeshData() routine takes a list of MeshData instances and creates a new file header while simultaneously copying all the indices and vertices to the output object:

    MeshFileHeader mergeMeshData(MeshData& m,
      const std::vector<MeshData*> md)
    {
      uint32_t totalVertexDataSize = 0;
      uint32_t totalIndexDataSize  = 0;
      uint32_t offs = 0;
      for (const MeshData* i: md) {
        mergeVectors(m.indexData_, i->indexData_);
        mergeVectors(m.vertexData_, i->vertexData_);
        mergeVectors(m.meshes_, i->meshes_);
        mergeVectors(m.boxes_, i->boxes_);

  2. After merging the index and vertex data along with the auxiliary precalculated bounding boxes, shift each index by the total size of the merged index array:

        for (size_t j = 0; j < i->meshes_.size(); j++)
          m.meshes_[offs + j].indexOffset += totalIndexDataSize;

  3. Each index must also be shifted by the number of vertices already stored in the m.vertexData_ array, which is its current size divided by 8. The "magic" number—8—here is the sum of 3 vertex position components, 3 normal vector components, and 2 texture coordinates:

        uint32_t vtxOffset = totalVertexDataSize / 8;
        for (size_t j = 0; j < i->indexData_.size(); j++)
          m.indexData_[totalIndexDataSize + j] += vtxOffset;

  4. At each iteration, increment global offsets in the mesh, index, and vertex arrays:

        offs += (uint32_t)i->meshes_.size();
        totalIndexDataSize += (uint32_t)i->indexData_.size();
        totalVertexDataSize += (uint32_t)i->vertexData_.size();
      }

  5. The resulting mesh file header contains the total size of index and vertex data arrays:

      return MeshFileHeader {
        .magicValue = 0x12345678,
        .meshCount = offs,
        .dataBlockStartOffset =
          sizeof(MeshFileHeader) + offs * sizeof(Mesh),
        .indexDataSize = totalIndexDataSize * sizeof(uint32_t),
        .vertexDataSize = totalVertexDataSize * sizeof(float)
      };
    }

    The mergeVectors() function is a templated one-liner that appends the second vector, v2, to the end of the first one, v1:

    template<typename T> inline void mergeVectors(
      std::vector<T>& v1, const std::vector<T>& v2)
    {
      v1.insert(v1.end(), v2.begin(), v2.end());
    }


  6. Along with mesh-data merging, you need to create an aggregate material description list from a collection of material lists. The mergeMaterialLists() function creates a single texture filenames list and a material description list with correct texture indices:

    void mergeMaterialLists(
      const std::vector<std::vector<MaterialDescription>*>&
        oldMaterials,
      const std::vector<std::vector<std::string>*>&
        oldTextures,
      std::vector<MaterialDescription>& allMaterials,
      std::vector<std::string>& newTextures)
    {
      std::unordered_map<std::string, int> newTextureNames;
      std::unordered_map<int, int> materialToTextureList;

  7. The merge process starts with the creation of a single list of materials. Each material list index is associated with a texture so that later on, we can figure out in which of the lists the texture appears. Since our beloved C++ does not yet have a canonical Python-style iteration of items while keeping track of the item's index, we declare an index variable manually:

      int midx = 0;
      for (std::vector<MaterialDescription>* ml: oldMaterials) {
        for (const MaterialDescription& m: *ml) {
          allMaterials.push_back(m);
          materialToTextureList[allMaterials.size() - 1] = midx;
        }
        midx++;
      }

  8. The newTextures global texture array contains only unique filenames. Indices of texture files are stored in a map to fix the values in material descriptors below them:

      for (const auto& tl: oldTextures)
        for (const std::string& file: *tl)
          newTextureNames[file] = addUnique(newTextures, file);

  9. The replaceTexture() lambda takes a texture index from a local texture array and assigns a global texture index from the newTextures array:

      auto replaceTexture = [&materialToTextureList,
        &oldTextures, &newTextureNames](
          int m, uint64_t* textureID)
      {
        if (*textureID < INVALID_TEXTURE) {
          auto listIdx = materialToTextureList[m];
          auto texList = oldTextures[listIdx];
          const std::string& texFile = (*texList)[*textureID];
          *textureID = newTextureNames[texFile];
        }
      };



    The final loop goes over all materials and adjusts the texture indices accordingly:

      for (size_t i = 0 ; i < allMaterials.size() ; i++) {

        auto& m = allMaterials[i];

        replaceTexture(i, &m.ambientOcclusionMap_);

        replaceTexture(i, &m.emissiveMap_);

        replaceTexture(i, &m.albedoMap_);

        replaceTexture(i, &m.metallicRoughnessMap_);

        replaceTexture(i, &m.normalMap_);
      }
    }

To merge interior and exterior object lists, we need one more routine that merges multiple scene hierarchies into one large scene graph. The scene data is specified by the hierarchy item array, local and global transforms, mesh, material, and scene-node associative arrays. Just as with mesh index and vertex data, the merge routine boils down to merging individual arrays and then shifting indices in individual scene nodes:

  1. The shiftNodes() routine increments individual fields of the Hierarchy structure by the given amount:

    void shiftNodes(Scene& scene,
      int startOffset, int nodeCount, int shiftAmount)
    {
      auto shiftNode = [shiftAmount](Hierarchy& node) {
        if (node.parent_ > -1)
          node.parent_ += shiftAmount;
        if (node.firstChild_ > -1)
          node.firstChild_ += shiftAmount;
        if (node.nextSibling_ > -1)
          node.nextSibling_ += shiftAmount;
        if (node.lastSibling_ > -1)
          node.lastSibling_ += shiftAmount;
      };
      for (int i = 0; i < nodeCount; i++)
        shiftNode(scene.hierarchy_[i + startOffset]);
    }


  2. The mergeMaps() helper routine adds the otherMap collection to the m output map and shifts item indices by specified amounts:

    using ItemMap = std::unordered_map<uint32_t, uint32_t>;

    void mergeMaps(ItemMap& m, const ItemMap& otherMap,
      int indexOffset, int itemOffset)
    {
      for (const auto& i: otherMap)
        m[i.first + indexOffset] = i.second + itemOffset;
    }


  3. The mergeScenes() routine creates a new scene node named "NewRoot" and adds all the root scene nodes from the list to the new scene as children of the "NewRoot" node. In the accompanying source-code bundle, this routine has two more parameters, mergeMeshes and mergeMaterials, which allow the creation of composite scenes with shared mesh and material data. We omit these non-essential parameters to shorten the description:

    void mergeScenes(Scene& scene,
      const std::vector<Scene*>& scenes,
      const std::vector<glm::mat4>& rootTransforms,
      const std::vector<uint32_t>& meshCounts,
      bool mergeMeshes, bool mergeMaterials)
    {
      scene.hierarchy_ = { {
        .parent_ = -1,
        .firstChild_ = 1,
        .nextSibling_ = -1,
        .lastSibling_ = -1,
        .level_ = 0
      } };

  4. Name and transform arrays initially contain a single element, "NewRoot":

      scene.nameForNode_[0] = 0;
      scene.names_ = { "NewRoot" };
      scene.localTransform_.push_back(glm::mat4(1.f));
      scene.globalTransform_.push_back(glm::mat4(1.f));

      if (scenes.empty()) return;

  5. While iterating the scenes, we merge and shift all the arrays and maps. The next few variables keep track of item counts in the output scene:

      int offs = 1;

      int meshOffs = 0;

      int nameOffs = (int)scene.names_.size();

      int materialOfs = 0;

      auto meshCount = meshCounts.begin();

      if (!mergeMaterials)

        scene.materialNames_ = scenes[0]->materialNames_;

  6. This implementation is not the best possible one, not least because we risk merging all the scene-graph components in a single routine:

      for (const Scene* s: scenes) {
        mergeVectors(scene.localTransform_, s->localTransform_);
        mergeVectors(scene.globalTransform_, s->globalTransform_);
        mergeVectors(scene.hierarchy_, s->hierarchy_);
        mergeVectors(scene.names_, s->names_);
        if (mergeMaterials)
          mergeVectors(scene.materialNames_, s->materialNames_);
        int nodeCount = (int)s->hierarchy_.size();
        shiftNodes(scene, offs, nodeCount, offs);
        mergeMaps(scene.meshes_, s->meshes_,
          offs, mergeMeshes ? meshOffs : 0);
        mergeMaps(scene.materialForNode_, s->materialForNode_,
          offs, mergeMaterials ? materialOfs : 0);
        mergeMaps(scene.nameForNode_, s->nameForNode_,
          offs, nameOffs);

  7. At each iteration, we add the sizes of the current arrays to global offsets:

        offs += nodeCount;
        materialOfs += (int)s->materialNames_.size();
        nameOffs += (int)s->names_.size();
        if (mergeMeshes) {
          meshOffs += *meshCount;
          meshCount++;
        }
      }

  8. Logically, the routine is complete, but there is one more step to perform. Each scene node contains a cached index of the last sibling node, which we have to set for the new root nodes. Each root node can now have a new local transform, which we set in the following loop:

      offs = 1;
      int idx = 0;
      for (const Scene* s: scenes) {
        int nodeCount = (int)s->hierarchy_.size();
        bool isLast = (idx == (int)scenes.size() - 1);
        int next = isLast ? -1 : offs + nodeCount;
        scene.hierarchy_[offs].nextSibling_ = next;
        scene.hierarchy_[offs].parent_ = 0;
        if (!rootTransforms.empty())
          scene.localTransform_[offs] =
            rootTransforms[idx] * scene.localTransform_[offs];
        offs += nodeCount;
        idx++;
      }



  9. At the end of the routine, we should increment all the levels of the scene nodes but leave the "NewRoot" node untouched—hence, +1:

      for (auto i = scene.hierarchy_.begin() + 1;
           i != scene.hierarchy_.end(); i++)
        i->level_++;
    }

Our final addition to the scene-converter tool is a routine that combines interior and exterior objects of the Lumberyard Bistro scene. This routine also merges almost 20,000 leaves and a tree trunk into just three large aggregate meshes. Here are the steps involved:

  1. In the beginning, we load two MeshData instances, two Scene objects, and two MaterialDescription containers. All of this data is produced in the main() function of SceneConverter when it is run with the provided configuration file:

    void mergeBistro() {

      Scene scene1, scene2;

      std::vector<Scene*> scenes = { &scene1, &scene2 };

      MeshData m1, m2;

      auto header1 = loadMeshData(    "data/meshes/test.meshes", m1);

      auto header2 = loadMeshData(    "data/meshes/test2.meshes", m2);

      std::vector<uint32_t> meshCounts =    { header1.meshCount, header2.meshCount };

      loadScene("data/meshes/test.scene", scene1);

      loadScene("data/meshes/test2.scene", scene2);

      Scene scene;

      mergeScenes(scene, scenes, {}, meshCounts);

      MeshData meshData;

      std::vector<MeshData*> meshDatas = { &m1, &m2 };

  2. Once we have loaded all the mesh data, we create an aggregate MeshData object. Material data is also loaded and merged, similar to the mesh data:

      MeshFileHeader header =    mergeMeshData(meshData, meshDatas);

      std::vector<MaterialDescription>    materials1, materials2;

      std::vector<std::string>    textureFiles1, textureFiles2;

      loadMaterials("data/meshes/test.materials",    materials1, textureFiles1);

      loadMaterials("data/meshes/test2.materials",    materials2, textureFiles2);

      std::vector<MaterialDescription> allMaterials;

      std::vector<std::string> allTextures;

  3. A global material list is created with the mergeMaterialLists() function described previously:

      mergeMaterialLists(
        { &materials1, &materials2 },
        { &textureFiles1, &textureFiles2 },
        allMaterials, allTextures);

  4. Our scene contains a leafy tree object in the backyard. Just by inspecting the source mesh files, we can easily find out the names of materials for the meshes to be merged. Green and orange leaves constitute almost two-thirds of the total mesh count in the combined scene, so they are merged into two large meshes. The trunk is almost 1,000 meshes, so we merge it as well:

      mergeScene(scene, meshData,    "Foliage_Linde_Tree_Large_Orange_Leaves");

      mergeScene(scene, meshData,    "Foliage_Linde_Tree_Large_Green_Leaves");

      mergeScene(scene, meshData,    "Foliage_Linde_Tree_Large_Trunk");

  5. Following these modifications, our bounding-box array is broken, so we recalculate the bounding boxes. The saving of optimized mesh, material, and scene-node lists is done in the same way as in the processScene() function described in Chapter 7, Graphics Rendering Pipeline:


      saveMaterials(    "data/meshes/bistro_all.materials",    allMaterials, allTextures);

      saveMeshData(    "data/meshes/bistro_all.meshes", meshData);

      saveScene("data/meshes/bistro_all.scene", scene);
    }


Now, we can consider our SceneConverter tool fully implemented. Let's switch back to the actual rendering topics and see how convenient it is to work with a single scene graph.

There's more...

While we can call the scene converter complete for the purpose of this book, there are many improvements that are still desirable and easy to implement. We recommend adding texture compression as an exercise for our readers.

Implementing lightweight rendering queues in OpenGL

All our previous rendering examples in this book were built with the assumption that an indirect draw call renders the entire collection of loaded meshes using the currently bound shader program. This functionality is sufficient to implement simple rendering techniques, where all the meshes can be treated the same way—for example, we can take the entire scene geometry and render it using a shadow-mapping shader program. Then, we take exactly the same scene geometry and render it entirely using another shader program to apply texture mapping. As we try to build a more complex rendering engine, this approach immediately breaks because different parts of the scene require different treatment. It can be as simple as different materials or as complex as having opaque and transparent surfaces, which may require completely different rendering code paths.

One naive solution to this problem would be to physically separate the actual geometry into different buffers and use these separate datasets to render different subparts of the scene. Sounds better compared to what we have now, right? What if the scene has overlapping geometry subsets—for example, all opaque objects should be rendered in the Z-prepass while some of these opaque objects have a physically based rendering (PBR) shader and others require simple Blinn-Phong shading? Duplicating subsets of objects with specific properties from the original dataset and putting them into distinct GPU buffers would be wasteful in terms of memory. Instead, we can store all the objects with their geometry and materials in one big set of buffers and use multiple OpenGL indirect buffers that specify objects to be rendered in each and every rendering pass. Let's implement this technique and use it in subsequent OpenGL recipes of this chapter.

Getting ready

Before going forward with this recipe, make sure you check out the previous chapter and see how mesh rendering is organized there. The source code of our old OpenGL mesh renderer is in Chapter8/GLMesh8.h.

The source code for this recipe is located in the Chapter9/GLMesh9.h file.

How to do it...

Our structure describing a single draw command is contained in DrawElementsIndirectCommand, which corresponds to a similar structure from OpenGL. Let's take a look once again:

    struct DrawElementsIndirectCommand {
      GLuint count_;
      GLuint instanceCount_;
      GLuint firstIndex_;
      GLuint baseVertex_;
      GLuint baseInstance_;
    };

In the previous chapters, we had a single immutable container of commands that was immediately uploaded into a GL_DRAW_INDIRECT_BUFFER OpenGL buffer. Let's separate indirect buffers from the mesh data. Here are the steps involved:

  1. A separate class would be suitable for this task, holding a container with OpenGL draw commands as well as a buffer. The maxDrawCommands parameter defines the maximum number of commands this indirect buffer can store. It can be inferred from the total number of shapes in our scene data:

    class GLIndirectBuffer final {
    public:
      GLBuffer bufferIndirect_;
      std::vector<DrawElementsIndirectCommand> drawCommands_;

      explicit GLIndirectBuffer(size_t maxDrawCommands)
      : bufferIndirect_(
          sizeof(DrawElementsIndirectCommand) * maxDrawCommands,
          nullptr, GL_DYNAMIC_STORAGE_BIT)
      , drawCommands_(maxDrawCommands)
      {}

      GLuint getHandle() const
      { return bufferIndirect_.getHandle(); }

  2. The indirect buffer can be dynamically updated for convenience. This is handy for our central processing unit (CPU) frustum culling code implemented in the next chapter, Chapter 10, Advanced Rendering Techniques and Optimizations:

      void uploadIndirectBuffer() {
        glNamedBufferSubData(bufferIndirect_.getHandle(), 0,
          sizeof(DrawElementsIndirectCommand) * drawCommands_.size(),
      }


  3. To simplify our work with indirect buffers, let's add one more useful operation. The selectTo() method takes another indirect buffer as an output parameter and populates it with draw commands that satisfy a predicate defined by a pred lambda. This is very handy for situations when we take one indirect buffer containing the entire scene and select draw commands that only draw meshes with specific properties, such as having transparent materials or requiring any other special handling:

      void selectTo(GLIndirectBuffer& buf,
        const std::function<bool(
          const DrawElementsIndirectCommand&)>& pred) {
        buf.drawCommands_.clear();
        for (const auto& c : drawCommands_) {
          if (pred(c))
            buf.drawCommands_.push_back(c);
        }
        buf.uploadIndirectBuffer();
      }

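Stripped of the OpenGL buffer upload, the selection step is just a filtered copy of the command container. Here is a minimal, GL-free sketch of that pattern (`DrawCmd` mirrors the struct above with `GLuint` replaced by `uint32_t`; the transparent-material predicate in the usage below is a made-up example, not from the book's code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <iterator>
#include <vector>

// CPU-side stand-in for DrawElementsIndirectCommand (GLuint -> uint32_t).
struct DrawCmd {
  uint32_t count_, instanceCount_, firstIndex_, baseVertex_, baseInstance_;
};

// The core of selectTo(): copy every command satisfying pred into out,
// minus the glNamedBufferSubData() upload the real class performs.
inline void selectCommands(const std::vector<DrawCmd>& in,
                           std::vector<DrawCmd>& out,
                           const std::function<bool(const DrawCmd&)>& pred) {
  out.clear();
  std::copy_if(in.begin(), in.end(), std::back_inserter(out), pred);
}
```

Passing a predicate that, say, tests a material index stored in `baseInstance_` yields a command list that draws only the matching meshes, while the full scene command list stays untouched.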
This class is very easy to use. Let's see how it works together with our new GLMesh class, defined in the Chapter9/GLMesh9.h file. In this chapter, we will have demos with order-independent transparency, which requires separate handling of opaque and transparent meshes, and a lazy-loading demo that requires a different data type than GLSceneData from Chapter 7, Graphics Rendering Pipeline. Let's see how to implement the new mesh class for this chapter:

  1. The class is now parametrized with the scene data type. It can be the good old GLSceneData type or the new GLSceneDataLazy type, which will be discussed in the next chapter, in the Loading texture assets asynchronously recipe. The data members contain all the necessary buffers, as in the previous chapter, Chapter 8, Image-Based Techniques, plus an instance of GLIndirectBuffer, which contains draw commands to render the entire mesh:

    template <typename GLSceneDataType>
    class GLMesh final {
    public:
      GLuint vao_;
      uint32_t numIndices_;
      GLBuffer bufferIndices_;
      GLBuffer bufferVertices_;
      GLBuffer bufferMaterials_;
      GLBuffer bufferModelMatrices_;
      GLIndirectBuffer bufferIndirect_;

      explicit GLMesh(const GLSceneDataType& data)
        : numIndices_(
            data.header_.indexDataSize / sizeof(uint32_t))
        , bufferIndices_(data.header_.indexDataSize,, 0)
        , bufferVertices_(data.header_.vertexDataSize,, 0)
        , bufferMaterials_(sizeof(MaterialDescription) *
            data.materials_.size(),,
            GL_DYNAMIC_STORAGE_BIT)
        , bufferModelMatrices_(
            sizeof(glm::mat4) * data.shapes_.size(),
            nullptr, GL_DYNAMIC_STORAGE_BIT)
        , bufferIndirect_(data.shapes_.size())
      {
        glCreateVertexArrays(1, &vao_);
        glVertexArrayElementBuffer(
          vao_, bufferIndices_.getHandle());

  2. OpenGL vertex streams' initialization is hardcoded to vec3 vertices, vec3 normals, and vec2 texture coordinates:

        glVertexArrayVertexBuffer(      vao_, 0, bufferVertices_.getHandle(), 0,      sizeof(vec3) + sizeof(vec3) + sizeof(vec2));

        // positions

        glEnableVertexArrayAttrib(vao_, 0);

        glVertexArrayAttribFormat(      vao_, 0, 3, GL_FLOAT, GL_FALSE, 0);

        glVertexArrayAttribBinding(vao_, 0, 0);

        // UVs

        glEnableVertexArrayAttrib(vao_, 1);

        glVertexArrayAttribFormat(      vao_, 1, 2, GL_FLOAT, GL_FALSE, sizeof(vec3));

        glVertexArrayAttribBinding(vao_, 1, 0);

        // normals

        glEnableVertexArrayAttrib(vao_, 2);

        glVertexArrayAttribFormat(      vao_, 2, 3, GL_FLOAT, GL_TRUE,      sizeof(vec3) + sizeof(vec2));

        glVertexArrayAttribBinding(vao_, 2, 0);

        std::vector<mat4> matrices(data.shapes_.size());

    The new constructor is virtually identical to the old one, except for one tricky bit we need to mention. In the previous chapter, our bindless rendering mechanism was built around the idea that we have flat buffers of model-to-world mat4 transformations and materials. Buffers with matrices were indexed inside OpenGL Shading Language (GLSL) shaders using the integer value of gl_InstanceID, which grows monotonically from 0 to the total number of meshes minus 1. Buffers with materials were indexed using values from the gl_BaseInstance built-in variable that comes from DrawElementsIndirectCommand. It was, in turn, initialized in the GLMesh constructor, using an appropriate material index for each mesh. This worked pretty well. Now, we have a slightly different situation. Once we pick a subset of the draw commands to form another GLIndirectBuffer instance, we cannot use gl_InstanceID as an index because the indices are no longer sequential. There is a very neat and simple approach to overcome this limitation.

  3. Let's split the DrawElementsIndirectCommand::baseInstance_ 32-bit member field into two 16-bit parts. One can hold the material index while the other can hold the original index of the mesh. Simple bit-shift arithmetic packs the values, and all GLSL shaders are required to unpack them:

        for (size_t i = 0; i != data.shapes_.size(); i++)
        {
          const uint32_t meshIdx = data.shapes_[i].meshIndex;
          const uint32_t lod = data.shapes_[i].LOD;
          bufferIndirect_.drawCommands_[i] = {
            .count_ = data.meshData_.meshes_[meshIdx].
              getLODIndicesCount(lod),
            .instanceCount_ = 1,
            .firstIndex_ = data.shapes_[i].indexOffset,
            .baseVertex_ = data.shapes_[i].vertexOffset,
            .baseInstance_ = data.shapes_[i].materialIndex
              + (uint32_t(i) << 16)
          };
          matrices[i] = data.scene_.globalTransform_[
            data.shapes_[i].transformIndex];
        }
        bufferIndirect_.uploadIndirectBuffer();
        glNamedBufferSubData(
          bufferModelMatrices_.getHandle(), 0,
          matrices.size() * sizeof(mat4),
      }


  4. The materials uploading code is now implemented as a separate method:

      void updateMaterialsBuffer(
        const GLSceneDataType& data)
      {
        glNamedBufferSubData(bufferMaterials_.getHandle(),
          0, sizeof(MaterialDescription) *
      }

    This is necessary to implement the Loading texture assets asynchronously recipe in the next chapter, where material data is uploaded to the GPU every time a new texture has streamed in from another thread.

  5. The mesh-rendering code is similar to the previous chapter, except that now, we can render only a part of the scene. Instead of using the GLIndirectBuffer instance from this class, we can supply our own one, along with the number of drawing commands we want to invoke:

      void draw(size_t numDrawCommands,
        const GLIndirectBuffer* buffer = nullptr) const
      {
        glBindVertexArray(vao_);
        glBindBufferBase(GL_SHADER_STORAGE_BUFFER,
          kBufferIndex_Materials,
          bufferMaterials_.getHandle());
        glBindBufferBase(GL_SHADER_STORAGE_BUFFER,
          kBufferIndex_ModelMatrices,
          bufferModelMatrices_.getHandle());
        glBindBuffer(GL_DRAW_INDIRECT_BUFFER,
          (buffer ? *buffer : bufferIndirect_).getHandle());
        glMultiDrawElementsIndirect(
          GL_TRIANGLES, GL_UNSIGNED_INT,
          nullptr, (GLsizei)numDrawCommands, 0);
      }

  6. The cleanup code is straightforward, as is explicit removal of the copy constructor of this class:

      ~GLMesh() {
        glDeleteVertexArrays(1, &vao_);
      }

      GLMesh(const GLMesh&) = delete;
      GLMesh(GLMesh&&) = default;
    };


We can now have a single storage system for scene data and partially render it using separate indirect buffers, which can be thought of as rendering queues. We will put this class to work in the next chapter and implement CPU frustum culling for the Bistro scene. Now, let's switch to Vulkan and see how to reorganize our mesh-rendering code there.
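The 16/16 packing of baseInstance_ described above is easy to verify in isolation. A small sketch follows (the helper names are ours, not from the book's code; note that both indices must fit into 16 bits, that is, stay below 65,536, which the Bistro's roughly 27,000 shapes comfortably do):

```cpp
#include <cassert>
#include <cstdint>

// Pack a material index into the low 16 bits of baseInstance_ and the
// original mesh index into the high 16 bits, as in the GLMesh constructor.
inline uint32_t packBaseInstance(uint32_t materialIdx, uint32_t meshIdx) {
  return materialIdx + (meshIdx << 16);
}

// The matching unpacking every GLSL shader has to perform, e.g.
//   uint mtl  = gl_BaseInstance & 0xffff;
//   uint mesh = gl_BaseInstance >> 16;
inline uint32_t unpackMaterialIndex(uint32_t baseInstance) {
  return baseInstance & 0xffffu;
}
inline uint32_t unpackMeshIndex(uint32_t baseInstance) {
  return baseInstance >> 16;
}
```

Round-tripping through these helpers recovers both indices exactly, which is what the vertex shader relies on when the draw commands are no longer sequential.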

There's more...

Here, we used glMultiDrawElementsIndirect() with the number of draw commands supplied from the CPU side, instead of using glMultiDrawElementsIndirectCount(). This might be OK for many situations; however, if we want to implement GPU culling with indirect buffer compaction, the number of draw commands has to be fetched from a GPU buffer. We will leave this as an exercise for our readers.

Working with shape lists in Vulkan

While a scene graph is a useful conceptual representation of a three-dimensional (3D) scene, it is not entirely suitable for GPU processing. We have already dealt with the linearization of a scene graph; now, it's time to prepare another list of renderable items, which we call shapes. This term was coined in the Advanced Scenegraph Rendering Pipeline presentation by Markus Tavenrath and Christoph Kubisch from NVIDIA. This list only contains references to the packed meshes introduced in Chapter 5, Working with Geometry Data, and to the material representation discussed in the Implementing a material system recipe from Chapter 7, Graphics Rendering Pipeline. This improvement will allow us to use our MultiRenderer class with more complex scenes.

Getting ready

The implementation of this recipe describes the MultiRenderer class, which is a modification and an upgrade of MultiMeshRenderer from Chapter 5, Working with Geometry Data. Following our framework structure described in Chapter 7, Graphics Rendering Pipeline, in the Managing Vulkan resources, Unifying descriptor-set creation routines, and Working with rendering passes recipes, we show how to combine the material data, instanced rendering, and hierarchical transforms.

How to do it...

In Chapter 5, Working with Geometry Data, all the mesh rendering was performed using indirect draw commands stored in GPU buffers. The necessary GLSL shaders for indirect rendering with per-object materials support are presented in the Implementing a material system recipe in Chapter 7, Graphics Rendering Pipeline. To separate mesh and material data handling from the Renderer interface, which was described in the Working with rendering passes recipe, we introduce the VKSceneData data container class that stores all meshes, materials, and indirect buffers. A single instance of VKSceneData can be shared between multiple renderers to simplify multipass rendering techniques described in the next chapter:

  1. The constructor of VKSceneData uses our new resources management scheme to load all the scene data. The mesh file contains vertex and index buffers for all geometry in the scene. The details of the format were described in Chapter 5, Working with Geometry Data. The input scene contains the linearized scene graph in the format of our scene-converter tool. A linear list of packed material data is stored in a separate file, which is also written by the SceneConverter tool. Environment and irradiance maps are passed from an external context because they can be shared with other renderers:

    struct VKSceneData {
      VKSceneData(VulkanRenderContext& ctx,
        const char* meshFile,
        const char* sceneFile,
        const char* materialFile,
        VulkanTexture envMap,
        VulkanTexture irradienceMap)
        : ctx(ctx)
        , envMapIrradience_(irradienceMap)
        , envMap_(envMap)
      {


  2. The bidirectional reflectance distribution function (BRDF) lookup table (LUT) required for the PBR shading model is loaded first. After the LUT, we load material data and a complete list of texture files used in all materials:

        brdfLUT_ = ctx.resources.loadKTX(      "data/brdfLUT.ktx");

        std::vector<std::string> textureFiles;

        loadMaterials(      materialFile, materials_, textureFiles);

  3. Here, we might have used std::transform with the parallel execution policy to allow the multithreaded loading of texture data. This would require some locking in the VulkanResources::loadTexture2D() method and might give a considerable speed-up because the majority of the loading code is context-free and should easily run in parallel. However, we have not implemented this approach because we still have to load all the textures right here. A real-world solution would be the deferred loading and asynchronous update of textures as soon as they are loaded:

        std::vector<VulkanTexture> textures;

        for (const auto& f: textureFiles)

            textures.push_back(          ctx.resources.loadTexture2D(f.c_str()));

        allMaterialTextures =      fsTextureArrayAttachment(textures);

  4. Our material data is tightly packed, so after loading it from a file, we create a GPU storage buffer and upload the materials list without any conversions:

        const uint32_t materialsSize = (uint32_t)(      sizeof(MaterialDescription)*materials_.size());

        material_ = ctx.resources.addStorageBuffer(      materialsSize);

        uploadBufferData(ctx.vkDev, material_.memory, 0,, materialsSize);

  5. At the end of initialization, the mesh data and the scene graph are loaded:

        loadMeshes(meshFile);
        loadScene(sceneFile);
      }
  6. The constructor depends on two private helper methods. The loadScene() method in turn uses the global scene loader from the Loading and saving a scene graph recipe from Chapter 7, Graphics Rendering Pipeline. The bulk of the method converts scene nodes with attached meshes to a list of indirect draw structures. For nodes without meshes or materials, no renderable items are generated:

    void loadScene(const char* sceneFile) {
      ::loadScene(sceneFile, scene_);
      for (const auto& c : scene_.meshes_) {
        auto material =
          scene_.materialForNode_.find(c.first);
        if (material == scene_.materialForNode_.end())
          continue;
  7. The shapes list is filled here just as in Chapter 5, Working with Geometry Data, but this time, we also store material indices. No LOD calculation is performed yet, so we set LOD to the 0-th level. In the Implementing transformation trees recipe from Chapter 7, Graphics Rendering Pipeline, we demonstrated how to effectively linearize hierarchical transform calculations. Essentially, the next line binds our scene node's global transform to the GPU-drawable element: a shape. Usage of transformIndex is shown in the following code snippet while discussing the convertGlobalToShapeTransforms() method implementation:

        shapes_.push_back(DrawData {
          .meshIndex = c.second,
          .materialIndex = material->second,
          .LOD = 0,
          .indexOffset = meshes_[c.second].indexOffset,
          .vertexOffset = meshes_[c.second].vertexOffset,
          .transformIndex = c.first
        });
      }
  8. After the shape list has been created, we allocate a GPU buffer for all global transformations and recalculate all these transformations:


      shapeTransforms_.resize(shapes_.size());
      transforms_ = ctx.resources.addStorageBuffer(
        shapes_.size() * sizeof(glm::mat4));
      recalculateAllTransforms();
      uploadGlobalTransforms();
    }
  9. The second helper method uses loadMeshData() from Chapter 5, Working with Geometry Data, to load the scene geometry. After loading, vertices and indices are uploaded into a single buffer. The actual code is slightly more involved because Vulkan requires sub-buffer offsets to be a multiple of the minimum alignment value. The omitted alignment code and vertexData padding are the same as that described in Chapter 5, Working with Geometry Data:

      void loadMeshes(const char* meshFile) {

        std::vector<uint32_t> indexData;

        std::vector<float> vertexData;

        MeshFileHeader header = loadMeshData(      meshFile, meshes_, indexData, vertexData);

        uint32_t idxSize = header.indexDataSize;

        uint32_t vtxSize = header.vertexDataSize;

        auto storage = ctx.resources.addStorageBuffer(      vtxSize + idxSize);

        uploadBufferData(ctx.vkDev, storage.memory, 0,
        uploadBufferData(ctx.vkDev, storage.memory,
      }

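The omitted alignment logic typically boils down to rounding the vertex-data size up to the device's minimum storage-buffer offset alignment, so the index data can start at a legal offset inside the shared buffer. A sketch of such a helper (our own illustration; it assumes the alignment value is a power of two, which VkPhysicalDeviceLimits guarantees):

```cpp
#include <cassert>
#include <cstdint>

// Round size up to the next multiple of align (align must be a power of two),
// e.g. VkPhysicalDeviceLimits::minStorageBufferOffsetAlignment.
inline uint32_t alignedSize(uint32_t size, uint32_t align) {
  return (size + align - 1u) & ~(align - 1u);
}
```

With this helper, the index data would be uploaded at offset `alignedSize(vtxSize, alignment)` instead of the raw `vtxSize`, and the padding bytes between the two regions are simply left unused.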
Before we explain the public interface of the VKSceneData class, let's take a look at the encapsulated data. The references to GPU items, stored in VulkanResources, are cached here for quick access by external classes:

  1. Three textures come first; they are shared by all the rendered shapes to handle PBR lighting calculations. The list of all textures used in materials is also stored here and used externally by MultiRenderer:

      VulkanTexture envMapIrradience_, envMap_, brdfLUT_;

      TextureArrayAttachment allMaterialTextures;

  2. Shared GPU buffers representing per-object materials and node global transformations are exposed to external classes too:

      VulkanBuffer material_, transforms_;

  3. Probably the second largest GPU buffer after the array of textures is the mesh geometry buffer. References to its parts are stored here. An internal reference to the Vulkan context is at the end of the GPU-related fields:

      BufferAttachment indexBuffer_, vertexBuffer_;

      VulkanRenderContext& ctx;

To conclude the GPU part of VKSceneData, it is important to note that the GPU buffers for indirect draw commands and per-frame shape lists are not stored here, because different renderers and additional processors may alter these buffers—for example, a frustum culler may remove some invisible shapes:

  1. Right after the GPU buffers and textures, the VKSceneData class declares local CPU-accessible scene, material, and mesh data arrays:

      Scene scene_;

      std::vector<MaterialDescription> materials_;

      std::vector<Mesh> meshes_;

  2. The final part of the VKSceneData class contains a shapes list and global transformations for each shape. These are stored separately because the list of scene nodes does not map one-to-one to shapes: some nodes are invisible, and some have no attached geometry:

      std::vector<glm::mat4> shapeTransforms_;

      std::vector<DrawData> shapes_;

Let's now implement a couple of methods to handle local and global transformations of scene nodes following our data declarations:

  1. This function fetches current global node transformations and assigns them to the appropriate shapes:

      void convertGlobalToShapeTransforms() {
        size_t i = 0;
        for (const auto& c : shapes_)
          shapeTransforms_[i++] =
            scene_.globalTransform_[c.transformIndex];
      }

  2. The following method recalculates all the global transformations after marking each node as changed:

      void recalculateAllTransforms() {
        markAsChanged(scene_, 0);
        recalculateGlobalTransforms(scene_);
      }

    The markAsChanged() function performs a recursive descent down the scene graph from the root node. Revisit the Implementing transformation trees recipe from Chapter 7, Graphics Rendering Pipeline, for its implementation details.

  3. The last method of VKSceneData is a utility function that fetches global shape transforms from the node transform list and immediately uploads these transforms to the GPU buffer:

      void uploadGlobalTransforms() {
        convertGlobalToShapeTransforms();
        uploadBufferData(ctx.vkDev, transforms_.memory, 0,
      }



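Because shapes reference nodes through transformIndex, the gather in convertGlobalToShapeTransforms() is just an indexed copy. A GLM-free sketch of the same pattern, with plain ints standing in for glm::mat4 matrices (the names here are ours):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ShapeRef { int transformIndex; };

// Per-shape gather of node global transforms: two shapes may share one
// node's transform, and nodes without geometry contribute nothing.
inline std::vector<int> gatherShapeTransforms(
    const std::vector<int>& globalTransform,
    const std::vector<ShapeRef>& shapes) {
  std::vector<int> shapeTransforms(shapes.size());
  size_t i = 0;
  for (const auto& c : shapes)
    shapeTransforms[i++] = globalTransform[c.transformIndex];
  return shapeTransforms;
}
```

The output array is sized by the shape count, not the node count, which is exactly why shapeTransforms_ is kept separately from the scene's node transform list.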
We can now proceed to implement the new scene renderer with material support and the hierarchical scene graph.

With all the loading of scenes, meshes, and materials done by the VKSceneData class, it is straightforward to implement the new MultiRenderer class. Arguably, the hardest part is the initialization, which is made somewhat simpler by our new resource management:

  1. The private section of the MultiRenderer class starts with a reference to the VKSceneData object. The only GPU data we use in this class is a list of indirect drawing commands and the shapes list, one for each image in the swapchain. The last few fields contain local copies of the camera-related matrices and camera-position vector:

    class MultiRenderer: public Renderer {

      VKSceneData& sceneData_;

      std::vector<VulkanBuffer> indirect_;

      std::vector<VulkanBuffer> shape_;

      glm::mat4 proj_, model_, view_;

      glm::vec4 cameraPos_;

  2. The constructor of the class resembles all the renderers in our new framework. It takes as input references to VulkanRenderContext and VKSceneData, the names of the GLSL shaders used for rendering this scene (by default, the shaders described in the Implementing a material system recipe from Chapter 7, Graphics Rendering Pipeline), and a list of output textures for offscreen rendering. Custom render passes may be required for depth-only rendering in shadow mapping or for average lighting calculations:

      MultiRenderer(
        VulkanRenderContext& ctx,
        VKSceneData& sceneData,
        const char* vtxShaderFile = DefaultMeshVertexShader,
        const char* fragShaderFile = DefaultMeshFragmentShader,
        const std::vector<VulkanTexture>& outputs =
          std::vector<VulkanTexture> {},
        RenderPass screenRenderPass =
          { .handle = VK_NULL_HANDLE })
      : Renderer(ctx), sceneData_(sceneData)
      {


  3. As with all the renderers, we first initialize the rendering pass and frame buffer. The shape and indirect buffers both depend on the shape list element count, but have different item sizes:

      const PipelineInfo pInfo = initRenderPass(    PipelineInfo {}, outputs,    screenRenderPass, ctx.screenRenderPass);

      const uint32_t indirectSize =    sceneData_.shapes_.size() *    sizeof(VkDrawIndirectCommand);

      const uint32_t shapesSize =    (uint32_t)sceneData_.shapes_.size() *    sizeof(DrawData);

  4. All the containers with per-frame GPU buffers and descriptor sets are resized to match the number of images in a swapchain:

      const size_t imgCount =
        ctx.vkDev.swapchainImages.size();
      uniforms_.resize(imgCount);
      shape_.resize(imgCount);
      indirect_.resize(imgCount);
      descriptorSets_.resize(imgCount);

  5. The uniform buffer layout is somewhat hardcoded here—an interested reader may change the code to pass this value into a constructor's parameter:

      const uint32_t uniformBufferSize =
        sizeof(mat4) + sizeof(vec4);

  6. The shaders use three predefined textures from the VKSceneData class. All the material-related textures reside in a separate texture array:

      std::vector<TextureAttachment> textures;

  7. The array is filled with push_back, but the real code also checks for textures to have non-zero width:

      textures.push_back(    fsTextureAttachment(sceneData_.envMap_));

      textures.push_back(   fsTextureAttachment(sceneData_.envMapIrradience_));

      textures.push_back(    fsTextureAttachment(sceneData_.brdfLUT_));

      DescriptorSetInfo dsInfo = { .buffers = {
          uniformBufferAttachment(VulkanBuffer {}, 0,
            uniformBufferSize,
            VK_SHADER_STAGE_VERTEX_BIT |
            VK_SHADER_STAGE_FRAGMENT_BIT),
          sceneData_.vertexBuffer_,
          sceneData_.indexBuffer_,
          storageBufferAttachment(VulkanBuffer {}, 0,
            shapesSize,
            VK_SHADER_STAGE_VERTEX_BIT),
          storageBufferAttachment(sceneData_.material_, 0,
            sceneData_.material_.size,
            VK_SHADER_STAGE_FRAGMENT_BIT),
          storageBufferAttachment(
            sceneData_.transforms_, 0,
            sceneData_.transforms_.size,
            VK_SHADER_STAGE_VERTEX_BIT),
        },
        .textures = textures,
        .textureArrays =
          { sceneData_.allMaterialTextures }
      };
  8. After allocating the descriptor-set layout and descriptor pool, we create per-frame indirect and uniform buffers:

      descriptorSetLayout_ = ctx.resources.    addDescriptorSetLayout(dsInfo);

      descriptorPool_ = ctx.resources.    addDescriptorPool(dsInfo, imgCount);

      for (size_t i = 0; i != imgCount; i++) {

        uniforms_[i] =
          ctx.resources.addUniformBuffer(uniformBufferSize);

        indirect_[i] =      ctx.resources.addIndirectBuffer(indirectSize);


        shape_[i] =      ctx.resources.addStorageBuffer(shapesSize);

        uploadBufferData(ctx.vkDev, shape_[i].memory, 0,, shapesSize);

        dsInfo.buffers[0].buffer = uniforms_[i];

        dsInfo.buffers[3].buffer = shape_[i];

        descriptorSets_[i] =      ctx.resources.addDescriptorSet(        descriptorPool_, descriptorSetLayout_);

        ctx.resources.updateDescriptorSet(
          descriptorSets_[i], dsInfo);
      }

  9. The final step in initialization is pipeline creation with the user-specified shader stages:

      initPipeline(
        { vtxShaderFile, fragShaderFile }, pInfo);
    }

  10. The rendering logic in the fillCommandBuffer() method is extremely simple. Believe it or not, the entire loaded scene is rendered with a single indirect draw command executed on the Vulkan graphics queue:

      void fillCommandBuffer(VkCommandBuffer cmdBuffer,
        size_t currentImage,
        VkFramebuffer fb = VK_NULL_HANDLE,
        VkRenderPass rp = VK_NULL_HANDLE) override
      {
        beginRenderPass(
          (rp != VK_NULL_HANDLE) ? rp : renderPass_.handle,
          (fb != VK_NULL_HANDLE) ? fb : framebuffer_,
          cmdBuffer, currentImage);
        vkCmdDrawIndirect(cmdBuffer,
          indirect_[currentImage].buffer,
          0, (uint32_t)sceneData_.shapes_.size(),
          sizeof(VkDrawIndirectCommand));
        vkCmdEndRenderPass(cmdBuffer);
      }

  11. The updateBuffers() overridden method uploads the current camera transformation and world position:

      void updateBuffers(size_t currentImage) override {
        updateUniformBuffer((uint32_t)currentImage, 0,
          sizeof(glm::mat4),
          glm::value_ptr(proj_ * view_ * model_));
        updateUniformBuffer((uint32_t)currentImage,
          sizeof(glm::mat4), 4 * sizeof(float),
          glm::value_ptr(cameraPos_));
      }

The updateIndirectBuffers() method is almost identical to the same method from the MultiMeshRenderer class in Chapter 5, Working with Geometry Data. The only difference is in the usage of the VKSceneData object as a container of the shapes list. Here are the steps involved:

  1. The indirect command buffer is updated using a local memory mapping:

      void updateIndirectBuffers(
        size_t currentImage, bool* visibility = nullptr)
      {
        VkDrawIndirectCommand* data = nullptr;
        vkMapMemory(ctx_.vkDev.device,
          indirect_[currentImage].memory, 0,
          sizeof(VkDrawIndirectCommand) *
          sceneData_.shapes_.size(), 0, (void**)&data);

  2. Each of the shapes in a scene gets its own draw command:

        const uint32_t size =      (uint32_t)sceneData_.shapes_.size();

        for (uint32_t i = 0; i != size; i++) {

          const uint32_t j =        sceneData_.shapes_[i].meshIndex;

          const uint32_t lod = sceneData_.shapes_[i].LOD;

  3. The draw command extracts a vertex count from the LOD information of this shape. If we have CPU-generated visibility information, we may set the instance count to 0. This will be used in the next chapter to implement frustum culling on the CPU:

          data[i] = {        .vertexCount =        sceneData_.meshes_[j].getLODIndicesCount(lod),

            .instanceCount = visibility ?           (visibility[i] ? 1u : 0u) : 1u,

  4. Each rendering command here starts with a 0-th vertex. A brief discussion of how we may use individual draw commands for submeshes can be found in the Doing frustum culling on the GPU with compute shaders recipe from Chapter 10, Advanced Rendering Techniques and Optimizations. The first instance value is set to be the current shape's index, and it is handled in the GLSL shader manually:

            .firstVertex = 0,
            .firstInstance = i
          };
        }
        vkUnmapMemory(ctx_.vkDev.device,
          indirect_[currentImage].memory);
      }


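The fill loop above, minus the vkMapMemory() plumbing, can be exercised on the CPU. A sketch follows (the struct mirrors VkDrawIndirectCommand; `lodIndexCounts` stands in for the per-shape getLODIndicesCount() lookups, and the function name is ours):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CPU mirror of VkDrawIndirectCommand.
struct IndirectCmd {
  uint32_t vertexCount, instanceCount, firstVertex, firstInstance;
};

// One command per shape; invisible shapes keep their command but render
// zero instances, so the GPU skips them without any buffer compaction.
inline std::vector<IndirectCmd> buildIndirectCommands(
    const std::vector<uint32_t>& lodIndexCounts,
    const bool* visibility = nullptr) {
  std::vector<IndirectCmd> cmds(lodIndexCounts.size());
  for (uint32_t i = 0; i != (uint32_t)cmds.size(); i++)
    cmds[i] = { lodIndexCounts[i],
                visibility ? (visibility[i] ? 1u : 0u) : 1u,
                0u,
                i };  // firstInstance carries the shape index for the shader
  return cmds;
}
```

Setting instanceCount to 0 instead of erasing the command keeps the buffer layout stable between frames, which is what makes the CPU frustum culling in the next chapter a simple per-frame rewrite.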
There are a couple of helper methods in the class, and they both deal with a 3D camera:

  1. The setMatrices() method stores the camera matrices for later uploading to the GPU uniform buffer:

      void setMatrices(const glm::mat4& proj,    const glm::mat4& view, const glm::mat4& model) {

        proj_ = proj; view_ = view; model_ = model;
      }


  2. Another one-liner function stores a local copy of the current camera position for lighting calculations:

      void setCameraPosition(const glm::vec3& cameraPos) {

        cameraPos_ = glm::vec4(cameraPos, 1.0f);
      }



The MultiRenderer class is used in all of the demo applications for this chapter. Chapter 8, Image-Based Techniques, explains how the MultiRenderer class fits into the postprocessing pipeline, and in the next chapter, Chapter 10, Advanced Rendering Techniques and Optimizations, we will see how to optimize indirect rendering using CPU and GPU frustum culling.

Adding Bullet physics to a graphics application

Before we conclude the business of this chapter, let's touch on one more topic, which is not a 3D rendering matter but links very closely to our 3D scene implementation. This recipe shows how to animate individual visual objects in our scene graph by using a rigid-body physics simulation library, Bullet.

Getting ready

To compile the Bullet library, we use a custom CMakeLists.txt file, so it is useful to recall the general CMake workflow for third-party libraries covered in Chapter 2, Using Essential Libraries.

The demo application for this recipe can be found in the Chapter9/VK01_Physics folder.

How to do it...

  1. To simulate a collection of rigid bodies, we implement the Physics class, which calls the appropriate libBullet methods to create, manage, and update physical objects:

    struct Physics {
      Physics()
      : collisionDispatcher(&collisionConfiguration)
      , dynamicsWorld(&collisionDispatcher, &broadphase,
          &solver, &collisionConfiguration)
      {

        dynamicsWorld.setGravity(      btVector3( 0.0f, -9.8f, 0.0f ) );

        // add "floor" object - large massless box

        addBox(vec3(100.f, 0.05f, 100.f),
          btQuaternion(0, 0, 0, 1), vec3(0, 0, 0), 0.0f);
      }


      void addBox( const vec3& halfSize,    const btQuaternion& orientation,    const vec3& position, float mass);

      void update(float deltaSeconds);

      std::vector<mat4> boxTransform;


      std::vector<std::unique_ptr<btRigidBody>>
        rigidBodies;

      btDefaultCollisionConfiguration     collisionConfiguration;

      btCollisionDispatcher collisionDispatcher;

      btDbvtBroadphase broadphase;

      btSequentialImpulseConstraintSolver solver;

      btDiscreteDynamicsWorld dynamicsWorld;
    };


    The synchronization point with our rendering framework is the update() method.

  2. First, we call the stepSimulation() method from the Bullet application programming interface (API), which calculates new positions and orientations for each rigid body participating in the simulation:

    void Physics::update(float deltaSeconds)
    {
      dynamicsWorld.stepSimulation(
        deltaSeconds, 10, 0.01f);

  3. The synchronization itself consists of fetching the transformation for each active body and storing that transformation in glm::mat4 format in the boxTransform array. Our rendering application subsequently fetches that array and uploads it into a GPU buffer for rendering:

      for (size_t i = 0; i != rigidBodies.size(); i++) {

        if (!rigidBodies[i]->isActive()) continue;

        btTransform trans;

        rigidBodies[i]->      getMotionState()->getWorldTransform(trans);

        trans.getOpenGLMatrix(      glm::value_ptr(boxTransform[i]));



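Note that getOpenGLMatrix() writes the transform as 16 consecutive floats in OpenGL's column-major order, which is also the in-memory layout of glm::mat4 — that is why glm::value_ptr(boxTransform[i]) can receive the result directly. Here is a minimal standalone sketch of that layout, using a plain float array instead of Bullet or glm types:

```cpp
#include <cassert>
#include <cstddef>

// Build a column-major 4x4 translation matrix, matching the memory
// layout produced by btTransform::getOpenGLMatrix() and stored by
// glm::mat4: element (row r, column c) lives at index c*4 + r,
// so the translation occupies indices 12..14 (the fourth column).
void makeTranslation(float m[16], float x, float y, float z) {
  for (size_t i = 0; i != 16; i++) m[i] = 0.0f;
  m[0] = m[5] = m[10] = m[15] = 1.0f; // identity diagonal
  m[12] = x;                          // fourth column holds
  m[13] = y;                          // the translation vector
  m[14] = z;
}
```

This is only an illustration of the memory layout; in the demo itself, Bullet fills the 16 floats and glm interprets them without any further conversion.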
  4. A helper routine converts between Bullet and glm 3D vector representations:

    static btVector3 Vec3ToBulletVec3( const vec3& v ) {
      return btVector3( v.x, v.y, v.z );
    }
  5. The creation of a single solid box object proceeds as follows. Refer to the Bullet documentation for the details of this, as we are focusing on the rendering part only:

    void Physics::addBox( const vec3& halfSize,  const btQuaternion& orientation,  const vec3& position,  float mass)
    {
      btCollisionShape* collisionShape =
        new btBoxShape( Vec3ToBulletVec3(halfSize) );
      btDefaultMotionState* motionState =
        new btDefaultMotionState(
          btTransform(orientation,
            Vec3ToBulletVec3(position)) );
      btVector3 localInertia(0, 0, 0);
      collisionShape->calculateLocalInertia(
        mass, localInertia );
      btRigidBody::btRigidBodyConstructionInfo
        rigidBodyCI(mass, motionState,
          collisionShape, localInertia);
      rigidBodyCI.m_friction = 0.1f;
      rigidBodyCI.m_rollingFriction = 0.1f;

  6. A new btRigidBody object is created and stored in the rigidBodies array for later use. We register the rigid body in the dynamicsWorld simulation object:

      rigidBodies.emplace_back(
        std::make_unique<btRigidBody>(rigidBodyCI));
      dynamicsWorld.addRigidBody(
        rigidBodies.back().get());
    }
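For a box shape, calculateLocalInertia() evaluates the classic closed-form inertia tensor of a solid cuboid. The following standalone sketch of that formula uses a plain struct rather than Bullet's types and ignores the collision margin Bullet adds to the extents, so Bullet's numbers will differ slightly; note that a zero mass yields zero inertia, which is how the massless "floor" box stays static:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Diagonal inertia tensor of a solid box with the given half-extents:
// I_x = m/12 * (ly^2 + lz^2), and cyclically for I_y, I_z, where
// lx, ly, lz are the full edge lengths (2 * half-extent).
Vec3 boxInertia(const Vec3& halfSize, float mass) {
  const float lx = 2.0f * halfSize.x;
  const float ly = 2.0f * halfSize.y;
  const float lz = 2.0f * halfSize.z;
  const float k = mass / 12.0f;
  return Vec3{ k * (ly * ly + lz * lz),
               k * (lx * lx + lz * lz),
               k * (lx * lx + ly * ly) };
}
```

For example, a unit cube (half-extents of 0.5) with a mass of 6 has an inertia of 1 around each axis, a handy value for sanity-checking a simulation setup.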
    To render rigid-body objects simulated with the Physics class, we convert global transformations from the Physics::boxTransform array into individual node transformations.

Let's look into how the Physics class can be used together with our Vulkan scene-rendering code:

  1. The application class uses MultiRenderer to draw a scene:

    struct MyApp: public CameraApp {
      MyApp()
      : CameraApp(-90, -90)
      , plane(ctx)
      , sceneData(ctx, "data/meshes/cube.meshes",
          "data/meshes/cube.scene",
          "data/meshes/cube.material", {}, {})
      , multiRenderer(ctx, sceneData,
          "data/shaders/chapter09/VK01_Simple.vert",
          "data/shaders/chapter09/VK01_Simple.frag")
      , imgui(ctx) {
        onScreenRenderers.emplace_back(plane, false);
        onScreenRenderers.emplace_back(multiRenderer);
        onScreenRenderers.emplace_back(imgui, false);
      }

  2. The drawUI() method provides a way to add more objects to the scene:

      void drawUI() override {
        ImGui::Begin("Settings", nullptr);
        ImGui::Text("FPS: %.2f", getFPS());
        if (ImGui::Button("Add body"))
          if (physics.boxSizes.size() < maxCubes)
            physics.addBox( ... );
        ImGui::End();
      }

  3. The draw3D() method overrides global transformations for simulated nodes. Since, in this demo, all the objects are simulated, we explicitly update each node's global transform:

      void draw3D() override {
        const mat4 p = getDefaultProjection();
        const mat4 view = camera.getViewMatrix();
        multiRenderer.setMatrices(
          p, view, glm::mat4(1.f));
        multiRenderer.setCameraPosition(
          positioner.getPosition());
        plane.setMatrices(p, view, glm::mat4(1.f));
        sceneData.scene_.globalTransform_[0] =
          glm::mat4(1.f);
        for (size_t i = 1;
             i < physics.boxSizes.size(); i++) {
          sceneData.scene_.globalTransform_[i] =
            physics.boxTransform[i] * glm::scale(
              glm::mat4(1.f), physics.boxSizes[i]);
        }
        for (size_t i = physics.boxSizes.size();
             i <= maxCubes; i++) {
          sceneData.scene_.globalTransform_[i] =
            glm::mat4(1.f);
        }
      }
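The multiplication order in the first loop matters: physics.boxTransform[i] * scale applies the scale in the box's local space before the rigid-body transform, so scaling changes the box's size but never displaces its simulated position. A standalone sketch with plain column-major 4x4 float arrays (stand-ins for glm::mat4, which uses the same layout) demonstrates the difference:

```cpp
#include <cassert>
#include <cstddef>

// Column-major 4x4 matrix: element (row r, col c) lives at c*4 + r.
using Mat4 = float[16];

void identity(Mat4 m) {
  for (size_t i = 0; i != 16; i++)
    m[i] = (i % 5 == 0) ? 1.0f : 0.0f; // indices 0,5,10,15 = diagonal
}

// out = a * b, the same order as glm's operator* on matrices
void mul(Mat4 out, const Mat4 a, const Mat4 b) {
  for (size_t c = 0; c != 4; c++)
    for (size_t r = 0; r != 4; r++) {
      float s = 0.0f;
      for (size_t k = 0; k != 4; k++)
        s += a[k * 4 + r] * b[c * 4 + k];
      out[c * 4 + r] = s;
    }
}
```

With a rigid transform T translating by 5 and a uniform scale S of 2, T * S keeps the translation at 5 while S * T moves the object to 10 — which is why the demo multiplies the physics transform on the left.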
  4. The update() method allows us to call our physics simulator:

      void update(float deltaSeconds) override {
        CameraApp::update(deltaSeconds);
        physics.update(deltaSeconds);
      }
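As a reminder, stepSimulation(deltaSeconds, 10, 0.01f) does not integrate with the raw frame time: it accumulates time and advances the world in fixed 0.01-second increments, at most 10 per call, carrying the remainder over to the next frame. A simplified standalone model of that bookkeeping (Bullet additionally interpolates the leftover time, which is omitted here):

```cpp
#include <cassert>

// Simplified model of Bullet's fixed-timestep loop: returns how many
// fixed substeps a frame of deltaSeconds produces, clamped to
// maxSubSteps; the remainder stays in the accumulator for next frame.
int stepFixed(float& accumulator, float deltaSeconds,
              int maxSubSteps, float fixedTimeStep) {
  accumulator += deltaSeconds;
  int steps = 0;
  while (accumulator >= fixedTimeStep && steps < maxSubSteps) {
    accumulator -= fixedTimeStep;
    ++steps; // here Bullet would integrate the world once
  }
  return steps;
}
```

This is why the simulation stays stable even when the frame rate fluctuates: a long frame simply produces more fixed substeps, up to the clamp of 10.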

  5. The private data contains all the renderers and a Physics instance:


    private:
      InfinitePlaneRenderer plane;
      VKSceneData sceneData;
      MultiRenderer multiRenderer;
      GuiRenderer imgui;
      Physics physics;
    };
    The running physics demo should render an image like this:

Figure 9.2 – Physics simulation using Bullet


Other components can be added to this scene-graph implementation using the described approach, making it extensible enough to cover many real-world situations.

There's more...

Since Bullet provides a set of powerful collision-detection routines, a minor extension to the code may add accelerated mouse-pointer object picking for interactive editing or manipulation of objects in a 3D scene.
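Bullet exposes such picking through btCollisionWorld::rayTest(), which casts a ray from the camera through the mouse pointer and reports the closest hit. The geometric core that such queries accelerate is a ray-versus-bounding-box test; here is a standalone slab-method sketch using plain structs rather than Bullet's API:

```cpp
#include <cassert>
#include <algorithm>

struct Vec3 { float x, y, z; };

// Slab-method ray/AABB intersection: the ray hits the box iff the
// parameter intervals where it lies inside each pair of axis-aligned
// slabs overlap. Zero direction components produce IEEE infinities,
// which keep the comparisons valid for rays starting outside the box.
bool rayHitsAABB(const Vec3& orig, const Vec3& dir,
                 const Vec3& boxMin, const Vec3& boxMax) {
  float tmin = 0.0f, tmax = 1e30f;
  const float o[3]  = { orig.x,  orig.y,  orig.z };
  const float d[3]  = { dir.x,   dir.y,   dir.z };
  const float mn[3] = { boxMin.x, boxMin.y, boxMin.z };
  const float mx[3] = { boxMax.x, boxMax.y, boxMax.z };
  for (int i = 0; i != 3; i++) {
    float t1 = (mn[i] - o[i]) / d[i];
    float t2 = (mx[i] - o[i]) / d[i];
    if (t1 > t2) std::swap(t1, t2);
    tmin = std::max(tmin, t1); // latest entry into a slab
    tmax = std::min(tmax, t2); // earliest exit from a slab
  }
  return tmin <= tmax;
}
```

In a real picker, the ray origin and direction would come from unprojecting the mouse coordinates through the inverse of the view-projection matrix before being handed to rayTest().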
