Chapter 5: Working with Geometry Data

Previously, we tried different ad hoc approaches to store and handle 3D geometry data in our graphical applications. The mesh data layout for vertex and index buffers was hardcoded into each of our demo apps. By doing so, it was easier to focus on other important parts of the graphics pipeline. As we move into the territory of more complex graphics applications, we will require additional control over the storage of different 3D meshes within system memory and GPU buffers. However, our focus remains on guiding you through the main principles and practices rather than on pure efficiency.

In this chapter, you will learn how to store and handle mesh geometry data in a more organized way. We will cover the following recipes:

  • Organizing the storage of mesh data
  • Implementing a geometry conversion tool
  • Indirect rendering in Vulkan
  • Implementing an infinite grid GLSL shader
  • Rendering multiple meshes with OpenGL
  • Generating Levels of Detail (LODs) using MeshOptimizer
  • Integrating tessellation into the OpenGL graphics pipeline

Technical requirements

Here is what it takes to run the code from this chapter on your Linux or Windows PC. You will need a GPU with recent drivers supporting OpenGL 4.6 and Vulkan 1.1. The source code can be downloaded from

To run the demo applications of this chapter, you are advised to download and unpack the entire Amazon Lumberyard Bistro dataset from the McGuire Computer Graphics Archive. You can find this at Of course, you can use smaller meshes if you cannot download the 2.4 GB package.

Organizing the storage of mesh data

In Chapter 3, Getting Started with OpenGL and Vulkan and Chapter 4, Adding User Interaction and Productivity Tools, we used fixed formats for our meshes, which changed between demos and also implicitly included a description of the material; for example, a hardcoded texture was used to provide color information. Let's define a unified mesh storage format that covers all use cases for the remainder of this book.

A triangle mesh is defined by indices and vertices. Each vertex is defined as a set of floating-point attributes. All of the auxiliary physical properties of an object, such as collision detection data, mass, and moments of inertia, can be represented by a mesh. In comparison, other information, such as surface material properties, can be stored outside of the mesh as external metadata.

Getting ready

This recipe describes the basic data structures that we will use to store mesh data for the remainder of this book. The full corresponding source code is located in the shared/scene/VtxData.h header.

How to do it...

A vector of homogeneous vertex attributes stored contiguously is called a vertex stream. Examples of such attributes include vertex positions, texture coordinates, and normal vectors, with each of the three representing one attribute. Each attribute can consist of one or multiple floating-point components. Vertex positions have three components, texture coordinates usually have two components, and so on.

LOD is an index buffer of reduced size that uses existing vertices and, therefore, can be used directly for rendering with the original vertex buffer.

We define a mesh as a collection of all vertex data streams and a collection of all index buffers – one for each LOD. All vertex data streams have the same length, which is called the "vertex count." To keep things simple, we always use 32-bit offsets for our data.

All of the vertex data streams and LOD index buffers are packed into a single blob. This allows us to load data in a single fread() call or even use memory mapping to allow direct data access. This simple vertex data representation also enables us to directly upload the mesh to a GPU. The most interesting aspect is the ability to combine the data for multiple meshes in a single file (or, equivalently, into two large buffers – one for indices and the other for vertex attributes). This will come in very handy later when we learn how to implement a LOD switching technique on GPU.

In this recipe, we will only deal with geometrical data. The LOD creation process is covered in the Generating LODs using MeshOptimizer recipe, and the material data export process is covered in subsequent chapters. Let's get started by declaring the main data structure for our mesh:

  1. First, we need two constants to define the limits on how many LODs and vertex streams we can have in a single mesh:

    constexpr const uint32_t kMaxLODs    = 8;

    constexpr const uint32_t kMaxStreams = 8;

  2. Next, we define an individual mesh description. We deliberately avoid using pointers that hide memory allocations and prohibit the simple saving and loading of data. We store offsets to individual data streams and LOD index buffers. They are equivalent to pointers but are more flexible and, most importantly, GPU-friendlier. All the offsets in the Mesh structure are given relative to the beginning of the data block.
  3. Let's declare our main data structure for the mesh. It contains the number of LODs and vertex data streams. The LOD count, where the original mesh counts as one of the LODs, must be strictly less than kMaxLODs. This is because we do not store LOD index buffer sizes but calculate them from offsets. To calculate these sizes, we store one additional empty LOD level at the end. The number of vertex data streams is stored directly with no modifications:

    struct Mesh final {

      uint32_t lodCount;

      uint32_t streamCount;

  4. We will postpone the question of material data storage until Chapter 7, Graphics Rendering Pipeline. To do this elegantly, let's introduce a level of indirection. The materialID field contains an abstract identifier that allows us to reference any material data that is stored elsewhere:

      uint32_t materialID;

  5. The size of the mesh can serve as a simple substitute for a checksum: it lets us verify that no data has been lost along the way without inspecting the mesh data itself. The meshSize field must be equal to the sum of all LOD index array sizes plus the sum of all individual stream sizes. The vertexCount field contains the total number of vertices in this mesh. This number can be greater than the number of vertices used by any individual LOD:

      uint32_t meshSize;

      uint32_t vertexCount;

  6. Each mesh can potentially be displayed at different LODs. The file contains all the indices for all the LODs, and offsets to the beginning of each LOD are stored in the lodOffset array. This array contains one extra item at the end, which serves as a marker to calculate the size of the last LOD:

      uint32_t lodOffset[kMaxLODs];

  7. Instead of storing the sizes of each LOD, we define a little helper function to calculate their sizes:

      inline uint64_t lodSize(uint32_t lod) {

        return lodOffset[lod+1] - lodOffset[lod];

      }

  8. Just as the lodOffset field contains offsets inside the index buffer where each LOD starts, the streamOffset field stores offsets to all of the individual vertex data streams. Next, we need to specify how each data stream is used. Usage semantics is defined by the stream element size. For example, the vertex-only stream has an element size, which is counted in floats, of 3. The stream with vertices and texture coordinates has an element size of 6, and so on. In the demo from the Rendering multiple meshes with OpenGL recipe, we use a vertex-only format, which sets streamElementSize to 3.

    Important note

    Besides the element size, we might want to store the element type, such as byte, short integer, or float. This information is important for performance reasons in real-world applications. To simplify the code in this book, we will not do it here.

      uint64_t streamOffset[kMaxStreams];

      uint32_t streamElementSize[kMaxStreams];

    };


    For this book, we assume tightly packed (interleaved) vertex attribute streams only. However, it is not difficult to extend the proposed schema to support non-interleaved data storage. One major drawback is that such a data reorganization would require us to change all the vertex-pulling code of the vertex shaders. If you are developing production code, measure which storage format works faster on your target hardware before committing to one particular approach.
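To make the offset bookkeeping concrete, here is a minimal sketch that repeats the Mesh structure and adds two hypothetical helpers, lodIndexCount() and expectedMeshSize(), plus a small makeQuadMesh() example. None of these three functions is part of the book's source code. The sketch assumes that lodOffset values are byte offsets and that streamElementSize holds the per-vertex size in bytes, which is how the conversion tool later in this chapter fills them in:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMaxLODs    = 8;
constexpr uint32_t kMaxStreams = 8;

struct Mesh final {
  uint32_t lodCount;
  uint32_t streamCount;
  uint32_t materialID;
  uint32_t meshSize;
  uint32_t vertexCount;
  uint32_t lodOffset[kMaxLODs];
  uint64_t streamOffset[kMaxStreams];
  uint32_t streamElementSize[kMaxStreams];

  // Size of one LOD in bytes, derived from two neighboring offsets.
  uint64_t lodSize(uint32_t lod) const {
    return lodOffset[lod + 1] - lodOffset[lod];
  }
};

// Hypothetical helper: number of 32-bit indices in the given LOD.
inline uint32_t lodIndexCount(const Mesh& m, uint32_t lod) {
  assert(m.lodCount < kMaxLODs && lod < m.lodCount);
  return static_cast<uint32_t>(m.lodSize(lod) / sizeof(uint32_t));
}

// Hypothetical helper: the value meshSize should have, assuming
// byte-sized stream elements (an assumption of this sketch).
inline uint64_t expectedMeshSize(const Mesh& m) {
  uint64_t size = 0;
  for (uint32_t i = 0; i != m.lodCount; i++)
    size += m.lodSize(i);
  for (uint32_t i = 0; i != m.streamCount; i++)
    size += uint64_t(m.vertexCount) * m.streamElementSize[i];
  return size;
}

// Example data: a quad with 4 position-only vertices and 6 indices.
inline Mesh makeQuadMesh() {
  Mesh m = {};
  m.lodCount     = 1;
  m.streamCount  = 1;
  m.vertexCount  = 4;
  m.lodOffset[0] = 0;
  m.lodOffset[1] = 6 * sizeof(uint32_t);  // end marker of the only LOD
  m.streamOffset[0]      = 0;
  m.streamElementSize[0] = 3 * sizeof(float);
  m.meshSize = static_cast<uint32_t>(expectedMeshSize(m));
  return m;
}
```

A loader could compare expectedMeshSize() against the stored meshSize field as a cheap sanity check before uploading anything to the GPU.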

Our mesh data file begins with a simple header to allow for the rapid fetching of the mesh list. Let's take a look at how it is declared:

  1. To ensure data integrity and to check the validity of the header, a magic hexadecimal value of 0x12345678 is stored in the first 4 bytes of the header:

    struct MeshFileHeader {

      uint32_t magicValue;

  2. The number of different meshes in this file is stored in the meshCount field:

      uint32_t meshCount;

  3. For convenience, we store an offset to the beginning of the mesh data:

      uint32_t dataBlockStartOffset;

  4. The last two member fields store the sizes of index and vertex data in bytes, respectively. These values come in handy when you are checking the integrity of a mesh file:

      uint32_t indexDataSize;

      uint32_t vertexDataSize;

    };

The file continues with the list of Mesh structures. After the header and a list of individual mesh descriptors, we store a large index and vertex data block that can be loaded all at once.
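The layout described above (header, then mesh descriptors, then index and vertex data) implies a few invariants that can be checked before trusting a file. The following sketch shows one possible check. The isHeaderPlausible() function and the kMeshFileMagic constant are hypothetical names, and the size of a single Mesh descriptor is passed in as a parameter to keep the example short:

```cpp
#include <cassert>
#include <cstdint>

struct MeshFileHeader {
  uint32_t magicValue;
  uint32_t meshCount;
  uint32_t dataBlockStartOffset;
  uint32_t indexDataSize;
  uint32_t vertexDataSize;
};

constexpr uint32_t kMeshFileMagic = 0x12345678;

// Hypothetical helper: checks that the header agrees with the file
// layout "header, descriptors, index data, vertex data".
// sizeOfMesh stands for sizeof(Mesh); fileSize is the size on disk.
inline bool isHeaderPlausible(const MeshFileHeader& h,
                              uint64_t sizeOfMesh,
                              uint64_t fileSize) {
  if (h.magicValue != kMeshFileMagic)
    return false;
  // The data block is assumed to start right after the descriptors.
  const uint64_t expectedStart =
    sizeof(MeshFileHeader) + uint64_t(h.meshCount) * sizeOfMesh;
  if (h.dataBlockStartOffset != expectedStart)
    return false;
  const uint64_t expectedSize =
    expectedStart + h.indexDataSize + h.vertexDataSize;
  return fileSize == expectedSize;
}
```

For example, a file with two 120-byte descriptors, 24 bytes of indices, and 48 bytes of vertices should be exactly 20 + 240 + 24 + 48 = 332 bytes long.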

How it works...

Let's go through all of the remaining data structures that are required to store our meshes. To use a mesh file in a rendering application, we need to have an array of mesh descriptions and two arrays with index and vertex data:

  std::vector<Mesh> meshes;

  std::vector<uint8_t> indexData;

  std::vector<uint8_t> vertexData;

The pseudocode for loading such a file is just four fread() calls. They appear as follows:

  1. First, we read the file header with the mesh count. In this book, error checks have been skipped, but they are present in the bundled source code:

      FILE *f = fopen("data/meshes/test.meshes", "rb");

      MeshFileHeader header;

      fread(&header, 1, sizeof(header), f);

  2. Having read the header, we resize the mesh descriptors array and read in all the Mesh descriptions:

      meshes.resize(header.meshCount);

      fread(meshes.data(), sizeof(Mesh), header.meshCount, f);

  3. Then, we read the main geometry data blocks for this mesh, which contain the actual index and vertex data:



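      indexData.resize(header.indexDataSize);

      vertexData.resize(header.vertexDataSize);

      fread(indexData.data(), 1, header.indexDataSize, f);

      fread(vertexData.data(), 1, header.vertexDataSize, f);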
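The steps above skip error handling. As a rough illustration of what a checked version might look like, here is a sketch of the same four reads wrapped in a function that reports failure. The readMeshFile() function and the MeshData aggregate are hypothetical names introduced for this example, and the Mesh structure is reduced to its data fields:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Mesh final {
  uint32_t lodCount, streamCount, materialID, meshSize, vertexCount;
  uint32_t lodOffset[8];
  uint64_t streamOffset[8];
  uint32_t streamElementSize[8];
};

struct MeshFileHeader {
  uint32_t magicValue, meshCount, dataBlockStartOffset;
  uint32_t indexDataSize, vertexDataSize;
};

// Hypothetical container for everything stored in one mesh file.
struct MeshData {
  std::vector<Mesh>    meshes;
  std::vector<uint8_t> indexData;
  std::vector<uint8_t> vertexData;
};

// Hypothetical checked loader: returns false on any short read
// or when the magic value does not match.
inline bool readMeshFile(FILE* f, MeshData& out) {
  MeshFileHeader header = {};
  if (!f || fread(&header, sizeof(header), 1, f) != 1)
    return false;
  if (header.magicValue != 0x12345678)
    return false;

  out.meshes.resize(header.meshCount);
  if (fread(out.meshes.data(), sizeof(Mesh),
        header.meshCount, f) != header.meshCount)
    return false;

  out.indexData.resize(header.indexDataSize);
  out.vertexData.resize(header.vertexDataSize);
  if (fread(out.indexData.data(), 1,
        header.indexDataSize, f) != header.indexDataSize)
    return false;
  return fread(out.vertexData.data(), 1,
           header.vertexDataSize, f) == header.vertexDataSize;
}
```

The caller keeps ownership of the FILE handle, so the function can be used with a regular file opened in "rb" mode or with a temporary file in a unit test.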

Alternatively, index and vertex buffers can be combined into a single large byte buffer. We will leave it as an exercise for the reader.

Later, the indexData and vertexData containers can be uploaded into the GPU directly and accessed as data buffers from shaders to implement programmable vertex pulling, as described in Chapter 2, Using Essential Libraries. We will return to this in later recipes.

There's more...

This geometry data format is pretty straightforward for the purpose of storing static mesh data. If the meshes can be changed, reloaded, or loaded asynchronously, we can store separate meshes into dedicated files.

Since it is impossible to predict all use cases, and since this book is all about rendering and not about creating a general-purpose game engine, it is up to the reader to make decisions about adding extra features such as mesh skinning. One simple example of such a decision is the addition of material data directly inside the mesh file. Technically, all we need to do is add a materialCount field to the MeshFileHeader structure and store a list of material descriptions right after the list of meshes. Even doing such a simple thing immediately raises more questions. Should we pack texture data in the same file? If yes, then how complex should the texture format be? What material model should we use? And so forth. For now, we will just leave the mesh geometry data separated from the material descriptions. We will come back to materials in Chapter 7, Graphics Rendering Pipeline.

Implementing a geometry conversion tool

In the previous chapters, we learned how to use the Assimp library to load and render 3D models stored in different file formats. In real-world graphics applications, the loading of a 3D model can be a tedious and multistage process. Besides just loading, we might want to preprocess a mesh in a specific way, such as optimizing geometry data or computing LODs for meshes. This process might become slow for sizable meshes, so it makes perfect sense to preprocess meshes offline, before an application starts, and load them later in the app, as described in the Organizing the storage of mesh data recipe. Let's learn how to implement a skeleton for a simple offline mesh conversion tool.

Getting ready

The source code for the geometry conversion tool described in this chapter can be found in the Chapter5/MeshConvert folder. The entire project is covered in several recipes, including Implementing a geometry conversion tool and Generating LODs using MeshOptimizer.

How to do it...

Let's examine how the Assimp library is used to export mesh data and save it inside a binary file using the data structures defined in the Organizing the storage of mesh data recipe:

  1. We start by including some mandatory header files:

    #include <vector>

    #include <assimp/scene.h>

    #include <assimp/postprocess.h>

    #include <assimp/cimport.h>

    #include "shared/VtxData.h"

  2. A global Boolean flag determines whether we should output textual messages during the conversion process. This comes in handy for debugging purposes:

    bool verbose = true;

  3. The actual mesh descriptions and mesh geometry data are stored in the following three arrays. We cannot output converted meshes one by one, at least not in a single-pass tool, because we do not know the total size of the data in advance. So, we allocate in-memory storage for all the data and then write these data blobs into the output file:

    std::vector<Mesh> meshes;

    std::vector<uint32_t> indexData;

    std::vector<float> vertexData;

  4. To fill the indexData and vertexData fields, we require two counters to track offsets of index and vertex mesh data inside the file. Two flags control whether we need to export texture coordinates and normal vectors:

    uint32_t indexOffset = 0;

    uint32_t vertexOffset = 0;

    bool exportTextures = false;

    bool exportNormals  = false;

  5. By default, we only export vertex coordinates into the output file; therefore, 3 elements are specified here. To override this, we parse command-line arguments and check whether texture coordinates or normal vectors are also needed:

    uint32_t numElementsToStore = 3;

The main mesh conversion logic of this tool is implemented in the convertAIMesh() function, which takes in an Assimp mesh and converts it into our mesh representation. Let's take a look at how it is implemented:

  1. First, we check whether a set of texture coordinates is present in the original Assimp mesh:

    Mesh convertAIMesh(const aiMesh* m)

    {

      const bool hasTexCoords = m->HasTextureCoords(0);

  2. For this recipe, we assume there is a single LOD and all the vertex data is stored as a continuous data stream. In other words, we have data stored in an interleaved manner. Also, for now, we ignore all the material information and deal exclusively with the index and vertex data:

      const uint32_t numIndices = m->mNumFaces * 3;

      const uint32_t numElements = numElementsToStore;

  3. The size of the stream element in bytes is directly calculated from the number of elements per vertex. Earlier, we agreed to store each component as a floating-point value, so no branching logic is required to do that:

      const uint32_t streamElementSize =
        static_cast<uint32_t>(numElements * sizeof(float));

  4. The total data size for this mesh is the size of the vertex stream plus the size of the index data:

      const uint32_t meshSize = static_cast<uint32_t>(
        m->mNumVertices * streamElementSize +
        numIndices * sizeof(uint32_t));

  5. The mesh descriptor for the aiMesh input object here has its lodCount and streamCount fields set to 1:

      const Mesh result = {

        .lodCount     = 1,

        .streamCount  = 1,

  6. Since we are not yet exporting materials, we set materialID to the default value of zero:

        .materialID   = 0,

  7. The mesh data size and the total vertex count for this mesh are also stored in the mesh description:

        .meshSize     = meshSize,

        .vertexCount  = m->mNumVertices,

  8. Since we have only one LOD, the lodOffset array contains two items. The first one stores the indexOffset counter multiplied by the size of a single index, which gives the byte offset of this mesh's indices in the indexData array. The second one stores the byte offset just past the last index used by this mesh:

        .lodOffset = {
          static_cast<uint32_t>(indexOffset * sizeof(uint32_t)),
          static_cast<uint32_t>(
            (indexOffset + numIndices) * sizeof(uint32_t)) },

        .streamOffset = { vertexOffset * streamElementSize },

        .streamElementSize = { streamElementSize }

      };

  9. For each of the vertices, we extract their data from the aiMesh object and always store vertex coordinates in the vertexData output stream:

      for (size_t i = 0; i != m->mNumVertices; i++) {

        const aiVector3D& v = m->mVertices[i];

        const aiVector3D& n = m->mNormals[i];

        const aiVector3D& t = hasTexCoords ?
          m->mTextureCoords[0][i] : aiVector3D();

        vertexData.push_back(v.x);

        vertexData.push_back(v.y);

        vertexData.push_back(v.z);


  10. If the export of texture coordinates or normal vectors is required, we append them to the vertex stream:

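        if (exportTextures) {

          vertexData.push_back(t.x);

          vertexData.push_back(t.y);

        }

        if (exportNormals) {

          vertexData.push_back(n.x);

          vertexData.push_back(n.y);

          vertexData.push_back(n.z);

        }

      }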

  11. The vertexOffset variable contains the starting vertex index for the current mesh. We add the vertexOffset value to each of the indices imported from the input file:

      for (size_t i = 0; i != m->mNumFaces; i++) {

        const aiFace& F = m->mFaces[i];

        indexData.push_back(F.mIndices[0] + vertexOffset);

        indexData.push_back(F.mIndices[1] + vertexOffset);

        indexData.push_back(F.mIndices[2] + vertexOffset);

      }

  12. After processing the input mesh, we increment offset counters for the indices and current starting vertex:

      indexOffset  += numIndices;

      vertexOffset += m->mNumVertices;

      return result;

    }
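The interplay between the two running counters is easier to see without Assimp in the way. The following sketch appends meshes to shared index and vertex arrays in the same spirit as convertAIMesh(). The Vec3, SimpleMesh, MeshRange, and Converter types are stand-ins invented for this example, and only vertex positions are exported:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };     // stand-in for aiVector3D

struct SimpleMesh {                 // stand-in for aiMesh
  std::vector<Vec3>     positions;
  std::vector<uint32_t> indices;    // local to this mesh
};

// Where one mesh ended up inside the shared arrays.
struct MeshRange {
  uint32_t firstIndex, indexCount;
  uint32_t firstVertex, vertexCount;
};

struct Converter {
  std::vector<uint32_t> indexData;
  std::vector<float>    vertexData;
  uint32_t indexOffset  = 0;  // indices written so far
  uint32_t vertexOffset = 0;  // vertices written so far

  MeshRange append(const SimpleMesh& m) {
    const MeshRange r = {
      indexOffset,  static_cast<uint32_t>(m.indices.size()),
      vertexOffset, static_cast<uint32_t>(m.positions.size()) };

    // Positions go into the interleaved float stream.
    for (const Vec3& v : m.positions) {
      vertexData.push_back(v.x);
      vertexData.push_back(v.y);
      vertexData.push_back(v.z);
    }
    // Indices are rebased so they address the shared vertex array.
    for (uint32_t i : m.indices)
      indexData.push_back(i + vertexOffset);

    indexOffset  += r.indexCount;
    vertexOffset += r.vertexCount;
    return r;
  }
};
```

Appending the same triangle twice produces indices 0, 1, 2 for the first copy and 3, 4, 5 for the second one, because the second copy's vertices start at position 3 in the shared stream.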


Processing the file comprises loading the scene and converting each mesh into an internal format. Let's take a look at the loadFile() function to learn how to do it:

  1. The list of flags for the aiImportFile() function includes options that allow further usage of imported data without any processing. For example, all the transformation hierarchies are flattened and the resulting transformation matrices are applied to mesh vertices:

    bool loadFile(const char* fileName) {

      if (verbose) printf("Loading '%s'... ", fileName);

      const unsigned int flags =
        aiProcess_JoinIdenticalVertices |
        aiProcess_Triangulate |
        aiProcess_GenSmoothNormals |
        aiProcess_PreTransformVertices |
        aiProcess_RemoveRedundantMaterials |
        aiProcess_FindDegenerates |
        aiProcess_FindInvalidData |
        aiProcess_FindInstances |
        aiProcess_OptimizeMeshes;

  2. Just as we did in the previous chapters, we use aiImportFile() to load all the mesh data from a file:

      const aiScene* scene = aiImportFile(fileName, flags);

      if (!scene || !scene->HasMeshes()) {

        printf("Unable to load '%s'\n", fileName);

        return false;

      }

  3. After importing the scene, we reserve space in the mesh descriptor container and call convertAIMesh() for each mesh in the scene:

      meshes.reserve(scene->mNumMeshes);

      for (size_t i = 0; i != scene->mNumMeshes; i++)

        meshes.push_back(convertAIMesh(scene->mMeshes[i]));

      return true;

    }

Saving converted meshes inside our file format is the reverse process of reading meshes from the file described in the Organizing the storage of mesh data recipe:

  1. First, we fill the file header structure using the mesh number and offsets:

    inline void saveMeshesToFile(FILE* f) {

      const MeshFileHeader header = {
        .magicValue = 0x12345678,
        .meshCount = (uint32_t)meshes.size(),
        .dataBlockStartOffset =
          (uint32_t)(sizeof(MeshFileHeader) +
          meshes.size() * sizeof(Mesh)),

  2. We calculate the byte sizes of the index and vertex data buffers:

        .indexDataSize =
          (uint32_t)(indexData.size() * sizeof(uint32_t)),
        .vertexDataSize =
          (uint32_t)(vertexData.size() * sizeof(float))
      };

  3. Once all the sizes are known, we save the header and the list of mesh descriptions:

       fwrite(&header, 1, sizeof(header), f);

       fwrite(meshes.data(), header.meshCount, sizeof(Mesh), f);

  4. After the header and descriptors, two blocks with index and vertex data are stored:

       fwrite(indexData.data(), 1, header.indexDataSize, f);

       fwrite(vertexData.data(), 1, header.vertexDataSize, f);

    }

Let's put all of this code into a functioning mesh converter app:

  1. The converter's main function checks for command-line parameters and calls the loadFile() and saveMeshesToFile() functions. Command-line arguments determine whether we should export texture coordinates and normal vectors:

    int main(int argc, char** argv) {

      // exportTextures and exportNormals are the global flags declared
      // earlier. Redeclaring them here would shadow the globals that
      // convertAIMesh() reads.

  2. The first and second arguments must contain input and output filenames. If the argument count is low, a short instruction is printed:

      if (argc < 3) {

        printf("Usage: meshconvert <input> <output> "
               "[--export-texcoords | -t] "
               "[--export-normals | -n]\n");

        printf("Options:\n");

        printf("\t--export-texcoords | -t: export texture coordinates\n");

        printf("\t--export-normals   | -n: export normals\n");

        exit(255);

      }

  3. The remaining optional command-line arguments specify whether we want to export texture coordinates and normal vectors:


    This sort of manual command-line parsing is tedious and error-prone. It is used for simplicity in this book. In real-world applications, normally, you would use a command-line parsing library. We recommend that you try Argh! from

      for (int i = 3 ; i < argc ; i++) {

        exportTextures |=
          !strcmp(argv[i], "--export-texcoords") ||
          !strcmp(argv[i], "-t");

        exportNormals |=
          !strcmp(argv[i], "--export-normals") ||
          !strcmp(argv[i], "-n");

        const bool exportAll =
          !strcmp(argv[i], "-tn") ||
          !strcmp(argv[i], "-nt");

        exportTextures |= exportAll;

        exportNormals |= exportAll;

      }

  4. Once we know the export parameters, we calculate the amount of per-vertex data counted in floating-point numbers. For example, a mesh with vertex coordinates only stores three floating-point numbers per vertex. Here, we use hardcoded values for simplicity:

      if (exportTextures) numElementsToStore += 2;

      if (exportNormals ) numElementsToStore += 3;

  5. Having determined all the export parameters, we call the loadFile() function to load the scene with all of the meshes:

      if ( !loadFile(argv[1]) ) exit(255);

After loading and converting all of the meshes, we save the output file:

  FILE *f = fopen(argv[2], "wb");

  saveMeshesToFile(f);

  fclose(f);

  return 0;

}
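One way to make the option handling easier to test is to move it out of main() into a function that returns its result by value. The sketch below does this for the flags used by the converter. The ExportOptions structure and the parseExportOptions() function are hypothetical and do not appear in the book's source code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical result of command-line parsing.
struct ExportOptions {
  bool     exportTextures     = false;
  bool     exportNormals      = false;
  uint32_t numElementsToStore = 3;   // x, y, z are always exported
};

// Hypothetical helper: argv[1] and argv[2] are the input and output
// filenames, so the optional flags start at index 3.
inline ExportOptions parseExportOptions(int argc,
                                        const char* const* argv) {
  ExportOptions o;
  for (int i = 3; i < argc; i++) {
    const bool all = !std::strcmp(argv[i], "-tn") ||
                     !std::strcmp(argv[i], "-nt");
    o.exportTextures |= all ||
      !std::strcmp(argv[i], "--export-texcoords") ||
      !std::strcmp(argv[i], "-t");
    o.exportNormals |= all ||
      !std::strcmp(argv[i], "--export-normals") ||
      !std::strcmp(argv[i], "-n");
  }
  // Per-vertex float count: 2 for texture coordinates, 3 for normals.
  if (o.exportTextures) o.numElementsToStore += 2;
  if (o.exportNormals)  o.numElementsToStore += 3;
  return o;
}
```

With this arrangement, the -tn switch yields eight floats per vertex, -t alone yields five, and no switches at all leave the default of three.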


How it works...

To use the mesh conversion tool, let's invoke it to convert one of the Lumberyard Bistro meshes into our mesh format. That can be done with the following command:

Ch5_Tool05_MeshConvert_Release   exterior.obj exterior.mesh -tn

The output mesh is saved inside the exterior.mesh file. Let's go through the rest of this chapter to learn how to render this mesh with Vulkan.

There's more...

The complete source code of the converter can be found in the Chapter5/MeshConvert folder. The final version of the tool contains LOD-generation functionality, which will be discussed later in the Generating LODs using MeshOptimizer recipe.

Indirect rendering in Vulkan

Indirect rendering is the process of issuing drawing commands to the graphics API, where most of the parameters to those commands come from GPU buffers. It is a part of many modern GPU usage paradigms, and it exists in all contemporary rendering APIs in some form. For example, we can do indirect rendering with OpenGL using the glDraw*Indirect*() family of functions. Instead of dealing with OpenGL here, let's get more technical and learn how to combine indirect rendering in Vulkan with the mesh data format that we introduced in the Organizing the storage of mesh data recipe.

Getting ready

Once we have defined the mesh data structures, we also need to render them. To do this, we allocate GPU buffers for the vertex and index data using the previously described functions, upload all the data to GPU, and, finally, fill the command buffers to render these buffers at each frame.

The whole point of the previously defined Mesh data structure is the ability to render multiple meshes in a single Vulkan command. Since version 1.0 of the API, Vulkan supports the technique of indirect rendering. This means we do not need to issue the vkCmdDraw() command for each and every mesh. Instead, we create a GPU buffer and fill it with an array of VkDrawIndirectCommand structures, fill these structures with appropriate offsets into our index and vertex data buffers, and, finally, emit a single vkCmdDrawIndirect() call.

How to do it...

Before we proceed with rendering, let's introduce a data structure to represent an individual mesh instance in our 3D world. We will use it to specify which meshes we want to render, how to transform them, and which material and LOD level should be used:

struct InstanceData {

  float    transform[16];

  uint32_t meshIndex;

  uint32_t materialIndex;

  uint32_t LOD;

  uint32_t indexOffset;

};

Building on the frame composition system from Chapter 4, Adding User Interaction and Productivity Tools, we implement another rendering layer:

  1. The MultiMeshRenderer class constructor takes in the names of the shader files to render the meshes and filenames with input data. To render multiple meshes, we need the instance data for each mesh, the material description, and the mesh geometry itself:

    class MultiMeshRenderer: public RendererBase {

    public:

      MultiMeshRenderer(
        VulkanRenderDevice& vkDev,
        const char* meshFile,
        const char* instanceFile,
        const char* materialFile,
        const char* vtxShaderFile,
        const char* fragShaderFile);

  2. We'll return to the constructor later. For now, let's take a look at the private data part. Here, we have containers to store all of the loaded data:


    private:

      std::vector<InstanceData> instances;

      std::vector<Mesh> meshes;

      std::vector<uint32_t> indexData;

      std::vector<float> vertexData;

  3. The reference to a Vulkan render device is used all over the code:

      VulkanRenderDevice& vkDev;

  4. This renderer contains all the index and vertex data in a single, large GPU buffer. Its contents are loaded from the indexData and vertexData containers mentioned earlier:

      VkBuffer storageBuffer_;

      VkDeviceMemory storageBufferMemory_;

  5. Some buffer sizes are used multiple times after allocation, so they are cached in the following variables:

      uint32_t maxVertexBufferSize_, maxIndexBufferSize_;

      uint32_t maxInstances_;

      uint32_t maxInstanceSize_, maxMaterialSize_;

  6. In this recipe, we do not use any material data, but we declare an empty GPU buffer for it to be used later:

      VkBuffer materialBuffer_;

      VkDeviceMemory materialBufferMemory_;

  7. For each of the swapchain images, we declare a copy of indirect rendering data. Additionally, we declare buffers for the instance data:

      std::vector<VkBuffer> indirectBuffers_;

      std::vector<VkDeviceMemory> indirectBuffersMemory_;

      std::vector<VkBuffer> instanceBuffers_;

      std::vector<VkDeviceMemory> instanceBuffersMemory_;

  8. The routine to create the descriptor set is identical to the routines described in the previous examples. Hopefully, its content can be easily derived from the vertex shader source found at the end of this recipe or in data/shaders/chapter05/VK01.vert:

      bool createDescriptorSet(VulkanRenderDevice& vkDev);

  9. The routines to update the uniform and instance buffers simply redirect to the uploadBufferData() function:

      void updateUniformBuffer(VulkanRenderDevice& vkDev,
        size_t currentImage, const mat4& m)

      {

        uploadBufferData(vkDev,
          uniformBuffersMemory_[currentImage], 0,
          glm::value_ptr(m), sizeof(mat4));

      }

      void updateInstanceBuffer(VulkanRenderDevice& vkDev,
        size_t currentImage,
        uint32_t instanceSize, const void* instanceData)

      {

        uploadBufferData(vkDev,
          instanceBuffersMemory_[currentImage], 0,
          instanceData, instanceSize);

      }

  10. The geometry data is uploaded into the GPU in two parts. Technically, this can be simplified to a single upload operation if the indices and vertices are stored in a single contiguous buffer:

      // Note: vertexCount and indexCount are passed on to
      // uploadBufferData() as data sizes in bytes
      void updateGeometryBuffers(
        VulkanRenderDevice& vkDev,
        uint32_t vertexCount,
        uint32_t indexCount,
        const void* vertices,
        const void* indices)

      {

        uploadBufferData(vkDev, storageBufferMemory_, 0,
          vertices, vertexCount);

        uploadBufferData(vkDev, storageBufferMemory_,
          maxVertexBufferSize_, indices, indexCount);

      }
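The single-upload variant mentioned in this step can be prepared on the CPU by packing both arrays into one byte buffer, with the index region placed after the vertex region. The sketch below shows one way to do it. The PackedGeometry structure and the packGeometry() and alignUp() functions are hypothetical. The alignment parameter is an assumption: a real renderer would typically take it from the device limits so that the index region can be bound at a valid storage buffer offset:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical result: one blob plus the layout needed to bind it.
struct PackedGeometry {
  std::vector<uint8_t> bytes;
  uint32_t indexOffset = 0;  // where the index region starts
  uint32_t indexSize   = 0;  // size of the index region in bytes
};

// Rounds v up to the next multiple of a (a must be non-zero).
inline uint32_t alignUp(uint32_t v, uint32_t a) {
  return (v + a - 1) / a * a;
}

// Hypothetical helper: vertices first, then indices at an aligned
// offset. Padding bytes between the regions are zero-filled.
inline PackedGeometry packGeometry(
  const std::vector<float>& vertices,
  const std::vector<uint32_t>& indices,
  uint32_t alignment)
{
  assert(alignment != 0);
  const uint32_t vertexSize =
    static_cast<uint32_t>(vertices.size() * sizeof(float));

  PackedGeometry p;
  p.indexOffset = alignUp(vertexSize, alignment);
  p.indexSize =
    static_cast<uint32_t>(indices.size() * sizeof(uint32_t));
  p.bytes.resize(size_t(p.indexOffset) + p.indexSize);

  if (!vertices.empty())
    std::memcpy(p.bytes.data(), vertices.data(), vertexSize);
  if (!indices.empty())
    std::memcpy(p.bytes.data() + p.indexOffset,
                indices.data(), p.indexSize);
  return p;
}
```

The resulting bytes array can then be handed to a single upload call, and indexOffset plays the role that maxVertexBufferSize_ plays in the two-call version.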


  11. A buffer required for indirect rendering is filled according to the loaded instances list. After mapping a region of GPU-visible memory into a CPU-accessible pointer, we iterate the instance array. For each loaded mesh instance, we fetch the vertex count and vertex offset within the data buffer:

      void updateIndirectBuffers(
        VulkanRenderDevice& vkDev,
        size_t currentImage)

      {

        VkDrawIndirectCommand* data = nullptr;

        // Map enough memory for one command per instance
        vkMapMemory(vkDev.device,
          indirectBuffersMemory_[currentImage], 0,
          maxInstances_ * sizeof(VkDrawIndirectCommand), 0,
          (void **)&data);

        for (uint32_t i = 0 ; i < maxInstances_ ; i++) {

          const uint32_t j = instances[i].meshIndex;

          data[i] = {
            .vertexCount = static_cast<uint32_t>(
              meshes[j].lodSize(instances[i].LOD) /
              sizeof(uint32_t)),

  12. For this recipe, the instance count is one. However, if we make a few necessary modifications to the instance buffer layout with transformation matrices, we can render multiple instances of a single mesh at once:

            .instanceCount = 1,

  13. The byte offset of the vertex stream has to be converted into a vertex index, which is done by dividing it by the size of a single stream element. As each draw command renders one instance, we set its firstInstance value to i:

            .firstVertex = static_cast<uint32_t>(
              meshes[j].streamOffset[0] /
              meshes[j].streamElementSize[0]),
            .firstInstance = i
          };

        }

        vkUnmapMemory(vkDev.device,
          indirectBuffersMemory_[currentImage]);

      }
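The same computation can be tried out on the CPU without a Vulkan device. In the sketch below, DrawIndirectCommand is a local stand-in that mirrors the four uint32_t fields of VkDrawIndirectCommand, while MeshInfo and Instance keep only the fields needed here. All three types and the buildCommands() function are hypothetical names for this example:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in with the same four fields as VkDrawIndirectCommand.
struct DrawIndirectCommand {
  uint32_t vertexCount;
  uint32_t instanceCount;
  uint32_t firstVertex;
  uint32_t firstInstance;
};

// Subset of the Mesh fields used to fill a draw command.
struct MeshInfo {
  uint32_t lodOffset[8];        // byte offsets into the index data
  uint64_t streamOffset0;       // byte offset of the vertex stream
  uint32_t streamElementSize0;  // bytes per vertex
};

// Subset of the InstanceData fields used here.
struct Instance {
  uint32_t meshIndex;
  uint32_t LOD;
};

// Hypothetical helper: one draw command per instance.
inline std::vector<DrawIndirectCommand> buildCommands(
  const std::vector<MeshInfo>& meshes,
  const std::vector<Instance>& instances)
{
  std::vector<DrawIndirectCommand> cmds;
  cmds.reserve(instances.size());
  for (uint32_t i = 0; i != instances.size(); i++) {
    const MeshInfo& m   = meshes[instances[i].meshIndex];
    const uint32_t  lod = instances[i].LOD;
    cmds.push_back({
      // LOD size in bytes divided by the size of one index
      uint32_t((m.lodOffset[lod + 1] - m.lodOffset[lod]) /
               sizeof(uint32_t)),
      1u,
      // stream byte offset converted into a vertex index
      uint32_t(m.streamOffset0 / m.streamElementSize0),
      i });
  }
  return cmds;
}
```

The returned array has the same layout as the data written through the mapped pointer above, so it could be copied into the indirect buffer as a single block.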


  14. The code to fill the command buffer is extremely simple. Once we have the indirect rendering data inside a GPU buffer, we call the vkCmdDrawIndirect() function:

      virtual void fillCommandBuffer(
        VkCommandBuffer commandBuffer,
        size_t currentImage) override

      {

        beginRenderPass(commandBuffer, currentImage);

        vkCmdDrawIndirect(commandBuffer,
          indirectBuffers_[currentImage],
          0, maxInstances_,
          sizeof(VkDrawIndirectCommand));

        vkCmdEndRenderPass(commandBuffer);

      }



  15. The destructor deallocates all GPU buffers and Vulkan objects:

      virtual ~MultiMeshRenderer() {

        VkDevice device = vkDev.device;

        vkDestroyBuffer(device, storageBuffer_, nullptr);

        vkFreeMemory(      device, storageBufferMemory_, nullptr);

        for (size_t i = 0;         i < swapchainFramebuffers_.size(); i++)


          vkDestroyBuffer(        device, instanceBuffers_[i], nullptr);

          vkFreeMemory(        device, instanceBuffersMemory_[i], nullptr);

          vkDestroyBuffer(        device, indirectBuffers_[i], nullptr);

          vkFreeMemory(        device, indirectBuffersMemory_[i], nullptr);


        vkDestroyBuffer(device, materialBuffer_, nullptr);

        vkFreeMemory(      device, materialBufferMemory_, nullptr);

        destroyVulkanImage(device, depthTexture_);
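The way the indirect draw commands are filled earlier in this recipe can be sketched on the CPU without any Vulkan headers. In the following minimal example, DrawCommand mirrors the four 32-bit fields of VkDrawIndirectCommand, while MeshInfo, InstanceInfo, and buildDrawCommands() are hypothetical, simplified stand-ins introduced only for illustration; they are not the types used by the renderer:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order as VkDrawIndirectCommand (four 32-bit unsigned values).
struct DrawCommand {
  uint32_t vertexCount;
  uint32_t instanceCount;
  uint32_t firstVertex;
  uint32_t firstInstance;
};
static_assert(sizeof(DrawCommand) == 16, "unexpected padding");

// Hypothetical stand-ins for the mesh and instance descriptions.
struct MeshInfo {
  uint32_t lodIndexBytes;     // size of the selected LOD index data, in bytes
  uint32_t vertexOffsetBytes; // byte offset of the first vertex of the mesh
  uint32_t vertexStrideBytes; // size of one vertex element, in bytes
};
struct InstanceInfo { uint32_t meshIndex; };

// Builds one draw command per instance, with a single instance each.
std::vector<DrawCommand> buildDrawCommands(
    const std::vector<MeshInfo>& meshes,
    const std::vector<InstanceInfo>& instances) {
  std::vector<DrawCommand> commands;
  commands.reserve(instances.size());
  for (uint32_t i = 0; i < instances.size(); i++) {
    assert(instances[i].meshIndex < meshes.size());
    const MeshInfo& m = meshes[instances[i].meshIndex];
    assert(m.vertexStrideBytes != 0);
    commands.push_back(DrawCommand{
      m.lodIndexBytes / uint32_t(sizeof(uint32_t)), // one invocation per index
      1,                                            // one instance per command
      m.vertexOffsetBytes / m.vertexStrideBytes,    // bytes -> element count
      i });                                         // visible as gl_BaseInstance
  }
  return commands;
}
```

Calling buildDrawCommands() with two meshes and two instances yields two commands whose firstInstance values are 0 and 1, which is what lets the shader tell the instances apart.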


The longest part of the code is the constructor. To describe the initialization process, we need to define two helper functions:

  1. The first one loads a list of transformations for each mesh:

    void MultiMeshRenderer::loadInstanceData(  const char* instanceFile)


      FILE* f = fopen(instanceFile, "rb");

      fseek(f, 0, SEEK_END);

      size_t fsize = ftell(f);

      fseek(f, 0, SEEK_SET);

    After determining the size of the input file, we should calculate the number of instances in this file:

      maxInstances_ = static_cast<uint32_t>(    fsize / sizeof(InstanceData));


    A single fread() call gets the instance data loading job done:

      instances.resize(maxInstances_);
      if (fread(instances.data(), sizeof(InstanceData), maxInstances_, f) != maxInstances_) {
        printf("Unable to read instance data\n");
        exit(255);
      }





  2. The second helper function loads the mesh data:

    MeshFileHeader MultiMeshRenderer::loadMeshData(  const char* meshFile)


      MeshFileHeader header;

      FILE* f = fopen(meshFile, "rb");

    The loading process is the same as in the pseudocode from the Implementing a geometry conversion tool recipe:

      if (fread(&header, 1, sizeof(header), f) != sizeof(header)) {
        printf("Unable to read mesh file header\n");
        exit(255);
      }




    After reading the file header, we read individual mesh descriptions:

      meshes.resize(header.meshCount);
      if (fread(meshes.data(), sizeof(Mesh), header.meshCount, f) != header.meshCount) {
        printf("Could not read mesh descriptors\n");
        exit(255);
      }



    Two more fread() calls read the mesh indices and vertex data:

      indexData.resize(    header.indexDataSize / sizeof(uint32_t));

      vertexData.resize(    header.vertexDataSize / sizeof(float));

      if ((fread(indexData.data(), 1, header.indexDataSize, f) != header.indexDataSize) ||
          (fread(vertexData.data(), 1, header.vertexDataSize, f) != header.vertexDataSize)) {
        printf("Unable to read index/vertex data\n");
        exit(255);
      }




    To ensure the correct initialization within the constructor, we return the initialized MeshFileHeader object:

      return header;
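The first helper derives the number of records from the file size and then reads them with one fread() call. Here is a minimal, self-contained sketch of that pattern with the error checks spelled out. The Record structure, loadRecords(), and saveRecords() are hypothetical names used only for this example, and the real InstanceData layout may differ:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical fixed-size record used only for this sketch.
struct Record {
  float transform[16];
  uint32_t mesh, material, lod;
};

// Returns the records stored in the file, or an empty vector on failure.
std::vector<Record> loadRecords(const char* path) {
  std::vector<Record> records;
  std::FILE* f = std::fopen(path, "rb");
  if (!f) return records;                 // fopen() can fail, so check first

  std::fseek(f, 0, SEEK_END);
  const long fsize = std::ftell(f);
  std::fseek(f, 0, SEEK_SET);

  const std::size_t count =
      fsize > 0 ? static_cast<std::size_t>(fsize) / sizeof(Record) : 0;
  records.resize(count);                  // allocate storage before reading
  if (count > 0 &&
      std::fread(records.data(), sizeof(Record), count, f) != count)
    records.clear();
  std::fclose(f);
  return records;
}

// Writes records to a file; it exists only to produce input for the sketch.
bool saveRecords(const char* path, const std::vector<Record>& records) {
  std::FILE* f = std::fopen(path, "wb");
  if (!f) return false;
  const std::size_t n =
      std::fwrite(records.data(), sizeof(Record), records.size(), f);
  std::fclose(f);
  return n == records.size();
}
```

Writing three records with saveRecords() and reading them back with loadRecords() should return a vector of three elements, while a missing file produces an empty vector instead of a crash.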


We are ready to describe the initialization procedure of the MultiMeshRenderer class:

  1. Let's take a look at the constructor again:

    MultiMeshRenderer::MultiMeshRenderer(  VulkanRenderDevice& vkDev,  const char* meshFile,  const char* instanceFile,  const char* materialFile,  const char* vtxShaderFile,  const char* fragShaderFile)

    : RendererBase(vkDev, VulkanImage()), vkDev(vkDev)


    In the same way as our other renderers, we create a render pass object:

      if (!createColorAndDepthRenderPass(vkDev, false,        &renderPass_, RenderPassCreateInfo()))


        printf("Failed to create render pass ");



  2. As always, we start the initialization by saving the framebuffer dimensions:

      framebufferWidth_ = vkDev.framebufferWidth;

      framebufferHeight_ = vkDev.framebufferHeight;

  3. To ensure correct 3D rendering, we allocate a depth buffer:

      createDepthResources(vkDev, framebufferWidth_,    framebufferHeight_, depthTexture_);

  4. To proceed with the GPU buffer allocation, we need to read the mesh and instance data:


      MeshFileHeader header = loadMeshData(meshFile);
      loadInstanceData(instanceFile);

      const uint32_t indirectDataSize =    maxInstances_ * sizeof(VkDrawIndirectCommand);

      maxInstanceSize_ =    maxInstances_ * sizeof(InstanceData);

      maxMaterialSize_ = 1024;

  5. We require a copy of the instance and indirect rendering data for each of the swapchain images:

      instanceBuffers_.resize(    vkDev.swapchainImages.size());

      instanceBuffersMemory_.resize(    vkDev.swapchainImages.size());

      indirectBuffers_.resize(    vkDev.swapchainImages.size());

      indirectBuffersMemory_.resize(    vkDev.swapchainImages.size());

    For this recipe, we do not need materials or textures. So, we will just allocate the buffer for the material data and avoid using it for now:

      if (!createBuffer(vkDev.device, vkDev.physicalDevice, maxMaterialSize_,        VK_BUFFER_USAGE_STORAGE_BUFFER_BIT,        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |        VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,        materialBuffer_, materialBufferMemory_))


        printf("Cannot create material buffer ");



  6. To allocate a descriptor set, we need to save the sizes of the index and vertex buffers:

      maxVertexBufferSize_ = header.vertexDataSize;

      maxIndexBufferSize_ = header.indexDataSize;

In the previous chapters, we were lucky that the size of our vertex data was a multiple of 16 bytes. Now, we want to store arbitrary arrays of mesh vertices and face indices, so this forces us to support arbitrary offsets of GPU sub-buffers. Our descriptor set for MultiMeshRenderer has two logical storage buffers for index and vertex data. In the following snippet, we pad the vertex data with zeros so that its size has the necessary alignment properties:

  1. We fetch the device properties structure to find out the minimal storage buffer alignment value:

      VkPhysicalDeviceProperties devProps;

      vkGetPhysicalDeviceProperties(    vkDev.physicalDevice, &devProps);

      const uint32_t offsetAlignment =    devProps.limits.minStorageBufferOffsetAlignment;

  2. After that, if the vertex data size does not meet the alignment requirements, we add the necessary zeros to the end of the vertex data array:

      if ((maxVertexBufferSize_&(offsetAlignment-1)) != 0)


        int floats = (offsetAlignment -      (maxVertexBufferSize_&(offsetAlignment-1))) /      sizeof(float);

        for (int ii = 0; ii < floats; ii++)
          vertexData.push_back(0.0f);


  3. We update the vertex buffer size for the buffer allocation as follows:

        maxVertexBufferSize_ =      (maxVertexBufferSize_+offsetAlignment) &      ~(offsetAlignment - 1);


  4. Once we have calculated the index and vertex buffer size, we allocate the GPU buffer itself:

      if (!createBuffer(vkDev.device,        vkDev.physicalDevice,        maxVertexBufferSize_ + maxIndexBufferSize_,        VK_BUFFER_USAGE_STORAGE_BUFFER_BIT,        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |        VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,        storageBuffer_, storageBufferMemory_))


        printf("Cannot create vertex/index buffer ");



  5. In this recipe, we only update the geometry data once, during the initialization stage:

      updateGeometryBuffers(vkDev, header.vertexDataSize, header.indexDataSize, vertexData.data(), indexData.data());

  6. The remaining GPU buffers are allocated similarly to the uniform buffers in the previous chapters. One swapchain image corresponds to one instance buffer or indirect draw data buffer:

      for (size_t i = 0; i < vkDev.swapchainImages.size();       i++)


        if (!createBuffer(vkDev.device,      vkDev.physicalDevice, indirectDataSize,      VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT,

  7. For the sake of debugging and code simplification, we allocate indirect draw data buffers as host-visible. If the instances are not changed by the CPU, for example, for static geometry, such buffers could be allocated on the GPU for better performance. However, this requires you to allocate a staging buffer. An example of staging buffer allocation can be found in Chapter 3, Getting Started with OpenGL and Vulkan, where we dealt with texture images in the Using texture data in Vulkan recipe:

          VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |      VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,      indirectBuffers_[i], indirectBuffersMemory_[i]))


            printf("Cannot create indirect buffer ");



  8. Upon allocation, we fill the indirect draw buffers with the required values:

        updateIndirectBuffers(vkDev, i);

    In the demo application code snippet at the end of this recipe, we do not update this buffer during runtime. However, it might be necessary to do so if we want to set the LOD for the meshes when the camera position changes.

  9. In this example, the instance data also stays immutable after initialization:

        if (!createBuffer(vkDev.device, vkDev.physicalDevice, maxInstanceSize_,      VK_BUFFER_USAGE_STORAGE_BUFFER_BIT,      VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |      VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,      instanceBuffers_[i], instanceBuffersMemory_[i]))


          printf("Cannot create instance buffer ");



  10. After allocation, we upload local instance data to the GPU buffer:

        updateInstanceBuffer(vkDev, i, maxInstanceSize_, instances.data());
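The padding arithmetic from the alignment steps above can be studied in isolation. The following sketch assumes, as the bit masks in the recipe do, that the alignment value is a power of two. The alignUp() and padVertexData() helpers are hypothetical names introduced for this example:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds size up to the next multiple of alignment (a power of two).
uint32_t alignUp(uint32_t size, uint32_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

// Appends zero floats until the byte size of the array is aligned.
// Returns the padded size in bytes.
uint32_t padVertexData(std::vector<float>& vertexData, uint32_t alignment) {
  assert(alignment >= sizeof(float));
  const uint32_t sizeBytes = uint32_t(vertexData.size() * sizeof(float));
  const uint32_t padded = alignUp(sizeBytes, alignment);
  vertexData.resize(padded / sizeof(float), 0.0f);
  return padded;
}
```

Adding alignment - 1 before masking leaves already aligned sizes unchanged, so the helper needs no separate check for the aligned case. For example, 100 bytes of vertex data with a 64-byte alignment grow to 128 bytes, that is, seven extra zero floats.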


This completes the description of our initialization process. Now, let's turn to the shader source code:

  1. The data/shaders/chapter05/VK01.vert vertex shader uses several buffers to calculate the onscreen vertex position. Each GPU buffer used in rendering is structured. We only use meshes with vertex coordinates. The InstanceData buffer contains a transformation matrix, a mesh ID, a material ID, and a LOD value:

    #version 460

    layout(location = 0) out vec3 uvw;

    struct ImDrawVert {
      float x, y, z;
    };

    struct InstanceData {
      mat4 xfrm;
      uint mesh;
      uint matID;
      uint lod;
    };


  2. Material data has not yet been used in this chapter. So, for the sake of brevity, we will just assume there is a single texture index:

    struct MaterialData {
      uint tex2D;
    };


  3. All the buffer indices correspond to the ones used in the descriptor set layout creation code in C++ mentioned earlier:

    layout(binding = 0) uniform  UniformBuffer { mat4 inMtx; } ubo;

    layout(binding = 1) readonly buffer SBO {ImDrawVert data[];} sbo;

    layout(binding = 2) readonly buffer IBO {uint data[];} ibo;

    layout(binding = 3) readonly buffer InstBO {InstanceData data[];} instanceBuffer;

  4. The first four instances of these meshes are colored using the values from the following array:

    vec3 colors[4] = vec3[](  vec3(1.0, 0.0, 0.0),  vec3(0.0, 1.0, 0.0),  vec3(0.0, 0.0, 1.0),  vec3(0.0, 0.0, 0.0));

  5. In the shader program, we use a programmable vertex-pulling technique to read the vertex coordinates. Here, we fetch indices manually to simplify our C++ implementation. While this approach results in shorter code, it defeats any hardware vertex reuse and should not be used in most real-world applications:

    void main() {

      uint idx = ibo.data[gl_VertexIndex];

      ImDrawVert v = sbo.data[idx];

  6. Then, we use the gl_BaseInstance counter to determine where we should fetch the output color from:

      uvw = (gl_BaseInstance >= 4) ?    vec3(1,1,0): colors[gl_BaseInstance];

  7. The transformation of each instance is stored in the xfrm field of the InstanceData structure. We fetch the matrix that belongs to the current instance and transpose it into a local mat4 variable before using it:

      mat4 xfrm = transpose(instanceBuffer.data[gl_BaseInstance].xfrm);

  8. Last but not least, we apply the view-projection matrix from the uniform buffer and the xfrm instance model-to-world transform to the input vertex coordinates:

      gl_Position = ubo.inMtx * xfrm * vec4(v.x, v.y, v.z, 1.0);
    }
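To make the two-step fetch of programmable vertex pulling easier to follow, here is a small CPU-side analogue written in C++. It is only a sketch: Vec3, pullVertex(), and instanceColor() are hypothetical names, and std::vector objects play the role of the storage buffers:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// CPU-side analogue of the fetch chain used by the shader:
// vertex index -> index buffer -> vertex buffer.
Vec3 pullVertex(const std::vector<uint32_t>& indexBuffer,
                const std::vector<Vec3>& vertexBuffer,
                uint32_t vertexIndex) {
  assert(vertexIndex < indexBuffer.size());
  const uint32_t idx = indexBuffer[vertexIndex];
  assert(idx < vertexBuffer.size());
  return vertexBuffer[idx];
}

// Same selection rule as the shader: a palette for the first four
// instances and yellow for all the others.
Vec3 instanceColor(uint32_t baseInstance) {
  static const Vec3 colors[4] = {
    {1, 0, 0}, {0, 1, 0}, {0, 0, 1}, {0, 0, 0} };
  return baseInstance >= 4 ? Vec3{1, 1, 0} : colors[baseInstance];
}
```

With a quad described by four vertices and the indices 0, 1, 2, 2, 3, 0, the fifth invocation (vertex index 4) resolves to vertex 3. Every invocation repeats the index lookup, which illustrates why this scheme bypasses the hardware reuse of already processed vertices.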


The data/shaders/chapter05/VK01.frag fragment shader simply outputs the color passed in the uvw variable from the vertex shader. In the subsequent Chapter 7, Graphics Rendering Pipeline, we will use the material information buffer and read material parameters from there. For now, a solid color is enough to run our multi-mesh rendering code:

#version 460

layout(location = 0) in vec3 uvw;

layout(location = 0) out vec4 outColor;

void main()
{
  outColor = vec4(uvw, 0.5);
}


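The vkCmdDrawIndirect() function is part of the core Vulkan API. However, issuing more than one draw per call and using nonzero firstInstance values rely on the optional multiDrawIndirect and drawIndirectFirstInstance device features, which must be explicitly enabled during the Vulkan render device initialization phase: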

  1. The initVulkan() routine is very similar to the initialization routines that we have already implemented. Mandatory instance creation and GLFW surface initialization happens before the creation of a Vulkan device:

    void initVulkan()



      if (!setupDebugCallbacks(      vk.instance, &vk.messenger, &vk.reportCallback))




      if (glfwCreateWindowSurface(      vk.instance, window, nullptr, &vk.surface))




  2. In the initVulkanRenderDevice() function, the deviceFeatures parameter contains two flags that allow us to use indirect rendering and instance offsets:

      if (!initVulkanRenderDevice(vk, vkDev, kScreenWidth,        kScreenHeight,        isDeviceSuitable,        { .multiDrawIndirect = VK_TRUE,          .drawIndirectFirstInstance = VK_TRUE }))




  3. Along with the multi-mesh renderer, we also initialize two auxiliary objects to clear the frame buffer and finalize the rendering pass:

      clear = std::make_unique<VulkanClear>(    vkDev, VulkanImage());

      finish = std::make_unique<VulkanFinish>(    vkDev, VulkanImage());

  4. The MultiMeshRenderer class is initialized using the mesh file, the instance list file, an empty string for the material file's name, and names of two shaders which render all the meshes:

      multiRenderer = std::make_unique<MultiMeshRenderer>(    vkDev, "data/meshes/test.cubes",    "data/meshes/test.grid", "",    "data/shaders/chapter05/VK01.vert",    "data/shaders/chapter05/VK01.frag");


  5. To save space, we do not comment on the entire drawOverlay() routine, which is invoked every frame. The two new things we do in drawOverlay() are a call to updateUniformBuffer() to update the uniform buffer with the new model-view-projection matrix, mtx, and filling a Vulkan command buffer:

      multiRenderer->updateUniformBuffer(    vkDev, imageIndex, mtx);

      multiRenderer->fillCommandBuffer(    commandBuffer, imageIndex);

There's more...

It might be challenging to write a modern Vulkan renderer from scratch. For those who are interested, we would like to recommend an open source project by Arseny Kapoulkine, which tries to achieve exactly that. Many advanced Vulkan topics are covered in his YouTube streaming sessions.

Implementing an infinite grid GLSL shader

In the previous recipes of this chapter, we learned how to organize geometry storage in a more systematic way. To debug our applications, it is useful to have a visible representation of the coordinate system so that a viewer can quickly infer the camera orientation and position just by looking at a rendered image. A natural way to represent a coordinate system in an image is to render an infinite grid where the grid plane is aligned with one of the coordinate planes. Let's learn how to implement a decent-looking grid in GLSL.

Getting ready

The full C++ source code for this recipe can be found in Chapter5/GL01_Grid. The corresponding GLSL shaders are located in the data/shaders/chapter05/GL01_grid.frag and data/shaders/chapter05/GL01_grid.vert files.

How to do it...

To parametrize our grid, we should introduce some parameters. They can be found and tweaked in the data/shaders/chapter05/GridParameters.h GLSL include file:

  1. First of all, we need to define the size of our grid extents in the world coordinates, that is, how far from the camera the grid will be visible:

    float gridSize = 100.0;

  2. The size of one grid cell is specified in the same units as the grid size:

    float gridCellSize = 0.025;

  3. Let's define the colors of the grid lines. We will use two different colors: one for regular thin lines and the other for thick lines, which are rendered every tenth line. Since we render everything against a white background, we are good with black and 50% gray:

    vec4 gridColorThin = vec4(0.5, 0.5, 0.5, 1.0);

    vec4 gridColorThick = vec4(0.0, 0.0, 0.0, 1.0);

  4. Our grid implementation will change the number of rendered lines based on the grid LOD. We will switch the LOD when the number of pixels between two adjacent cell lines drops below this value:

    const float gridMinPixelsBetweenCells = 2.0;

  5. Let's take a look at a simple vertex shader that we can use to generate and transform grid vertices. It takes no input except the gl_VertexID parameter and scales the [-1..+1] rectangle by grid size:

    layout (location=0) out vec2 uv;

    const vec3 pos[4] = vec3[4](  vec3(-1.0, 0.0, -1.0),  vec3( 1.0, 0.0, -1.0),  vec3( 1.0, 0.0,  1.0),  vec3(-1.0, 0.0,  1.0));

    const int indices[6] = int[6](0, 1, 2, 2, 3, 0);

  6. An additional custom output that it produces is the XZ world coordinates of the vertex inside the uv parameter:

    void main() {

      vec3 vpos = pos[indices[gl_VertexID]]* gridSize;

      gl_Position = proj * view * vec4(vpos, 1.0);

      uv = vpos.xz;
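The table lookup performed by this vertex shader can be replayed on the CPU to see which corner each of the six invocations produces. The following C++ sketch copies the two constant tables from the shader; the Vec3 type and the gridVertex() function are hypothetical names used only here:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Four corners of a quad in the XZ plane and six indices that form two
// triangles, scaled by the grid size as in the shader.
Vec3 gridVertex(int vertexID, float gridSize) {
  static const Vec3 pos[4] = {
    {-1.0f, 0.0f, -1.0f}, { 1.0f, 0.0f, -1.0f},
    { 1.0f, 0.0f,  1.0f}, {-1.0f, 0.0f,  1.0f} };
  static const int indices[6] = { 0, 1, 2, 2, 3, 0 };
  assert(vertexID >= 0 && vertexID < 6);
  const Vec3& p = pos[indices[vertexID]];
  return Vec3{ p.x * gridSize, p.y * gridSize, p.z * gridSize };
}
```

For a grid size of 100, vertex IDs 0 and 5 both map to the corner (-100, 0, -100), so the two triangles share an edge and cover the whole quad without a vertex buffer.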


The fragment shader is somewhat more complex. It will calculate a procedural texture that looks like a grid. The grid lines are rendered based on how fast the uv coordinates change in the image space to avoid the Moiré pattern. Therefore, we are going to need screen space derivatives:

  1. First, we introduce a bunch of GLSL helper functions to aid our calculations:

    float log10(float x) {
      return log(x) / log(10.0);
    }

    float satf(float x) {
      return clamp(x, 0.0, 1.0);
    }

    vec2 satv(vec2 x) {
      return clamp(x, vec2(0.0), vec2(1.0));
    }

    float max2(vec2 v) {
      return max(v.x, v.y);
    }


  2. Let's take a look at the main() function and start by calculating the screen space length of the derivatives of the uv coordinates that we previously generated in the vertex shader. We will use the built-in dFdx() and dFdy() functions to calculate the required derivatives:

    vec2 dudv = vec2(
      length(vec2(dFdx(uv.x), dFdy(uv.x))),
      length(vec2(dFdx(uv.y), dFdy(uv.y)))
    );


  3. By knowing the derivatives, the current LOD of our grid can be calculated in the following way. The gridMinPixelsBetweenCells value controls how fast we want our LOD level to increase. In this case, it is the minimum number of pixels between two adjacent cell lines of the grid:

    float lodLevel = max(0.0, log10((length(dudv) *  gridMinPixelsBetweenCells) / gridCellSize) + 1.0);

    float lodFade = fract(lodLevel);

    Besides the LOD value itself, we need a fading factor to render smooth transitions between adjacent levels. It is obtained by taking the fractional part of the floating-point LOD level. A base-10 logarithm is used so that the cells of each successive LOD are 10 times larger than those of the previous one.

  4. The LOD levels are blended between each other. To render them, we have to calculate the cell size for each LOD. Here, instead of calculating pow() three times, which is purely done for the sake of explanation, we can calculate it for lod0 only, and multiply each subsequent LOD cell size by 10.0:

    float lod0 =  gridCellSize * pow(10.0, floor(lodLevel+0));

    float lod1 =  gridCellSize * pow(10.0, floor(lodLevel+1));

    float lod2 =  gridCellSize * pow(10.0, floor(lodLevel+2));

  5. To be able to draw antialiased lines using alpha transparency, we need to increase the screen coverage of our lines. Let's make sure each line covers up to 4 pixels:

    dudv *= 4.0;

  6. Now we should get a coverage alpha value that corresponds to each calculated LOD level of the grid. To do that, we calculate the absolute distances to the cell line centers for each LOD and pick the maximum coordinate:

    float lod0a = max2(vec2(1.0) - abs(satv(mod(uv, lod0) / dudv) * 2.0 - vec2(1.0)));

    float lod1a = max2(vec2(1.0) - abs(satv(mod(uv, lod1) / dudv) * 2.0 - vec2(1.0)));

    float lod2a = max2(vec2(1.0) - abs(satv(mod(uv, lod2) / dudv) * 2.0 - vec2(1.0)));

  7. Nonzero alpha values represent non-empty transition areas of the grid. Let's blend between them using two colors to handle the LOD transitions:

    vec4 c = lod2a > 0.0 ? gridColorThick : lod1a > 0.0 ?  mix(gridColorThick, gridColorThin, lodFade) :  gridColorThin;

  8. Last but not least, make the grid disappear when it is far away from the camera. Use the gridSize value to calculate the opacity falloff:

    float opacityFalloff =  (1.0 - satf(length(uv) / gridSize));

  9. Now we can blend between the LOD level alpha values and scale the result with the opacity falloff factor. The resulting pixel color value can be stored in the framebuffer:

    c.a *= lod2a > 0.0 ? lod2a : lod1a > 0.0 ?  lod1a : (lod0a * (1.0-lodFade));

    c.a *= opacityFalloff;

    out_FragColor = c;

  10. The preceding shaders should be rendered using the following OpenGL state:

    glClearColor(1.0f, 1.0f, 1.0f, 1.0f);




    const PerFrameData perFrameData = {
      .view = view,
      .proj = p,
      .cameraPos = glm::vec4(camera.getPosition(), 1.0f)
    };


    glNamedBufferSubData(perFrameDataBuffer, 0,  kUniformBufferSize, &perFrameData);

    glDrawArraysInstancedBaseInstance(  GL_TRIANGLES, 0, 6, 1, 0);
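The LOD arithmetic of the fragment shader is easier to experiment with outside of GLSL. The following C++ sketch ports the LOD level, fade factor, cell sizes, and opacity falloff calculations. The GridLod structure and the function names are hypothetical, the default values are taken from the parameters listed earlier in this recipe, and the derivative lengths are passed in as plain numbers because dFdx() and dFdy() have no CPU counterpart:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct GridLod {
  float lodLevel; // continuous LOD value
  float lodFade;  // fractional part, used to blend adjacent LODs
  float cell[3];  // cell sizes of three consecutive LODs
};

// du and dv are the screen-space derivative lengths of uv.x and uv.y.
GridLod computeGridLod(float du, float dv,
                       float gridCellSize = 0.025f,
                       float minPixelsBetweenCells = 2.0f) {
  assert(gridCellSize > 0.0f);
  const float len = std::sqrt(du * du + dv * dv);
  GridLod r{};
  r.lodLevel = std::max(0.0f,
      std::log10(len * minPixelsBetweenCells / gridCellSize) + 1.0f);
  r.lodFade = r.lodLevel - std::floor(r.lodLevel);
  // pow() is evaluated once; each later cell size is 10x the previous one.
  r.cell[0] = gridCellSize * std::pow(10.0f, std::floor(r.lodLevel));
  r.cell[1] = r.cell[0] * 10.0f;
  r.cell[2] = r.cell[1] * 10.0f;
  return r;
}

// Opacity falloff with the distance from the grid origin.
float opacityFalloff(float u, float v, float gridSize = 100.0f) {
  const float d = std::sqrt(u * u + v * v) / gridSize;
  return 1.0f - std::min(std::max(d, 0.0f), 1.0f);
}
```

For instance, a derivative length of 0.04 gives a LOD level of roughly 1.5, so the finest visible cells are 0.25 units wide and are about half faded out, while the next two levels use 2.5 and 25 units.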

View the complete example at Chapter5/GL01_Grid for a self-contained demo app. The camera can be controlled with the WASD keys and a mouse. The resulting image should appear similar to the following screenshot:

Figure 5.1 – The GLSL grid


There's more...

Besides only considering the distance to the camera to calculate the antialiasing falloff factor, we can use the angle between the viewing vector and the grid line. This will make the overall look and feel of the grid more visually pleasing and can be an interesting improvement if you want to implement a grid not only as an internal debugging tool but also as a part of a customer-facing product, such as an editor. Please refer to the Our Machinery blog for additional details about how to implement a more complicated grid.

Rendering multiple meshes with OpenGL

In the previous recipes, we learned how to build a mesh preprocessing pipeline and convert 3D meshes from data exchange formats, such as .obj or .gltf2, into our runtime mesh data format and render it via the Vulkan API. Let's switch gears and examine how to render this converted data using OpenGL.

Getting ready

The full source code for this recipe is located in Chapter5/GL03_MeshRenderer. It is recommended that you revisit the Implementing a geometry conversion tool recipe before continuing further.

How to do it...

Let's implement a simple GLMesh helper class to render our mesh using OpenGL:

  1. The constructor accepts pointers to the indices and vertices data buffers. The data buffers are used as-is, and they are uploaded directly into the respective OpenGL buffers. The number of indices is inferred from the indices buffer size, assuming the indices are stored as 32-bit unsigned integers:

    class GLMesh final {


      GLMesh(const uint32_t* indices,    uint32_t indicesSizeBytes,    const float* vertexData,    uint32_t verticesSizeBytes)

      : numIndices_(indicesSizeBytes / sizeof(uint32_t))

      , bufferIndices_(indicesSizeBytes, indices, 0)

      , bufferVertices_(verticesSizeBytes, vertexData, 0)


        glCreateVertexArrays(1, &vao_);

        glVertexArrayElementBuffer(      vao_, bufferIndices_.getHandle());

        glVertexArrayVertexBuffer(vao_, 0,      bufferVertices_.getHandle(), 0, sizeof(vec3));

  2. The vertex data format for this recipe only contains vertex positions, which are represented as vec3:

        glEnableVertexArrayAttrib(vao_, 0);

        glVertexArrayAttribFormat(      vao_, 0, 3, GL_FLOAT, GL_FALSE, 0);

        glVertexArrayAttribBinding(vao_, 0, 0);


  3. The draw() method binds the associated OpenGL vertex array object and renders the entire mesh:

      void draw() const {
        glBindVertexArray(vao_);
        glDrawElements(GL_TRIANGLES, static_cast<GLsizei>(numIndices_), GL_UNSIGNED_INT, nullptr);
      }


  4. The destructor takes care of removing the vertex array object (VAO), and that is pretty much it:

      ~GLMesh() {
        glDeleteVertexArrays(1, &vao_);
      }



      GLuint vao_;

      uint32_t numIndices_;

      GLBuffer bufferIndices_;

      GLBuffer bufferVertices_;


Now we should read the converted mesh data from our file and load it into a new GLMesh object. Let's discuss how to do this next. Perform the following steps:

  1. First, we open the mesh file and read the MeshFileHeader structure, which was described in the Implementing a geometry conversion tool recipe:

    FILE* f = fopen("data/meshes/test.meshes", "rb");

    if (!f) {
      printf("Unable to open mesh file\n");
      exit(255);
    }

    MeshFileHeader header;

    if (fread(&header, 1, sizeof(header), f) != sizeof(header)) {
      printf("Unable to read mesh file header\n");
      exit(255);
    }



  2. Once the header has been retrieved, we use the number of meshes from the header to allocate space for the mesh descriptions. We store all of the Mesh structures inside a contiguous container so that all their data can be read from the file with a single fread() call. Rudimentary error checking ensures that the number of elements read equals the number of meshes:

    std::vector<Mesh> meshes1;

    const auto meshCount = header.meshCount;


    meshes1.resize(meshCount);

    if (fread(meshes1.data(), sizeof(Mesh), meshCount, f) != meshCount) {
      printf("Could not read meshes\n");
      exit(255);
    }



  3. After loading the descriptions of each mesh, we should load the actual geometry into two containers, one for the indices and one for vertices. This code uses a shortcut, that is, the indexDataSize and vertexDataSize fields from the header:

    std::vector<uint32_t> indexData;

    std::vector<float> vertexData;

    const auto idxDataSize = header.indexDataSize;

    const auto vtxDataSize = header.vertexDataSize;

    indexData.resize(idxDataSize / sizeof(uint32_t));

    vertexData.resize(vtxDataSize / sizeof(float));

    if ((fread(indexData.data(), 1, idxDataSize, f) != idxDataSize) ||
        (fread(vertexData.data(), 1, vtxDataSize, f) != vtxDataSize)) {
      printf("Unable to read index/vertex data\n");
      exit(255);
    }




  4. The resulting index and vertex buffers can be used to invoke the GLMesh constructor and prepare our geometry data for rendering:

    GLMesh mesh(indexData.data(), idxDataSize, vertexData.data(), vtxDataSize);

Now we can go ahead and configure OpenGL for rendering. To do that, we follow these simple steps:

  1. We should load all the necessary GLSL shaders for this demo app and link them into a shader program. One set of shaders is needed for grid rendering, as described in the Implementing an infinite grid GLSL shader recipe:

    GLShader shdGridVertex(  "data/shaders/chapter05/GL01_grid.vert");

    GLShader shdGridFragment(  "data/shaders/chapter05/GL01_grid.frag");

    GLProgram progGrid(shdGridVertex, shdGridFragment);

  2. Another set of shaders is needed to render the mesh itself. These shaders are mostly trivial and do not use the programmable vertex-pulling approach for the sake of brevity. So, we will omit their text here:

    GLShader shaderVertex(  "data/shaders/chapter05/GL03_mesh_inst.vert");

    GLShader shaderGeometry(  "data/shaders/chapter05/GL03_mesh_inst.geom");

    GLShader shaderFragment(  "data/shaders/chapter05/GL03_mesh_inst.frag");

    GLProgram program(  shaderVertex, shaderGeometry, shaderFragment);

  3. Since everything in this example is packed into a single mesh, we can allocate a buffer with model-to-world matrices. In this case, it contains only one matrix:

    const mat4 m(1.0f);

    GLBuffer modelMatrices(  sizeof(mat4), value_ptr(m), GL_DYNAMIC_STORAGE_BIT);

    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2,  modelMatrices.getHandle());

  4. The modelMatrices buffer binding slot number 2 corresponds to this description in the GL03_mesh_inst.vert shader:

    layout(std430, binding = 2)  restrict readonly buffer Matrices


      mat4 in_Model[];


  5. We need to set up blending for grid rendering. The depth buffer is needed to correctly render a mesh:

    glClearColor(1.0f, 1.0f, 1.0f, 1.0f);




  6. The mesh and the grid are rendered as follows:








    glDrawArraysInstancedBaseInstance   (GL_TRIANGLES, 0, 6, 1, 0);

The running application will render an image of the Lumberyard Bistro mesh that looks similar to the following screenshot:

Figure 5.2 – The Amazon Lumberyard Bistro mesh geometry loaded and rendered


Generating LODs using MeshOptimizer

Earlier in this chapter, in the Implementing a geometry conversion tool recipe, we talked about preprocessing a mesh so that we can store it in a runtime efficient data format. One important part of the preprocessing pipeline is optimizing geometry and generating simplified meshes for real-time discrete LOD algorithms that we might want to use later. Let's learn how to generate simplified meshes using the MeshOptimizer library.

Getting ready

It is recommended that you revisit the Introducing MeshOptimizer recipe from Chapter 2, Using Essential Libraries. The complete source code for this recipe can be found in Chapter5/MeshConvert.

How to do it...

We are going to add a processLODs() function to our MeshConverter tool so that we can generate all of the necessary LOD meshes for a specified set of indices and vertices. Let's go through this function step by step to learn how to do it:

  1. The LOD meshes are represented as a collection of indices that construct a new simplified mesh from the same vertices that are used for the original mesh. This way, we only have to store one set of vertices and can render the corresponding LODs by simply switching the index buffer data. As we did earlier, for simplicity, we store all of the indices as unsigned 32-bit integers:

    void processLODs(  std::vector<uint32_t> indices,  const std::vector<float>& vertices,  std::vector<std::vector<uint32_t>>& outLods)


  2. Each vertex is constructed from 3 float values, hence the hardcoded value here:

      size_t verticesCountIn = vertices.size() / 3;

      size_t targetIndicesCount = indices.size();

  3. The first "zero" LOD corresponds to the original mesh indices. Push it as-is into the resulting container and print some debug information:

      uint8_t LOD = 1;

      printf("    LOD0: %i indices",    int(indices.size()));

      outLods.push_back(indices);


  4. Let's iterate until the number of indices in the last LOD drops below 1024 or the total number of generated LODs reaches 8. Each subsequent LOD is supposed to have half of the number of indices from the previous LOD. The MeshOptimizer library implements two simplification algorithms. The first one, implemented in the meshopt_simplify() function, tries to follow the topology of the original mesh. This is so that the attribute seams, borders, and overall appearance can be preserved. The target error value of 0.02 corresponds to the 2% deviation from the original mesh:

      while ( targetIndicesCount > 1024 && LOD < 8 ) {

        targetIndicesCount = indices.size()/2;

        bool sloppy = false;

        size_t numOptIndices = meshopt_simplify(indices.data(), indices.data(), (uint32_t)indices.size(), vertices.data(), verticesCountIn, sizeof( float ) * 3, targetIndicesCount, 0.02f );

  5. The second simplification algorithm implemented in meshopt_simplifySloppy() does not follow the topology of the original mesh. This means it can be more aggressive by collapsing internal mesh details that are too small to matter because they are topologically disjointed but spatially close. We will switch to this aggressive algorithm in case the first algorithm is unable to achieve a significant reduction of the index count by at least 10%. If the aggressive version of the algorithm cannot improve the situation, we should just give up and terminate the sequence:

        if (static_cast<size_t>(numOptIndices * 1.1f) >          indices.size()) {

          if (LOD > 1) {

            numOptIndices = meshopt_simplifySloppy(indices.data(), indices.data(), indices.size(), vertices.data(), verticesCountIn, sizeof(float) * 3, targetIndicesCount);
            sloppy = true;
            if ( numOptIndices == indices.size() ) break;
          }
          else break;
        }


  6. A new set of indices is generated, and we can truncate the total number to match the output of the simplification process. Let's optimize each LOD for the vertex cache, as described in the Introducing MeshOptimizer recipe from Chapter 2, Using Essential Libraries. The resulting set of indices is ready to be stored as the next LOD:

        indices.resize( numOptIndices );

        meshopt_optimizeVertexCache(,,      indices.size(), verticesCountIn );

        printf("    LOD%i: %i indices %s", int(LOD), int(numOptIndices), sloppy ? "[sloppy]" : "");

        LOD++;

      }

This code will generate up to eight LOD meshes for a given set of indices and vertices, and it will store them inside our runtime mesh format data structures. We will learn how to make use of these LODs in Chapter 10, Advanced Rendering Techniques and Optimizations.
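
To see how the two stopping conditions interact, here is a minimal standalone C++ sketch of the loop bookkeeping only. It does not call MeshOptimizer: planLodIndexCounts() and reductionTooSmall() are hypothetical helpers, and the sketch assumes, optimistically, that every simplification pass reaches its target index count exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the 10% check from step 5: returns true when
// the simplified index count, scaled by 1.1, still exceeds the input count.
inline bool reductionTooSmall(std::size_t numOptIndices, std::size_t numIndices) {
  assert(numOptIndices <= numIndices);  // a simplifier never adds indices
  return static_cast<std::size_t>(numOptIndices * 1.1f) > numIndices;
}

// Hypothetical helper: lists the index count of every LOD, starting with LOD0,
// under the idealized assumption that each pass hits its target exactly.
inline std::vector<std::size_t> planLodIndexCounts(std::size_t numIndices) {
  std::vector<std::size_t> counts{numIndices};  // LOD0 is the original mesh
  std::size_t target = numIndices;
  int lod = 1;
  while (target > 1024 && lod < 8) {
    target = counts.back() / 2;  // aim for half of the previous LOD
    counts.push_back(target);
    ++lod;
  }
  return counts;
}
```

For a mesh with 98,304 indices, this plan yields eight entries, from LOD0 down to 768 indices, while a mesh with 900 indices keeps only LOD0. Real meshes will usually deviate from this plan because the simplifier may stop short of its target.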

There's more...

The MeshOptimizer library contains many other useful algorithms, such as triangle strip generation, index and vertex buffer compression, mesh animation data compression, and more. All of these might be very useful for your geometry preprocessing stage, depending on the kind of graphics software you are writing. Please refer to the official documentation and releases page to view the latest features. You can find this at

Integrating tessellation into the OpenGL graphics pipeline

Now, let's switch gears and learn how to integrate hardware tessellation functionality into the OpenGL 4.6 graphics rendering pipeline.

Hardware tessellation is a feature that was introduced in OpenGL 4.0. It is implemented as two new shader stage types in the graphics pipeline. The first stage is called the tessellation control shader, and the second stage is called the tessellation evaluation shader. The tessellation control shader operates on a set of vertices, which are called control points and define a geometric surface called a patch. The shader can manipulate the control points and calculate the required tessellation level. The tessellation evaluation shader can access the barycentric coordinates of the tessellated triangles and can use them to interpolate any per-vertex attributes that are required, such as texture coordinates, colors, and more. Let's go through the code to examine how these OpenGL pipeline stages can be used to tessellate a mesh depending on the distance from the camera.

Getting ready

The complete source code for this recipe is located in Chapter5/GL02_Tessellation.

How to do it...

Before we can tackle the actual GLSL shaders, we should augment our OpenGL shader loading code with a new shader type:

  1. To do that, we should change the GLShaderTypeFromFileName() helper function in the following way:

    GLenum GLShaderTypeFromFileName(const char* fileName)

    {

      if (endsWith(fileName, ".vert"))    return GL_VERTEX_SHADER;

      if (endsWith(fileName, ".frag"))    return GL_FRAGMENT_SHADER;

      if (endsWith(fileName, ".geom"))    return GL_GEOMETRY_SHADER;

      if (endsWith(fileName, ".tesc"))    return GL_TESS_CONTROL_SHADER;

      if (endsWith(fileName, ".tese"))    return GL_TESS_EVALUATION_SHADER;

      if (endsWith(fileName, ".comp"))    return GL_COMPUTE_SHADER;

      return 0;

    }

  2. The additional constructor for GLProgram can now swallow five different shaders at once. This should be sufficient to simultaneously accommodate vertex, tessellation control, tessellation evaluation, geometry, and fragment shaders in a single OpenGL shader program:

    GLProgram(const GLShader& a, const GLShader& b,  const GLShader& c, const GLShader& d,  const GLShader& e);

  3. This implementation is similar to other constructors and simply attaches all of the shaders, one by one, to the program object:

    GLProgram::GLProgram(const GLShader& a, const GLShader& b, const GLShader& c, const GLShader& d, const GLShader& e)

    : handle_(glCreateProgram())

    {

      glAttachShader(handle_, a.getHandle());

      glAttachShader(handle_, b.getHandle());

      glAttachShader(handle_, c.getHandle());

      glAttachShader(handle_, d.getHandle());

      glAttachShader(handle_, e.getHandle());

    }

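The endsWith() helper is not shown in this recipe. A minimal sketch of how it might look is given below. The implementation and the ShaderStage enumeration are assumptions made for illustration; the enumeration merely stands in for the GLenum constants so that the snippet compiles without OpenGL headers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for the GLenum constants (GL_VERTEX_SHADER and so on).
enum class ShaderStage {
  Unknown, Vertex, Fragment, Geometry, TessControl, TessEvaluation, Compute
};

// Hypothetical implementation: true if string s ends with the suffix part.
inline bool endsWith(const char* s, const char* part) {
  assert(s != nullptr && part != nullptr);
  const std::size_t sLen = std::strlen(s);
  const std::size_t partLen = std::strlen(part);
  return sLen >= partLen && std::strcmp(s + (sLen - partLen), part) == 0;
}

// Same extension-to-stage mapping as in the recipe, using the stand-in enum.
inline ShaderStage shaderStageFromFileName(const char* fileName) {
  if (endsWith(fileName, ".vert")) return ShaderStage::Vertex;
  if (endsWith(fileName, ".frag")) return ShaderStage::Fragment;
  if (endsWith(fileName, ".geom")) return ShaderStage::Geometry;
  if (endsWith(fileName, ".tesc")) return ShaderStage::TessControl;
  if (endsWith(fileName, ".tese")) return ShaderStage::TessEvaluation;
  if (endsWith(fileName, ".comp")) return ShaderStage::Compute;
  return ShaderStage::Unknown;  // unrecognized extension
}
```

Comparing only the tail of the string keeps the check independent of the directory part of the path, so both relative and absolute shader paths map to the same stage.
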
What we want to do now is write shaders that calculate tessellation levels based on the distance to the camera. In this way, we can render more geometric detail in the areas that are closer to the viewer. To do that, we should start with a vertex shader, data/shaders/chapter05/GL02_duck.vert, which will compute the world positions of the vertices and pass them down to the tessellation control shader:

  1. Our per-frame data consists of the usual view and projection matrices, together with the current camera position in the world space, and the tessellation scaling factor, which is user-controlled and comes from an ImGui widget:

    #version 460 core

    layout(std140, binding = 0) uniform PerFrameData {

      mat4 view;

      mat4 proj;

      vec4 cameraPos;

      float tessellationScale;

    };

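  2. Geometry is accessed using the programmable vertex-pulling technique. Each vertex is stored as five tightly packed floats: three for the position and two for the texture coordinates. Plain float arrays avoid the padding that the std430 alignment rules would add to a vec3 member: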

    struct Vertex {

      float p[3];

      float tc[2];

    };

    layout(std430, binding = 1) restrict readonly buffer Vertices

    {

      Vertex in_Vertices[];

    };

  3. The model-to-world matrices are stored in a single shader storage buffer object:

    layout(std430, binding = 2) restrict readonly buffer Matrices

    {

      mat4 in_Model[];

    };

  4. Let's write some helper functions to access the vertex positions and texture coordinates using traditional GLSL data types:

    vec3 getPosition(int i) {

      return vec3(in_Vertices[i].p[0], in_Vertices[i].p[1], in_Vertices[i].p[2]);

    }

    vec2 getTexCoord(int i) {

      return vec2(in_Vertices[i].tc[0], in_Vertices[i].tc[1]);

    }

  5. The vertex shader outputs UV texture coordinates and per-vertex world positions. The actual calculation is done as follows. Note that the gl_DrawID variable is used to index the matrices buffer:

    layout (location=0) out vec2 uv_in;

    layout (location=1) out vec3 worldPos_in;

    void main() {

      mat4 MVP = proj * view * in_Model[gl_DrawID];

      vec3 pos = getPosition(gl_VertexID);

      gl_Position = MVP * vec4(pos, 1.0);

      uv_in = getTexCoord(gl_VertexID);

      worldPos_in = ( in_Model[gl_DrawID] * vec4(pos, 1.0) ).xyz;

    }

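On the C++ side, the buffers bound to these binding points have to match the GLSL declarations byte for byte. The following sketch shows one possible set of mirror structures, checked at compile time. The C++ struct layouts, the packVertices() helper, and the use of plain float arrays instead of a math library are assumptions made for illustration, not the code from the book's repository:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the GLSL Vertex struct: five tightly packed floats, 20 bytes.
struct Vertex {
  float p[3];   // position
  float tc[2];  // texture coordinates
};
static_assert(sizeof(Vertex) == 5 * sizeof(float), "Vertex must have no padding");
static_assert(offsetof(Vertex, tc) == 3 * sizeof(float), "tc must follow p directly");

// Mirrors the std140 PerFrameData block; each mat4 occupies 64 bytes.
struct PerFrameData {
  float view[16];
  float proj[16];
  float cameraPos[4];
  float tessellationScale;
};
static_assert(offsetof(PerFrameData, proj) == 64, "proj offset");
static_assert(offsetof(PerFrameData, cameraPos) == 128, "cameraPos offset");
static_assert(offsetof(PerFrameData, tessellationScale) == 144, "scale offset");

// Hypothetical helper: interleaves separate position and UV arrays into the
// layout that the vertex shader pulls from the Vertices buffer.
inline std::vector<Vertex> packVertices(const std::vector<float>& positions,
                                        const std::vector<float>& texCoords) {
  assert(positions.size() % 3 == 0 && texCoords.size() % 2 == 0);
  assert(positions.size() / 3 == texCoords.size() / 2);
  std::vector<Vertex> out(positions.size() / 3);
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = Vertex{ { positions[3 * i], positions[3 * i + 1], positions[3 * i + 2] },
                     { texCoords[2 * i], texCoords[2 * i + 1] } };
  }
  return out;
}
```

The static assertions fail at compile time if a compiler inserts padding, which is much easier to debug than a mesh that renders as garbage.
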
Now we can move on to the next shader stage and view the tessellation control shader, data/shaders/chapter05/GL02_duck.tesc:

  1. The shader operates on a group of 3 vertices, which correspond to a single triangle in the input data. The uv_in and worldPos_in variables correspond to the ones in the vertex shader. Notice that here they are arrays rather than single values, with one element per vertex of the patch:

    #version 460 core

    layout (vertices = 3) out;

    layout (location = 0) in vec2 uv_in[];

    layout (location = 1) in vec3 worldPos_in[];

  2. The PerFrameData structure should be exactly the same for all of the shader stages in this example:

    layout(std140, binding = 0) uniform PerFrameData {

      mat4 view;

      mat4 proj;

      vec4 cameraPos;

      float tessellationScale;

    };

  3. Let's describe the input and output data structures that correspond to each individual vertex. Besides the required vertex position, we store the vec2 texture coordinates:

    in gl_PerVertex {

      vec4 gl_Position;

    } gl_in[];

    out gl_PerVertex {

      vec4 gl_Position;

    } gl_out[];

    struct vertex {

      vec2 uv;

    };

    layout(location = 0) out vertex Out[];

  4. The getTessLevel() function calculates the desired tessellation level based on the distance of two adjacent vertices from the camera. The hardcoded distance values, which are used to switch the levels, are scaled using the tessellationScale uniform coming from the user interface:

    float getTessLevel(float distance0, float distance1) {

      const float distanceScale1 = 7.0;

      const float distanceScale2 = 10.0;

      const float avgDistance =    (distance0 + distance1) * 0.5;

      if (avgDistance <=      distanceScale1 * tessellationScale)

        return 5.0;

      else if (avgDistance <=           distanceScale2 * tessellationScale)

        return 3.0;

      return 1.0;

    }

  5. The main() function is straightforward. It passes the positions and UV coordinates as-is and then calculates the distance from each vertex in the triangle to the camera:

    void main() {  

      gl_out[gl_InvocationID].gl_Position =    gl_in[gl_InvocationID].gl_Position;

      Out[gl_InvocationID].uv = uv_in[gl_InvocationID];

      vec3 c = cameraPos.xyz;

      float eyeToVertexDistance0 =    distance(c, worldPos_in[0]);

      float eyeToVertexDistance1 =    distance(c, worldPos_in[1]);

      float eyeToVertexDistance2 =    distance(c, worldPos_in[2]);

  6. Based on these distances, we can calculate the required inner and outer tessellation levels in the following way:

      gl_TessLevelOuter[0] = getTessLevel(    eyeToVertexDistance1, eyeToVertexDistance2);

      gl_TessLevelOuter[1] = getTessLevel(    eyeToVertexDistance2, eyeToVertexDistance0);

      gl_TessLevelOuter[2] = getTessLevel(    eyeToVertexDistance0, eyeToVertexDistance1);

      gl_TessLevelInner[0] = gl_TessLevelOuter[2];

    }

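To make the level selection easier to experiment with, here is a CPU-side C++ sketch that mirrors the shader logic above. The Vec3 and TessLevels types and the dist() and computeTessLevels() functions are hypothetical stand-ins, while the thresholds and the edge ordering are copied from the shader:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Euclidean distance, standing in for the GLSL distance() built-in.
inline float dist(const Vec3& a, const Vec3& b) {
  const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// CPU copy of getTessLevel(); the scale is passed in instead of read from a UBO.
inline float getTessLevel(float distance0, float distance1, float tessellationScale) {
  const float distanceScale1 = 7.0f;
  const float distanceScale2 = 10.0f;
  const float avgDistance = (distance0 + distance1) * 0.5f;
  if (avgDistance <= distanceScale1 * tessellationScale) return 5.0f;
  if (avgDistance <= distanceScale2 * tessellationScale) return 3.0f;
  return 1.0f;
}

struct TessLevels {
  float outer[3];
  float inner;
};

// Mirrors main(): outer level i belongs to the edge opposite vertex i.
inline TessLevels computeTessLevels(const Vec3& cameraPos, const Vec3 (&worldPos)[3],
                                    float tessellationScale) {
  assert(tessellationScale > 0.0f);
  const float d0 = dist(cameraPos, worldPos[0]);
  const float d1 = dist(cameraPos, worldPos[1]);
  const float d2 = dist(cameraPos, worldPos[2]);
  TessLevels t;
  t.outer[0] = getTessLevel(d1, d2, tessellationScale);
  t.outer[1] = getTessLevel(d2, d0, tessellationScale);
  t.outer[2] = getTessLevel(d0, d1, tessellationScale);
  t.inner = t.outer[2];
  return t;
}
```

With the camera at the origin and a scale of 1.0, an edge whose endpoints are on average 6 units away gets level 5, one at 9 units gets level 3, and anything beyond 10 units stays at level 1. Because each outer level depends only on the two endpoints of its edge, two triangles sharing an edge compute the same level for it.
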
Let's take a look at the data/shaders/chapter05/GL02_duck.tese tessellation evaluation shader:

  1. We specify triangles as the input patch type. The equal_spacing mode tells OpenGL to clamp the tessellation level to the range from 1 to the implementation-dependent maximum (GL_MAX_TESS_GEN_LEVEL, which is at least 64) and to round it up to the nearest integer n. After that, the corresponding edge is divided into n equal segments. When the tessellation primitive generator produces triangles, their orientation can be specified by an input layout declaration using the cw and ccw identifiers. We use the counter-clockwise orientation:

    #version 460 core

    layout(triangles, equal_spacing, ccw) in;

    struct vertex {

      vec2 uv;

    };

    in gl_PerVertex {

      vec4 gl_Position;

    } gl_in[];

    layout(location = 0) in vertex In[];

    out gl_PerVertex {

      vec4 gl_Position;

    };

    layout (location=0) out vec2 uv;

  2. These two helper functions are useful to interpolate between the vec2 and vec4 attribute values at the corners of the original triangle using the barycentric coordinates of the current vertex. The built-in gl_TessCoord variable contains the required barycentric coordinates:

    vec2 interpolate2(in vec2 v0, in vec2 v1, in vec2 v2){

      return v0 * gl_TessCoord.x + v1 * gl_TessCoord.y + v2 * gl_TessCoord.z;

    }

    vec4 interpolate4(in vec4 v0, in vec4 v1, in vec4 v2){

      return v0 * gl_TessCoord.x + v1 * gl_TessCoord.y + v2 * gl_TessCoord.z;

    }

  3. The actual interpolation code is straightforward and can be written in the following way:

    void main() {

      gl_Position = interpolate4(gl_in[0].gl_Position,                             gl_in[1].gl_Position,                             gl_in[2].gl_Position);

      uv = interpolate2(In[0].uv, In[1].uv, In[2].uv);

    }

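The barycentric interpolation and the equal_spacing rounding can also be reproduced on the CPU. This is a small sketch under stated assumptions: Vec2 and Vec3 are hypothetical stand-ins for the GLSL types, tessCoord plays the role of gl_TessCoord, and equalSpacingSegments() follows the OpenGL specification, which clamps the level to at least 1 and rounds it up. The default maximum of 64 is the minimum value OpenGL guarantees rather than a value queried from a driver:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// CPU copy of interpolate2(): blends the three corner values of the original
// triangle using the barycentric weights of the generated vertex.
inline Vec2 interpolate2(const Vec2& v0, const Vec2& v1, const Vec2& v2,
                         const Vec3& tessCoord) {
  return Vec2{
    v0.x * tessCoord.x + v1.x * tessCoord.y + v2.x * tessCoord.z,
    v0.y * tessCoord.x + v1.y * tessCoord.y + v2.y * tessCoord.z };
}

// Number of equal segments an edge is split into under equal_spacing.
inline int equalSpacingSegments(float level, int maxLevel = 64) {
  assert(maxLevel >= 1);
  const float clamped = std::min(std::max(level, 1.0f), static_cast<float>(maxLevel));
  return static_cast<int>(std::ceil(clamped));
}
```

A weight vector of (1, 0, 0) returns the first corner unchanged, and the three weights always sum to 1, so interpolated values never leave the range spanned by the corners.
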
The next stage of our hardware tessellation graphics pipeline is the data/shaders/chapter05/GL02_duck.geom geometry shader. We use it to generate barycentric coordinates for all of the small tessellated triangles. It is used to render a nice antialiased wireframe overlay on top of our colored mesh, as described in Chapter 2, Using Essential Libraries:

  1. The geometry shader consumes triangles that have been generated by the tessellation hardware and outputs triangle strips, each consisting of a single triangle:

    #version 460 core

    layout (triangles) in;

    layout (triangle_strip, max_vertices = 3) out;

    layout (location=0) in vec2 uv[];

    layout (location=0) out vec2 uvs;

    layout (location=1) out vec3 barycoords;

    void main() {

  2. Barycentric coordinates are assigned per vertex using these hardcoded constants:

      const vec3 bc[3] = vec3[](    vec3(1.0, 0.0, 0.0),    vec3(0.0, 1.0, 0.0),    vec3(0.0, 0.0, 1.0)  );

      for ( int i = 0; i < 3; i++ ) {

        gl_Position = gl_in[i].gl_Position;

        uvs = uv[i];

        barycoords = bc[i];

        EmitVertex();

      }

      EndPrimitive();

    }

The final stage of this rendering pipeline is the data/shaders/chapter05/GL02_duck.frag fragment shader:

  1. We take in the barycentric coordinates from the geometry shader and use them to calculate a wireframe overlay for our mesh:

    #version 460 core

    layout (location=0) in vec2 uvs;

    layout (location=1) in vec3 barycoords;

    layout (location=0) out vec4 out_FragColor;

    layout (location=0) uniform sampler2D texture0;

  2. A helper function returns the blending factor based on the distance to the edge and the desired thickness of the wireframe contour:

    float edgeFactor(float thickness) {

      vec3 a3 = smoothstep(vec3(0.0),    fwidth(barycoords) * thickness, barycoords);

      return min( min( a3.x, a3.y ), a3.z );

    }

  3. Let's sample the texture using the UV values, darken it slightly along the triangle edges to form the wireframe overlay, and call it a day:

    void main() {

      vec4 color = texture(texture0, uvs);

      out_FragColor = mix( color * vec4(0.8), color, edgeFactor(1.0) );

    }

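To get a feel for how the edge factor behaves, the fragment shader math can be replayed on the CPU. In this sketch, Vec3 is a hypothetical stand-in for the GLSL type, shadeChannel() handles a single color channel, and the fw parameter replaces fwidth(barycoords), which only the GPU can compute from screen-space derivatives:

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };

// Scalar version of GLSL smoothstep(); like GLSL, it requires edge0 < edge1.
inline float smoothstep(float edge0, float edge1, float x) {
  assert(edge0 < edge1);
  const float t = std::min(std::max((x - edge0) / (edge1 - edge0), 0.0f), 1.0f);
  return t * t * (3.0f - 2.0f * t);
}

// CPU copy of edgeFactor(): 0 on a triangle edge, 1 well inside the triangle.
// fw is the caller-supplied substitute for fwidth(barycoords).
inline float edgeFactor(const Vec3& barycoords, const Vec3& fw, float thickness) {
  const float ax = smoothstep(0.0f, fw.x * thickness, barycoords.x);
  const float ay = smoothstep(0.0f, fw.y * thickness, barycoords.y);
  const float az = smoothstep(0.0f, fw.z * thickness, barycoords.z);
  return std::min(std::min(ax, ay), az);
}

// One channel of mix(color * 0.8, color, factor): edges end up 20% darker.
inline float shadeChannel(float color, float factor) {
  return color * 0.8f * (1.0f - factor) + color * factor;
}
```

A barycentric coordinate of zero means the fragment lies exactly on the opposite edge, so taking the minimum over the three components picks the nearest edge. Larger thickness values widen the band in which the factor ramps from 0 to 1.
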
The GLSL shader part of our OpenGL hardware tessellation pipeline is over. Now it is time to look at the C++ code. The source code is located in the Chapter5/GL02_Tessellation/src/main.cpp file:

  1. The shaders for the tessellated mesh rendering are loaded in the following way:

    GLShader shaderVertex(  "data/shaders/chapter05/GL02_duck.vert");

    GLShader shaderTessControl(  "data/shaders/chapter05/GL02_duck.tesc");

    GLShader shaderTessEval(  "data/shaders/chapter05/GL02_duck.tese");

    GLShader shaderGeometry(  "data/shaders/chapter05/GL02_duck.geom");

    GLShader shaderFragment(  "data/shaders/chapter05/GL02_duck.frag");

  2. Now we can use our helper class to link everything inside an OpenGL shader program:

    GLProgram program(shaderVertex, shaderTessControl,  shaderTessEval,shaderGeometry, shaderFragment);

The data/rubber_duck/scene.gltf mesh loading code is identical to that of the previous chapter, so we will skip it here. What's more important is how we render the ImGui widget to control the tessellation scale factor:

  1. First, we define an ImGuiGLRenderer object, which will take care of all the ImGui rendering functionality, as described in the Rendering a basic UI with Dear ImGui recipe from Chapter 2, Using Essential Libraries. The minimalist implementation we provide can be found in the shared/glFramework/UtilsGLImGui.h file:

    ImGuiGLRenderer rendererUI;

  2. Inside our frame rendering loop, we can access all the ImGui rendering functionality as usual. Here, we just render a single slider containing a floating-point value for the tessellation scale factor:

    io.DisplaySize = ImVec2((float)width, (float)height);

    ImGui::NewFrame();

    ImGui::SliderFloat("Tessellation scale", &tessellationScale, 1.0f, 2.0f, "%.1f");

    ImGui::Render();

  3. After we have issued all the ImGui drawing commands, we can render the resulting user interface using a single call to ImGuiGLRenderer::render(). The implementation will take care of all the necessary OpenGL render states to draw the ImGui data:

    rendererUI.render(  width, height, ImGui::GetDrawData() );

Here is a screenshot of the running demo application:

Figure 5.3 – A tessellated duck

Note how the different tessellation levels vary based on the distance to the camera. Try playing with the control slider to emphasize the effect.

There's more...

This recipe can be used as a cornerstone for hardware mesh tessellation techniques in your OpenGL applications. One natural step forward would be to apply a displacement map to the fine-grained tessellated vertices, moving them along their normal vectors. Please refer to for inspiration. If you want to get serious about the adaptive tessellation of subdivision surfaces, there is a chapter in the GPU Gems 2 book that covers this advanced topic in more detail.
