A solid understanding of the architecture of columnstore indexes is necessary to make optimal use of them. Best practices, query patterns, maintenance, and troubleshooting are all based on the internal structure of columnstore indexes. This chapter will focus on these architectural components, providing the foundation for the rest of this book.
Sample Data
To demonstrate the topics presented throughout this book, a sample data set will be created based on the Fact.Sale table in the WideWorldImportersDW database.
Query Used to Generate a Data Set for Use in Columnstore Index Testing
This generates 25,109,150 rows of data spanning invoice date ranges of 1/1/2013 through 5/31/2016. While this data is not massive, it is large enough for suitable demonstrations without being cumbersome to those that wish to replicate it at home. This data set will be reused in future chapters, being placed into a number of test tables to illustrate a wide variety of topics related to columnstore indexes, OLAP performance, and database architecture.
Script That Creates and Populates a Columnstore Index Test Table
Rowgroups and Segments
Analytic data cannot be stored in one large contiguous structure. While each column is compressed and stored separately, the data within those columns needs to be grouped and compressed separately. Each compressed unit is what is read into memory. If that unit is too small, then the storage and management of a multitude of compressed structures would be expensive, and its compression poor. Conversely, if the number of rows in each unit is too large, then the amount of data that needs to be read into memory to satisfy queries would also become too large.
Note that columnstore indexes are not built on binary tree structures like clustered and nonclustered rowstore indexes are. Instead, each rowgroup contains a set of compressed segments, one per column in a table. The example in Figure 4-2 is for a table with eight columns and up to 6*220 rows, containing a total of 48 segments (one segment per column per rowgroup). The only significant architectural convention shared between rowstore and columnstore indexes are their use of 8KB pages to store their data.
Rowgroups are created and managed automatically as data is created in a columnstore index. There is no cap on the number of rowgroups that can exist, nor are there limits on the count of segments within the index. Because a rowgroup contains up to 220 rows, a table should have many more rows than this to make optimal use of a columnstore index. If a table has 500k rows, then it is likely to all be stored in a single rowgroup. As a result, any query requiring data from the table would need to read segments that contain data for all rows in the table. For a table to make effective use of a columnstore index, it should have at least 5 million or 10 million rows so that it can be broken into separate rowgroups of which not all need to be read each time a query is issued against it.
Script to Return Basic Rowgroup Metadata for a Columnstore Index
By joining in other system views, such as sys.indexes and sys.tables, it is possible to return additional information about the table that the columnstore index resides on. sys.column_store_row_groups contains some additional columns of interest, such as the compression status of the rowgroup, the row count, and its size. Note that the selection of eight rowgroups in Figure 4-3 all contain the maximum number of rows allowed in a rowgroup, 220.
Using rowgroup metadata allows the user to quickly measure the size of a columnstore index and gain a basic understanding of its structure.
Script to Return Basic Segment Metadata for a Columnstore Index
The row count returned by the query is equal to the total number of segments in the index, which is subsequently equal to the count of rowgroups multiplied by the number of columns in the table. In addition to the row count contained in each segment and its size on disk, there are details given as to whether the segment has NULLs in it, and min_data_id/max_data_id, which provide dictionary lookups for values contained in the segment. Details about dictionaries and how they work will be provided alongside the discussion of compression in Chapter 5.
The Delta Store
Writing data to highly compressed segments is a resource-intensive process. The resources required to decompress a set of segments, write additional data to them, compress them, and update metadata is not trivial. The effort to write a million rows to one rowgroup is comprised of the effort needed to decompress, write, and compress all segments in the rowgroup. The effort to write a million rows one at a time to that rowgroup would be comparable to the same process being executed one million times.
Because of this, processes that frequently write small numbers of rows need a way to manage those writes so that a columnstore index is not consuming all of a server’s resources decompressing, writing, and re-creating segments. The delta store is a set of clustered rowstore indexes that temporarily store small writes alongside the columnstore index. Each delta rowgroup is one of these clustered indexes. Changes accumulate in the delta store until its row count reaches a threshold (220 rows per delta rowgroup) and is pushed all at once into the columnstore index.
The delta store is maintained automatically by SQL Server. An asynchronous background process called the tuple mover manages the movement of data from delta store into columnstore index. While operators can run maintenance scripts to influence the behavior of the delta store, doing so is not necessary for using columnstore indexes. More information on columnstore maintenance can be found in Chapter 14.
When queries are executed against a columnstore index, the contents of the delta store are read alongside the compressed segments that are needed. While the delta store is comprised of classic rowstore indexes, its size is generally very small compared to the compressed portion of the columnstore index; therefore, reading it is generally not detrimental to performance. The benefit of the delta store is that it greatly reduces the computing resources needed to write smaller batches of data to a columnstore index and ensures that frequent smaller writes do not become unscalable bottlenecks during a data load, maintenance, or software release process.
Note that the delta store is not used to manage all INSERT operations against a columnstore index. Chapter 8 discusses in detail how bulk load processes are used to greatly speed up larger INSERT operations .
The Delete Bitmap
Deletion of data from a columnstore index poses similar challenges to those solved for inserts by the delta store. Deleting a single row would require a compressed segment per column to be decompressed, modified, and recompressed. This is inherently expensive and becomes prohibitively costly when the number of rowgroups affected increases.
SQL Server needs to be able to delete data from a columnstore index quickly and without having a prolonged negative impact on server performance. To accomplish this, data is not physically deleted from a columnstore index when a DELETE statement is executed. Instead, the deleted status of the affected rows is written to a structure called the delete bitmap.
Only one delete bitmap can exist per partition within a columnstore index, and its purpose is to track which rows are deleted (if any). When rows are deleted from a columnstore index, the delete bitmap is updated to indicate the deletion. The corresponding rows within the columnstore index are not modified in any way. These soft deletes allow DELETE operations to execute quickly and not become burdensome to data loads or maintenance processes.
When a query reads data from a columnstore index, the contents of the delete bitmap are also read and any rows that are flagged as deleted will be omitted from the results. The delete bitmap is also a clustered rowstore index that exists side by side with the columnstore index. The biggest upside of having a delete bitmap is that DELETE operations can execute exceptionally fast as there is no need to decompress, update, and recompress segments in the index. The downside is that deleted rows still take up space in the index and are not immediately removed. Over time, the space consumed by deleted rows may become nontrivial, at which point index maintenance (discussed in Chapter 14) can periodically be used to reclaim this space.
Both the delete bitmap and the delta store are components of a columnstore index that only exist when needed. If no deleted rows exist, then there will be no delete bitmap to overlay onto the compressed rowgroups. Similarly, an empty delta store will not need to be checked when queries are executed against the columnstore index. These components exist when necessary, but otherwise will have no impact on performance when not needed.
Nonclustered Columnstore Index Architecture
The architecture of nonclustered columnstore indexes is quite similar to that of clustered indexes. Instead of being the primary storage mechanism for a table, the nonclustered columnstore index is an additional index structure that exists alongside a clustered rowstore index.
An important difference is that a column list may be specified on the nonclustered columnstore index, allowing for more targeted optimization projects. Additionally, filters may be applied to nonclustered columnstore indexes, allowing for data to be targeted based on whether it is hot, warm, or cold, thus decreasing index size and the cost to write to the index.
Hot Data: Real time and actively used by a system with regular reads, writes, and high concurrency. Availability is expected to be high and latency should be very low.
Warm Data: Represents data that is used less frequently. Availability is still expected to be high, but there is a tolerance for more latency when retrieving it.
Cold Data: Represents infrequently accessed data that is old, archived, or maintained for posterity. Availability is more flexible for cold data and high latency is tolerable. Concurrency is very low for cold data.
Note that the purpose of a nonclustered columnstore index is to provide real-time analytics against a transactional table, which requires careful consideration prior to implementing. Chapter 12 will dive into more detail with regard to nonclustered columnstore indexes including options, demonstrations, and optimal use cases.
Physical Data on Pages
Thus far, the discussion of architecture has occurred at a logical level. That is, rowgroups, segments, delta store, and the delete bitmap have been presented as generic containers for columns of data. The physical storage of columnstore indexes on pages helps in fully appreciating how they can efficiently service analytic queries.
All data in SQL Server that is not stored in memory-optimized structures is stored on 8 kilobyte pages. These pages reside on physical storage and are read into memory when needed as-is. If a page is compressed, it remains compressed until its data is needed. Whether a page is in a rowstore or columnstore index, it is read into memory in the same fashion. The delete bitmap and delta store are also maintained on pages and read into memory as needed.
Page header
Data
Row offsets
The page header contains basic information about the page such as the object that owns it, the type of data stored in it, and the amount of unallocated space available for additional data to be written.
The data rows are the actual data stored on the page. This may be the physical data in a table (the clustered index or heap), index entries, or a variety of other contents that are available under a variety of circumstances. For this discussion of columnstore and rowstore indexes, the data and index data are all that is of immediate concern.
Row offsets store the position that each row starts, allowing SQL Server to locate the data for any given row once a page is read into memory.
This table has six columns, and they are written sequentially for each row, one after another. SQL Server will continue to write rows sequentially for this table to the same page until it runs out of space, at which point a new page will be allocated to the table and the data will continue there.
This structure is optimized for reading or writing small range lookups that consist of a set of sequential rows. For example, a query that returned some (or all) of columns from rows two through five would only need to read the single page shown in Figure 4-9. If additional rows were required that were on other pages, then those pages would also be read into memory to satisfy the query.
Because this is the clustered index, the leaf levels contain the column data for each row that is referenced by the clustered index. In a nonclustered index, the lowest level of the index would contain pointers to the target data, to be used for key lookups. Similarly, the leaf level of a nonclustered index would also contain any included columns that are defined on the index.
A query that needed to read clustered index ID values between 25 and 75 could do so by reading the index root page first and then pages 1 and 2. A query reading only clustered index ID value 136 would read the index root and page 4.
Note that this data, a subset of a compressed segment, contains sequential values for the same column. This sequence would continue until the end of the rowgroup, at which point a new segment would begin, or if it were the last of the rowgroups, the next column would begin.
A query that aggregated values for the first 40 rows of column 1 could retrieve data very efficiently and do so by simply reading this single page. In a rowstore index for the same table, it would be necessary to scan data for all columns in those 40 rows and read those pages into memory. For a table with 20 columns, that would require 20 times the effort.
Summarizing Differences
The key to understanding the difference between the architecture of rowstore indexes and columnstore indexes is to consider the logical storage of data vs. the physical storage of data.
Rowstore indexes store data physically in order by row and then column. Data is logically accessible as it is ordered by row. Therefore, this architecture is optimized for queries that seek a sequence or limited subset of rows. Binary tree indexes provide a speedy way to search for that data based on other columns that the clustered index does not order the data by.
Columnstore indexes store data physically grouped by column, but ordered by row. This convention allows for filters to limit the number of rows returned and column groupings to reduce the pages read as the number of columns needed for a query decreases. This allows queries to read significantly more rows, but maintain excellent performance when the column list is limited. Chapter 10 explores how the order of data within a columnstore index can be controlled to add an additional dimension of filtering so that data can be sliced both horizontally and vertically.