Any analytic data store requires the ability to perform data loads quickly and efficiently. Bulk loading is a minimally logged process that inserts data directly into a columnstore index. It not only bypasses the delta store but also results in a transaction size that reflects the compression of the target data, greatly reducing the amount of data written to the transaction log.
Bulk Load Processes Explained
Traditional transactional workloads are fully logged. With full logging, enough metadata is written to the transaction log for SQL Server to recover from a failed transaction. In addition, data is written to the transaction log to ensure that point-in-time recovery is possible.
Fully logged transactions in OLTP scenarios are ideal as they allow a database to be rolled back to a specific point in time in the event of application problems. In addition, point-in-time recovery allows for research and forensics into the details of transactions, when needed.
Sometimes, transactional writes are intentionally subdivided into smaller batches to ensure that each transaction is short and fast and causes minimal contention with other queries against the same data.
The tradeoff for full logging is additional overhead:
- More storage space consumed in the transaction log
- More storage space consumed in transaction log backup files
- More CPU/memory needed to process transaction log backups
- Longer duration for write operations
Analytic workloads differ greatly in their recovery needs as data loads tend to be larger, less frequent, and asynchronous. When analytic data loads fail, the most common recourse is to investigate the problem, resolve it, and then rerun the data load. Point-in-time recovery within data load processes is less important than simply having recovery available to a point in time prior to the data load. Therefore, OLAP data loads can benefit greatly from minimally logged insert operations.
SQL Server provides a number of operations that can be minimally logged, including:
- BCP
- Bulk insert operations
- SELECT INTO operations
- Partition switching
Because point-in-time recovery can be critical to transactional systems, any process that uses bulk loading should be documented well enough that it is not accidentally applied in a scenario where minimal logging is not desired.
Columnstore indexes use built-in bulk load processes that automatically insert larger batches directly into rowgroups without using the delta store. This greatly improves insert speeds for large analytic data loads and reduces transaction log bloat resulting from those processes.
Bulk Loading into Columnstore Indexes
Unlike some of the other types of minimally logged insert operations in SQL Server, there are no prerequisites to take advantage of bulk loading data into a columnstore index. There is no need to adjust isolation levels, use explicit table locking, or adjust parallelism settings. Since bulk loading data will be the desired operation for large inserts into columnstore indexes, it is the default and will be used automatically when possible.
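As a minimal sketch of this behavior (table and column names here are hypothetical, chosen for illustration), a plain INSERT...SELECT of at least 102,400 rows into a clustered columnstore index is bulk loaded automatically:

```sql
-- Hypothetical fact table with a clustered columnstore index.
CREATE TABLE dbo.Fact_Sale_Demo
(
    SaleID BIGINT NOT NULL,
    SaleDate DATE NOT NULL,
    Amount DECIMAL(18,2) NOT NULL,
    INDEX CCI_Fact_Sale_Demo CLUSTERED COLUMNSTORE
);

-- A single insert of at least 102,400 rows is bulk loaded directly
-- into a compressed rowgroup. No hints, isolation level changes, or
-- configuration settings are required for this to occur.
INSERT INTO dbo.Fact_Sale_Demo (SaleID, SaleDate, Amount)
SELECT SaleID, SaleDate, Amount
FROM dbo.Sales_Source; -- assumed source containing >= 102,400 rows
```

The insert statement itself is unremarkable; the bulk load path is chosen by SQL Server based solely on the row count.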
When a large insert occurs against a columnstore index, rows are committed as follows:
1. Each batch of 1,048,576 (2^20) rows is bulk inserted into the columnstore index until fewer than 1,048,576 rows remain to insert.
2. If the remainder is greater than or equal to 102,400 rows, it is also bulk inserted into the columnstore index.
3. If the remainder is less than 102,400 rows, it is inserted into the delta store.
For example, an insert of 3,000,000 rows would result in:
- 2 bulk loads of 1,048,576 rows each
- 1 bulk load of 902,848 rows (the remainder)
Similarly, an insert of 2,100,000 rows would result in:
- 2 bulk loads of 1,048,576 rows each
- 2,848 rows (the remainder) inserted into the delta store
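The arithmetic behind these examples can be sketched in T-SQL (the thresholds are the documented columnstore constants; the variable names are arbitrary):

```sql
DECLARE @RowsToInsert BIGINT = 2100000; -- change to 3000000 for the first example
DECLARE @RowgroupMax BIGINT = 1048576;  -- 2^20, the maximum rowgroup size
DECLARE @DeltaThreshold BIGINT = 102400; -- minimum row count for a bulk load

SELECT
    @RowsToInsert / @RowgroupMax AS FullBulkLoads,
    CASE WHEN @RowsToInsert % @RowgroupMax >= @DeltaThreshold
         THEN @RowsToInsert % @RowgroupMax ELSE 0 END AS RemainderBulkLoaded,
    CASE WHEN @RowsToInsert % @RowgroupMax < @DeltaThreshold
         THEN @RowsToInsert % @RowgroupMax ELSE 0 END AS RowsToDeltaStore;
```

For 2,100,000 rows, this returns 2 full bulk loads and a 2,848-row remainder destined for the delta store, matching the example above.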
SQL Server can bulk load data into a columnstore index on multiple parallel threads, so long as each thread loads data from a separate data file. This is automatic and requires no particular user action. Columnstore bulk insert operations acquire exclusive locks on their target rowgroups, and as long as parallel inserts load from separate data files, they are guaranteed not to overlap the rowgroups they insert into.
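A sketch of this pattern, assuming two hypothetical source files, is to run a separate BULK INSERT in each of two sessions:

```sql
-- Session 1 (file path and format options are illustrative assumptions):
BULK INSERT dbo.Fact_Sale
FROM 'C:\loads\sales_2024_part1.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Session 2, run concurrently against a different source file:
BULK INSERT dbo.Fact_Sale
FROM 'C:\loads\sales_2024_part2.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
```

Because each session loads from its own file into its own rowgroups, the loads proceed in parallel without blocking one another.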
When data is inserted into a partitioned columnstore index, each row is first assigned a partition, and then each group of rows is inserted into its respective partition. Therefore, whether bulk load processes are used depends on the number of rows inserted into each target partition, rather than the total number of rows inserted by the source query. Typically, analytic tables have a current/active partition that accepts new data, whereas the remaining partitions contain older data that is no longer written to (outside of one-off data loads, software releases, or maintenance events).
Performance of Bulk Loading into Columnstore Indexes
An Insert of 102,400 Rows into a Clustered Rowstore Index
Query to Calculate the Transaction Size for a Clustered Rowstore Insert
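A query along these lines can measure transaction size using SQL Server's transaction DMVs. The DMVs below are real system views; the target table and source names are hypothetical. The log usage must be read while the transaction is still open:

```sql
BEGIN TRANSACTION;

-- Hypothetical fully logged insert into a rowstore table.
INSERT INTO dbo.Fact_Sale_Rowstore (SaleID, SaleDate, Amount)
SELECT TOP (102400) SaleID, SaleDate, Amount
FROM dbo.Sales_Source
ORDER BY SaleID;

-- Log bytes consumed by the current session's open transaction.
SELECT ddt.database_transaction_log_bytes_used
FROM sys.dm_tran_database_transactions AS ddt
INNER JOIN sys.dm_tran_session_transactions AS dst
    ON dst.transaction_id = ddt.transaction_id
WHERE dst.session_id = @@SPID;

COMMIT TRANSACTION;
```

The same measurement pattern can be reused for the columnstore inserts that follow, allowing a direct comparison of transaction log usage.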
An Insert of 102,400 Rows into a Clustered Columnstore Index
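As a sketch of the test this listing describes (object names assumed), inserting exactly 102,400 rows qualifies for the bulk load path:

```sql
-- 102,400 rows is the minimum count that is bulk loaded directly
-- into a compressed rowgroup; one fewer row (102,399) would be
-- written to the delta store instead.
INSERT INTO dbo.Fact_Sale_Columnstore (SaleID, SaleDate, Amount)
SELECT TOP (102400) SaleID, SaleDate, Amount
FROM dbo.Sales_Source
ORDER BY SaleID;
```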
Note that the transaction log size for the insert is 118,192 bytes, or about 115KB. This constitutes a significant reduction in transaction size when compared to a page compressed rowstore index.
An Insert of 102,399 Rows into a Clustered Columnstore Index
Query to Calculate Log Growth for Columnstore Index and Delta Store
The insert into the delta store was significantly more expensive than any operation demonstrated thus far in this chapter, with a transaction size of 1,300,448 bytes, or about 1.2MB. As a result, there is a significant incentive to take advantage of bulk loading data into columnstore indexes when possible.
Trickle Insert vs. Staged Insert
While inserting into the delta store may seem expensive, it is nowhere near as costly as repeated inserts into a columnstore index would be. If an analytic table is often targeted with many smaller insert operations, there is significant value in collecting those rows into a temporary table first, prior to inserting into the columnstore index.
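One way to implement this (table names are hypothetical) is to accumulate trickle inserts in a staging table and flush it to the columnstore table only once enough rows have collected to qualify for a bulk load:

```sql
-- Trickle inserts target dbo.Fact_Sale_Staging throughout the day.
-- A scheduled process flushes the staging table once it holds enough
-- rows to bypass the delta store.
IF (SELECT COUNT(*) FROM dbo.Fact_Sale_Staging) >= 102400
BEGIN
    BEGIN TRANSACTION;

    INSERT INTO dbo.Fact_Sale (SaleID, SaleDate, Amount)
    SELECT SaleID, SaleDate, Amount
    FROM dbo.Fact_Sale_Staging;

    TRUNCATE TABLE dbo.Fact_Sale_Staging;

    COMMIT TRANSACTION;
END;
```

The transaction ensures that rows are not lost between the insert and the truncate if the flush fails partway through.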
In workloads involving rowstore tables, large insert operations may be batched into small groups of rows in order to reduce contention and decrease the transaction size for each insert. Columnstore indexes do not benefit from micro-batching. When migrating code from rowstore tables to columnstore tables, resources can be saved by adjusting data load processes to operate on significantly larger batches. Batches of 1,048,576 (2^20) rows are optimal, though any batch size greater than or equal to 102,400 rows will ensure that the delta store is not needed. If insert operations contain tens of millions of rows, then those inserts can be broken down into more manageable subunits to prevent a staging or temporary table from becoming too large.
1. Load at least 102,400 rows per insert operation.
2. When possible, load 1,048,576 rows per batch.
3. Only load fewer than 102,400 rows for the remaining rows at the end of the data load.
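These rules can be sketched as a simple batching loop (table and key names are hypothetical) that walks a source table in full-rowgroup-sized batches, leaving only the final remainder as a smaller insert:

```sql
DECLARE @BatchSize BIGINT = 1048576; -- 2^20, one full rowgroup per batch
DECLARE @LastID BIGINT = 0;          -- watermark into the source table
DECLARE @Rows BIGINT = 1;

WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.Fact_Sale (SaleID, SaleDate, Amount)
    SELECT TOP (@BatchSize) SaleID, SaleDate, Amount
    FROM dbo.Fact_Sale_Source
    WHERE SaleID > @LastID
    ORDER BY SaleID;

    SET @Rows = @@ROWCOUNT;

    -- Advance the watermark to the highest key loaded so far.
    SELECT @LastID = MAX(SaleID) FROM dbo.Fact_Sale;
END;
```

Every batch except the last is bulk loaded into a full compressed rowgroup; only the final remainder, if smaller than 102,400 rows, touches the delta store.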
While this guidance may make it seem as though the delta store should be avoided at all costs, the goals of an analytic data store should not be compromised for this purpose. Delaying reporting data to prevent delta store usage is not worthwhile and will not save enough resources to be meaningful. Actively avoiding the insert of repeated small batches will ensure that insert performance is quite good. If the delta store is needed for the tail end of a data load process, then there is no reason to avoid it.
Other Data Load Considerations
VertiPaq optimization, which can greatly improve the effectiveness of columnstore index compression, is not used when a nonclustered rowstore index is present on a clustered columnstore index. Over time, this results in rowgroups that consume more resources than they otherwise would.
For scenarios where nonclustered rowstore indexes are required to support analytic processes, consider the following options for managing performance on the columnstore index.
Drop Nonclustered Indexes During Data Loads
A simple solution to ensure that VertiPaq optimization is always used is to drop nonclustered rowstore indexes during data loads and re-create them afterward. Building the nonclustered indexes can be performed as an online operation and need not interfere with other queries against the table. The downside to this approach is that any analytics that use the nonclustered indexes will be impacted during the data load process, until the nonclustered index rebuild is complete. Additional resources are also required to handle the nonclustered index build. This approach is most effective when the nonclustered indexes are small and relatively quick to build.
On partitioned tables, only the partitions with data that has changed require this consideration. Typically, partitions containing older data that is primarily static and unchanging in nature (not hot data) can be ignored.
1. Drop nonclustered indexes on active partitions.
2. Rebuild the clustered columnstore index on active partitions.
3. Re-create the nonclustered indexes.
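The steps above can be sketched as follows. All index, table, and partition identifiers here are assumptions for illustration; ONLINE = ON requires an edition that supports online index operations:

```sql
-- 1. Drop the nonclustered index that covers the active partition.
DROP INDEX IX_Fact_Sale_SaleDate ON dbo.Fact_Sale;

-- 2. Rebuild the clustered columnstore index on the active partition
--    only, allowing VertiPaq optimization to be applied to its rowgroups.
ALTER INDEX CCI_Fact_Sale ON dbo.Fact_Sale
REBUILD PARTITION = 10; -- assumed current/active partition number

-- 3. Re-create the nonclustered index once the data load is complete.
CREATE NONCLUSTERED INDEX IX_Fact_Sale_SaleDate
ON dbo.Fact_Sale (SaleDate)
WITH (ONLINE = ON);
```

Restricting the rebuild to the active partition keeps the maintenance window short, since static historical partitions are untouched.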
As a planned maintenance event, these are relatively innocuous tasks that should not consume an unusual amount of time, even on a larger table.
Columnstore Reorganize Operations with Each Data Load
Query to Return Details About the Structure of a Columnstore Index
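A query along these lines (the table name is assumed; the DMV is a real system view, available in SQL Server 2016 and later) returns rowgroup-level detail for a columnstore index:

```sql
SELECT
    OBJECT_NAME(rg.object_id) AS table_name,
    rg.row_group_id,
    rg.state_desc,      -- OPEN, CLOSED, COMPRESSED, TOMBSTONE, or INVISIBLE
    rg.total_rows,
    rg.deleted_rows,
    rg.size_in_bytes
FROM sys.dm_db_column_store_row_group_physical_stats AS rg
WHERE rg.object_id = OBJECT_ID('dbo.Fact_Sale')
ORDER BY rg.row_group_id;
```

Running this before and after index maintenance shows delta rowgroups (state OPEN) transitioning through TOMBSTONE to a single COMPRESSED rowgroup.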
Index Reorganize Command to Compress Delta Store
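The reorganize command takes this general form (index and table names assumed); the COMPRESS_ALL_ROW_GROUPS option forces open delta rowgroups to be compressed rather than waiting for them to fill:

```sql
ALTER INDEX CCI_Fact_Sale ON dbo.Fact_Sale
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```

Reorganizing is an online operation, which makes it well suited to run as the final step of a data load process.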
The results show that the delta rowgroup affected by the index maintenance is now in an intermediate state. A new object has been created and compressed with the contents of the delta rowgroup inserted into it. The previous delta rowgroup (an uncompressed heap) is left in a tombstone state for the tuple mover to remove at a later point in time. Note the significant size difference between an uncompressed delta rowgroup and a compressed rowgroup.
Once the tuple mover completes its cleanup process, all that remains is a single compressed columnstore rowgroup.
Performing maintenance like this is not necessary but can improve query read speeds against a columnstore index after a data load is completed. Index maintenance will be discussed in far more detail in Chapter 14, including its use as part of data loads and other columnstore processes.
Summary
When loading data into a columnstore index, bulk loading ensures the fastest possible load speeds while minimizing server resource consumption. Bulk loading large numbers of rows in fewer batches is far more efficient than using trickle or small batch inserts.
By focusing data load processing around maximizing the use of compressed columnstore rowgroups and minimizing delta store usage, overall columnstore index performance can be improved, both for the data load processes and analytics that use the newly loaded data.