
8. Bulk Loading Data

Edward Pollack, Albany, NY, USA

Any analytic data store requires the ability to perform data loads quickly and efficiently. Bulk loading is a minimally logged process that allows data to be inserted directly into a columnstore index. This not only bypasses the delta store but also results in a transaction size that reflects the compression of the target data, greatly reducing the amount of data written to the transaction log.

Bulk Load Processes Explained

Traditional transactional workloads are fully logged. In a fully logged transaction, enough metadata is written to the transaction log that SQL Server can recover from a failed transaction. In addition, data is written to the transaction log to ensure that point-in-time recovery is possible.

Fully logged transactions in OLTP scenarios are ideal as they allow a database to be rolled back to a specific point in time in the event of application problems. In addition, point-in-time recovery allows for research and forensics into the details of transactions, when needed.
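For reference, point-in-time recovery is typically performed by restoring a full backup with NORECOVERY and then replaying transaction log backups with STOPAT. The following is a minimal sketch; the database name, file paths, and timestamp are hypothetical placeholders:
-- Restore the most recent full backup, leaving the database able to accept log backups.
RESTORE DATABASE SalesDB
FROM DISK = N'D:\Backup\SalesDB_Full.bak'
WITH NORECOVERY;
-- Replay the log up to a specific moment, then bring the database online.
RESTORE LOG SalesDB
FROM DISK = N'D:\Backup\SalesDB_Log.trn'
WITH STOPAT = N'2022-01-15 09:30:00', RECOVERY;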

Sometimes, transactional writes are intentionally subdivided into smaller batches to ensure that each transaction is short and fast and causes minimal contention with other queries against the same data.
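As an illustration of this pattern, a large write can be broken into short transactions that each commit quickly. This is a minimal sketch using hypothetical table and column names:
-- Delete in small batches so each transaction is short and holds locks briefly.
DECLARE @BatchSize INT = 5000;
DECLARE @RowsAffected INT = 1;

WHILE @RowsAffected > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.StagingOrders
    WHERE ProcessedDate < DATEADD(DAY, -30, GETUTCDATE());

    SET @RowsAffected = @@ROWCOUNT;
END;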

The cost of fully logged transactions is that more data needs to be written to the transaction log, which results in
  • More storage space consumed in the transaction log

  • More storage space consumed in transaction log backup files

  • More CPU/memory to process transaction log backups

  • Longer query duration for write operations

Analytic workloads differ greatly in their recovery needs as data loads tend to be larger, less frequent, and asynchronous. When analytic data loads fail, the most common recourse is to investigate the problem, resolve it, and then rerun the data load. Point-in-time recovery within data load processes is less important than simply having recovery available to a point in time prior to the data load. Therefore, OLAP data loads can benefit greatly from minimally logged insert operations.

Outside of columnstore indexes, bulk loading data is limited to a handful of write operations (two of which are sketched after this list), such as
  • BCP

  • BULK INSERT operations

  • SELECT INTO operations

  • Partition switching
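For reference, two of these operations are sketched below. The file path and table names are hypothetical, and minimal logging also depends on factors such as the database recovery model and table locking:
-- BULK INSERT loads a flat file; TABLOCK helps enable minimal logging.
BULK INSERT dbo.SaleStaging
FROM 'D:\Import\sales.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- SELECT INTO creates and loads a new table in a single minimally logged operation.
SELECT *
INTO dbo.Sale_Copy
FROM Fact.Sale;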

Because point-in-time recovery can be critical to transactional systems, any process that utilizes bulk loading needs to be documented well enough that it is not accidentally used in a scenario where it is not desired.

Columnstore indexes use built-in bulk load processes that automatically insert larger batches directly into rowgroups without using the delta store. This greatly improves insert speeds for large analytic data loads and reduces transaction log bloat resulting from those processes.

Bulk Loading into Columnstore Indexes

The breakpoint for deciding whether to use a bulk load process or the delta store is 102,400 rows. Inserts of fewer than 102,400 rows will always be written to the delta store via a fully logged process, whereas inserts of 102,400 rows or more will be written directly to rowgroups in the columnstore index, bypassing the delta store. This decision is illustrated in Figure 8-1.
Figure 8-1

Determination of process for bulk loading data into a columnstore index

Unlike some of the other types of minimally logged insert operations in SQL Server, there are no prerequisites to take advantage of bulk loading data into a columnstore index. There is no need to adjust isolation levels, use explicit table locking, or adjust parallelism settings. Since bulk loading data will be the desired operation for large inserts into columnstore indexes, it is the default and will be used automatically when possible.
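One way to confirm whether an insert used the bulk load path is to inspect rowgroup states afterward: COMPRESSED rowgroups were bulk loaded, whereas OPEN rowgroups belong to the delta store. A brief sketch, assuming a hypothetical columnstore table Fact.Sale_CCI:
-- Check rowgroup states after a data load.
SELECT
       row_group_id,
       state_desc,
       total_rows,
       size_in_bytes
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('Fact.Sale_CCI');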

If an insert operation contains more than 2^20 (1,048,576) rows, it will be subdivided into multiple inserts in the following fashion:
  1. Each batch of 2^20 rows will be bulk inserted into the columnstore index until there are fewer than 2^20 rows remaining to insert.

  2. If the remainder of rows is greater than or equal to 102,400, it will also be bulk inserted into the columnstore index.

  3. If the remainder of rows is less than 102,400, it will be inserted into the delta store.
For example, if an insert of 3,000,000 rows is made to a columnstore index, it will be broken down as follows:
  • 2 bulk loads of 1,048,576 rows each

  • 1 bulk load of 902,848 rows (the remainder)

Alternatively, if an insert of 2,100,000 rows occurs, it will be processed like this:
  • 2 bulk loads of 1,048,576 rows each

  • 2,848 rows (the remainder) inserted into the delta store

SQL Server can bulk load data into a columnstore index in multiple parallel threads so long as the data for each thread is targeting a different data file. This is automatic and requires no particular user action to accomplish. Columnstore bulk insert operations acquire exclusive locks against target rowgroups, and so long as parallel inserts target separate data files, they are guaranteed to not overlap the rowgroups they are inserting into.

When data is inserted into a partitioned columnstore index, that data is first assigned a partition, and then each group of rows is inserted into its respective partition. Therefore, whether bulk load processes are used depends on the number of rows inserted into each target partition, rather than the total number of rows inserted by the source query. Typically, analytic tables will have a current/active partition that accepts new data, whereas the remaining partitions contain older data that is no longer written to (outside of one-off data loads, software releases, or maintenance events).
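Because the 102,400-row threshold is evaluated per partition, it can be helpful to check how a pending load distributes across partitions before inserting it. A sketch, assuming a hypothetical partition function pf_SaleDate and staging table #SaleStaging:
-- Count how many rows of a staged load fall into each partition.
SELECT
       $PARTITION.pf_SaleDate([Invoice Date Key]) AS partition_number,
       COUNT(*) AS rows_for_partition
FROM #SaleStaging
GROUP BY $PARTITION.pf_SaleDate([Invoice Date Key]);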

Performance of Bulk Loading into Columnstore Indexes

To demonstrate the impact of bulk loading data into a columnstore index, a test will be conducted against a rowstore copy of Fact.Sale. The table's creation is not shown here; it is an identical copy of the table, with a clustered primary key and page compression rather than a clustered columnstore index. The insert query is shown in Listing 8-1.
INSERT INTO fact.Sale_Transactional
       ([Sale Key], [City Key],[Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID],
     Description, Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit, [Total Including Tax], [Total Dry Items],
     [Total Chiller Items], [Lineage Key])
SELECT TOP 102400
       *
FROM Fact.Sale;
Listing 8-1

An Insert of 102,400 Rows into a Clustered Rowstore Index

The insert takes about 1 second. With the rows inserted, an undocumented but useful system function will be used to read the contents of the transaction log and determine the size of the transaction, as seen in Listing 8-2.
SELECT
       fn_dblog.allocunitname,
       SUM(fn_dblog.[log record length]) AS log_size
FROM sys.fn_dblog (NULL, NULL)
WHERE fn_dblog.allocunitname = ('Fact.Sale_Transactional.PK_Fact_Sale_Transactional')
GROUP BY fn_dblog.allocunitname;
Listing 8-2

Query to Calculate the Transaction Size for a Clustered Rowstore Insert

The result is a single row that indicates the total log size for the insert transaction, as seen in Figure 8-2.
Figure 8-2

Transaction size for an insert of 102,400 rows into a clustered rowstore index

The log size returned for the rowstore insert was 225,856 bytes, or about 220KB. The same insert will now be performed against a clean copy of the table with a columnstore index, as seen in Listing 8-3.
INSERT INTO fact.Sale_CCI_Clean_Test
       ([Sale Key], [City Key],[Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID],
     Description, Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit, [Total Including Tax], [Total Dry Items],
     [Total Chiller Items], [Lineage Key])
SELECT TOP 102400
       *
FROM Fact.Sale;
Listing 8-3

An Insert of 102,400 Rows into a Clustered Columnstore Index

This insert takes less than a second to complete. A new clean table is used to ensure that there are no residual transactions from previous demonstrations that would pollute the log. The results from fn_dblog() are seen in Figure 8-3.
Figure 8-3

Transaction size for an insert of 102,400 rows into a clustered columnstore index

Note that the transaction log size for the insert is 118,192 bytes, or about 115KB. This constitutes a significant reduction in transaction size when compared to a page compressed rowstore index.

With the impact that bulk loading can have on transaction size demonstrated, it is important to illustrate the difference between inserting 102,400 rows into a columnstore index and inserting 102,399 rows. The T-SQL in Listing 8-4 inserts 102,399 rows into a newly created clustered columnstore index.
INSERT INTO fact.Sale_CCI_Clean_Test_2
       ([Sale Key], [City Key],[Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID],
     Description, Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit, [Total Including Tax], [Total Dry Items],
     [Total Chiller Items], [Lineage Key])
SELECT TOP 102399
       *
FROM Fact.Sale;
Listing 8-4

An Insert of 102,399 Rows into a Clustered Columnstore Index

This takes about a second to execute. The query to pull data from fn_dblog() needs to be adjusted slightly to accommodate the log growth due to both the columnstore index itself and the delta store. This is shown in Listing 8-5.
SELECT
       fn_dblog.allocunitname,
       SUM(fn_dblog.[log record length]) AS log_size
FROM sys.fn_dblog (NULL, NULL)
WHERE fn_dblog.allocunitname IN ('Fact.Sale_CCI_Clean_Test_2.CCI_Sale_CCI_Clean_Test_2', 'Fact.Sale_CCI_Clean_Test_2.CCI_Sale_CCI_Clean_Test_2(Delta)')
GROUP BY fn_dblog.allocunitname;
Listing 8-5

Query to Calculate Log Growth for Columnstore Index and Delta Store

Note that the delta store needs to be referenced separately to be included in the results. The transaction log space consumed by each object is seen in Figure 8-4.
Figure 8-4

Transaction size for an insert of 102,399 rows into a clustered columnstore index

The insert into the delta store was significantly more expensive than any operation demonstrated thus far in this chapter, with a transaction size of 1,300,448 bytes, or about 1.2MB. As a result, there is a significant incentive to take advantage of bulk loading data into columnstore indexes when possible.

Trickle Insert vs. Staged Insert

While inserting into the delta store may seem expensive, it is nowhere near as costly as repeated small, trickle inserts into a columnstore index would be. If an analytic table is often targeted with many smaller insert operations, there is significant value in collecting those rows into a staging or temporary table first, prior to inserting them into the columnstore index in bulk.
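A minimal sketch of this staging pattern follows; the table names are hypothetical, and the column lists are assumed to match:
-- Trickle inserts accumulate in dbo.SaleStaging; the flush to the columnstore
-- index only runs once a bulk load sized batch is available.
IF (SELECT COUNT(*) FROM dbo.SaleStaging) >= 102400
BEGIN
    BEGIN TRANSACTION;

    INSERT INTO Fact.Sale_CCI
    SELECT *
    FROM dbo.SaleStaging;

    TRUNCATE TABLE dbo.SaleStaging;

    COMMIT TRANSACTION;
END;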

In workloads involving rowstore tables, large insert operations may be batched into small groups of rows in order to reduce contention and decrease the transaction size for each insert. Columnstore indexes do not benefit from micro-batching. When migrating code from rowstore tables to columnstore tables, resources can be saved by adjusting data load processes to operate on significantly larger batches. Batches of 2^20 (1,048,576) rows are optimal, though any batch size greater than or equal to 102,400 rows will ensure that the delta store is not needed. If insert operations contain tens of millions of rows, then those inserts can be broken down into more manageable subunits to prevent a staging or temporary table from becoming too large.
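As an illustration, a very large load can be carved into 2^20-row batches. This sketch assumes a hypothetical staging table with a dense, sequential integer key (StagingKey) and matching column lists:
-- Load dbo.SaleStaging into the columnstore index in 1,048,576-row batches.
DECLARE @BatchSize INT = 1048576;
DECLARE @CurrentKey BIGINT = (SELECT MIN(StagingKey) FROM dbo.SaleStaging);
DECLARE @MaxKey BIGINT = (SELECT MAX(StagingKey) FROM dbo.SaleStaging);

WHILE @CurrentKey <= @MaxKey
BEGIN
    INSERT INTO Fact.Sale_CCI
    SELECT *
    FROM dbo.SaleStaging
    WHERE StagingKey >= @CurrentKey
          AND StagingKey < @CurrentKey + @BatchSize;

    SET @CurrentKey = @CurrentKey + @BatchSize;
END;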

The best practices for loading data into columnstore indexes are as follows:
  1. Load at least 102,400 rows per insert operation.

  2. When possible, load 1,048,576 rows per batch.

  3. Only load less than 102,400 rows for the remaining rows at the end of the data load.

While this guidance may make it seem as though the delta store should be avoided at all costs, the goals of an analytic data store should not be compromised for this purpose. Delaying reporting data to prevent delta store usage is not worthwhile and will not save enough resources to be meaningful. Actively avoiding the insert of repeated small batches will ensure that insert performance is quite good. If the delta store is needed for the tail end of a data load process, then there is no reason to avoid it.

Other Data Load Considerations

Vertipaq optimization, which can greatly improve the effectiveness of columnstore index compression, is not used when a nonclustered rowstore index is present on a clustered columnstore index. Over time, this results in rowgroups that consume more resources than they otherwise would.

For scenarios where nonclustered rowstore indexes are required to support analytic processes, consider the following options for managing performance on the columnstore index.

Drop Nonclustered Indexes During Data Loads

A simple solution to ensure that Vertipaq optimization is always used is to drop nonclustered rowstore indexes during data loads and re-create them afterward. Building the nonclustered indexes can be performed as an online operation that does not interfere with other queries against the table. The downside to this approach is that any analytics that rely on the nonclustered indexes will be impacted during the data load process, until the nonclustered index rebuild is complete. Additional resources are also required for the nonclustered index build. This approach is most effective when the nonclustered indexes are small and relatively quick to build.
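A sketch of this approach follows, using hypothetical index and table names; note that ONLINE index creation requires an edition that supports it, such as Enterprise:
-- Drop the nonclustered rowstore index so Vertipaq optimization is used during the load.
DROP INDEX IX_Sale_CCI_CustomerKey ON Fact.Sale_CCI;

-- ...the bulk load process runs here...

-- Re-create the index online once the load completes.
CREATE NONCLUSTERED INDEX IX_Sale_CCI_CustomerKey
ON Fact.Sale_CCI ([Customer Key])
WITH (ONLINE = ON);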

On partitioned tables, only the partitions with data that has changed require this consideration. Typically, partitions containing older data that is primarily static and unchanging in nature (not hot data) can be ignored.

For scenarios when nonclustered indexes are necessary and cannot be dropped, consider periodic maintenance to address inefficient compression over time. If a quarterly maintenance window can be scheduled, then the following actions can be taken (sketched in T-SQL after the list):
  1. Drop nonclustered indexes on active partitions.

  2. Rebuild the clustered columnstore index on active partitions.

  3. Re-create the nonclustered indexes.
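These steps might look like the following sketch, assuming partition 12 is the active partition; all object names are hypothetical:
-- 1. Drop the nonclustered rowstore index.
DROP INDEX IX_Sale_CCI_CustomerKey ON Fact.Sale_CCI;

-- 2. Rebuild the clustered columnstore index on the active partition only.
ALTER INDEX CCI_Sale_CCI ON Fact.Sale_CCI
REBUILD PARTITION = 12;

-- 3. Re-create the nonclustered index.
CREATE NONCLUSTERED INDEX IX_Sale_CCI_CustomerKey
ON Fact.Sale_CCI ([Customer Key])
WITH (ONLINE = ON);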

As a planned maintenance event, these are relatively innocuous tasks that should not consume an unusual amount of time, even on a larger table.

Columnstore Reorganize Operations with Each Data Load

When a columnstore index is read, its contents plus the delta store are read together to produce the necessary output. While the delta store cannot grow excessively large, read speeds against a columnstore index that receives periodic data loads can be improved slightly by forcibly compressing the contents of the delta store into the columnstore index. The query in Listing 8-6 examines the contents of the small columnstore index created earlier.
SELECT
       tables.name AS table_name,
       indexes.name AS index_name,
       partitions.partition_number,
       column_store_row_groups.row_group_id,
       column_store_row_groups.state_description,
       column_store_row_groups.total_rows,
       column_store_row_groups.size_in_bytes,
       column_store_row_groups.deleted_rows,
       internal_partitions.internal_object_type_desc,
       internal_partitions.rows,
       internal_partitions.data_compression_desc
FROM sys.column_store_row_groups
INNER JOIN sys.indexes
ON indexes.index_id = column_store_row_groups.index_id
AND indexes.object_id = column_store_row_groups.object_id
INNER JOIN sys.tables
ON tables.object_id = indexes.object_id
INNER JOIN sys.partitions
ON partitions.partition_number = column_store_row_groups.partition_number
AND partitions.index_id = indexes.index_id
AND partitions.object_id = tables.object_id
LEFT JOIN sys.internal_partitions
ON internal_partitions.object_id = tables.object_id
WHERE tables.name = 'Sale_CCI_Clean_Test_2'
ORDER BY indexes.index_id, column_store_row_groups.row_group_id;
Listing 8-6

Query to Return Details About the Structure of a Columnstore Index

The results are found in Figure 8-5.
Figure 8-5

Rowgroup information for a small columnstore index

Note that the entire contents of the columnstore index (102,399 rows) reside in the delta store. The delete bitmap exists as a default and is currently empty. If the operator wishes to move the contents of the delta store into columnstore rowgroups, this can be accomplished by the index maintenance command in Listing 8-7.
ALTER INDEX CCI_Sale_CCI_Clean_Test_2 ON Fact.Sale_CCI_Clean_Test_2 REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
Listing 8-7

Index Reorganize Command to Compress Delta Store

Once complete, the delta store contents are compressed and ready to move into the columnstore index, as seen in Figure 8-6.
Figure 8-6

Effects of index maintenance on a columnstore delta rowgroup

The results show that the delta rowgroup affected by the index maintenance is now in an intermediary state. A new object has been created and compressed with the contents of the delta rowgroup inserted into it. The previous delta rowgroup (an uncompressed heap) is left in a tombstone state for the tuple mover to remove at a later point in time. Note the significant size difference between an uncompressed delta rowgroup and a compressed rowgroup.

At this time, running the same ALTER INDEX statement as before will force the tuple mover to complete this cleanup. Alternatively, waiting for a short amount of time to pass will achieve the same results. After 5 minutes have passed, the contents of this columnstore index are as seen in Figure 8-7.
Figure 8-7

Columnstore index contents after the tuple mover executes

Once the tuple mover completes its cleanup process, all that remains is a single compressed columnstore rowgroup.

Performing maintenance like this is not necessary but can improve query read speeds against a columnstore index after a data load completes. Index maintenance will be discussed in far more detail in Chapter 14, including its use as part of data loads and other columnstore processes.

Summary

When loading data into a columnstore index, bulk loading ensures the fastest possible load speeds while minimizing server resource consumption. Bulk loading large numbers of rows in fewer batches is far more efficient than using trickle or small batch inserts.

By focusing data load processing around maximizing the use of compressed columnstore rowgroups and minimizing delta store usage, overall columnstore index performance can be improved, both for the data load processes and analytics that use the newly loaded data.
