© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
E. PollackAnalytics Optimization with Columnstore Indexes in Microsoft SQL Serverhttps://doi.org/10.1007/978-1-4842-8048-5_9

9. Delete and Update Operations

Edward Pollack1  
(1)
Albany, NY, USA
 

Modifying data within a highly compressed structure is expensive and requires additional processes to manage it successfully. Whereas insert operations can benefit from bulk loading to streamline data load processes, delete and update operations require using the delete bitmap and delta store to manage changes to existing data.

The Cost of Modifying Data

In Chapter 5, columnstore compression was discussed in extensive detail. An important takeaway is that some forms of compression, such as run-length encoding, depend on the exact contents of the underlying data. Columnstore compression is exceptionally good at shrinking data by a considerable amount by applying a variety of encoding and compression algorithms and optimizations. Consider the sample data in Figure 9-1.
Figure 9-1

Sample data encoded with run-length encoding

The sample data is encoded with dictionary compression, reordered with Vertipaq optimization, and further compressed with run-length encoding. If a process deletes the last two rows in the table, then it is necessary to decompress the data fully, delete the rows, and then recompress it. Figure 9-2 shows how this process would impact the data.
Figure 9-2

Deleting rows from run-length encoded data

The resulting set of indexed ID groups contains only 4 rows, rather than 5 (as index ID 4 has been removed), and the count for the index ID 3 has been reduced.

If the table were larger and the deletion impacted more rows, then it is likely that many rowgroups would need to be decompressed, adjusted, and recompressed. In the process, many pages would need to be updated. This operation would quickly grow to be prohibitively expensive. A balance needs to be maintained between the speed of write operations and the speed of read operations, and in this scenario, the ability to load and modify data quickly needs to be prioritized.

Delete Operations

In a columnstore index, the cost to decompress rowgroups, delete rows, and recompress them is prohibitively high. The more rowgroups a delete operation targets, the greater this cost would become. To mitigate this cost, a structure called the delete bitmap is used to track deletions from the columnstore index.

The delete bitmap is a heap that references the rows within underlying rowgroups. When a row is deleted, the data within the rowgroup remains unchanged and the delete bitmap is updated with a reference to the deleted row. As such, deleted rows in columnstore indexes can be seen as soft deleted, where removed rows are flagged, but not physically deleted.

Note that delete operations against rows in the delta store do not need to use the delete bitmap as the delta store is a rowstore heap and rows can simply be deleted from it as needed, without the need for soft deletes.

When a query is executed against a rowgroup containing deleted rows, the delete bitmap is consulted and deleted rows are excluded from the results. The delete bitmap may be visualized as seen in Figure 9-3.
Figure 9-3

The delete bitmap and its relationship to a columnstore index

The underlying rows within the rowgroup are unchanged when a deletion occurs against them. Instead, the delete bitmap tracks which rows are deleted and is consulted when future queries are issued against this rowgroup. Because of this, deleting data in a columnstore index will not reclaim storage space.

Consider the query shown in Listing 9-1.
DELETE
FROM Fact.Sale_CCI
WHERE [Invoice Date Key] = '1/1/2016';
Listing 9-1

Query That Deletes Data from a Columnstore Index

When executed, this query will delete all sale data that was invoiced on 1/1/2016. Before doing so, the rowgroup metadata can be consulted to confirm that there are currently no deleted rows in the rowgroup. The query in Listing 9-2 will return metadata about rowgroups in this columnstore index, including the number of deleted rows, if any.
SELECT
       tables.name AS table_name,
       indexes.name AS index_name,
       partitions.partition_number,
       column_store_row_groups.state_description,
       column_store_row_groups.total_rows,
       column_store_row_groups.size_in_bytes,
       column_store_row_groups.deleted_rows
FROM sys.column_store_row_groups
INNER JOIN sys.indexes
ON indexes.index_id = column_store_row_groups.index_id
AND indexes.object_id = column_store_row_groups.object_id
INNER JOIN sys.tables
ON tables.object_id = indexes.object_id
INNER JOIN sys.partitions
ON partitions.partition_number = column_store_row_groups.partition_number
AND partitions.index_id = indexes.index_id
AND partitions.object_id = tables.object_id
WHERE tables.name = 'Sale_CCI'
ORDER BY indexes.index_id, column_store_row_groups.row_group_id;
Listing 9-2

Query That Returns Metadata About Deleted Rows in Rowgroups

The rowgroup metadata can be seen in Figure 9-4.
Figure 9-4

Rowgroup metadata for a columnstore index, including deleted rows

As is expected of a columnstore index that has had no delete operations executed against it, the deleted_rows column in sys.column_store_row_groups shows zero for each rowgroup. Executing the query in Listing 9-1 will delete a total of 15,400 rows from the columnstore index. Figure 9-5 shows the metadata details for the columnstore index after the deletion has completed.
Figure 9-5

Rowgroup metadata after a deletion has occurred

Each rowgroup now contains some number of deleted rows, depending on the number of rows within them that happen to have been invoiced on 1/1/2016. Note that the total rows in each rowgroup and the size in bytes have not changed. This is expected and reflects the fact that rows were soft deleted via the delete bitmap.

Because rows were removed from all rowgroups in the index, a deletion without the aid of the delete bitmap would have required a rebuild of the entire index, which would be a prohibitively expensive operation to perform while a process waits for a single day’s worth of rows to be deleted. Instead, the delete of 15,400 rows completed exceptionally quickly, in under a second, thanks to the delete bitmap!

The only way to clean up deleted rows and free up the space consumed within rowgroups is to perform an index rebuild on the columnstore index or on any partitions impacted by the deletion. This is generally not needed unless the amount of deleted data becomes large with respect to the overall size of the index. Like with classic rowstore indexes, fragmentation is not problematic until there is too much of it, at which time a rebuild can resolve it. Note that index reorganize operations do not remove deleted rows from columnstore indexes! Chapter 14 dives into index maintenance in detail, discussing how and when it is needed, and best practices for its use.

Update Operations

In a columnstore index, an update is executed as two operations: a delete and an insert. While logically, the update will perform as a single atomic unit, under the covers it will consist of
  1. 1.

    A set of deleted rows, flagged in the delete bitmap

     
  2. 2.

    A set of newly inserted rows, written to the delta store

     

This means that an update operation will need to write to both the delete bitmap and the delta store in order to complete successfully. It is also important to note that the insert operations that result from an update operation against a columnstore index will exclusively use the delta store and cannot take advantage of bulk load processes.

Before continuing, a rebuild will be executed against the columnstore index to allow for easier visualization of the results. The query in Listing 9-3 will rebuild the index.
ALTER INDEX CCI_fact_sale_CCI ON Fact.Sale_CCI REBUILD;
Listing 9-3

Query to Rebuild a Columnstore Index, Removing the Delete Bitmap

With the data now clean, the query in Listing 9-4 can be used to return metadata about all rowgroups in the columnstore index, including both the delta store and delete bitmap.
SELECT
       tables.name AS table_name,
       indexes.name AS index_name,
       partitions.partition_number,
       column_store_row_groups.row_group_id,
       column_store_row_groups.state_description,
       column_store_row_groups.total_rows,
       column_store_row_groups.size_in_bytes,
       column_store_row_groups.deleted_rows,
       internal_partitions.internal_object_type_desc,
       internal_partitions.rows
FROM sys.column_store_row_groups
INNER JOIN sys.indexes
ON indexes.index_id = column_store_row_groups.index_id
AND indexes.object_id = column_store_row_groups.object_id
INNER JOIN sys.tables
ON tables.object_id = indexes.object_id
INNER JOIN sys.partitions
ON partitions.partition_number = column_store_row_groups.partition_number
AND partitions.index_id = indexes.index_id
AND partitions.object_id = tables.object_id
LEFT JOIN sys.internal_partitions
ON internal_partitions.object_id = tables.object_id
WHERE tables.name = 'Sale_CCI'
ORDER BY indexes.index_id, column_store_row_groups.row_group_id;
Listing 9-4

Query to Return Delete Bitmap and Delta Store Metadata for Rowgroups in a Columnstore Index

The results in Figure 9-6 show a clean columnstore index with no deleted rows or entries in the delta store.
Figure 9-6

Metadata for rowgroups with no deletes/updates against them

With a pristine columnstore index available, the effects of update operations can be easily visualized against it. Consider the query shown in Listing 9-5.
SELECT
       *
FROM Fact.Sale_CCI
WHERE [Invoice Date Key] = '1/2/2016';
Listing 9-5

Query to Identify a Data Set to Update

This SELECT query identifies a total of 18,150 rows in the table that match the filter criteria of an invoice date of 1/2/2016. Next, an update will be made against two columns in the table, as shown in Listing 9-6.
UPDATE Sale_CCI
       SET [Total Dry Items] = [Total Dry Items] - 1,
           [Total Chiller Items] = [Total Chiller Items] + 1
FROM Fact.Sale_CCI
WHERE [Invoice Date Key] = '1/2/2016';
Listing 9-6

Query to Update Data from a Columnstore Index

Returning to the rowgroup metadata query in Listing 9-4, the results of the UPDATE statement can be reviewed, with a sample seen in Figure 9-7.
Figure 9-7

Metadata for rowgroups after updating 18150 rows

The metadata after the UPDATE was executed shows that every rowgroup in the columnstore index was impacted as 18,150 rows were deleted and 18,150 rows were then inserted. Sys.internal_partitions shows a delete bitmap and delta store object, each containing 18,150 total rows. The rowgroup detail illustrates how many rows were updated per rowgroup. In addition, the new rowgroup (number 25) shows the new open delta store that was created for the newly inserted rows.

The resulting structure underscores the fact that an UPDATE against a columnstore index executes as a combination of discrete delete and insert operations. Performing each of those operations sequentially would yield similar results.

Consider an update to all rows in this columnstore index for the range of invoice dates from 1/3/2016 up to 1/8/2016. A count of these rows shows a total of 148,170 that match that date filter. Before running an update, the columnstore index will be rebuilt, which will clean up the deleted rows and delta stores, allowing for a better demonstration. Listing 9-7 provides the query to rebuild the index and then update these rows.
ALTER INDEX CCI_fact_sale_CCI ON Fact.Sale_CCI REBUILD;
GO
UPDATE Sale_CCI
       SET [Total Dry Items] = [Total Dry Items] - 1,
           [Total Chiller Items] = [Total Chiller Items] + 1
FROM Fact.Sale_CCI
WHERE [Invoice Date Key] >= '1/3/2016'
AND [Invoice Date Key] < '1/8/2016';
Listing 9-7

Query to Update 148,170 Rows in a Columnstore Index After Rebuilding the Index

When updating 148,170 rows, the first thing to note is that it took 5 full seconds to execute! The previous update of 18,150 rows completed almost instantly after being executed. Viewing the metadata reveals the reason for this, as seen in Figure 9-8.
Figure 9-8

Metadata for rowgroups after updating 148,170 rows

The key takeaway from the metadata following the larger update is that the delta store contains all 148,170 rows updated in the operation. Normally, any INSERT operations of 102,400 rows or more will benefit from a minimally logged bulk insert process, but UPDATE operations cannot benefit from it. Because the UPDATE is composed of both an INSERT and DELETE in the same transaction, it is not possible to fork the single transaction into a fully logged DELETE and a minimally logged INSERT.

Bulk insert processes on columnstore indexes can save immense resources when used, but are not allowed to violate the ACID (Atomic, Consistent, Isolated, and Durable) properties of a SQL Server database. Attempting to splice together a fully logged and minimally logged transaction into a single larger transaction would require the creation of a transaction that would provide point-in-time restore capabilities for the DELETE, but not the INSERT. While it is possible to conceive of ways for SQL Server to architect its way around that limitation, doing so would be confusing to anyone using a columnstore index and could result in unexpected results when restoring databases containing columnstore indexes that are subject to frequent UPDATE operations.

The key takeaway is that an UPDATE against a columnstore index will be comprised of a DELETE operation and a fully logged INSERT into the delta store. As the number of rows updated increases, the performance of those operations will decrease dramatically. The delta store was built to handle small numbers of rows – either trickle loads or the residual rows from a larger data load process. It was not built for large volumes of rows at one time and as a matter of course will perform poorly for those applications.

There is one final demonstration to underscore this challenge, and that is to update more rows than would fit into a single rowgroup. Consider the sample query in Listing 9-8.
SELECT
       COUNT(*)
FROM Fact.Sale_CCI
WHERE [Invoice Date Key] >= '1/8/2016'
AND [Invoice Date Key] < '3/5/2016';
Listing 9-8

Query to Count Rows with a Wide Date Range on a Columnstore Index

When executed, the result is 1,159,180 rows counted. This is greater than the 1,048,576 rows that can be stored in a single columnstore rowgroup. Listing 9-9 rebuilds the index once more and then performs an update against all 1,159,180 rows identified in Listing 9-9. This single UPDATE operation will result in the deletion of old rows, the insertion of new rows into the delta store, and the compression of most of the new rows into rowgroups.
ALTER INDEX CCI_fact_sale_CCI ON Fact.Sale_CCI REBUILD;
GO
UPDATE Sale_CCI
       SET [Total Dry Items] = [Total Dry Items] - 1,
           [Total Chiller Items] = [Total Chiller Items] + 1
FROM Fact.Sale_CCI
WHERE [Invoice Date Key] >= '1/8/2016'
AND [Invoice Date Key] < '3/5/2016';
Listing 9-9

Query to Update 1,159,180 Rows in a Columnstore Index After Rebuilding the Index

The update of 1,159,180 rows took almost a minute to complete. Figure 9-9 shows the resulting columnstore metadata immediately after this large UPDATE operation completes.
Figure 9-9

Metadata for rowgroups after updating 1,159,180 rows

Note that there are multiple delta stores present. The open delta store (number 26) will remain open to accept future inserted data. The closed delta store will be processed by the tuple mover asynchronously and be compressed into a permanent columnstore rowgroup. Figure 9-10 shows the metadata for this columnstore index after a minute has passed and the tuple mover has executed.
Figure 9-10

Metadata for rowgroups after updating 1,159,180 rows and allowing the tuple mover to execute

The rowgroup labeled in a state of “TOMBSTONE” (number 25) is no longer a logical part of the columnstore index and will be cleaned up in the future, either via the tuple mover or index maintenance. Rowgroup number 27 contains the contents of rowgroup number 25 when it was processed by the tuple mover. To summarize the changes made to the columnstore index by the update of 1,159,180 rows:
  1. 1.

    1,159,180 rows were marked as deleted in the delete bitmap across all rowgroups.

     
  2. 2.

    1,048,576 rows were inserted into one delta store, filling it to capacity.

     
  3. 3.

    110,604 rows were inserted into another delta store and remain in an open state awaiting future inserted rows.

     
  4. 4.

    The 1,048,576 rows in the full delta store are processed by the tuple mover and compressed into a permanent rowgroup.

     

This is a significant amount of work for a single UPDATE statement, and it scales poorly with increasing row counts.

In general, updates should be limited to small row counts or avoided altogether. There is value in rethinking how code is written to refactor an UPDATE against a columnstore index into a set of deletes and inserts or to manage an update via intermediary processes that avoid it altogether. The performance of UPDATE operations on columnstore indexes will be unpredictable and a large enough row count will result in transactions large enough to create resource pressure on the SQL Server.

Chapter 15 (best practices) will discuss in greater detail how to avoid updates and a variety of tactics that can be used when migrating UPDATE code from rowstore indexes into columnstore indexes.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.25.74