Looking behind the curtain of concurrency

So far, we have seen that we can easily create a table with indexes, query, and insert data into that table with our traditional T-SQL knowledge. It is now also clear that we are able to change an existing implementation of disk-based tables into memory-optimized tables with some simple changes. If we want to push the performance to the max, we can also consider natively compiled stored procedures to further improve upon some quite large performance gains. We have also seen how the two types of index; hash index and non-clustered index, are implemented in the storage engine. However, to better understand how the magic happens, we need to have a closer look at what is happening to our sample table and indexes while we are manipulating them.

We already mentioned at the beginning of this chapter that SQL Server uses a pessimistic concurrency model in the disk-based storage engine and an optimistic concurrency model for the memory-optimized objects. The implementation of this optimistic concurrency model is achieved using a Multi-Version Concurrency Control (MVCC), or to describe it another way, snapshotting the data. MVCC is not a new technology; it has been around in one form or another since the early 1980s and has been an optional isolation level in SQL Server since SQL Server 2005 (READ COMMITTED SNAPSHOT). This technology has been repurposed in the In-Memory OLTP engine to allow for lock and latch-free optimistic concurrency. READ COMMITTED snapshot isolation level provides row versioning for disk-based tables by redirecting write operations into the row version store inside tempdb; however, In-Memory OLTP works entirely inside memory and cannot push row versions into tempdb.

If we take a look at our row header from the beginning of the chapter again, we can investigate the, Begin Ts and End Ts fields:

Memory-optimized table row structure—Header and Payload

When a row is inserted, the Begin Ts timestamp is filled. This timestamp is constructed from two pieces of information. The first is the transaction ID, which is a globally unique value on the instance. This value increments whenever a transaction starts on the instance and is reset when the server restarts. The second portion of this generated timestamp is created from the Global Transaction Timestamp. This is equally unique across the instance, but is not reset at server restart and increments whenever a transaction completes.

In order to avoid the requirement of locking records to update them, memory-optimized tables only ever add new records to a table. Whenever an insert occurs (the simplest change operation), a new row is added to a table, with a Begin Ts at the transaction start time. The End Ts is left empty, allowing SQL Server to know that this row exists inside the database from the Begin Ts and is valid until End Ts (or in the case of the row not being deleted, it is valid until infinite).

The interesting changes occur when we wish to update or delete an existing row. To avoid having to lock the row and perform a delete (for a pure delete) or a delete and insert (for an update), a memory-optimized table will simply insert the updated version of the row with a Begin Ts of the transaction timestamp and insert the same timestamp in the End Ts of the newly deleted row:

Update of a memory-optimized table—row versioning

In the preceding figure, we can see this behavior illustrated. The update statement in the top-left corner shows that an existing row (blue row) needs to be updated. The concurrency model then creates a new copy of the row (green) while noting the current transaction timestamp (orange). At this point, we have a new row which is valid from timestamp 0x3 and an old row that was valid from timestamp 0x1 until 0x3. Any transaction that started at timestamp 0x2 would see the original row (blue), while any transaction from 0x3 onwards would see the new row (green).

As you may realize, the action of continually appending new versions of a row will cause this linked list of row versions to grow over time. This is where the GC comes into play. The GC is a background process which periodically traverses the tables and indexes in the In-Memory storage engine and removes the old and invalidated rows. In order to further accelerate this cleanup process, user processes also assist with identifying entries that can be cleaned up. When a user process scans across a table or index that has stale row version data, it will remove their links from the row chains as it passes them.

As with many features inside SQL Server, we can gain insight into the inner workings of this row version mechanism using Dynamic Management Objects (DMOs).

To take a look at the table we created previously and see the row version information, we query sys.dm_db_xtp_index_stats joining to sys.indexes to retrieve the corresponding object information:

SELECT i.name AS 'index_name', 
    s.rows_returned, 
    s.rows_expired, 
    s.rows_expired_removed 
FROM sys.dm_db_xtp_index_stats s 
    JOIN sys.indexes i 
        ON s.object_id = i.object_id 
           AND s.index_id = i.index_id 
WHERE OBJECT_ID('InMemoryTable') = s.object_id; 
GO 
 
index_name  rows_returned   rows_expired    rows_expired_removed 
----------- --------------- --------------  ---------------- 
NULL        594115           0                0 
NCL_IDX     0                0                0 
PK_UserId 0 649901 399368

Here, we can see that after our data wrangling using the different methods, the table has rows added and also expired (detected as stale) or expired removed (how many rows have been unlinked from the index).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.38.253