Data durability concerns

Now that we have a better understanding of how data moves inside a memory-optimized table, we also need to dive a little deeper into the durability of memory-optimized tables. Earlier in the chapter, we saw that we can choose between SCHEMA_AND_DATA and SCHEMA_ONLY durability for these objects. The succinct description of SCHEMA_ONLY durability is that any data (or changes) inside a memory-optimized table with SCHEMA_ONLY durability will be lost when the server restarts. The reason for the restart is irrelevant; when the server is brought back online, the table (and any associated indexes) is recreated and will be empty.

The SCHEMA_AND_DATA option informs SQL Server that you want the data inside the memory-optimized table to be stored permanently on non-volatile storage, ensuring that the data is available even after a server restart. This initially sounds like a problem for a high-performance data processing system, especially when we consider the speed difference between mass storage (HDDs/SSDs) and main system memory (DIMMs). However, the type and amount of data that must be written to durable storage is kept to a minimum, so the chance of a bottleneck while making the data durable is greatly reduced.
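For reference, the durability option is chosen in the WITH clause of CREATE TABLE. The following sketch shows both settings side by side; the table and column names are illustrative only and are not the chapter's test objects:

-- Non-durable: the schema survives a restart, the rows do not
CREATE TABLE dbo.StagingOrders_SchemaOnly
(
    OrderID   INT IDENTITY(1,1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    OrderDate DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

-- Durable: the rows are persisted via the checkpoint and log streams
CREATE TABLE dbo.Orders_Durable
(
    OrderID   INT IDENTITY(1,1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    OrderDate DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);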

The data that gets written to disk contains only the data that has changed, and only the row data of the memory-optimized table, not the index data. Only the index definitions are stored durably; the index content is rebuilt every time the server is restarted. This rebuild is rapid, because SQL Server can create the indexes from data that is already in main memory, which offers extremely low latency and high throughput.
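One way to see that index structures exist only in memory is to query the In-Memory OLTP memory consumers. The following is a sketch against the sys.dm_db_xtp_memory_consumers DMV; the table name is a placeholder and the exact column list varies slightly between SQL Server versions:

SELECT o.name  AS table_name,
       i.name  AS index_name,
       mc.memory_consumer_type_desc,
       mc.allocated_bytes,
       mc.used_bytes
FROM sys.dm_db_xtp_memory_consumers AS mc
JOIN sys.objects AS o
    ON o.object_id = mc.object_id
LEFT JOIN sys.indexes AS i
    ON i.object_id = mc.object_id
   AND i.index_id  = mc.index_id
WHERE o.name = N'Orders_Durable';   -- placeholder table name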

There are two streams of data processed and written to disk: checkpoint streams and log streams.

A checkpoint stream is itself split into two files, together known as a checkpoint file pair (CFP). These CFPs use the filestream API, which we addressed when creating our test In-Memory OLTP database. The two file types in this pair are data streams (containing all row versions inserted in a timestamp interval) and delta streams (containing a list of the row versions deleted from the corresponding data stream file).

The log streams contain the changes made by committed transactions and, like normal transactions, are stored in the SQL Server transaction log. This allows SQL Server to use the entries in the transaction log to reconstruct information during the redo phase of crash recovery.
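For a table created with SCHEMA_AND_DATA durability, these log records can be observed with the fn_dblog() function that we will use again in a moment. In-Memory OLTP log records generally surface with the operation code LOP_HK; the following sketch assumes a durable memory-optimized table has just received committed changes:

SELECT [Current LSN],
       Operation,
       Context,
       [Log Record Length]
FROM sys.fn_dblog(NULL, NULL)
WHERE Operation = 'LOP_HK';   -- In-Memory OLTP (Hekaton) log records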

We can investigate this durability behavior by running our insert stored procedure for the disk-based table and then the one for our SCHEMA_ONLY memory-optimized table. After running each stored procedure, we can inspect the transaction log using the undocumented and unsupported (but, on a test machine, perfectly usable) function fn_dblog() to see how each of these processes affects the log:

fn_dblog() output after disk-based insert run

The output after the memory-optimized insert is as follows:

fn_dblog() output after memory-optimized insert run

In the preceding screenshots, we see a snapshot of the data retrieved immediately after running first the disk-based insert (figure fn_dblog() output after disk-based insert run) and then the memory-optimized insert (figure fn_dblog() output after memory-optimized insert run). The disk-based insert created thousands of log record entries for each iteration of the loop. This takes time to process and inflates the transaction log unnecessarily. The memory-optimized insert caused zero entries in the transaction log, because the table was defined with SCHEMA_ONLY durability and as such has no requirement for log redo or crash recovery. The schema is loaded at server startup and is immediately ready for normal operations.
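The inspection itself needs nothing more elaborate than a SELECT against fn_dblog() in the context of the test database; a minimal sketch, with an illustrative column list, is shown below:

-- How many log records are currently visible in the active log?
SELECT COUNT(*) AS log_record_count
FROM sys.fn_dblog(NULL, NULL);

-- Detail view of the individual log records
SELECT [Current LSN],
       Operation,
       Context,
       [Log Record Length]
FROM sys.fn_dblog(NULL, NULL);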

We can now take a look at our filestream folder, where the checkpoint stream is directed. We will find multiple checkpoint files in this folder. Each checkpoint file contains either data or delta streams, as described earlier in this section. The checkpoint files are created and filled using an append-only mechanism, adding data to the end of the file. A single checkpoint file pair continues to be used until either a manual CHECKPOINT command is issued or 512 MB of transaction information has accumulated since the last automatic checkpoint. When this occurs, the checkpoint process generates a checkpoint file inventory that catalogues the data and delta files and writes this information to the transaction log. These files are used in a recovery scenario to reconstruct the data of memory-optimized tables defined with SCHEMA_AND_DATA durability:

Filestream folder used for storing memory-optimized checkpoint file pairs
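The same information can also be inspected from within SQL Server rather than the file system. The following sketch uses the sys.dm_db_xtp_checkpoint_files DMV; column names differ slightly between SQL Server versions:

-- Optionally force a manual checkpoint so the current data/delta files are closed out
CHECKPOINT;

-- List the checkpoint files together with their type and state
SELECT file_type_desc,      -- DATA or DELTA
       state_desc,          -- e.g. PRECREATED, UNDER CONSTRUCTION, ACTIVE
       file_size_in_bytes
FROM sys.dm_db_xtp_checkpoint_files
ORDER BY file_type_desc, state_desc;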