CHAPTER 7

image

Indexing Memory-Optimized Tables

The past few chapters have focused on indexing related to specialized indexing capabilities in SQL Server. The common thread through those indexes is their reliance on disk-based storage as the primary medium for storing the data and providing data access. SQL Server 2014 introduces a new manner in which a table can be made available,called a memory-optimized table.

Memory-optimized tables are significantly different from traditional tables, with memory being the primary medium for the tables. In this chapter, you’ll look at how the changes for memory-optimized tables affect your ability to index these tables and how to create the ideal indexes.

Image Note  Depending on the source and conversation, memory-optimized tables are also referred to as in-memory OLTP and Hekaton. This book will use the term memory-optimized tables since it aligns to the terminology used in Microsoft Books Online.

Memory-Optimized Tables Overview

Before digging into the indexing options for memory-optimized tables, let’s start with the basics of memory-optimized tables. A memory-optimized table is a new table type introduced in SQL Server 2014. Unlike traditional tables, along with their indexes, memory-optimized tables reside entirely in memory. Memory-optimized tables are supported through disk structures but are not reliant on them for transactional processing. This differs compared to traditional tables, where the table is based on disk storage and typically only a portion of the table and its indexes are in memory.

The value provided by memory-optimized tables is the performance gains that creating tables in this fashion provides for a database. By hosting and managing the entire table in memory, transactions do not need to wait for the disk to bring data to memory for transaction processing to proceed.

The implementation of memory-optimized tables results in a few changes to the way in which tables are architected in SQL Server. First, since the tables are now memory resident, it makes more sense for them to be structured in a manner that is optimal for accessing the data in memory versus retrieving a subset of the data from disk. For this reason, memory-optimized tables use hash and range indexes versus B-trees for storing data. Additionally, the tables don’t synchronize to disk or move around in memory like a traditional table, removing the need for latching between disk and memory structures.

To create memory-optimized tables in a database, there are a few things that need to be prepared within a database. To begin, a filegroup dedicated to memory-optimized data needs to be added to the database with a file to support the memory-optimized tables. Additionally, the database should have the property MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT enabled in most scenarios. This property hints to all transactions against memory-optimized tables to set the isolation level to SNAPSHOT. In Listing 7-1, the database MemOptIndexing is prepared using these settings.

Image Note  The file location for the filestream indicated in Listing 7-1 may need to change to complete in your environment.

To see how to create a memory-optimized table, review the code in Listing 7-2. In this code example, you create the table dbo.SalesOrderHeader. There are two items to be aware of in the table schema. First, the option that creates the table as a memory-optimized table is the MEMORY_OPTIMIZED=ON option. The second is the inclusion of a NONCLUSTERED HASH index on the table to index the data within memory. Other than those items, the table is much like any other table created in SQL Server.

The additional code in Listing 7-2 inserts data into MemOptIndexing.dbo.SalesOrderHeader and queries that same data. To demonstrate the impact of querying the data in a memory-optimized table, a similar query against AdventureWorks2014.sales.SalesOrderHeader is included. Examining the results, shown in Listing 7-3, provides a few insights into memory-optimized tables. First, there is no I/O impact from the memory-optimized tables. While the AdventureWorks2014.sales.SalesOrderHeader query requires 689 reads, there are no reads for MemOptIndexing.dbo.SalesOrderHeader. Second, the amount of CPU time for MemOptIndexing.dbo.SalesOrderHeader is much lower, at 16 ms, than the CPU for the query against AdventureWorks2014.sales.SalesOrderHeader, which is 78 ms.

While there are many more aspects of memory-optimized tables that can be discussed, this overview is intended to provide some of the most basic aspects. The rest of this chapter will examine indexes on memory-optimized tables. While memory-optimized tables are completely in memory, indexes are still required. Being in memory doesn’t prevent the need to find specific data or filter resultsets.

To support indexing of memory-optimized tables, SQL Server supports two indexing options. These are hash and range indexes. Each memory-optimized table can support up to eight indexes. If a primary key is defined, this will be supported by one of the two index types and be one of the allowed indexes. If there is no primary key defined, the table must be created with at least one index. Additionally, since memory-optimized tables do not support schema changes, you will need to define all indexes for a memory-optimized table when it is created.

The remainder of this chapter will focus on the hash and range types of indexes with considerations for building each of them against a memory-optimized table.

Image Note  Index operations for memory-optimized tables are nonlogged activities since they occur only within memory and have no impact on the state of the data stored in the table.

Hash Indexes

The first index type for memory-optimized tables is the hash index. Hash indexes separate the data in the table into a fixed number of buckets. Rows inserted into the table then use a hash function to assign rows to the available buckets. These buckets provide the ability for queries to return specific rows based on the point lookup operations. Hash indexes are designed for types of query workloads where individual rows from the tables need to be retrieved.

When creating hash indexes, the number of buckets to create is a function of the number of rows planned, or expected, for the table. If there will be a large number of rows, a larger bucket count is required. This is an important part of creating and tuning hash indexes on memory-optimized tables. As the number of rows in each bucket increases, the time required to retrieve data increases. The ratio of rows to buckets is something that needs to be carefully considered.

It is generally recommended to over-allocate buckets to hash indexes, with the range of buckets recommended to be between two to five times the number of rows. While this is the recommended practice, it is important to consider how you will be using the tables within your environment and size the buckets accordingly.

To demonstrate the impact of bucket size, Listing 7-4 creates two memory-optimized tables. Both tables have 1,000,000 rows in them, with the first table having 1,000 buckets and the second having 1,000,000 buckets. With this configuration, there will be approximately 1,000 rows per bucket for the first table and 1 row per bucket in the second table.

Image Warning  The code in Listing 7-4 can take up to five minutes to execute.

Prior to executing the next piece of code for this demo, create an Extended Events session based on the Query Detail Tracking template. The session should be created with the default configuration and then launched to the Extended Events live data viewer. Add the columns session_id, statement, writes, physical_reads, logical_reads, duration, and cpu_time to the live viewer window. Lastly, filter the session_id in the output by the session_id values for Listing 7-5, 7-6, 7-8, and 7-9 and the event sql_statement_completed.

When you execute a query against both tables to return the same row, shown in Listing 7-5, you get a slight performance difference between the two. In this sample execution, the execution time for the first query was 493 μs versus 70 μs, shown in Figure 7-1. While this difference is small in total duration, the difference between them for the same query is significant. In solutions where memory-optimized tables will be used to retrieve results, this kind of performance difference can be important.

9781484211199_Fig07-01.jpg

Figure 7-1. Duration for memory-optimized table queries with hash indexes

It’s important not to interpret the results of the last demo to indicate that a 1:1 ratio of buckets to rows is the best practice. If you run another set of queries that retrieves more than a single row, in this case five discrete rows as shown in Listing 7-6, the performance is impacted. In this case, the performance advantage shifts to buckets with more rows. The results now are 1,221 μs for the query on the first table versus 202,218 μs for the second table’s query, as shown in Figure 7-2.

9781484211199_Fig07-02.jpg

Figure 7-2. Duration for memory-optimized table queries with hash indexes

When working with hash indexes, it is important to understand how SQL Server is using the buckets in the hash. One important thing to note is that just because there are enough buckets for each table to have their own bucket, that doesn’t mean each row will get its own bucket. To review the statistics for hash indexes created in this chapter, run the query in Listing 7-7 that accesses the DMV sys.dm_db_xtp_hash_index_stats. This DMV provides information on the number of buckets and how those buckets are populated.

Reviewing the results of Listing 7-7, provided in Figure 7-3, you can see there are a few interesting items to notice. To start, the first table with the 1,000 buckets specified (SalesOrderHeader_low) actually has 1,024 buckets for the index. This is because buckets are created in allocations that align to the power of two. This is the same reason there are 1,048,576 buckets for the 1,000,000 bucket index on SalesOrderHeader_high. The next interesting piece is the number of empty buckets in the hash index on SalesOrderHeader_high. With 1,000,000 rows and more than a million buckets, there are still 37 percent of the buckets that are empty. This happens because with the deterministic hashing function, some hashed values are repeated within the range of values before all the buckets are utilized. This is something to consider when building hash indexes, especially when aiming to have a 1:1 ratio of buckets to rows.

9781484211199_Fig07-03.jpg

Figure 7-3. Output from hash bucket statistics query

Image Note  The query performance details were captured using an Extended Events session based on the Query Detail Tracking template with a filter for the session that included the demonstration queries. You can find more information on building sessions at https://www.simple-talk.com/sql/database-administration/getting-started-with-extended-events-in-sql-server-2012/.

Hash indexes with memory-optimized tables are an important indexing choice when there will be many queries that will access individual rows and seek operations that will provide optimal plans. When building the tables and hash indexes, focus on setting the number of buckets to a size that presents a reasonable ratio of rows to buckets with consideration for the number of rows that will be retrieved through queries.

Range Indexes

The second type of index that is supported for memory-optimized tables is the range index. Range indexes are used to support, as the name implies, range scans of data, along with ordered scans. They leverage a variation of a B-tree, which Microsoft is calling a Bw-tree. The key difference between these two structures is the reference between the nodes in the Bw-tree, which refers to memory locations versus physical page location. When it comes to determining whether to include a range index on a memory-optimized table, the primary consideration will be whether there will be range scans or ORDER BY statements that the table needs to support.

Image Note  You can find more information on Bw-trees at http://research.microsoft.com/pubs/178758/bw-tree-icde2013-final.pdf.

To create a range index on a memory-optimized table, you declare the index within the schema of the table by indicating a NONCLUSTERED index with the key values. As shown in Listing 7-8, the index IX_SalesOrderHeader is a range index on the SalesOrderID column. Unlike hash indexes, there are no other configuration items to consider and the index doesn’t need any bucket consideration when they are created.

To demonstrate the value of the range index on memory-optimized tables, let’s use the table created in Listing 7-8 on some queries that will leverage range scans. For this, you will use Listing 7-9, which queries the new table (dbo.SalesOrderHeader_high_range) and the table previously created with just a hash index (dbo.SalesOrderHeader_high). By executing a query against both of those tables for the rows with SalesOrderID between 100 and 10,000, you can see there is a big difference in the execution time. The query with just the hash index on the table runs in 372 ms (see Figure 7-4), while the query against the table with the range index runs in 180 ms. In this case, range indexes provide a substantial performance improvement.

9781484211199_Fig07-04.jpg

Figure 7-4. Duration for memory-optimized table queries with range scan

In a similar fashion, ORDER BY statements are also greatly improved when range indexes on memory-optimized tables are available. Using the code in Listing 7-10, you run two queries against the same tables from the previous demonstration. In this case, using the output in Figure 7-5, you see that the range scan took 103 μs compared to the 372,000 μs that the hash index requires. Again, you get a substantial performance improvement through the use of the range index.

9781484211199_Fig07-05.jpg

Figure 7-5. Duration for memory-optimized table queries with TOP clause and ORDER BY

Similar to hash indexes, range indexes provide a substantial performance improvement opportunity when creating memory-optimized tables. The need to perform range scans and order results is a common scenario in many applications. These operations are greatly improved by the use of range indexes.

Summary

This chapter looked at the types of indexes available for memory-optimized tables. These options, which include hash and range indexes, are the power behind the ability for memory-optimized tables to provide the incredible performance that they deliver. As demonstrated, each index types aligns to different querying patterns on your tables, and it is important to understand those patterns to build the right indexes for your memory-optimized tables and the workloads that they support.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.69.157