CHAPTER 1

image

Why In-Memory OLTP?

This introductory chapter explains the importance of in-memory databases and the problems they address. It provides an overview of the Microsoft In-Memory OLTP implementation (code name Hekaton) and its design goals. Finally, this chapter discusses the high-level architecture of the In-Memory OLTP Engine and how it is integrated into SQL Server.

Background

Way back when SQL Server and other major databases were originally designed, hardware was very expensive. Servers at that time had just one or very few CPUs, and a small amount of installed memory. Database servers had to work with data that resided on disk, loading it into memory on demand.

The situation has changed dramatically since then. During the last 30 years, memory prices have dropped by a factor of 10 every 5 years. Hardware has become more affordable. It is now entirely possible to buy a server with 32 cores and 1TB of RAM for less than $50,000. While it is also true that databases have become larger, it is often possible for active operational data to fit into the memory.

Obviously, it is beneficial to have data cached in the buffer pool. It reduces the load on the I/O subsystem and improves system performance. However, when systems work under a heavy concurrent load, it is often not enough. SQL Server manages and protects page structures in memory, which introduces large overhead and does not scale well. Even with row-level locking, multiple sessions cannot modify data on the same data page simultaneously and must wait for each other.

Perhaps the last sentence needs to be clarified. Obviously, multiple sessions can modify data rows on the same data page, holding exclusive (X) locks on different rows simultaneously. However, they cannot update physical data page and row objects simultaneously because it could corrupt the in-memory page structure. SQL Server addresses this problem by protecting pages with latches. Latches work in a similar manner to locks, protecting internal SQL Server data structures on the physical level by serializing access to them, so only one thread can update data on the data page in memory at any given point of time.

In the end, this limits the improvements that can be achieved with the current database systems architecture. Although you can scale hardware by adding more CPUs and cores, that serialization quickly becomes a bottleneck and a limiting factor in improving system scalability. Likewise, you cannot improve performance by increasing the CPU clock speed because the silicon chips would melt down. Therefore, the only feasible way to improve database system performance is by reducing the number of CPU instructions that need to be executed to perform an action.

Unfortunately, code optimization is not enough by itself. Consider the situation where you need to update a row in a table. Even when you know the clustered key value, that operation needs to traverse the clustered index tree, obtaining latches and locks on the data pages and a row. In some cases, it needs to update nonclustered indexes, obtaining the latches and locks there. All of that generates log records and requires writing them and the dirty data pages to disk.

All of those actions can lead to a hundred thousand or even millions of CPU instructions to execute. Code optimization can help reduce this number to some degree, but it is impossible to reduce it dramatically without changing the system architecture and the way the system stores and works with data.

These trends and architectural limitations led the Microsoft team to the conclusion that a true in-memory solution should be built using different design principles and architecture than the classic SQL Server Database Engine. The original concept was proposed at the end of 2008, serious planning and design started in 2010, and actual development began in 2011.

The main goal of the project was to build a solution that will be 100 times faster than the existing SQL Server Engine, which explains the code name Hekaton (Greek for one hundred). This goal has yet to be achieved; however, the first production release of In-Memory OLTP can provide 20X-40X performance improvements in certain scenarios.

It is also worth mentioning that the Hekaton design has been targeted towards the OLTP workload. As all of us know, specialized solutions designed for particular tasks and workloads usually outperform general-purpose systems in the targeted areas. The same is true for In-Memory OLTP. It shines with large and very busy OLTP systems that support hundreds or even thousands of concurrent users. At the same time, In-Memory OLTP is not necessarily the best choice for Data Warehouse workload, where other SQL Server components could outperform it.

In-Memory OLTP has been designed with the following goals:

  • Optimize data storage for main memory: Data in In-Memory OLTP is not stored on on-disk data pages nor does it mimic an on-disk storage structure when loaded into memory. This permits the elimination of the complex buffer pool structure and the code that manages it. Moreover, indexes are not persisted on disk, and they are re-created upon startup when the data from memory-resident tables is loaded into memory.
  • Eliminate latches and locks: All In-Memory OLTP internal data structures are latch- and lock-free. In-Memory OLTP uses a new multiversion concurrency control to provide transaction consistency. From a user standpoint, it behaves similar to the regular SNAPSHOT transaction isolation level; however, it does not use a locking or tempdb version store under the hood. This schema allows multiple sessions to work with the same data without locking and blocking each other and improves the scalability of the system, allowing it to fully utilize modern multi-CPU/multi-core hardware.
  • Use native compilation: T-SQL is an interpreted language that provides great flexibility at the cost of CPU overhead. Even a simple statement requires hundreds of thousands of CPU instructions to execute. The In-Memory OLTP Engine addresses this by compiling row-access logic and stored procedures into native machine code.

The In-Memory OLTP Engine is fully integrated in the SQL Server Engine, which is the key differentiator of the Microsoft implementation compared to other in-memory database solutions. You do not need to perform complex system refactoring, splitting data between in-memory and conventional database servers, or moving all of the data from the database into memory. You can separate in-memory and disk data on a table-by-table basis, which allows you to move active operational data into memory, keeping other tables and historical data on disk. In some cases, that migration can even be done transparently to client applications.

It sounds too good to be true and, unfortunately, there are still plenty of roadblocks that you may encounter when working with this technology. The first release of In-Memory OLTP supports just a subset of the SQL Server data types and features, which often requires you to perform code and schema refactoring to utilize it. We will discuss those limitations later in the book; however, you need to know that Microsoft is fully committed to this project. You can expect that future versions of In-Memory OLTP will have a bigger surface area and fewer restrictions compared to the initial release. In fact, you can already see the changes in the CTP releases of SQL Server 2016 and in SQL Databases in Microsoft Azure.

In-Memory OLTP Engine Architecture

In-Memory OLTP is fully integrated into SQL Server, and other SQL Server features and client applications can access it transparently. Internally, however, it works and behaves very differently than the SQL Server Storage Engine. Figure 1-1 shows the architecture of the SQL Server Engine including the In-Memory OLTP components.

9781484211373_Fig01-01.jpg

Figure 1-1. SQL Server Engine architecture

In-Memory OLTP stores the data in memory-optimized tables. These tables reside completely in memory and have a different structure compared to the classic on-disk tables. With one small exception, memory-optimized tables do not store data on the data pages linking the rows together through the chains of memory pointers. It is also worth noting that memory-optimized tables do not share memory with on-disk tables and live outside of the buffer pool.

Image Note  We will discuss memory-optimized tables in detail in Chapter 3.

There are two ways the Database Engine can work with memory-optimized tables. The first is the interop engine. It allows you to reference memory-optimized tables from interpreted T-SQL code. The data location is transparent to the queries; you can access memory-optimized tables, join them together and with on-disk tables, and work with them in the usual way. Most T-SQL features and language constructs are supported in this mode.

You can also access and work with memory-optimized tables using natively compiled stored procedures. You can define those procedures similarly to the regular T-SQL stored procedures using several additional language constructs introduced by In-Memory OLTP.

Natively compiled stored procedures have been compiled into machine code and loaded into SQL Server process memory. Those procedures can introduce significant performance improvements compared to the interop engine; however, they support just a limited set of T-SQL constructs and can access only memory-optimized tables.

Image Note  We will discuss natively compiled stored procedures in Chapter 6.

Now, it’s time to see how In-Memory OLTP components work together, which we will do in the next chapter.

Summary

In-Memory OLTP has been designed using different design principles and architecture than the classic SQL Server Engine. It is a specialized product targeted towards the OLTP workload and can provide up to 20X-40X performance improvements in certain scenarios. Nevertheless, it is fully integrated into the SQL Server Database Engine. The data storage is transparent to the client applications, which do not require any code changes if they use the features supported by In-Memory OLTP.

The data from memory-optimized tables is stored in memory separately from the buffer pool. All of the In-Memory OLTP data structures are completely latch- and lock-free, which allows scaling the systems by adding more CPUs to the servers.

In-Memory OLTP uses native compilation to the machine code for any row-access logic. Moreover, it allows performing native compilation of the stored procedures, which dramatically increase their performance.

The first release of In-Memory OLTP has a large number of limitations; however, Microsoft is fully committed to the project and will address these limitations in future In-Memory OLTP releases.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.218.45