In-memory databases

In-memory databases, as the name implies, use the computer's memory, that is, the RAM, to store datasets. Before we look into how in-memory databases work, it is worth recalling how data transfer happens in a typical computer:

Simple Data Flow Computer Hierarchy

As shown in the preceding image, data traverses from disk to memory to the CPU. This is a very high-level generalization. There are conditions under which the CPU does not need to issue an instruction to read data from memory (for example, when the data is already present in the CPU's L2 cache, a part of the CPU reserved for caching data), but fundamentally the flow is linear between the disk, RAM, and CPU.

Data stored on disk is transferred to memory at a rate that depends on the I/O (input/output) throughput of the disk. It takes approximately 10-20 milliseconds (ms) to access data from a conventional spinning hard disk. While the exact number varies depending on the size of the data, the minimum seek time (the time for the disk to find the location of the data) alone is approximately 10-15 ms. Compare this with the time it takes to fetch data from memory, which is approximately 100 nanoseconds (ns). Finally, it takes approximately 7 ns to read data from the CPU's L2 cache.

To put this into perspective, a disk access time of 15 milliseconds, that is, 15,000,000 nanoseconds, is 150,000 times slower than the roughly 100 nanoseconds it takes to access data from memory. In other words, data that is already present in memory can be read an astounding 150,000 times faster than data on disk. This holds essentially for random reads. For sequential reads the gap is arguably less sensational, but memory is still nearly an order of magnitude faster.
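The ratios above follow directly from the approximate latencies quoted in this section. The minimal Python sketch below reproduces the arithmetic using the text's ballpark figures rather than measurements of any real hardware; the constant names are illustrative.

```python
# Approximate latencies quoted in the text, in nanoseconds.
# Ballpark figures for illustration, not measurements of a specific machine.
DISK_SEEK_NS = 15_000_000   # ~15 ms random access on a spinning disk
RAM_NS = 100                # ~100 ns main-memory reference
L2_CACHE_NS = 7             # ~7 ns L2 cache reference


def slowdown(slow_ns: float, fast_ns: float) -> float:
    """Return how many times slower `slow_ns` is than `fast_ns`."""
    return slow_ns / fast_ns


print(f"disk vs RAM:     {slowdown(DISK_SEEK_NS, RAM_NS):,.0f}x")
print(f"RAM vs L2 cache: {slowdown(RAM_NS, L2_CACHE_NS):,.1f}x")
```

The same helper makes it easy to plug in figures for other storage media, such as SSDs, to see how the gap narrows.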

If the disk and RAM were represented as cars, the RAM car would have gone all the way to the moon and be on its way back in the time it would take the disk car to go barely two miles. That is how large the difference is.
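The analogy can be checked with the same 150,000x ratio. This sketch assumes an average Earth-Moon distance of roughly 238,855 miles; the variable names are illustrative.

```python
# Check the car analogy: the RAM car covers 150,000 times the distance
# the disk car covers in the same amount of time.
SPEEDUP = 15_000_000 / 100      # disk-vs-RAM latency ratio from the text
DISK_CAR_MILES = 2              # distance covered by the "disk car"
EARTH_MOON_MILES = 238_855      # average Earth-Moon distance (approximate)

ram_car_miles = DISK_CAR_MILES * SPEEDUP
miles_past_moon = ram_car_miles - EARTH_MOON_MILES

print(f"RAM car: {ram_car_miles:,.0f} miles, "
      f"{miles_past_moon:,.0f} miles into the return trip")
```

Because the RAM car overshoots the Moon but has covered far less than the full round trip, it is indeed "on its way back".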

Hence, it is natural to conclude from this that if the data were stored in RAM, especially in the case of larger datasets, the access time would be dramatically lower, and consequently the time to process the data (at least on the I/O level) would be significantly reduced.

Traditionally, databases stored all of their data on disk. As the web grew, the industry started using memcached, which provides an API for storing key-value pairs in memory. For example, it was, and still is, common to put memcached in front of MySQL databases to cache objects in memory, both to optimize read speeds and to reduce the load on the primary (MySQL) database.

However, as data volumes grew, the complexity of the database-plus-memcached approach began to take its toll, and databases designed exclusively to store data in memory (and sometimes both on disk and in memory) were developed at a rapid pace.

As a result, in-memory databases such as Redis started replacing memcached as the fast cache store behind websites. Although Redis holds data in memory as key-value pairs, it also offers the option to persist the data to disk. This differentiated it from solutions such as memcached, which are strictly memory caches.
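The difference between a pure memory cache and an in-memory store with optional persistence can be sketched as follows. This is a simplified illustration loosely inspired by the append-only-file idea, not Redis's actual implementation; the `PersistentKV` class and its JSON-lines log format are hypothetical. Reads are always served from memory, and when a log path is supplied, each write is also appended to disk so the data survives a restart.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class PersistentKV:
    """In-memory key-value store with an optional append-only log.

    Hypothetical illustration: all reads hit the dict; if `log_path` is
    given, every write is also appended to a file and replayed on startup.
    """

    def __init__(self, log_path: Optional[str] = None) -> None:
        self._data: Dict[str, str] = {}
        self._log_path = log_path
        if log_path and os.path.exists(log_path):
            self._replay(log_path)

    def _replay(self, path: str) -> None:
        # Rebuild the in-memory dict by re-applying every logged write.
        with open(path, encoding="utf-8") as f:
            for line in f:
                op = json.loads(line)
                if op["cmd"] == "set":
                    self._data[op["key"]] = op["value"]
                elif op["cmd"] == "del":
                    self._data.pop(op["key"], None)

    def _append(self, op: dict) -> None:
        if self._log_path:
            with open(self._log_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(op) + "\n")
                f.flush()
                os.fsync(f.fileno())     # ask the OS to flush to stable storage

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._append({"cmd": "set", "key": key, "value": value})

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
        self._append({"cmd": "del", "key": key})

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)       # reads never touch the disk


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "store.log")
    store = PersistentKV(log)
    store.set("page:home", "hello")
    store.set("counter", "42")
    store.delete("page:home")

    restarted = PersistentKV(log)        # simulate a process restart
    print(restarted.get("counter"))      # recovered by replaying the log
    print(restarted.get("page:home"))    # deleted before the restart
```

Without a log path, the class behaves like a strictly in-memory cache: everything is lost when the process exits.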

The primary drivers of the move towards in-memory databases can be summarized as follows:

  • The complexity of managing growing volumes of data, such as web traffic, with traditional combinations such as MySQL + memcached
  • Reduced RAM costs, making it more affordable to purchase larger amounts of memory
  • Overall industry drive towards NoSQL technologies that led to increased focus and community participation towards the development of newer, innovative database platforms
  • Faster data manipulation in memory provided a means to reduce I/O overhead in situations that demanded ultra-fast, low-latency processing of data

Today, some of the leading options for databases that provide in-memory capabilities in the industry include:

Open source:

  • Redis
  • memcacheDB
  • Aerospike
  • VoltDB
  • Apache Ignite
  • Apache Geode
  • MonetDB

Commercial:

  • Kdb+
  • Oracle TimesTen
  • SAP HANA
  • HP Vertica
  • Altibase
  • Oracle Exalytics
  • MemSQL

Note that some of these support hybrid architectures in which data can reside in memory as well as on disk; in general, data is transferred from memory to disk for persistence. Also note that some commercial in-memory databases offer community editions that can be downloaded and used at no charge under the terms of their respective licenses. In these cases, they fall into both the open source and the commercial categories.
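One common way to transfer data from memory to disk for persistence is to periodically write a snapshot of the whole dataset. The minimal sketch below assumes the data fits in a `dict` of strings and uses hypothetical `save_snapshot` and `load_snapshot` helpers. Writing to a temporary file and then renaming it means a crash mid-write never corrupts the previous snapshot; the trade-off is that writes made after the last snapshot are lost on a crash.

```python
import json
import os
import tempfile
from typing import Dict


def save_snapshot(data: Dict[str, str], path: str) -> None:
    """Write the whole in-memory dataset to disk.

    Data goes to a temporary file in the same directory, which is then
    renamed over the old snapshot, so `path` is never left half-written.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())         # ask the OS to flush to stable storage
        os.replace(tmp_path, path)       # atomic replacement on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):     # clean up the partial temp file
            os.remove(tmp_path)
        raise


def load_snapshot(path: str) -> Dict[str, str]:
    """Rebuild the in-memory dataset from the last snapshot, if one exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

In a real system the snapshot would typically run in the background on a timer or after a number of writes, so that reads and writes keep being served from memory in the meantime.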