Selecting hardware

Elasticsearch workloads are primarily memory-bound and rely on the inverted index: the more data that fits in RAM, the faster queries run. This statement cannot always be generalized, however; it depends on the nature of your data and the type of operations or workload you expect.

Using Elasticsearch doesn't mean that every operation must be performed in memory. Elasticsearch also uses on-disk data very efficiently, especially for aggregation operations.

All datatypes (except analyzed strings) support a special data structure called doc_values, which organizes the data on disk in a columnar fashion. doc_values is useful for sorting and aggregation operations. Since doc_values is enabled by default for all datatypes except analyzed strings, sorts and aggregations run mostly off disk; those fields do not need to be loaded into memory in order to aggregate or sort by them.
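As an illustration, doc_values can be disabled per field at mapping time for fields you will never sort or aggregate on, saving disk space. This is a minimal sketch; the index and field names are hypothetical, and the exact mapping syntax varies between Elasticsearch versions:

```json
PUT /logs
{
  "mappings": {
    "properties": {
      "session_id": {
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}
```

Leaving doc_values at its default (enabled) is the right choice for any field that might appear in a sort or an aggregation.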

Because Elasticsearch can scale horizontally, choosing hardware is a relatively easy decision. It is fine to start with nodes of around 16 or 32 GB of RAM and around 8 CPU cores. As we will see in the coming sections, the Elasticsearch JVM cannot have more than 32 GB of heap; effectively, there is little point in a machine with more than 64 GB of RAM. SSDs are recommended if you are planning to do heavy aggregations.
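The common practice, as a sketch, is to give the JVM about half the machine's RAM (staying under the ~32 GB compressed-pointers threshold) and pin the minimum and maximum heap to the same value so the heap never resizes at runtime; the remaining RAM is left to the operating system's filesystem cache. The value below assumes a 32 GB node:

```
# jvm.options (sketch, assuming a 32 GB RAM node):
# set min and max heap equal, well under 32 GB
-Xms16g
-Xmx16g
```

Setting -Xms and -Xmx to the same value avoids costly heap-resize pauses after startup.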

It is important to benchmark with your initial hardware first, and then add more nodes or upgrade the existing ones as your workload demands.
