Setting up for Linux systems

If you are using a Linux system, you typically need to apply some extra setup, both to improve performance and to avoid production problems when you manage many indices.

This recipe covers two common errors that happen in production:

  • Too many open files, which can corrupt your indices and your data
  • Slow performance in search and indexing due to the garbage collector

Note

Another serious problem arises when you run out of disk space. In this scenario, some files can become corrupted. To protect your indices from corruption and possible data loss, it is best practice to monitor the available storage space.
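
As a quick check, and assuming that your node listens on the default localhost:9200 address, you can monitor the disk usage of every node with the cat allocation API; the disk.percent column shows how full each node's data path is:

    curl -XGET 'http://localhost:9200/_cat/allocation?v'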

Getting ready

You need a working Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in this chapter, and a simple text editor to change the configuration files.

How to do it...

To improve performance on Linux systems, we will perform the following steps:

  1. First, you need to change the current limits for the user that runs the Elasticsearch server. In these examples, we will call this user elasticsearch.
  2. To allow Elasticsearch to manage a large number of files, you need to increase the number of file descriptors (the number of files) that a user can manage. To do so, you must edit your /etc/security/limits.conf file and add the following lines at the end:
            elasticsearch - nofile 65536
            elasticsearch - memlock unlimited
  3. Then, a machine restart is required to be sure that the changes are applied (see the verification commands after this list).
  4. Newer versions of Ubuntu (that is, version 16.04 or later) may skip /etc/security/limits.conf in the init.d scripts; in these cases, you need to edit /etc/pam.d/su and uncomment the following line:
            # session    required   pam_limits.so 
    
  5. To control memory swapping, you need to set this parameter in elasticsearch.yml:
             bootstrap.memory_lock: true
  6. To fix the amount of memory used by the Elasticsearch server, you need to set ES_MIN_MEM and ES_MAX_MEM to the same value in $ES_HOME/bin/elasticsearch.in.sh. Alternatively, you can set ES_HEAP_SIZE, which automatically initializes the minimum and maximum values to the same amount.
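
To verify that the new limits are really picked up, you can run the following commands. They are only a minimal sketch and assume that the user is called elasticsearch, as in step 1; depending on your distribution, you may need a full login shell (for example, su - elasticsearch) for the PAM limits to apply:

    # both values should report 65536 after the change to limits.conf
    sudo -u elasticsearch bash -c 'ulimit -Hn; ulimit -Sn'
    # the maximum locked memory should report "unlimited"
    sudo -u elasticsearch bash -c 'ulimit -l'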

How it works...

The standard limit on file descriptors (the maximum number of open files for a user) is typically 1024. When you store a lot of records in several indices, you run out of file descriptors very quickly; your Elasticsearch server becomes unresponsive, your indices may become corrupted, and you can lose your data.

By changing the limit to a very high number, Elasticsearch never hits the maximum number of open files.
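
If you want to check how close a running node is to this limit, the nodes stats API exposes both the current and the maximum number of file descriptors (again assuming the default localhost:9200 endpoint):

    curl -XGET 'http://localhost:9200/_nodes/stats/process?filter_path=**.open_file_descriptors,**.max_file_descriptors&pretty'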

The other settings for memory prevent Elasticsearch from swapping memory and give a performance boost in a production environment. These settings are required because, during indexing and searching, Elasticsearch creates and destroys a lot of objects in memory. This large number of create/destroy actions fragments the memory, reducing performance: the memory becomes full of holes, and when the system needs to allocate more memory, it suffers the overhead of finding compacted memory. If you don't set bootstrap.memory_lock: true, Elasticsearch dumps the whole process memory to disk and defragments it back into memory, freezing the system. With this setting, the defragmentation step is done entirely in memory, with a huge performance boost.
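
To confirm that the memory lock actually took effect after a restart, you can query the nodes info API (assuming the default localhost:9200 endpoint); if mlockall is reported as false, re-check the memlock limit and the bootstrap.memory_lock setting described earlier:

    curl -XGET 'http://localhost:9200/_nodes?filter_path=**.mlockall&pretty'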
