ForceMerge an index

Elasticsearch is built on Lucene, which stores data in segments on disk. During the life of an index, many segments are created and changed. As the number of segments grows, search slows down because every segment has to be read. The ForceMerge operation consolidates the index into fewer segments, which improves search performance.

Getting ready

You need an up-and-running Elasticsearch installation, as used in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following commands, use the index created in the Creating an index recipe.

How to do it...

The HTTP method used is POST. The URL format for optimizing one or more indices is:

http://<server>/<index_name(s)>/_forcemerge

The URL format for optimizing all the indices in a cluster is:

http://<server>/_forcemerge
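
As a minimal sketch of how the per-index and cluster-wide URLs can be assembled in a script, consider the following shell function. The name forcemerge_url and its arguments are hypothetical helpers introduced for illustration; the _forcemerge path is the one used by the curl call in this recipe.

```shell
# Hypothetical helper: print the ForceMerge URL for a server and an
# optional comma-separated list of index names.
forcemerge_url() {
  server="$1"        # for example localhost:9200
  indices="${2:-}"   # for example myindex or index1,index2; empty means all indices
  if [ -n "$indices" ]; then
    printf 'http://%s/%s/_forcemerge\n' "$server" "$indices"
  else
    printf 'http://%s/_forcemerge\n' "$server"
  fi
}

# Example call (requires a running Elasticsearch node):
# curl -XPOST "$(forcemerge_url localhost:9200 myindex)"
```

Keeping the URL construction in one place makes it harder to send the request to the wrong endpoint when switching between a single index and the whole cluster.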

To ForceMerge an index, we will perform the following steps:

  1. If we consider the index created in the Creating an index recipe, the call will be:
            curl -XPOST 'http://localhost:9200/myindex/_forcemerge'
    
  2. The result returned by Elasticsearch should be:
            {
              "_shards" : {
                "total" : 10,
                "successful" : 5,
                "failed" : 0
              }
            }
    

The result contains the shard operation status.
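
A script can check the shard status instead of reading it by eye. The sketch below assumes the response has the shape shown above and extracts the failed count with standard tools. The function name failed_shards is a hypothetical helper, and a real JSON parser would be more robust than this text match.

```shell
# Hypothetical helper: read a ForceMerge response on stdin and print
# the number of failed shards from its "_shards" object.
failed_shards() {
  # Drop spaces and newlines so the pretty-printed JSON becomes one line,
  # then capture the digits that follow "failed":
  tr -d ' \n' | sed -n 's/.*"failed":\([0-9][0-9]*\).*/\1/p'
}

# Example call (requires a running Elasticsearch node):
# curl -s -XPOST 'http://localhost:9200/myindex/_forcemerge' | failed_shards
```

A value other than 0 suggests that the merge did not complete on every shard and that the response deserves a closer look.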

How it works...

Lucene stores your data in several segments on disk. These segments are created when you index a new document/record or when you delete a document.

In Elasticsearch, a deleted document is not removed from disk but only marked as deleted (a tombstone). To free up the space, you need to ForceMerge the index to purge the deleted documents.

Due to all these factors, the number of segments can be large. (For this reason, in the setup we increased the file descriptor limit for Elasticsearch processes.)

Internally, Elasticsearch has a merger that tries to reduce the number of segments, but it is designed to improve indexing performance rather than search performance. The ForceMerge operation in Lucene reduces the segments in an IO-heavy way: it removes unused segments, purges deleted documents, and rebuilds the index with the minimum number of segments.
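
To see the effect of a merge, the number of segments can be compared before and after the call. The sketch below assumes the _cat/segments API is available on your Elasticsearch version and that it prints one line per segment when the ?v header is not requested. The function name count_segments is a hypothetical helper, and the count covers every shard copy of the index, replicas included.

```shell
# Hypothetical helper: count the non-blank lines on stdin. Fed with the
# output of the _cat/segments API, this gives the number of segments.
count_segments() {
  grep -c '[^[:space:]]'
}

# Example calls (require a running Elasticsearch node):
# curl -s 'http://localhost:9200/_cat/segments/myindex' | count_segments
# curl -XPOST 'http://localhost:9200/myindex/_forcemerge'
# curl -s 'http://localhost:9200/_cat/segments/myindex' | count_segments
```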

The main advantages are:

  • Reducing the number of file descriptors in use
  • Freeing memory used by the segment readers
  • Improving search performance because there are fewer segments to manage

Note

ForceMerge is a very IO-heavy operation. The index can be unresponsive during this optimization. It is generally executed on indices that are rarely modified, such as Logstash indices from previous days.

There's more...

You can pass several additional parameters to the ForceMerge call, such as:

  • max_num_segments: The default value is autodetect. For full optimization, set this value to 1.
  • only_expunge_deletes: The default value is false. Lucene does not delete documents from segments, but it marks them as deleted. This flag restricts the merge to segments that contain deleted documents.
  • flush: The default value is true. Elasticsearch performs a flush after force merge.
  • wait_for_merge: The default value is true. It defines whether the request must wait until the merge ends.
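
The parameters are passed as a query string on the ForceMerge URL. As a minimal sketch, the following shell function appends any number of name=value pairs to a base URL. The name forcemerge_with_params is a hypothetical helper; the parameter names in the example calls are taken from the list above, and whether a given combination is accepted depends on the Elasticsearch version.

```shell
# Hypothetical helper: append ForceMerge parameters (name=value pairs)
# to a base URL as a query string and print the result.
forcemerge_with_params() {
  url="$1"
  shift
  sep='?'
  for param in "$@"; do
    url="${url}${sep}${param}"
    sep='&'   # every parameter after the first is joined with &
  done
  printf '%s\n' "$url"
}

# Example calls (require a running Elasticsearch node):
# Full optimization down to a single segment:
# curl -XPOST "$(forcemerge_with_params http://localhost:9200/myindex/_forcemerge max_num_segments=1)"
# Only purge deleted documents, without the final flush:
# curl -XPOST "$(forcemerge_with_params http://localhost:9200/myindex/_forcemerge only_expunge_deletes=true flush=false)"
```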

See also

  • In this chapter, refer to the Refreshing an index recipe to search for more recently indexed data and the Flushing an index recipe to force indexed data to be written to disk