Shrinking an index

Recent versions of Elasticsearch provide a new way to optimize an index: via the shrink API, it's possible to reduce the number of shards of an index.

This feature targets several common scenarios:

  • Choosing the wrong number of shards during the initial design. Sizing the shards without knowing the real data/text distribution often leads to an oversized shard count
  • Reducing the number of shards to reduce memory and resource usage
  • Reducing the number of shards to speed up searching

Getting ready

You need an up-and-running Elasticsearch installation, as used in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following commands, use the index created in the Creating an index recipe.

How to do it...

The HTTP method used is POST. The URL format for shrinking an index is:

http://<server>/<source_index_name>/_shrink/<target_index_name>

To shrink an index, we will perform the steps given as follows:

  1. We need all the primary shards of the index to be shrunk on the same node, so we need the name of the node that will host the shrunken index. We can retrieve it via the _nodes API:
            curl -XGET 'http://localhost:9200/_nodes?pretty'
    

    In the result there will be a similar section:

                    ....
                    "nodes" : {
                         "5Sei9ip8Qhee3J0o9dTV4g" : {
                         "name" : "Gin Genie",
                         "transport_address" : "127.0.0.1:9300",
                         "host" : "127.0.0.1",
                         "ip" : "127.0.0.1",
                         "version" : "5.0.0-alpha4",
                    ....
    

    The name of my node is Gin Genie.
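As an alternative sketch, the _cat API returns the node names in a more compact, tabular form (assuming the same localhost cluster as above):

```shell
# List node name, IP, and role for every node in the cluster.
curl -XGET 'http://localhost:9200/_cat/nodes?v&h=name,ip,node.role'
```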

  2. Now we can change the index settings, forcing allocation to a single node for our index, and disabling the writing for the index. This can be done via:
            curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
              "index.routing.allocation.require._name": "Gin Genie",
              "index.blocks.write": true
            }'
    
  3. We need to check that all the shards have relocated. We can check for the green status:
            curl -XGET 'http://localhost:9200/_cluster/health?pretty' 
    

    The result will be:

                {
                  "cluster_name" : "ESCookBook3",
                  "status" : "green",
                  "timed_out" : false,
                  "number_of_nodes" : 2,
                  "number_of_data_nodes" : 2,
                  "active_primary_shards" : 15,
                  "active_shards" : 15,
                  "relocating_shards" : 0,
                  "initializing_shards" : 0,
                  "unassigned_shards" : 15,
                  "delayed_unassigned_shards" : 0,
                  "number_of_pending_tasks" : 0,
                  "number_of_in_flight_fetch" : 0,
                  "task_max_waiting_in_queue_millis" : 0,
                  "active_shards_percent_as_number" : 50.0
               }
    
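Before shrinking, it can also help to verify that every primary shard of the index actually sits on the chosen node; a sketch using the _cat/shards API:

```shell
# Show each shard of myindex with its state and the node it lives on;
# all primaries should report the node chosen in step 1.
curl -XGET 'http://localhost:9200/_cat/shards/myindex?v'
```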
  4. The index must be in a read-only state to be shrunk. If the write block was not already applied in step 2, we can disable writing via:
            curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
              "index.blocks.write": true
            }'
    
  5. If we consider the index created in the Creating an index recipe, the shrink call for creating the reduced_index will be:
            curl -XPOST 'http://localhost:9200/myindex/_shrink/reduced_index' -d '{
              "settings": {
                "index.number_of_replicas": 1,
                "index.number_of_shards": 1,
                "index.codec": "best_compression"
              },
              "aliases": {
                "my_search_indices": {}
              }
            }'
    
  6. The result returned by Elasticsearch should be:
           {"acknowledged":true}
    
  7. We can also wait for a yellow status, which tells us the index is ready to work:
            curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow'
    
  8. Now we can remove the write block by changing the index settings:
            curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
              "index.blocks.write": false
            }'
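It is also good practice to remove the single-node allocation requirement set in step 2 once the shrink has completed; setting the value to null clears the filter (a sketch, assuming the same myindex used above):

```shell
# Clear the allocation filter so shards can again be
# balanced freely across the cluster.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.routing.allocation.require._name": null
}'
```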
    

How it works...

The shrink API reduces the number of shards, executing the following steps:

  1. Elasticsearch creates a new target index with the same definition as the source index, but with a smaller number of primary shards.
  2. Elasticsearch hard-links (or copies) segments from the source index into the target index.

Note

If the filesystem doesn't support hard-linking, then all segments are copied into the new index, which is a much more time-consuming process.

Elasticsearch recovers the target index as though it were a closed index that has just been reopened. On a Linux system the process is very fast due to hard-links.
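The hard-link behavior that makes shrinking cheap can be seen with plain filesystem commands; a minimal sketch, independent of Elasticsearch itself (the file names are illustrative only):

```shell
# Two directory entries sharing one inode: this is why a shrink on
# Linux costs almost no extra disk space or copy time.
tmpdir=$(mktemp -d)
echo "segment data" > "$tmpdir/segment_0"
ln "$tmpdir/segment_0" "$tmpdir/segment_0_link"   # hard link, no data copied
inode1=$(ls -i "$tmpdir/segment_0" | awk '{print $1}')
inode2=$(ls -i "$tmpdir/segment_0_link" | awk '{print $1}')
[ "$inode1" = "$inode2" ] && echo "same inode: hard link"
rm -r "$tmpdir"
```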

The prerequisites for executing a shrink are as follows:

  • All the primary shards must be on the same node
  • The target index must not exist
  • The target number of shards must be a factor of the number of shards in the source index
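The last prerequisite means the source shard count must be an exact multiple of the target count: for example, 15 shards can shrink to 5, 3, or 1, but not to 4. A quick sketch of the check:

```shell
# A shrink from source_shards to target_shards is only valid when the
# source count is an exact multiple of the target count.
source_shards=15
target_shards=5
if [ $((source_shards % target_shards)) -eq 0 ]; then
  echo "ok: can shrink $source_shards shards to $target_shards"
else
  echo "invalid: $source_shards is not a multiple of $target_shards"
fi
```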

There's more...

This Elasticsearch functionality supports new usage scenarios.

The first scenario is overestimating the number of shards. If you don't know your data, it's difficult to choose the correct number of shards to use, so an Elasticsearch user often oversizes the shard count.

Another interesting scenario is using shrinking to boost indexing speed. The main way to speed up writing a high number of documents into Elasticsearch is to create indices with many shards (in general, ingestion speed is roughly the documents/second ingested by a single shard multiplied by the number of shards). Standard allocation spreads the shards across different nodes, so generally the more shards you have, the faster the writing speed: to achieve fast writes you might create 15 or 30 shards per index. After the indexing phase, the index no longer receives new records (as with time-based indices) and is only searched, so you can then shrink its shards to speed up searching.
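The steps of this recipe can be combined into a single sketch for such a time-based index once indexing has finished (the node name, index names, and the wait_for_no_relocating_shards parameter are assumptions based on the examples above; adapt them to your cluster):

```shell
#!/bin/bash
# Shrink workflow sketch: relocate primaries, block writes, shrink, unblock.
ES=http://localhost:9200
SRC=myindex
DST=reduced_index
NODE="Gin Genie"

# 1. Force all shards onto one node and block writes.
curl -XPUT "$ES/$SRC/_settings" -d '{
  "index.routing.allocation.require._name": "'"$NODE"'",
  "index.blocks.write": true
}'

# 2. Wait until relocation has finished.
curl -XGET "$ES/_cluster/health?wait_for_no_relocating_shards=true"

# 3. Shrink into a single-shard index.
curl -XPOST "$ES/$SRC/_shrink/$DST" -d '{
  "settings": { "index.number_of_shards": 1 }
}'

# 4. Wait for the new index, then remove its write block.
curl -XGET "$ES/_cluster/health/$DST?wait_for_status=yellow"
curl -XPUT "$ES/$DST/_settings" -d '{ "index.blocks.write": false }'
```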

See also

  • In this chapter, refer to the ForceMerge an index recipe to optimize your indices for searching