The latest version of Elasticsearch provides a new way to optimize an index: via the shrink API, it is possible to reduce the number of shards of an index.
This feature targets several common scenarios, which are discussed at the end of this recipe.
You need an up-and-running Elasticsearch installation, as used in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To correctly execute the following commands, use the index created in the Creating an index recipe.
The HTTP method used is POST. The URL format for shrinking an index is:
http://<server>/<source_index_name>/_shrink/<target_index_name>
To shrink an index, we will perform the steps given as follows:
First, we need to retrieve the name of the node that will contain the shrunken index, via the _nodes API:
curl -XGET 'http://localhost:9200/_nodes?pretty'
In the result there will be a similar section:
....
"nodes" : {
  "5Sei9ip8Qhee3J0o9dTV4g" : {
    "name" : "Gin Genie",
    "transport_address" : "127.0.0.1:9300",
    "host" : "127.0.0.1",
    "ip" : "127.0.0.1",
    "version" : "5.0.0-alpha4",
....
The name of my node is Gin Genie. Now we can force the allocation of all the shards of the index to this node, and block further writes to the index:
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "settings": {
    "index.routing.allocation.require._name": "Gin Genie",
    "index.blocks.write": true
  }
}'
We need to check that all the shards have been relocated, via the cluster health API:
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
The result will be:
{
  "cluster_name" : "ESCookBook3",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 15,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}
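Instead of checking the health output by eye, a small shell helper can poll until the relocating_shards counter reaches zero. This is a minimal sketch: the host variable and the sed-based JSON extraction are assumptions of the sketch (a tool such as jq would be more robust):

```shell
# Minimal polling sketch: wait until no shards are relocating.
# ES_HOST is an assumption; point it at your cluster.
ES_HOST="${ES_HOST:-http://localhost:9200}"

wait_for_relocation() {
  # Extract "relocating_shards":<n> from the compact health JSON with sed,
  # retrying every 2 seconds, for up to 30 attempts.
  for attempt in $(seq 1 30); do
    count=$(curl -s "$ES_HOST/_cluster/health" \
      | sed -n 's/.*"relocating_shards":\([0-9]*\).*/\1/p')
    [ "$count" = "0" ] && return 0
    sleep 2
  done
  return 1
}
```

Calling wait_for_relocation before the shrink ensures that all the shards of the source index have finished moving to the chosen node.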
The index must be in a read-only state to be shrunk. If the write block was not already applied in the previous step, it can be set via the settings body (the _settings API requires the settings in the request body, not as query parameters):
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{"index.blocks.write": true}'
The shrink call, for creating the reduced_index, will be:
curl -XPOST 'http://localhost:9200/myindex/_shrink/reduced_index' -d '{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  },
  "aliases": {
    "my_search_indices": {}
  }
}'
The result will be:
{"acknowledged":true}
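As a quick check, the shard layout of the new index can be inspected with the _cat APIs. This is a sketch; the host and the index name are taken from the recipe and are assumptions for your cluster:

```shell
# Sketch: verify that reduced_index really ended up with a single primary
# shard. Host and index name are assumptions from the recipe.
verify_shrink() {
  es="${1:-http://localhost:9200}"
  # One line per shard; a successful shrink to 1 shard shows a single primary.
  curl -s "$es/_cat/shards/reduced_index?v"
  # The index settings confirm the new number_of_shards as well.
  curl -s "$es/reduced_index/_settings?pretty"
}
```

Invoke it as verify_shrink, or pass a different host as the first argument.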
We can also wait for a yellow status to check whether the index is ready to work:
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow'
Finally, we can remove the allocation requirement and the write block from the shrunken index by resetting the settings:
curl -XPUT 'http://localhost:9200/reduced_index/_settings' -d '{
  "settings": {
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}'
The shrink API reduces the number of shards, executing the following steps:
Elasticsearch recovers the target index as though it were a closed index that has just been reopened. On a Linux system the process is very fast due to hard-links.
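The hard-link trick behind this speed can be observed with plain shell commands. This small sketch is independent of Elasticsearch: it shows that two hard-linked files share one inode, which is why the shrink "copy" of segment files is almost instantaneous and costs no extra disk space:

```shell
# Create a file and a hard link to it, then compare inode numbers.
# Two names, one inode: no data is duplicated, which mirrors how
# Elasticsearch links segment files from the source into the target index.
workdir=$(mktemp -d)
echo "segment data" > "$workdir/source_segment"
ln "$workdir/source_segment" "$workdir/target_segment"

# Both names report the same inode number:
ls -i "$workdir/source_segment" "$workdir/target_segment"

inode_src=$(ls -i "$workdir/source_segment" | awk '{print $1}')
inode_dst=$(ls -i "$workdir/target_segment" | awk '{print $1}')
[ "$inode_src" = "$inode_dst" ] && echo "hard-linked: no bytes copied"

rm -r "$workdir"
```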
The prerequisites for executing a shrink are as follows:
This functionality provides support for new scenarios in Elasticsearch usage.
The first scenario occurs when you overestimate the number of shards. If you don't know your data, it's difficult to choose the correct number of shards to use, so an Elasticsearch user often tends to oversize the number of shards.
Another interesting scenario is to use shrinking to provide a boost at indexing time. The main way to speed up writing a high number of documents into Elasticsearch is to create indices with a lot of shards (in general, the ingestion speed is roughly the number of shards multiplied by the documents per second ingested by a single shard). The standard allocation moves the shards to different nodes, so generally the more shards you have, the faster the writing speed; to achieve fast writes, you therefore create 15 or 30 shards per index. After the indexing phase, the index no longer receives new records (as with time-based indices): it is only searched, so you can shrink its shards to speed up searching.
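The "index fast, then shrink" pattern for time-based indices can be sketched end to end. The host, the index names, and the node name below are assumptions (the node name reuses the one found earlier in the recipe), and the bulk-indexing step is elided:

```shell
# Sketch of the write-then-shrink lifecycle for a time-based index.
# All names are assumptions; adapt them to your cluster.
ES="${ES:-http://localhost:9200}"

index_fast_then_shrink() {
  # 1. Create a write-optimized index with many shards for fast ingestion.
  curl -XPUT "$ES/logs-write" -d '{
    "settings": { "index.number_of_shards": 15, "index.number_of_replicas": 0 }
  }'

  # 2. ... bulk indexing happens here ...

  # 3. When the index becomes read-only, pin it to one node and block writes.
  curl -XPUT "$ES/logs-write/_settings" -d '{
    "settings": {
      "index.routing.allocation.require._name": "Gin Genie",
      "index.blocks.write": true
    }
  }'

  # 4. Shrink it into a single search-optimized, well-compressed shard.
  curl -XPOST "$ES/logs-write/_shrink/logs-read" -d '{
    "settings": { "index.number_of_shards": 1, "index.codec": "best_compression" }
  }'
}
```

Invoke it as index_fast_then_shrink once the write phase of the index is over.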