In this chapter, we will cover the following recipes:
In the Elasticsearch ecosystem, it's important to monitor nodes and clusters to manage and improve their performance and state. There are several issues that can arise at cluster level, such as:
Malfunctions and poor performance can be detected via an API or through frontends, as we will see in Chapter 12, User Interfaces. These frontends give readers a working web dashboard on their Elasticsearch data for monitoring the cluster health, backing up and restoring data, and testing queries before implementing them in code.
In the Understanding cluster, replication and sharding recipe in Chapter 1, Getting Started, we discussed Elasticsearch clusters and how to manage them when they are in a red or yellow state.
Elasticsearch provides a convenient way to manage the cluster state, which is one of the first things to check if any problems occur.
You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To check the cluster health, we will perform the following steps:
The HTTP method is GET and the curl command is as follows:
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
The result will be similar to the following:
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 7,
  "active_shards" : 7,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 7,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}
Every Elasticsearch node keeps the cluster status. The status can be one of three values:
green: This means that everything is okay.
yellow: This means that some nodes or shards are missing, but they don't compromise the cluster functionality. Typically some replicas are missing (a node is down or there are insufficient nodes for replicas), but there is at least one copy of each active shard; reads and writes are working. The yellow state is very common in the development stage, when users typically start a single Elasticsearch server.
red: This indicates that some primary shards are missing and these indices are in red status. You cannot write to the indices that are in red status, and searches may return only partial results. Generally, you'll need to restart the node that is down and possibly create some replicas.
The cluster health response contains a large amount of information, as follows:
cluster_name: This is the name of the cluster.
timed_out: This is a Boolean value indicating whether the REST API hit the timeout set in the call.
number_of_nodes: This indicates the number of nodes that are in the cluster.
number_of_data_nodes: This shows the number of nodes that can store data (see Chapter 2, Downloading and Setup, to set up the different node types).
active_primary_shards: This shows the number of active primary shards; the primary shards are the masters for writing operations.
active_shards: This shows the number of active shards. These shards can be used for search.
relocating_shards: This shows the number of shards that are relocating, that is, migrating from one node to another. This is mainly due to cluster node balancing.
initializing_shards: This shows the number of shards that are in the initializing status. The initializing process is done at shard startup. It's a transient state before becoming active, and it's composed of several steps.
unassigned_shards: This shows the number of shards that are not assigned to a node. This is usually due to having set a replica number larger than the number of nodes. During startup, shards not already initialized or initializing will be counted here.
delayed_unassigned_shards: This shows the number of shards that will be assigned, but their nodes are configured for a delayed assignment. You can get more information on delayed shard assignment at https://www.elastic.co/guide/en/elasticsearch/reference/5.0/delayed-allocation.html
number_of_pending_tasks: This is the number of pending tasks at cluster level, such as updates to the cluster state, index creation, and shard relocations. It should rarely be anything other than 0.
number_of_in_flight_fetch: This is the number of cluster updates that must be executed in shards. As the cluster updates are asynchronous, this number tracks how many still have to be executed in shards.
task_max_waiting_in_queue_millis: This is the maximum time that some cluster tasks have been waiting in the queue. It should rarely be anything other than 0. A value different from 0 means that there is some saturation of cluster resources or a similar problem.
active_shards_percent_as_number: This is the percentage of active shards of the total required by the cluster. In a production environment, it should rarely differ from 100 percent, apart from some relocations and shard initializations.
Installed plugins can play an important role in shard initialization: if you use a mapping type provided by a native plugin and you remove the plugin (or the plugin cannot be initialized due to API changes), the shard initialization will fail. These issues are easily detected by reading the Elasticsearch log file.
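The health fields described in this recipe lend themselves to a simple automated check. The following Python sketch is only an illustration: the function name health_warnings and the wording of the messages are assumptions, not part of Elasticsearch. It works on an already parsed health response, so it can be tried without a running cluster.

```python
def health_warnings(health):
    """Return a list of warnings for a parsed _cluster/health response.

    The function name and message wording are hypothetical; only the
    field names come from the health response shown in this recipe.
    """
    warnings = []
    status = health.get("status")
    if status == "red":
        warnings.append("red: at least one primary shard is missing")
    elif status == "yellow":
        warnings.append("yellow: some replica shards are not allocated")
    if health.get("timed_out"):
        warnings.append("the health call hit its timeout")
    if health.get("unassigned_shards", 0) > 0:
        warnings.append("%d unassigned shards" % health["unassigned_shards"])
    if health.get("number_of_pending_tasks", 0) > 0:
        warnings.append("%d pending cluster tasks" % health["number_of_pending_tasks"])
    if health.get("task_max_waiting_in_queue_millis", 0) > 0:
        warnings.append("cluster tasks are waiting in the queue")
    return warnings


# Values copied from the sample response in this recipe.
sample = {
    "cluster_name": "elasticsearch",
    "status": "yellow",
    "timed_out": False,
    "unassigned_shards": 7,
    "number_of_pending_tasks": 0,
    "task_max_waiting_in_queue_millis": 0,
}
for warning in health_warnings(sample):
    print(warning)
```

On the single-node sample, the check reports the yellow status and the seven unassigned replica shards, which matches the explanation of the yellow state given earlier.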
This API call is very useful; it's possible to execute it against one or more indices to obtain their health in the cluster. This approach allows the isolation of indices with problems. The API call to execute this is:
curl -XGET 'http://localhost:9200/_cluster/health/index1,index2,indexN'
The previous calls also accept additional request parameters to control the health check. These parameters include:
level: This controls the level of the health information that is returned. This parameter accepts only cluster, indices and shards.
timeout: This is the wait time for a wait_for_* parameter (default 30s).
wait_for_status: This makes the server wait for the provided status (green, yellow or red) until the timeout.
wait_for_relocating_shards: This makes the server wait until the provided number of relocating shards has been reached, or until the timeout period has been reached (default 0).
wait_for_nodes: This waits until the defined number of nodes is available in the cluster. The value for this parameter can also be an expression, such as >N, >=N, <N, <=N, ge(N), gt(N), le(N), lt(N).
If the number of pending tasks is different from zero, it's good practice to investigate what those pending tasks are. They can be shown using the following API URL:
curl -XGET 'http://localhost:9200/_cluster/pending_tasks'
The return value is a list of pending tasks. Beware that Elasticsearch applies cluster changes very quickly, so many of these tasks are applied before they can be shown to you.
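The request parameters described in this recipe are plain query-string arguments, and the index names are part of the URL path. As a minimal sketch, assuming a helper named cluster_health_url (a hypothetical name, not an Elasticsearch client function), the URL can be composed in Python with the standard library only:

```python
from urllib.parse import quote, urlencode


def cluster_health_url(base_url, indices=(), **params):
    """Build a _cluster/health URL (hypothetical helper for illustration).

    indices: optional index names to restrict the health check to.
    params:  query arguments such as level, timeout or wait_for_status.
    """
    path = "/_cluster/health"
    if indices:
        path += "/" + ",".join(quote(name, safe="") for name in indices)
    # Sort the parameters so that the generated URL is deterministic.
    query = urlencode(sorted(params.items()))
    return base_url.rstrip("/") + path + ("?" + query if query else "")


print(cluster_health_url(
    "http://localhost:9200",
    ["index1", "index2"],
    wait_for_status="yellow",
    timeout="10s",
))
```

The resulting string can be passed to curl or to any HTTP client. Expressions such as >=2 for wait_for_nodes are percent-encoded by urlencode, so they are safe to use in a query string.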
The previous recipe returns information only about the health of the cluster. If you need more details on your cluster, you need to query its state.
You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To check the cluster state, we will perform the following steps:
The HTTP method is GET, and the curl command is as follows:
curl -XGET 'http://localhost:9200/_cluster/state'
General cluster information:
{ "cluster_name" : "es-cookbook", "version" : 13, "state_uuid" : "QANXXnzhS7aS5HxLlyNKsw", "master_node" : "7NwnFF1JTPOPhOYuP1AVNQ", "blocks" : { },
Node address information:
"nodes" : { "7NwnFF1JTPOPhOYuP1AVNQ" : { "name" : "7NwnFF1", "ephemeral_id" : "OL2uVn3BQ-qMAg32eq_ouQ", "transport_address" : "127.0.0.1:9300", "attributes" : { } } },
"metadata" : { "cluster_uuid" : "8SRm9IGDQcWU7-SoR6gWKg", "templates" : { ".monitoring-data-2" : {...} }, "indices" : { "test-index" : { "state" : "open", "settings" : { "index" : { "creation_date" : "1477683993903", "number_of_shards" : "5", "number_of_replicas" : "1", "uuid" : "KalW90nJSDCTMh42FH62iQ", "version" : { "created" : "5000099" }, "provided_name" : "test-index" } }, "mappings" : { "test-type" : {...truncated...} }, "aliases" : [ "my-cool-alias" ] } } },
"routing_table" : { "indices" : { "test-index" : { "shards" : { "2" : [{ "state" : "STARTED", "primary" : true, "node" : "7NwnFF1JTPOPhOYuP1AVNQ", "relocating_node" : null, "shard" : 2, "index" : "test-index", "allocation_id" : { "id" : "9I5Q8E0VTnCSq7qgpNjGGQ" } }, ... truncated... } } } },
"routing_nodes" : { "unassigned" : [ ], "nodes" : { "7NwnFF1JTPOPhOYuP1AVNQ" : [ { "state" : "STARTED", "primary" : true, "node" : "7NwnFF1JTPOPhOYuP1AVNQ", "relocating_node" : null, "shard" : 0, "index" : ".monitoring-es-2-2016.10.27", "allocation_id" : { "id" : "F-wl5nuwQqO6lkeV6k83iQ" } ...truncated... ] } }, "allocations" : [ ] }
The cluster state contains the information of the whole cluster; it's normal that its output is very large.
The call output contains common fields, which are as follows:
cluster_name: This is the name of the cluster.
master_node: This is the identifier of the master node. The master node is the primary node for cluster management.
The output also contains several sections:
blocks: This section shows the active blocks in a cluster.
nodes: This shows the list of nodes of the cluster. For every node, we have:
  id: This is the hash used to identify the node in Elasticsearch (for example, 7NwnFF1JTPOPhOYuP1AVNQ).
  name: This is the name of the node.
  transport_address: This is the IP address and port used to connect to this node.
  attributes: These are additional node attributes.
metadata: This is the definition of indices (their settings and mappings), ingest pipelines, and stored_scripts.
routing_table: These are the indices/shards routing tables, which are used to select primary and secondary shards and their nodes.
routing_nodes: This is the routing for the nodes.
The metadata section is the most used one, because it contains all the information related to the indices and their mappings. This is a convenient way to gather all the index mappings in one shot; otherwise you'll need to call the get mapping API for every type.
The metadata section is composed of several sections, as follows:
templates: These are templates that control the dynamic mapping for created indices.
indices: These are the indices that exist in the cluster.
ingest: This stores all the ingest pipelines defined in the system.
stored_scripts: This stores the scripts, which are usually in the form of language#script_name.
The indices subsection returns a full representation of the metadata for every index. It contains the following:
state (open/closed): This describes whether an index is open (it can be searched and can index data) or closed. (See the Opening/Closing an Index recipe in Chapter 4, Basic Operations.)
settings: These are the index settings. The most important ones are as follows:
  index.number_of_replicas: This is the number of replicas of this index. It can be changed with an update index settings call.
  index.number_of_shards: This is the number of shards in this index. This value cannot be changed after the index has been created.
  index.codec: This is the codec used to store index data. The default value is not shown, but the LZ4 algorithm is used. If you want a high compression rate, use best_compression and the DEFLATE algorithm (this will slow down write performance slightly).
  index.version.created: This is the Elasticsearch version with which the index was created.
mappings: These are the mappings defined in the index. This section is similar to the get mapping response. (See the How to Get a Mapping recipe in Chapter 4, Basic Operations.)
aliases: This is a list of index aliases, which allow the aggregation of indices under a single name or the definition of alternative names for an index.
The routing records for indices and shards have similar fields, which are as follows:
state (UNASSIGNED, INITIALIZING, STARTED, RELOCATING): This shows the state of the shard or index.
primary (true/false): This shows whether the shard or node is primary.
node: This shows the ID of the node.
relocating_node: This field, if set, shows the ID of the node to which the shard is being relocated.
shard: This shows the number of the shard.
index: This shows the name of the index in which the shard is contained.
The cluster state call returns a lot of information, and it's possible to filter out the different sections via the URL.
The complete URL form of the cluster state API is:
http://{elasticsearch_server}/_cluster/state/{metrics}/{indices}
The metrics value can be used to return only parts of the response. It's a comma-separated list of the following values:
version: This is used to show the version part of the response.
blocks: This is used to show the blocks part of the response.
master_node: This is used to show the master node part of the response.
nodes: This is used to show the node part of the response.
metadata: This is used to show the metadata part of the response.
routing_table: This is used to show the routing_table part of the response.
The indices value is a comma-separated list of index names to include in the metadata.
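When only the routing_table metric is requested, the response is small enough to be summarized easily. The following Python sketch counts shard copies by state; the function name shard_states is hypothetical, and the sample dictionary mimics the routing_table structure shown in this recipe, with an UNASSIGNED replica added purely for illustration.

```python
from collections import Counter


def shard_states(state):
    """Count shard copies by state in a parsed _cluster/state response.

    Hypothetical helper: it only reads the routing_table section, so the
    state can be fetched with the routing_table metric alone.
    """
    counts = Counter()
    indices = state.get("routing_table", {}).get("indices", {})
    for index_routing in indices.values():
        # "shards" maps a shard number to the list of its copies.
        for copies in index_routing.get("shards", {}).values():
            for shard_copy in copies:
                counts[shard_copy["state"]] += 1
    return dict(counts)


# Illustrative data: one started primary and one unassigned replica.
sample_state = {
    "routing_table": {
        "indices": {
            "test-index": {
                "shards": {
                    "2": [
                        {"state": "STARTED", "primary": True, "shard": 2,
                         "index": "test-index"},
                        {"state": "UNASSIGNED", "primary": False, "shard": 2,
                         "index": "test-index"},
                    ]
                }
            }
        }
    }
}
print(shard_states(sample_state))
```

A count of UNASSIGNED or INITIALIZING copies that stays above zero over time is usually a good reason to look at the cluster health in more detail.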
The previous recipes return information at the cluster level; Elasticsearch also provides calls to gather information at the node level. In production clusters, it's very important to monitor nodes via this API to detect misconfiguration and problems relating to different plugins and modules.
You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To get node information, we will perform the following steps:
The HTTP method is GET and the curl commands are as follows:
curl -XGET 'http://localhost:9200/_nodes'
curl -XGET 'http://localhost:9200/_nodes/<nodeId1>,<nodeId2>'
{ "_nodes" : { "total" : 1, "successful" : 1, "failed" : 0 }, "cluster_name" : "elasticsearch", "nodes" : { "7NwnFF1JTPOPhOYuP1AVNQ" : { "name" : "7NwnFF1", "transport_address" : "127.0.0.1:9300", "host" : "127.0.0.1", "ip" : "127.0.0.1", "version" : "5.0.0", "build_hash" : "253032b", "total_indexing_buffer" : 207775334, "roles" : [ "master", "data", "ingest" ], "settings" : { "cluster" : { "name" : "elasticsearch"}, "node" : {"name" : "7NwnFF1"}, "path" : {"logs" : ".../elasticsearch-5.0.0/logs", "home" : "...elasticsearch-5.0.0" }, "client" : {"type" : "node"}, "http" : {"type" : {"default" : "netty4"}}, "transport" : {"type" : {"default" : "netty4"}}, "script" : {"inline" : "true", "stored" : "true"} }, "os" : { "refresh_interval_in_millis" : 1000, "name" : "Mac OS X", "arch" : "x86_64", "version" : "10.12.1", "available_processors" : 8, "allocated_processors" : 8 }, "process" : { "refresh_interval_in_millis" : 1000, "id" : 82228, "mlockall" : false }, "jvm" : { "pid" : 82228, "version" : "1.8.0_101", "vm_name" : "Java HotSpot(TM) 64-Bit Server VM", "vm_version" : "25.101-b13", "vm_vendor" : "Oracle Corporation", "start_time_in_millis" : 1477840185555, "mem" : { "heap_init_in_bytes" : 2147483648, "heap_max_in_bytes" : 2077753344, "non_heap_init_in_bytes" : 2555904, "non_heap_max_in_bytes" : 0, "direct_max_in_bytes" : 2077753344 }, "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep"], "memory_pools" : [ "Code Cache", "Metaspace", "Compressed Class Space", "Par Eden Space", "Par Survivor Space", "CMS Old Gen" ], "using_compressed_ordinary_object_pointers" : "true" }, "thread_pool" : { "force_merge" : { "type" : "fixed", "min" : 1, "max" : 1, "queue_size" : -1 }, ... truncated ... 
}, "transport" : { "bound_address" : [ "[fe80::1]:9300", " [::1]:9300","127.0.0.1:9300"], "publish_address" : "127.0.0.1:9300", "profiles" : { } }, "http" : { "bound_address" : [ "[fe80::1]:9200", "[::1]:9200", "127.0.0.1:9200" ], "publish_address" : "127.0.0.1:9200", "max_content_length_in_bytes" : 104857600 }, "plugins" : [ { "name" : "lang-javascript", "version" : "5.0.0", "description" : "The JavaScript language plugin allows to have javascript as the language of scripts to execute.", "classname" : "org.elasticsearch.plugin.javascript.JavaScriptPlugin" }... truncated ... ], "modules" : [ { "name" : "aggs-matrix-stats", "version" : "5.0.0", "description" : "Adds aggregations whose input are a list of numeric fields and output includes a matrix.","classname" : "org.elasticsearch.search.aggregations.matrix .MatrixAggregationPlugin" }... truncated ... ], "ingest" : { "processors" : [ { "type" : "append" }... truncated ... ] } } } }
The nodes information call provides an overview of the node configuration. It covers a lot of information; the most important sections are as follows:
host: This is the name of the host.
ip: This is the IP address of the host.
version: This is the Elasticsearch version. It's best practice that all the nodes of a cluster have the same Elasticsearch version.
roles: This is a list of roles that this node can cover. Developer nodes usually support all three kinds: master, data and ingest.
transport_address: This is the address used by the node for cluster communication.
settings: This section contains information about the current cluster and the paths of the Elasticsearch node. The most important fields are as follows:
  cluster.name: This is the name of the cluster.
  node.name: This is the name of the node.
  path.*: These are the configured paths of this Elasticsearch instance.
  script: This section is useful to check the script configuration of the node.
os: This section provides operating system information about the node that is running Elasticsearch: processors available and allocated, and the OS version.
process: This section contains information about the currently running Elasticsearch process.
  id: This is the PID of the process.
  mlockall: This flag indicates whether the Elasticsearch process memory is locked in RAM, which prevents it from being swapped out. In production, it should be enabled.
  max_file_descriptors: This is the maximum number of file descriptors.
jvm: This section contains information about the node's Java Virtual Machine: version, vendor, name, pid, memory (heap and non-heap).
thread_pool: This section contains information about the several types of thread pool running in a node.
transport: This section contains information about the transport protocol. The transport protocol is used for intra-cluster communication or by the native client to communicate with a cluster. The response format is similar to the HTTP one, as follows:
  bound_address: These are the addresses bound by the transport. If a specific IP is not set in the configuration, Elasticsearch binds to all the interfaces.
  publish_address: This is the address used for publishing the native transport protocol.
http: This section gives information about the HTTP configuration, such as:
  bound_address: This is the address bound by Elasticsearch.
  max_content_length_in_bytes (default 104857600, that is, 100 MB): This is the maximum size of HTTP content that Elasticsearch will allow to be received. HTTP payloads bigger than this size are rejected.
  publish_address: This is the address used to publish the Elasticsearch node.
plugins: This section lists every plugin installed in the node, providing its name, version, description, and classname.
modules: This section lists every module installed in the node. The structure is the same as the plugins section.
ingest: This section contains the list of active processors in the ingest node.
The API call allows filtering of the sections that must be returned. In the example, we've returned the whole response. Alternatively, we could select one or more of the following sections:
http
thread_pool
transport
jvm
os
process
plugins
modules
ingest
settings
For example, if you need only the os and plugins information, the call will be as follows:
curl -XGET 'http://localhost:9200/_nodes/os,plugins'
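This recipe notes that all the nodes of a cluster should run the same Elasticsearch version. As a rough sketch, the check can be automated on a parsed nodes information response. The function name nodes_by_version and the second node in the sample are assumptions made for the example.

```python
def nodes_by_version(nodes_info):
    """Group node names by Elasticsearch version.

    Hypothetical helper working on a parsed _nodes response; more than
    one key in the result means the cluster runs mixed versions.
    """
    versions = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        name = node.get("name", node_id)
        versions.setdefault(node.get("version"), []).append(name)
    return {version: sorted(names) for version, names in versions.items()}


# The first node mirrors the sample response; the second one is invented
# here only to show what a mixed-version cluster would look like.
sample_info = {
    "cluster_name": "elasticsearch",
    "nodes": {
        "7NwnFF1JTPOPhOYuP1AVNQ": {"name": "7NwnFF1", "version": "5.0.0"},
        "hypothetical-node-id": {"name": "node-b", "version": "5.0.1"},
    },
}
grouped = nodes_by_version(sample_info)
if len(grouped) > 1:
    print("mixed versions:", grouped)
```

The same pattern can be applied to the plugins section, for example to verify that every node has the same set of plugins installed.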
The node statistics API call is used to collect real-time metrics of your node, such as memory usage, thread usage, number of indices, searches and so on.
You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command-line, you need to install curl for your operating system.
To get node statistics, we will perform the following steps:
The HTTP method is GET, and the curl commands are as follows:
curl -XGET 'http://localhost:9200/_nodes/stats'
curl -XGET 'http://localhost:9200/_nodes/<nodeId1>,<nodeId2>/stats'
A header describing the cluster name and the nodes section:
{ "cluster_name" : "es-cookbook", "nodes" : { "7NwnFF1JTPOPhOYuP1AVNQ" : { "timestamp" : 1477990951146, "name" : "7NwnFF1", "transport_address" : "127.0.0.1:9300", "host" : "127.0.0.1", "ip" : "127.0.0.1:9300", "roles" : [ "master", "data", "ingest" ],
Statistics related to the indices:
"indices" : { "docs" : { "count" : 2030, "deleted" : 0 }, "store" : { "size_in_bytes" : 3290318, "throttle_time_in_millis" : 0 }, "indexing" : { "index_total" : 2000, "index_time_in_millis" : 3901, "index_current" : 0, "index_failed" : 0, "delete_total" : 0, "delete_time_in_millis" : 0, "delete_current" : 0, "noop_update_total" : 0, "is_throttled" : false, "throttle_time_in_millis" : 0 }, ... truncated ... },
Statistics related to the operating system:
"os" : { "timestamp" : 1477990951181, "cpu" : { "percent" : 26, "load_average" : { "1m" : 3.34765625} }, "mem" : { "total_in_bytes" : 17179869184, "free_in_bytes" : 1112723456, "used_in_bytes" : 16067145728, "free_percent" : 6, "used_percent" : 94 }, ...truncated ... },
Statistics related to the current Elasticsearch process:
"process" : { "timestamp" : 1477990951181, "open_file_descriptors" : 283, "max_file_descriptors" : 10240, "cpu" : { "percent" : 0, "total_in_millis" : 247287 }, "mem" : { "total_virtual_in_bytes" : 6683983872 } },
Statistics related to the current JVM
"jvm" : { "timestamp" : 1477990951182, "uptime_in_millis" : 61315659, "mem" : { "heap_used_in_bytes" : 364406464, "heap_used_percent" : 17, "heap_committed_in_bytes" : 2077753344, "heap_max_in_bytes" : 2077753344, "non_heap_used_in_bytes" : 115590776, "non_heap_committed_in_bytes" : 122032128, ... truncated ... } },... truncated ... },
Statistics related to thread pools:
"thread_pool" : { "bulk" : { "threads" : 8, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 8, "completed" : 10 }, ...truncated.... },
Node filesystem statistics:
"fs" : { "timestamp" : 1477990951182, "total" : { "total_in_bytes" : 999334871040, "free_in_bytes" : 75898884096, "available_in_bytes" : 75636740096 }, "data" : [ { "path" : ".../elasticsearch-5.0.0/data/nodes/0", "mount" : "/ (/dev/disk1)", "type" : "hfs", "total_in_bytes" : 999334871040, "free_in_bytes" : 75898884096, "available_in_bytes" : 75636740096 } ] }...truncated...] },
Statistics related to communications between nodes:
"transport" : { "server_open" : 0, "rx_count" : 8, "rx_size_in_bytes" : 4264, "tx_count" : 8, "tx_size_in_bytes" : 4264 },
Statistics related to HTTP connections:
"http" : { "current_open" : 2, "total_opened" : 13 },
Statistics related to breaker caches:
"breakers" : { "request" : { "limit_size_in_bytes" : 1246652006, "limit_size" : "1.1gb", "estimated_size_in_bytes" : 0, "estimated_size" : "0b", "overhead" : 1.0, "tripped" : 0 }, "fielddata" : { "limit_size_in_bytes" : 1246652006, "limit_size" : "1.1gb", "estimated_size_in_bytes" : 0, "estimated_size" : "0b", "overhead" : 1.03, "tripped" : 0 } ... truncated .... } } }
Statistics related to scripts:
"script" : { "compilations" : 2, "cache_evictions" : 0 },
Cluster state queue:
"discovery" : { "cluster_state_queue" : { "total" : 0, "pending" : 0, "committed" : 0 } }
Ingest statistics:
"ingest" : { "total" : { "count" : 0, "time_in_millis" : 0, "current" : 0, "failed" : 0 }, "pipelines" : { "xpack_monitoring_2" : { "count" : 0, "time_in_millis" : 0, "current" : 0, "failed" : 0 } } }
Every Elasticsearch node, during execution, collects statistics about several aspects of node management; these statistics are accessible via the stats API call.
In the next recipes, we will see some examples of monitoring applications that use this information to provide the real-time status of a node or a cluster.
The main statistics collected by this API are as follows:
fs: This section contains statistics about the filesystem: free space on devices, mount points, reads and writes. It can be used to remotely monitor disk usage for your nodes.
http: This gives the number of currently open sockets and the total number of sockets opened.
indices: This section contains statistics on several indexing aspects, such as documents, store size, and indexing operations.
jvm: This section provides statistics about buffers, pools, the garbage collector (creation/destruction of objects and their memory management), memory (used memory, heap, pools), threads and uptime. It should be checked to see if the node is running out of memory.
network: This section provides statistics about TCP traffic, such as open connections, closed connections, and data I/O.
os: This section collects statistics about the operating system, such as CPU load and memory usage.
process: This section contains statistics about the CPU used by Elasticsearch, memory, and open file descriptors.
thread_pool: This section monitors all the thread pools available in Elasticsearch. In the case of low performance, it's important to check whether there are pools with excessive overhead. Some of them can be configured with a new maximum value.
transport: This section contains statistics about the transport layer, mainly bytes read and transmitted.
breakers: This section monitors the circuit breakers. It must be checked to see whether it's necessary to optimize resources or queries/aggregations to prevent the breakers from being tripped.
The response is very large. It's possible to limit it by requesting only the required parts. To do this, you need to pass the desired sections in the URL of the API call, chosen from the following:
fs
http
indices
jvm
network
os
process
thread_pool
transport
breaker
discovery
script
ingest
For example, to request only os and http statistics, the call becomes:
curl -XGET 'http://localhost:9200/_nodes/stats/os,http'
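The statistics described in this recipe are typically compared against thresholds by monitoring tools. The following Python sketch shows the idea on a parsed stats response. The function name node_alerts and the two threshold values are assumptions chosen for the example, not recommendations from Elasticsearch.

```python
def node_alerts(stats, heap_limit=75, disk_free_limit=10.0):
    """Return {node name: [alerts]} for a parsed _nodes/stats response.

    Hypothetical helper; heap_limit (percent used) and disk_free_limit
    (percent available) are arbitrary example thresholds.
    """
    alerts = {}
    for node_id, node in stats.get("nodes", {}).items():
        found = []
        heap = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if heap is not None and heap >= heap_limit:
            found.append("heap at %d%%" % heap)
        total = node.get("fs", {}).get("total", {})
        if total.get("total_in_bytes"):
            free_pct = 100.0 * total["available_in_bytes"] / total["total_in_bytes"]
            if free_pct < disk_free_limit:
                found.append("only %.1f%% disk available" % free_pct)
        for pool, values in node.get("thread_pool", {}).items():
            if values.get("rejected", 0) > 0:
                found.append("thread pool %s rejected %d tasks"
                             % (pool, values["rejected"]))
        if found:
            alerts[node.get("name", node_id)] = found
    return alerts


# Values copied from the sample response sections shown in this recipe.
sample_stats = {
    "nodes": {
        "7NwnFF1JTPOPhOYuP1AVNQ": {
            "name": "7NwnFF1",
            "jvm": {"mem": {"heap_used_percent": 17}},
            "fs": {"total": {"total_in_bytes": 999334871040,
                             "available_in_bytes": 75636740096}},
            "thread_pool": {"bulk": {"rejected": 0}},
        }
    }
}
print(node_alerts(sample_stats))
```

With the sample values, the heap and the thread pools are fine, but the disk check fires because less than 10 percent of the filesystem is available.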
Elasticsearch 5.x allows the definition of actions that can take some time to complete. The most common ones are as follows:
delete_by_query
update_by_query
reindex
When these actions are called, they create a server side task that executes the job. The task management API allows you to control these actions.
You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To get task information, we will perform the following steps:
The HTTP method is GET and the curl commands are as follows:
curl -XGET 'http://localhost:9200/_tasks'
curl -XGET 'http://localhost:9200/_tasks?nodes=<nodeId1>,<nodeId2>'
curl -XGET 'http://localhost:9200/_tasks?nodes=<nodeId1>,<nodeId2>&actions=cluster:*'
{ "nodes" : { "7NwnFF1JTPOPhOYuP1AVNQ" : { "name" : "7NwnFF1", "transport_address" : "127.0.0.1:9300", "host" : "127.0.0.1", "ip" : "127.0.0.1:9300", "roles" : [ "master", "data", "ingest" ], "tasks" : { "7NwnFF1JTPOPhOYuP1AVNQ:9822" : { "node" : "7NwnFF1JTPOPhOYuP1AVNQ", "id" : 9822, "type" : "transport", "action" : "cluster:monitor/tasks/lists", "start_time_in_millis" : 1477993984920, "running_time_in_nanos" : 102338, "cancellable" : false }, "7NwnFF1JTPOPhOYuP1AVNQ:9823" : { "node" : "7NwnFF1JTPOPhOYuP1AVNQ", "id" : 9823, "type" : "direct", "action" : "cluster:monitor/tasks/lists[n]", "start_time_in_millis" : 1477993984920, "running_time_in_nanos" : 62786, "cancellable" : false, "parent_task_id" : "7NwnFF1JTPOPhOYuP1AVNQ:9822" } } } } }
Every task that is executed in Elasticsearch is available in the task list.
The most important properties for the tasks are as follows:
node: This defines the node that is executing the task.
id: This defines the unique ID of the task.
action: This is the name of the action. It's generally composed of an action type, the : separator and the detailed action.
cancellable: This defines whether the task can be canceled. Some tasks, such as delete/update by query or reindex, can be canceled; others are mainly management tasks and cannot be canceled.
parent_task_id: This defines the group of tasks. Some tasks can be split and executed in several sub-tasks. This value can be used to group these tasks by parent.
The ID of a task, in the node_id:task_number form, can be used in the URL of the API call to return only that task:
curl -XGET 'http://localhost:9200/_tasks/7NwnFF1JTPOPhOYuP1AVNQ:9822'
If you need to monitor a group of tasks, you can filter by their parent_task_id with a similar API call:
curl -XGET 'http://localhost:9200/_tasks?parent_task_id=7NwnFF1JTPOPhOYuP1AVNQ:9822'
Generally, canceling a task could produce some data inconsistency in Elasticsearch due to partial updating or deleting of documents; when reindexing, however, it can make good sense. It's common, while you are reindexing a huge amount of data, to discover that you need to change the mapping or the reindex script in the middle of the process. In that case, canceling the reindexing is a sensible way to avoid wasting time and CPU.
To cancel a task, the API URL is as follows:
curl -XPOST 'http://localhost:9200/_tasks/task_id:1/_cancel'
In the case of a group of tasks, they can be stopped with a single cancel call using query arguments to select them as follows:
curl -XPOST 'http://localhost:9200/_tasks/_cancel?nodes=nodeId1,nodeId2&actions=*reindex'
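Before canceling anything, it can help to see how the running tasks are related and which of them are cancellable at all. The following Python sketch works on a parsed task list response; the function names tasks_by_parent and cancellable_tasks are hypothetical, and the sample reproduces the two tasks shown in this recipe.

```python
def tasks_by_parent(tasks_response):
    """Group task IDs by parent_task_id (hypothetical helper).

    A task without a parent is used as the key of its own group.
    """
    groups = {}
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            parent = task.get("parent_task_id", task_id)
            groups.setdefault(parent, []).append(task_id)
    return {parent: sorted(ids) for parent, ids in groups.items()}


def cancellable_tasks(tasks_response):
    """Return the sorted IDs of the tasks flagged as cancellable."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if task.get("cancellable"):
                found.append(task_id)
    return sorted(found)


# The two tasks from the sample response in this recipe.
sample_tasks = {
    "nodes": {
        "7NwnFF1JTPOPhOYuP1AVNQ": {
            "tasks": {
                "7NwnFF1JTPOPhOYuP1AVNQ:9822": {
                    "action": "cluster:monitor/tasks/lists",
                    "cancellable": False,
                },
                "7NwnFF1JTPOPhOYuP1AVNQ:9823": {
                    "action": "cluster:monitor/tasks/lists[n]",
                    "cancellable": False,
                    "parent_task_id": "7NwnFF1JTPOPhOYuP1AVNQ:9822",
                },
            }
        }
    }
}
print(tasks_by_parent(sample_tasks))
print(cancellable_tasks(sample_tasks))
```

In the sample, both tasks belong to the group of task 9822 and neither is cancellable, which is expected for management tasks such as the task listing itself.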
Sometimes your cluster slows down due to massive CPU usage and you need to understand why.
Elasticsearch provides the ability to monitor hot threads to be able to understand where the problem is.
You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To get hot threads information, we will perform the following steps:
The HTTP method is GET and the curl commands are as follows:
curl -XGET 'http://localhost:9200/_nodes/hot_threads'
curl -XGET 'http://localhost:9200/_nodes/{nodesIds}/hot_threads'
::: {7NwnFF1}{7NwnFF1JTPOPhOYuP1AVNQ}{OL2uVn3BQ-qMAg32eq_ouQ}{127.0.0.1}{127.0.0.1:9300}
   Hot threads at 2016-11-01T10:53:39.796Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   12.6% (63.1ms out of 500ms) cpu usage by thread 'elasticsearch[7NwnFF1][refresh][T#3]'
     4/10 snapshots sharing following 23 elements
       org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:443)
       org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539)
       org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:653)
       org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:438)
       org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:291)
       org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:266)
       ...truncated...
The hot threads API is somewhat unusual: it returns a text representation of the currently running hot threads, so that it's possible to check the cause of the slowdown of every single thread by using its stack trace.
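Because the response is plain text rather than JSON, extracting values from it requires text parsing. The following Python sketch pulls the usage percentage, the usage type, and the thread name out of each summary line. The regular expression is an assumption based only on the sample output in this recipe, so it may need adjusting for other Elasticsearch versions.

```python
import re

# Matches lines such as:
#   12.6% (63.1ms out of 500ms) cpu usage by thread 'name'
# The pattern is derived from the sample output only (an assumption).
HOT_THREAD = re.compile(
    r"(\d+(?:\.\d+)?)% \(.*?\) (\w+) usage by thread '([^']+)'"
)


def hot_thread_summary(text):
    """Return (percent, usage type, thread name) tuples, busiest first."""
    rows = [
        (float(percent), kind, name)
        for percent, kind, name in HOT_THREAD.findall(text)
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)


sample_output = (
    "Hot threads at 2016-11-01T10:53:39.796Z, interval=500ms, "
    "busiestThreads=3, ignoreIdleThreads=true:\n"
    "   12.6% (63.1ms out of 500ms) cpu usage by thread "
    "'elasticsearch[7NwnFF1][refresh][T#3]'\n"
)
for percent, kind, name in hot_thread_summary(sample_output):
    print(percent, kind, name)
```

The thread name contains the thread pool (refresh in the sample), which is often the quickest hint about what kind of operation is consuming the CPU.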
To control the returned values, there are additional parameters that can be provided as query arguments, as follows:
threads: This is the number of hot threads to provide (default 3).
interval: This is the interval for sampling of threads (default 500ms).
type: This allows the control of different types of hot threads, for example, to check wait and block states (default cpu; possible values are cpu, wait and block).
ignore_idle_threads: This is used to filter out known idle threads (default true).
During normal Elasticsearch usage, it is not necessary to change the shard allocation, because the default settings work very well with all standard scenarios. Sometimes, due to massive relocation, or due to nodes restarting, or some other cluster issues, it's necessary to monitor or define custom shard allocation.
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup .
To execute curl via the command line, you need to install curl for your operating system.
To get information about the current state of unassigned shard allocation, we will perform the following steps:
The HTTP method is GET, and the curl command is as follows:
curl -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty'
{
  "shard" : {
    "index" : ".monitoring-es-2-2016.10.27",
    "index_uuid" : "cD30b-qtQc2qw62yF-tirA",
    "id" : 0,
    "primary" : false
  },
  "assigned" : false,
  "shard_state_fetch_pending" : false,
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2016-10-30T15:09:54.626Z",
    "delayed" : false,
    "allocation_status" : "no_attempt"
  },
  "allocation_delay_in_millis" : 60000,
  "remaining_delay_in_millis" : 0,
  "nodes" : {
    "7NwnFF1JTPOPhOYuP1AVNQ" : {
      "node_name" : "7NwnFF1",
      "node_attributes" : { },
      "store" : {
        "shard_copy" : "AVAILABLE"
      },
      "final_decision" : "NO",
      "final_explanation" : "the shard cannot be assigned because allocation deciders return a NO decision",
      "weight" : 8.15,
      "decisions" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated on the same node id [7NwnFF1JTPOPhOYuP1AVNQ] on which it already exists"
        }
      ]
    }
  }
}
Elasticsearch supports different shard allocation mechanisms. Sometimes your shards are not assigned to nodes, and it's useful to find out why Elasticsearch has not allocated them by querying the cluster allocation explain API.
The call returns a lot of information about the unassigned shard, but the most important part is decisions. This is a list of objects that explain why the shard cannot be allocated on the node. In the preceding example, the explanation was "the shard cannot be allocated on the same node id [7NwnFF1JTPOPhOYuP1AVNQ] on which it already exists". It is returned because the shard needs a replica, but the cluster is composed of only one node, so it's not possible to initialize the replica shard in the cluster.
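When a cluster has several nodes, reading the decisions of each node by eye becomes tedious. The following is a minimal Python sketch that collects, per node, the explanations of the deciders that answered NO. It assumes the response layout shown in this recipe (other Elasticsearch versions may structure the response differently), and the function name blocking_decisions is illustrative:

```python
def blocking_decisions(explain_response):
    """Map each node name to the explanations of the deciders that answered NO."""
    blocked = {}
    for node_id, node in explain_response.get("nodes", {}).items():
        reasons = [
            decision["explanation"]
            for decision in node.get("decisions", [])
            if decision.get("decision") == "NO"
        ]
        if reasons:
            # Fall back to the node ID if the node name is missing.
            blocked[node.get("node_name", node_id)] = reasons
    return blocked


# Trimmed copy of the response shown in this recipe.
sample = {
    "assigned": False,
    "nodes": {
        "7NwnFF1JTPOPhOYuP1AVNQ": {
            "node_name": "7NwnFF1",
            "final_decision": "NO",
            "decisions": [
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "the shard cannot be allocated on the same node id "
                    "[7NwnFF1JTPOPhOYuP1AVNQ] on which it already exists",
                }
            ],
        }
    },
}

for name, reasons in blocking_decisions(sample).items():
    print(name, "->", reasons)
```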
The cluster allocation explain API can filter the result to a particular shard, which is very handy if your cluster has a lot of shards. To do so, pass the following parameters as a filter in the body of the GET request:
index: The index that the shard belongs to.
shard: The number of the shard. Shard numbers start from 0.
primary: true/false; whether the shard to be checked is the primary one or not.

The shard from the preceding example can be selected with a call such as:
curl -XGET 'http://localhost:9200/_cluster/allocation/explain' -d'{ "index": ".monitoring-es-2-2016.10.27", "shard": 0, "primary": false }'
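The same filtered call can be prepared from code. The following is a minimal Python sketch using only the standard library; it builds the request without sending it. The function name explain_request and the base_url default are illustrative, and the Content-Type header is an addition that the preceding curl command does not include:

```python
import json
import urllib.request


def explain_request(index, shard, primary, base_url="http://localhost:9200"):
    """Build (but do not send) a filtered allocation explain request."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        base_url + "/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        # Assumption: newer Elasticsearch releases expect an explicit
        # Content-Type for request bodies; the curl example omits it.
        headers={"Content-Type": "application/json"},
        method="GET",
    )


req = explain_request(".monitoring-es-2-2016.10.27", 0, False)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a running node: urllib.request.urlopen(req)
```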
To manually relocate shards, Elasticsearch provides a Cluster Reroute API that allows the migration of shards between nodes. The following is an example of this API:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ { "move" : { "index" : "test-index", "shard" : 0, "from_node" : "node1", "to_node" : "node2" } } ] }'
In this case, shard 0 of the index test-index is migrated from node1 to node2. If you force a shard migration, the cluster starts moving other shards to rebalance itself.
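If you need to issue several moves, it can help to generate the request body instead of writing it by hand. The following is a minimal Python sketch that builds the same body as the preceding curl example; the helper names move_command and reroute_body are illustrative, and the index and node names are the ones used in this recipe:

```python
import json


def move_command(index, shard, from_node, to_node):
    """Return one move command for the Cluster Reroute API."""
    return {
        "move": {
            "index": index,
            "shard": shard,
            "from_node": from_node,
            "to_node": to_node,
        }
    }


def reroute_body(commands):
    """Wrap a list of commands in the JSON body sent to _cluster/reroute."""
    return json.dumps({"commands": list(commands)})


body = reroute_body([move_command("test-index", 0, "node1", "node2")])
print(body)
```

Since commands is a list, several moves can be submitted in a single call.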
Monitoring the index segments means monitoring the health of an index. The segment information includes the number of segments and the data stored in them.
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup .
To execute curl via the command line, you need to install curl for your operating system.
To get information about index segments, we will perform the following steps:
The HTTP method is GET, and the curl command is as follows:
curl -XGET 'http://localhost:9200/test-index/_segments'
{
  "_shards" : { ...truncated... },
  "indices" : {
    "test-index" : {
      "shards" : {
        "0" : [
          {
            "routing" : {
              "state" : "STARTED",
              "primary" : true,
              "node" : "7NwnFF1JTPOPhOYuP1AVNQ"
            },
            "num_committed_segments" : 9,
            "num_search_segments" : 9,
            "segments" : {
              "_0" : {
                "generation" : 0,
                "num_docs" : 15,
                "deleted_docs" : 0,
                "size_in_bytes" : 31497,
                "memory_in_bytes" : 6995,
                "committed" : true,
                "search" : true,
                "version" : "6.2.0",
                "compound" : true
              },
              "_1" : {
                ...truncated...
The Indices Segments API returns statistics about the segments in an index. This is an important indicator of the health of an index. For each segment, it returns the following information:
num_docs: The number of documents stored in the segment.
deleted_docs: The number of deleted documents in the segment. If this value is high, a lot of space is wasted on tombstoned documents.
size_in_bytes: The size of the segment in bytes. If this value is too high, writing speed will be very low.
memory_in_bytes: The memory taken up, in bytes, by the segment.
committed: Whether the segment is committed to disk.
search: Whether the segment is used for searching. During a force merge (index optimization), new segments are created and returned by the API, but they are not available for searching until the end of the optimization.
version: The Lucene version used for creating the segment.
compound: Whether the segment is stored in a compound file.

The most important values to monitor are deleted_docs and size_in_bytes, because they indicate either wasted disk space or a shard that is too large. If the shard is too large (above 10 GB), the best way to improve write performance is to reindex into an index with a larger number of shards.
Having large shards also creates a problem in relocating, due to massive data moving between nodes.
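These checks can be automated by summing the per-segment values for each shard copy. The following is a minimal Python sketch, assuming the response layout shown in this recipe and assuming that num_docs counts only live documents; the function name summarize_segments is illustrative, and the 10 GB threshold is the one mentioned in the text:

```python
TEN_GB = 10 * 1024 ** 3  # shard size threshold taken from the text


def summarize_segments(response, index):
    """Return one summary dict per shard copy of the given index."""
    summary = []
    shards = response["indices"][index]["shards"]
    for shard_id, copies in shards.items():
        for shard_copy in copies:
            segments = list(shard_copy["segments"].values())
            docs = sum(s["num_docs"] for s in segments)
            deleted = sum(s["deleted_docs"] for s in segments)
            size = sum(s["size_in_bytes"] for s in segments)
            total = docs + deleted
            summary.append({
                "shard": shard_id,
                "primary": shard_copy["routing"]["primary"],
                "segments": len(segments),
                "size_in_bytes": size,
                # Share of documents that are deleted but still on disk.
                "deleted_ratio": deleted / total if total else 0.0,
                "too_large": size > TEN_GB,
            })
    return summary


# Only the first segment of the sample response; the rest is truncated there.
sample = {
    "indices": {
        "test-index": {
            "shards": {
                "0": [
                    {
                        "routing": {"state": "STARTED", "primary": True},
                        "segments": {
                            "_0": {
                                "num_docs": 15,
                                "deleted_docs": 0,
                                "size_in_bytes": 31497,
                            }
                        },
                    }
                ]
            }
        }
    }
}

for row in summarize_segments(sample, "test-index"):
    print(row)
```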
During its execution, Elasticsearch caches data to speed up searching, such as results, items, and filter results.
To free up memory, you can call the clear cache API.
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup .
To execute curl via the command line, you need to install curl for your operating system.
To clean the cache, we will perform the following steps:
We call the cleancache API on an index as follows:
curl -XPOST 'http://localhost:9200/test-index/_cache/clear'
{ "_shards" : { "total" : 10, "successful" : 5, "failed" : 0 } }
The cache clean API frees the memory used to cache values in Elasticsearch.
Generally, it's not a good idea to clean the cache, because Elasticsearch manages the cache internally and removes obsolete values on its own. However, it can be very handy if your node is running out of memory or you want to force a complete cache clean-up.
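In the response shown in this recipe, total is 10 while successful is 5 and failed is 0. One plausible reading, given the single-node setup used in this chapter, is that the remaining five shard copies are unassigned replicas rather than failures. The following is a minimal Python sketch that separates the two cases; the function name cache_clear_report is illustrative:

```python
def cache_clear_report(response):
    """Summarize the _shards section of a cache clear response."""
    shards = response["_shards"]
    return {
        # The call is considered fine if no shard reported a failure.
        "ok": shards["failed"] == 0,
        # Shard copies that neither succeeded nor failed (assumed unassigned).
        "not_reached": shards["total"] - shards["successful"] - shards["failed"],
    }


# Response copied from this recipe.
sample = {"_shards": {"total": 10, "successful": 5, "failed": 0}}
print(cache_clear_report(sample))
```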