Chapter 10. Managing Clusters and Nodes

In this chapter, we will cover the following recipes:

Controlling cluster health via API
Controlling cluster state via API
Getting cluster nodes information via API
Getting node statistics via API
Using the task management API
Hot thread API
Managing the shard allocation
Monitoring segments with segment API
Cleaning the cache

Introduction

In the Elasticsearch ecosystem, it's important to monitor nodes and clusters to manage and improve their performance and state. There are several issues that can arise at cluster level, such as:

Node overheads: Some nodes can have too many shards allocated and become a bottleneck for the entire cluster
Node shutdown: This can happen for many reasons, for example, full disks, hardware failures, and power problems
Shard relocation problems or corruptions: Some shards cannot get an online status
Too large shards: If a shard is too big, the index performance decreases due to massive Lucene segment merging
Empty indices and shards: They waste memory and resources; because every shard has several active threads, a huge number of unused indices and shards degrades the general cluster performance

Detecting malfunctions or poor performance can be done via an API or through some frontends, as we will see in Chapter 12, User Interfaces. These frontends give readers a working web dashboard on their Elasticsearch data for monitoring the cluster health, backing up and restoring data, and testing queries before implementing them in code.

Controlling cluster health via an API

In the Understanding cluster, replication and sharding recipe in Chapter 1, Getting Started, we discussed Elasticsearch clusters and how to manage them when they are in a red or yellow state.

Elasticsearch provides a convenient way to check the cluster health, which is one of the first things to look at if any problems occur.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For controlling the cluster health, we will perform the following steps:

To view the cluster health, the HTTP method is GET and the curl command is as follows:
```
        curl -XGET 'http://localhost:9200/_cluster/health?pretty'
```

The result will be as follows:

        { 
          "cluster_name" : "elasticsearch", 
          "status" : "yellow", 
          "timed_out" : false, 
          "number_of_nodes" : 1, 
          "number_of_data_nodes" : 1, 
          "active_primary_shards" : 7, 
          "active_shards" : 7, 
          "relocating_shards" : 0, 
          "initializing_shards" : 0, 
          "unassigned_shards" : 7, 
          "delayed_unassigned_shards" : 0, 
          "number_of_pending_tasks" : 0, 
          "number_of_in_flight_fetch" : 0, 
          "task_max_waiting_in_queue_millis" : 0, 
          "active_shards_percent_as_number" : 50.0 
        }

How it works...

Every Elasticsearch node keeps the cluster status. The status can be of three types as follows:

green: This means that everything is okay.
yellow: This means that some nodes or shards are missing, but they don't compromise the cluster functionality. Typically, some replicas are missing (a node is down or there are insufficient nodes for replicas), but there is at least one copy of each active shard; reads and writes are working. The yellow state is very common in the development stage, when users typically start a single Elasticsearch server.
red: This indicates that some primary shards are missing and the affected indices are in red status. You cannot write to indices that are in red status, and searches may return only partial results. Generally, you'll need to restart the node that is down and possibly create some replicas.

Tip

The yellow/red states could be transient if some nodes are in recovery mode. In this case, just wait until recovery completes.
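
The three states map naturally onto the exit codes used by many monitoring tools. The following is a minimal Python sketch, not part of the original recipe: the helper names fetch_cluster_health and status_to_exit_code, the default base URL, and the 0/1/2 exit-code convention are assumptions chosen for illustration. Only the _cluster/health endpoint and the status field come from the recipe.

```python
import json
from urllib.request import urlopen

# Hypothetical mapping from cluster status to a monitoring exit code
# (0 = OK, 1 = warning, 2 = critical); this convention is an assumption.
EXIT_CODES = {"green": 0, "yellow": 1, "red": 2}


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Call GET /_cluster/health and return the decoded JSON body.

    Requires a reachable Elasticsearch node at base_url.
    """
    with urlopen(base_url + "/_cluster/health") as response:
        return json.loads(response.read().decode("utf-8"))


def status_to_exit_code(health):
    """Translate the 'status' field of a health response to an exit code.

    Unknown or missing statuses are treated as critical.
    """
    return EXIT_CODES.get(health.get("status"), 2)
```

Against a running node, status_to_exit_code(fetch_cluster_health()) could be used as the exit status of a small check script. Because yellow and red can be transient during recovery, such a check would probably need a grace period before alerting.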

The cluster health response contains a large amount of information, as follows:

cluster_name: This is the name of the cluster.
timed_out: This is a Boolean value indicating whether the REST API hit the timeout set in the call.
number_of_nodes: This indicates the number of nodes that are in the cluster.
number_of_data_nodes: This shows the number of nodes that can store data (see Chapter 2, Downloading and Setup to set up different node types).
active_primary_shards: This shows the number of active primary shards; the primary shards are the masters for writing operations.
active_shards: This shows the number of active shards. These shards can be used for search.
relocating_shards: This shows the number of shards that are relocating, that is, migrating from one node to another. This is mainly due to cluster node balancing.
initializing_shards: This shows the number of shards that are in the initializing status. The initializing process is done at shard startup. It's a transient state before becoming active and it's composed of several steps, of which the most important are as follows:
- Copy shard data if it's a replica of another one
- Check Lucene indices
- Process transaction log as needed
unassigned_shards: This shows the number of shards that are not assigned to a node. This is usually due to having set a replica number larger than the number of nodes. During startup, shards not already initialized or initializing will be counted here.
delayed_unassigned_shards: This shows the number of shards that will be assigned, but their nodes are configured for a delayed assignment. You can get more information on delayed shard assignment at https://www.elastic.co/guide/en/elasticsearch/reference/5.0/delayed-allocation.html.
number_of_pending_tasks: This is the number of pending tasks at cluster level, such as updates to the cluster state, index creation, and shard relocations. It should rarely be anything other than 0.
number_of_in_flight_fetch: This is the number of cluster updates that must still be executed in shards. As cluster updates are asynchronous, this number tracks how many still have to be executed in shards.
task_max_waiting_in_queue_millis: This is the maximum time that some cluster tasks have been waiting in the queue. It should rarely be anything other than 0. A value different from 0 means that there is some resource saturation in the cluster or a similar problem.
active_shards_percent_as_number: This is the percentage of active shards of the total required by the cluster. In a production environment, it should rarely differ from 100 percent, apart from some relocations and shard initializations.
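
To make the relationship between these counters concrete, here is a small Python sketch that recomputes the percentage of active shards from the other fields. It rests on an assumption that is not stated in the recipe: that the total number of required shards is the sum of active, initializing, and unassigned shards, with relocating shards already counted as active. The function name is hypothetical.

```python
def active_shards_percent(health):
    """Recompute the percentage of active shards from the raw counters.

    Assumption: total = active + initializing + unassigned shards
    (relocating shards are treated as already counted in active_shards).
    """
    active = health["active_shards"]
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    # A cluster without any shards is reported as fully active in this sketch.
    return 100.0 if total == 0 else 100.0 * active / total


# Counters copied from the sample response shown earlier in this recipe.
sample = {
    "active_shards": 7,
    "initializing_shards": 0,
    "unassigned_shards": 7,
}
print(active_shards_percent(sample))  # prints 50.0
```

With the sample values (7 active and 7 unassigned shards), the result agrees with the 50.0 reported in active_shards_percent_as_number.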

Installed plugins can play an important role in shard initialization: if you use a mapping type provided by a native plugin and you remove the plugin (or the plugin cannot be initialized due to API changes), the shard initialization will fail. These issues are easily detected by reading the Elasticsearch log file.

Tip

When upgrading your cluster to a new Elasticsearch release, be sure to upgrade your mapping plugins or at least check that they can work with the new Elasticsearch release. If you don't do this, you risk your shards failing to initialize and giving a red status to your cluster.

There's more...

This API call is very useful; it's possible to execute it against one or more indices to obtain their health in the cluster. This approach allows the isolation of indices with problems. The API call to execute this is:

curl -XGET 'http://localhost:9200/_cluster/health/index1,index2,indexN'

The previous calls also have additional request parameters to control the health of the cluster. Additional parameters could be:

level: This controls the level of the health information that is returned. This parameter accepts only cluster, indices and shards.
timeout: This is the wait time for a wait_for_* parameter (default 30s).
wait_for_status: This allows the server to wait for the provided status (green, yellow or red) until timeout.
wait_for_relocating_shards: This allows the server to wait until the provided number of relocating shards has been reached, or until the timeout period has been reached (default 0).
wait_for_nodes: This waits until the defined number of nodes is available in the cluster. The value for this parameter can also be an expression, such as: >N, >=N, <N, <=N, ge(N), gt(N), le(N), lt(N).
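
When these parameters are combined with index names, building the URL by hand becomes error-prone, especially for expressions such as >=N that need URL encoding. The following Python sketch shows one way to assemble the call; the function name build_health_url and the example values are hypothetical, while the parameter names are the ones listed above.

```python
from urllib.parse import urlencode


def build_health_url(base_url, indices=(), **params):
    """Build a _cluster/health URL for optional indices and parameters."""
    url = base_url.rstrip("/") + "/_cluster/health"
    if indices:
        # Several indices are passed as a comma-separated path segment.
        url += "/" + ",".join(indices)
    if params:
        # urlencode escapes characters such as '>' and '=' in values.
        url += "?" + urlencode(params)
    return url


print(build_health_url(
    "http://localhost:9200",
    indices=["index1", "index2"],
    wait_for_status="yellow",
    timeout="50s",
))
```

The resulting URL can be passed to curl or to any HTTP client. With wait_for_status, the call blocks until the status is reached or the timeout expires, so the timed_out field of the response should be checked afterwards.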

If the number of pending tasks is different from zero, it's good practice to investigate what those pending tasks are. They can be shown using the following API URL:

curl -XGET 'http://localhost:9200/_cluster/pending_tasks'

The return value is a list of pending tasks. Beware that Elasticsearch applies cluster changes very fast, so it is often faster to apply them than to show them to you.

Controlling cluster state via an API

The previous recipe returns information only about the health of the cluster. If you need more details on your cluster, you need to query its state.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

To check the cluster state, we will perform the following steps:

To view the cluster state, the HTTP method is GET, and the curl command is as follows:
```
        curl -XGET 'http://localhost:9200/_cluster/state' 
```

The result will contain the following data sections:

General cluster information:

        {
         "cluster_name" : "es-cookbook",
         "version" : 13,
         "state_uuid" : "QANXXnzhS7aS5HxLlyNKsw",
         "master_node" : "7NwnFF1JTPOPhOYuP1AVNQ",
         "blocks" : { },

Node address information:

        "nodes" : {
        "7NwnFF1JTPOPhOYuP1AVNQ" : {
         "name" : "7NwnFF1",
         "ephemeral_id" : "OL2uVn3BQ-qMAg32eq_ouQ",
         "transport_address" : "127.0.0.1:9300",
         "attributes" : { }
        }
        },

Cluster metadata information (templates, indices with mappings and alias):

        "metadata" :  
        {  
            "cluster_uuid" : "8SRm9IGDQcWU7-SoR6gWKg", 
            "templates" : { 
              ".monitoring-data-2" : {...} 
            }, 
            "indices" : { 
              "test-index" : { 
                "state" : "open", 
                "settings" : { 
                  "index" : { 
                    "creation_date" : "1477683993903", 
                    "number_of_shards" : "5", 
                    "number_of_replicas" : "1", 
                   "uuid" : "KalW90nJSDCTMh42FH62iQ", 
                   "version" : { 
                      "created" : "5000099" 
                    }, 
                   "provided_name" : "test-index" 
                  } 
               }, 
                "mappings" : { 
                 "test-type" : {...truncated...} 
                }, 
                "aliases" : [ "my-cool-alias" ] 
              } 
            } 
          },

Routing tables to find the shards:

        "routing_table" : { 
             "indices" : { 
             "test-index" : { 
                "shards" : { 
                  "2" : [{ 
                      "state" : "STARTED", 
                      "primary" : true, 
                      "node" : "7NwnFF1JTPOPhOYuP1AVNQ", 
                      "relocating_node" : null, 
                      "shard" : 2, 
                      "index" : "test-index", 
                      "allocation_id" : { 
                        "id" : "9I5Q8E0VTnCSq7qgpNjGGQ" 
                      } 
                   }, 
                  ... truncated... 
                } 
             } 
           } 
         },

Routing nodes:

        "routing_nodes" : { 
           "unassigned" : [ ], 
           "nodes" : { 
             "7NwnFF1JTPOPhOYuP1AVNQ" : [ { 
               "state" : "STARTED", 
                  "primary" : true, 
                 "node" : "7NwnFF1JTPOPhOYuP1AVNQ", 
                  "relocating_node" : null, 
                  "shard" : 0, 
                  "index" : ".monitoring-es-2-2016.10.27", 
                 "allocation_id" : { 
                   "id" : "F-wl5nuwQqO6lkeV6k83iQ" 
                 } 
             ...truncated... ] 
            } 
          }, 
          "allocations" : [ ] 
        }

How it works...

The cluster state contains the information of the whole cluster; it's normal that its output is very large.

The call output contains common fields, which are as follows:

cluster_name: This is the name of the cluster.
master_node: This is the identifier of the master node. The master node is the primary node for cluster management.
blocks: This section shows the active blocks in a cluster.
nodes: This shows the list of nodes of the cluster. For every node, we have:
- id: This is the hash used to identify the node in Elasticsearch (for example, 7NwnFF1JTPOPhOYuP1AVNQ)
- name: This is the name of the node
- transport_address: This is the IP and port used to connect to this node
- attributes: These are additional node attributes
metadata: This is the definition of indices (their settings and mappings), ingest pipelines, and stored_scripts.
routing_table: These are the indices/shards routing tables, which are used to select primary and secondary shards and their nodes.
routing_nodes: This is the routing for the nodes.

The metadata section is the most used one, because it contains all the information related to the indices and their mappings. This is a convenient way to gather all the index mappings in one shot; otherwise, you'll need to call the get mapping API for every type.

The metadata section is composed of several sections, as follows:

templates: These are templates that control the dynamic mapping for created indices
indices: These are the indices that exist in the cluster
ingest: This stores all the ingest pipelines defined in the system
stored_scripts: This stores the scripts, which are usually in the form of language#script_name

The indices subsection returns a full representation of all the metadata description for every index. It contains the following:

state (open/closed): This describes if an index is open (it can be searched and can index data) or closed. (See the Opening/Closing an Index recipe in Chapter 4, Basic Operations)
settings: These are the index settings. The most important ones are as follows:
- index.number_of_replicas: This is the number of replicas of this index. It can be changed with an update index settings call.
- index.number_of_shards: This is the number of shards in this index. This value cannot be changed once the index has been created.
- index.codec: This is the codec used to store index data. The default is not shown, but it uses the LZ4 algorithm. If you want a higher compression rate, use best_compression, which uses the DEFLATE algorithm (this will slightly slow down write performance).
- index.version.created: This is the Elasticsearch version with which the index was created.
mappings: These are the mappings defined in the index. This section is similar to the get mapping response. (See the How to Get a Mapping recipe in Chapter 4, Basic Operations)
aliases: This is a list of index aliases, which allows the aggregation of indices in a single name or the definition of alternative names for an index.
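
As an illustration of how the indices subsection can be consumed, the following Python sketch condenses it into one small record per index. The function name summarize_indices is hypothetical, and the sample dictionary is a trimmed copy of the output shown earlier in this recipe, with the mapping body left empty because it is truncated there.

```python
def summarize_indices(cluster_state):
    """Return {index_name: summary} from the metadata.indices section."""
    summaries = {}
    for name, meta in cluster_state["metadata"]["indices"].items():
        index_settings = meta["settings"]["index"]
        summaries[name] = {
            "state": meta["state"],
            # Settings values are returned as strings, so convert them.
            "shards": int(index_settings["number_of_shards"]),
            "replicas": int(index_settings["number_of_replicas"]),
            "types": sorted(meta.get("mappings", {})),
            "aliases": list(meta.get("aliases", [])),
        }
    return summaries


# Trimmed from the sample cluster state; the mapping body is omitted.
sample_state = {
    "metadata": {
        "indices": {
            "test-index": {
                "state": "open",
                "settings": {
                    "index": {
                        "number_of_shards": "5",
                        "number_of_replicas": "1",
                    }
                },
                "mappings": {"test-type": {}},
                "aliases": ["my-cool-alias"],
            }
        }
    }
}

print(summarize_indices(sample_state))
```

A summary like this makes it easy to spot closed indices or indices whose replica count exceeds what the cluster can host.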

The routing records for index and shards have similar fields and they are as follows:

state (UNASSIGNED, INITIALIZING, STARTED, RELOCATING): This shows the state of the shard or index
primary (true/false): This shows whether the shard is a primary
node: This shows the ID of the node
relocating_node: This field, if set, shows the ID of the node to which the shard is being relocated
shard: This shows the number of the shard
index: This shows the name of the index in which the shard is contained
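
These routing records can be scanned to find the shard copies that are not yet serving requests. The Python sketch below walks the routing_table section and reports every copy whose state is not STARTED. The function name is hypothetical, and the second (unassigned replica) record in the sample is invented for illustration; the first record reuses values from the output shown earlier.

```python
def shards_not_started(cluster_state):
    """List (index, shard, primary, state) for each copy not STARTED."""
    problems = []
    indices = cluster_state["routing_table"]["indices"]
    for index_name, index_routing in indices.items():
        for shard_id, copies in index_routing["shards"].items():
            for copy in copies:
                if copy["state"] != "STARTED":
                    problems.append(
                        (index_name, int(shard_id), copy["primary"], copy["state"])
                    )
    return sorted(problems)


sample_state = {
    "routing_table": {
        "indices": {
            "test-index": {
                "shards": {
                    "2": [
                        {
                            "state": "STARTED",
                            "primary": True,
                            "node": "7NwnFF1JTPOPhOYuP1AVNQ",
                            "relocating_node": None,
                            "shard": 2,
                            "index": "test-index",
                        },
                        {
                            # Hypothetical replica record, not in the sample output.
                            "state": "UNASSIGNED",
                            "primary": False,
                            "node": None,
                            "relocating_node": None,
                            "shard": 2,
                            "index": "test-index",
                        },
                    ]
                }
            }
        }
    }
}

print(shards_not_started(sample_state))
```

An unassigned primary in this list corresponds to a red index, while unassigned replicas alone correspond to a yellow one.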

There's more...

The cluster state call returns a lot of information, and it's possible to filter out the different section parts via the URL.

The complete form URL of the cluster state API is:

http://{elasticsearch_server}/_cluster/state/{metrics}/{indices}

The metrics could be used to return only parts of the response. It's a comma separated list of the following values:

version: This is used to show the version part of the response
blocks: This is used to show the blocks part of the response
master_node: This is used to show the master node part of the response
nodes: This is used to show the node part of the response
metadata: This is used to show the metadata part of the response
routing_table: This is used to show the routing_table part of the response

The indices value is a comma-separated list of index names to include in the metadata.
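
As a sketch of this filtering, the following Python snippet builds a URL that requests only the master_node and nodes parts, and resolves the master's name from such a filtered response. The helper names and the base URL are hypothetical; the sample dictionary reuses values from the output shown earlier in this recipe.

```python
def cluster_state_url(base_url, metrics, indices=()):
    """Build /_cluster/state/{metrics}/{indices} from lists of names."""
    url = base_url.rstrip("/") + "/_cluster/state/" + ",".join(metrics)
    if indices:
        url += "/" + ",".join(indices)
    return url


def master_node_name(state):
    """Resolve the master node ID to its name using the nodes section."""
    master_id = state["master_node"]
    return state["nodes"][master_id]["name"]


# Filtered response, trimmed from the sample output of this recipe.
sample_state = {
    "cluster_name": "es-cookbook",
    "master_node": "7NwnFF1JTPOPhOYuP1AVNQ",
    "nodes": {
        "7NwnFF1JTPOPhOYuP1AVNQ": {
            "name": "7NwnFF1",
            "transport_address": "127.0.0.1:9300",
        }
    },
}

print(cluster_state_url("http://localhost:9200", ["master_node", "nodes"]))
print(master_node_name(sample_state))
```

Requesting only the parts that are needed keeps the response small, which matters on clusters with many indices, where the full state can be very large.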

Getting nodes information via API

The previous recipes return information at the cluster level; Elasticsearch also provides calls to gather information at node level. In production clusters, it's very important to monitor nodes via this API to detect misconfiguration and problems relating to different plugins and modules.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting nodes information, we will perform the following steps:

To retrieve the node information, the HTTP method is GET and the curl command is as follows:

        curl -XGET 'http://localhost:9200/_nodes' 
        curl -XGET 'http://localhost:9200/_nodes/<nodeId1>,<nodeId2>'

The result will contain a lot of information about the node. It's huge, so the repetitive parts have been truncated below:

       { 
         "_nodes" : { 
           "total" : 1, 
           "successful" : 1, 
          "failed" : 0 
         }, 
         "cluster_name" : "elasticsearch", 
         "nodes" : { 
            "7NwnFF1JTPOPhOYuP1AVNQ" : { 
             "name" : "7NwnFF1", 
             "transport_address" : "127.0.0.1:9300", 
             "host" : "127.0.0.1", 
             "ip" : "127.0.0.1", 
             "version" : "5.0.0", 
             "build_hash" : "253032b", 
             "total_indexing_buffer" : 207775334, 
             "roles" : [ "master", "data", "ingest" ], 
             "settings" : { 
                "cluster" : { "name" : "elasticsearch"}, 
                "node" : {"name" : "7NwnFF1"}, 
                "path" : {"logs" : ".../elasticsearch-5.0.0/logs", 
                  "home" : "...elasticsearch-5.0.0" 
                }, 
                "client" : {"type" : "node"}, 
                "http" : {"type" : {"default" : "netty4"}}, 
                "transport" : {"type" : {"default" : "netty4"}}, 
                "script" : {"inline" : "true", "stored" : "true"} 
              }, 
              "os" : { 
                "refresh_interval_in_millis" : 1000, 
                "name" : "Mac OS X", 
                "arch" : "x86_64", 
                "version" : "10.12.1", 
                "available_processors" : 8, 
                "allocated_processors" : 8 
              }, 
              "process" : { 
                "refresh_interval_in_millis" : 1000, 
                "id" : 82228, 
                "mlockall" : false 
              }, 
              "jvm" : { 
                "pid" : 82228, 
                "version" : "1.8.0_101", 
                "vm_name" : "Java HotSpot(TM) 64-Bit Server VM", 
                "vm_version" : "25.101-b13", 
                "vm_vendor" : "Oracle Corporation", 
                "start_time_in_millis" : 1477840185555, 
                "mem" : { 
                  "heap_init_in_bytes" : 2147483648, 
                  "heap_max_in_bytes" : 2077753344, 
                  "non_heap_init_in_bytes" : 2555904, 
                  "non_heap_max_in_bytes" : 0, 
                  "direct_max_in_bytes" : 2077753344 
                }, 
                "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep"], 
                "memory_pools" : [ "Code Cache", "Metaspace", 
                "Compressed Class Space", "Par Eden Space", 
                "Par Survivor Space", "CMS Old Gen" ], 
                "using_compressed_ordinary_object_pointers" : "true" 
              }, 
              "thread_pool" : { 
                "force_merge" : { 
                  "type" : "fixed", 
                  "min" : 1, 
                  "max" : 1, 
                  "queue_size" : -1 
                }, ... truncated ... 
              },
              "transport" : {
                "bound_address" : [ "[fe80::1]:9300", "[::1]:9300", "127.0.0.1:9300" ],
                "publish_address" : "127.0.0.1:9300",
                "profiles" : { } 
              }, 
              "http" : { 
                "bound_address" : [ "[fe80::1]:9200", "[::1]:9200",            
                "127.0.0.1:9200"  ], 
                "publish_address" : "127.0.0.1:9200", 
                "max_content_length_in_bytes" : 104857600 
              }, 
              "plugins" : [ 
                {
                  "name" : "lang-javascript",
                  "version" : "5.0.0",
                  "description" : "The JavaScript language plugin allows to have javascript as the language of scripts to execute.",
                  "classname" : "org.elasticsearch.plugin.javascript.JavaScriptPlugin"
                }... truncated ... 
              ], 
              "modules" : [ 
                {
                  "name" : "aggs-matrix-stats",
                  "version" : "5.0.0",
                  "description" : "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
                  "classname" : "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin"
                }... truncated ... 
              ], 
              "ingest" : { 
               "processors" : [ 
                  { 
                    "type" : "append" 
                  }... truncated ...        ] 
              } 
            } 
          } 
        }

How it works...

The nodes information call provides an overview of the node configuration. It covers a lot of information; the most important sections are as follows:

host: This is the name of the host.
ip: This is the IP of the host.
version: This is the Elasticsearch version. It's best practice that all the nodes of a cluster have the same Elasticsearch version.
roles: This is a list of roles that this node can cover. Development nodes usually support all three kinds: master, data and ingest.
transport_address: This is the address used by the node for cluster communication.
settings: This section contains information about the current cluster and the paths of the Elasticsearch node. The most important fields are as follows:
- cluster.name: This is the name of the cluster
- node.name: This is the name of the node
- path.*: These are the configured paths of this Elasticsearch instance
- script: This section is useful to check the script configuration of the node
os: This section provides operating system information about the node that is running Elasticsearch: processors available and allocated and the OS version.
process: This section contains information about the currently running Elasticsearch process:
- id: This is the process ID (pid).
- mlockall: This flag shows whether the Elasticsearch process memory is locked in RAM so that it cannot be swapped out. In production, this should be active.
- max_file_descriptors: This is the maximum number of file descriptors.
jvm: This section contains information about the node Java Virtual Machine: version, vendor, name, pid, memory (heap and non-heap).

Tip

It's highly recommended to run all the nodes on the same JVM version and type.

thread_pool: This section contains information about the several types of thread pool running in a node.
transport: This section contains information about the transport protocol. The transport protocol is used for intra-cluster communication or by the native client to communicate with a cluster. The response format is similar to the HTTP one, as follows:
- bound_address: These are the addresses bound by the transport protocol. If a specific IP is not set in the configuration, Elasticsearch binds to all the interfaces
- publish_address: This is the address used for publishing the native transport protocol
http: This section gives information about the HTTP configuration, such as:
- bound_address: This is the address bound by Elasticsearch.
- max_content_length_in_bytes (default 104857600, that is, 100 MB): This is the maximum size of HTTP content that Elasticsearch will allow to be received. HTTP payloads bigger than this size are rejected.
- publish_address: This is the address used to publish the Elasticsearch node.

Note

The default 100 MB HTTP limit, which can be changed in elasticsearch.yml, can result in a malfunction due to a large payload (often in conjunction with a mapper attachment plugin), so it's important to keep this limit in mind when doing bulk actions or working with attachments.

plugins: This section lists every plugin installed in the node, providing information about the following:
- name: This is the plugin name
- description: This is the plugin description
- version: This is the plugin version
- classname: This is the Java class used to load the plugin
  
  Tip
  
  All the nodes must have the same plugin versions. Different plugin versions across the nodes of a cluster can cause unexpected failures.
modules: This section lists every module installed in the node. The structure is the same as the plugin section.
ingest: This section contains the list of active processors in the ingest node.
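
The recommendations above (same Elasticsearch version, same JVM, same plugin versions on every node) can be checked automatically from the nodes information response. The following Python sketch groups node names by version and reports only the components that differ. The function name is hypothetical, and the second node in the sample is invented to show a mismatch; the first node reuses values from the output shown earlier.

```python
from collections import defaultdict


def find_version_mismatches(nodes_info):
    """Report Elasticsearch, JVM and plugin versions that differ across nodes.

    Returns {label: {version: [node names]}} only for labels with more
    than one distinct version.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for node in nodes_info["nodes"].values():
        name = node["name"]
        seen["elasticsearch"][node["version"]].append(name)
        seen["jvm"][node["jvm"]["version"]].append(name)
        for plugin in node.get("plugins", []):
            seen["plugin:" + plugin["name"]][plugin["version"]].append(name)
    return {
        label: {version: sorted(names) for version, names in versions.items()}
        for label, versions in seen.items()
        if len(versions) > 1
    }


sample_nodes = {
    "nodes": {
        "7NwnFF1JTPOPhOYuP1AVNQ": {
            "name": "7NwnFF1",
            "version": "5.0.0",
            "jvm": {"version": "1.8.0_101"},
            "plugins": [{"name": "lang-javascript", "version": "5.0.0"}],
        },
        # Hypothetical second node, added only to show a mismatch.
        "hypothetical-node-id": {
            "name": "node-2",
            "version": "5.0.1",
            "jvm": {"version": "1.8.0_101"},
            "plugins": [{"name": "lang-javascript", "version": "5.0.1"}],
        },
    }
}

print(find_version_mismatches(sample_nodes))
```

This sketch only compares versions of plugins that are present; a plugin that is missing entirely from one node would need an additional check.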

There's more...

The API call allows filtering of the sections that must be returned. In the example, we've returned the whole response. Alternatively, we could select one or more of the following sections:

http
thread_pool
transport
jvm
os
process
plugins
modules
ingest
settings

For example, if you need only the os and plugins information, the call will be as follows:

curl -XGET 'http://localhost:9200/_nodes/os,plugins'

Getting node statistics via the API

The node statistics API call is used to collect real-time metrics of your node, such as memory usage, thread usage, the number of indexing and search operations, and so on.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting nodes statistics, we will perform the following steps:

To retrieve the node statistics, the HTTP method is GET, and the curl command is as follows:

        curl -XGET 'http://localhost:9200/_nodes/stats'
        curl -XGET 'http://localhost:9200/_nodes/<nodeId1>,<nodeId2>/stats'

The result will be a long list of all the node statistics. The most significant parts of the results are as follows:

A header describing the cluster name and the nodes section:

        { 
          "cluster_name" : "es-cookbook", 
          "nodes" : { 
            "7NwnFF1JTPOPhOYuP1AVNQ" : { 
              "timestamp" : 1477990951146, 
              "name" : "7NwnFF1", 
              "transport_address" : "127.0.0.1:9300", 
              "host" : "127.0.0.1", 
              "ip" : "127.0.0.1:9300", 
              "roles" : [ 
                "master", 
                "data", 
                "ingest" 
              ],

Statistics related to the indices:

       "indices" : { 
         "docs" : { 
           "count" : 2030, 
           "deleted" : 0 
         }, 
         "store" : { 
           "size_in_bytes" : 3290318, 
           "throttle_time_in_millis" : 0 
         }, 
         "indexing" : { 
           "index_total" : 2000, 
           "index_time_in_millis" : 3901, 
           "index_current" : 0, 
           "index_failed" : 0, 
           "delete_total" : 0, 
           "delete_time_in_millis" : 0, 
           "delete_current" : 0, 
           "noop_update_total" : 0, 
           "is_throttled" : false, 
           "throttle_time_in_millis" : 0 
         }, 
         ... truncated ... 
       },

Statistics related to the operating system:

        "os" : { 
          "timestamp" : 1477990951181, 
          "cpu" : { 
            "percent" : 26, 
            "load_average" : { "1m" : 3.34765625} 
          }, 
          "mem" : { 
            "total_in_bytes" : 17179869184, 
            "free_in_bytes" : 1112723456, 
            "used_in_bytes" : 16067145728, 
            "free_percent" : 6, 
            "used_percent" : 94 
          }, ...truncated ... 
        },

Statistics related to the current Elasticsearch process:

        "process" : { 
          "timestamp" : 1477990951181, 
          "open_file_descriptors" : 283, 
          "max_file_descriptors" : 10240, 
          "cpu" : { 
            "percent" : 0, 
            "total_in_millis" : 247287 
          }, 
          "mem" : { 
            "total_virtual_in_bytes" : 6683983872 
          } 
        },

Statistics related to the current JVM:

        "jvm" : { 
          "timestamp" : 1477990951182, 
          "uptime_in_millis" : 61315659, 
          "mem" : { 
            "heap_used_in_bytes" : 364406464, 
            "heap_used_percent" : 17, 
            "heap_committed_in_bytes" : 2077753344, 
            "heap_max_in_bytes" : 2077753344, 
            "non_heap_used_in_bytes" : 115590776, 
            "non_heap_committed_in_bytes" : 122032128, 
        ... truncated ... 
            } 
          },... truncated ... 
        },

Statistics related to thread pools:

        "thread_pool" : { 
          "bulk" : { 
            "threads" : 8, 
            "queue" : 0, 
            "active" : 0, 
            "rejected" : 0, 
            "largest" : 8, 
            "completed" : 10 
          }, ...truncated.... 
        },

Node filesystem statistics:

        "fs" : { 
          "timestamp" : 1477990951182, 
          "total" : { 
            "total_in_bytes" : 999334871040, 
            "free_in_bytes" : 75898884096, 
            "available_in_bytes" : 75636740096 
          }, 
          "data" : [ 
            { 
              "path" : ".../elasticsearch-5.0.0/data/nodes/0", 
              "mount" : "/ (/dev/disk1)", 
              "type" : "hfs", 
              "total_in_bytes" : 999334871040, 
              "free_in_bytes" : 75898884096, 
              "available_in_bytes" : 75636740096 
            } 
          ] 
        }...truncated...] 
        },

Statistics related to communications between nodes:

      "transport" : { 
        "server_open" : 0, 
        "rx_count" : 8, 
        "rx_size_in_bytes" : 4264, 
        "tx_count" : 8, 
        "tx_size_in_bytes" : 4264 
      },

Statistics related to HTTP connections:

      "http" : { 
        "current_open" : 2, 
        "total_opened" : 13 
      },

Statistics related to circuit breakers:

      "breakers" : { 
        "request" : { 
          "limit_size_in_bytes" : 1246652006, 
          "limit_size" : "1.1gb", 
          "estimated_size_in_bytes" : 0, 
          "estimated_size" : "0b", 
          "overhead" : 1.0, 
          "tripped" : 0 
        }, 
        "fielddata" : { 
          "limit_size_in_bytes" : 1246652006, 
          "limit_size" : "1.1gb", 
          "estimated_size_in_bytes" : 0, 
          "estimated_size" : "0b", 
          "overhead" : 1.03, 
          "tripped" : 0 
        } ... truncated .... 
    } 
  } 
}

Statistics related to scripts:

"script" : { 
        "compilations" : 2, 
        "cache_evictions" : 0 
      },

Cluster state queue:

"discovery" : { 
        "cluster_state_queue" : { 
          "total" : 0, 
          "pending" : 0, 
          "committed" : 0 
        } 
      }

Ingest statistics:

        "ingest" : { 
               "total" : { 
                 "count" : 0, 
                 "time_in_millis" : 0, 
                  "current" : 0, 
                  "failed" : 0 
                }, 
                "pipelines" : { 
                  "xpack_monitoring_2" : { 
                   "count" : 0, 
                    "time_in_millis" : 0, 
                    "current" : 0, 
                    "failed" : 0 
                  } 
                } 
              }

How it works...

Every Elasticsearch node collects statistics about several aspects of node management while it is running; these statistics are accessible via the node stats API call.

In the next recipes, we will see some examples of monitoring applications that use this information to provide the real-time status of a node or a cluster.

The main statistics collected by this API are as follows:

fs: This section contains statistics about the filesystem: free space on devices, mount points, reads, and writes. It can be used to remotely monitor the disk usage of your nodes.
http: This gives the number of currently open HTTP connections and the total number of connections opened.
indices: This section contains statistics about several indexing aspects:
- Usage for fields and caches
- Statistics about operations such as get, indexing, flush, merges, refresh, and warmer
jvm: This section provides statistics about buffers, pools, the garbage collector (creation/destruction of objects and their memory management), memory (used memory, heap, pools), threads, and uptime. It should be checked to see whether the node is running out of memory.
network: This section provides statistics about TCP traffic, such as open connections, closed connections, and data I/O.
os: This section collects statistics about the operating system, such as:
- CPU usage
- Node load
- Memory and swap
- Uptime
process: This section contains statistics about the CPU used by Elasticsearch, memory, and open file descriptors.

Tip

It's very important to monitor the open file descriptors, because if you run out of them, the indices may be corrupted.
thread_pool: This section monitors all the thread pools available in Elasticsearch. In the case of low performance, it's important to check whether there are pools under excessive load (for example, with many queued or rejected requests). Some of them can be configured with a new maximum value.
transport: This section contains statistics about the transport layer, mainly bytes read and transmitted.
breakers: This section monitors the circuit breakers. It must be checked to see whether it's necessary to optimize resources or queries/aggregations to prevent the breakers from being tripped.
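
As a rough illustration of how the thread_pool and breakers sections can be checked programmatically, the following Python sketch scans an already-parsed node stats response for rejected thread pool requests and tripped circuit breakers. The function name find_pressure is hypothetical, and the sketch assumes the response has the usual top-level nodes object keyed by node ID; it only reads the rejected and tripped fields shown in the output above.

```python
def find_pressure(node_stats):
    """Return (node_id, kind, name, count) tuples for every thread pool with
    rejected requests and every circuit breaker that has tripped.

    node_stats is the parsed JSON of a node stats call (assumed layout:
    {"nodes": {<node_id>: {"thread_pool": {...}, "breakers": {...}}}}).
    """
    findings = []
    for node_id, node in node_stats.get("nodes", {}).items():
        # Thread pools: a non-zero "rejected" counter means requests were dropped.
        for pool, stats in node.get("thread_pool", {}).items():
            if stats.get("rejected", 0) > 0:
                findings.append((node_id, "thread_pool", pool, stats["rejected"]))
        # Circuit breakers: a non-zero "tripped" counter means a breaker fired.
        for breaker, stats in node.get("breakers", {}).items():
            if stats.get("tripped", 0) > 0:
                findings.append((node_id, "breaker", breaker, stats["tripped"]))
    return findings
```

An empty list means that none of the counters inspected here is above zero; it says nothing about the other sections of the response.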

There's more...

The response is very large. It's possible to limit it by requesting only the required parts. To do this, you need to add the desired sections, separated by commas, to the URL of the API call. The available sections are as follows:

fs
http
indices
jvm
network
os
process
thread_pool
transport
breaker
discovery
script
ingest

For example, to request only os and http statistics the call becomes:

curl -XGET 'http://localhost:9200/_nodes/stats/os,http'
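
The same URL can be assembled in code. The following is a minimal Python sketch, not part of the original recipe: the helper name node_stats_url and the KNOWN_SECTIONS set are hypothetical, and the set simply mirrors the list of sections given above.

```python
# Section names copied from the list above; this set is an illustrative
# client-side check, not something provided by Elasticsearch.
KNOWN_SECTIONS = {
    "fs", "http", "indices", "jvm", "network", "os", "process",
    "thread_pool", "transport", "breaker", "discovery", "script", "ingest",
}


def node_stats_url(sections, base_url="http://localhost:9200"):
    """Build the node stats URL, limited to the given sections.

    An empty list yields the unfiltered /_nodes/stats URL.
    """
    unknown = [s for s in sections if s not in KNOWN_SECTIONS]
    if unknown:
        raise ValueError("unknown sections: " + ", ".join(unknown))
    path = "/_nodes/stats"
    if sections:
        path += "/" + ",".join(sections)
    return base_url.rstrip("/") + path
```

Calling node_stats_url(["os", "http"]) produces the URL used in the curl command above.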

Using the task management API

Elasticsearch 5.x allows the definition of actions that can take some time to complete. The most common ones are as follows:

delete_by_query
update_by_query
reindex

When these actions are called, they create a server side task that executes the job. The task management API allows you to control these actions.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup .

To execute curl via the command-line, you need to install curl for your operating system.

How to do it...

To get task information, we will perform the following steps:

To retrieve the task information, the HTTP method is GET and the curl commands are as follows:

        curl -XGET 'http://localhost:9200/_tasks'
        curl -XGET 'http://localhost:9200/_tasks?nodes=<nodeId1,nodeId2>'
        curl -XGET 'http://localhost:9200/_tasks?nodes=<nodeId1,nodeId2>&actions=cluster:*'

The result will be something similar to the following:

        { 
          "nodes" : { 
            "7NwnFF1JTPOPhOYuP1AVNQ" : { 
              "name" : "7NwnFF1", 
              "transport_address" : "127.0.0.1:9300", 
              "host" : "127.0.0.1", 
              "ip" : "127.0.0.1:9300", 
              "roles" : [ 
                "master", 
                "data", 
                "ingest" 
              ], 
              "tasks" : { 
                "7NwnFF1JTPOPhOYuP1AVNQ:9822" : { 
                  "node" : "7NwnFF1JTPOPhOYuP1AVNQ", 
                  "id" : 9822, 
                  "type" : "transport", 
                  "action" : "cluster:monitor/tasks/lists", 
                  "start_time_in_millis" : 1477993984920, 
                  "running_time_in_nanos" : 102338, 
                  "cancellable" : false 
                }, 
                "7NwnFF1JTPOPhOYuP1AVNQ:9823" : { 
                  "node" : "7NwnFF1JTPOPhOYuP1AVNQ", 
                  "id" : 9823, 
                  "type" : "direct", 
                  "action" : "cluster:monitor/tasks/lists[n]", 
                  "start_time_in_millis" : 1477993984920, 
                  "running_time_in_nanos" : 62786, 
                  "cancellable" : false, 
                  "parent_task_id" : "7NwnFF1JTPOPhOYuP1AVNQ:9822" 
                } 
              } 
            } 
          } 
        }

How it works...

Every task that is executed in Elasticsearch is available in the task list.

The most important properties for the tasks are as follows:

node: This defines the node that is executing the task.
id: This defines the unique ID of the task.
action: This is the name of the action. It's generally composed of an action type, the : separator, and the detailed action.
cancellable: This defines whether the task can be canceled. Some tasks, such as delete/update by query or reindex, can be canceled; others are mainly management tasks and cannot be canceled.
parent_task_id: This defines the group of tasks. Some tasks can be split and executed in several sub-tasks. This value can be used to group these tasks by parent.

The full ID of a task (the node ID and the task number, joined by :) can be used to retrieve a single task by appending it to the URL of the API call:

curl -XGET 'http://localhost:9200/_tasks/7NwnFF1JTPOPhOYuP1AVNQ:9822'

If you need to monitor a group of tasks, you can filter by their parent_task_id with a similar API call:

curl -XGET 'http://localhost:9200/_tasks?parent_task_id=7NwnFF1JTPOPhOYuP1AVNQ:9822'
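
The grouping by parent can also be done on the client side, on a task list that has already been fetched. The following Python sketch assumes the response layout shown earlier in this recipe (nodes, then tasks, then an optional parent_task_id per task); the function name group_by_parent is hypothetical.

```python
from collections import defaultdict


def group_by_parent(tasks_response):
    """Map each parent task ID to the list of its child task IDs.

    Tasks without a parent_task_id are collected under the key None.
    """
    groups = defaultdict(list)
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            groups[task.get("parent_task_id")].append(task_id)
    return dict(groups)
```

Applied to the sample response above, the task ending in 9823 would be listed under its parent ending in 9822, and the parent itself under None.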

There's more...

Generally, canceling a task could produce some data inconsistency in Elasticsearch due to partial updating or deleting of documents; when reindexing, however, it can make good sense. When you are reindexing a huge amount of data, it's common to find that you need to change the mapping or the reindex script in the middle of the operation. So, in order not to waste time and CPU usage, canceling the reindexing is a sensible solution.

To cancel a task, the API URL is as follows:

curl -XPOST 'http://localhost:9200/_tasks/task_id:1/_cancel'

In the case of a group of tasks, they can be stopped with a single cancel call using query arguments to select them as follows:

curl -XPOST 'http://localhost:9200/_tasks/_cancel?nodes=nodeId1,nodeId2&actions=*reindex'
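
Before canceling, it can be useful to see which tasks would actually be affected. The following Python sketch is one possible way to do that on a task list response: it keeps only tasks flagged as cancellable whose action matches a wildcard pattern, and builds the per-task cancel URL in the form shown above. The function name cancel_urls is hypothetical, and the matching uses Python's fnmatch module as a client-side approximation; it is an assumption that this behaves like the server-side actions filter for simple patterns such as *reindex.

```python
from fnmatch import fnmatchcase


def cancel_urls(tasks_response, action_pattern, base_url="http://localhost:9200"):
    """Return the _cancel URL of every cancellable task whose action name
    matches action_pattern (shell-style wildcard, matched client-side)."""
    urls = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if not task.get("cancellable", False):
                continue  # management tasks cannot be canceled
            if fnmatchcase(task.get("action", ""), action_pattern):
                urls.append("{}/_tasks/{}/_cancel".format(base_url, task_id))
    return urls
```

Each returned URL is meant to be called with the POST method, like the curl examples in this section.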

Hot thread API

Sometimes your cluster slows down due to massive CPU usage and you need to understand why.

Elasticsearch provides the ability to monitor hot threads to be able to understand where the problem is.

Note

In Java, hot threads are threads that are using a lot of CPU and take a long time to execute.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup .

To execute curl via the command-line, you need to install curl for your operating system.

How to do it...

To get the hot threads information, we will perform the following steps:

To retrieve the hot threads information, the HTTP method is GET and the curl commands are as follows:

        curl -XGET 'http://localhost:9200/_nodes/hot_threads'
        curl -XGET 'http://localhost:9200/_nodes/{nodesIds}/hot_threads'

The result will be something similar to the following:

        ::: {7NwnFF1}{7NwnFF1JTPOPhOYuP1AVNQ}{OL2uVn3BQ-qMAg32eq_ouQ}{127.0.0.1}{127.0.0.1:9300}
           Hot threads at 2016-11-01T10:53:39.796Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
           12.6% (63.1ms out of 500ms) cpu usage by thread 'elasticsearch[7NwnFF1][refresh][T#3]'
             4/10 snapshots sharing following 23 elements
               org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:443)
               org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539)
               org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:653)
               org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:438)
               org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:291)
               org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:266)
               ...truncated...

How it works...

The Hot threads API is quite particular. It returns a text representation of currently running hot threads, so that it's possible to check the causes of slowdown of every single thread by using the stack trace.
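
Because the response is plain text rather than JSON, any automated processing has to parse it line by line. The following Python sketch extracts the usage lines (percentage, type, and thread name) from such a response. The regular expression is modeled only on the sample output shown in this recipe, so it is an assumption that other Elasticsearch versions format these lines identically; the names HOT_THREAD_LINE and parse_hot_threads are hypothetical.

```python
import re

# Matches lines such as:
#   12.6% (63.1ms out of 500ms) cpu usage by thread 'elasticsearch[...][T#3]'
HOT_THREAD_LINE = re.compile(
    r"^\s*(?P<percent>\d+(?:\.\d+)?)% \([^)]*\) "
    r"(?P<kind>\w+) usage by thread '(?P<thread>.+)'\s*$"
)


def parse_hot_threads(text):
    """Return a list of (percent, kind, thread_name) tuples, one for each
    usage line found in the hot threads text response."""
    results = []
    for line in text.splitlines():
        match = HOT_THREAD_LINE.match(line)
        if match:
            results.append(
                (float(match.group("percent")), match.group("kind"), match.group("thread"))
            )
    return results
```

Stack trace lines and the per-node header do not match the pattern and are simply skipped.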

To control the returned values, there are additional parameters that can be provided as query arguments, such as the following:

threads: This is the number of hot threads to provide (default 3)
interval: This is the interval for sampling of threads (default 500ms)
type: This allows the control of different types of hot threads, for example, to check wait and block states (default cpu, possible values are cpu/wait/block)
ignore_idle_threads: This is used to filter out known idle threads (default true)
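
The parameters above can be combined into a single request URL. The following Python sketch builds such a URL with the standard library; the function name hot_threads_url and its argument names are hypothetical, while the query argument names and defaults are the ones listed above.

```python
from urllib.parse import urlencode

VALID_TYPES = ("cpu", "wait", "block")


def hot_threads_url(node_ids=None, threads=3, interval="500ms",
                    thread_type="cpu", ignore_idle_threads=True,
                    base_url="http://localhost:9200"):
    """Build a hot threads URL, optionally restricted to some node IDs."""
    if thread_type not in VALID_TYPES:
        raise ValueError("type must be one of " + "/".join(VALID_TYPES))
    path = "/_nodes/"
    if node_ids:
        path += ",".join(node_ids) + "/"
    path += "hot_threads"
    query = urlencode({
        "threads": threads,
        "interval": interval,
        "type": thread_type,
        # Booleans are sent in lowercase, as in the defaults listed above.
        "ignore_idle_threads": str(bool(ignore_idle_threads)).lower(),
    })
    return base_url.rstrip("/") + path + "?" + query
```

For example, hot_threads_url(threads=5, thread_type="wait") asks for the five busiest threads by wait time on all nodes.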

Tip

Hot threads are an advanced monitoring feature provided by Elasticsearch, and they are very handy for debugging slowness in a production cluster, as they can be used as a runtime debugger.

Managing the shard allocation

During normal Elasticsearch usage, it is not necessary to change the shard allocation, because the default settings work very well with all standard scenarios. Sometimes, due to massive relocation, or due to nodes restarting, or some other cluster issues, it's necessary to monitor or define custom shard allocation.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup .

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting information about the current state of unassigned shard allocation, we will perform the following steps:

To retrieve the cluster allocation information, the HTTP method is GET and the curl command is as follows:
```
        curl -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty'
```

The result will be something similar to the following:

        { 
          "shard" : { 
            "index" : ".monitoring-es-2-2016.10.27", 
           "index_uuid" : "cD30b-qtQc2qw62yF-tirA", 
            "id" : 0, 
            "primary" : false 
          }, 
          "assigned" : false, 
          "shard_state_fetch_pending" : false, 
          "unassigned_info" : { 
            "reason" : "CLUSTER_RECOVERED", 
            "at" : "2016-10-30T15:09:54.626Z", 
            "delayed" : false, 
            "allocation_status" : "no_attempt" 
            }, 
          "allocation_delay_in_millis" : 60000, 
          "remaining_delay_in_millis" : 0, 
          "nodes" : { 
            "7NwnFF1JTPOPhOYuP1AVNQ" : { 
              "node_name" : "7NwnFF1", 
               "node_attributes" : { }, 
              "store" : { 
                "shard_copy" : "AVAILABLE" 
              }, 
              "final_decision" : "NO", 
              "final_explanation" : "the shard cannot be assigned because allocation deciders return a NO decision", 
              "weight" : 8.15, 
              "decisions" : [ 
                { 
                  "decider" : "same_shard", 
                  "decision" : "NO", 
                  "explanation" : "the shard cannot be allocated on the same node id [7NwnFF1JTPOPhOYuP1AVNQ] on which it already exists" 
                } 
             ] 
            } 
          } 
        }

How it works...

Elasticsearch allows different shard allocation mechanisms. Sometimes your shards are not assigned to nodes, and it's useful to investigate why Elasticsearch has not allocated them by querying the cluster allocation explain API.

The call returns a lot of information about the unassigned shard, but the most important part is the decisions. This is a list of objects that explain why the shard cannot be allocated on the node. In the preceding example, the explanation was "the shard cannot be allocated on the same node id [7NwnFF1JTPOPhOYuP1AVNQ] on which it already exists", which is returned because the shard needs a replica, but the cluster is composed of only one node, so it's not possible to initialize the replica shard in the cluster.
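
When a cluster has many nodes, reading the decisions by eye becomes tedious. The following Python sketch collects, for every node in an already-parsed explain response, the deciders that returned NO together with their explanations. It assumes the response layout shown in this recipe (nodes, then decisions with decider, decision, and explanation); the function name blocking_decisions is hypothetical.

```python
def blocking_decisions(explain_response):
    """Map node ID -> list of (decider, explanation) for every decider that
    returned a NO decision for the shard on that node."""
    blocked = {}
    for node_id, node in explain_response.get("nodes", {}).items():
        reasons = [
            (decision.get("decider"), decision.get("explanation"))
            for decision in node.get("decisions", [])
            if decision.get("decision") == "NO"
        ]
        if reasons:
            blocked[node_id] = reasons
    return blocked
```

For the sample response above, the result would contain a single entry for the only node, pointing at the same_shard decider.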

There's more...

The cluster allocation explain API can filter the result to look at one particular shard, which is very handy if your cluster has a lot of shards. This is done by passing the following parameters as a filter in the body of the GET request:

index: This is the index that the shard belongs to.
shard: This is the number of the shard. Shard numbers start from 0.
primary: This is true or false, depending on whether the shard to be checked is the primary one or not.

The shard from the preceding example can be selected with a call such as the following:

curl -XGET 'http://localhost:9200/_cluster/allocation/explain' -d '{
  "index": ".monitoring-es-2-2016.10.27",
  "shard": 0,
  "primary": false
}'

To manually relocate shards, Elasticsearch provides a Cluster Reroute API that allows the migration of shards between nodes. The following is an example of this API:

curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
  "commands" : [
    {
      "move" : {
        "index" : "test-index", "shard" : 0,
        "from_node" : "node1", "to_node" : "node2"
      }
    }
  ]
}'

In this case, shard 0 of the index test-index is migrated from node1 to node2. If you force a shard migration, the cluster may start moving other shards to rebalance itself.
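
If several shards need to be moved, the request body can be generated instead of written by hand. The following Python sketch builds the JSON body for the reroute call using only the move command and the fields shown above; the helper names move_command and reroute_body are hypothetical.

```python
import json


def move_command(index, shard, from_node, to_node):
    """Build a single "move" command for the cluster reroute API."""
    return {
        "move": {
            "index": index,
            "shard": shard,
            "from_node": from_node,
            "to_node": to_node,
        }
    }


def reroute_body(commands):
    """Serialize a sequence of commands as the JSON body of a reroute call."""
    return json.dumps({"commands": list(commands)})
```

The string returned by reroute_body([move_command("test-index", 0, "node1", "node2")]) is equivalent to the body passed with -d in the curl example.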

Monitoring segments with the segment API

Monitoring the index segments means monitoring the health of an index. The segments API returns information about the number of segments and the data stored in them.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup .

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

For getting information about index segments, we will perform the following steps:

To retrieve the index segments, the HTTP method is GET and the curl command is as follows:

        curl -XGET 'http://localhost:9200/test-index/_segments'

The result will be something similar to the following:

        { 
          "_shards" : { ...truncated... }, 
          "indices" : { 
           "test-index" : { 
             "shards" : { 
                "0" : [ 
                  { 
                    "routing" : { 
                     "state" : "STARTED", 
                     "primary" : true, 
                     "node" : "7NwnFF1JTPOPhOYuP1AVNQ" 
                   }, 
                   "num_committed_segments" : 9, 
                   "num_search_segments" : 9, 
                   "segments" : { 
                      "_0" : { 
                        "generation" : 0, 
                        "num_docs" : 15, 
                        "deleted_docs" : 0, 
                        "size_in_bytes" : 31497, 
                        "memory_in_bytes" : 6995, 
                        "committed" : true, 
                        "search" : true, 
                        "version" : "6.2.0", 
                        "compound" : true 
                      }, 
                     "_1" : { 
           ...truncated...

Tip

In Elasticsearch, there is the special alias _all that defines all the indices. This can be used in all the APIs that require a list of index names.

How it works...

The Indices Segments API returns statistics about the segments in an index. This is an important indicator about the health of an index. It returns the following information:

num_docs: The number of documents stored in the segment.
deleted_docs: The number of deleted documents in the segment. If this value is high, a lot of space is wasted on tombstoned documents.
size_in_bytes: The size of the segment in bytes. If this value is too high, writing speed will be very low.
memory_in_bytes: The memory taken up, in bytes, by the segment.
committed: Whether the segment is committed to disk.
search: Whether the segment is used for searching. During a force merge (index optimization), new segments are created and returned by the API, but they are not available for searching until the end of the optimization.
version: The Lucene version used for creating the segment.
compound: Whether the segment is stored in the compound file format.

The most important values to monitor in the segments are deleted_docs and size_in_bytes, because they indicate either wasted disk space or a shard that is too large. If the shard is too large (above 10 GB), the best way to improve write performance is to reindex the data into an index with a larger number of shards.

Having large shards also creates a problem in relocating, due to massive data moving between nodes.
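
To keep an eye on these two values, the per-segment figures can be aggregated per shard copy. The following Python sketch does this for an already-parsed segments response, assuming the layout shown in this recipe (indices, then shards, then a list of shard copies, each with a segments object). The function name summarize_segments and the shape of its result are hypothetical.

```python
def summarize_segments(segments_response):
    """Aggregate segment statistics for every shard copy.

    Returns a dict keyed by (index_name, shard_id, is_primary) whose values
    hold the segment count, the total size in bytes, and the fraction of
    deleted documents among all documents (live plus deleted).
    """
    summary = {}
    for index_name, index in segments_response.get("indices", {}).items():
        for shard_id, copies in index.get("shards", {}).items():
            for copy in copies:
                segments = list(copy.get("segments", {}).values())
                live = sum(s.get("num_docs", 0) for s in segments)
                deleted = sum(s.get("deleted_docs", 0) for s in segments)
                total_docs = live + deleted
                key = (index_name, shard_id, copy.get("routing", {}).get("primary"))
                summary[key] = {
                    "segments": len(segments),
                    "size_in_bytes": sum(s.get("size_in_bytes", 0) for s in segments),
                    "deleted_ratio": deleted / total_docs if total_docs else 0.0,
                }
    return summary
```

A high deleted_ratio or a very large size_in_bytes for a shard copy is a hint to look more closely at that index; the thresholds to alert on are left to the reader.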

Tip

It's impossible to define the perfect size for a shard. In general, a good size for a shard that doesn't need to be frequently updated is between 10 GB and 25 GB.

Cleaning the cache

During its execution, Elasticsearch caches data to speed up searching, such as cached results, items, and filter results.

To free up memory, you can call the clear cache API.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup .

To execute curl via the command-line, you need to install curl for your operating system.

How to do it...

For cleaning the cache, we will perform the following steps:

We call the clear cache API on an index as follows:

        curl -XPOST 'http://localhost:9200/test-index/_cache/clear'

The result returned by Elasticsearch, if everything is okay, should be as follows:

        { 
          "_shards" : { 
            "total" : 10, 
            "successful" : 5, 
            "failed" : 0 
          } 
        }

How it works...

The clear cache API frees the memory used to cache values in Elasticsearch.

Generally, it's not a good idea to clean the cache because Elasticsearch manages the cache internally itself and cleans obsolete values, but it can be very handy if your node is running out of memory or you want to force a complete cache clean-up.
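
When the call is scripted, the _shards section of the response can be checked automatically. The following Python sketch is a small, hypothetical helper (shards_summary is not an Elasticsearch name) that reports whether any shard failed and how many shards were counted in total but neither succeeded nor failed.

```python
def shards_summary(response):
    """Summarize the _shards section of a response such as the one returned
    by the clear cache call."""
    shards = response.get("_shards", {})
    total = shards.get("total", 0)
    successful = shards.get("successful", 0)
    failed = shards.get("failed", 0)
    return {
        "ok": failed == 0,
        # Shards counted in "total" that neither succeeded nor failed.
        "not_executed": total - successful - failed,
    }
```

For the sample response above, ok is True and not_executed is 5. A plausible reading, given as an assumption, is that these are replica shards that are not assigned on a single-node cluster.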
