Reindexing an index

Many common scenarios involve changing your mapping. Because of a limitation of Elasticsearch mappings, namely that an existing field mapping cannot be deleted (or, in most cases, changed), you often need to reindex the data of an index. The most common scenarios are:

  • Changing an analyzer for a mapping
  • Adding a new subfield to a mapping, when all the records need to be reprocessed so that the new subfield can be searched
  • Removing an unused mapping
  • Changing a record structure that requires a new mapping

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following commands, the index created in the Creating an index recipe is required.

How to do it...

The HTTP method to reindex an index is POST. The URL format to reindex is http://<server>/_reindex.

To reindex an index, we will perform the following steps:

  1. If we want to reindex data from myindex to the myindex2 index, the call will be:
            curl -XPOST 'http://localhost:9200/_reindex?pretty=true' -H 'Content-Type: application/json' -d '{
              "source": {
                "index": "myindex"
              },
              "dest": {
                "index": "myindex2"
              }
            }'
    
  2. The result returned by Elasticsearch should be:
            {
              "took" : 66,
              "timed_out" : false,
              "total" : 2,
              "updated" : 0,
              "created" : 2,
              "deleted" : 0,
              "batches" : 1,
              "version_conflicts" : 0,
              "noops" : 0,
              "retries" : {
                "bulk" : 0,
                "search" : 0
              },
              "throttled_millis" : 0,
              "requests_per_second" : "unlimited",
              "throttled_until_millis" : 0,
              "failures" : [ ]
            }
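The counters in this response are what a script would normally check after a reindex. The following is a minimal sketch in bash, assuming the pretty-printed layout shown above (one "key" : value pair per line) and no JSON tool such as jq; the function names reindex_field and check_reindex are hypothetical helpers, not part of Elasticsearch.

```shell
#!/usr/bin/env bash
# Sketch: check a pretty-printed _reindex response without a JSON tool such as jq.
# Assumption: one "key" : value pair per line, as produced by ?pretty=true.

# reindex_field KEY RESPONSE: print the first integer value of a "KEY" : N line
reindex_field() {
  printf '%s\n' "$2" |
    sed -n "s/^[[:space:]]*\"$1\" : \([0-9][0-9]*\),\{0,1\}[[:space:]]*\$/\1/p" |
    head -n 1
}

# check_reindex RESPONSE: print the main counters; return non-zero if the
# failures array is not empty or any version conflict was reported
check_reindex() {
  local resp="$1" total created updated conflicts
  total=$(reindex_field total "$resp")
  created=$(reindex_field created "$resp")
  updated=$(reindex_field updated "$resp")
  conflicts=$(reindex_field version_conflicts "$resp")
  echo "total=${total} created=${created} updated=${updated} version_conflicts=${conflicts}"
  # ?pretty=true prints an empty failures list as "failures" : [ ]
  if ! printf '%s\n' "$resp" | grep -qF '"failures" : [ ]'; then
    echo "reindex reported failures" >&2
    return 1
  fi
  [ "${conflicts:-0}" -eq 0 ]
}
```

For example, capture the output of the curl call in a variable (resp=$(curl -s ...)) and run check_reindex "$resp". For anything beyond a quick check, a real JSON parser is the more robust choice than line-based sed matching.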

How it works...

The reindex functionality introduced in Elasticsearch 5.x provides an efficient way to reindex documents.

In previous Elasticsearch versions, this functionality had to be implemented at the client level. The advantages of the new server-side implementation are as follows:

  • Fast copying of data, because the operation is managed completely on the server side.
  • Better management of the operation, thanks to the new task API.
  • Better error handling, as it is done at the server level. This allows better management of failover during the reindex operation.
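The task API mentioned above can be used from the command line by starting the reindex with wait_for_completion=false: Elasticsearch then replies immediately with a task identifier that can be polled or cancelled through the _tasks endpoint. A minimal bash sketch, assuming a local node at http://localhost:9200 and assuming the pretty-printed task response contains a "completed" : true flag once the task is done; all function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Base URL of the cluster (assumption: the local node used in this recipe)
ES_URL=${ES_URL:-http://localhost:9200}

# start_reindex_task SRC DEST: submit the reindex without waiting; Elasticsearch
# replies immediately with a task identifier such as {"task" : "<node>:<number>"}
start_reindex_task() {
  curl -sS -XPOST "${ES_URL}/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d "{\"source\": {\"index\": \"$1\"}, \"dest\": {\"index\": \"$2\"}}"
}

# task_id RESPONSE: extract the task identifier from the reply above
task_id() {
  printf '%s\n' "$1" | sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# task_status ID: read the current state of the task from the task API
task_status() {
  curl -sS "${ES_URL}/_tasks/$1?pretty=true"
}

# wait_for_task ID [MAX_SECONDS]: poll once per second until the task reports
# "completed" : true, giving up after MAX_SECONDS (default 300)
wait_for_task() {
  local tries=0 max=${2:-300}
  until task_status "$1" | grep -q '"completed" : true'; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max" ]; then
      echo "task $1 still running after ${max}s" >&2
      return 1
    fi
    sleep 1
  done
}

# cancel_task ID: ask Elasticsearch to cancel a running reindex
cancel_task() {
  curl -sS -XPOST "${ES_URL}/_tasks/$1/_cancel?pretty=true"
}
```

A typical sequence is id=$(task_id "$(start_reindex_task myindex myindex2)") followed by wait_for_task "$id". Running the copy as a task keeps the HTTP call short, which matters for large indices where a synchronous request could hit client timeouts.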

At the server level, this action is composed of the following steps:

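  1. Initializing an Elasticsearch task to manage the operation.
  2. Creating the target index if it does not exist. The source mappings and settings are not copied: a missing destination index is created from index templates and dynamic mapping, so it should normally be created beforehand with the desired mapping.
  3. Executing a query (using a scroll search) to collect the documents to be reindexed.
  4. Reindexing the documents via bulk operations until all of them are processed.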
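Since a changed mapping is the usual reason to reindex, a common pattern is to create the destination index with the new mapping first and only then call _reindex, so that the copied documents are indexed with that mapping rather than one inferred dynamically. A minimal bash sketch, assuming a local node; the name field, its raw subfield, and the helper functions are hypothetical. The mapping uses the typeless format of Elasticsearch 7.x and later; in 5.x and 6.x the properties object is nested under a type name inside mappings.

```shell
#!/usr/bin/env bash
# Base URL of the cluster (assumption: the local node used in this recipe)
ES_URL=${ES_URL:-http://localhost:9200}

# Hypothetical new mapping: "name" and its "raw" subfield are illustrative only.
# Typeless 7.x+ format; in 5.x/6.x nest "properties" under a type name.
NEW_MAPPING='{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": { "raw": { "type": "keyword" } }
      }
    }
  }
}'

# reindex_body SRC DEST: print the minimal _reindex request body
reindex_body() {
  printf '{"source": {"index": "%s"}, "dest": {"index": "%s"}}\n' "$1" "$2"
}

# create_index NAME MAPPING: create the destination index with an explicit mapping;
# --fail makes curl exit non-zero on HTTP errors (for example, if the index exists)
create_index() {
  curl -sS --fail -XPUT "${ES_URL}/$1?pretty=true" \
    -H 'Content-Type: application/json' -d "$2"
}

# reindex_into SRC DEST: copy the documents into the already-created index
reindex_into() {
  curl -sS --fail -XPOST "${ES_URL}/_reindex?pretty=true" \
    -H 'Content-Type: application/json' -d "$(reindex_body "$1" "$2")"
}
```

Usage: create_index myindex2 "$NEW_MAPPING" && reindex_into myindex myindex2. Chaining with && means the copy only starts if the index was created successfully.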

The main parameters that can be provided to this action are:

  • The source section, which controls how the source documents are selected. The most important subsections are as follows:
    • index, which is the source index to be used. It can also be a list of indices.
    • type (optional), which is the source type to be reindexed. It can also be a list of types.
    • query (optional), which is an Elasticsearch query used to select a subset of the documents.
    • sort (optional), which can be used to provide a way of sorting the documents.
  • The dest section, which controls how the documents are written to the target. The most important parameters in this section are:
    • index, which is the target index to be used. If it does not exist, it is created.
    • version_type (optional): if it is set to external, the versions of the source documents are preserved.
    • routing (optional), which controls the routing in the destination index. It can be:
      • keep (the default), which preserves the original routing
      • discard, which discards the original routing
      • =<text>, which uses the text value for the routing
    • pipeline (optional), which allows you to define a custom pipeline for ingestion. We will see more about the ingestion pipeline in Chapter 13, Ingest.
  • size (optional), the maximum number of documents to be reindexed.
  • script (optional), which allows you to define a script for document manipulation. This case will be discussed in the Reindex with a custom script recipe in Chapter 9, Scripting.
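Several of these parameters can be combined in a single request. The sketch below builds a body that copies at most a given number of documents matching a term query, keeps the source versions (version_type external) and the original routing. The tag field, the red value and the helper names are hypothetical; values are inserted without JSON escaping, so only simple strings should be passed. Newer Elasticsearch releases deprecate the top-level size in favor of max_docs.

```shell
#!/usr/bin/env bash
# Base URL of the cluster (assumption: the local node used in this recipe)
ES_URL=${ES_URL:-http://localhost:9200}

# partial_reindex_body SRC DEST FIELD VALUE LIMIT
# Prints a _reindex body that copies at most LIMIT documents whose FIELD matches
# VALUE (term query), keeps source versions (external) and the original routing.
# Values are inserted as-is, without JSON escaping: pass simple strings only.
partial_reindex_body() {
  printf '{"size": %d, "source": {"index": "%s", "query": {"term": {"%s": "%s"}}}, "dest": {"index": "%s", "version_type": "external", "routing": "keep"}}\n' \
    "$5" "$1" "$3" "$4" "$2"
}

# run_reindex BODY: send a prepared body to the _reindex endpoint
run_reindex() {
  curl -sS -XPOST "${ES_URL}/_reindex?pretty=true" \
    -H 'Content-Type: application/json' -d "$1"
}
```

For example, run_reindex "$(partial_reindex_body myindex myindex2 tag red 100)" copies up to 100 documents whose tag field is red. Separating body construction from the HTTP call makes it easy to print and inspect the request before sending it.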