The snapshot and restore APIs are very fast and the preferred way to back up data, but they have some limitations, such as:
To copy data in this scenario, the solution is to use the reindex API with a remote server as the source.
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To copy an index from a remote server, we need to execute the following steps:
In config/elasticsearch.yml, add the remote server address to the reindex.remote.whitelist section with a line similar to the following:
reindex.remote.whitelist: ["192.168.1.227:9200"]
After restarting the node so that the new setting is loaded, copy the test-source index data into test-dest via the remote REST endpoint in this way:
curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d '
{
  "source": {
    "remote": { "host": "http://192.168.1.227:9200" },
    "index": "test-source"
  },
  "dest": { "index": "test-dest" }
}'
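The same request can be sketched in Python using only the standard library. This is a minimal illustration, not part of the original recipe: the function names build_remote_reindex_body and reindex_from_remote are hypothetical, and the optional query, size, username, and password arguments are assumed to map onto the fields of the same names in the source section of the reindex request.

```python
import json
import urllib.request


def build_remote_reindex_body(remote_host, source_index, dest_index,
                              query=None, size=None,
                              username=None, password=None):
    """Build the JSON body for a reindex-from-remote request (hypothetical helper)."""
    remote = {"host": remote_host}
    # Credentials are optional and only added when provided.
    if username is not None:
        remote["username"] = username
    if password is not None:
        remote["password"] = password

    source = {"remote": remote, "index": source_index}
    # Optional filters: a standard query and the batch size.
    if query is not None:
        source["query"] = query
    if size is not None:
        source["size"] = size

    return {"source": source, "dest": {"index": dest_index}}


def reindex_from_remote(local_url, body):
    """POST the body to the _reindex endpoint of the local cluster (hypothetical helper)."""
    request = urllib.request.Request(
        local_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same values as the curl call; nothing is sent until reindex_from_remote is called.
body = build_remote_reindex_body(
    "http://192.168.1.227:9200", "test-source", "test-dest")
print(json.dumps(body, indent=2))
```

With a running local node and the remote host whitelisted, calling reindex_from_remote("http://localhost:9200", body) should submit the same request as the curl command.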
The result will be similar to a local reindex that we have already seen in the Reindex an index recipe in Chapter 4, Basic Operations.
The reindex API allows you to call a remote cluster. Nearly every version of the Elasticsearch server is supported as a remote source (mainly 1.x or above).
The reindex API executes a scan query on the remote index cluster and puts the data in the current cluster. This process can take a lot of time, depending on the amount of data that needs to be copied and the time required to index that data.
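The read-in-batches, write-in-bulk pattern described here can be illustrated with a small, purely in-memory Python sketch. It is a conceptual model only and not how Elasticsearch implements reindexing: copy_in_batches, source_docs, and bulk_write are hypothetical names, and a plain list stands in for the destination index.

```python
from itertools import islice


def copy_in_batches(source_docs, bulk_write, batch_size=1000):
    """Read documents in fixed-size batches and pass each batch to bulk_write."""
    iterator = iter(source_docs)
    copied = 0
    while True:
        # Take the next batch, analogous to one page of a scan.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # Write the whole batch at once, analogous to one bulk request.
        bulk_write(batch)
        copied += len(batch)
    return copied


# A list stands in for the destination index in this illustration.
destination = []
total = copy_in_batches(range(2500), destination.extend, batch_size=1000)
```

The total time grows with both the number of batches read and the cost of each bulk write, which is why large copies can take a long time.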
The source section contains important parameters to control the fetched data, such as:
remote: This is a section that contains information on the remote cluster connection.
index: This is the remote index that must be used to fetch the data. It can also be an alias or multiple indices via globs.
query: This parameter is optional; it's a standard query that can be used to select the documents that must be copied.
size: This parameter is optional; it's the number of documents to be used for each bulk read/write. The buffer is up to 200MB.
The remote section of the configuration is composed of the following parameters:
host: The remote REST endpoint of the cluster.
username: The username to be used for copying the data (optional).
password: The password for the user to access the remote cluster (optional).
There are a lot of advantages to using this approach over standard snapshot and restore, including: