In the previous recipe, we defined a repository: the place where the backups will be stored. Now we can create snapshots of indices, that is, full backups of an index taken at the exact instant the command is called.
For every repository it's possible to define multiple snapshots.
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To correctly execute the following commands, the repository created in the previous recipe is required.
To manage a snapshot, we will perform the following steps:
To create a snapshot called snap_1 for the test-index and test-2 indices, the HTTP method is PUT and the curl command is as follows:

curl -XPUT "http://localhost:9200/_snapshot/my_repository/snap_1?wait_for_completion=true" -d '{
  "indices": "test-index,test-2",
  "ignore_unavailable": "true",
  "include_global_state": false
}'
The result will be as follows:

{
  "snapshot" : {
    "snapshot" : "snap_1",
    "uuid" : "h01mw-HATOiDMVp2k1xaLg",
    "version_id" : 5000099,
    "version" : "5.0.0",
    "indices" : [ "test-index" ],
    "state" : "SUCCESS",
    "start_time" : "2016-11-06T10:43:56.064Z",
    "start_time_in_millis" : 1478429036064,
    "end_time" : "2016-11-06T10:43:56.066Z",
    "end_time_in_millis" : 1478429036066,
    "duration_in_millis" : 2,
    "failures" : [ ],
    "shards" : {
      "total" : 5,
      "failed" : 0,
      "successful" : 5
    }
  }
}
If you check your filesystem, the /tmp/my_repository directory is populated with some files, such as index (a directory that contains our data), metadata-snap_1, and snapshot-snap_1.
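When automating backups, a common pattern is to embed the date in the snapshot name so that repeated runs do not collide. The following sketch only prints the command it would run; the snap_YYYYMMDD naming scheme is an assumption for illustration, while the repository name (my_repository) is the one created in the previous recipe:

```shell
# Sketch: build a date-stamped snapshot name for automated backups.
# The snap_YYYYMMDD naming scheme is an assumption, not part of the recipe.
SNAP_NAME="snap_$(date +%Y%m%d)"
SNAP_URL="http://localhost:9200/_snapshot/my_repository/${SNAP_NAME}?wait_for_completion=true"

# Print the command instead of executing it, so the sketch can be reviewed
# before being pointed at a live cluster.
echo "curl -XPUT \"${SNAP_URL}\" -d '{ \"indices\": \"test-index,test-2\", \"ignore_unavailable\": \"true\", \"include_global_state\": false }'"
```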
To retrieve snapshot information, the HTTP method is GET and the curl command is:

curl -XGET 'http://localhost:9200/_snapshot/my_repository/snap_1?pretty'
The result will be the same as in the previous step.
To delete a snapshot, the HTTP method is DELETE and the curl command is:

curl -XDELETE 'http://localhost:9200/_snapshot/my_repository/snap_1'
The result will be:
{"acknowledged":true}
The minimum configuration required to create a snapshot is the name of the repository and the name of the snapshot (in this case, snap_1).
If no other parameters are set, the snapshot command will dump all the cluster data. To control the snapshot process, the following parameters are available:

- indices: a comma-delimited list of indices (wildcards are accepted); it controls which indices must be dumped.
- ignore_unavailable (default false): it prevents the snapshot from failing if some indices are missing.
- include_global_state (default true; available values are true/false/partial): it controls storing the global state in the snapshot. If a primary shard is not available, the snapshot fails.

The query argument wait_for_completion, also used in the example, allows you to wait for the snapshot to finish before the call returns. It's very useful if you want to automate your snapshot script to back up indices sequentially.
If wait_for_completion is not set, the snapshot status must be monitored via the snapshot GET call.
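That monitoring step can be sketched as a small polling loop around the snapshot GET call. Only the endpoint comes from the recipe; parse_state and wait_for_snapshot are hypothetical helper names, and the five-second interval is an arbitrary assumption:

```shell
# Sketch: poll the snapshot GET call until a terminal state is reported.
# parse_state and wait_for_snapshot are hypothetical helper names.
parse_state() {
  # Extracts the value of the "state" field from the JSON on stdin.
  sed -n 's/.*"state" *: *"\([A-Z_]*\)".*/\1/p'
}

wait_for_snapshot() {
  url="http://localhost:9200/_snapshot/my_repository/$1"
  while :; do
    state=$(curl -s "$url" | parse_state)
    case "$state" in
      SUCCESS)        echo "snapshot $1 completed"; return 0 ;;
      FAILED|PARTIAL) echo "snapshot $1 ended in state $state"; return 1 ;;
      *)              sleep 5 ;;  # still in progress (or unreachable): retry
    esac
  done
}

# Against a live cluster: wait_for_snapshot snap_1
```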
The snapshots are incremental, so only changed files are copied between two snapshots of the same index. This approach reduces both the time and disk usage during snapshots.
The snapshot process is designed to be as fast as possible, so it is implemented as a direct copy of the Lucene index segments to the repository. To prevent changes and index corruption during the copy, all the segments that need to be copied are blocked from changing until the end of the snapshot.
Elasticsearch takes care of everything during a snapshot, including preventing writing data to files that are in the snapshot process, and managing cluster events (shard relocating, failures, and so on).
To retrieve all the available snapshots for a repository, the command is:
curl -XGET 'http://localhost:9200/_snapshot/my_repository/_all'
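For scripting, it can be handy to reduce the _all response to a plain list of snapshot names. The helper below is a sketch that relies only on sed; list_snapshot_names is a hypothetical name, and a JSON-aware tool such as jq would be a more robust choice:

```shell
# Sketch: reduce the _all response to one snapshot name per line.
# Assumes the {"snapshots":[{"snapshot":"...", ...}]} layout shown in this recipe.
list_snapshot_names() {
  tr ',' '\n' | sed -n 's/.*"snapshot" *: *"\([^"]*\)".*/\1/p'
}

# Against a live cluster:
# curl -s 'http://localhost:9200/_snapshot/my_repository/_all' | list_snapshot_names
```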
The snapshot process can be monitored via the _status
endpoint, which provides a complete overview of the snapshot status.
For the current example, the snapshot _status
API call will be:
curl -XGET "http://localhost:9200/_snapshot/my_repository/snap_1/_status?pretty"
The result is very long and consists of the following sections:
{
  "snapshots" : [ {
    "snapshot" : "snap_1",
    "uuid" : "BZQRYMPTRAyGr6b8k0h9jQ",
    "repository" : "my_repository",
    "state" : "SUCCESS",
    "shards_stats" : {
      "initializing" : 0,
      "started" : 0,
      "finalizing" : 0,
      "done" : 5,
      "failed" : 0,
      "total" : 5
    },
    "stats" : {
      "number_of_files" : 125,
      "processed_files" : 125,
      "total_size_in_bytes" : 1497330,
      "processed_size_in_bytes" : 1497330,
      "start_time_in_millis" : 1415914845427,
      "time_in_millis" : 1254
    },
    "indices" : {
      "test-index" : {
        "shards_stats" : {
          "initializing" : 0,
          "started" : 0,
          "finalizing" : 0,
          "done" : 5,
          "failed" : 0,
          "total" : 5
        },
        "stats" : {
          "number_of_files" : 125,
          "processed_files" : 125,
          "total_size_in_bytes" : 1497330,
          "processed_size_in_bytes" : 1497330,
          "start_time_in_millis" : 1415914845427,
          "time_in_millis" : 1254
        },
        "shards" : {
          "0" : {
            "stage" : "DONE",
            "stats" : {
              "number_of_files" : 25,
              "processed_files" : 25,
              "total_size_in_bytes" : 304773,
              "processed_size_in_bytes" : 304773,
              "start_time_in_millis" : 1415914845427,
              "time_in_millis" : 813
            }
          },
... truncated...
The status response is very rich, and it can also be used to estimate the performance of the snapshot and the disk space required over time for the incremental backups.
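Because snapshots are incremental, the stats section of the _status response gives a rough measure of how much data each run actually copied. The sketch below pulls the first processed_size_in_bytes value (the cluster-level total); processed_bytes is a hypothetical helper name:

```shell
# Sketch: extract the top-level processed_size_in_bytes from a _status response.
# Comparing this value across snapshots approximates the incremental disk cost.
processed_bytes() {
  tr ',' '\n' | sed -n 's/.*"processed_size_in_bytes" *: *\([0-9]*\).*/\1/p' | head -n 1
}

# Against a live cluster:
# curl -s "http://localhost:9200/_snapshot/my_repository/snap_1/_status" | processed_bytes
```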