Executing a snapshot

In the previous recipe, we defined a repository, the place where the backups are stored. Now we can create snapshots of indices. A snapshot is a full backup of an index, taken at the exact instant the command is called.

Each repository can hold multiple snapshots.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following command, the repository created in the previous recipe is required.

How to do it...

To manage a snapshot, we will perform the following steps:

  1. To create a snapshot called snap_1 for the test-index and test-2 indices, the HTTP method is PUT and the curl command is as follows:
            curl -XPUT "http://localhost:9200/_snapshot/my_repository/snap_1?wait_for_completion=true" -d '{
              "indices": "test-index,test-2",
              "ignore_unavailable": true,
              "include_global_state": false
            }'
    

    The result will be as follows:

                 {
                   "snapshot" : {
                     "snapshot" : "snap_1",
                     "uuid" : "h01mw-HATOiDMVp2k1xaLg",
                     "version_id" : 5000099,
                     "version" : "5.0.0",
                     "indices" : [ "test-index" ],
                     "state" : "SUCCESS",
                     "start_time" : "2016-11-06T10:43:56.064Z",
                     "start_time_in_millis" : 1478429036064,
                     "end_time" : "2016-11-06T10:43:56.066Z",
                     "end_time_in_millis" : 1478429036066,
                     "duration_in_millis" : 2,
                     "failures" : [ ],
                     "shards" : {
                       "total" : 5,
                       "failed" : 0,
                       "successful" : 5
                     }
                   }
                 }

    If you check your filesystem, the /tmp/my_repository directory is now populated with files such as index (a directory that contains our data), metadata-snap_1, and snapshot-snap_1.

  2. To retrieve snapshot information, the HTTP method is GET and the curl command is:
            curl -XGET 'http://localhost:9200/_snapshot/my_repository/snap_1?pretty'
    

    The result will be the same as in the previous step.

  3. To delete a snapshot, the HTTP method is DELETE and the curl command is:
            curl -XDELETE 'http://localhost:9200/_snapshot/my_repository/snap_1'
    

    The result will be:

                  {"acknowledged":true}
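
    The three calls above can also be scripted. The following is a minimal Python sketch that uses only the standard library. The helper names (snapshot_url, build_request, send) are hypothetical, and the sketch assumes the same local node and my_repository repository as the curl commands. The Content-Type header is an addition that is not in the curl commands; Elasticsearch releases from 6.0 onward require it for requests with a body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: the local node used by the curl examples


def snapshot_url(repository, snapshot, base_url=BASE_URL):
    """Return the REST endpoint of one snapshot inside a repository."""
    return f"{base_url}/_snapshot/{repository}/{snapshot}"


def build_request(method, url, body=None):
    """Prepare, without sending, an HTTP request with an optional JSON body."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send a prepared request and decode the JSON reply (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

    With a node running, passing build_request("PUT", ...) to send() with the URL from snapshot_url("my_repository", "snap_1") plus ?wait_for_completion=true creates the snapshot; the same URL with GET or DELETE retrieves or removes it.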
    

How it works...

The minimum configuration required to create a snapshot is the name of the repository and the name of the snapshot (snap_1 in this example).

If no other parameters are set, the snapshot command will dump all the cluster data. To control the snapshot process, some parameters are available:

  • indices (a comma-delimited list of indices; wildcards are accepted): this controls which indices are dumped.
  • ignore_unavailable (defaults to false): when set to true, this prevents the snapshot from failing if some indices are missing.
  • include_global_state (defaults to true): this controls whether the cluster global state is stored in the snapshot.
  • partial (defaults to false): by default, the snapshot fails if a primary shard of a requested index is not available; setting partial to true lets the snapshot proceed anyway.
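
As an illustration of how these parameters fit together, here is a small Python sketch that assembles the request body. The snapshot_body function is a hypothetical helper, not part of any Elasticsearch client; its defaults mirror the defaults listed above.

```python
import json


def snapshot_body(indices=None, ignore_unavailable=False, include_global_state=True):
    """Return the JSON-serializable body of a create-snapshot request.

    indices is an iterable of index names or wildcard patterns; None leaves
    the key out, so the snapshot covers all the cluster data.
    """
    body = {
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state,
    }
    if indices is not None:
        # Join into the comma-delimited form used in the recipe,
        # stripping stray spaces around each name.
        body["indices"] = ",".join(name.strip() for name in indices)
    return body


# The body sent in step 1 of this recipe.
print(json.dumps(snapshot_body(["test-index", "test-2"],
                               ignore_unavailable=True,
                               include_global_state=False)))
```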

The wait_for_completion query argument, also used in the example, makes the call wait for the snapshot to finish before returning. It's very useful if you want to automate your snapshot script to back up indices sequentially.

If wait_for_completion is not set, the call returns immediately, and the user must check the snapshot status via the snapshot GET call.
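
A polling loop for that case could look like the following Python sketch. It is only an outline under stated assumptions: fetch_info is a hypothetical callable that performs the snapshot GET call and returns the decoded JSON, and the loop treats any state other than IN_PROGRESS as final. The state lookup accepts both the reply shape printed in this recipe and a "snapshots" list, since the shape may differ between versions.

```python
import time


def snapshot_state(info):
    """Extract the state from a snapshot GET reply.

    Accepts {"snapshot": {...}} (the shape printed in this recipe) as well
    as {"snapshots": [{...}]}.
    """
    if "snapshots" in info:
        return info["snapshots"][0]["state"]
    return info["snapshot"]["state"]


def wait_for_snapshot(fetch_info, poll_seconds=1.0, max_polls=60):
    """Poll until the snapshot is no longer IN_PROGRESS and return its state.

    fetch_info is a hypothetical zero-argument callable that runs the
    snapshot GET call and returns the decoded JSON reply.
    """
    for _ in range(max_polls):
        state = snapshot_state(fetch_info())
        if state != "IN_PROGRESS":
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("snapshot did not finish within the polling budget")
```

The max_polls limit keeps an automated backup script from hanging forever if the snapshot never completes.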

The snapshots are incremental, so only changed files are copied between two snapshots of the same index. This approach reduces both the time and disk usage during snapshots.
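
The idea of copying only changed files can be pictured with a short Python sketch. This is a simplified illustration, not Elasticsearch's actual implementation: the file names and checksums are invented, and files_to_copy is a hypothetical helper.

```python
def files_to_copy(shard_files, repository_files):
    """Return the names of the files a new snapshot would need to copy.

    Both arguments map a file name to a checksum. A file is copied when the
    repository has no file of that name or holds a different checksum.
    """
    return sorted(
        name
        for name, checksum in shard_files.items()
        if repository_files.get(name) != checksum
    )


# Hypothetical segment files: _0.cfs is unchanged since the first snapshot,
# while _1.cfs and segments_2 are new.
first_snapshot = {"_0.cfs": "a1", "segments_1": "b2"}
current_shard = {"_0.cfs": "a1", "_1.cfs": "c3", "segments_2": "d4"}
print(files_to_copy(current_shard, first_snapshot))  # ['_1.cfs', 'segments_2']
```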

The snapshot process is designed to be as fast as possible, so it copies the Lucene index segments directly to the repository. To prevent changes and index corruption during the copy, all the segments that need to be copied are blocked from changing until the end of the snapshot.

Note

Lucene's segment copy happens at the shard level, so if you have a cluster of several nodes and a local repository, the snapshot is spread across all the nodes. For this reason, in a production cluster the repository must be shared, so that all the backup fragments are collected in one place.

Elasticsearch takes care of everything during a snapshot, including preventing writing data to files that are in the snapshot process, and managing cluster events (shard relocating, failures, and so on).

To retrieve all the available snapshots for a repository, the command is:

curl -XGET 'http://localhost:9200/_snapshot/my_repository/_all'
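
The reply to this call can be condensed into a short listing. The Python sketch below assumes that the reply is a JSON object with a "snapshots" list whose entries carry the same "snapshot" and "state" fields shown earlier in this recipe; list_snapshots is a hypothetical helper.

```python
def list_snapshots(reply):
    """Return (name, state) pairs from the decoded reply of the _all call."""
    return [(item["snapshot"], item["state"]) for item in reply.get("snapshots", [])]


# Hypothetical reply for a repository that holds two snapshots.
reply = {
    "snapshots": [
        {"snapshot": "snap_1", "state": "SUCCESS"},
        {"snapshot": "snap_2", "state": "IN_PROGRESS"},
    ]
}
for name, state in list_snapshots(reply):
    print(name, state)
```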

There's more...

The snapshot process can be monitored via the _status endpoint, which provides a complete overview of the snapshot status.

For the current example, the snapshot _status API call will be:

curl -XGET "http://localhost:9200/_snapshot/my_repository/snap_1/_status?pretty"

The result is very long and consists of the following sections:

  • Information about the snapshot:
             { 
              "snapshots" : [ { 
                "snapshot" : "snap_1", 
                "uuid" : "BZQRYMPTRAyGr6b8k0h9jQ", 
                "repository" : "my_repository", 
                "state" : "SUCCESS", 
    
  • Global shards statistics:
            "shards_stats" : { 
              "initializing" : 0, 
              "started" : 0, 
              "finalizing" : 0, 
              "done" : 5, 
              "failed" : 0, 
              "total" : 5 
            }, 
    
  • Snapshot's global statistics:
            "stats" : { 
              "number_of_files" : 125, 
              "processed_files" : 125, 
              "total_size_in_bytes" : 1497330, 
              "processed_size_in_bytes" : 1497330, 
              "start_time_in_millis" : 1415914845427, 
              "time_in_millis" : 1254 
            }, 
    
  • Drill-down of snapshot index statistics:
            "indices" : { 
              "test-index" : { 
                "shards_stats" : { 
                  "initializing" : 0, 
                  "started" : 0, 
                  "finalizing" : 0, 
                  "done" : 5, 
                  "failed" : 0, 
                  "total" : 5 
                }, 
                "stats" : { 
                  "number_of_files" : 125, 
                  "processed_files" : 125, 
                  "total_size_in_bytes" : 1497330, 
                  "processed_size_in_bytes" : 1497330, 
                  "start_time_in_millis" : 1415914845427, 
                  "time_in_millis" : 1254 
                }, 
    
  • Statistics for each index and shard:
              "shards" : { 
                "0" : { 
                  "stage" : "DONE", 
                  "stats" : { 
                    "number_of_files" : 25, 
                    "processed_files" : 25, 
                    "total_size_in_bytes" : 304773, 
                    "processed_size_in_bytes" : 304773, 
                    "start_time_in_millis" : 1415914845427, 
                    "time_in_millis" : 813 
                  } 
                },... truncated... 
    

The status response is very rich, and it can also be used to estimate the performance of the snapshot and the disk space that incremental backups will require over time.
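
As a rough sketch of such an estimate, the following Python function derives a completion percentage and an average throughput from one "stats" object. It relies on the field names exactly as they appear in the output above, which other Elasticsearch versions may not share; summarize_stats is a hypothetical helper.

```python
def summarize_stats(stats):
    """Derive completion and throughput figures from a _status "stats" object."""
    total = stats["total_size_in_bytes"]
    processed = stats["processed_size_in_bytes"]
    millis = stats["time_in_millis"]
    return {
        # An empty snapshot is reported as fully done.
        "percent_done": 100.0 * processed / total if total else 100.0,
        "files_remaining": stats["number_of_files"] - stats["processed_files"],
        # No throughput can be computed before any time has elapsed.
        "bytes_per_second": processed / (millis / 1000.0) if millis else None,
    }


# The global "stats" object shown in this recipe.
stats = {
    "number_of_files": 125,
    "processed_files": 125,
    "total_size_in_bytes": 1497330,
    "processed_size_in_bytes": 1497330,
    "start_time_in_millis": 1415914845427,
    "time_in_millis": 1254,
}
summary = summarize_stats(stats)
print(summary)
```

The same function can be applied to the per-index and per-shard "stats" objects, because they share these fields.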
