How does the snapshot process work?

As stated earlier, a repository can contain multiple snapshots of the same cluster. For this reason, snapshot files are stored in a compact form: your data is not duplicated when multiple snapshots cover the same indices. When a snapshot is created, Elasticsearch first checks the list of index files, and then copies only the files that were created or changed since the last snapshot. Consider the following example:

curl -XGET 'localhost:9200/my_index/_search?pretty'
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "my_index",
            "_type": "snapshot",
            "_id": "AVCmN4l-7pWKrBPkopj3",
            "_score": 1,
            "_source": {
               "title": "Document A"
            }
         },
         {
            "_index": "my_index",
            "_type": "snapshot",
            "_id": "AVCmN5iN7pWKrBPkopj4",
            "_score": 1,
            "_source": {
               "title": "Document B"
            }
         }
      ]
   }
}

We have an index named my_index that stores two documents. Let's create a snapshot of this index as follows:

curl -XPUT localhost:9200/_snapshot/my_backup/first_snapshot -d '{
  "indices": "my_index",
  "ignore_unavailable": false,
  "include_global_state": true,
  "partial": true
}'
{"accepted":true}

Now let's add another document to the index:

curl -XPOST localhost:9200/my_index/snapshot -d '{
  "title": "Document C"
}'

Now let's create another snapshot of this index:

curl -XPUT localhost:9200/_snapshot/my_backup/second_snapshot -d '{
  "indices": "my_index",
  "ignore_unavailable": false,
  "include_global_state": true,
  "partial": true
}'
{"accepted":true}

Now let's get information about the two snapshots that we created:

curl -XGET localhost:9200/_snapshot/my_backup/first_snapshot/_status
{
   "snapshots": [
      {
         "snapshot": "first_snapshot",
         "repository": "kodcucomfs",
         "state": "SUCCESS",
         "shards_stats": {
            "initializing": 0,
            "started": 0,
            "finalizing": 0,
            "done": 1,
            "failed": 0,
            "total": 1
         },
         "stats": {
            "number_of_files": 7,
            "processed_files": 7,
            "total_size_in_bytes": 5059,
            "processed_size_in_bytes": 5059,
            "start_time_in_millis": 1445897714549,
            "time_in_millis": 7
         },
         ...

curl -XGET localhost:9200/_snapshot/my_backup/second_snapshot/_status
{
   "snapshots": [
      {
         "snapshot": "second_snapshot",
         "repository": "kodcucomfs",
         "state": "SUCCESS",
         "shards_stats": {
            "initializing": 0,
            "started": 0,
            "finalizing": 0,
            "done": 1,
            "failed": 0,
            "total": 1
         },
         "stats": {
            "number_of_files": 4,
            "processed_files": 4,
            "total_size_in_bytes": 2667,
            "processed_size_in_bytes": 2667,
            "start_time_in_millis": 1445897737400,
            "time_in_millis": 7
         },
  …

As you can see, the size reported for first_snapshot is approximately twice the size reported for second_snapshot (5,059 bytes in 7 files versus 2,667 bytes in 4 files). The reason is that my_index contained two documents when first_snapshot was created. We created second_snapshot after adding a third document to my_index, so second_snapshot only had to copy the files created for that one new document, while first_snapshot had to copy the files for two documents. The files that were already in the repository were not copied again. This incremental behavior saves time and system resources.
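The copy-only-what-is-new idea can be pictured with a small shell sketch. This is not how Elasticsearch is implemented; it only mimics the principle with plain files in a temporary directory. The function name incremental_copy, the directory layout, and the file names are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch): copy into REPO only the
# files from SRC that REPO does not already hold with identical content.
# Prints the number of files that were copied.
incremental_copy() {
  src=$1
  repo=$2
  copied=0
  for f in "$src"/*; do
    [ -f "$f" ] || continue              # skip when the glob matches nothing
    name=$(basename "$f")
    if [ ! -f "$repo/$name" ] || ! cmp -s "$f" "$repo/$name"; then
      cp "$f" "$repo/$name"
      copied=$((copied + 1))
    fi
  done
  echo "$copied"
}

# Demo with made-up file names in a throwaway directory.
work=$(mktemp -d)
mkdir "$work/index" "$work/repo"
printf 'Document A' > "$work/index/_0.cfs"
printf 'Document B' > "$work/index/_1.cfs"
first=$(incremental_copy "$work/index" "$work/repo")    # both files are new

printf 'Document C' > "$work/index/_2.cfs"              # one new file appears
second=$(incremental_copy "$work/index" "$work/repo")   # only the new file is copied

echo "first run copied $first files, second run copied $second"
rm -rf "$work"
```

In this toy version the second run copies a single file because the other two are already in the repository, which mirrors why second_snapshot reports fewer files and bytes than first_snapshot.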
