Updating with scripting

ElasticSearch allows updating a document in-place.

Updating a document via scripting reduces network traffic (otherwise, you need to fetch the document, change the field, and send it back) and improves performance when you need to process a large number of documents.

Getting ready

You need a working ElasticSearch cluster and an index populated by the script used for facet processing, which is available in the online code.

How to do it...

To update a document using scripting, we will perform the following steps:

  1. We'll write an update action that adds a tag value to a list of tags available in the source of a document. It should look as shown in the following code:
     curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/9/_update?pretty=true' -d '{
         "script" : "ctx._source.tag += tag",
         "params" : {
             "tag" : "cool"
         }
     }'
  2. If everything is correct, the result returned by ElasticSearch should be:
    {
      "ok" : true,
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "9",
      "_version" : 2
    }

How it works...

The REST HTTP method used to update a document is POST.

The URL contains the index name, the type, and the document ID, followed by the _update endpoint, as follows:

http://<server>/<index_name>/<type>/<document_id>/_update
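As a minimal sketch, the URL pattern above and the request body from the earlier example can be assembled programmatically. The helper name build_update_request is hypothetical, and the code only builds the request without sending it; any HTTP client could then POST the body to the URL.

```python
import json

def build_update_request(server, index, doc_type, doc_id, script, params):
    """Return the (url, body) pair for a scripted update request.

    Hypothetical helper: it follows the URL pattern
    http://<server>/<index_name>/<type>/<document_id>/_update
    """
    url = "http://%s/%s/%s/%s/_update?pretty=true" % (
        server, index, doc_type, doc_id)
    # The body carries the script and its parameters as JSON.
    body = json.dumps({"script": script, "params": params})
    return url, body

url, body = build_update_request(
    "127.0.0.1:9200", "test-index", "test-type", "9",
    "ctx._source.tag += tag", {"tag": "cool"})
print(url)
print(body)
```

Passing the tag through params rather than embedding it in the script text keeps the script source identical across calls.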

The update action is composed of three different steps:

  • Get operation: This very fast operation works on real-time data (no refresh is needed) and retrieves the record
  • Script execution: The script is executed on the document and, if required, the document is updated
  • Saving the document: The document, if required, is saved

The script execution follows the workflow in the following manner:

  • The script is compiled and the result is cached to improve re-execution. The compilation depends on the scripting language; it allows detecting errors in the script such as typographical errors, syntax errors and language-related errors. The compilation step can be CPU bound, so ElasticSearch caches the compilation results for further execution.
  • The script is executed against the document. The document data is available in the ctx variable in the script.

The update script can set several parameters in the ctx variable. The most important parameters are:

  • ctx._source: This contains the source of the document
  • ctx._timestamp: If it's defined, this value is set to the document timestamp
  • ctx.op: This defines the main operation type to be executed. There are several available values, such as:
    • index: The default value if nothing is defined; the record is re-indexed with the updated values
    • delete: The document is deleted after the update
    • none: The document is skipped without being re-indexed
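To make the three update steps and the effect of ctx.op concrete, here is a simplified Python model that runs against an in-memory dictionary. The store layout, apply_update, and add_tag are all hypothetical names introduced for illustration, and raising an error on an unknown operation is a choice of this sketch rather than documented server behavior.

```python
import copy

def apply_update(store, doc_id, script, params):
    """Model of a scripted update against a dict-based store (id -> record)."""
    record = store[doc_id]                          # 1. get the record
    ctx = {"_source": copy.deepcopy(record["_source"]),
           "op": "index"}                           # index is the default
    script(ctx, params)                             # 2. execute the script
    op = ctx["op"]
    if op == "index":                               # 3. save: re-index
        store[doc_id] = {"_version": record["_version"] + 1,
                         "_source": ctx["_source"]}
    elif op == "delete":                            # 3. save: delete
        del store[doc_id]
    elif op != "none":                              # none: nothing to save
        raise ValueError("unknown op: %r" % op)
    return op

def add_tag(ctx, params):
    """Example script: add a tag, or skip re-indexing if already present."""
    if params["tag"] in ctx["_source"]["tag"]:
        ctx["op"] = "none"
    else:
        ctx["_source"]["tag"].append(params["tag"])

store = {"9": {"_version": 1, "_source": {"tag": ["good"]}}}
print(apply_update(store, "9", add_tag, {"tag": "cool"}))  # index
print(apply_update(store, "9", add_tag, {"tag": "cool"}))  # none
print(store["9"]["_version"])                              # 2
```

The second call leaves the version untouched because the script sets the operation to none, which is the saving this operation type is meant to provide.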

Tip

If you need to execute a large number of update operations, it's better to perform them in bulk to improve your application's performance.
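As a sketch of what such a bulk request might contain, the following Python builds a newline-delimited body with one action line and one payload line per update. The helper name build_bulk_updates is hypothetical, and the line layout reflects the bulk API format as commonly documented; check it against the documentation for your ElasticSearch version before relying on it.

```python
import json

def build_bulk_updates(index, doc_type, updates):
    """Build a bulk body from (doc_id, script, params) tuples.

    Hypothetical helper: each update becomes an action line followed by
    a payload line, and the body ends with a newline.
    """
    lines = []
    for doc_id, script, params in updates:
        action = {"update": {"_index": index, "_type": doc_type,
                             "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"script": script, "params": params}))
    return "\n".join(lines) + "\n"

body = build_bulk_updates("test-index", "test-type", [
    ("9", "ctx._source.tag += tag", {"tag": "cool"}),
    ("10", "ctx._source.tag += tag", {"tag": "nice"}),
])
print(body)
```

The resulting body would be sent in a single POST to the _bulk endpoint, replacing one HTTP round trip per document with one round trip for the whole batch.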

There's more...

The previous example can be rewritten using the JavaScript language, and it looks as shown in the following code:

curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/9/_update?pretty=true' -d '{
    "script" : "ctx._source.tag += tag",
    "lang":"js",
    "params" : {
        "tag" : "cool"
    }
}'

The previous example can be written using the Python language, as follows:

curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/9/_update?pretty=true' -d '{
    "script" : "ctx[\"_source\"][\"tag\"] = list(ctx[\"_source\"][\"tag\"]) + [tag]",
    "lang":"python",
    "params" : {
        "tag" : "cool"
    }
}'

In the Python example, the Java list must be converted into a Python list to allow adding elements; the conversion back to a Java list is done automatically.

Tip

To improve performance when a field is not changed, it's good practice to set the ctx.op variable to none to disable the indexing of the unchanged document.

In the following example, we will execute an update that adds new tags and labels to an object, but we will mark the document for indexing only if the tags or labels values have changed.

curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/9/_update?pretty=true' -d '{
  "script" : "ctx.op = \"none\";
  if(ctx._source.containsKey(\"tags\")){
    foreach(item:new_tags){
      if(!ctx._source.tags.contains(item)){
        ctx._source.tags += item;
        ctx.op = \"index\";
      }
    }
  }else{
    ctx._source.tags=new_tags;
    ctx.op = \"index\";
  };
  if(ctx._source.containsKey(\"labels\")){
    foreach(item:new_labels){
      if(!ctx._source.labels.contains(item)){
        ctx._source.labels += item;
        ctx.op = \"index\";
      }
    }
  }else{
    ctx._source.labels=new_labels;
    ctx.op = \"index\";
  };",
  "params" : {
    "new_tags" : ["cool", "nice"],
    "new_labels" : ["red", "blue", "green"]
  }
}'

The preceding script uses the following steps:

  1. It sets the operation to none to prevent indexing if the original source is not changed in the following steps.
  2. It checks if the tags field is available in the source object.
  3. If the tags field is available in the source object, it iterates all the values of the new_tags list. If the value is not available in the current tags list, it adds it and updates the operation to index.
  4. If the tags field doesn't exist in the source object, it simply adds it to the source and sets the operation to index.
  5. The steps from 2 to 4 are repeated for the labels value. The repetition is present in this example to show the ElasticSearch user how it is possible to update multiple values in a single update operation.
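The same steps can be expressed in plain Python, which also shows how the duplicated block for tags and labels could be factored into one helper. This is an illustrative sketch that operates on an ordinary dictionary; merge_values and update_tags_and_labels are hypothetical names, and the code is not meant to run inside ElasticSearch.

```python
def merge_values(source, field, new_values):
    """Add missing values to source[field]; return True if source changed."""
    if field not in source:
        # Steps 2 and 4: the field is absent, so add it as a whole.
        source[field] = list(new_values)
        return True
    changed = False
    for item in new_values:
        # Step 3: append only the values that are not already present.
        if item not in source[field]:
            source[field].append(item)
            changed = True
    return changed

def update_tags_and_labels(ctx, new_tags, new_labels):
    ctx["op"] = "none"  # Step 1: assume nothing changes
    # Evaluate both merges before combining, so the second always runs.
    tags_changed = merge_values(ctx["_source"], "tags", new_tags)
    labels_changed = merge_values(ctx["_source"], "labels", new_labels)
    if tags_changed or labels_changed:
        ctx["op"] = "index"
    return ctx

ctx = {"_source": {"tags": ["cool"]}}
update_tags_and_labels(ctx, ["cool", "nice"], ["red", "blue", "green"])
print(ctx["op"])               # index
print(ctx["_source"]["tags"])  # ['cool', 'nice']
```

Running the same update a second time leaves ctx["op"] set to none, because every value is already present.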

This script is quite complex, but it shows the powerful capabilities of scripting in ElasticSearch.
