Updating a document

Documents stored in Elasticsearch can be updated during their lifetime. There are two ways to do this in Elasticsearch: adding a new document that replaces the existing one, or using the update call.

The update call can work in two ways:

  • Providing a script that implements the update strategy
  • Providing a document that must be merged with the original one

The main advantage of an update over an index operation is the reduced network traffic.

Getting ready

You need an up-and-running Elasticsearch installation, as used in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following commands, use the document indexed in the Indexing a document recipe.

To use dynamic scripting languages, they must be enabled: see Chapter 9, Scripting.

How to do it...

As we are changing the state of the data, the HTTP method is POST and the REST URL is:

http://<server>/<index_name>/<type_name>/<id>/_update
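
As a minimal sketch, the URL pattern above can be assembled in Python as follows. The helper name update_url and the default server value are hypothetical; only the path layout comes from this recipe.

```python
from urllib.parse import quote


def update_url(index_name, type_name, doc_id, server="localhost:9200"):
    """Build the _update endpoint URL (hypothetical helper, path layout from the recipe)."""
    # Percent-encode each path segment so an unusual ID cannot break the URL.
    parts = [quote(part, safe="") for part in (index_name, type_name, doc_id)]
    return "http://{}/{}/_update".format(server, "/".join(parts))


url = update_url("myindex", "order", "2qLrAfPVQvCRMe7Ku8r0Tw")
print(url)
```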

To update a document, we will perform the following steps:

  1. If we consider the order type of the previous recipe, the call to update a document will be:
            curl -XPOST 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw/_update?pretty' -d '{
              "script" : {
                "inline" : "ctx._source.in_stock_items += params.count",
                "params" : {
                  "count" : 4
                }
              }
            }'
    
  2. If the request is successful, the result returned by Elasticsearch should be:
            {
              "_index" : "myindex",
              "_type" : "order",
              "_id" : "2qLrAfPVQvCRMe7Ku8r0Tw",
              "_version" : 2,
              "result" : "updated",
              "_shards" : {
                "total" : 2,
                "successful" : 1,
                "failed" : 0
              }
            }
    
  3. The record will be as follows:
            {
                "_id": "2qLrAfPVQvCRMe7Ku8r0Tw",
                "_index": "myindex",
                "_source": {
                    "customer_id": "customer1",
                    "date": "2013-06-07T12:14:54",
                    "id": "1234",
                    "in_stock_items": 4,
                    ....
                    "sent": true
                },
                "_type": "order",
                "_version": 3,
                "exists": true
            }
    
  4. The visible changes are:

       • The scripted field is changed
       • The version is incremented

    If you are using Elasticsearch (1.2 or above) and scripting support is disabled (the default configuration), an error will be raised:

                {
                  "error" : {
                    "root_cause" : [
                      {
                        "type" : "remote_transport_exception",
                        "reason" : "[Gin Genie][127.0.0.1:9300][indices:data/write/update[s]]"
                      }
                    ],
                    "type" : "illegal_argument_exception",
                    "reason" : "failed to execute script",
                    "caused_by" : {
                      "type" : "illegal_state_exception",
                      "reason" : "scripts of type [inline], operation [update] and lang [painless] are disabled"
                    }
                  },
                  "status" : 400
                }
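
The scripted update call from step 1 can also be prepared from Python. The following is only a sketch using the standard library: the helper name build_update_request is hypothetical, the Content-Type header is an assumption (recent Elasticsearch versions expect it), and the request is built but not sent, so no running node is required.

```python
import json
import urllib.request


def build_update_request(url, script_source, params):
    """Prepare (but do not send) a scripted update request (hypothetical helper)."""
    body = {"script": {"inline": script_source, "params": params}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )


req = build_update_request(
    "http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw/_update?pretty",
    "ctx._source.in_stock_items += params.count",
    {"count": 4},
)
# Against a running node, the call would be: urllib.request.urlopen(req).read()
```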
    

How it works...

The update operation takes a document, applies to it the changes required by the script or by the update document, and then reindexes the changed document. In Chapter 9, Scripting, we will explore the scripting capabilities of Elasticsearch.
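
The take, change, and reindex cycle can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not Elasticsearch code: the index dictionary, the update function, and the add_four callback are all hypothetical names.

```python
import copy

# Toy in-memory "index": document ID -> {"_version": int, "_source": dict}.
index = {"1": {"_version": 1, "_source": {"in_stock_items": 0, "sent": True}}}


def update(index, doc_id, change):
    """Fetch the document, apply `change` to a copy of its source, then reindex it."""
    current = index[doc_id]
    new_source = copy.deepcopy(current["_source"])  # 1. take the document
    change(new_source)                              # 2. apply the changes
    index[doc_id] = {                               # 3. reindex the result
        "_version": current["_version"] + 1,
        "_source": new_source,
    }
    return index[doc_id]


def add_four(source):
    # Plays the role of the script "ctx._source.in_stock_items += params.count".
    source["in_stock_items"] += 4


result = update(index, "1", add_four)
print(result)
```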

The standard language for scripting in Elasticsearch is Painless and it's used in these examples.

The script can operate on ctx._source, the source of the document (it must be stored for this to work), and it can change the document in place. It's possible to pass parameters to a script as a JSON object. These parameters are available in the execution context.

A script can control the Elasticsearch behavior after the script execution by setting the ctx.op value of the context. Available values are as follows:

  • ctx.op="delete": the document will be deleted after the script execution.
  • ctx.op="none": the document will skip the indexing process. A good practice to improve performance is to set ctx.op="none" if the script doesn't update the document, to prevent a reindexing overhead.
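
The effect of the two ctx.op values can be sketched with a toy model. Everything here is hypothetical (run_update, the store dictionary, the two script functions, the returned labels), and the default operation "index" is an assumption of the model.

```python
def run_update(store, doc_id, script):
    """Apply `script` to a ctx dict and honor ctx['op'] afterwards (toy model)."""
    doc = store[doc_id]
    ctx = {"_source": dict(doc["_source"]), "op": "index"}  # assumed default op
    script(ctx)
    if ctx["op"] == "delete":
        del store[doc_id]        # the document is removed
        return "deleted"
    if ctx["op"] == "none":
        return "noop"            # nothing is reindexed, version is untouched
    store[doc_id] = {"_version": doc["_version"] + 1, "_source": ctx["_source"]}
    return "updated"


def restock_or_skip(ctx):
    # Skip the reindex when the value is already high enough.
    if ctx["_source"]["in_stock_items"] >= 10:
        ctx["op"] = "none"
    else:
        ctx["_source"]["in_stock_items"] = 10


def purge_if_sent(ctx):
    if ctx["_source"].get("sent"):
        ctx["op"] = "delete"


store = {
    "a": {"_version": 1, "_source": {"in_stock_items": 10, "sent": False}},
    "b": {"_version": 1, "_source": {"in_stock_items": 2, "sent": True}},
}
r1 = run_update(store, "a", restock_or_skip)  # already at 10: skipped
r2 = run_update(store, "b", restock_or_skip)  # raised to 10: reindexed
r3 = run_update(store, "b", purge_if_sent)    # sent order: deleted
```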

The ctx also manages the timestamp of the record in ctx._timestamp. It's also possible to pass an additional object in the upsert property, to be used if the document does not exist in the index:

 curl -XPOST 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw/_update' -d '{
   "script" : {
     "inline" : "ctx._source.in_stock_items += params.count",
     "params" : {
       "count" : 4
     }
   },
   "upsert" : {"in_stock_items" : 4}
 }'

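The upsert behavior can be sketched in the same toy style. The function scripted_update and the store layout are hypothetical; the model assumes that, when the document is missing, the upsert object is stored as is and the script is not run.

```python
def scripted_update(store, doc_id, script, upsert=None):
    """Run `script` on an existing document, or store `upsert` if it is missing."""
    if doc_id not in store:
        if upsert is None:
            raise KeyError("document missing: " + doc_id)
        # Missing document: the upsert object becomes the new document.
        store[doc_id] = {"_version": 1, "_source": dict(upsert)}
        return "created"
    doc = store[doc_id]
    source = dict(doc["_source"])
    script(source)
    store[doc_id] = {"_version": doc["_version"] + 1, "_source": source}
    return "updated"


def add_count(source, count=4):
    source["in_stock_items"] += count


store = {}
first = scripted_update(store, "order-1", add_count, upsert={"in_stock_items": 4})
second = scripted_update(store, "order-1", add_count, upsert={"in_stock_items": 4})
print(first, second, store["order-1"])
```

The first call creates the document from the upsert object; only the second call runs the script.
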
If you only need to replace some field values, a good solution is not to write a complex update script, but to use the special doc property, which allows us to overwrite the values of an object. The document provided in the doc parameter will be merged with the original one. This approach is easier to use, but it cannot set ctx.op, so the request cannot skip the following phase the way a script does with ctx.op="none", even if the update doesn't change the original document:

   curl -XPOST 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw/_update' \
     -d '{"doc" : {"in_stock_items":10}}'
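
The merge of a partial document into the original can be pictured as a recursive dictionary merge. This is a sketch, assuming nested objects are merged key by key while any other value (including lists) is replaced; merge_doc is a hypothetical name.

```python
def merge_doc(original, partial):
    """Return a new dict with `partial` merged into `original` (toy model)."""
    merged = dict(original)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_doc(merged[key], value)  # merge nested objects
        else:
            merged[key] = value                          # overwrite everything else
    return merged


original = {
    "customer_id": "customer1",
    "in_stock_items": 4,
    "address": {"city": "Rome", "zip": "00100"},
}
merged = merge_doc(original, {"in_stock_items": 10, "address": {"zip": "00199"}})
unchanged = merge_doc(original, {"in_stock_items": 4})
print(merged)
```

Comparing the merged result with the original (here, unchanged == original) is one simple way to tell whether a partial document actually changed anything.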

If the original document is missing, it is possible to use the doc value itself as the document to be created, by setting the doc_as_upsert parameter to true:

   curl -XPOST 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw/_update' \
     -d '{"doc" : {"in_stock_items":10}, "doc_as_upsert":true}'
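
A toy model of doc_as_upsert follows. The doc_update function and the store layout are hypothetical, and the merge is kept shallow for brevity.

```python
def doc_update(store, doc_id, doc, doc_as_upsert=False):
    """Merge `doc` into an existing document, or create it when allowed (toy model)."""
    if doc_id not in store:
        if not doc_as_upsert:
            raise KeyError("document missing: " + doc_id)
        store[doc_id] = dict(doc)  # the partial document becomes the new document
        return "created"
    store[doc_id] = {**store[doc_id], **doc}  # shallow merge only
    return "updated"


store = {}
try:
    doc_update(store, "order-2", {"in_stock_items": 10})
    outcome = "no error"
except KeyError:
    outcome = "missing"  # without doc_as_upsert a missing document is an error

created = doc_update(store, "order-2", {"in_stock_items": 10}, doc_as_upsert=True)
updated = doc_update(store, "order-2", {"sent": True}, doc_as_upsert=True)
```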

Using Painless scripting, it is possible to apply advanced operations to fields, such as:

  • Remove a field, for example:
        "script" : {"inline": "ctx._source.remove(\"myfield\")"}
  • Add a new field, for example:
        "script" : {"inline": "ctx._source.myfield = \"myvalue\""}

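Quoting mistakes are easy to make when a script that contains string literals is embedded in JSON by hand. As a sketch, the request bodies can be built as Python dictionaries and serialized with json.dumps, which takes care of the escaping. Passing the new value through params (under the hypothetical name value) is a choice of this example, not something the recipe requires.

```python
import json

# Script that removes a field; the inner double quotes are escaped by json.dumps.
remove_field = {"script": {"inline": 'ctx._source.remove("myfield")'}}

# Script that adds a field; the value travels in params instead of being
# concatenated into the script text.
add_field = {
    "script": {
        "inline": "ctx._source.myfield = params.value",
        "params": {"value": "myvalue"},
    }
}

remove_body = json.dumps(remove_field)
add_body = json.dumps(add_field)
print(remove_body)
print(add_body)
```
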
The update REST call is very useful because it has some advantages:

  • It reduces the bandwidth usage, because the update operation doesn't require a round trip of the data to the client
  • It's safer, because it automatically manages optimistic concurrency control: if a change happens during script execution, the script is re-executed on the updated data
  • It can be bulk executed
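
The optimistic concurrency control mentioned above can be illustrated with a toy retry loop: the write succeeds only if the version read at the start is still current, otherwise the script runs again on fresh data. All names here (VersionConflict, write_if_version, update_with_retry, the before_write hook) are hypothetical; in Elasticsearch itself, the number of retries is governed by the retry_on_conflict parameter of the update call.

```python
class VersionConflict(Exception):
    pass


def write_if_version(store, doc_id, source, expected_version):
    """Store `source` only if nobody changed the document in the meantime."""
    if store[doc_id]["_version"] != expected_version:
        raise VersionConflict(doc_id)
    store[doc_id] = {"_version": expected_version + 1, "_source": source}


def update_with_retry(store, doc_id, script, retries=3, before_write=None):
    for attempt in range(retries + 1):
        doc = store[doc_id]                  # read the current version
        source = dict(doc["_source"])
        script(source)                       # (re)execute the script
        if before_write is not None:
            before_write(attempt)            # hook simulating a concurrent writer
        try:
            write_if_version(store, doc_id, source, doc["_version"])
            return store[doc_id]
        except VersionConflict:
            continue                         # retry on the updated data
    raise VersionConflict(doc_id)


store = {"1": {"_version": 1, "_source": {"in_stock_items": 0}}}


def add_four(source):
    source["in_stock_items"] += 4


def concurrent_writer(attempt):
    # Another client changes the document during the first attempt only.
    if attempt == 0:
        store["1"] = {"_version": 2, "_source": {"in_stock_items": 100}}


final = update_with_retry(store, "1", add_four, before_write=concurrent_writer)
print(final)
```

The first attempt fails because the version changed underneath it; the second attempt applies the script to the concurrent writer's data.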

See also

  • Refer to the following recipe, Speeding up atomic operations, to learn how to use bulk operations to reduce the networking load and speed up ingestion