Getting a document

Once a document has been indexed, your application will most likely need to retrieve it at some point.

The GET REST call allows us to get a document in real time without the need for a refresh.

Getting ready

You need an up-and-running Elasticsearch installation, as used in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following commands, use the document indexed in the Indexing a document recipe.

How to do it...

The GET method allows us to return a document given its index, type, and ID.

The REST API URL is:

http://<server>/<index_name>/<type_name>/<id>
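
As a minimal sketch, the URL pattern above can be assembled in Python. The helper name document_url and the default server localhost:9200 are assumptions made for illustration; they are not part of Elasticsearch.

```python
from urllib.parse import quote


def document_url(index, doc_type, doc_id, server="localhost:9200"):
    """Build http://<server>/<index_name>/<type_name>/<id> (hypothetical helper)."""
    # Percent-encode each path segment so that IDs with special characters stay valid.
    parts = [quote(str(part), safe="") for part in (index, doc_type, doc_id)]
    return "http://" + server + "/" + "/".join(parts)


print(document_url("myindex", "order", "2qLrAfPVQvCRMe7Ku8r0Tw"))
```

Encoding every segment separately means that an ID containing a slash cannot be mistaken for an extra path level.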

To get a document, we will perform the following steps:

  1. For the document that we indexed in the previous recipe, the call will be:
            curl -XGET 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw?pretty=true'
    
  2. The result returned by Elasticsearch should be the indexed document:
            {
              "_index" : "myindex",
              "_type" : "order",
              "_id" : "2qLrAfPVQvCRMe7Ku8r0Tw",
              "_version" : 1,
              "found" : true,
              "_source" : {
                "id" : "1234",
                "date" : "2013-06-07T12:14:54",
                "customer_id" : "customer1",
                "sent" : true,
                "items" : [
                  {"name":"item1", "quantity":3, "vat":20.0},
                  {"name":"item2", "quantity":2, "vat":20.0},
                  {"name":"item3", "quantity":1, "vat":10.0}
                ]
              }
            }
    
  3. Our indexed data is contained in the _source field, but other information is also returned:
    • _index: The index that stores the document
    • _type: The type of the document
    • _id: The ID of the document
    • _version: The version of the document
    • found: Whether the document has been found
  4. If the record is missing, a 404 status code is returned and the response JSON will be:
                   {
                     "_id": "2qLrAfPVQvCRMe7Ku8r0Tw",
                     "_index": "myindex",
                     "_type": "order",
                     "found": false
                   }
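
The steps above can be sketched in Python using only the standard library. The function names parse_get_response and get_document are hypothetical, and get_document assumes a node reachable at the URL it is given. The parsing step is kept separate so that it can be tried offline on the two response shapes shown in this recipe.

```python
import json
import urllib.error
import urllib.request


def parse_get_response(status, body):
    """Return the _source dict if the document was found, otherwise None."""
    doc = json.loads(body)
    if status == 404 or not doc.get("found", False):
        return None
    return doc.get("_source")


def get_document(url):
    """Perform the GET call (requires a running Elasticsearch node)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return parse_get_response(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
        # urllib raises on 404; the error body still carries the JSON answer.
        return parse_get_response(err.code, err.read().decode("utf-8"))


# Offline check with shortened versions of the responses from this recipe.
found = ('{"_index":"myindex","_type":"order","_id":"2qLrAfPVQvCRMe7Ku8r0Tw",'
         '"_version":1,"found":true,"_source":{"id":"1234","sent":true}}')
missing = ('{"_index":"myindex","_type":"order",'
           '"_id":"2qLrAfPVQvCRMe7Ku8r0Tw","found":false}')
print(parse_get_response(200, found))
print(parse_get_response(404, missing))
```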

How it works...

The Elasticsearch GET API on the document doesn't require a refresh: all the GET calls are in real time.

This call is very fast because Elasticsearch routes the request only to the shard that contains the document, without any other overhead, and the document IDs are often cached in memory for fast lookup.

The source of the document is only available if the _source field is stored (the default setting in Elasticsearch).
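
To illustrate why the lookup touches a single shard, the following sketch maps a routing value (the document ID by default) to a shard number by hashing it and taking the remainder modulo the number of primary shards. The CRC32 hash is a stand-in chosen for this example and is not the hash function that Elasticsearch uses internally; pick_shard is a hypothetical name, and the shard count of 5 is an assumption.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number (illustrative only)."""
    # Stand-in hash (CRC32); Elasticsearch uses its own hash function internally.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# The same ID always maps to the same shard, so only that shard is asked.
print(pick_shard("2qLrAfPVQvCRMe7Ku8r0Tw", 5))
```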

There are several additional parameters that can be used to control the get call:

  • fields allows us to retrieve only a subset of fields. This is very useful to reduce bandwidth or to retrieve calculated fields, such as those of the attachment mapping:
        curl 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw?fields=date,sent'
  • routing allows us to specify the shard to be used for the get operation. To retrieve a document, the routing value used at get time must be the same as the one used at indexing time:
        curl 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw?routing=customer_id'
  • refresh allows us to refresh the current shard before doing the get operation (it must be used with care because it slows down indexing and introduces some overhead):
        curl 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw?refresh=true'
  • preference allows us to control which shard replica is chosen to execute the GET method. Generally, Elasticsearch chooses a random shard for the GET call. The possible values are as follows:
    • _primary, for the primary shard.
    • _local, which first tries the local shard and then falls back to a random choice. Using the local shard reduces bandwidth usage and should generally be used with auto-replicating shards (replica set to 0-all).
    • A custom value, for selecting a shard based on a related value such as customer_id or username.
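
The parameters listed above are plain query-string arguments, so they can be combined on one URL. The sketch below builds such a URL; the function name get_url_with_params is hypothetical, and the parameter names are taken from the list above.

```python
from urllib.parse import urlencode


def get_url_with_params(base_url, fields=None, routing=None,
                        refresh=False, preference=None):
    """Append the optional GET parameters to a document URL (hypothetical helper)."""
    params = {}
    if fields:
        params["fields"] = ",".join(fields)
    if routing is not None:
        params["routing"] = routing
    if refresh:
        params["refresh"] = "true"
    if preference is not None:
        params["preference"] = preference
    if not params:
        return base_url
    # safe="," keeps the comma-separated field list readable in the URL.
    return base_url + "?" + urlencode(params, safe=",")


base = "http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw"
print(get_url_with_params(base, fields=["date", "sent"]))
print(get_url_with_params(base, routing="customer_id", preference="_local"))
```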

There is more...

The GET API is very fast, so a good practice when developing applications is to use it as much as possible. Choosing the correct ID form during application development can bring a big boost in performance.

If the shard that contains the document cannot be determined from the ID, fetching the document requires a query with an ID filter (we will see this in the Using a IDS query recipe in Chapter 6, Text and Numeric Queries).

If you don't need to fetch the record, but only need to check whether it exists, you can replace GET with HEAD. The response status code will be 200 if the document exists, or 404 if it is missing.

The GET call also has a special endpoint, _source, that allows us to fetch only the source of the document.
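
A minimal sketch of the existence check in Python follows. The names exists_from_status and document_exists are hypothetical, and document_exists assumes a node reachable at the given URL. Only the status code is inspected, because a HEAD response carries no body.

```python
import urllib.error
import urllib.request


def exists_from_status(status):
    """Translate the HEAD status code into a boolean."""
    if status == 200:
        return True
    if status == 404:
        return False
    raise ValueError("unexpected status code: %d" % status)


def document_exists(url):
    """Send a HEAD request (requires a running Elasticsearch node)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request) as resp:
            return exists_from_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib reports 404 as an exception; map it back to a boolean.
        return exists_from_status(err.code)
```

With a node running, document_exists('http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw') would be the call for the order used in this recipe.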

The GET source REST API URL is:

http://<server>/<index_name>/<type_name>/<id>/_source

To fetch the source of the previous order, we will call:

    curl -XGET 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw/_source'
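
The same call can be sketched in Python. The helper names source_url and get_source are hypothetical; get_source assumes a reachable node and that the body returned by the _source endpoint is the document source as JSON, without the metadata fields.

```python
import json
import urllib.request


def source_url(document_url):
    """Append the _source endpoint to a document URL."""
    return document_url.rstrip("/") + "/_source"


def get_source(document_url):
    """Fetch only the source of the document (requires a running node)."""
    with urllib.request.urlopen(source_url(document_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(source_url("http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw"))
```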

See also

  • Refer to the Speeding up the GET operation recipe in this chapter to learn how to execute multiple GET operations in one shot to reduce fetching time.