In Elasticsearch there are two vital operations: index and search.
Indexing means storing one or more documents in an index, a concept similar to inserting records in a relational database.
In Lucene, the core engine of Elasticsearch, inserting and updating a document have the same cost, because in Lucene and Elasticsearch an update means a replace.
You need an up-and-running Elasticsearch installation, as used in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To correctly execute the following commands, use the index and mapping created in the Putting a mapping in an index recipe.
To index a document, several REST entry points can be used:
Method | URL |
| |
| |
| |
To index a document, we need to perform the following steps:
1. If we consider the order type of the previous chapter, the call to index a document will be as follows:
curl -XPOST 'http://localhost:9200/myindex/order/2qLrAfPVQvCRMe7Ku8r0Tw' -d '{
  "id" : "1234",
  "date" : "2013-06-07T12:14:54",
  "customer_id" : "customer1",
  "sent" : true,
  "in_stock_items" : 0,
  "items" : [
    {"name":"item1", "quantity":3, "vat":20.0},
    {"name":"item2", "quantity":2, "vat":20.0},
    {"name":"item3", "quantity":1, "vat":10.0}
  ]
}'
2. If the index operation is successful, the result returned by Elasticsearch will be similar to the following:
{
  "_index" : "myindex",
  "_type" : "order",
  "_id" : "2qLrAfPVQvCRMe7Ku8r0Tw",
  "_version" : 1,
  "forced_refresh" : false,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "created" : true
}
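As a rough illustration of the same call made from a program rather than from curl, the following Python sketch builds (but does not send) the request with the standard library and reads the fields of the sample response shown above. The helper names `build_index_request` and `summarize_index_response` are hypothetical, the document body is shortened, and the `Content-Type` header is an addition that the curl example does not send.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (without sending) a POST request that indexes `document`."""
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumption: not present in the curl call of this recipe.
        headers={"Content-Type": "application/json"},
    )


def summarize_index_response(raw_json):
    """Pick out the response fields discussed in this recipe."""
    response = json.loads(raw_json)
    return {
        "id": response["_id"],
        "version": response["_version"],
        "created": response["created"],
        "successful_shards": response["_shards"]["successful"],
    }


# Shortened version of the order document used in this recipe.
order = {"id": "1234", "customer_id": "customer1", "sent": True}
request = build_index_request(
    "http://localhost:9200", "myindex", "order", "2qLrAfPVQvCRMe7Ku8r0Tw", order
)

# Sample response copied from this recipe.
sample = """
{ "_index" : "myindex", "_type" : "order", "_id" : "2qLrAfPVQvCRMe7Ku8r0Tw",
  "_version" : 1, "forced_refresh" : false,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "created" : true }
"""
summary = summarize_index_response(sample)

print(request.get_method(), request.full_url)
print(summary)
```

Passing `request` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a cluster.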
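Some additional information is returned from the index operation, such as:
- The ID of the document, auto-generated if not specified (2qLrAfPVQvCRMe7Ku8r0Tw in this example)
- The version of the indexed document (1 in this example, because the document is indexed for the first time)
- Whether the document has been created ("created":true in this example)
One of the most used APIs in Elasticsearch is the index API. Basically, indexing a JSON document consists internally of the following steps: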
It's important to choose the correct ID for indexing your data. If you don't provide an ID, Elasticsearch automatically assigns a new one to your document during the indexing phase. To improve performance, IDs should generally all have the same character length, which improves the balancing of the data tree that stores them.
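One way to obtain IDs of constant length on the client side is sketched below in Python. The helper `new_document_id` is hypothetical and is not necessarily how Elasticsearch generates its own IDs; it simply encodes a random UUID as URL-safe Base64, which always yields 22 characters.

```python
import base64
import uuid


def new_document_id():
    """Return a 22-character, URL-safe ID built from a random UUID."""
    raw = uuid.uuid4().bytes                 # always 16 bytes
    encoded = base64.urlsafe_b64encode(raw)  # 24 characters, ending in "=="
    return encoded.rstrip(b"=").decode("ascii")  # drop the padding: 22 characters


ids = [new_document_id() for _ in range(5)]
for doc_id in ids:
    print(doc_id, len(doc_id))
```

Because the alphabet is limited to letters, digits, `-`, and `_`, such IDs can be placed in the URL without further escaping.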
Because of the nature of REST calls, take care when using non-ASCII characters in IDs, since they are subject to URL encoding and decoding (or make sure that the client framework you use escapes them correctly).
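A minimal sketch of such escaping with the Python standard library follows. The helper `document_url` and the ID `café/2013` are made up for illustration; the host, index, and type are the ones used in this recipe.

```python
from urllib.parse import quote


def document_url(base_url, index, doc_type, doc_id):
    """Build a document URL, percent-encoding each path segment."""
    # safe="" also escapes "/", so an ID cannot introduce extra path segments.
    segments = [quote(part, safe="") for part in (index, doc_type, doc_id)]
    return base_url.rstrip("/") + "/" + "/".join(segments)


url = document_url("http://localhost:9200", "myindex", "order", "café/2013")
print(url)  # http://localhost:9200/myindex/order/caf%C3%A9%2F2013
```

The non-ASCII character is sent as its UTF-8 bytes in percent-encoded form, and the slash inside the ID no longer looks like a path separator.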
Depending on the mappings, other actions take place during the indexing phase: propagation to replicas, nested processing, and the percolator.
The document will be available for standard search calls after a refresh (forced with an API call or occurring automatically after a time slice of 1 second, Near Real-Time). GET API calls on the document don't require a refresh, so the document is instantly available to them.
The refresh can also be forced by specifying the refresh parameter during indexing.
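Since the refresh parameter travels in the query string of the index URL, a small helper can take care of the encoding. The Python sketch below is only an illustration; `index_url` is a hypothetical name, and the host, index, and type are those of this recipe.

```python
from urllib.parse import urlencode


def index_url(base_url, index, doc_type, **params):
    """Build an index URL and append any query parameters to it."""
    url = "{}/{}/{}".format(base_url.rstrip("/"), index, doc_type)
    if params:
        # Python renders booleans as "True"/"False"; lower-case them for the URL.
        pairs = {
            key: str(value).lower() if isinstance(value, bool) else value
            for key, value in params.items()
        }
        url += "?" + urlencode(pairs)
    return url


print(index_url("http://localhost:9200", "myindex", "order", refresh=True))
# http://localhost:9200/myindex/order?refresh=true
```

Any other query parameter accepted by the index API can be passed to the helper in the same way as a keyword argument.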
Elasticsearch allows several query parameters to be passed in the index API URL to control how the document is indexed. The most used ones are as follows:
routing: Controls the shard to be used for indexing, for example:
curl -XPOST 'http://localhost:9200/myindex/order?routing=1'
parent: Defines the parent of a child document and uses this value to apply routing. The parent object must be specified in the mappings:
curl -XPOST 'http://localhost:9200/myindex/order?parent=12'
timestamp: The timestamp to be used when indexing the document. It must be activated in the mappings:
curl -XPOST 'http://localhost:9200/myindex/order?timestamp=2013-01-25T19%3A22%3A22'
consistency (one/quorum/all): By default, an index operation succeeds if a quorum (>replica/2+1) of active shards is available. The consistency value can be changed for the index action:
curl -XPOST 'http://localhost:9200/myindex/order?consistency=one'
replication (sync/async): By default, Elasticsearch returns from an index operation when all the shards of the current replication group have executed it. Setting replication to async executes the index action synchronously only on the primary shard and asynchronously on the secondary shards. In this way, the API call returns the response faster:
curl -XPOST 'http://localhost:9200/myindex/order?replication=async' ...
version: Allows us to use Optimistic Concurrency Control (http://en.wikipedia.org/wiki/Optimistic_concurrency_control), a way to manage concurrency in every insert or update operation. The first time a document is indexed, its version is set to 1; at every update, this value is incremented. The passed version value is the last seen version (usually returned by a get or a search). The index operation happens only if the current version of the document in the index is equal to the passed one:
curl -XPOST 'http://localhost:9200/myindex/order?version=2' ...
op_type: Can be used to force a create on a document. If a document with the same ID exists, the index operation fails:
curl -XPOST 'http://localhost:9200/myindex/order?op_type=create' ...
refresh: Forces a refresh after the document has been indexed, so that the document is ready for search immediately after indexing:
curl -XPOST 'http://localhost:9200/myindex/order?refresh=true' ...
timeout: Defines how long to wait for the primary shard to become available. Sometimes the primary shard is not in a writable status (relocating or recovering from a gateway); by default, a timeout for the write operation is raised after one minute:
curl -XPOST 'http://localhost:9200/myindex/order?timeout=5m' ...
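The behavior of the version and op_type parameters described above can be illustrated with a toy in-memory model. The Python sketch below is not Elasticsearch code; the `ToyIndex` class and the `VersionConflict` exception are hypothetical names, and the model only mimics the rules stated in this recipe: the first index sets version 1, every update increments it, a stale version is rejected, and op_type=create fails on an existing ID.

```python
class VersionConflict(Exception):
    """Raised when a write is rejected, mirroring a failed index call."""


class ToyIndex:
    """In-memory stand-in for a single index; not Elasticsearch code."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)

    def index(self, doc_id, document, version=None, op_type="index"):
        current = self._docs.get(doc_id)
        if op_type == "create" and current is not None:
            raise VersionConflict("document {} already exists".format(doc_id))
        current_version = current[0] if current else 0
        # Optimistic check: the caller's last seen version must still be current.
        if version is not None and version != current_version:
            raise VersionConflict(
                "expected version {}, found {}".format(version, current_version)
            )
        new_version = current_version + 1  # first index gives 1, updates increment
        self._docs[doc_id] = (new_version, document)
        return {"_id": doc_id, "_version": new_version, "created": current is None}


store = ToyIndex()
first = store.index("1", {"sent": False})             # created, version 1
second = store.index("1", {"sent": True}, version=1)  # last seen version matches

try:
    store.index("1", {"sent": False}, version=1)      # stale: current version is 2
except VersionConflict as error:
    print("rejected:", error)

try:
    store.index("1", {"sent": False}, op_type="create")  # ID already exists
except VersionConflict as error:
    print("rejected:", error)
```

In the model, a client that loses the race must read the document again to learn the new version before retrying, which is the essence of optimistic concurrency control.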