The bulk API lets you perform many index and delete operations in a single API call. This can greatly increase indexing speed, so prefer it when performance matters.
You can use the bulk API as follows:
curl -XPUT localhost:9200/_bulk --data-binary @/Users/hakdogan/Desktop/bulk.json
Because we are passing a file to curl in the preceding command, we must use the --data-binary flag instead of plain -d: the -d flag strips newlines when it reads from a file, and the bulk format depends on them. After the data flag, give the full path of the file, prefixed with the @ symbol. The contents of the file are as follows:
{ "create" : { "_index" : "my_index", "_type" : "my_type", "_id": 1} }
{ "title":"How to use the Bulk API"}
{ "create" : { "_index" : "my_index", "_type" : "my_type", "_id": 2} }
{ "title":"Sizing bulk requests"}
If a document with the same index, type, and ID already exists, the create action fails for that item. The following is an example of how to use the update and delete actions:
{ "update" : { "_index" : "my_index", "_type" : "my_type", "_id": 1} }
{ "doc": { "title":"How to use the Bulk API for indexing speed"} }
{ "delete" : { "_index" : "my_index", "_type" : "my_type", "_id": 2} }
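In the preceding examples, we provided the index and type names explicitly in the file. If you provide the index, or the index and type, in the URL on the command line (for example, localhost:9200/my_index/my_type/_bulk), they are used by default for bulk items that do not specify them explicitly. Also note that the file format uses literal newline characters (\n) as delimiters: each action line and each source line must sit on its own line, and the last line must also end with a newline.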
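The newline-delimited layout can also be produced programmatically rather than written by hand. The following is a minimal Python sketch, not part of any Elasticsearch client: the helper name build_bulk_body and the (action, metadata, source) tuple layout are assumptions made for illustration. It serializes each action and its optional source onto separate lines and ends every line, including the last, with a newline.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) triples into a bulk request body.

    source is None for actions that carry no payload line, such as delete.
    build_bulk_body is a hypothetical helper, not an Elasticsearch API.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # Terminate every line, including the last one, with a newline.
    return "".join(line + "\n" for line in lines)


operations = [
    ("create", {"_index": "my_index", "_type": "my_type", "_id": 1},
     {"title": "How to use the Bulk API"}),
    ("update", {"_index": "my_index", "_type": "my_type", "_id": 1},
     {"doc": {"title": "How to use the Bulk API for indexing speed"}}),
    ("delete", {"_index": "my_index", "_type": "my_type", "_id": 2}, None),
]

body = build_bulk_body(operations)
print(body, end="")
```

The resulting string can be written to a file and sent with the curl command shown earlier.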
Bulk sizing is important, especially when working with large volumes of data. There is no single correct size for a bulk request. The right size depends on factors such as the physical size of the documents (not the number of documents in the index), the cluster configuration, and so on, so the ideal size differs from one situation to another. Nevertheless, 5–10 MB per bulk request is a reasonable starting point. You can slowly increase it, while monitoring your nodes, until you no longer see performance gains. In the Java API, you can use the BulkProcessor class to control bulk sizing. Its builder has a setBulkSize method that takes a parameter of type ByteSizeValue. This parameter defines the size at which the bulk is flushed.
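To make the idea of a size-based flush concrete, here is a small Python sketch that groups bulk items into request bodies capped at a byte threshold. It only illustrates the principle and is not how BulkProcessor is implemented; the function name chunk_bulk_items and the tiny 200-byte limit in the usage lines are assumptions chosen so the effect is visible with a handful of documents.

```python
def chunk_bulk_items(items, max_bytes=5 * 1024 * 1024):
    """Group newline-terminated bulk items into bodies of at most max_bytes.

    Each item is one action line plus its optional source line.
    An item larger than max_bytes is emitted alone rather than dropped.
    chunk_bulk_items is a hypothetical helper, not an Elasticsearch API.
    """
    batch, batch_size = [], 0
    for item in items:
        item_size = len(item.encode("utf-8"))
        # Flush the current batch before it would exceed the limit.
        if batch and batch_size + item_size > max_bytes:
            yield "".join(batch)
            batch, batch_size = [], 0
        batch.append(item)
        batch_size += item_size
    if batch:
        yield "".join(batch)


# Six small create items, each made of an action line and a source line.
items = [
    '{"create":{"_index":"my_index","_type":"my_type","_id":%d}}\n'
    '{"title":"doc %d"}\n' % (i, i)
    for i in range(1, 7)
]

# An artificially small limit, used here only to force several batches.
bodies = list(chunk_bulk_items(items, max_bytes=200))
for body in bodies:
    print(len(body.encode("utf-8")), "bytes")
```

Each yielded body can be sent as one bulk request. Starting near 5 MB and raising max_bytes while watching the nodes follows the tuning approach described above.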