Bulk API

The bulk API allows you to perform many index, create, update, and delete operations in a single API call. It can greatly increase indexing speed and should be preferred for optimal performance.

You can use the bulk API as follows:

curl -XPUT localhost:9200/_bulk --data-binary @/Users/hakdogan/Desktop/bulk.json

Because we are passing a file to curl in the preceding command, we must use the --data-binary flag instead of plain -d. With -d, curl strips newlines from file input, while --data-binary sends the file exactly as it is. After the flag, give the path to the file prefixed with the @ symbol. The contents of the file are as follows:

{ "create" : { "_index" : "my_index", "_type" : "my_type", "_id": 1} }
{ "title":"How to use the Bulk API"}
{ "create" : { "_index" : "my_index", "_type" : "my_type", "_id": 2} }
{ "title":"Sizing bulk requests"}
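The bulk API expects the final line to end with a newline as well, and a text editor can easily save the file without one. One way to avoid that is sketched below: it writes the four lines shown above from the shell with a quoted heredoc and checks the final byte before the file is sent. The relative file name bulk.json is an assumption, so adjust the path you pass after @ accordingly.

```shell
#!/bin/sh
set -eu

# Write the bulk body shown above to bulk.json. The quoted heredoc
# ('EOF') copies the lines verbatim, and every line it writes,
# including the last one, ends with a newline character.
cat > bulk.json <<'EOF'
{ "create" : { "_index" : "my_index", "_type" : "my_type", "_id": 1} }
{ "title":"How to use the Bulk API"}
{ "create" : { "_index" : "my_index", "_type" : "my_type", "_id": 2} }
{ "title":"Sizing bulk requests"}
EOF

# Sanity check before sending: $(...) strips trailing newlines, so an
# empty result means the last byte of the file is a newline.
if [ -n "$(tail -c 1 bulk.json)" ]; then
  echo "bulk.json does not end with a newline" >&2
  exit 1
fi
echo "bulk.json: $(wc -l < bulk.json) lines, ready to send"
```

The file can then be sent with the curl command shown earlier, using @bulk.json as the data argument.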

The create action fails if a document with the same index, type, and ID already exists. The following example shows how to use the update and delete actions. The update action is followed by a partial document under doc, while delete takes no source line:

{ "update" : { "_index" : "my_index", "_type" : "my_type", "_id": 1} }
{ "doc": { "title":"How to use the Bulk API for indexing speed"} }
{ "delete" : { "_index" : "my_index", "_type" : "my_type", "_id": 2} }

In the preceding examples, we provided the index and type names explicitly in the file. If you provide the index, or the index and type, in the request URL (for example, localhost:9200/my_index/my_type/_bulk), they are used as defaults for bulk items that do not specify them. Also note that the file format uses the newline character (\n) as a delimiter, so every line, including the last one, must end with a newline. Pay attention to this when creating the file.
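The update and delete example can also be written so that the index and type come from the URL. In the sketch below, the metadata lines omit _index and _type, and the file name update_delete.json is an assumption. The curl command is left as a comment because it needs a running node on localhost:9200, and the /my_index/my_type/_bulk form applies to the Elasticsearch versions with mapping types that this chapter uses.

```shell
#!/bin/sh
set -eu

# Metadata lines name only the _id; _index and _type will be taken
# from the request URL. The delete action has no source line.
cat > update_delete.json <<'EOF'
{ "update" : { "_id": 1} }
{ "doc": { "title":"How to use the Bulk API for indexing speed"} }
{ "delete" : { "_id": 2} }
EOF

# With a node running, send it to an endpoint that supplies the defaults:
# curl -XPOST localhost:9200/my_index/my_type/_bulk --data-binary @update_delete.json
```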

Note

When using the HTTP API, make sure that the client does not send the request in HTTP chunks (chunked transfer encoding), as this slows things down.

Bulk sizing

Bulk sizing is important, especially when working with large amounts of data. There is no single correct size for a bulk request. The right value depends on several factors, such as the physical size of the documents (not the number of documents), the cluster configuration, and so on, and it changes from one situation to another. Nevertheless, 5–10 MB per bulk request is a reasonable starting point. From there, increase the size gradually while monitoring your nodes, and stop when you no longer see performance gains.

In the Java API, you can use the BulkProcessor class to handle bulk sizing. Its builder has a setBulkSize method that takes a parameter of type ByteSizeValue; this parameter defines the size at which the accumulated bulk request is flushed.
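BulkProcessor does this flushing for you in Java. If you send bulk files with curl instead, the same size limit can be applied by splitting a large file into chunks of about 5 MB before sending them. The sketch below is a hypothetical helper, not an Elasticsearch tool. It generates its own demo input (big.json) and assumes every action line is followed by exactly one source line, which holds for create, index, and update but not for delete. It keeps each action together with its source and starts a new chunk_NNN.json file whenever the next pair would push the current chunk past the limit.

```shell
#!/bin/sh
set -eu

LIMIT=$((5 * 1024 * 1024))   # flush threshold in bytes (~5 MB starting point)

# Hypothetical demo input: 80000 create/source pairs, large enough for two chunks.
awk 'BEGIN {
  for (i = 1; i <= 80000; i++) {
    printf "{ \"create\" : { \"_index\" : \"my_index\", \"_type\" : \"my_type\", \"_id\": %d} }\n", i
    printf "{ \"title\":\"Sizing bulk requests, part %d\"}\n", i
  }
}' > big.json

rm -f chunk_*.json
# LC_ALL=C makes length() count bytes rather than characters.
LC_ALL=C awk -v limit="$LIMIT" '
  NR % 2 == 1 { action = $0; next }         # odd lines: action metadata
  {
    pair = action "\n" $0 "\n"               # action + source, newline-terminated
    n = length(pair)
    if (size > 0 && size + n > limit + 0) {  # next pair would overflow: new chunk
      close(file); chunk++; size = 0
    }
    file = sprintf("chunk_%03d.json", chunk)
    printf "%s", pair > file
    size += n
  }' big.json

ls -l chunk_*.json

# With a node running, send the chunks one by one:
# for f in chunk_*.json; do curl -XPOST localhost:9200/_bulk --data-binary @"$f"; done
```

A single pair larger than the limit still gets a chunk of its own, so no data is dropped. Once you find a better size for your cluster, changing LIMIT is the only adjustment needed.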
