Using versioning

When working with NoSQL solutions such as Solr, we usually don't have the notion of transactions, and we can't predict the order in which documents will be received and indexed by Solr, especially when indexing is done from multiple threads and machines. However, in certain cases, such functionality is needed, at least to some degree. For example, we don't want to apply an update to a document that someone else modified between the time we read it and the time we sent our update. This recipe will show you how to avoid such situations.

Getting ready

This recipe uses the functionality discussed in the Updating document fields recipe from Chapter 2, Indexing Your Data. Please read that recipe before proceeding.

How to do it...

For the purpose of this recipe, we assume that we run an e-commerce bookstore. When updating the price of a book, we need to read the document to get the current price, change it in the UI, and then index the document again. However, the same book may be edited by several people at the same time, and in that case we want to reject conflicting concurrent updates.

  1. We will start with the index structure. For the purpose of this recipe, we assume that we have the following fields in the schema.xml file:
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="price" type="tfloat" indexed="true" stored="true" />
    <field name="_version_" type="long" indexed="true" stored="true"/>
  2. The next step is to index the test data, which looks as follows:
    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Solr cookbook</field>
      <field name="price">39.99</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="name">Mechanics cookbook</field>
      <field name="price">19.99</field>
     </doc>
     <doc>
      <field name="id">3</field>
      <field name="name">ElasticSearch book</field>
      <field name="price">49.99</field>
     </doc>
    </add>
  3. Now, to fetch our books before updating them, we run a simple query like this one:
    http://localhost:8983/solr/cookbook/select?q=*:*

    The response to the preceding query will look as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
     <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
      <lst name="params">
       <str name="q">*:*</str>
      </lst>
     </lst>
     <result name="response" numFound="3" start="0">
      <doc>
       <str name="id">1</str>
       <str name="name">Solr cookbook</str>
       <float name="price">39.99</float>
       <long name="_version_">1481498739861356544</long></doc>
      <doc>
       <str name="id">2</str>
       <str name="name">Mechanics cookbook</str>
       <float name="price">19.99</float>
       <long name="_version_">1481498739931611136</long></doc>
      <doc>
       <str name="id">3</str>
       <str name="name">ElasticSearch book</str>
       <float name="price">49.99</float>
       <long name="_version_">1481498739932659712</long></doc>
     </result>
    </response>
  4. Now let's update the book called Solr cookbook, the one with the id field equal to 1. To do this, we will run the following command (note that the _version_ values will be different when you run the example on your own Solr instance, so adjust the command accordingly):
    curl 'localhost:8983/solr/cookbook/update?commit=true&_version_=1481498739861356544' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":29.99}}]'
    

    If everything goes well, our book will be updated and we should see the following response returned by Solr:

    {"responseHeader":{"status":0,"QTime":79}}

    However, if someone has already modified the document, we will see a response similar to the following one:

    {"responseHeader":{"status":409,"QTime":4},"error":{"msg":"version conflict for 1 expected=1481498739861356544 actual=1481499824288169984","code":409}}
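The read-then-update flow from steps 3 and 4 can also be scripted. The following is a minimal Python sketch using only the standard library, assuming the cookbook core at localhost:8983 used throughout this recipe. The helper names (versions_from_select, build_conditional_update, send_update, update_price) are hypothetical, and the wt=xml and wt=json parameters are added only so that the response formats don't depend on Solr's defaults.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: same host, port and core name as in this recipe.
SOLR_BASE = "http://localhost:8983/solr/cookbook"


def versions_from_select(xml_text):
    """Map each document id to its _version_ in an XML select response."""
    root = ET.fromstring(xml_text)
    versions = {}
    for doc in root.iter("doc"):
        doc_id = doc.find("str[@name='id']").text
        versions[doc_id] = int(doc.find("long[@name='_version_']").text)
    return versions


def build_conditional_update(doc_id, new_price, expected_version):
    """Return (url, body) for an atomic price update guarded by _version_."""
    params = urllib.parse.urlencode(
        {"commit": "true", "_version_": expected_version, "wt": "json"})
    body = json.dumps([{"id": doc_id, "price": {"set": new_price}}])
    return f"{SOLR_BASE}/update?{params}", body


def send_update(url, body):
    """POST the update and return (HTTP status, parsed JSON response)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # A version conflict comes back as HTTP 409 with a JSON error body.
        return err.code, json.load(err)


def update_price(doc_id, new_price):
    """Read the document's current version, then send the guarded update.

    Needs a running Solr instance.
    """
    query = urllib.parse.urlencode({"q": f"id:{doc_id}", "wt": "xml"})
    with urllib.request.urlopen(f"{SOLR_BASE}/select?{query}") as resp:
        versions = versions_from_select(resp.read())
    url, body = build_conditional_update(doc_id, new_price, versions[doc_id])
    return send_update(url, body)
```

With a running Solr, update_price("1", 29.99) returns a (status, payload) pair. A 409 status signals the version conflict shown above; the caller can then re-read the document and show the current price to the user instead of silently overwriting it.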

As we can see, optimistic locking does its job. Let's now look at how it works.

How it works...

To keep this recipe as simple as possible, I used a minimal index structure. It consists of three fields: one holding the document identifier (the id field), one holding the name of the book (the name field), and one holding the price of the book (the price field). Of course, we also have the _version_ field, an internal field used by Solr for versioning, which is required for SolrCloud deployments. Our example data is also very simple, so I'll skip discussing it.

As you can see in the response, along with the stored fields, Solr returns the _version_ field with a generated value for each document. We use this value in the update request that we send to Solr: we set the _version_ parameter to the same value that we got in the search results. By doing this, we tell Solr to apply the update only if the document in the index still has exactly that version.

The logic behind the _version_ field is as follows:

  • If the _version_ parameter is set to a value greater than 1 during the update, Solr will require the document to have the same version as the value of the parameter. If the versions don't match, the update will be rejected.
  • If the _version_ parameter is set to 1 during the update, Solr requires the updated document to exist and doesn't care about a specific version. If the document doesn't exist, the update will be rejected.
  • If the _version_ parameter is set to 0 during the update, Solr doesn't put any constraints on the document in the index. If the document exists, it will be updated; if it doesn't exist, it will be created.
  • If the _version_ parameter is set to a value less than 0 during the update, Solr requires the document not to exist in the index; only in that case will the document be created. If the document already exists, the update will be rejected.
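The four rules can be summarized as a small decision function. This is a minimal Python model of the behavior described above, not Solr's implementation; the function name version_check_passes and the use of None to stand for a document that is missing from the index are assumptions made for illustration.

```python
from typing import Optional


def version_check_passes(requested: int, current: Optional[int]) -> bool:
    """Model of the _version_ rules (hypothetical helper, not Solr code).

    requested: the _version_ value sent with the update.
    current:   the document's version in the index, or None if it doesn't exist.
    """
    if requested > 1:
        # Exact match required; a missing document can never match.
        return current is not None and current == requested
    if requested == 1:
        # The document must exist; any version is accepted.
        return current is not None
    if requested == 0:
        # No constraint: update if present, create if absent.
        return True
    # requested < 0: the document must not exist yet.
    return current is None
```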

As we saw, an update that carries a stale _version_ value, such as a second update sent with the same value after the document has already been modified, is not successful. This is because the first update changed the document's version, so it no longer matches the value sent in the request. This is exactly what we are aiming for.
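To see why the stale update fails, here is a toy Python simulation of two editors changing the price of the same book. VersionedStore is a hypothetical in-memory stand-in for the cookbook core: it models only the "version greater than 1" rule, and its small version numbers are made up, whereas real Solr versions are large generated values like the ones shown earlier.

```python
import itertools


class VersionedStore:
    """Toy in-memory stand-in for a Solr core (hypothetical, not Solr's API)."""

    def __init__(self):
        self._docs = {}
        self._clock = itertools.count(1000)  # monotonically increasing versions

    def add(self, doc_id, fields):
        self._docs[doc_id] = {**fields, "_version_": next(self._clock)}

    def get(self, doc_id):
        return dict(self._docs[doc_id])

    def update(self, doc_id, changes, expected_version):
        """Apply changes only if the stored version equals expected_version."""
        current = self._docs.get(doc_id)
        actual = None if current is None else current["_version_"]
        if actual != expected_version:
            return 409, f"version conflict for {doc_id} expected={expected_version} actual={actual}"
        current.update(changes)
        current["_version_"] = next(self._clock)  # every write bumps the version
        return 200, "ok"


store = VersionedStore()
store.add("1", {"name": "Solr cookbook", "price": 39.99})

# Two editors read the same document, and therefore the same version.
seen_by_alice = store.get("1")["_version_"]
seen_by_bob = store.get("1")["_version_"]

first = store.update("1", {"price": 29.99}, seen_by_alice)
second = store.update("1", {"price": 34.99}, seen_by_bob)
print(first)   # (200, 'ok')
print(second)  # (409, 'version conflict for 1 expected=1000 actual=1001')
```

The first write wins and bumps the version, so the second editor's request, still carrying the old value, is rejected rather than overwriting the new price.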
