Time for action – updating an existing document

Sometimes you may want to update an existing document. Suppose we want to add a uri field, indicating its provenance, to a document we indexed before:

  1. All we have to do is edit the XML file seen before, adding a field value like this:
    <field name='uri_string'>http://en.wikipedia.org/wiki/Design_Patterns</field>
  2. Then we re-execute the same cURL command:
    >> curl -X POST 'http://localhost:8983/solr/simple/update?commit=true&wt=json' -H 'Content-Type: text/xml' -d @docs.xml
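For readers who prefer to script this step, the same request can be sketched with Python's standard library. This is only an illustration under the assumptions already made by the cURL command: a Solr instance on localhost:8983, a core named simple, and a docs.xml file in the current directory. The function names build_update_request and post_docs are hypothetical helpers, not part of Solr.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint, copied from the cURL command above.
SOLR_UPDATE_URL = "http://localhost:8983/solr/simple/update"


def build_update_request(xml_payload, base_url=SOLR_UPDATE_URL):
    """Build (but do not send) the POST request that the cURL command issues."""
    query = urlencode({"commit": "true", "wt": "json"})
    return urllib.request.Request(
        url=f"{base_url}?{query}",
        data=xml_payload,                      # raw bytes of the XML file
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def post_docs(path, base_url=SOLR_UPDATE_URL):
    """Read the XML file at `path`, post it, and return the response body."""
    with open(path, "rb") as xml_file:
        request = build_update_request(xml_file.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With Solr running, calling post_docs("docs.xml") should have the same effect as the cURL command: the edited file is posted again and the change is committed.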
    

What just happened?

Solr uses the uniqueKey field (in our case, ID) to identify each document. The default update behavior in Solr is based on a delete-and-add strategy. If we execute our example, we will obtain the same document, now containing a new uri_string field. The returned document, however, is actually the result of a full delete (of the document with that ID) followed by a full post (of the entire modified document).

Verifying how the update process works is simple: we can post a new document containing only the ID and the uri field. In this case, the document returned after the update will contain only the last posted values, and every field that was not posted again will be gone. This may sound strange, but it is useful in most contexts: the metadata regarding a specific resource is usually maintained all at the same time, and deleting and rewriting it as a whole can help us maintain optimized indexes.
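The delete-and-add behavior described above can be illustrated with a small in-memory model. This is not Solr code, only a sketch of the semantics: the ToyIndex class is hypothetical, and the title and author fields are invented sample data rather than the contents of the real docs.xml.

```python
class ToyIndex:
    """In-memory stand-in for an index keyed by a uniqueKey field (not Solr)."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self._docs = {}

    def add(self, doc):
        key = doc[self.unique_key]
        self._docs.pop(key, None)      # full delete of any document with this key
        self._docs[key] = dict(doc)    # full add of the document exactly as posted

    def get(self, key):
        return self._docs.get(key)


index = ToyIndex()

# First post: the complete document (sample field values are assumptions).
index.add({"id": "book-01", "title": "Design Patterns", "author": "Erich Gamma"})

# Second post: only the key and the new uri_string field.
index.add({"id": "book-01",
           "uri_string": "http://en.wikipedia.org/wiki/Design_Patterns"})

result = index.get("book-01")
print(result)  # only id and uri_string remain; title and author are gone
```

Posting the complete document again, with uri_string added to the original fields, is what preserves the older values. This is why step 1 edits the full XML file instead of sending the new field alone.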

In other cases, you may want to update a document on a per-field basis. This is quite similar to what normally happens with a database management system, and it introduces us to atomic updates, which we will see in Chapter 3, Indexing Example Data from DBpedia – Paintings.
