There are times when migrating data from one collection to another is a good option. For example, you may have the data of multiple clients in different shards. Some of those clients pay for faster searches and more indexing throughput, and you would like to migrate their data to another collection so that it can be moved to new, more powerful nodes. If we use routing during indexing, Solr has a nice feature for us: the Collections API and its migrate command. This recipe will show you how to use it.
Before continuing, you should read the Using routing recipe in Chapter 7, In the Cloud. It describes how to use routing, which is essential to fully understand this recipe. We also assume that we have two collections: one called customers that holds our data and a second, empty one called important_customers. Both collections were created using the same configuration shown in this recipe. If you want to know more about how to create a new SolrCloud cluster and a collection, refer to the Creating a new SolrCloud cluster recipe in Chapter 7, In the Cloud.
For the purpose of this recipe, we will use the following index structure (we need to add the following section to our schema.xml file):

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text_general" indexed="true" stored="true" />
<field name="customer" type="string" indexed="true" stored="true" />
For a customer with the name customer_1, we have the following data (stored in a file called data_customer_1.xml):

<add>
 <doc>
  <field name="id">customer_1!1</field>
  <field name="title">Customer document 1</field>
  <field name="customer">customer_1</field>
 </doc>
 <doc>
  <field name="id">customer_1!2</field>
  <field name="title">Customer document 2</field>
  <field name="customer">customer_1</field>
 </doc>
</add>
For a customer with the name customer_2, we have the following data (stored in a file called data_customer_2.xml):

<add>
 <doc>
  <field name="id">customer_2!3</field>
  <field name="title">Customer document 3</field>
  <field name="customer">customer_2</field>
 </doc>
 <doc>
  <field name="id">customer_2!4</field>
  <field name="title">Customer document 4</field>
  <field name="customer">customer_2</field>
 </doc>
</add>
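For reference, files like these can be sent to the customers collection with a plain HTTP POST to its update handler. The following is a minimal Python sketch and not part of the original recipe: it assumes Solr listens on localhost:8983 (as in the recipe's commands), and the SOLR_URL constant and the build_update_request function are hypothetical names introduced only for illustration. The sketch only builds the request; sending it is left as a commented-out step.

```python
from urllib.request import Request

# Assumed base address; it matches the host and port used by the recipe's commands.
SOLR_URL = "http://localhost:8983/solr"


def build_update_request(collection, xml_payload, commit=True):
    """Build (without sending) a POST request to a collection's /update handler."""
    url = f"{SOLR_URL}/{collection}/update"
    if commit:
        url += "?commit=true"  # make the documents searchable right after indexing
    return Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# A shortened payload in the same format as data_customer_1.xml.
payload = (
    '<add><doc>'
    '<field name="id">customer_1!1</field>'
    '<field name="title">Customer document 1</field>'
    '<field name="customer">customer_1</field>'
    '</doc></add>'
)
request = build_update_request("customers", payload)
print(request.get_method(), request.full_url)
# To send it for real: urllib.request.urlopen(request)
```

In practice, the payload would be read from data_customer_1.xml and data_customer_2.xml instead of being written inline.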
We assume that the data is already indexed into the collection called customers.
Suppose we want to migrate the data of customer_2 from the customers collection to another collection called important_customers, which we already created and which is empty. To do this, we will run the following command:

curl 'localhost:8983/solr/admin/collections?action=MIGRATE&collection=customers&target.collection=important_customers&split.key=customer_2!&forward.timeout=60'
After the migration, we need to send the commit command to the target collection to force the index reader to reload. We will do this using the following command:

curl 'http://localhost:8983/solr/important_customers/update' --data-binary '<commit/>' -H 'Content-type:application/xml'
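We can now check the contents of the important_customers collection by running the following query:

http://localhost:8983/solr/important_customers/select?q=*:*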
The response should be as follows:
<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
   <str name="q">*:*</str>
  </lst>
 </lst>
 <result name="response" numFound="2" start="0">
  <doc>
   <str name="id">customer_2!3</str>
   <str name="title">Customer document 3</str>
   <str name="customer">customer_2</str>
   <long name="_version_">1481976223112364032</long></doc>
  <doc>
   <str name="id">customer_2!4</str>
   <str name="title">Customer document 4</str>
   <str name="customer">customer_2</str>
   <long name="_version_">1481976223113412608</long></doc>
 </result>
</response>
As we can see, the new collection contains only the data for the customer we wanted. We can now remove this data from the original collection, because Solr doesn't do that by default.
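Removing the migrated documents from the original collection could be done with a delete-by-query request on the customer field. The following Python sketch is not part of the original recipe: the build_delete_request function and its base parameter are hypothetical names, and the localhost:8983 address is assumed from the commands used earlier. A cautious approach, which is our assumption rather than something the recipe states, is to run it only after verifying the target collection and after the forward.timeout window has passed.

```python
from urllib.request import Request
from xml.sax.saxutils import escape


def build_delete_request(collection, query, base="http://localhost:8983/solr"):
    """Build (without sending) a delete-by-query request followed by a commit."""
    # escape() protects XML special characters such as & or < inside the query
    body = f"<delete><query>{escape(query)}</query></delete>"
    return Request(
        f"{base}/{collection}/update?commit=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Delete the documents of customer_2 from the source collection.
delete_request = build_delete_request("customers", "customer:customer_2")
print(delete_request.full_url)
print(delete_request.data.decode("utf-8"))
# To send it for real: urllib.request.urlopen(delete_request)
```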
Let's now see how it works.
Our index structure is very simple. Each of the documents has three fields: one for the document identifier (the id field), one for the title of the document (the title field), and the field called customer that will be used for filtering. Each identifier is prefixed with the customer name followed by the ! character. We talked about this already in Chapter 7, In the Cloud; when using composite routing, we can prefix the document identifier with a value followed by the ! character. This means that Solr will use the prefix value (customer_1 or customer_2 in our data) to determine in which shard the document will be indexed.
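To make the identifier layout concrete, the following Python sketch splits composite identifiers into the routing prefix and the remaining part, and groups identifiers by prefix. It is only an illustration of the naming convention, not of Solr's internal hashing; the function names are hypothetical, and the sketch deliberately ignores more advanced forms of composite identifiers by splitting only at the first ! character.

```python
def split_composite_id(doc_id):
    """Return (route_key, local_id) for an identifier such as 'customer_2!3'.

    Identifiers without a '!' character have no routing prefix,
    so route_key is None for them.
    """
    route_key, separator, local_id = doc_id.partition("!")
    if not separator:
        return None, doc_id
    return route_key, local_id


def group_by_route_key(doc_ids):
    """Group document identifiers by their routing prefix."""
    groups = {}
    for doc_id in doc_ids:
        route_key, _ = split_composite_id(doc_id)
        groups.setdefault(route_key, []).append(doc_id)
    return groups


ids = ["customer_1!1", "customer_1!2", "customer_2!3", "customer_2!4"]
for key, members in group_by_route_key(ids).items():
    print(key, members)
```

All identifiers that share a prefix form one group, which is the set of documents a migration with that prefix as the routing key would target.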
We assumed that we have the data indexed and that we want to migrate the data of the second customer to another collection that we created upfront. We do this by sending the MIGRATE command to the Collections API (action=MIGRATE sent to the /solr/admin/collections REST endpoint). We provide the source collection, which in our case is the customers collection (the collection=customers parameter), and the target collection to which the data should be migrated (the target.collection=important_customers parameter). In addition to this, we need the routing key that we used during indexing, which for our second customer is customer_2! (the split.key parameter). Finally, we define the forward.timeout parameter, which controls for how long (in seconds) Solr will re-route write requests for the given routing key from the source collection to the target one. It is the user's responsibility to switch read and write operations to the target collection after the migration is done. Note that the source collection will not be modified by the migrate request. The migration of documents is a synchronous operation, and it is advised to keep the timeout on the client side high. Even so, the HTTP request might time out when a large number of documents needs to be migrated. This doesn't mean that the operation failed: Solr will continue the migration in the background, and you should check the logs to see whether any errors occurred.
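The parameters described above can be assembled into the request URL programmatically. The following Python sketch is not part of the original recipe: the build_migrate_url function and its base parameter are hypothetical names, and the localhost:8983 address is assumed from the command used earlier. It only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode


def build_migrate_url(source, target, split_key, forward_timeout=60,
                      base="http://localhost:8983/solr"):
    """Build the Collections API URL for a MIGRATE request."""
    params = {
        "action": "MIGRATE",
        "collection": source,            # source collection
        "target.collection": target,     # collection that receives the documents
        "split.key": split_key,          # routing key, including the trailing '!'
        "forward.timeout": forward_timeout,  # seconds of write forwarding
    }
    # safe="!" keeps the routing key readable instead of encoding it as %21
    return f"{base}/admin/collections?" + urlencode(params, safe="!")


url = build_migrate_url("customers", "important_customers", "customer_2!")
print(url)
```

Because the call is synchronous, a client that actually sends this request would typically use a generous read timeout, as discussed above.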
There are a few things that we need to remember when migrating data between collections:

- The migration can involve more than one shard if the split.key parameter value spans multiple shards. Solr will handle this automatically.
- Temporary collections may be created while running the migrate command, although they should be removed once the command finishes executing.
- The migrate command only works with collections that use the CompositeId router.

Finally, after the command finishes successfully, we send the commit command to refresh the index reader so that the data is visible. As we can see in the response, everything went well and we can now remove the data from the original collection.
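Checking the query response can also be scripted. The following Python sketch is not part of the original recipe: it parses an XML response of the shape shown earlier and reports the number of documents found together with the set of customer values, so that we can confirm only the expected customer is present. The summarize_response function and the SAMPLE_RESPONSE constant are hypothetical names; the sample is a shortened copy of the response from this recipe.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response returned by the select query in this recipe.
SAMPLE_RESPONSE = """
<response>
 <result name="response" numFound="2" start="0">
  <doc>
   <str name="id">customer_2!3</str>
   <str name="customer">customer_2</str>
  </doc>
  <doc>
   <str name="id">customer_2!4</str>
   <str name="customer">customer_2</str>
  </doc>
 </result>
</response>
"""


def summarize_response(xml_text):
    """Return (numFound, set of customer values) from a Solr XML response."""
    root = ET.fromstring(xml_text.strip())
    result = root.find("result[@name='response']")
    num_found = int(result.get("numFound"))
    customers = {
        doc.findtext("str[@name='customer']") for doc in result.findall("doc")
    }
    return num_found, customers


num_found, customers = summarize_response(SAMPLE_RESPONSE)
print(num_found, customers)
```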