In SolrCloud, the main goal behind distributed indexing is to be able to send a document to any node in the cluster and have that document indexed in the correct shard.
Solr uses a document router to assign each document to a shard. There are two basic document routing strategies:
compositeId (default)
implicit
With the default compositeId router, when we send documents to Solr for indexing, Solr hashes each document's unique key (the id field) and stores the document in the shard whose hash range contains that hash, which spreads the load across multiple Solr instances. Earlier in this chapter, we added a few documents to the index. Now let's see how Solr distributed them across the Solr instances.
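To make the hashing step concrete, here is a minimal Python sketch of how a compositeId-style router could map a plain document ID to a shard. It assumes that Solr hashes the UTF-8 bytes of the ID with 32-bit MurmurHash3 (x86 variant, seed 0), treats the result as a signed 32-bit integer, and splits the signed 32-bit hash space into equal ranges, one per shard. The helper names (murmur3_x86_32, partition_ranges, route) are hypothetical, and the sketch ignores IDs that use the ! prefix separator as well as any boundary rounding Solr applies for other shard counts, so treat it as an illustration rather than Solr's own code.

```python
# Hypothetical sketch of compositeId-style routing for plain IDs (no "!" prefix).
# Assumes: MurmurHash3 x86_32, seed 0, over the UTF-8 bytes of the id, and an even
# split of the signed 32-bit hash space (Solr may round boundaries for some shard counts).
from __future__ import annotations

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _scramble(k: int) -> int:
    """Mix one 32-bit block the way MurmurHash3_x86_32 does."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Return the unsigned 32-bit MurmurHash3 (x86_32) of data."""
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        h ^= _scramble(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]
    if tail:  # 1-3 leftover bytes, packed little-endian
        h ^= _scramble(int.from_bytes(tail, "little"))
    # Finalization: fold in the length, then avalanche the bits.
    h ^= len(data) & MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    return h ^ (h >> 16)


def to_signed32(x: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed (Java-style) int."""
    return x - (1 << 32) if x & 0x80000000 else x


def partition_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Split the signed 32-bit hash space into contiguous, evenly sized ranges."""
    lo, hi = -(1 << 31), (1 << 31) - 1
    step = (hi - lo) // num_shards
    ranges, start = [], lo
    for i in range(num_shards):
        end = hi if i == num_shards - 1 else start + step
        ranges.append((start, end))
        start = end + 1
    return ranges


def route(doc_id: str, num_shards: int = 2) -> str:
    """Return the hypothetical shard name whose range contains the id's hash."""
    h = to_signed32(murmur3_x86_32(doc_id.encode("utf-8"), seed=0))
    for i, (start, end) in enumerate(partition_ranges(num_shards), start=1):
        if start <= h <= end:
            return f"shard{i}"
    raise AssertionError("ranges must cover the whole hash space")


if __name__ == "__main__":
    for doc_id in ("1", "2", "3", "4"):
        print(doc_id, "->", route(doc_id))
```

For two shards, this even split gives shard1 the negative half of the hash space and shard2 the non-negative half; the same ID always hashes to the same shard, which is what lets any node route a document without coordination.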
As we're running two instances of Solr locally (shard1 on port 8983 and shard2 on port 8987), we'll run the following two queries with the distrib flag set to false. This flag tells Solr to run the query in a non-distributed way, so each query returns results only from the shard on the node that receives it:
shard1:
curl "http://localhost:8983/solr/musicCatalogue-solrcloud/select?q=*%3A*&distrib=false&rows=0&wt=json&indent=true"
shard2:
curl "http://localhost:8987/solr/musicCatalogue-solrcloud/select?q=*%3A*&distrib=false&rows=0&wt=json&indent=true"
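To repeat this check for every node from a script, a small helper is enough. The following is a minimal Python sketch using only the standard library; it assumes the two local nodes on ports 8983 and 8987 and the musicCatalogue-solrcloud collection used in this chapter, and the helper names (local_count_url, num_found, per_node_counts) are hypothetical. It builds the same distrib=false, rows=0 query for each node and reads numFound from the JSON response.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

COLLECTION = "musicCatalogue-solrcloud"  # collection used in this chapter
PORTS = (8983, 8987)                     # assumed local nodes, as in the text


def local_count_url(port: int, collection: str = COLLECTION) -> str:
    """Match-all query with distrib=false and rows=0: only numFound matters."""
    params = {"q": "*:*", "distrib": "false", "rows": 0, "wt": "json"}
    return f"http://localhost:{port}/solr/{collection}/select?{urlencode(params)}"


def num_found(body: str) -> int:
    """Read numFound from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]


def per_node_counts(ports: tuple[int, ...] = PORTS) -> dict[int, int]:
    """Ask every node for its local document count (the nodes must be running)."""
    counts = {}
    for port in ports:
        with urlopen(local_count_url(port), timeout=10) as resp:
            counts[port] = num_found(resp.read().decode("utf-8"))
    return counts
```

With both nodes running, per_node_counts() returns a dictionary that maps each port to the numFound reported by the core that answered locally.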
After running both queries, we get the following response from each of the shards:
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "*:*",
      "distrib": "false",
      "indent": "true",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": []
  }
}
Each shard reports a numFound of 1, so the two documents we indexed earlier have gone to two different instances. Now let's run the query one more time, this time with distributed search enabled (distrib=true, the default), and use the fl parameter to return the id and the shard of each document:
curl "http://localhost:8983/solr/musicCatalogue-solrcloud/select?q=*%3A*&wt=json&indent=true&fl=id,[shard]"
We'll get the following response from Solr:
{
  "responseHeader": {
    "status": 0,
    "QTime": 29,
    "params": {
      "q": "*:*",
      "indent": "true",
      "fl": "id,[shard]",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "maxScore": 1.0,
    "docs": [
      {
        "id": "1",
        "[shard]": "http://192.168.56.1:8983/solr/musicCatalogue-solrcloud_shard1_replica1/|http://192.168.56.1:8987/solr/musicCatalogue-solrcloud_shard1_replica2/"
      },
      {
        "id": "2",
        "[shard]": "http://192.168.56.1:8983/solr/musicCatalogue-solrcloud_shard2_replica1/|http://192.168.56.1:8987/solr/musicCatalogue-solrcloud_shard2_replica2/"
      }
    ]
  }
}
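As we can see, the two documents have gone to two different shards: document 1 is stored in shard1 and document 2 in shard2. Each [shard] value lists every replica of the shard that holds the document, separated by | (for example, musicCatalogue-solrcloud_shard1_replica1 and musicCatalogue-solrcloud_shard1_replica2 for document 1).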
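The [shard] values are easier to read once they are split apart. This hypothetical Python helper parses a response like the one above, splits each [shard] value on | into replica URLs, and extracts the logical shard name from each core name. It assumes cores follow the <collection>_<shard>_replica<N> naming seen in this output; the function names are illustrative only.

```python
from __future__ import annotations

import json
import re
from typing import Optional

# Assumes core names of the form <collection>_<shard>_replica<N>, as in the output above.
CORE_RE = re.compile(r"/solr/(?P<core>[^/]+)/?$")
SHARD_RE = re.compile(r"_(?P<shard>shard\d+)_replica\d+$")


def parse_shard_field(value: str) -> list[str]:
    """Split a [shard] value on '|' and return the core name from each replica URL."""
    cores = []
    for url in value.split("|"):
        match = CORE_RE.search(url)
        if match:
            cores.append(match.group("core"))
    return cores


def logical_shard(core: str) -> Optional[str]:
    """Return e.g. 'shard1' for 'musicCatalogue-solrcloud_shard1_replica2'."""
    match = SHARD_RE.search(core)
    return match.group("shard") if match else None


def shards_by_id(body: str) -> dict[str, list[str]]:
    """Map each returned document id to the logical shard(s) named in its [shard] field."""
    mapping = {}
    for doc in json.loads(body)["response"]["docs"]:
        names = {logical_shard(core) for core in parse_shard_field(doc["[shard]"])}
        mapping[doc["id"]] = sorted(name for name in names if name)
    return mapping
```

Applied to the response above, shards_by_id maps document 1 to shard1 and document 2 to shard2, even though each [shard] value names two replica cores.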
We've thus seen how Solr automatically distributes documents across shards. Solr also lets us disable this behavior with the solr.NoOpDistributingUpdateProcessorFactory processor, which makes a node index the documents it receives itself instead of forwarding them to other shards. This is useful when, for example, we want only one of the nodes to store a specific type of document.
Let's see how we can use this processor in Solr to stop the automatic distribution of documents to shards. First, update solrconfig.xml, which is available at $SOLR_HOME/example/solr/collection1/conf, with the following lines:
<updateRequestProcessorChain>
  <processor class="solr.NoOpDistributingUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Next, upload the updated configuration to ZooKeeper:
$ $SOLR_HOME/example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir $SOLR_HOME/example/solr/collection1/conf -confname default
Then reload the collection so that it picks up the new configuration:
$ curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=musicCatalogue-solrcloud&wt=json&indent=2"
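Why does adding one processor switch off distribution? Our understanding, stated as an assumption: when an update chain contains no distributing processor, SolrCloud inserts its own DistributedUpdateProcessorFactory, which forwards each document to the shard chosen by the router; NoOpDistributingUpdateProcessorFactory counts as a distributing processor, so that injection is skipped and nothing forwards the document. The following Python sketch models that rule on the chain configured above. It is a simplified, hypothetical model (the exact position where Solr inserts its processor may differ), not Solr's code.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Chain from the solrconfig.xml snippet in this section.
CHAIN_XML = """
<updateRequestProcessorChain>
  <processor class="solr.NoOpDistributingUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
"""

# Simplified model: the processors this sketch treats as "distributing".
DISTRIBUTING = {
    "solr.DistributedUpdateProcessorFactory",
    "solr.NoOpDistributingUpdateProcessorFactory",
}


def chain_classes(xml_text: str) -> list[str]:
    """Return the processor class names in document order."""
    root = ET.fromstring(xml_text.strip())
    return [p.get("class") for p in root.findall("processor")]


def effective_chain(classes: list[str]) -> list[str]:
    """Model: inject the default distributing processor before RunUpdate if none is present."""
    chain = list(classes)
    if not DISTRIBUTING.intersection(chain):
        run_at = chain.index("solr.RunUpdateProcessorFactory")
        chain.insert(run_at, "solr.DistributedUpdateProcessorFactory")
    return chain


def forwards_documents(classes: list[str]) -> bool:
    """True if, in this model, the chain would forward documents to other shards."""
    return "solr.DistributedUpdateProcessorFactory" in effective_chain(classes)


if __name__ == "__main__":
    print(forwards_documents(chain_classes(CHAIN_XML)))
```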
Let's send some data to the Shard 1 instance (port 8983) using the following command:
$ curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/musicCatalogue-solrcloud/update?commit=true' --data-binary '[ { "id": "3", "title": "I should go in Shard 1" }, { "id": "4", "title": "I should also go in Shard 1" } ]'
After running this command, we'll get the following response if the update succeeds:
{ "responseHeader": { "status": 0, "QTime": 0 } }
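The same update can be sent from Python with the standard library. This minimal sketch builds the POST request (a JSON array body with a Content-Type: application/json header) against the port 8983 instance. The commit=true parameter, which makes the documents visible to searches right away, and the helper names build_update_request and send are assumptions for illustration; actually sending the request requires the node to be running.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint: the Shard 1 node from the text, committing immediately.
UPDATE_URL = "http://localhost:8983/solr/musicCatalogue-solrcloud/update?commit=true"

DOCS = [
    {"id": "3", "title": "I should go in Shard 1"},
    {"id": "4", "title": "I should also go in Shard 1"},
]


def build_update_request(docs, url=UPDATE_URL) -> Request:
    """Build a JSON POST equivalent to the curl command (not sent yet)."""
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def send(req: Request) -> int:
    """Send the request and return Solr's responseHeader.status (0 means success)."""
    with urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["responseHeader"]["status"]
```

Keeping request construction separate from sending makes it easy to inspect the exact payload before it reaches Solr.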
Now let's run a query against Shard 1 to check whether the documents we've just indexed went to Shard 1:
curl "http://localhost:8983/solr/musicCatalogue-solrcloud/select?q=*%3A*&wt=json&indent=true&distrib=false"
After running this query, we can see the following output:
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "*:*",
      "distrib": "false",
      "indent": "true",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "id": "2",
        "title": [ "Apache Solr Cookbook" ],
        "_version_": 1516224138192093184
      },
      {
        "id": "3",
        "title": [ "I should go in Shard 1" ]
      },
      {
        "id": "4",
        "title": [ "I should also go in Shard 1" ]
      }
    ]
  }
}
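The response shows that both new documents (3 and 4) stayed on the instance that received them, alongside document 2, which was already stored there. This is how we can easily stop the automatic distribution of indexed documents to the other shards in the cluster.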
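To check this result without reading the JSON by eye, this hypothetical Python helper takes the body of a distrib=false response (such as the one from the 8983 instance above) and reports which expected IDs the local core does not hold. It only inspects a response you have already fetched, and it assumes the IDs of interest are among the returned rows; Solr returns 10 rows by default, so larger cores need a higher rows value or a query on the IDs themselves.

```python
from __future__ import annotations

import json


def local_ids(body: str) -> set[str]:
    """Return the ids of the documents in a (non-distributed) Solr JSON response."""
    return {doc["id"] for doc in json.loads(body)["response"]["docs"]}


def missing_ids(body: str, expected=("3", "4")) -> set[str]:
    """Return the expected ids that this core's response does not contain."""
    return set(expected) - local_ids(body)
```

An empty result from missing_ids means every expected document was indexed on the node that answered the query.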
Previously, we saw how to use the distrib flag to query individual shards. Solr also provides the shards parameter, which we can use to query an individual shard or a group of shards. Let's see how we can use this parameter in Solr:
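As a rough sketch of how such a request can be assembled, the following Python helper builds a select URL whose shards parameter is a comma-separated list of shard addresses in host:port/solr/core form. The two addresses reuse core names from the earlier [shard] output and are assumptions about your cluster; SolrCloud also accepts logical shard names such as shard1,shard2 in this parameter. The function name shards_query_url is hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def shards_query_url(base: str, collection: str, shards: list[str], q: str = "*:*") -> str:
    """Build a select URL that restricts the search to the given shards."""
    params = {"q": q, "wt": "json", "indent": "true", "shards": ",".join(shards)}
    return f"{base}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical targets: one replica of shard1 on 8983 and one replica of shard2 on 8987.
SHARD_ADDRESSES = [
    "localhost:8983/solr/musicCatalogue-solrcloud_shard1_replica1",
    "localhost:8987/solr/musicCatalogue-solrcloud_shard2_replica2",
]

if __name__ == "__main__":
    print(shards_query_url("http://localhost:8983", "musicCatalogue-solrcloud", SHARD_ADDRESSES))
```

Passing a single address restricts the query to one shard, while listing several merges their results, which is the main difference from distrib=false.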