Distributed indexing and searching

In SolrCloud, the main goal of distributed indexing is that we can send a document to any node in the cluster and have it indexed in the correct shard.

Solr uses a document router to assign a document to a shard. There are two basic document routing strategies:

  • compositeId (default)
  • implicit

With the compositeId router (the default), Solr hashes the unique ID of each document that we send for indexing and uses that hash to pick a shard, which spreads the load across multiple Solr instances. Previously in this chapter, we added a few documents to the index. Now let's see how Solr has distributed them.
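The idea of hash-based routing can be illustrated with a minimal Python sketch. It is not Solr's implementation: Solr's compositeId router uses MurmurHash3 and hash ranges stored in the cluster state, whereas this sketch substitutes zlib.crc32 as a stand-in hash, and the helper names shard_ranges and route are hypothetical. The shard a given ID lands in will therefore differ from what Solr chooses.

```python
import zlib


def shard_ranges(num_shards):
    """Split the 32-bit hash space into contiguous, roughly equal ranges."""
    size = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        # The last range absorbs any remainder so the whole space is covered.
        end = 2**32 - 1 if i == num_shards - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def route(doc_id, num_shards):
    """Return the name of the shard whose range contains the ID's hash."""
    # Assumption: crc32 stands in for the hash function; Solr uses MurmurHash3.
    h = zlib.crc32(doc_id.encode("utf-8"))
    for index, (start, end) in enumerate(shard_ranges(num_shards)):
        if start <= h <= end:
            return f"shard{index + 1}"


for doc_id in ("1", "2"):
    print(doc_id, "->", route(doc_id, 2))
```

The point of the design is that the same ID always hashes to the same range, so any node can work out the target shard without consulting the others.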

As we're running two instances of Solr locally (shard1 on port 8983 and shard2 on port 8987), we'll run the following two queries with the distrib parameter set to false. This tells Solr to run the query in a non-distributed way, so the result covers only the shard on which the query runs:

  • shard1:
    curl "http://localhost:8983/solr/musicCatalogue-solrcloud/select?q=*%3A*&distrib=false&rows=0&wt=json&indent=true"
    
  • shard2:
    curl "http://localhost:8987/solr/musicCatalogue-solrcloud/select?q=*%3A*&distrib=false&rows=0&wt=json&indent=true"
    

After running them, we'll get this response from each of the shards:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "*:*",
      "distrib": "false",
      "indent": "true",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": []
  }
}
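The same per-node check can be scripted. The following is a minimal sketch, assuming the ports and collection name used above; the function names local_count_url and num_found are hypothetical, and the sample string is a trimmed copy of the preceding response rather than live output.

```python
import json
from urllib.parse import urlencode


def local_count_url(port, collection="musicCatalogue-solrcloud"):
    """Build a non-distributed match-all query that returns only the count."""
    params = {"q": "*:*", "distrib": "false", "rows": 0, "wt": "json"}
    return f"http://localhost:{port}/solr/{collection}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract the document count from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


# Trimmed copy of the response shown above, used instead of a live request.
sample = '{"responseHeader": {"status": 0}, "response": {"numFound": 1, "start": 0, "docs": []}}'

for port in (8983, 8987):
    print(local_count_url(port))
print(num_found(sample))
```

To run it against a live cluster, each URL could be fetched with urllib.request.urlopen and the body passed to num_found.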

As we can see from the preceding response, each instance reports one document, so the two documents that we indexed previously have gone to two different instances. Now let's run the query one more time, this time as a distributed request (distrib defaults to true), and return the id and the shard of each document using the fl parameter. The -g option stops curl from interpreting the square brackets in [shard] as a URL glob pattern:

curl -g "http://localhost:8983/solr/musicCatalogue-solrcloud/select?q=*%3A*&wt=json&indent=true&fl=id,[shard]"

We'll get the following response from Solr:

{
  "responseHeader": {
    "status": 0,
    "QTime": 29,
    "params": {
      "q": "*:*",
      "indent": "true",
      "fl": "id,[shard]",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "maxScore": 1.0,
    "docs": [
      {
        "id": "1",
        "[shard]": "http://192.168.56.1:8983/solr/musicCatalogue-solrcloud_shard1_replica1/|http://192.168.56.1:8987/solr/musicCatalogue-solrcloud_shard1_replica2/"
      },       
      {
        "id": "2",
        "[shard]": "http://192.168.56.1:8983/solr/musicCatalogue-solrcloud_shard2_replica1/|http://192.168.56.1:8987/solr/musicCatalogue-solrcloud_shard2_replica2/"
      }
    ]
  }
}
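The [shard] values are long, so it can help to reduce them to shard names. Here is a minimal sketch that does this for the response above; the function name shard_by_id is hypothetical, the sample is a trimmed copy of that response, and the regular expression assumes core names of the form collection_shardN_replicaM as they appear in this output.

```python
import json
import re

# Trimmed copy of the distributed response shown above.
SAMPLE = """
{
  "response": {
    "numFound": 2,
    "docs": [
      {"id": "1",
       "[shard]": "http://192.168.56.1:8983/solr/musicCatalogue-solrcloud_shard1_replica1/|http://192.168.56.1:8987/solr/musicCatalogue-solrcloud_shard1_replica2/"},
      {"id": "2",
       "[shard]": "http://192.168.56.1:8983/solr/musicCatalogue-solrcloud_shard2_replica1/|http://192.168.56.1:8987/solr/musicCatalogue-solrcloud_shard2_replica2/"}
    ]
  }
}
"""


def shard_by_id(response_text):
    """Map each document id to the shard name found in its [shard] value."""
    docs = json.loads(response_text)["response"]["docs"]
    result = {}
    for doc in docs:
        # Assumption: core names look like <collection>_shardN_replicaM.
        match = re.search(r"_(shard\d+)_replica", doc["[shard]"])
        result[doc["id"]] = match.group(1) if match else None
    return result


print(shard_by_id(SAMPLE))
```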

As we can see, the two documents have gone to two different shards: document 1 is in shard1 and document 2 is in shard2. For each document, the [shard] value lists the URLs of the replicas of its shard, separated by a pipe character.

We've thus seen how Solr automatically distributes documents across shards. We can also disable this behavior with the solr.NoOpDistributingUpdateProcessorFactory processor, so that a document is indexed only on the node that receives it instead of being routed to a shard. This is useful when, for example, we want only one of the nodes to store a specific type of document.

Let's see how we can use this processor in Solr and stop the automatic distribution of documents to shards:

  1. We'll update solrconfig.xml, which is available at %SOLR_HOME%/example/solr/collection1/conf, with the following lines:
      <updateRequestProcessorChain>
        <processor class="solr.NoOpDistributingUpdateProcessorFactory"/>
        <processor class="solr.LogUpdateProcessorFactory"/>
        <processor class="solr.RunUpdateProcessorFactory"/>
      </updateRequestProcessorChain>
  2. After making the change, we'll upload the configuration to ZooKeeper using this line:
    $ $SOLR_HOME/example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir $SOLR_HOME/example/solr/collection1/conf -confname default
  3. We'll then reload the collection using the following command:
    $ curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=musicCatalogue-solrcloud&wt=json&indent=2"
  4. After reloading the configuration, we'll post some data to one of the shards and see whether the automatic distribution of documents has stopped.

    Let's send some data to the Shard 1 instance (port 8983) using the following command:

    $ curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/musicCatalogue-solrcloud/update' --data-binary '
     [
       {
         "id": "3",
         "title": "I should go in Shard 1"
       },
       {
         "id": "4",
         "title": "I should also go in Shard 1"
       }
     ]'

    After running this command, we'll get the following response if the update succeeds:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 0
      }
    }
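The update in step 4 can also be prepared from a script. The following minimal sketch builds the same POST request with Python's standard library without sending it; the function name build_update_request is hypothetical, and the port and collection name are the ones assumed throughout this section.

```python
import json
from urllib.request import Request


def build_update_request(docs, port=8983, collection="musicCatalogue-solrcloud"):
    """Build (but do not send) a JSON update request for one Solr node."""
    url = f"http://localhost:{port}/solr/{collection}/update"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [
    {"id": "3", "title": "I should go in Shard 1"},
    {"id": "4", "title": "I should also go in Shard 1"},
]
request = build_update_request(docs)
print(request.get_method(), request.full_url)
```

With a running cluster, passing the request to urllib.request.urlopen would send it, and the response body should resemble the JSON shown in step 4.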

Now let's run a non-distributed query against Shard 1 to check whether the documents that we've just indexed stayed on Shard 1. The URL is quoted so that the shell does not interpret the & characters:

wget -qO- "http://localhost:8983/solr/musicCatalogue-solrcloud/select?q=*%3A*&wt=json&indent=true&distrib=false"

After running this query, we can see the following output:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "*:*",
      "distrib": "false",
      "indent": "true",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "id": "2",
        "title": [
          "Apache Solr Cookbook"
        ],
        "_version_": 1516224138192093184
      },
      {
        "id": "3",
        "title": [
          "I should go in Shard 1"
        ]
      },
      {
        "id": "4",
        "title": [
          "I should also go in Shard 1"
        ]
      }
    ]
  }
}

We've seen from the preceding example how we can easily stop the automatic distribution of indexed documents to other shards in the cluster.

Previously, we saw how we can use the distrib parameter to query an individual shard. Solr also provides a shards parameter, which we can use to query one shard or a group of shards. Let's see how we can use this parameter:

  • Single shard:
    curl "http://localhost:8983/solr/musicCatalogue-solrcloud/select?q=*:*&shards=localhost:8987/solr"
    
  • Group of shards:
    curl "http://localhost:8983/solr/musicCatalogue-solrcloud/select?q=*:*&shards=localhost:8987/solr,localhost:8983/solr"
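When the list of shards is built at runtime, the shards value is simply the shard addresses joined with commas. The following is a minimal sketch of that construction; the function name shards_query_url is hypothetical, and the shard addresses are the ones used in the preceding commands.

```python
from urllib.parse import urlencode


def shards_query_url(shards, port=8983, collection="musicCatalogue-solrcloud"):
    """Build a match-all query restricted to the given shard addresses."""
    params = {"q": "*:*", "shards": ",".join(shards)}
    # Keep *, :, / and , unescaped so the URL reads like the curl examples.
    query = urlencode(params, safe="*:/,")
    return f"http://localhost:{port}/solr/{collection}/select?{query}"


print(shards_query_url(["localhost:8987/solr"]))
print(shards_query_url(["localhost:8987/solr", "localhost:8983/solr"]))
```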
    