SolrCloud – core concepts

In order to scale, a collection is split, or partitioned, into multiple shards, with the documents distributed across them; each shard therefore holds a subset of the overall documents in the collection. A cluster can host multiple collections of Solr documents. The number of shards in a collection determines both the maximum number of documents the collection can hold and the degree of parallelization for individual search requests.

When we set up SolrCloud, we created two nodes. A Solr cluster is made up of two or more Solr nodes, and each node can host multiple cores. We also created a couple of shards with a replication factor of two. The more replicas each shard has, the more redundancy we build into the collection, which in turn determines the overall high availability of the collection and the number of concurrent search requests that can be processed.
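The Collections API's CREATE action sets up such a collection in a single call. The following is a minimal sketch in Java, assuming a Solr node listening on localhost:8983 and a hypothetical collection named products; the host, port, and collection name are assumptions to adjust for your own cluster.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateCollection {
    public static void main(String[] args) throws Exception {
        // Hypothetical "products" collection: two shards, two replicas per shard.
        String url = "http://localhost:8983/solr/admin/collections"
                + "?action=CREATE"
                + "&name=products"
                + "&numShards=2"
                + "&replicationFactor=2";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // status returned by the CREATE call
    }
}
```

With numShards=2 and replicationFactor=2, each shard ends up with two replicas, so the collection is spread over four cores across the nodes of the cluster.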

SolrCloud does not employ a master-slave strategy. Every shard has at least one replica, and at any point in time one of them acts as the leader, chosen on a first come, first served basis. If the leader goes down, a new leader is elected automatically. A document is first indexed on the leader replica of its shard, and the leader then sends the update to all the other replicas.
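To see which replica is currently acting as the leader of each shard, the Collections API's CLUSTERSTATUS action can be queried. The sketch below reuses the hypothetical products collection and the assumed localhost:8983 node from the previous example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClusterStatus {
    public static void main(String[] args) throws Exception {
        // Fetch the cluster state for the hypothetical "products" collection.
        String url = "http://localhost:8983/solr/admin/collections"
                + "?action=CLUSTERSTATUS"
                + "&collection=products";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // The per-replica details in the response indicate which replica is the leader.
        System.out.println(response.body());
    }
}
```

In the response, each shard lists its replicas, and the one currently elected as leader is flagged in its details; that is the replica that indexes a document first before forwarding the update to the others.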

When a leader goes down, by default any of the replicas can become the new leader; for this to be possible, every replica must be in sync with the current leader at all times. Any new document added to the leader must be sent to all the replicas, and each of them has to issue a commit. Imagine what would happen if a replica went down and a large number of updates accumulated before it rejoined the cluster: the recovery would be very slow. The right syncing strategy depends on the organization's use case. They can keep the default real-time syncing, opt out of syncing in real time, or make the replica ineligible for becoming the leader.

We can set the following replica types when creating a new collection or adding a new replica:

NRT: Near real-time (NRT) is the default, and initially the only, replica type supported by Solr. It maintains a transaction log and writes new documents locally to its index. Any replica of this type is eligible to become the shard leader.

TLOG: The only difference between NRT and TLOG is that, while the former indexes document changes locally, a TLOG replica does not. It only maintains a transaction log, which results in faster indexing than NRT because no local commits are involved. Just like NRT, this type of replica is eligible to become the shard leader.

PULL: Does not maintain a transaction log and does not index document changes locally. The only thing it does is pull, or replicate, the index from the shard leader. A replica of this type never becomes the shard leader. A sketch of adding such a replica follows this list.
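The replica type is chosen when the replica is created. As a hedged sketch of the Collections API's ADDREPLICA action, the following adds a PULL replica to one shard of the hypothetical products collection; the shard name shard1 and the host and port are assumptions about your cluster.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AddPullReplica {
    public static void main(String[] args) throws Exception {
        // Add a PULL replica to shard1 of the "products" collection (names assumed).
        String url = "http://localhost:8983/solr/admin/collections"
                + "?action=ADDREPLICA"
                + "&collection=products"
                + "&shard=shard1"
                + "&type=PULL"; // the type parameter also accepts NRT or TLOG

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

When the type parameter is omitted, an NRT replica is created, which matches the default described above.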

 

We can mix replica types in different combinations for our replicas. Some commonly used combinations are as follows:

All NRT replicas: Every update involves a commit on every replica during indexing, so this suits cases where the update (indexing) throughput is not too high. It can be ideal for small to medium clusters, or for larger clusters with a modest update rate.

All TLOG replicas: This can be used when there are many replicas per shard and the replicas should still be able to handle update requests, but near real-time searching is not required.

TLOG replicas with PULL replicas: This combination is used in scenarios where we want to increase the availability of search queries and document updates take a back seat. Search results may be temporarily outdated, since updates are not applied in near real time. A sketch of creating such a collection follows this list.
