Sharding algorithm and fault tolerance

We have already seen sharding, collections, and replicas. In this section, we will look at some important aspects of sharding and the role it plays in scalability and high availability. The strategy for creating new shards depends heavily on the hardware and the shard size. Let's say you have two machines, M1 and M2, of the same configuration, each hosting one shard. Shard A is loaded with 1 million indexed documents, and shard B is loaded with 100 documents. When a query is fired, the response time of any distributed Solr query is determined by the response of the slowest node (in this case, the one hosting shard A). Hence, keeping shards at nearly equal sizes gives better query performance.

Document Routing and Sharding

Typically, when an enterprise search is deployed, the number of documents to be indexed keeps growing over time. Since SolrCloud provides a way to create a cluster of Solr nodes, each serving a portion of the index, it becomes feasible to scale the enterprise search infrastructure over time. However, as the index grows, it becomes difficult to manage it on a single shard. SolrCloud can be started with the numShards parameter, which controls how many shards run in the cloud. To see how newly indexed documents are routed, take a look at the following flowchart:

Document Routing and Sharding

When a Solr instance is started, it first registers itself with ZooKeeper by creating ephemeral nodes, or znodes. ZooKeeper provides a shared hierarchical namespace for processes to coordinate with each other; the namespace consists of the registered data nodes, called znodes. Apache Solr provides you with two ways to distribute documents across shards. Auto-sharding distributes the documents automatically through a hashing algorithm. Each shard is allocated a hash range, which can be seen in /clusterstate.json as shown in the following screenshot:

Document Routing and Sharding
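The hash-range idea behind auto-sharding can be sketched in plain Java. This is a minimal sketch with hypothetical ranges for a two-shard collection; Solr itself hashes the unique key with MurmurHash3 and compares the result against the ranges recorded in /clusterstate.json:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HashRangeRouting {
    // Hypothetical hash ranges for a two-shard collection, mirroring the
    // per-shard "range" entries that appear in /clusterstate.json.
    static final Map<String, long[]> RANGES = new LinkedHashMap<>();
    static {
        RANGES.put("shard1", new long[]{0x80000000L, 0xFFFFFFFFL}); // upper half
        RANGES.put("shard2", new long[]{0x00000000L, 0x7FFFFFFFL}); // lower half
    }

    // Stand-in hash: Solr actually uses MurmurHash3 on the unique key.
    static long hash(String docId) {
        return docId.hashCode() & 0xFFFFFFFFL; // treat as unsigned 32-bit
    }

    // Find the shard whose range contains the document's hash.
    static String shardFor(String docId) {
        long h = hash(docId);
        for (Map.Entry<String, long[]> e : RANGES.entrySet()) {
            if (h >= e.getValue()[0] && h <= e.getValue()[1]) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("ranges do not cover hash " + h);
    }

    public static void main(String[] args) {
        System.out.println("docId55 -> " + shardFor("docId55"));
    }
}
```

Because the two ranges together cover the full 32-bit hash space, every document lands on exactly one shard.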

Another way of distributing documents across shards is custom sharding. With custom sharding, the client applications that pass documents to Apache Solr for indexing are responsible for placing them in a shard. Each document has a unique ID attribute, and a shard key can be prefixed to this ID, for example: shard1!docId55. The ! character acts as a separator. Custom sharding lets users influence where their document indexes are stored.

Users can choose various strategies for distributing shards across different nodes for efficient usage. Similarly, a query can be performed on specific shards (instead of the complete index) by passing shard.keys=shard1!,shard2! as a query parameter. These features enable Apache Solr to work in a multi-tenancy environment, or as a regionally distributed search. You can also spread a tenant across multiple shards by introducing another suffix after the shard key. The syntax for this looks like:

shard_key/number!doc_id
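Composing and decomposing such routing keys is plain string handling on the client side. A minimal sketch follows; the helper names are ours, not part of the Solr API:

```java
public class CompositeId {
    // Build an id of the form "shardKey!docId", e.g. "shard1!docId55".
    static String withShardKey(String shardKey, String docId) {
        return shardKey + "!" + docId;
    }

    // Build a multi-level id "shardKey/number!docId" for spreading a
    // tenant over several shards (the /number part limits how many bits
    // of the hash come from the prefix).
    static String withShardKey(String shardKey, int bits, String docId) {
        return shardKey + "/" + bits + "!" + docId;
    }

    // Recover the routing prefix from a composite id.
    static String shardKeyOf(String id) {
        int sep = id.indexOf('!');
        return sep < 0 ? null : id.substring(0, sep);
    }

    public static void main(String[] args) {
        String id = withShardKey("shard1", "docId55");
        System.out.println(id);             // shard1!docId55
        System.out.println(shardKeyOf(id)); // shard1
        System.out.println(withShardKey("tenant1", 2, "doc9")); // tenant1/2!doc9
    }
}
```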

Shard splitting

The shard splitting feature was introduced in Apache Solr 4.3. It is designed to work with Apache Solr's auto-sharding, and it allows users to split a shard without interrupting search or indexing. A shard can be split into two by issuing the following request from your browser:

http://localhost:8983/solr/admin/collections?collection=collection1&shard=[shard_name]&action=SPLITSHARD

While shards are being split, average query performance tends to slow down. The call to SPLITSHARD creates two new sub-shards (shard1_1 and shard1_2 out of shard1), as shown in the following screenshot:

Shard splitting

The documents are divided roughly equally across the two sub-shards. The new sub-shards are created in the construction state, and index updates on the parent shard start getting forwarded to them. Once the splitting is complete, the parent shard becomes inactive. The old shard can then be deleted by calling DELETESHARD in the following way:

http://localhost:8983/solr/admin/collections?collection=collection1&shard=shard1&action=DELETESHARD
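The two Collections API calls above can also be issued programmatically. The following sketch only builds the request URLs, using the same host and collection names as the examples above; firing them is an ordinary HTTP GET:

```java
public class CollectionAdmin {
    static final String BASE = "http://localhost:8983/solr/admin/collections";

    // Build a Collections API URL for actions such as SPLITSHARD
    // or DELETESHARD.
    static String adminUrl(String action, String collection, String shard) {
        return BASE + "?collection=" + collection
                    + "&shard=" + shard
                    + "&action=" + action;
    }

    public static void main(String[] args) {
        String split  = adminUrl("SPLITSHARD", "collection1", "shard1");
        String delete = adminUrl("DELETESHARD", "collection1", "shard1");
        System.out.println(split);
        System.out.println(delete);
        // To execute against a running cluster, e.g.:
        // new java.net.URL(split).openStream().close();
    }
}
```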

With large index sizes, search performance can become slow. Auto-sharding in Solr lets you start with a fixed number of shards, and shard splitting offers an easy way to reduce the size of each shard across Solr cores as the index grows.

Tip

Although the parent shard becomes inactive after a split, the Solr Admin UI is not aware of this state and still shows the parent shard in green (active).

Load balancing and fault tolerance in SolrCloud

SolrCloud provides built-in load balancing capabilities to its clients. When a request is sent to one of the servers, it is forwarded to the respective leader to gather all the information. If your client application is Java based, you can rely on the CloudSolrServer and LBHttpSolrServer (load-balanced HTTP server) classes of SolrJ to perform indexing and searching across SolrCloud. CloudSolrServer automatically load balances queries across all operational servers. The SolrJ code for searching on SolrCloud is as follows:

import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// Connect through ZooKeeper (here, the embedded ZooKeeper on port 9983)
CloudSolrServer server = new CloudSolrServer("localhost:9983");
server.setDefaultCollection("collection1");
SolrQuery solrQuery = new SolrQuery("*:*"); // match-all query
QueryResponse response = server.query(solrQuery);
SolrDocumentList dList = response.getResults();
// Iterate over the returned documents; note that getNumFound() can exceed
// the number of rows actually returned, so iterate the list itself
for (SolrDocument doc : dList) {
  for (Map.Entry<String, Object> mE : doc.entrySet()) {
    System.out.println(mE.getKey() + ":" + mE.getValue());
  }
}

Fault tolerance is the ability to keep the system's functions working, possibly with degraded support, even when a system component fails. Fault tolerance in SolrCloud is managed at different levels.

Since SolrCloud performs its own load balancing, a call can be made to any of the nodes participating in the cloud. Applications that do not rely on a Java-based client may require a load balancer to fire queries. The intent of this load balancer is not so much to balance the load as to remove a single point of failure for the calling party. So, in the case of a failure of node1, the load balancer can forward the query to node2, thus enabling fault tolerance in Apache Solr.
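The failover behavior such a load balancer provides can be sketched as trying each node in turn. The node URLs and the request function below are hypothetical stand-ins for a real HTTP call:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FailoverClient {
    // Try each Solr node in order; return the first successful response.
    static String queryWithFailover(List<String> nodes,
                                    Function<String, String> request) {
        for (String node : nodes) {
            try {
                return request.apply(node); // e.g., an HTTP GET against node
            } catch (RuntimeException e) {
                // This node failed: fall through and try the next one
            }
        }
        throw new RuntimeException("all Solr nodes are down");
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("http://node1:8983/solr",
                                           "http://node2:8983/solr");
        // Simulate node1 being down and node2 answering.
        String result = queryWithFailover(nodes, node -> {
            if (node.contains("node1")) throw new RuntimeException("down");
            return "ok from " + node;
        });
        System.out.println(result); // ok from http://node2:8983/solr
    }
}
```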

When a search request is fired on SolrCloud, it is distributed across all the shards of the collection (unless the user restricts it to specific shards in the query). If one of the nodes fails to respond to a Solr query due to an error, waiting for the final search result can be avoided by enabling support for partial results. This support is enabled by passing shards.tolerant=true. This read-side fault tolerance ensures that the system returns results in spite of the unavailability of a node.
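In terms of the query URL, this simply means appending the parameter to the request. A small sketch, using the same host and collection as the earlier examples:

```java
public class TolerantQuery {
    // Append shards.tolerant=true so that a failing node yields partial
    // results instead of an error for the whole request.
    static String tolerantQueryUrl(String baseUrl, String q) {
        return baseUrl + "/select?q=" + q + "&shards.tolerant=true";
    }

    public static void main(String[] args) {
        System.out.println(
            tolerantQueryUrl("http://localhost:8983/solr/collection1", "*:*"));
    }
}
```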

Apache Solr also supports write-side fault tolerance, which makes the index durable even in the event of power failures, restarts, JVM crashes, and so on. Each node participating in SolrCloud maintains a transaction log that tracks all changes made to the node. This log helps a Solr node recover in case of failure or interruption during an indexing operation.
