Configuring Solr for near real-time search

Real-time search is the ability to search for content immediately after adding or updating it. In a typical scenario, a user adds or updates some content, and the system processes the change fast enough that an immediate search by that user always reflects the latest changes. Near real-time search (often abbreviated to NRT) allows for a larger time window; most would say less than 5 seconds. This time window, however big or small, is also known as the index latency. Solr 4's commits are faster than before, and it has a new, even faster soft commit ability. As a result, all apps can have NRT search, and with some tuning, some can commit so fast that you can reasonably say you have real-time search!

Here is a series of tips to consider in your quest for the holy grail of real-time search with Solr:

  • Use soft commits with autoCommit! Solr's default example configuration ships this way; the only thing you need to do is supply a commitWithin time (perhaps 1 or 2 seconds) on the update requests you issue from a client, which will trigger a soft commit sometime within that window. Ensure that your window is large enough for how long a soft commit takes. To measure this, simply add softCommit=true to the URL you use to update Solr. The indexing chapter has some more information on the subject. The autoCommit window in solrconfig.xml should be somewhere between 15 seconds and a minute or so.
  • If your query load is high enough that you need replicas, you should use SolrCloud and definitely not the old master/slave replication setup. Near real-time search at scale is one of SolrCloud's main features.
  • Minimize warming, which hugely affects how long commits, especially soft ones, take. Reduce the autowarmCount of your caches and reduce the amount of work your queries do in the newSearcher listener. Keep those queries to their essentials: a query that uses sorting, faceting, and function queries on applicable fields.
  • Use docValues on any field that you sort on, facet on, or use with similar features such as function queries. This was explained earlier in this chapter.
  • Follow any previous guidance on performance tuning, especially schema-related advice to minimize indexing time.
  • Use SSDs if you can afford them. Definitely avoid virtualization.
  • Spread the documents over more shards so that each shard is smaller and therefore faster to query. In striving for NRT search, many configuration choices slow down searches, so smaller shards help balance those effects.
  • Consider reducing the number of Solr shards per CPU core on a machine so that more machine resources are available for the frequent commits and warming activity.
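
As a rough illustration of the first tip, the following Python sketch builds update URLs that carry the commitWithin and softCommit request parameters. The host, port, and core name (collection1) are assumptions made for this example, and the helpers build_update_url and send_docs are hypothetical names, not part of Solr. Only the Python standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of the update handler; adjust host, port, and core name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def build_update_url(base_url=SOLR_UPDATE_URL, commit_within_ms=None, soft_commit=False):
    """Return an update URL with the requested commit parameters appended."""
    params = {}
    if commit_within_ms is not None:
        # Ask Solr to make the change searchable within this many milliseconds.
        params["commitWithin"] = str(commit_within_ms)
    if soft_commit:
        # Request an immediate soft commit, e.g. to time how long one takes.
        params["softCommit"] = "true"
    if not params:
        return base_url
    return base_url + "?" + urllib.parse.urlencode(params)


def send_docs(docs, url):
    """POST a list of documents as JSON to the given update URL (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a Solr instance running at the assumed address, a call such as send_docs([{"id": "doc1"}], build_update_url(commit_within_ms=2000)) would submit one document with a 2-second commitWithin window. Timing the same call against build_update_url(soft_commit=True) gives a rough idea of how long a soft commit takes, which helps when choosing that window.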