Chapter 10. Scaling Solr

You've deployed Solr, and the world is beating a path to your door. The number of queries being issued has risen sharply, and meanwhile you've indexed ten times the amount of information you originally expected. You discover that Solr is taking longer to respond to queries and to index new content. When this happens, it's time to start looking at what configuration changes you can make to Solr to support more load. We'll look at a series of changes and optimizations that you can make, starting with the simplest changes that give the most bang for your buck and moving on to more complex changes that require thorough analysis of their impact on the system.

In this chapter, we will cover the following topics:

  • Tuning complex systems
  • Testing Solr performance with SolrMeter
  • Optimizing a single Solr server – scale up
  • Configuring Solr for near real-time search
  • Moving to multiple Solr servers (scale wide with SolrCloud)

Tip

In a hurry?

If you flipped to this chapter because you need to speed up Solr queries, look at the Solr caching section. If you have lots of data, or want near real-time search, then jump down to SolrCloud in the Scale Wide section.

Tuning complex systems is hard

Tuning any complex system, whether it's a database, a message queuing system, or the deep dark internals of an operating system, is something of a black art. Researchers and vendors have spent decades figuring out how to measure the performance of systems and coming up with approaches for maximizing the performance of those systems. For some systems that have been around for decades, such as databases, you can just search online for tuning tips for X database and find explicit rules that suggest what you need to do to gain performance. However, even with those well-researched systems, it still can be a matter of trial and error.

In order to measure the impact of your changes, you should track the following three metrics and optimize for them:

  • Transactions Per Second (TPS): In the Solr world, how many search queries and document updates are you able to perform per second? You can get a sense of that by using the Plugins / Stats page and looking at the avgTimePerRequest and avgRequestsPerSecond parameters of your request handlers.
  • CPU usage: You can quickly gain a sense of the CPU usage of Solr by using JConsole. You can also use OS-specific tools such as PerfMon (Windows) and top (Unix) to monitor your Java processes, which can be helpful if you have a number of services running on the same box that are competing for resources (not recommended for maximum scaling).
  • Memory usage: When tuning for memory management, you are looking to ensure that the amount of memory allocated to Solr doesn't constantly grow. While it's okay for the memory consumption to go up a bit, letting it grow unconstrained eventually means you will receive out-of-memory errors! As a result, you need to balance increases in memory consumption against significant increases in TPS. You can use JConsole to keep an eye on memory usage.
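The throughput and memory checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 50 MB tolerance, and the idea of feeding in request durations and heap readings that you recorded yourself (for example, by noting JConsole values during a load test) are all hypothetical, and this is not how Solr computes avgTimePerRequest or avgRequestsPerSecond internally.

```python
def throughput(durations_ms, window_s):
    """Return (requests per second, average ms per request) for one test window.

    durations_ms: request durations you recorded yourself (hypothetical input).
    window_s: length of the measurement window in seconds.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    count = len(durations_ms)
    avg_ms = sum(durations_ms) / count if count else 0.0
    return count / window_s, avg_ms


def heap_keeps_growing(heap_mb, tolerance_mb=50.0):
    """Rough check for unconstrained memory growth.

    Compares the average of the first half of the heap readings with the
    average of the second half. The 50 MB tolerance is an arbitrary
    assumption; pick a value that suits your own deployment.
    """
    half = len(heap_mb) // 2
    if half == 0:
        return False
    first = sum(heap_mb[:half]) / half
    second = sum(heap_mb[half:]) / (len(heap_mb) - half)
    return second - first > tolerance_mb
```

Running both functions on the numbers from each load test gives you a small, repeatable summary to compare before and after a configuration change.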

In order to get a sense of what the steady state for your application is, you can gather statistics by using the SolrMeter load testing tool to put your Solr deployment under load. We'll discuss in the next section how to build a load testing script with SolrMeter that accurately mirrors your real-world interactions with Solr. This effort will give you a tool that can be run repeatedly and that allows more of an apples-to-apples comparison of the impact of the changes to your configuration.

Solr's architecture has benefited from its heritage as the search engine developed in-house from 2004 to 2006 to power CNET.com, a site that, at the time of writing, is ranked 86th for traffic by Alexa.com. Solr, out-of-the-box, is already very performant, with extensive effort spent by the community to ensure that there are minimal bottlenecks. Additional tuning will trade off increases in search performance against disk index size, indexing speed, and/or memory requirements (and vice versa). The approaches are as follows:

  • Scale up: This is the optimization of a single instance of Solr, which looks at caching and memory configuration. Run Solr on a dedicated server (no virtualization) with very fast CPUs and SSD drives with lots of RAM if you can afford it. In the scale up approach, you are trying to maximize what you can get out of a single server.
  • Scale horizontally: This looks at moving to multiple Solr servers using SolrCloud. If your queries run quickly with an acceptable avgTimePerRequest, but you have too many incoming requests, then replicate your complete index across multiple Solr nodes. If your queries take too long to complete due to the complexity or size of the index, then use sharding to share the load of processing a single query across multiple sharded Solr servers.
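The choice between replication and sharding described above can be written down as a small decision rule. The following is a minimal sketch, not a Solr API: the function name and both threshold parameters are hypothetical, and you would need to choose the limits from your own load-test results.

```python
def suggest_scaling(avg_time_ms, requests_per_s, max_time_ms, max_requests_per_s):
    """Suggest a horizontal scaling approach from two measured values.

    max_time_ms and max_requests_per_s are hypothetical limits that you
    define for your own deployment; Solr does not provide them.
    """
    if avg_time_ms > max_time_ms:
        # Individual queries are too slow: split the index so that each
        # query is processed across several shards.
        return "shard"
    if requests_per_s > max_requests_per_s:
        # Queries are fast enough, but there are too many of them:
        # copy the complete index to more nodes.
        return "replicate"
    return "no change"
```

For example, suggest_scaling(40, 900, max_time_ms=100, max_requests_per_s=500) suggests replication, because each query is fast but the request rate exceeds the chosen limit.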