Configuring the filter cache

During consulting engagements, I often see Solr users who forget, or simply don't know, how to use filter queries or simple filters. People tend to add another clause with a logical operator to the main query and forget how efficient filters can be, at least when used wisely. That's why, whenever I can, I tell people using Solr to use filter queries. However, when using filter queries, it helps to know how to set up the cache that is responsible for holding the filter results: the filter cache. This recipe will show you how to properly set up the filter cache.
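To make the difference concrete, here is a minimal sketch that builds the same search twice: once with the restriction added as a clause in the main query, and once with the restriction moved to the fq parameter. The host, core name (books), and field names (title, category) are hypothetical and only serve as an illustration.

```python
from urllib.parse import urlencode

# Hypothetical host, core name, and field names, for illustration only.
BASE_URL = "http://localhost:8983/solr/books/select"


def clause_in_main_query(text, category):
    """Everything in q: the category restriction is part of the main query."""
    params = {"q": f"title:{text} AND category:{category}"}
    return BASE_URL + "?" + urlencode(params)


def with_filter_query(text, category):
    """Restriction moved to fq, so its matching document set can be cached."""
    params = [("q", f"title:{text}"), ("fq", f"category:{category}")]
    return BASE_URL + "?" + urlencode(params)


print(clause_in_main_query("solr", "books"))
print(with_filter_query("solr", "books"))
```

In the second form, the category restriction is a separate filter, so the set of documents it matches can be stored in the filter cache and reused by later queries that send the same fq value.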

Getting ready

Remember that the cache usage is dependent on your queries, update rates, searcher reopening, and so on. In this recipe, you will see cache configuration based on some assumptions; however, you will see the logic behind choosing the right cache configuration. You can use the same logic to adjust caches in your Solr deployment.

Also remember that the filter cache in Solr is a top-level cache, so whenever the searcher is reopened, the cache is invalidated. This might make your cache almost useless for rapidly changing data, and it is sometimes better to disable the cache completely by removing its configuration from the solrconfig.xml file.

How to do it...

For the purpose of this recipe, let's assume that we have a single Solr slave instance to handle all the queries coming from the application. We took the logs from the last three months and analyzed them. From that, we know that our queries use about 2,000 different filter queries. With this information, we can set up the filter cache for our instance. This configuration should look like the one shown here (add this to the solrconfig.xml configuration file):

<filterCache
   class="solr.FastLRUCache"
   size="2000"
   initialSize="2000"
   autowarmCount="1000"/>

And that's it. Now let's see what those values mean.
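The number of distinct filter queries mentioned above has to come from somewhere. The following is a rough sketch of how such a count could be derived from request logs. It assumes that each logged request contains a params={...} section with the request parameters joined by &; this shape is an assumption, so adjust the pattern to whatever your own logs look like. The sample lines are invented.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl

# Assumed log shape: each request line contains "params={...}" with the
# request parameters joined by "&". Filters that use local params such as
# {!tag=...} contain braces themselves and would need a stricter pattern.
PARAMS_RE = re.compile(r"params=\{(.*?)\}")


def count_filter_queries(lines):
    """Return a Counter mapping each distinct fq value to its number of uses."""
    counts = Counter()
    for line in lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue
        for name, value in parse_qsl(match.group(1), keep_blank_values=True):
            if name == "fq":
                counts[value] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    "INFO path=/select params={q=solr&fq=category:books&rows=10} hits=3",
    "INFO path=/select params={q=lucene&fq=category:books&fq=price:[0+TO+10]} hits=1",
    "INFO path=/update params={commit=true} status=0",
]
counts = count_filter_queries(sample)
print(len(counts), "distinct filter queries")
print(counts.most_common(1))
```

The number of keys in the returned counter is the figure that feeds the size parameter, and the per-filter counts show which filters are reused often enough to be worth caching.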

How it works...

As you might have noticed, adding the filter cache to the solrconfig.xml file is a simple task; you just need to know how many unique filters your Solr instance is receiving. We define it in the filterCache XML tag and specify a few parameters that define the filter cache behavior. First of all, the class parameter tells Solr which Java class to use as the implementation. In our example, we use solr.FastLRUCache because we expect to read from the cache more often than we write to it. The next parameter, size, tells Solr the maximum size of the cache. In our case, we said that we have about 2,000 unique filters, so we set the maximum size to that value. We do this because each entry of the filter cache stores the unordered set of Solr document identifiers that match the given filter. This way, after the first use of a filter, Solr can apply it from the filter cache and thus save I/O operations.
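Because every entry holds a set of document identifiers, the maximum size also bounds how much memory the cache can take. The sketch below gives a rough worst-case estimate, assuming that every entry is held as a bit set with one bit per document in the index (small result sets are typically stored more compactly, so real usage should be lower). The index size of 10 million documents is a made-up figure.

```python
def filter_cache_upper_bound_bytes(max_doc, cache_size):
    """Worst-case memory if every entry is a bit set with one bit per document."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * cache_size


# Hypothetical index of 10 million documents with the size="2000" cache
# from the recipe.
estimate = filter_cache_upper_bound_bytes(10_000_000, 2000)
print(f"about {estimate / 1024 ** 3:.2f} GiB in the worst case")
```

An estimate like this is only a sanity check against the available heap before settling on a value for size.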

The next parameter, initialSize, tells Solr the initial size of the filter cache. I tend to set it to the same value as the size parameter to avoid cache resizing. So in our example, we set the value to 2000.

The last parameter, autowarmCount, says how many entries from the old cache should be used to prepopulate the new one when Solr invalidates caches (for example, after a commit operation). I tend to set this parameter to a quarter of the maximum size of the cache because I don't want the caches to warm for too long; note that the example above uses 1000, which is half of the maximum size. Remember that the autowarming time depends on your deployment, and the autowarmCount parameter should be adjusted if needed.

Remember that when using the values shown in the example, you should always observe your Solr instance and act when you see that your cache is either too small or too large.

As with all things, you should pay attention to your cache usage as your Solr instances work. If you see evictions, this might be a signal that your caches are too small. If you have a very poor hit rate, it's sometimes better to turn off the cache. Cache setup is one of those things in Apache Solr that is very dependent on your data, queries, and users, so I'll repeat once again: keep an eye on your caches and don't be afraid to react and change them. For example, the screenshot of the Solr administration panel that accompanies this recipe shows a filter cache that is probably too small, because evictions are happening.
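The same reasoning can be written down as a small check. This is only a sketch: it assumes you have already read the cumulative lookups, hits, and evictions counters of the filter cache from the statistics in the administration panel, and the 0.2 threshold for a "very poor" hit rate is an arbitrary illustrative value, not a Solr default.

```python
def assess_filter_cache(lookups, hits, evictions, low_hit_ratio=0.2):
    """Return (hit_ratio, advice) from cumulative cache counters.

    low_hit_ratio is an arbitrary illustrative threshold, not a Solr default.
    """
    hit_ratio = hits / lookups if lookups else 0.0
    if lookups == 0:
        advice = "no traffic yet; nothing to conclude"
    elif hit_ratio < low_hit_ratio:
        advice = "very poor hit rate; consider disabling the cache"
    elif evictions > 0:
        advice = "evictions seen; the cache may be too small"
    else:
        advice = "no evictions and a reasonable hit rate; leave as is"
    return hit_ratio, advice


# Made-up counter values, for illustration only.
ratio, advice = assess_filter_cache(lookups=10_000, hits=7_500, evictions=1_200)
print(f"hit ratio {ratio:.2f}: {advice}")
```

Running a check like this periodically, rather than once after deployment, matches the advice to keep watching the caches as data and query patterns change.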
