Configuring the document cache

Caches can play a major role in your deployment's performance. One of the caches that you can configure when setting up Solr is the document cache. It stores Lucene internal documents that have been fetched from disk. Configuring this cache properly can save precious I/O calls and therefore boost the performance of the whole deployment. This recipe will show you how to properly configure the document cache.

Getting ready

Remember that cache usage depends on your queries, update rates, searcher reopening, and many other things. The configuration in this recipe is based on some assumptions, but it also shows the logic behind choosing the right cache configuration. You can use the same logic to adjust the caches in your own Solr deployment.

Also remember that the document cache in Solr is a top-level cache, so whenever a searcher is reopened, the cache is invalidated. This might cause your cache to be almost useless for rapidly changing data, and it is sometimes better to disable the cache completely by removing its configuration from the solrconfig.xml file.
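
As a rough sketch of what disabling the cache could look like, assuming the usual layout in which cache definitions sit inside the query section of solrconfig.xml, the element can be commented out rather than deleted, which makes it easy to restore later. The size values shown here are illustrative only:

```xml
<query>
  <!-- Document cache disabled: the searcher is reopened too often for it to pay off.
  <documentCache
     class="solr.LRUCache"
     size="512"
     initialSize="512"/>
  -->
</query>
```

Any other settings that live in the same section stay untouched; only the documentCache element is taken out.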

How to do it...

For this recipe, I assumed that we are dealing with a Solr deployment that holds about 100,000,000 documents. In our case, a single Solr instance gets a maximum of 100 concurrent queries, and the maximum number of documents that a query can fetch is 256.

With the preceding parameters, our document cache should look somewhat similar to this (add this to the solrconfig.xml configuration file):

<documentCache
   class="solr.LRUCache"
   size="25600"
   initialSize="25600"/>

You can see that we didn't specify the autowarmCount parameter. This is because the document cache uses Lucene's internal IDs to identify documents. These identifiers can't be carried over between index changes, and thus we cannot automatically warm this cache.

How it works...

The document cache configuration is simple. We define it in the documentCache XML tag and specify a few attributes that control its behavior. First, the class attribute tells Solr which Java class to use as the implementation. In our example, we use solr.LRUCache because we expect to add more entries to the cache than we read from it. If you see that you are reading from the cache more often than you add to it, consider using solr.FastLRUCache instead. The next attribute, size, sets the maximum number of entries in the cache. As the Solr documentation says, we should always set this value higher than the maximum number of results returned by a query multiplied by the maximum number of concurrent queries that we expect the Solr instance to receive. In our case, that is 256 × 100 = 25,600, so the value used above is the minimum and you may want to add some headroom. This ensures that there is always enough space in the cache, so that Solr will not have to fetch the same data from the index multiple times during a single query.
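
If monitoring shows that the cache is read far more often than it is written, switching implementations is a one-attribute change. The following is a minimal sketch, assuming a Solr release that still ships solr.FastLRUCache (check the cache classes available in your version) and reusing the sizes from this recipe:

```xml
<!-- Read-heavy variant: same sizing as the recipe, different implementation class. -->
<!-- size = 256 results per query x 100 concurrent queries = 25600 -->
<documentCache
   class="solr.FastLRUCache"
   size="25600"
   initialSize="25600"/>
```

Only the class attribute differs from the configuration shown earlier, so it is easy to compare hit rates before and after the change.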

The last parameter tells Solr about the initial size of the cache (the initialSize parameter). I tend to set it to the same value as the size parameter to ensure that Solr won't be wasting its resources on cache resizing.

Of course, we can't automatically warm the document cache, because it relies on internal Lucene document identifiers, and those change when the searcher is reopened. Hence, it doesn't make sense to use the autowarmCount parameter.

There is one thing to remember. The more fields marked as stored in the index structure you have, the higher the memory usage of this cache will be.
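
One way to keep that memory usage down is to store only the fields you actually need to return. The following schema fragment is a sketch under assumptions: the field name description is hypothetical, and text_general is assumed to be a field type defined in your schema:

```xml
<!-- Hypothetical field: still searchable (indexed), but not stored, -->
<!-- so its content is not part of the documents held in the document cache. -->
<field name="description" type="text_general" indexed="true" stored="false"/>
```

The trade-off is that a field that is not stored cannot be returned in search results, so this only suits fields that are used purely for searching.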

Remember that the values used in the recipe are examples that worked for that particular data. You should always observe your Solr instance and act when you see that your cache is not behaving as expected. Remember that a very large cache with a very low hit rate can be worse than having no cache at all.

As with everything else, you should pay attention to your cache usage while your Solr instances are running. If there are evictions, it might be a signal that your caches are too small. If the cache hit rate is very poor, it is sometimes advisable to turn the cache off. Cache setup is one of those things in Apache Solr that is very dependent on your data, queries, and users, so I'll repeat once again: keep an eye on your caches and don't be afraid to react and change them. Information about the cache hit rate can be found in the Solr administration panel or in good monitoring software.
