Speeding Solr through Solr caching

Search queries are often repetitive in nature. In such cases, a cache brings down the average response time for search results. Apache Solr provides a caching mechanism on top of the index data. Unlike a typical cache, entries in a Solr cache do not expire after a time interval; each cache is associated with an index searcher (IndexSearcher) and stays valid for as long as that searcher is in use. Solr supports the following three cache implementations:

  • LRUCache: This is a Least Recently Used cache, based on a synchronized LinkedHashMap (default)
  • FastLRUCache: This is a newer, concurrent LRU implementation based on ConcurrentHashMap; it has faster reads and slower writes than LRUCache, so it generally performs better when the hit ratio is high
  • LFUCache: This is a Least Frequently Used cache, based on ConcurrentHashMap

The following are the common parameters for the cache in solrconfig.xml:

  • class: The cache implementation to attach, that is, LRUCache, FastLRUCache, or LFUCache
  • size: The maximum number of entries the cache can hold
  • initialSize: The initial capacity of the cache when it is created
  • autowarmCount: The number of entries to seed from the old cache; autowarming is covered later in this section
  • minSize: Applicable to FastLRUCache; once the cache reaches its maximum size, it tries to shrink to minSize. The default value is 90 percent of size
  • acceptableSize: Applicable to FastLRUCache; if the cache cannot shrink to minSize after reaching its maximum size, it shrinks to at least acceptableSize
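
As a rough illustration of how these parameters fit together, the following sketch configures a filter cache with FastLRUCache. The numbers are arbitrary example values chosen so that minSize is below acceptableSize and both are below size; they are not tuning recommendations:

```xml
<!-- Illustrative values only; tune them for your own index and query load -->
<filterCache class="solr.FastLRUCache"
             size="1024"
             initialSize="512"
             minSize="900"
             acceptableSize="950"
             autowarmCount="128"/>
```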

The filter cache

The filter cache provides a caching layer on top of filter queries (fq). For any query that is run as a filter query, Solr first looks in the cache for its results. If they are not found, the filter is executed against the index, and the results are stored in the cache. Each filter is cached separately; when a query uses several filters, the cache returns the result set for each one, and the system then intersects them. Searches that use faceting also perform better with the filter cache. This cache stores document IDs as an unordered set.
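
To make the per-filter caching concrete, here is a minimal sketch of a request handler that appends two filter queries to every request. The handler name and the field names inStock and category are hypothetical. Because each fq is cached as a separate entry, a later request that reuses only one of these filters can still be served from the cache for that filter:

```xml
<!-- Hypothetical handler and field names, for illustration only -->
<requestHandler name="/browse-books" class="solr.SearchHandler">
  <lst name="appends">
    <!-- each fq below gets its own filter cache entry -->
    <str name="fq">inStock:true</str>
    <str name="fq">category:books</str>
  </lst>
</requestHandler>
```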

The query result cache

As the name suggests, this cache is responsible for caching query results. This way, a repeated request for the same search does not require a complete search; the results can be returned from the cache instead. This cache stores the top N results for each query passed by the user, as an ordered list of document IDs. It is useful where the same queries are passed again and again. You can specify the maximum number of documents that can be cached for a single query in solrconfig.xml. Consider the following snippet:

<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
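
For context, the query result cache itself is declared like the other caches. The sketch below uses example values only. The queryResultWindowSize setting is a related option: it makes Solr collect and cache a wider range of document IDs than the one requested, so that paging through the same query can be served from a single cache entry:

```xml
<!-- Example values only -->
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>

<!-- A request for results 10 to 19 collects and caches results 0 to 49 -->
<queryResultWindowSize>50</queryResultWindowSize>
```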

The document cache

The document cache stores Solr documents (their stored fields). Once a document is cached, it does not need to be fetched from disk again, which reduces disk I/O. This cache is part of Apache Solr and is different from the disk cache that the operating system provides. It is keyed on internal document IDs, so autowarming has no useful effect here: the internal document IDs change whenever the index changes.

Note

The size of the document cache should be based on the number of results returned per query and the maximum number of queries allowed to run concurrently; a size larger than the product of the two ensures that Solr does not re-fetch documents while processing a request.
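
As a worked example of this sizing rule, assume (hypothetically) that a query returns at most 50 documents and that at most 20 queries run at the same time. The cache then needs room for at least 50 × 20 = 1000 documents, so a size of 1024 would be a reasonable starting point:

```xml
<!-- Assumed load: 50 results per query x 20 concurrent queries = 1000 documents -->
<!-- autowarmCount is 0 because the document cache cannot be autowarmed -->
<documentCache class="solr.LRUCache"
               size="1024"
               initialSize="1024"
               autowarmCount="0"/>
```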

The field value cache

The field value cache caches the values of Solr fields. It can support sorting on fields, and it handles multivalued fields. This cache is mainly useful when you make extensive use of Solr faceting, because faceting works on field values. You can monitor the status of the field value cache through the Apache Solr Administration console, which provides information such as the hit ratio, the number of hits, and the load on the cache.
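
The field value cache can also be declared explicitly in solrconfig.xml. The following sketch follows the commented-out entry found in the example solrconfig.xml shipped with Solr; the values are examples, and showItems (the number of entries listed in the cache statistics) may not be available in every version:

```xml
<!-- Example values; showItems controls how many entries appear in the statistics -->
<fieldValueCache class="solr.FastLRUCache"
                 size="512"
                 autowarmCount="128"
                 showItems="32"/>
```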

Warming up the cache

Cache autowarming is a feature by which a cache can prepopulate itself with objects from the cache of the old searcher. This property can be set in solrconfig.xml. Whenever the user commits new documents to the Solr index, Apache Solr starts a new searcher and seeds its caches with entries from the previous searcher. A cache can also be preloaded by explicitly running frequently run queries. This way, Solr does not start with an empty cache, and any query run once the new searcher is in use can benefit from the prepopulated content. The following snippet from solrconfig.xml shows an example of warming the Solr cache when a new searcher is initialized:

<listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!-- seed the caches with a frequently run query -->
        <lst> <str name="q">java books databases</str></lst>
      </arr>
</listener>
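
The same listener class can also be registered for the firstSearcher event, which fires when Solr starts up and there is no previous searcher to seed entries from. The following is a minimal sketch; the query and the price sort field are hypothetical:

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- hypothetical warm-up query; the sort also loads the data needed for sorting on price -->
    <lst>
      <str name="q">java books</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>
```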

The autowarm count affects how long a new searcher takes to warm up before it can serve requests, so for frequently updated indexes it is better to keep this count low. The count is set through the autowarmCount attribute of the cache element in solrconfig.xml. The following is an example of this:

<filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>