Controlling the filter execution to improve expensive filter performance

If you use filter queries extensively, which isn't a bad thing at all, you might be wondering whether there is something you can do to improve the execution time of some of your filter queries. For example, if you have some filter queries that use heavy function queries, you might want to have them executed only on the documents that passed all the other filters. Let's see how to do that.

Getting ready

Before you continue, read the Avoiding caching of rare filters to improve performance recipe in this chapter.

How to do it...

  1. Let's assume that we have the following query being used to get the documents we are interested in:
    q=solr+cookbook&fq=category:books&fq={!frange l=10 u=100}log(sum(sqrt(popularity),100))&fq={!frange l=0 u=10}if(exists(price_promotion),sum(0,price_promotion),sum(0,price))
  2. For the purpose of this recipe, let's assume that fq={!frange l=10 u=100}log(sum(sqrt(popularity),100)) and fq={!frange l=0 u=10}if(exists(price_promotion),sum(0,price_promotion),sum(0,price)) are the heavy filter queries whose execution we would like to optimize. Neither of them should be cached, and the last filter in the query should be executed only on the documents that match the other filters. To do this, we need to modify our query so that it looks as follows:
    q=solr+cookbook&fq=category:books&fq={!frange l=10 u=100 cache=false cost=50}log(sum(sqrt(popularity),100))&fq={!frange l=0 u=10 cache=false cost=150}if(exists(price_promotion),sum(0,price_promotion),sum(0,price))
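
If you send the query from code rather than typing it into a browser, the local parameters have to be URL-encoded. The following is a minimal Python sketch of how the modified query could be assembled. The host, port, and the cookbook core name are assumptions made for illustration only, so adjust them to your setup:

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the core name "cookbook" is an assumption.
SOLR_SELECT_URL = "http://localhost:8983/solr/cookbook/select"


def build_query_url(base_url=SOLR_SELECT_URL):
    """Build the request URL for the modified query from this recipe."""
    params = [
        ("q", "solr cookbook"),
        # Cheap filter: left cacheable.
        ("fq", "category:books"),
        # Heavy filter: not cached, lower cost, so it runs earlier.
        ("fq", "{!frange l=10 u=100 cache=false cost=50}"
               "log(sum(sqrt(popularity),100))"),
        # Heaviest filter: not cached, cost of 100 or more.
        ("fq", "{!frange l=0 u=10 cache=false cost=150}"
               "if(exists(price_promotion),sum(0,price_promotion),sum(0,price))"),
    ]
    # A list of pairs keeps the repeated fq parameter and its order.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url())
```

A list of pairs is used instead of a dictionary because the fq parameter is repeated three times.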

As you can see, we've added two attributes to each of the heavy filters: cache=false and cost, the latter with the values 50 and 150 respectively. Let's see what they mean.

How it works...

As you can see in the first query, we search for the words solr cookbook and we want the result set to be narrowed to the books category. This part of the query is not heavy when it comes to execution. We also want the results to be narrowed to only those documents for which the value of the log(sum(sqrt(popularity),100)) function is between 10 and 100. In addition to that, the last filter query specifies that we want only those documents whose price_promotion field value (or price field value, if the price_promotion field isn't filled) is between 0 and 10.

Our requirements were that the second filter query (the one with the log function query) should be executed after the fq=category:books filter query, and that the last filter should be executed at the end, only on the documents that were matched by the other filters. So basically, the last filter should be executed on a subset of the whole result set. We wanted to do this because the last filter is heavy when it comes to execution and we want to limit the number of documents it needs to process.

To match the requirements, we set these two filters to not be cached (cache=false) and introduced the cost parameter. The cost parameter in filter queries specifies the order in which noncached filter queries are executed: the higher the cost value, the later the filter query will be executed.

So our second filter (the one with cost=50) should be executed after the fq=category:books filter query and the last filter query (the one with cost=150) will be executed as the last one.
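
To make the ordering easier to picture, here is a small Python sketch that models it. This is only a simplified illustration of the behavior described above (cached filters first, then noncached filters by ascending cost), not Solr's actual implementation; the function names are hypothetical:

```python
import re

# Matches the leading local-params section, for example "{!frange l=0 u=10}".
LOCAL_PARAMS = re.compile(r"^\{!(?P<body>[^}]*)\}")


def parse_local_params(fq):
    """Return the key=value local params of a filter query as a dict."""
    match = LOCAL_PARAMS.match(fq)
    if match is None:
        return {}
    tokens = match.group("body").split()
    return dict(token.split("=", 1) for token in tokens if "=" in token)


def execution_order(filters):
    """Order filters as the recipe describes (simplified model)."""
    def sort_key(fq):
        params = parse_local_params(fq)
        cached = params.get("cache", "true") != "false"
        cost = int(params.get("cost", 0))
        # Cached filters come first; noncached ones follow by ascending cost.
        return (0, 0) if cached else (1, cost)

    # sorted() is stable, so filters with equal keys keep their given order.
    return sorted(filters, key=sort_key)


if __name__ == "__main__":
    filters = [
        "{!frange l=0 u=10 cache=false cost=150}"
        "if(exists(price_promotion),sum(0,price_promotion),sum(0,price))",
        "{!frange l=10 u=100 cache=false cost=50}log(sum(sqrt(popularity),100))",
        "category:books",
    ]
    for fq in execution_order(filters):
        print(fq)
```

Whatever order the filters are passed in, the model puts category:books first, the cost=50 filter second, and the cost=150 filter last.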

In addition to this, because the cost of the last filter query (the second noncached one, with cost=150) is greater than or equal to 100, that filter will be executed only on the documents that matched the main query and all the other filters.

Remember that the cost attribute only works when the filter query is not cached.
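
The two rules above (the filter must not be cached, and its cost must be at least 100) can be summarized in a short Python check. This is a sketch of the rule as stated in this recipe, with a hypothetical function name; it does not verify whether the given query type actually supports this kind of late execution in Solr:

```python
import re

# Threshold taken from the recipe text.
POST_FILTER_MIN_COST = 100


def runs_after_other_filters(fq):
    """Return True if, per the recipe's rule, the filter is executed only on
    documents that matched the main query and all the other filters."""
    local = re.match(r"^\{!([^}]*)\}", fq)
    if local is None:
        # No local params: the filter is cached by default.
        return False
    tokens = dict(t.split("=", 1) for t in local.group(1).split() if "=" in t)
    not_cached = tokens.get("cache") == "false"
    cost = int(tokens.get("cost", 0))
    # Both conditions are required; cost alone is not enough.
    return not_cached and cost >= POST_FILTER_MIN_COST


if __name__ == "__main__":
    print(runs_after_other_filters(
        "{!frange l=0 u=10 cache=false cost=150}sum(0,price)"))
    print(runs_after_other_filters(
        "{!frange l=0 u=10 cost=150}sum(0,price)"))
```

The second call omits cache=false, so the check fails for it even though its cost is 150.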
