Configuring numerical fields for high-performance sorting and range queries

Let's assume we have an Apache Solr deployment that uses range queries. Some of them run against string fields, while others run against numerical fields. We have found that our numerical range queries execute slower than we would like. The usual question arises: is there something we can do? Of course there is, and this recipe will show you what.

How to do it...

  1. Let's begin with the definition of a field that we use to run our numerical range queries (we add it to the schema.xml file):
    <field name="price" type="float" indexed="true" stored="true"/>
  2. The second step is to define the float field type (again, we add this to the schema.xml file):
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
  3. And now the usual query that is run against the preceding field:
    q=*:*&fq=price:[10.0+TO+59.00]&facet=true&facet.field=price
  4. To improve the performance of your numerical range queries, there is just one thing you need to do: decrease the precisionStep attribute of the float field type, for example, from 8 to 4. Our field type definition will look as follows:
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="4" positionIncrementGap="0"/>
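
One way to compare the two configurations is to run the query from step 3 before and after the change and read the QTime value that Solr returns in the response header. The following is a minimal sketch of that idea, not part of the original recipe. The base URL, the helper names, and the use of wt=json are assumptions made for illustration; adjust them to your deployment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Solr instance; replace with your own select handler URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_range_query_url(field, low, high, base_url=SOLR_SELECT_URL):
    """Build the recipe's range query (match all, filter on a numeric range)."""
    params = [
        ("q", "*:*"),
        ("fq", f"{field}:[{low} TO {high}]"),
        ("facet", "true"),
        ("facet.field", field),
        ("wt", "json"),  # ask for a JSON response so QTime is easy to read
    ]
    return base_url + "?" + urlencode(params)


def extract_qtime(response_body):
    """Return the server-side query time (milliseconds) from a JSON response."""
    return json.loads(response_body)["responseHeader"]["QTime"]


def measure(field, low, high):
    """Run the query once against a live Solr instance and return its QTime."""
    with urlopen(build_range_query_url(field, low, high)) as response:
        return extract_qtime(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call measure("price", 10.0, 59.0) against a running Solr.
    print(build_range_query_url("price", 10.0, 59.0))
```

Keep in mind that QTime excludes network transfer time, and that repeated runs of the same fq are likely to be served from Solr's filter cache, so compare first executions (or vary the range) when measuring.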
    

After the preceding change, you will have to reindex your data, and you should see that your numerical range queries run faster. How much faster depends on your setup. Now let's take a look at how it works.

How it works...

As you can see in the preceding example, we use a simple float-based field to run numerical range queries. Before the change, precisionStep on our field type was set to 8. This attribute (specified in bits) tells Lucene (which Solr is built on top of) how many tokens should be indexed for a single value in such a field: each value is indexed at several levels of precision, with precisionStep bits dropped at each level. For a 32-bit float, a precisionStep of 8 yields four tokens per value, and a precisionStep of 4 yields eight. Smaller precisionStep values (as long as precisionStep is greater than 0) generate more tokens per value, so a range query can be matched with fewer terms and runs faster. This is why decreasing precisionStep from 8 to 4 improved performance.
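
To illustrate why more tokens per value can mean fewer terms per query, here is a simplified sketch of trie-style range splitting. It is modeled on the general idea behind Lucene's numeric range queries and is not Lucene's actual code. The function names are hypothetical, and the sketch assumes non-negative floats so that the raw IEEE 754 bit pattern can stand in for Lucene's sortable encoding.

```python
import struct


def float_to_bits(value):
    """Return the 32-bit IEEE 754 pattern of a non-negative float as an int.

    Assumption: for non-negative floats this ordering matches numeric order.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    return struct.unpack(">I", struct.pack(">f", value))[0]


def split_range(lo, hi, precision_step, bits=32):
    """Cover [lo, hi] with prefix ranges, using coarser prefixes in the middle.

    Returns a list of (shift, first_prefix, last_prefix) tuples.
    """
    ranges = []
    if lo > hi:
        return ranges
    shift = 0
    while True:
        diff = 1 << (shift + precision_step)
        mask = ((1 << precision_step) - 1) << shift
        has_lower = (lo & mask) != 0      # lower edge is not block-aligned
        has_upper = (hi & mask) != mask   # upper edge is not block-aligned
        next_lo = ((lo + diff) if has_lower else lo) & ~mask
        next_hi = ((hi - diff) if has_upper else hi) & ~mask
        if shift + precision_step >= bits or next_lo > next_hi:
            # No coarser level is usable: cover what is left at this level.
            ranges.append((shift, lo >> shift, hi >> shift))
            break
        if has_lower:
            ranges.append((shift, lo >> shift, (lo | mask) >> shift))
        if has_upper:
            ranges.append((shift, (hi & ~mask) >> shift, hi >> shift))
        lo, hi = next_lo, next_hi
        shift += precision_step
    return ranges


def count_terms(ranges):
    """Count the prefix slots a query would have to consider."""
    return sum(last - first + 1 for _, first, last in ranges)


if __name__ == "__main__":
    low, high = float_to_bits(10.0), float_to_bits(59.0)
    for step in (8, 4):
        print(step, count_terms(split_range(low, high, step)))
```

For the recipe's range [10.0 TO 59.0], this sketch counts 333 prefix slots with a precisionStep of 8 and 33 with a precisionStep of 4. These are upper bounds on possible terms, not measurements; the terms actually visited depend on the values present in the index.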

However, remember that decreasing the precisionStep value leads to a slightly larger index. Also, setting precisionStep to 0 turns off indexing of multiple tokens per value, so don't use that value if you want your range queries to perform faster.
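
The trade-off can be put into numbers with a short calculation. The sketch below assumes the usual rule that a value of a given bit width is indexed as ceil(bits / precisionStep) tokens, and that a precisionStep of 0 means a single full-precision token. The function names are hypothetical.

```python
import math


def tokens_per_value(precision_step, bits=32):
    """Number of tokens indexed for one numeric value.

    Assumption: precisionStep <= 0 disables multi-token indexing (one token).
    """
    if precision_step <= 0:
        return 1
    return math.ceil(bits / precision_step)


def total_tokens(num_values, precision_step, bits=32):
    """Tokens indexed for num_values single-valued fields."""
    return num_values * tokens_per_value(precision_step, bits)


if __name__ == "__main__":
    for step in (8, 4, 0):
        print(step, tokens_per_value(step), total_tokens(1_000_000, step))
```

Although the token count doubles when going from 8 to 4, the coarser tokens are shared by many values, which is consistent with the index growing only slightly.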
