Faceting field values

Field value faceting is the most common type of faceting. The first example in this chapter demonstrated it in action. Solr, in essence, iterates over all of the indexed terms for the field and tallies, for each term, the number of searched documents that include it. Sophisticated algorithms and caching make this so fast that the overhead is usually negligible.
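To make the tallying concrete, here is a deliberately naive Python sketch of the idea. It is a conceptual model only, not Solr's actual implementation; the documents, the r_type values, and both helper functions are made up for illustration.

```python
def build_term_index(docs, field):
    """Map each indexed term of `field` to the set of document ids containing it."""
    index = {}
    for doc in docs:
        for term in doc.get(field, []):
            index.setdefault(term, set()).add(doc["id"])
    return index


def facet_counts(term_index, matched_ids):
    """For every indexed term, count the matched documents that include it."""
    return {
        term: len(doc_ids & matched_ids)
        for term, doc_ids in sorted(term_index.items())
    }


# Hypothetical documents; only the field name r_type comes from the text.
docs = [
    {"id": 1, "r_type": ["Album"]},
    {"id": 2, "r_type": ["Single"]},
    {"id": 3, "r_type": ["Album", "Live"]},
    {"id": 4, "r_type": []},
]

term_index = build_term_index(docs, "r_type")
matched_ids = {1, 3, 4}  # ids of the documents matched by the search
counts = facet_counts(term_index, matched_ids)
print(counts)  # {'Album': 2, 'Live': 1, 'Single': 0}
```

Note that Single still appears with a count of 0, because the term exists in the index even though no matched document contains it.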

The following are the request parameters to use it:

  • facet.field: You must set this parameter to a field's name in order to facet on that field. Repeat this parameter for each field to be faceted on. See the previous Field requirements section.

    Note

    The remaining faceting parameters can be set on a per-field basis, otherwise they apply to all faceted fields that don't have a field-specific setting. You will usually specify them per field, especially if you are faceting on more than one field, so that you don't get your faceting configuration mixed up. For example: f.r_type.facet.sort=index (r_type is a field name, facet.sort is a facet parameter).

  • facet.limit: This limits the number of facet values returned in the search result of a field. As these are usually going to be displayed to the user, it doesn't make sense to have a large number of these in the response. If you need all of them, then disable the limit with a value of -1. It defaults to 100.
  • facet.sort: This is set to either count to sort the facet values by descending totals, or to index to sort lexicographically, as if you sorted on the field. If facet.limit is greater than zero (typical), then Solr picks count as the default, otherwise index is chosen.
  • facet.offset: This is the offset into the facet value list from which the values are returned. It enables paging of facet values when used with facet.limit and defaults to 0.
  • facet.mincount: This tells Solr to exclude facet values whose counts are less than the number given. It is applied before limit and offset so that paging works as expected. It is common to set this to 1 since 0 is almost useless. This defaults to 0.
  • facet.missing: When enabled, this causes the response to include the number of searched documents that have no indexed terms for the field. The first facet example in the chapter demonstrates this. It defaults to false; enable it with true or on.
  • facet.prefix: This filters the facet values to those starting with this value. This is applied before limit and offset so that paging works as expected. At the end of this chapter, you'll see how this can be used for hierarchical faceting. In the next chapter, you'll see how faceting with this prefix can be used to power query-term suggestions.
  • facet.threads (advanced): When this parameter is set, Solr loads the fields related to faceting concurrently. The value (an integer) specifies the number of threads to use. If it is set to a negative number, Solr uses Java's Integer.MAX_VALUE as the thread limit. No threads are spawned if this parameter is not set.
  • facet.method (advanced): This parameter tells Solr which of its three different field-value faceting algorithms to use in order to influence memory use, query performance, and commit speed. Solr usually makes good choices by default. You can specify one of enum, fcs or fc, or neither, and Solr will, under the right circumstances, choose the third, known as UnInvertedField. fc refers to the field cache which is only for single-valued fields that are not tokenized. Trie-based fields that are configured for fast range queries (for example, tint, not int) are only facetable with UnInvertedField. If you set facet.method incorrectly, then Solr will ignore it.
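The way facet.prefix, facet.mincount, facet.sort, facet.offset, and facet.limit interact can be sketched in a few lines of Python. This is an illustrative model of the behavior described in the list above, not Solr code. The function name and sample counts are hypothetical, and breaking count ties alphabetically is an assumption.

```python
def select_facet_values(counts, prefix="", mincount=0, sort=None, offset=0, limit=100):
    """Apply the facet parameters to a {term: count} dict, in the order described."""
    # Default sort depends on the limit: count if limit > 0, otherwise index.
    if sort is None:
        sort = "count" if limit > 0 else "index"

    # Prefix and mincount filtering happen before paging.
    items = [
        (term, count)
        for term, count in counts.items()
        if term.startswith(prefix) and count >= mincount
    ]

    if sort == "count":
        # Descending totals; alphabetical tie-break is an assumption.
        items.sort(key=lambda tc: (-tc[1], tc[0]))
    else:
        items.sort(key=lambda tc: tc[0])

    # A negative limit disables the limit entirely.
    if limit < 0:
        return items[offset:]
    return items[offset:offset + limit]


# Hypothetical tallies for one faceted field.
counts = {"Album": 120, "Single": 45, "EP": 45, "Live": 3, "Bootleg": 0}

page1 = select_facet_values(counts, mincount=1, limit=2)
page2 = select_facet_values(counts, mincount=1, limit=2, offset=2)
everything = select_facet_values(counts, limit=-1)
print(page1)  # [('Album', 120), ('EP', 45)]
print(page2)  # [('Single', 45), ('Live', 3)]
```

Because Bootleg is filtered out by mincount before the offset is applied, the second page continues exactly where the first one stopped.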

    Tip

    When to specify facet.method

    Normally, you should not specify facet.method, thereby letting Solr's internal logic choose an appropriate algorithm. However, if you are faceting on a multi-valued field that has only a small number of distinct values (fewer than 100, but ideally closer to 10), then we suggest setting this to enum. Solr will use a filter cache entry for each value, so keep that in mind when sizing that cache. By default, Solr uses enum only for Boolean fields, as it knows there can only be two values. Another parameter we'll mention for completeness is facet.enum.cache.minDf, which is the minimum document frequency a term must have to get a filter cache entry (the default of 0 means no minimum). If the field contains rarely used values occurring fewer than about 30 times, then setting this threshold to 30 makes sense.
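Since this advice applies to individual fields, the per-field f.<field>.<parameter> syntax from the earlier note is a natural fit. The following Python sketch assembles such a query string with the standard library. The helper name build_facet_query and the field r_attributes are hypothetical; r_type is the field name used in the earlier note.

```python
from urllib.parse import urlencode


def build_facet_query(fields, global_params=None, per_field=None):
    """Build a Solr query string with global and per-field facet parameters."""
    params = [("q", "*:*"), ("facet", "on")]
    # facet.field is repeated once per faceted field.
    for field in fields:
        params.append(("facet.field", field))
    # Parameters without a field prefix apply to all faceted fields.
    for name, value in (global_params or {}).items():
        params.append((name, value))
    # Per-field overrides use the f.<field>.<parameter> form.
    for field, overrides in (per_field or {}).items():
        for name, value in overrides.items():
            params.append((f"f.{field}.{name}", value))
    return urlencode(params)


query = build_facet_query(
    fields=["r_type", "r_attributes"],  # r_attributes is a made-up field
    global_params={"facet.mincount": 1},
    per_field={
        "r_type": {"facet.sort": "index"},
        "r_attributes": {"facet.method": "enum", "facet.enum.cache.minDf": 30},
    },
)
print(query)
```

The resulting string can be appended to a select URL after the question mark. Keeping the overrides grouped by field in one dictionary makes it harder to mix up the configuration of different facets.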

Alphabetic range bucketing

Solr does not directly support alphabetic range bucketing (A-C, D-F, and so on). However, with a creative application of text analysis and a dedicated field, we can achieve this with little effort. Let's say we want to have these range buckets on the release names. We need to extract the first character of r_name, and store this into a field that will be used for this purpose. We'll call it r_name_facetLetter. Here is our field definition:

<field name="r_name_facetLetter" type="bucketFirstLetter" stored="false" />

And here is the copyField:

<copyField source="r_name" dest="r_name_facetLetter" />

The definition of the bucketFirstLetter type is the following:

<fieldType name="bucketFirstLetter" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^([a-zA-Z]).*" group="1" />
    <filter class="solr.SynonymFilterFactory" synonyms="mb_letterBuckets.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
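As a rough way to reason about what the index-time analyzer produces, the chain can be emulated in Python. This is only a sketch: the regular expression is the one from the tokenizer configuration, but the bucket boundaries are an assumption chosen to match the facet values shown in the results later in this section, since the contents of mb_letterBuckets.txt are not reproduced here. The sample names are made up.

```python
import re

# Same pattern as the PatternTokenizerFactory configuration; group 1 is the first letter.
FIRST_LETTER = re.compile(r"^([a-zA-Z]).*")

# Assumed buckets; the real mapping lives in the synonyms file.
BUCKETS = ["a-c", "d-f", "g-i", "j-l", "m-o", "p-r", "s-u", "v-z"]


def bucket_for(name):
    """Return the letter bucket for a release name, or None if no token is produced."""
    match = FIRST_LETTER.match(name)
    if match is None:
        # Names that do not start with a letter yield no token at all.
        return None
    letter = match.group(1).lower()  # mirrors ignoreCase="true"
    for bucket in BUCKETS:
        low, high = bucket.split("-")
        if low <= letter <= high:
            return bucket
    return None


for name in ["Blue", "thunder", "Zebra Crossing", "1999"]:
    print(name, "->", bucket_for(name))
```

Under these assumptions, a name such as 1999 ends up with no indexed term in the field, so it would presumably be counted only when facet.missing is enabled.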

The PatternTokenizerFactory class, as configured, plucks out the first character, and the SynonymFilterFactory class maps each letter of the alphabet to a range such as A-C. The mapping is in conf/mb_letterBuckets.txt. Field types used for faceting generally use KeywordTokenizerFactory for query analysis, so that a filter query on a facet value returned by a previous faceted search matches the indexed term exactly. After validating these changes with Solr's analysis admin screen, we then re-index the data. For the facet query, we're going to advise Solr to use the enum method, because there aren't many facet values in total. Here's the URL to search Solr:

http://localhost:8983/solr/mbreleases/select?indent=on&wt=json&q=*:*&facet=on&facet.field=r_name_facetLetter&facet.sort=index&facet.missing=on&facet.method=enum

The URL produces results containing the following facet data:

{"facet_counts": {
  "facet_queries": {},
  "facet_fields": {
    "r_name_facetLetter": [
      "a-c", 99005,
      "d-f", 68376,
      "g-i", 60569,
      "j-l", 49871,
      "m-o", 59006,
      "p-r", 47032,
      "s-u", 143376,
      "v-z", 33233,
      null, 42622]},
  "facet_dates": {},
  "facet_ranges": {}}}
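Note that the facet values come back as a flat list that alternates between value and count, with null standing for documents that have no value in the field. A client typically pairs them up before display. The following Python sketch does this for the response above; the helper name pair_up is hypothetical, and the JSON is embedded as a string so that the example runs without a Solr server.

```python
import json


def pair_up(flat):
    """Turn [value, count, value, count, ...] into [(value, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


# The facet portion of the response shown above.
response_text = """
{"facet_counts": {
  "facet_queries": {},
  "facet_fields": {
    "r_name_facetLetter": [
      "a-c", 99005, "d-f", 68376, "g-i", 60569, "j-l", 49871,
      "m-o", 59006, "p-r", 47032, "s-u", 143376, "v-z", 33233,
      null, 42622]},
  "facet_dates": {},
  "facet_ranges": {}}}
"""

facet_fields = json.loads(response_text)["facet_counts"]["facet_fields"]
buckets = pair_up(facet_fields["r_name_facetLetter"])

# JSON null becomes None: the count of documents without a bucket.
missing = dict(buckets)[None]
largest = max((b for b in buckets if b[0] is not None), key=lambda b: b[1])

for value, count in buckets:
    print(value if value is not None else "(missing)", count)
```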