Faceting numeric and date ranges

Solr has built-in support for faceting numeric and date fields over a range that is divided into regular intervals. You can think of this as a convenience feature: with a few succinct parameters, Solr calculates the ranges for you and returns compact output, so you don't have to calculate and submit a series of facet queries yourself. Facet queries are described after this section.

Range faceting is particularly useful for dates. We'll demonstrate an example against MusicBrainz release dates and another against MusicBrainz track durations, and then describe the parameters and their options.

Note

Date faceting is the date-specific predecessor of range faceting and is deprecated as of Solr 3. Date faceting uses similar parameters starting with facet.date and has similar output under facet_dates.

Here's the URL:

http://localhost:8983/solr/mbreleases/mb_releases?indent=on&wt=json&omitHeader=true&rows=0&facet=true&facet.range.other=all&f.r_event_date_earliest.facet.range.start=NOW/YEAR-10YEARS&facet.range=r_event_date_earliest&facet.range.end=NOW/YEAR&facet.range.gap=%2B1YEAR&q=smashing

And here's the response:

{"response":{"numFound":248,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "r_event_date_earliest":{
        "counts":[
          "2003-01-01T00:00:00Z",2,
          "2004-01-01T00:00:00Z",1,
          "2005-01-01T00:00:00Z",1,
          "2006-01-01T00:00:00Z",3,
          "2007-01-01T00:00:00Z",11,
          "2008-01-01T00:00:00Z",0,
          "2009-01-01T00:00:00Z",0,
          "2010-01-01T00:00:00Z",0,
          "2011-01-01T00:00:00Z",0,
          "2012-01-01T00:00:00Z",0],
        "gap":"+1YEAR",
        "start":"2003-01-01T00:00:00Z",
        "end":"2013-01-01T00:00:00Z",
        "before":93,
        "after":0,
        "between":18}}}}

This example demonstrates a few things, not only range faceting:

  • /mb_releases is a request handler using dismax to query appropriate release fields.
  • q=smashing indicates that we're faceting on a keyword search instead of all the documents. We kept rows at zero, which is unrealistic but harmless here, because the rows setting does not affect facets.
  • The facet start date was specified using the field-specific syntax for demonstration purposes. You would do this with every parameter if you need to range-facet on other fields with different settings; otherwise, don't bother.
  • The start and end values below the facet counts give the lower bound of the first range and the upper bound of the last range. The end value may or may not be the same as facet.range.end (see facet.range.hardend, explained in the next section).
  • The before, after, and between counts are present because we specified facet.range.other=all. We'll see shortly what this means.

The results of our facet range query show that there were three releases in 2006 and eleven in 2007. The later years all show zero, since the indexed data is out of date at this point.
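
Note that the counts array in the JSON response is a flat list that alternates between a range's lower bound and its count. The following Python sketch pairs them up so they are easier to work with. The values are copied from the response above; the names pair_counts and by_year are our own, purely for illustration.

```python
# Flat "counts" list copied from the response above:
# lower bound, count, lower bound, count, ...
counts = [
    "2003-01-01T00:00:00Z", 2,
    "2004-01-01T00:00:00Z", 1,
    "2005-01-01T00:00:00Z", 1,
    "2006-01-01T00:00:00Z", 3,
    "2007-01-01T00:00:00Z", 11,
    "2008-01-01T00:00:00Z", 0,
    "2009-01-01T00:00:00Z", 0,
    "2010-01-01T00:00:00Z", 0,
    "2011-01-01T00:00:00Z", 0,
    "2012-01-01T00:00:00Z", 0,
]


def pair_counts(flat):
    """Turn [label, count, label, count, ...] into [(label, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


# Key each count by the year at the start of its range.
by_year = {label[:4]: n for label, n in pair_counts(counts)}

print(by_year["2006"], by_year["2007"])  # prints: 3 11
print(sum(by_year.values()))             # prints: 18, the "between" count
```

The sum of the individual range counts equals the between value in the response, which is a handy sanity check.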

Here is another example, this time using range faceting on a number—MusicBrainz track durations (in seconds). The URL is http://localhost:8983/solr/mbtracks/mb_tracks?wt=json&omitHeader=true&rows=0&facet.range.other=after&facet=true&q=Geek&facet.range.start=0&facet.range=t_duration&facet.range.end=240&facet.range.gap=60.

This is the response:

{"response":{"numFound":552,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "t_duration":{
        "counts":[
          "0",128,
          "60",64,
          "120",111,
          "180",132],
        "gap":60,
        "start":0,
        "end":240,
        "after":117}}}}

Taking the first facet, we see that there are 128 tracks that are 0–59 seconds long, given the keyword search "Geek".
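
If you assemble such URLs in code, it is safer to let a library do the URL encoding. Here is a minimal Python sketch that builds the query strings for both examples, assuming the same field names as above. The helper range_facet_query is hypothetical, and for simplicity it uses the global form of every parameter instead of the field-specific start used in the first example.

```python
from urllib.parse import urlencode


def range_facet_query(field, start, end, gap, q, other=None):
    """Return an encoded query string for a range facet on one field."""
    params = [
        ("q", q),
        ("rows", 0),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
    ]
    if other is not None:
        params.append(("facet.range.other", other))
    # urlencode percent-encodes reserved characters, so "+" becomes "%2B".
    return urlencode(params)


durations = range_facet_query("t_duration", 0, 240, 60, "Geek", other="after")
dates = range_facet_query(
    "r_event_date_earliest", "NOW/YEAR-10YEARS", "NOW/YEAR", "+1YEAR",
    "smashing", other="all",
)
print(durations)
print(dates)
```

The leading + of the date gap must reach Solr as %2B, because an unencoded + in a query string is decoded as a space.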

Range facet parameters

All of the range faceting parameters start with facet.range. As with most other faceting parameters, they can be made field specific in the same way. The parameters are explained as follows:

  • facet.range: You must set this parameter to a field's name to range-facet on that field. The trie-based numeric and date field types (those starting with t, as in tlong and tdate) perform best, but others will work. Repeat this parameter for each field to be faceted on.

    Note

    The remainder of these range faceting parameters can be specified on a per-field basis in the same fashion as field-value faceting parameters can. For example, f.r_event_date_earliest.facet.range.start.

  • facet.range.start: This is mandatory. It is a number or date to specify the start of the range to facet on. For dates, see the Date math section in Chapter 5, Searching. Using NOW with some Solr date math is quite effective as in this example: NOW/YEAR-5YEARS, interpreted as five years ago, starting at the beginning of the year.
  • facet.range.end: This is mandatory. It is a number or date to specify the end of the range. It has the same syntax as facet.range.start. Note that the actual end of the range may be different (see facet.range.hardend).
  • facet.range.gap: This is also mandatory. It specifies the interval to divide the range. For dates, it uses a subset of Solr's Date Math syntax, as it's a time duration and not a particular time. It should always start with a +. For example, +1YEAR or +1MINUTE+30SECONDS. Note that after URL encoding, + becomes %2B.

    Note

    Note that for dates, the facet.range.gap is not necessarily a fixed length of time. For example, +1MONTH is different depending on the month.

  • facet.range.hardend: This parameter instructs Solr on what to do when facet.range.gap does not divide evenly into the facet range (from start to end). If this is true, then the last range will be shortened, and the end value in the facet results will be the same as facet.range.end. Otherwise, by default, the end is effectively increased so that all the ranges are equal in size according to the gap value. The default value is false.
  • facet.range.other: This parameter adds more faceting counts depending on its value. It can be specified multiple times. See the example using this at the start of this section. It defaults to none.
    • before: Count of documents before the faceted range
    • after: Count of documents following the faceted range
    • between: Count of documents within the faceted range
    • none (disabled): The default
    • all: Shortcut for all three (before, between, and after)
  • facet.range.include: This specifies which range boundaries are inclusive. The choices are lower, upper, edge, outer, and all (all being equivalent to all the others). This parameter can be set multiple times to combine choices and defaults to lower. Instead of defining each value, we will describe when a given boundary is inclusive:
    • The lower boundary of a gap-based range is included if lower is specified. It is also included if it's the first gap range and edge is specified.
    • The upper boundary of a gap-based range is included if upper is specified. It is also included if it's the last gap range and edge is specified.
    • The upper boundary of the before range is included if the boundary is not already included by the first gap-based range. It's also included if outer is specified.
    • The lower boundary of the after range is included if the boundary is not already included by the last gap-based range. It's also included if outer is specified.

      Tip

      Avoid double counting

      The default facet.range.include of lower ensures that an indexed value occurring at a range boundary is counted in exactly one of the adjacent ranges. This is usually desirable, but your requirements may differ. To ensure you don't double count, don't choose both lower and upper together and don't choose outer.
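
To make the effect of facet.range.hardend and the default facet.range.include of lower more concrete, here is a small Python simulation for numeric values. It is only a sketch of the behavior described above, not Solr's implementation, and the function names and sample values are made up for illustration.

```python
def gap_ranges(start, end, gap, hardend=False):
    """Return (ranges, effective_end) for a numeric range facet."""
    ranges = []
    low = start
    while low < end:
        high = low + gap
        if high > end:
            if hardend:
                high = end  # shorten the last range to stop at end
            else:
                end = high  # stretch end so the last range is a full gap
        ranges.append((low, high))
        low = high
    return ranges, end


def facet_lower_inclusive(values, start, end, gap, hardend=False):
    """Count values per range as with include=lower and other=all."""
    ranges, end = gap_ranges(start, end, gap, hardend)
    # include=lower: the lower bound is inclusive, the upper bound exclusive.
    counts = {
        low: sum(1 for v in values if low <= v < high)
        for low, high in ranges
    }
    return {
        "counts": counts,
        "end": end,
        "before": sum(1 for v in values if v < start),
        "after": sum(1 for v in values if v >= end),
        "between": sum(counts.values()),
    }


values = [30, 60, 60, 119, 240, 250, 300]  # made-up durations in seconds
soft = facet_lower_inclusive(values, 0, 250, 60)
hard = facet_lower_inclusive(values, 0, 250, 60, hardend=True)
print(soft["end"], soft["counts"][240], soft["after"])  # prints: 300 2 1
print(hard["end"], hard["counts"][240], hard["after"])  # prints: 250 1 2
```

In both runs, every value is counted exactly once across before, the gap ranges, and after, which is the property the tip above is about.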
