Excluding filters – multiselect faceting

Consider a scenario where you are implementing faceted navigation and you want to let the user select several values of a field to filter on instead of just one. Typically, when an individual facet value is chosen, it becomes a filter. For a single-valued field, that filter makes subsequent faceting on the same field almost pointless, because it removes every document that could contribute to the other facet values. In this scenario, we'd like to exclude this filter when computing the counts for this facet field.


The preceding screenshot is from http://search-lucene.com, where you can search across mailing lists, API documentation, and other sources of information about Lucene, Solr, and related projects. It shows that users can filter results on more than one type of information at the same time by selecting as many check boxes as they like.

We'll demonstrate the problem that multiselect faceting solves with a MusicBrainz example and then show how to solve it.

Here is a search for releases containing smashing, faceting on r_type. We'll leave rows at 0 for brevity, but observe the numFound value nonetheless. At this point, the user has not chosen a filter (therefore there is no fq):

http://localhost:8983/solr/mbreleases/mb_releases?indent=on&wt=json&omitHeader=true&rows=0&q=smashing&facet=on&facet.field=r_type&facet.mincount=1&facet.sort=index

The output of the preceding URL is as follows:

{"response":{"numFound":248,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "r_type":[
        "Album",29,
        "Compilation",41,
        "EP",7,
        "Interview",3,
        "Live",95,
        "Other",19,
        "Remix",1,
        "Single",45,
        "Soundtrack",1]},
    "facet_dates":{},
    "facet_ranges":{}}}
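If you drive these requests from application code rather than a browser, the query string can be assembled with a URL-encoding library instead of by hand. Here is a minimal Python sketch using only the standard library. The base URL and parameters are copied from the request above; the helper name `releases_url` is our own, chosen purely for illustration.

```python
from urllib.parse import urlencode

# Base URL of the example request handler (copied from the request above).
BASE = "http://localhost:8983/solr/mbreleases/mb_releases"

def releases_url(query, extra=()):
    """Build the r_type faceting request; `extra` holds additional (name, value) pairs such as fq."""
    params = [
        ("indent", "on"),
        ("wt", "json"),
        ("omitHeader", "true"),
        ("rows", "0"),             # only counts are needed, not documents
        ("q", query),
        ("facet", "on"),
        ("facet.field", "r_type"),
        ("facet.mincount", "1"),   # hide values with zero matches
        ("facet.sort", "index"),   # sort facet values by their index order
    ]
    params.extend(extra)
    # urlencode percent-encodes each name and value and joins them with "&".
    return BASE + "?" + urlencode(params)

print(releases_url("smashing"))
```

Passing `extra=[("fq", "r_type:Album")]` appends `&fq=r_type%3AAlbum`. A list of pairs is used rather than a dict because Solr accepts repeated parameters, such as several fq or facet.field values in one request.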

Now the user chooses the Album facet value, which adds a filter query. The URL is now the same as before but with &fq=r_type%3AAlbum appended, and it produces this output:

{"response":{"numFound":29,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "r_type":[
        "Album",29]},
    "facet_dates":{},
    "facet_ranges":{}}}

Notice that the other r_type facet counts are gone because of the filter, yet in this scenario we want them, so that the user can see what each count would be if the filter weren't there. The reduced numFound of 29 is good, though, because at this moment the user did indeed filter on a value.

Solr can solve this problem with some additional metadata on both the filter query and the facet field reference, using local-params. The local-params syntax was described in Chapter 5, Searching, where it appears at the beginning of a query to switch the query parser and to supply parameters to it. As you're about to see, it can also be supplied at the start of facet.field, which is perhaps a bit of a hack to implement this feature. The previous example would change as follows:

  • fq would now be {!tag=foo}r_type:Album
  • facet.field would now be {!ex=foo}r_type

    Note

    Remember to URL encode this added syntax when used in the URL. The only problem character is =, which becomes %3D.
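A general-purpose encoding function will typically escape more than just =, which is harmless because Solr decodes the parameter before parsing it. Here is a small Python sketch, using only the standard library, that encodes the two local-params values from this example and checks that decoding restores them unchanged.

```python
from urllib.parse import quote_plus, unquote_plus

fq = "{!tag=foo}r_type:Album"
facet_field = "{!ex=foo}r_type"

# quote_plus percent-encodes every reserved character, so "=" becomes %3D
# and the braces, "!" and ":" are encoded as well.
encoded_fq = quote_plus(fq)
encoded_ff = quote_plus(facet_field)
print(encoded_fq)  # %7B%21tag%3Dfoo%7Dr_type%3AAlbum
print(encoded_ff)  # %7B%21ex%3Dfoo%7Dr_type

# Decoding restores the original local-params syntax exactly.
assert unquote_plus(encoded_fq) == fq
assert unquote_plus(encoded_ff) == facet_field
```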

Now, we will explain each parameter of the previous example:

  • The tag parameter is a local parameter to give an arbitrary label to this filter query.
  • The tag name foo was an arbitrarily chosen name to illustrate that it doesn't matter what it's named. If multiple fields and filter queries are to be tagged correspondingly, then you would probably use the field name as the tag name to differentiate them consistently.
  • The ex parameter is a local parameter on a facet field that refers to tagged filter queries to be excluded in the facet count. Multiple tags can be referenced with commas separating them. For example, {!ex=t1,t2,t3}r_type.
  • Not shown here is an advanced option: an optional facet field local-param called key, which provides an alternative label for the field in the response. By giving each reference its own name, the same field can be faceted on multiple times, each with different filter query exclusions.
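To make the effect of tag and ex concrete, here is a toy in-memory model in Python. It is not how Solr computes facets internally; it only mimics the observable behavior described above: numFound honors every filter, while a facet field's counts ignore the filters whose tags it excludes. The documents and the helper names `matching` and `facet_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical documents that already match the main query q.
docs = [
    {"r_type": "Album"}, {"r_type": "Album"},
    {"r_type": "Live"}, {"r_type": "Single"},
]

# Filter queries keyed by their tag; each filter is a predicate on a document.
filters = {"foo": lambda d: d["r_type"] == "Album"}

def matching(docs, filters, exclude=()):
    """Documents passing every filter whose tag is not listed in `exclude`."""
    active = [f for tag, f in filters.items() if tag not in exclude]
    return [d for d in docs if all(f(d) for f in active)]

def facet_counts(docs, filters, field, exclude=()):
    """Count field values, ignoring the filters tagged in `exclude` (like {!ex=...})."""
    return Counter(d[field] for d in matching(docs, filters, exclude))

num_found = len(matching(docs, filters))                        # every fq applies
plain = facet_counts(docs, filters, "r_type")                   # facet.field=r_type
multi = facet_counts(docs, filters, "r_type", exclude={"foo"})  # facet.field={!ex=foo}r_type
print(num_found, dict(plain), dict(multi))
# 2 {'Album': 2} {'Album': 2, 'Live': 1, 'Single': 1}
```

Without the exclusion, only Album survives, which mirrors the filtered response shown earlier; with it, the other values reappear while numFound stays filtered.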

The new complete URL (shown unencoded for readability) is:

http://localhost:8983/solr/mbreleases/mb_releases?indent=on&wt=json&omitHeader=true&rows=0&q=smashing&facet=on&facet.field={!ex=foo}r_type&facet.mincount=1&facet.sort=index&fq={!tag=foo}r_type:Album

And here is the output. The facet counts are back, but numFound remains at the filtered 29:

{"response":{"numFound":29,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "r_type":[
        "Album",29,
        "Compilation",41,
        "EP",7,
        "Interview",3,
        "Live",95,
        "Other",19,
        "Remix",1,
        "Single",45,
        "Soundtrack",1]},
    "facet_dates":{},
    "facet_ranges":{}}}

At this point, if the user chooses additional values from this facet, the filter query would be modified to allow for more possibilities, such as fq={!tag=foo}r_type:Album r_type:Other (not URL-escaped, for clarity). This filters for releases of type Album or Other, because the default Boolean operator of the query parser is OR.
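When the selected values come from user clicks, it helps to generate this filter query rather than concatenate strings by hand. Here is a minimal Python sketch; the helper names `escape` and `tagged_fq` are hypothetical, and the set of escaped characters is an assumption modeled loosely on the special characters of the standard query syntax (similar in spirit to SolrJ's `ClientUtils.escapeQueryChars`).

```python
import re

# Assumed set of query-syntax characters (plus whitespace) to backslash-escape.
_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&;/\s])')

def escape(value):
    """Backslash-escape query syntax characters in a single facet value."""
    return _SPECIAL.sub(r"\\\1", value)

def tagged_fq(field, selected, tag):
    """Combine the selected values of one facet field into a single tagged filter query."""
    # Clauses are space-separated, so they are ORed only under the default operator.
    clauses = " ".join(f"{field}:{escape(v)}" for v in selected)
    return f"{{!tag={tag}}}{clauses}"

print(tagged_fq("r_type", ["Album", "Other"], tag="foo"))
# {!tag=foo}r_type:Album r_type:Other
```

Keeping all selections for a field in one tagged fq means the matching facet.field needs to exclude just that one tag. Joining the clauses with an explicit " OR " instead of a space would make the filter independent of the default operator.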
