Chapter 7. Faceting

Faceting is Solr's killer feature. It's a must-have for most search implementations, especially those with structured data, as in e-commerce. Yet few products have this capability, especially in open source. Of course, search fundamentals, including highlighting, are critical too, but they tend to be taken for granted. Faceting enhances search results with aggregated information over all documents matching the search query. It can answer questions about the MusicBrainz data such as:

  • How many releases are official, bootleg, or promotional?
  • What were the top five most common countries in which the releases occurred?
  • Over the past ten years, how many were released in each year?
  • How many releases have names in the ranges A-C, D-F, G-I, and so on?
  • How many tracks are < 2 minutes long, 2-3 minutes long, 3-4 minutes long, or longer?

    Note

    In a hurry?

    Faceting is a key feature. Look through the upcoming example, which demonstrates the most common type of faceting, and review the faceting types.

Faceting in the context of the user experience is often referred to as faceted navigation, but also faceted search, faceted browsing, guided navigation, or parametric search. The facets are typically displayed with clickable links that apply Solr filter queries to a subsequent search. Endeca's excellent UX Design Pattern Library contains many screenshots worth viewing. Visit http://www.oracle.com/webfolder/ux/applications/uxd/endeca/content/library/en/home.html and click on Faceted Navigation.

If we revisit the comparison of search technology to databases, then faceting is more or less analogous to SQL's GROUP BY feature on a column with count(*). However, in Solr, facet processing is performed subsequent to an existing search as part of a single request-response, with both the primary search results and the faceting results coming back together. In SQL you would need to perform a series of separate queries to get the same information. Furthermore, faceting works so fast that its search response time overhead is often negligible. For more information on why implementing faceting with relational databases is hard and doesn't scale, visit this old article at http://web.archive.org/web/20090321120327/http://www.kimbly.com/blog/000239.html.
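To make the SQL analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and the four rows are made up purely for illustration and are not taken from the MusicBrainz data. It shows that the rows and the per-value counts come from two separate queries, whereas Solr returns both in one response.

```python
import sqlite3

# Toy in-memory table; names and rows are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO releases VALUES (?, ?)",
    [("A", "Official"), ("B", "Official"), ("C", "Bootleg"), ("D", None)],
)

# Query 1: the "search results" (only the first two rows).
rows = conn.execute(
    "SELECT name FROM releases ORDER BY name LIMIT 2"
).fetchall()

# Query 2: the "facet counts", a separate GROUP BY over all rows.
counts = dict(
    conn.execute(
        "SELECT status, count(*) FROM releases GROUP BY status"
    ).fetchall()
)

print(rows)
print(counts)
```

Each additional field to be counted would need a further GROUP BY query in this approach.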

A quick example – faceting release types

Observe the following search results. The echoParams parameter is set to explicit (defined in solrconfig.xml) so that the search parameters are seen here. This example is using the default lucene query parser. The dismax query parser is more typical, but it has no bearing on these examples. The query parameter q is *:*, which matches all documents. In this case, the index only has releases, so there is no need to apply filters. Filter queries are used in conjunction with faceting a fair amount, so be sure you are familiar with them; see Chapter 5, Searching. To keep this example brief, we set rows to 2. Sometimes when using faceting, you only want the facet information and not the main search, so you would set rows to 0.

{"responseHeader":{
  "status":0,
  "QTime":3,
  "params":{
    "facet":"true",
    "f.r_official.facet.method":"enum",
    "f.r_official.facet.missing":"true",
    "facet.field":"r_official",
    "fq":"type:Release",
    "fl":"r_name",
    "q":"*:*",
    "wt":"json",
    "rows":"2"}},
"response":{"numFound":603090,"start":0,"docs":[
    {"r_name":"Texas International Pop Festival 11-30-69"},
    {"r_name":"40 Jahre"}]},
"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "r_official":[
      "Official",519168,
      "Bootleg",19559,
      "Promotion",16562,
      "Pseudo-Release",2819,
      null,44982]},
  "facet_dates":{},
  "facet_ranges":{}}}

Note

It's critical to understand that faceting numbers are computed over the entire search result—603,090 releases, which is all of the releases in this example—and not just the two rows being returned.

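The facet-related search parameters appear at the top of the response, in the params section of responseHeader: facet, facet.field, and the two parameters prefixed with f.r_official. The facet.missing parameter was set using the field-specific syntax, which will be explained shortly.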
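As a rough sketch, a request carrying the parameters echoed in the response above could be assembled like this in Python. The base URL (host, port, and core name) is an assumption and will differ in your setup; the parameter names and values are copied from the params section of the response.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the host, port, and core name are assumptions.
BASE_URL = "http://localhost:8983/solr/mbreleases/select"

params = [
    ("q", "*:*"),             # match all documents
    ("fq", "type:Release"),   # filter query, as echoed in the response
    ("fl", "r_name"),         # return only the release name
    ("rows", "2"),            # keep the result list short
    ("wt", "json"),           # JSON response format
    ("facet", "true"),        # turn faceting on
    ("facet.field", "r_official"),            # the field to facet on
    ("f.r_official.facet.missing", "true"),   # field-specific syntax
    ("f.r_official.facet.method", "enum"),    # field-specific syntax
]

url = BASE_URL + "?" + urlencode(params)
print(url)
```

A list of pairs is used instead of a dictionary so that a parameter such as facet.field could be repeated when faceting on more than one field.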

Notice that the facet results follow the main search result and are given the name facet_counts. In this example, we only faceted on one field, r_official, but you'll learn in a bit that you can facet on as many fields as you desire. Within "r_official" lie the facet counts for this field, as value and count pairs. The first value in a pair, such as "Official", holds a facet value, which is simply an indexed term, and the integer following it is the facet count: the number of documents in the search results containing that term. The last facet has a count but null in place of a name. It is a special facet, produced by facet.missing, that indicates how many documents in the results don't have any indexed terms for this field.
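Because the facet counts arrive as a flat list that alternates value and count, client code usually has to pair them up. The following is a minimal Python sketch of that step, using a trimmed copy of the response shown earlier (the docs list is emptied for brevity). The "(missing)" label is an arbitrary choice made here, not something Solr returns.

```python
import json

# Trimmed copy of the example response; only the parts needed here.
response_text = """
{"response": {"numFound": 603090, "start": 0, "docs": []},
 "facet_counts": {"facet_fields": {"r_official": [
     "Official", 519168,
     "Bootleg", 19559,
     "Promotion", 16562,
     "Pseudo-Release", 2819,
     null, 44982]}}}
"""

data = json.loads(response_text)
flat = data["facet_counts"]["facet_fields"]["r_official"]

# Even positions hold facet values, odd positions hold their counts.
pairs = list(zip(flat[0::2], flat[1::2]))

for value, count in pairs:
    # JSON null (Python None) marks the facet.missing entry.
    label = "(missing)" if value is None else value
    print(f"{label}: {count}")

total = sum(count for _, count in pairs)
print(total == data["response"]["numFound"])
```

In this particular response the counts, including the missing entry, add up to numFound. That is what one would expect when every document has at most one value for the field and facet.missing is enabled; it would not hold for a multi-valued field.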
