Collapsing and expanding

The collapse query parser and expand search component are two related features that arrived in Solr 4.7 as an alternative to Solr's Result Grouping feature (group=true). First, we'll describe these two features and then compare them to Result Grouping.

The Collapse query parser

The collapse query parser filters search results so that only one document is returned out of all of those for a given field's value. Said differently, it collapses search results to one document per group of those with the same field value. This query parser is a special type called a post-filter, which can only be used as a filter query because it needs to see the results of all other filter queries and the main query. To pick which document of a group is returned, it chooses the highest scoring one by default, but it can be configured to choose the document with the highest or lowest value of a field or function query.

An excerpt of the query in action is this filter query: fq={!collapse field=t_a_id}. A complete example will be shown soon. There are only a few parameters:

  • field: This refers to the field to group documents by, which should be single-valued and ideally have DocValues enabled—the same requirement and recommendation for a field that you sort on.
  • min or max: This is either a field or function query that yields a ranking value used to choose which document to return in a grouped set. For min, the document with the smallest value is chosen, and for max, the largest. If your function query needs to be computed based on the document's score, refer to that via cscore().
  • nullPolicy: This refers to the policy on how to treat blank/null values for the group field. It can be ignore, collapse, or expand. By default, documents having no value are ignored (filtered out). If nullPolicy is set to collapse, the documents with no value in this field are treated as one group and therefore one document will be chosen from them. If this parameter is expand then all of these documents are returned (they aren't collapsed).
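
As a minimal sketch of how these parameters combine into a single filter query, the following Python helper assembles the local-params string. The helper name collapse_fq and the field t_duration are hypothetical, chosen only for illustration, and the sketch assumes that parameter values contain no whitespace (which would otherwise need quoting).

```python
def collapse_fq(field, min_expr=None, max_expr=None, null_policy=None):
    """Build a collapse filter query string (hypothetical helper)."""
    # min and max are alternatives: only one ranking rule makes sense.
    if min_expr is not None and max_expr is not None:
        raise ValueError("use either min or max, not both")
    # The three policies described above; None means "use Solr's default".
    if null_policy not in (None, "ignore", "collapse", "expand"):
        raise ValueError("unknown nullPolicy: %r" % (null_policy,))

    parts = ["field=" + field]
    if min_expr is not None:
        parts.append("min=" + min_expr)
    if max_expr is not None:
        parts.append("max=" + max_expr)
    if null_policy is not None:
        parts.append("nullPolicy=" + null_policy)
    return "{!collapse " + " ".join(parts) + "}"


# Default behavior: keep the highest scoring document per artist.
print(collapse_fq("t_a_id"))
# Keep the document with the largest value of a (hypothetical) field,
# and return documents without an artist ID uncollapsed.
print(collapse_fq("t_a_id", max_expr="t_duration", null_policy="expand"))
```

The resulting string is what you would pass as the value of an fq parameter.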

The Expand component

The expand search component augments the response to return more documents from the groups that were collapsed. It can also be used without collapsing by similarly returning other documents that share the field values found in the main search results. This information is in its own part of Solr's response format, quite unlike the Result Grouping format. Here are the parameters:

  • expand: This is set to true. Generally, it's the only required parameter.
  • expand.field: This is the field whose values are used to expand the documents in the main result list. It's inferred if you use the collapse query parser; otherwise it is required.
  • expand.sort: This overrides the sort parameter for use in expanding.
  • expand.rows: This defines how many rows to return for each group. Defaults to 5.
  • expand.q: This overrides the q parameter for the expanded results.
  • expand.fq: This overrides the fq parameters for the expanded results.
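
One way to assemble such a request programmatically is sketched below in Python, using only the standard library. The helper name expand_params is hypothetical; the handler URL and field names are taken from the example that follows, and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode


def expand_params(rows=None, field=None, sort=None):
    """Return the expand component's parameters as a dict (hypothetical helper)."""
    params = {"expand": "true"}  # the only parameter that is generally required
    if rows is not None:
        params["expand.rows"] = str(rows)
    if field is not None:
        # Only needed when the collapse query parser is not used.
        params["expand.field"] = field
    if sort is not None:
        params["expand.sort"] = sort
    return params


base = "http://localhost:8983/solr/mbtracks/mb_tracks"
params = {
    "wt": "json",
    "q": '"Cherub Rock"',
    "rows": "2",
    "fq": "{!collapse field=t_a_id}",
}
params.update(expand_params(rows=1))

# urlencode percent-encodes the braces, quotes, and spaces for us.
url = base + "?" + urlencode(params)
print(url)
```

Letting urlencode handle the escaping avoids hand-encoding the local-params braces and the quoted phrase.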

An example

Here's a quick example using MusicBrainz track data collapsing by artist. The query is Cherub Rock (a song/track name). We expand to show one additional document in each group:

http://localhost:8983/solr/mbtracks/mb_tracks?wt=json
&q="Cherub+Rock"&fl=score,id,t_a_id,t_a_name,t_name,t_r_name
&rows=2
&fq={!collapse field=t_a_id}
&expand=true&expand.rows=1

And here's the response:

{
  "responseHeader":{
    "status":0,
    "QTime":20},
  "response":{"numFound":22575,"start":0,"maxScore":15.757925,"docs":[
  {
    "id":"Track:414903",
    "t_name":"Cherub Rock",
    "t_a_id":11650,
    "t_a_name":"The Smashing Pumpkins",
    "t_r_name":"Cherub Rock",
    "score":15.757925},
  {
    "id":"Track:6855353",
    "t_name":"Cherub Rock",
    "t_a_id":33677,
    "t_a_name":"Razed in Black",
    "t_r_name":"Cherub Rock: A Gothic-Industrial Tribute to the Smashing Pumpkins",
    "score":14.348505}]
},
"expanded":{
  "33677":{"numFound":1,"start":0,"maxScore":0.13129683,"docs":[
    {
      "id":"Track:4034054",
      "t_name":"Share This Poison",
      "t_a_id":33677,
      "t_a_name":"Razed in Black",
      "t_r_name":"Rock Sound: Music With Attitude, Volume 52",
      "score":0.13129683}]
  },
  "11650":{"numFound":91,"start":0,"maxScore":12.960967,"docs":[
    {
      "id":"Track:7413518",
      "t_name":"Cherub Rock",
      "t_a_id":11650,
      "t_a_name":"The Smashing Pumpkins",
      "t_r_name":"Guitar Hero™ III: Legends of Rock Companion Pack",
      "score":12.960967}]
  }}}

The effect of collapsing is generally straightforward, and it has no impact on the response format. Interpreting the expanded section can be confusing. Firstly, as you can see, the ordering of the expanded groups isn't significant; it's not the same as the main results. Next, understand that each part underneath the expanded section is a mini result list keyed by the group field value. The first one shown is for field value 33677, and it says numFound is 1. Since the main result list already contains one document for this group, a total of two documents matching the query have this field value. Likewise, 92 (91 + 1) documents have the field value 11650.
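
The arithmetic above can be sketched in Python against a trimmed copy of the response. The function name group_totals is hypothetical, and the sketch assumes the response has already been parsed from JSON into dictionaries. Note that the keys of the expanded section are strings, while t_a_id in the documents is a number.

```python
# Trimmed copy of the response shown above (only the fields we need).
response = {
    "response": {"numFound": 22575, "docs": [
        {"id": "Track:414903", "t_a_id": 11650},
        {"id": "Track:6855353", "t_a_id": 33677},
    ]},
    "expanded": {
        "33677": {"numFound": 1, "docs": [{"id": "Track:4034054"}]},
        "11650": {"numFound": 91, "docs": [{"id": "Track:7413518"}]},
    },
}


def group_totals(resp, field):
    """Total matching documents per group: the collapsed document plus
    the numFound of that group's entry in the expanded section."""
    expanded = resp.get("expanded", {})
    totals = {}
    # Iterate the main results so the output follows their ordering,
    # since the ordering of the expanded section isn't significant.
    for doc in resp["response"]["docs"]:
        key = str(doc[field])  # expanded keys are strings
        extra = expanded.get(key, {}).get("numFound", 0)
        totals[key] = 1 + extra
    return totals


print(group_totals(response, "t_a_id"))
```

A group with no entry in the expanded section simply counts as one document, the collapsed one.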

Compared to Result grouping

Result grouping, also known as field collapsing or simply grouping, has been around since Solr 3 and is somewhat obsoleted by collapse and expand. It's technically built into the query component instead of being its own component. For comparison purposes, here is a group query equivalent to the previous example:

http://localhost:8983/solr/mbtracks/mb_tracks?wt=json
&q="Cherub+Rock"
&fl=score,id,t_a_id,t_a_name,t_name,t_r_name&rows=2
&group=true&group.field=t_a_id&group.ngroups=true&group.limit=2

And here is the result:

{
  "responseHeader":{
    "status":0,
    "QTime":49},
  "grouped":{
    "t_a_id":{
      "matches":105155,
      "ngroups":22575,
      "groups":[{
          "groupValue":11650,
          "doclist":{"numFound":92,"start":0,"maxScore":15.757925,"docs":[
              {
                "id":"Track:414903",
                "t_name":"Cherub Rock",
                "t_a_id":11650,
                "t_a_name":"The Smashing Pumpkins",
                "t_r_name":"Cherub Rock",
                "score":15.757925},
              {
                "id":"Track:7413518",
                "t_name":"Cherub Rock",
                "t_a_id":11650,
                "t_a_name":"The Smashing Pumpkins",
                "t_r_name":"Guitar Hero™ III: Legends of Rock Companion Pack",
                "score":12.960967}]
          }},
        {
          "groupValue":33677,
          "doclist":{"numFound":2,"start":0,"maxScore":14.348505,"docs":[
              {
                "id":"Track:6855353",
                "t_name":"Cherub Rock",
                "t_a_id":33677,
                "t_a_name":"Razed in Black",
                "t_r_name":"Cherub Rock: A Gothic-Industrial Tribute to the Smashing Pumpkins",
                "score":14.348505},
              {
                "id":"Track:4034054",
                "t_name":"Share This Poison",
                "t_a_id":33677,
                "t_a_name":"Razed in Black",
                "t_r_name":"Rock Sound: Music With Attitude, Volume 52",
                "score":0.13129683}]
          }}]}}}

The beginning of the grouped section shows that a grouped response has a fairly different structure than a regular one. The matches number is 105155, which is equivalent to numFound if grouping weren't enabled: the number of matching documents. ngroups is 22575, which is the number of groups found. Each group begins by showing the group's value and then a document list structure that looks just like normal search results.
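
To illustrate how a client might walk this structure, here is a minimal Python sketch over a trimmed copy of the grouped response. The function name summarize_groups is hypothetical, and the sketch assumes the JSON has already been parsed into dictionaries.

```python
# Trimmed copy of the grouped response shown above.
grouped_response = {
    "grouped": {
        "t_a_id": {
            "matches": 105155,
            "ngroups": 22575,
            "groups": [
                {"groupValue": 11650,
                 "doclist": {"numFound": 92, "docs": [
                     {"id": "Track:414903"}, {"id": "Track:7413518"}]}},
                {"groupValue": 33677,
                 "doclist": {"numFound": 2, "docs": [
                     {"id": "Track:6855353"}, {"id": "Track:4034054"}]}},
            ],
        }
    }
}


def summarize_groups(resp, field):
    """Return (group value, documents in group, returned IDs) per group."""
    section = resp["grouped"][field]
    summary = []
    for group in section["groups"]:
        doclist = group["doclist"]  # looks like a normal result list
        ids = [doc["id"] for doc in doclist["docs"]]
        summary.append((group["groupValue"], doclist["numFound"], ids))
    return summary


for value, total, ids in summarize_groups(grouped_response, "t_a_id"):
    print(value, total, ids)
```

Unlike the expanded section of a collapsed response, each group's numFound here already includes the leading document, so no addition is needed.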

Result grouping is often much slower than collapse and expand, particularly when the number of possible groups is high in relation to the number of documents, as in the preceding example. The difference is even more dramatic if you only need the top document, since you needn't use the expand component. Nevertheless, Result grouping is not quite obsolete because it has some unique features over collapse and expand:

  • One request can group results multiple times for different fields
  • One request can hold multiple queries to independently get results for (group.query)
  • It can group based on the value returned from a function query versus being limited to a field's value
  • It can instruct the faceting component to facet on the leading document as if it had all field values in its group (group.facet)

If you'd like to learn more about Result Grouping and its parameters, see the Solr Reference Guide at https://cwiki.apache.org/confluence/display/solr/Result+Grouping.
