Calculating faceting for relevant documents in groups

If you have ever used the field-collapsing functionality of Solr, you might be wondering whether it is possible to use that functionality together with faceting. Of course it is, but by default faceting is still calculated on the basis of individual documents, not document groups. In this recipe, we will learn how to query Solr so that it returns facets calculated for the most relevant document in each group.

Getting ready

Before reading this recipe, take a look at the Grouping documents by the field value, Grouping documents by the query value, and Grouping documents by the function value recipes in Chapter 8, Using Additional Functionalities. Also, if you are not familiar with the faceting functionality, read the first three recipes of this chapter.

How to do it...

  1. In the first step, we need to create an index. For the purpose of this recipe, let's assume that we have the following index structure (just add the following section to your schema.xml file):
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="category" type="string" indexed="true" stored="true" />
    <field name="stock" type="boolean" indexed="true" stored="true" />
  2. The second step is to index the data. We will use the following example data:
    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Book 1</field>
      <field name="category">books</field>
      <field name="stock">true</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="name">Book 2</field>
      <field name="category">books</field>
      <field name="stock">true</field>
     </doc>
     <doc>
      <field name="id">3</field>
      <field name="name">Workbook 1</field>
      <field name="category">Workbooks</field>
      <field name="stock">false</field>
     </doc>
     <doc>
      <field name="id">4</field>
      <field name="name">Workbook 2</field>
      <field name="category">Workbooks</field>
      <field name="stock">true</field>
     </doc>
    </add>
  3. Now it's time for our query. Let's assume that we want our results grouped on the values of the category field and want faceting to be calculated on the stock field. Also remember that, when it comes to faceting, we are only interested in the most relevant document from each result group. The query that tells Solr to do what we want looks as follows:
    http://localhost:8983/solr/cookbook/select?q=*:*&facet=true&facet.field=stock&group=true&group.field=category&group.truncate=true

    The results for the preceding query would look as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
        <str name="q">*:*</str>
        <str name="facet.field">stock</str>
        <str name="group.truncate">true</str>
        <str name="facet">true</str>
        <str name="group.field">category</str>
        <str name="group">true</str>
      </lst>
    </lst>
    <lst name="grouped">
      <lst name="category">
        <int name="matches">4</int>
        <arr name="groups">
          <lst>
            <str name="groupValue">books</str>
            <result name="doclist" numFound="2" start="0">
              <doc>
                <str name="id">1</str>
                <str name="name">Book 1</str>
                <str name="category">books</str>
                <bool name="stock">true</bool>
                <long name="_version_">1487145087213240320</long>
              </doc>
            </result>
          </lst>
          <lst>
            <str name="groupValue">Workbooks</str>
            <result name="doclist" numFound="2" start="0">
              <doc>
                <str name="id">3</str>
                <str name="name">Workbook 1</str>
                <str name="category">Workbooks</str>
                <bool name="stock">false</bool>
                <long name="_version_">1487145087281397760</long>
              </doc>
            </result>
          </lst>
        </arr>
      </lst>
    </lst>
    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields">
        <lst name="stock">
          <int name="false">1</int>
          <int name="true">1</int>
        </lst>
      </lst>
      <lst name="facet_dates"/>
      <lst name="facet_ranges"/>
      <lst name="facet_intervals"/>
    </lst>
    </response>
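
If you drive this recipe from a script rather than a browser, the request and the facet section of the response can be handled with Python's standard library. The following is a minimal sketch, assuming a Solr instance at http://localhost:8983 with a core named cookbook (as in the URL above). The helper names build_grouped_facet_url and parse_facet_counts are hypothetical, and SAMPLE_RESPONSE is a trimmed copy of the facet_counts section of the response shown above, not live output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint, taken from the example URL in this recipe.
SOLR_SELECT = "http://localhost:8983/solr/cookbook/select"


def build_grouped_facet_url(group_field, facet_field, truncate=True):
    """Hypothetical helper: builds the grouped, faceted query used in this recipe."""
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("group", "true"),
        ("group.field", group_field),
        ("group.truncate", "true" if truncate else "false"),
    ]
    # Keep '*' and ':' unescaped so the URL matches the one shown in the recipe.
    return SOLR_SELECT + "?" + urlencode(params, safe="*:")


def parse_facet_counts(response_xml, facet_field):
    """Hypothetical helper: returns {value: count} for one facet field."""
    root = ET.fromstring(response_xml)
    path = ("lst[@name='facet_counts']/lst[@name='facet_fields']"
            f"/lst[@name='{facet_field}']")
    field = root.find(path)
    if field is None:
        # The field was not faceted in this response.
        return {}
    return {item.get("name"): int(item.text) for item in field.findall("int")}


# Trimmed copy of the facet section from the response above.
SAMPLE_RESPONSE = """<response>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="stock">
        <int name="false">1</int>
        <int name="true">1</int>
      </lst>
    </lst>
  </lst>
</response>"""

url = build_grouped_facet_url("category", "stock")
counts = parse_facet_counts(SAMPLE_RESPONSE, "stock")
print(url)
print(counts)
```

Against a running server, you could fetch the response with urllib.request.urlopen(url).read() and pass the resulting bytes to parse_facet_counts, since ElementTree.fromstring accepts bytes as well as strings.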

As you can see, everything worked as expected. Now let's see how it works in detail.

How it works...

Our data is very simple. As you can see in the field definition section of the schema.xml file and the example data, every document is described by four fields:

  • id
  • name
  • category
  • stock

I think their names speak for themselves, so I won't discuss them further.

When it comes to the query, we fetch all the documents from the index (the q=*:* parameter). Next, we turn on faceting and ask for it to be calculated on the stock field (facet=true and facet.field=stock). We also activate the grouping mechanism and group documents on the basis of the category field (group=true and group.field=category). All the query parameters responsible for faceting and grouping behavior are described in the appropriate recipes in this book, so take a look at those if you are not familiar with them. Finally, there is something new: the group.truncate parameter, which we set to true. When it is set to true, facet counts are calculated using only the most relevant document in each of the calculated groups.

In our case, the most relevant document in the books group (for this data, the first document in the group, with id 1) has the value true in the stock field, and the most relevant document in the Workbooks group (id 3) has the value false. As a result, we get two facet counts for the stock field, each equal to 1, which is exactly what we expected. Without group.truncate, faceting would be calculated on all four documents, giving a count of 3 for true and 1 for false.
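
To make the counting rule concrete, here is a small Python model of the behavior described above, applied to the four example documents. It is a sketch under stated assumptions, not Solr's implementation: because every document matches q=*:* with the same score, the head of each group is modeled as the first document in index order. The function name facet_counts is hypothetical.

```python
from collections import Counter

# The four example documents, in the order they were indexed.
DOCS = [
    {"id": "1", "name": "Book 1", "category": "books", "stock": "true"},
    {"id": "2", "name": "Book 2", "category": "books", "stock": "true"},
    {"id": "3", "name": "Workbook 1", "category": "Workbooks", "stock": "false"},
    {"id": "4", "name": "Workbook 2", "category": "Workbooks", "stock": "true"},
]


def facet_counts(docs, facet_field, group_field=None, truncate=False):
    """Count facet values, optionally using only the head document of each group.

    Assumption: all documents have the same score (as with q=*:*), so the
    most relevant document in a group is modeled as the first one in index order.
    """
    if truncate and group_field is not None:
        heads = {}
        for doc in docs:
            # setdefault keeps only the first document seen for each group value.
            heads.setdefault(doc[group_field], doc)
        docs = list(heads.values())
    return dict(Counter(doc[facet_field] for doc in docs))


print(facet_counts(DOCS, "stock"))
print(facet_counts(DOCS, "stock", group_field="category", truncate=True))
```

The first call mirrors default faceting (every matching document counts), while the second mirrors group.truncate=true, where only one document per category value contributes to the stock counts.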
