Using decision tree faceting

Imagine that our store has products divided into categories, and that we also record whether each item is in stock. Now we want to show our crew, for each category, how many products are in stock and how many are not. The first thing that comes to mind is to use the faceting mechanism and some additional calculation. But why bother, when Solr 4.0 and later can do that calculation for us with so-called pivot faceting? This recipe will show you how to use it.

How to do it...

  1. We start by defining an index structure that is easy to work with. We do this by adding the following field definitions to the schema.xml file:
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="category" type="string" indexed="true" stored="true" />
    <field name="stock" type="boolean" indexed="true" stored="true" />
  2. Now, let's index the following example data:
    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Book 1</field>
      <field name="category">books</field>
      <field name="stock">true</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="name">Book 2</field>
      <field name="category">books</field>
      <field name="stock">true</field>
     </doc>
     <doc>
      <field name="id">3</field>
      <field name="name">Workbook 1</field>
      <field name="category">workbooks</field>
      <field name="stock">false</field>
     </doc>
     <doc>
      <field name="id">4</field>
      <field name="name">Workbook 2</field>
      <field name="category">workbooks</field>
      <field name="stock">true</field>
     </doc>
    </add>
  3. Let's assume that we are running a query from the administration panel of our shop and we are not interested in the documents at all. We only want to know how many documents are in stock and how many are not for each of the categories. The query implementing this logic should look as follows:
    http://localhost:8983/solr/cookbook/select?q=*:*&rows=0&facet=true&facet.pivot=category,stock

    The response to the preceding query is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
     <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
       <str name="q">*:*</str>
       <str name="facet.pivot">category,stock</str>
       <str name="rows">0</str>
       <str name="facet">true</str>
      </lst>
     </lst>
     <result name="response" numFound="4" start="0">
     </result>
     <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields"/>
      <lst name="facet_dates"/>
      <lst name="facet_ranges"/>
      <lst name="facet_pivot">
       <arr name="category,stock">
        <lst>
         <str name="field">category</str>
         <str name="value">books</str>
         <int name="count">2</int>
         <arr name="pivot">
          <lst>
           <str name="field">stock</str>
           <bool name="value">true</bool>
           <int name="count">2</int>
          </lst>
         </arr>
        </lst>
        <lst>
         <str name="field">category</str>
         <str name="value">workbooks</str>
         <int name="count">2</int>
         <arr name="pivot">
          <lst>
           <str name="field">stock</str>
           <bool name="value">false</bool>
           <int name="count">1</int>
          </lst>
          <lst>
           <str name="field">stock</str>
           <bool name="value">true</bool>
           <int name="count">1</int>
          </lst>
         </arr>
        </lst>
       </arr>
      </lst>
     </lst>
    </response>
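
As a cross-check of the counts in the preceding response, the same tally can be reproduced locally from the four example documents. This is only a sketch of what the pivot calculation amounts to on this data set, not a description of how Solr computes it internally; the docs list and the variable names are made up for the illustration.

```python
from collections import Counter

# The four example documents from step 2, reduced to the two faceted fields.
docs = [
    {"id": "1", "category": "books", "stock": True},
    {"id": "2", "category": "books", "stock": True},
    {"id": "3", "category": "workbooks", "stock": False},
    {"id": "4", "category": "workbooks", "stock": True},
]

# Top level of the pivot: number of documents per category.
per_category = Counter(d["category"] for d in docs)

# Second level of the pivot: number of documents per (category, stock) pair.
per_pair = Counter((d["category"], d["stock"]) for d in docs)

for category, total in sorted(per_category.items()):
    print(category, total)
    for (cat, stock), count in sorted(per_pair.items()):
        if cat == category:
            print("  stock =", stock, count)
```

The pairs that never occur in the data, such as books with stock set to false, do not show up in the tally at all, which matches the response.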

As you can see, we've achieved what we wanted. Now let's see how it works.

How it works...

Our data is very simple. As you can see in the field definition section of the schema.xml file and the example data, every document is described by four fields:

  • id
  • name
  • category
  • stock

I think that their names speak for themselves and I don't need to discuss them.

The interesting bit starts with the query. We specified that we want the query to match all the documents (the q=*:* parameter), but we don't want to see any documents in the response (the rows=0 parameter). In addition to this, we want to have faceting calculation enabled (the facet=true parameter) and we want to use decision tree faceting, also known as pivot faceting. We do this with the facet.pivot parameter, which lists the fields to include in the tree, in nesting order. In our case, the top level of the pivot facet is calculated on the basis of the category field, and the second level (the one nested in the category field calculation) is based on the values available in the stock field. Of course, if you would like to nest the values of another field under the stock field, you can do that by adding that field to the facet.pivot query parameter. Assuming that you would like to see faceting on a price field nested under the stock field, your facet.pivot parameter would look like this: facet.pivot=category,stock,price.
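
The parameters described above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the build_pivot_url helper is a hypothetical name introduced here, and the price field is the same hypothetical field used in the text (it is not part of the schema from step 1).

```python
from urllib.parse import urlencode


def build_pivot_url(base_url, pivot_fields):
    """Return a select URL that asks only for pivot facet counts."""
    params = {
        "q": "*:*",        # match every document
        "rows": 0,         # return no documents, only facet data
        "facet": "true",   # enable faceting
        # Fields in nesting order, outermost first.
        "facet.pivot": ",".join(pivot_fields),
    }
    # Leave '*', ':' and ',' unescaped so the URL stays readable.
    return base_url + "?" + urlencode(params, safe="*:,")


# Core URL taken from the recipe's query.
base = "http://localhost:8983/solr/cookbook/select"

two_level = build_pivot_url(base, ["category", "stock"])
three_level = build_pivot_url(base, ["category", "stock", "price"])

print(two_level)
print(three_level)
```

Keeping the pivot fields in a list makes the nesting order explicit: adding a level is a matter of appending one more field name.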

As you can see in the response, each nested faceting calculation result is written inside the <arr name="pivot"> XML tag. So, let's look at the response structure. The first level of the facet pivot tree is based on the category field. You can see that there are two documents (<int name="count">2</int>) in the books category (<str name="value">books</str>) and that both of them have the stock field (<str name="field">stock</str>) set to true (<bool name="value">true</bool>). For the workbooks category, the situation is a bit different because there are two sections there: one for documents with the stock field equal to false and the other for documents with the stock field set to true, each with a count of 1. Note that the books category has no section for false; in this response, a value that matches no documents is simply absent. In the end, the calculation is correct and that's what we wanted!
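
To make the nesting concrete, here is a small sketch that walks the facet_pivot section of the preceding response and flattens it into one row per node of the tree. It uses Python's standard xml.etree.ElementTree module; the SAMPLE string is a trimmed copy of the response from this recipe, and the walk function is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response: only the facet_pivot section is kept.
SAMPLE = """
<response>
 <lst name="facet_counts">
  <lst name="facet_pivot">
   <arr name="category,stock">
    <lst>
     <str name="field">category</str>
     <str name="value">books</str>
     <int name="count">2</int>
     <arr name="pivot">
      <lst>
       <str name="field">stock</str>
       <bool name="value">true</bool>
       <int name="count">2</int>
      </lst>
     </arr>
    </lst>
    <lst>
     <str name="field">category</str>
     <str name="value">workbooks</str>
     <int name="count">2</int>
     <arr name="pivot">
      <lst>
       <str name="field">stock</str>
       <bool name="value">false</bool>
       <int name="count">1</int>
      </lst>
      <lst>
       <str name="field">stock</str>
       <bool name="value">true</bool>
       <int name="count">1</int>
      </lst>
     </arr>
    </lst>
   </arr>
  </lst>
 </lst>
</response>
"""


def walk(entries, path=()):
    """Yield (path, count) per pivot node; path is a tuple of (field, value)."""
    for entry in entries:
        field = entry.find("*[@name='field']").text
        value = entry.find("*[@name='value']").text
        count = int(entry.find("*[@name='count']").text)
        here = path + ((field, value),)
        yield here, count
        # Nested levels, if any, live in <arr name="pivot">.
        nested = entry.find("arr[@name='pivot']")
        if nested is not None:
            yield from walk(nested.findall("lst"), here)


root = ET.fromstring(SAMPLE)
top = root.find(".//lst[@name='facet_pivot']/arr[@name='category,stock']")
rows = list(walk(top.findall("lst")))

for path, count in rows:
    print(" / ".join(f"{f}={v}" for f, v in path), count)
```

Because the walk is recursive, the same function should handle a deeper tree such as category,stock,price without changes, as long as the deeper levels follow the same field, value, count, and pivot layout.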
