Implementing the autocomplete functionality for categories

Sometimes, we are interested not in our product's name for autocomplete, but in something else. Imagine that we want to show the category of our products in the autocomplete box along with the number of products in each category. Let's see how we can use faceting to achieve such functionality.

How to do it...

  1. Let's start with the example data, which is going to be indexed and looks as follows:
    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">First Solr Cookbook</field>
      <field name="category">Books</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="name">Second Solr Cookbook</field>
      <field name="category">Books And Tutorials</field>
     </doc>
     <doc>
      <field name="id">3</field>
      <field name="name">Elasticsearch Server</field>
      <field name="category">Books And Tutorials</field>
     </doc>
    </add>
  2. Our schema.xml configuration file that can handle the preceding data will look as follows:
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="category" type="text_lowercase" indexed="true" stored="true" />
  3. One final thing is the text_lowercase type definition, which will also be placed in the schema.xml file and looks as follows:
    <fieldType name="text_lowercase" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
    </fieldType>
  4. So now, if we would like to get all the categories that start with boo, along with the number of products in these categories, we can send the following query:
    curl 'http://localhost:8983/solr/cookbook/select?q=*:*&rows=0&facet=true&facet.field=category&facet.mincount=1&facet.limit=5&facet.prefix=boo'
    

    And the following response was returned by Solr:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">42</int>
      <lst name="params">
        <str name="q">*:*</str>
        <str name="facet.limit">5</str>
        <str name="facet.field">category</str>
        <str name="facet.prefix">boo</str>
        <str name="facet.mincount">1</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
      </lst>
    </lst>
    <result name="response" numFound="3" start="0">
    </result>
    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields">
        <lst name="category">
          <int name="books and tutorials">2</int>
          <int name="books">1</int>
        </lst>
      </lst>
      <lst name="facet_dates"/>
      <lst name="facet_ranges"/>
      <lst name="facet_intervals"/>
    </lst>
    </response>

As you can see, Solr returned two categories: books and tutorials with two products, and books with a single product, which matches our example data. Let's now see how it works.
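To turn such a response into entries for an autocomplete box, the facet counts have to be pulled out of the XML. The following is a minimal sketch in Python using only the standard library; the function name extract_suggestions and the trimmed SAMPLE_RESPONSE constant are illustrative assumptions, not part of Solr or of the recipe.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the facet section of the response shown in the recipe.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="category">
      <int name="books and tutorials">2</int>
      <int name="books">1</int>
    </lst>
  </lst>
</lst>
</response>"""


def extract_suggestions(xml_text, field):
    """Return (value, count) pairs for one facet field, in response order.

    Hypothetical helper: it assumes the XML response layout shown in the recipe.
    """
    root = ET.fromstring(xml_text)
    path = (
        "./lst[@name='facet_counts']"
        "/lst[@name='facet_fields']"
        f"/lst[@name='{field}']/int"
    )
    return [(node.get("name"), int(node.text)) for node in root.findall(path)]


if __name__ == "__main__":
    # Print one suggestion per line, with the product count in parentheses.
    for value, count in extract_suggestions(SAMPLE_RESPONSE, "category"):
        print(f"{value} ({count})")
```

Because the values come back lowercased, an application that wants to display the original casing would need to keep its own mapping from the lowercased value to the display name.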

How it works...

Our data is very simple; each document has three fields: one for the identifier, one for the name of the document, and one for its category. We will use the category field for the autocomplete functionality, and we will use faceting to provide it.

If you take a look at the index structure, you'll see that the category field uses a special type, text_lowercase. Because it uses solr.KeywordTokenizerFactory, the whole category is kept in the index as a single token, and solr.LowerCaseFilterFactory lowercases that token. We lowercase at index time because the facet.prefix value is compared with the indexed terms as is, without analysis, so we will send lowercased prefixes when we use faceting.
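The effect of this analysis chain can be illustrated outside of Solr. The sketch below only mimics the behavior in Python; it is not Solr code, both function names are made up for illustration, and Python's str.lower() is only an approximation of the lowercasing Solr performs. The second function shows, for contrast, what a hypothetical whitespace-tokenizing type would produce.

```python
def analyze_text_lowercase(value):
    """Mimic KeywordTokenizerFactory + LowerCaseFilterFactory.

    The whole input stays one token, which is then lowercased.
    """
    return [value.lower()]


def analyze_whitespace_lowercase(value):
    """Contrast case (hypothetical type): split on whitespace, then lowercase."""
    return [token.lower() for token in value.split()]


if __name__ == "__main__":
    category = "Books And Tutorials"
    print(analyze_text_lowercase(category))        # ['books and tutorials']
    print(analyze_whitespace_lowercase(category))  # ['books', 'and', 'tutorials']
```

With the single-token variant, faceting returns the whole category name as one value. With the tokenized variant, each word would become a separate facet value, which is not what we want to show in an autocomplete box.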

The query is quite simple; we query for all the documents (the q=*:* parameter) and we don't want any documents returned (the rows=0 parameter). We will use faceting for autocomplete, so we turn it on (the facet=true parameter) and specify the category field as the one to calculate the faceting on (facet.field=category). We are only interested in values that have at least one document (facet.mincount=1) and we want the top five values (facet.limit=5). One of the most important parameters in the query is facet.prefix; with it, Solr returns only those faceting values that start with the given prefix, which can be seen in the results. Also remember that faceting results are sorted by count by default, with the highest count first.
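The parameters described above can be put together programmatically. The following Python sketch builds the same request as the curl command in the recipe and also emulates, on plain Python data, what facet.prefix, facet.mincount, and facet.limit do. The function names are illustrative assumptions, and the alphabetical tie-breaking in the emulation is a simplification chosen for this sketch.

```python
from collections import Counter
from urllib.parse import urlencode

# Endpoint taken from the curl command in the recipe.
SOLR_SELECT_URL = "http://localhost:8983/solr/cookbook/select"


def build_autocomplete_url(user_input, limit=5):
    """Build the faceting request for a prefix typed by the user."""
    params = [
        ("q", "*:*"),                 # match all documents
        ("rows", 0),                  # we only need facet counts, not documents
        ("facet", "true"),
        ("facet.field", "category"),
        ("facet.mincount", 1),
        ("facet.limit", limit),
        # Lowercase on the client side to match the lowercased indexed values.
        ("facet.prefix", user_input.lower()),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


def emulate_facet_prefix(indexed_values, prefix, limit=5, mincount=1):
    """Local emulation of prefix faceting; this is not Solr code.

    Counts values that start with the prefix, sorts them by count
    (highest first, ties alphabetical in this sketch), then applies
    mincount and limit.
    """
    counts = Counter(v for v in indexed_values if v.startswith(prefix))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(value, count) for value, count in ranked if count >= mincount][:limit]


if __name__ == "__main__":
    print(build_autocomplete_url("Boo"))
    # Indexed (lowercased) category values of the three example documents.
    indexed = ["books", "books and tutorials", "books and tutorials"]
    print(emulate_facet_prefix(indexed, "boo"))
```

For the three example documents, the emulation yields books and tutorials with a count of 2, followed by books with a count of 1, which is the same order as in the Solr response.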
