Getting the number of documents with the same field value

Imagine that, alongside the search results, you need to return the number of documents that share the same field value. For example, your application lets users search for companies in Europe, and your client wants to see how many of the matching companies are located in each city. You could run several queries to get this, but Solr provides a mechanism called faceting that does it for you in a single request. This recipe will show you how to use it.

How to do it...

  1. Let's start by assuming that we have the following fields present in the schema.xml file:
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="city" type="string" indexed="true" stored="true" />
  2. The next step is to index the following example data:
    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Company 1</field>
      <field name="city">New York</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="name">Company 2</field>
      <field name="city">New Orleans</field>
     </doc>
     <doc>
      <field name="id">3</field>
      <field name="name">Company 3</field>
      <field name="city">New York</field>
     </doc>
    </add>
  3. Suppose that our hypothetical user searches for the word company. Apart from the query results, we would also like to return the number of documents with the same city. The query that will give us what we want should look as follows:
    http://localhost:8983/solr/cookbook/select?q=name:company&facet=true&facet.field=city

    The result of the preceding query should be as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
     <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
       <str name="q">name:company</str>
       <str name="facet.field">city</str>
       <str name="facet">true</str>
      </lst>
     </lst>
     <result name="response" numFound="3" start="0">
      <doc>
       <str name="id">1</str>
       <str name="name">Company 1</str>
       <str name="city">New York</str>
       <long name="_version_">1471068544442564608</long></doc>
      <doc>
       <str name="id">2</str>
       <str name="name">Company 2</str>
       <str name="city">New Orleans</str>
       <long name="_version_">1471068544491847680</long></doc>
      <doc>
       <str name="id">3</str>
       <str name="name">Company 3</str>
       <str name="city">New York</str>
       <long name="_version_">1471068544492896256</long></doc>
     </result>
     <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields">
       <lst name="city">
        <int name="New York">2</int>
        <int name="New Orleans">1</int>
       </lst>
      </lst>
      <lst name="facet_dates"/>
      <lst name="facet_ranges"/>
     </lst>
    </response>

As you can see, besides the normal results list, we got faceting results with the numbers that we wanted. Now let's see how that happened.

How it works...

The index structure and the data are deliberately simple to make the example easier to understand. Each company is described by three fields, and we are particularly interested in the city field: we want to count the companies that share the same value in this field, which means that they are located in the same city. The city field uses the string type, which is not analyzed, so the value that we pass in is indexed without any additional processing by Solr. This matters because field faceting works on indexed tokens. If the field were analyzed, faceting would be calculated for each token and not for the whole field value (the city name in our case).
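To make the difference between an analyzed and a non-analyzed field concrete, here is a minimal Python sketch that counts the city values from the example data in both ways. It does not call Solr. The lowercase-and-split step is only a rough stand-in for a real analysis chain such as text_general, so treat it as an illustration rather than Solr's actual behavior.

```python
from collections import Counter

# City values from the example data in this recipe.
cities = ["New York", "New Orleans", "New York"]

# string field: each whole value is a single indexed term.
whole_value_counts = Counter(cities)

# Analyzed field, roughly emulated: lowercase and split on whitespace.
# set() makes each term count once per document, as a facet count would.
token_counts = Counter(
    token for city in cities for token in set(city.lower().split())
)

print(dict(whole_value_counts))
print(dict(token_counts))
```

With the whole values, New York is counted twice and New Orleans once. With the emulated tokens, new is counted three times, which says nothing useful about cities.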

To get the desired results, we send a query to Solr and tell the query parser that we want the documents that have the word company in the name field. We also enable the faceting mechanism with the facet=true parameter. The facet.field parameter tells Solr which field to use to calculate the facet counts. You can specify the facet.field parameter multiple times to get facet counts for different fields in the same query.
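If you build the request in code, the parameters described above can be assembled as in the following Python sketch. The build_facet_url helper is hypothetical, and so is the country field used in the second call, which is not part of the example schema. The base URL is the one used in this recipe.

```python
from urllib.parse import urlencode

# Base URL taken from this recipe; adjust host and collection to your setup.
SOLR_SELECT = "http://localhost:8983/solr/cookbook/select"

def build_facet_url(query, facet_fields):
    """Return a select URL with field faceting enabled (hypothetical helper)."""
    params = [("q", query), ("facet", "true")]
    # facet.field is repeated once for every field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT + "?" + urlencode(params)

url = build_facet_url("name:company", ["city"])
print(url)

# "country" is a hypothetical second field, used only to show the repetition.
multi_url = build_facet_url("name:company", ["city", "country"])
print(multi_url)
```

Note that urlencode escapes the colon in the query as %3A, which Solr decodes back to name:company.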

As you can see in the response, the results of all types of faceting are grouped in the list with the name="facet_counts" attribute. Field-based faceting results are grouped under the list with the name="facet_fields" attribute. Every field that you specified using the facet.field parameter has its own list, whose name attribute matches the value of the parameter in the query (city in our case). Finally, inside that list are the results that we are interested in: pairs of a field value (the name attribute) and the number of matching documents that have that value in the specified field.
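The structure described above can be read on the client side with a few lines of code. The following Python sketch parses a trimmed copy of the response from this recipe using the standard library. The parse_facet_fields helper is hypothetical, and a real client would read the XML from the HTTP response instead of a string constant.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the facet_counts part of the response shown in this recipe.
SAMPLE = """<response>
 <lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
   <lst name="city">
    <int name="New York">2</int>
    <int name="New Orleans">1</int>
   </lst>
  </lst>
 </lst>
</response>"""

def parse_facet_fields(xml_text):
    """Return {field: [(value, count), ...]} in response order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    path = "./lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    facets = {}
    for field in root.findall(path):
        # Each <int> holds one facet value (name attribute) and its count (text).
        facets[field.get("name")] = [
            (entry.get("name"), int(entry.text)) for entry in field.findall("int")
        ]
    return facets

facets = parse_facet_fields(SAMPLE)
print(facets)  # {'city': [('New York', 2), ('New Orleans', 1)]}
```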

There's more...

There are two more things that I would like to show you about field faceting.

How to show facets with counts greater than zero

By default, Solr shows all the faceting results, no matter what the counts are, so values that occur in the index but in none of the matching documents are returned with a count of zero. If you want to show only the facets with counts greater than zero, add the facet.mincount=1 parameter to the query (you can set a higher value to apply a stricter threshold).
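The effect of this parameter can be sketched in Python as follows. The apply_mincount function is a hypothetical client-side emulation and not how Solr implements the filter, and the counts assume a narrower query that matches only Company 2. The last lines build the request that asks Solr to do the filtering itself.

```python
from urllib.parse import urlencode

def apply_mincount(counts, mincount):
    """Keep only (value, count) pairs whose count is at least mincount."""
    return [(value, count) for value, count in counts if count >= mincount]

# Hypothetical facet counts for a narrower query that matches only Company 2.
city_counts = [("New Orleans", 1), ("New York", 0)]
filtered = apply_mincount(city_counts, 1)
print(filtered)  # [('New Orleans', 1)]

# The request that asks Solr to drop zero-count values on the server side.
params = [("q", "name:company"), ("facet", "true"),
          ("facet.field", "city"), ("facet.mincount", "1")]
mincount_url = "http://localhost:8983/solr/cookbook/select?" + urlencode(params)
print(mincount_url)
```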

Lexicographical sorting of the faceting results

If you want to sort the faceting results lexicographically instead of by the highest count (which is the default behavior), add the facet.sort=index parameter.
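To see what the two orderings mean for the example data, the following Python sketch emulates them on the counts from the response in this recipe. It does not call Solr. Python's string ordering is used as a stand-in for the order of the indexed terms, and the tie-breaking rule in the count ordering is an assumption made for this sketch.

```python
# Facet counts for the city field, taken from the response in this recipe.
city_counts = [("New York", 2), ("New Orleans", 1)]

# facet.sort=count (the default): highest count first.
# Breaking ties by value is an assumption made for this sketch.
by_count = sorted(city_counts, key=lambda pair: (-pair[1], pair[0]))

# facet.sort=index: ordered by the value itself, emulated with string ordering.
by_index = sorted(city_counts, key=lambda pair: pair[0])

print(by_count)  # [('New York', 2), ('New Orleans', 1)]
print(by_index)  # [('New Orleans', 1), ('New York', 2)]
```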
