Imagine a situation where, besides the search results, you have to return the number of documents that share the same field value. For example, you have an application that lets users search for companies in Europe, and your client wants to know, for each city in which the matched companies are located, how many of those companies are in it. You could of course run several queries to do this, but Solr provides a mechanism called faceting that can do it for you. This recipe will show you how to use it.
We start with the index structure. The following fields are defined in the schema.xml file:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="city" type="string" indexed="true" stored="true" />
<add>
  <doc>
    <field name="id">1</field>
    <field name="name">Company 1</field>
    <field name="city">New York</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">Company 2</field>
    <field name="city">New Orleans</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="name">Company 3</field>
    <field name="city">New York</field>
  </doc>
</add>
Let's assume that a user searches for the word company. Apart from the query results, we would also like to return the number of documents with the same city. The query that will give us what we want should look as follows:
http://localhost:8983/solr/cookbook/select?q=name:company&facet=true&facet.field=city
The result of the preceding query should be as follows:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">name:company</str>
      <str name="facet.field">city</str>
      <str name="facet">true</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
    <doc>
      <str name="id">1</str>
      <str name="name">Company 1</str>
      <str name="city">New York</str>
      <long name="_version_">1471068544442564608</long></doc>
    <doc>
      <str name="id">2</str>
      <str name="name">Company 2</str>
      <str name="city">New Orleans</str>
      <long name="_version_">1471068544491847680</long></doc>
    <doc>
      <str name="id">3</str>
      <str name="name">Company 3</str>
      <str name="city">New York</str>
      <long name="_version_">1471068544492896256</long></doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="city">
        <int name="New York">2</int>
        <int name="New Orleans">1</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
As you can see, besides the normal results list, we got faceting results with the numbers that we wanted. Now let's see how that happened.
The index structure and the data are pretty simple, which makes the example easier to understand. Each company is described by three fields. We are particularly interested in the city field, because we want the number of companies that share the same value in it, that is, companies located in the same city. The city field is configured to use the string type, which is not analyzed, so the value that we pass in the field is indexed without any additional processing by Solr. This matters because field faceting works on the indexed tokens. If we analyzed the field, faceting would be calculated for each token and not for the whole field value (which is the city name in our case).
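To make the difference between an analyzed and a non-analyzed field concrete, here is a minimal Python simulation using the three city values from the example data. It is only a sketch of the idea, not Solr's implementation: the `facet_counts` helper is hypothetical, and lowercasing plus whitespace splitting is an assumed stand-in for what a text analyzer might do.

```python
from collections import Counter

# City values of the three example documents.
cities = ["New York", "New Orleans", "New York"]


def facet_counts(values, tokenize=False):
    """Count how many documents contain each indexed term.

    tokenize=False mimics a non-analyzed (string) field: one term per value.
    tokenize=True mimics an analyzed field (assumption: lowercase + whitespace split).
    """
    counts = Counter()
    for value in values:
        # A set is used so each document is counted once per term.
        terms = set(value.lower().split()) if tokenize else {value}
        counts.update(terms)
    return counts


whole_value_counts = facet_counts(cities)
token_counts = facet_counts(cities, tokenize=True)

print(dict(whole_value_counts))
print(dict(token_counts))
```

With the whole values, the counts are per city. With tokens, the counts are per word, so "new" is counted for all three documents and the city names are lost.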
To get the desired results, we send a query to Solr and tell the query parser that we want the documents that have the word company in the name field. Additionally, we enable the faceting mechanism with the facet=true parameter. The facet.field parameter tells Solr which field to use to calculate the faceting numbers. You can specify the facet.field parameter multiple times to get faceting numbers for different fields in the same query.
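As an illustration of repeating the facet.field parameter, the following Python sketch builds such a query URL with the standard library. The `build_facet_url` helper is hypothetical, and the `country` field is an assumption made only for this example; it is not part of the schema used in this recipe.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, query, facet_fields):
    """Build a select URL with faceting enabled on every given field."""
    params = [("q", query), ("facet", "true")]
    # A list of pairs lets the same parameter name appear more than once.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


url = build_facet_url(
    "http://localhost:8983/solr/cookbook/select",
    "name:company",
    ["city", "country"],  # "country" is a hypothetical second field
)
print(url)
```

Note that urlencode percent-encodes the colon in the query, so q=name:company is sent as q=name%3Acompany.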
As you can see in the results list, the results of all types of faceting are grouped in the list with the name="facet_counts" attribute. The field-based faceting is grouped under the list with the name="facet_fields" attribute. Every field that you specified using the facet.field parameter has its own list whose name attribute is the same as the value of the parameter in the query; in our case, it is city. Finally, inside that list you can see the results that we are interested in: pairs of a value (the name attribute) and the number of documents that have that value in the specified field.
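The structure described above can be read programmatically. The following is a minimal sketch in Python that parses a trimmed copy of the response shown earlier and extracts the field facets; the `parse_field_facets` helper is a hypothetical name, and a real client would fetch the XML from Solr instead of using an embedded sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the faceting part of the response shown earlier.
SAMPLE_RESPONSE = """<response>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="city">
        <int name="New York">2</int>
        <int name="New Orleans">1</int>
      </lst>
    </lst>
  </lst>
</response>"""


def parse_field_facets(xml_text):
    """Return {field name: {value: count}} from a Solr XML response."""
    root = ET.fromstring(xml_text)
    fields = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    facets = {}
    if fields is None:
        return facets  # faceting was not enabled for this response
    for field in fields.findall("lst"):
        # Each <int> carries the value in its name attribute and the count as text.
        facets[field.get("name")] = {
            entry.get("name"): int(entry.text) for entry in field.findall("int")
        }
    return facets


facets = parse_field_facets(SAMPLE_RESPONSE)
print(facets)
```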
There are two more things that I would like to show you about field faceting.
The default behavior of Solr is to show all the faceting results, no matter what the counts are. If you want to show only the facets with counts greater than zero, add the facet.mincount=1 parameter to the query (you can set this parameter to a higher value if you are only interested in facets with at least that many documents).
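The effect of facet.mincount can be pictured with a small Python sketch. Solr applies the threshold on the server side; the `apply_mincount` helper below is hypothetical and only mimics the outcome. The counts assume a query that matches only Company 2, so that New York is still listed but with a count of zero.

```python
def apply_mincount(counts, mincount=0):
    """Keep only the facet values whose count is at least mincount."""
    return {value: count for value, count in counts.items() if count >= mincount}


# Assumed facet counts for a query that matches only Company 2.
counts = {"New Orleans": 1, "New York": 0}

default_facets = apply_mincount(counts)               # like the default behavior
filtered_facets = apply_mincount(counts, mincount=1)  # like facet.mincount=1

print(default_facets)
print(filtered_facets)
```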