Imagine a situation where we have simple documents with titles and tags to be indexed in Solr. We want to distinguish the premium documents, that is, the ones with more tag values, because they are more valuable to our business. Of course, we could count the number of tags ourselves, but why not let Solr do it? This recipe will show you how.
Let's look at the steps we need to take to count the number of field values.
We start with the index structure. Add the following field definitions to the schema.xml file:
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="tags_count" type="int" indexed="true" stored="true"/>
The next step is to index the data. The example data looks like this:
  <add>
    <doc>
      <field name="id">1</field>
      <field name="title">Solr Cookbook 4</field>
      <field name="tags">solr</field>
    </doc>
    <doc>
      <field name="id">2</field>
      <field name="title">Solr Cookbook 4 second edition</field>
      <field name="tags">search</field>
      <field name="tags">solr</field>
      <field name="tags">cookbook</field>
    </doc>
  </add>
Next, we alter the solrconfig.xml file. First, we add the proper update request processor chain to the file:
  <updateRequestProcessorChain name="count">
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">tags</str>
      <str name="dest">tags_count</str>
    </processor>
    <processor class="solr.CountFieldValuesUpdateProcessorFactory">
      <str name="fieldName">tags_count</str>
    </processor>
    <processor class="solr.DefaultValueUpdateProcessorFactory">
      <str name="fieldName">tags_count</str>
      <int name="value">0</int>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
Then, we modify the /update handler in the solrconfig.xml file so that it looks like this:
  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">count</str>
    </lst>
  </requestHandler>
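After indexing the data, we can boost documents by the number of their tags with the following query:
  http://localhost:8983/solr/cookbook/select?q=title:cookbook&bf=field(tags_count)&defType=edismax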
The response to the query looks as follows:
  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
        <str name="q">title:cookbook</str>
        <str name="defType">edismax</str>
        <str name="bf">field(tags_count)</str>
      </lst>
    </lst>
    <result name="response" numFound="2" start="0">
      <doc>
        <str name="id">2</str>
        <str name="title">Solr Cookbook 4 second edition</str>
        <arr name="tags">
          <str>search</str>
          <str>solr</str>
          <str>cookbook</str>
        </arr>
        <int name="tags_count">3</int>
        <long name="_version_">1467535763434373120</long>
      </doc>
      <doc>
        <str name="id">1</str>
        <str name="title">Solr Cookbook 4</str>
        <arr name="tags">
          <str>solr</str>
        </arr>
        <int name="tags_count">1</int>
        <long name="_version_">1467535763382992896</long>
      </doc>
    </result>
  </response>
Now, let's see how it works.
The index structure is quite simple. It contains a unique identifier field, a title, a multivalued field holding tags, and a field holding the count of tags. As you can see in the example data, we provide the identifier of the document, its title, and the tags. What we don't provide is the number of tags; Solr calculates it during indexing.
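As a rough illustration of the point above, the following Python sketch builds the same kind of add message from plain dictionaries and deliberately leaves tags_count out. The helper name build_add_payload and the dictionary layout are assumptions made for this example; only the XML element names come from the example data.

```python
import xml.etree.ElementTree as ET


def build_add_payload(docs):
    """Build a Solr <add> XML message; multivalued fields repeat the <field> element."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Treat a list as a multivalued field, anything else as a single value.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", {"name": name})
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# The same two documents as in the example data; note there is no tags_count.
example_docs = [
    {"id": "1", "title": "Solr Cookbook 4", "tags": ["solr"]},
    {"id": "2", "title": "Solr Cookbook 4 second edition",
     "tags": ["search", "solr", "cookbook"]},
]

payload = build_add_payload(example_docs)
print(payload)
```

The payload would still have to be sent to the /update handler by a client of your choice; that part is omitted here.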
We also defined a new update request processor chain called count. It contains five update processors. The first update processor, solr.CloneFieldUpdateProcessorFactory, copies the values of the field defined by the source property to the field defined by the dest property. The second, solr.CountFieldValuesUpdateProcessorFactory, replaces the values of the field defined by the fieldName property with the count of those values. This is why solr.CloneFieldUpdateProcessorFactory has to run before solr.CountFieldValuesUpdateProcessorFactory: counting a copy leaves the original tags values intact. The third, solr.DefaultValueUpdateProcessorFactory, sets the default value (defined by the value property) for the field defined by the fieldName property when the document has no value for it. The remaining two processors log the request information and run the update. By defining this chain, we tell Solr to clone the tags field into tags_count first, then to calculate the count and place it in the tags_count field, and finally, if there is still no value in the tags_count field, to set it to 0.
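The order of the first three processors can be illustrated with a small Python simulation. This is not Solr code: the function names are hypothetical, each one only mimics the behavior described above in simplified form, and the third document (one without tags) is invented to show the default value being applied.

```python
# Illustrative simulation only; these functions are hypothetical stand-ins
# for the three field-related processors in the "count" chain.

def clone_field(doc, source, dest):
    """Mimic CloneFieldUpdateProcessorFactory: copy source values into dest."""
    if source in doc:
        doc[dest] = list(doc[source])
    return doc


def count_field_values(doc, field_name):
    """Mimic CountFieldValuesUpdateProcessorFactory: replace values with their count."""
    if field_name in doc:
        doc[field_name] = len(doc[field_name])
    return doc


def default_value(doc, field_name, value):
    """Mimic DefaultValueUpdateProcessorFactory: fill the field only when it is absent."""
    doc.setdefault(field_name, value)
    return doc


def run_count_chain(doc):
    """Apply the three steps in the same order as the chain in solrconfig.xml."""
    doc = dict(doc)  # work on a copy so the input document is left untouched
    clone_field(doc, "tags", "tags_count")
    count_field_values(doc, "tags_count")
    default_value(doc, "tags_count", 0)
    return doc


docs = [
    {"id": "1", "title": "Solr Cookbook 4", "tags": ["solr"]},
    {"id": "2", "title": "Solr Cookbook 4 second edition",
     "tags": ["search", "solr", "cookbook"]},
    {"id": "3", "title": "Untagged document"},  # hypothetical document without tags
]

for doc in docs:
    print(doc["id"], run_count_chain(doc)["tags_count"])
```

For the two documents from the example data, the simulation yields counts of 1 and 3, matching the tags_count values in the response; the untagged document falls through to the default of 0.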
We also alter the default solr.UpdateRequestHandler configuration by adding the defaults section and setting the update.chain property to count (the name of our update request processor chain). This means that our update request processor chain will be used for every indexing request sent to the /update handler, unless a request specifies a different chain.
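Since update.chain is an ordinary request parameter, the defaults section only saves us from sending it each time. The following Python sketch shows, under that assumption, what the two variants of an update URL might look like. The helper name update_url is hypothetical, and the base URL assumes the same local cookbook core as the query in this recipe.

```python
from urllib.parse import urlencode

# Assumed base URL: the "cookbook" core on a local Solr instance.
UPDATE_URL = "http://localhost:8983/solr/cookbook/update"


def update_url(chain=None, commit=True):
    """Build an update URL, optionally naming the processor chain explicitly."""
    params = {}
    if chain is not None:
        params["update.chain"] = chain
    if commit:
        params["commit"] = "true"
    query = urlencode(params)
    return f"{UPDATE_URL}?{query}" if query else UPDATE_URL


# With the defaults section in place, the chain does not need to be named.
print(update_url())
# Without it, the chain would have to be passed with each request.
print(update_url(chain="count"))
```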
Our query searches for every document that includes the cookbook term in the title field. We use the edismax query parser (defType=edismax) and include a simple boosting function that boosts documents by the value of their tags_count field (bf=field(tags_count)). As you can see in the results, the document with three tags (identifier 2) is returned ahead of the document with a single tag (identifier 1), which is what we wanted to achieve.
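For completeness, here is a minimal Python sketch of how the query URL could be assembled programmatically. The helper name boosted_query_url is an assumption for this example; the parameters and the base URL are the ones used in the query above. The sketch only builds and decodes the URL and does not contact Solr.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL taken from the query used in this recipe.
SELECT_URL = "http://localhost:8983/solr/cookbook/select"


def boosted_query_url(term, boost_field):
    """Build an edismax query on the title field with a bf boost on boost_field."""
    params = {
        "q": f"title:{term}",
        "defType": "edismax",
        "bf": f"field({boost_field})",
    }
    # urlencode percent-encodes characters such as ':' and parentheses.
    return f"{SELECT_URL}?{urlencode(params)}"


url = boosted_query_url("cookbook", "tags_count")
print(url)
# Decode the query string again to show the parameters Solr would receive.
print(parse_qs(urlsplit(url).query))
```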