Time for action – defining facets over enumerated fields

Faceted search is generally used with enumerations. As we saw, the facet results can suggest a sort of "horizontal" classification. Moreover, we can combine different fields and enable or disable their terms, as in the HTML prototype seen before. This is very useful for offering the user a guided navigation based on filters.

  1. To use facets this way, we should use solr.KeywordTokenizerFactory in the schema.xml file as given in the following code:
    <fieldType name="text_facets" class="solr.TextField">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
    
    ...
    
    <dynamicField name="*_entity" type="text_facets" multiValued="false" indexed="true" stored="true" />
  2. Once we have modified our schema.xml file for the core at /SolrStarterBook/solr-app/chp06/paintings, we simply populate it with the example data and then start it from the /SolrStarterBook/test/chp06 path (remember that you can use the corresponding .bat if you are on a Windows machine):
    >> start.sh
    >> post.sh (from another terminal window)
    
  3. Once the core has been started and populated with our data, the fields used for entity enumeration are populated as well, and we can query them with facets. For example, if we want to obtain a list of all the artists, we will use the following command:
    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=*:*&rows=0&facet=true&facet.field=artist_entity&facet.limit=-1&facet.mincount=2&facet.sort=count&wt=json&json.nl=map'
    

What just happened?

In the example, we have activated facets, and then requested faceting results for the field artist_entity. Here, the role of keyword tokenization is crucial: facet values are taken from the indexed terms, so we have to be sure that the original term (or terms) is preserved as a whole. In most cases, this means adopting a copyField so that the source and destination fields can use different text analysis.
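To see why keyword tokenization matters for facets, here is a minimal Python sketch that imitates the analysis outside of Solr. It is only an approximation under stated assumptions: `fold_accents` is a rough stand-in for the accent-mapping char filter (it is not the actual mapping file), the function names are hypothetical, and the artist values are illustrative rather than taken from the example data.

```python
import unicodedata
from collections import Counter


def fold_accents(text):
    # Rough stand-in for the accent-mapping char filter:
    # decompose characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def keyword_analyze(value):
    # Keyword tokenization: the whole value is one token, then lowercased.
    return [fold_accents(value).lower()]


def whitespace_analyze(value):
    # For comparison: splitting on whitespace yields one token per word.
    return [token.lower() for token in fold_accents(value).split()]


def facet_counts(values, analyze):
    # Count, for each indexed term, how many values (documents) contain it.
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return dict(counts)


# Illustrative values only.
artists = ["Salvador Dalí", "Salvador Dali", "Pablo Picasso"]
keyword_facets = facet_counts(artists, keyword_analyze)
word_facets = facet_counts(artists, whitespace_analyze)
```

With the keyword-style analysis, each artist remains a single facet term and the two spellings collapse into one entry; with whitespace splitting, the facet list would contain separate first names and surnames, which is rarely what a guided navigation needs.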

In our scenario, we decided not to use a copyField, but to rely on a simple external script during the update phase, as we saw in the previous examples. To populate the fields defined for enumeration, we need to put the following configuration in our solrconfig.xml file:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">script</str>
  </lst>
</requestHandler>
...
<updateRequestProcessorChain name="script">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">normalize_entities.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

An important aspect to consider, if you plan to consume the results from JavaScript in an HTML interface, is the format of the response. Here, we used the parameters wt=json and json.nl=map to request the JSON format with named lists serialized as maps, as shown in the following screenshot:

In the screenshot, you can see the respective output structures for an array of arrays (json.nl=arrarr), a simple array alternating names and values (json.nl=flat), and finally a map (json.nl=map).
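The three shapes can also be compared in code. The following Python sketch assumes a client has already parsed the JSON response and extracted the value of a single facet field; the helper name `facet_to_dict` and the sample terms and counts are hypothetical placeholders, not real output from the paintings core.

```python
def facet_to_dict(facet_field, json_nl):
    """Normalize one facet field into a {term: count} dictionary."""
    if json_nl == "map":
        # {"term a": 5, "term b": 2}
        return dict(facet_field)
    if json_nl == "arrarr":
        # [["term a", 5], ["term b", 2]]
        return {name: count for name, count in facet_field}
    if json_nl == "flat":
        # ["term a", 5, "term b", 2]
        return dict(zip(facet_field[0::2], facet_field[1::2]))
    raise ValueError("unsupported json.nl style: " + json_nl)


# Placeholder data in each of the three styles.
as_map = {"term a": 5, "term b": 2}
as_arrarr = [["term a", 5], ["term b", 2]]
as_flat = ["term a", 5, "term b", 2]

from_map = facet_to_dict(as_map, "map")
from_arrarr = facet_to_dict(as_arrarr, "arrarr")
from_flat = facet_to_dict(as_flat, "flat")
```

All three calls produce the same dictionary, so the choice of json.nl is mostly about which shape is most convenient for the consumer. Note that a JSON object does not guarantee key order in every client, so a frontend that relies on the facet.sort ordering may prefer json.nl=arrarr.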

We asked for the artists that appear in at least two documents (facet.mincount=2) and removed the cap on the number of terms returned (facet.limit=-1). Terms can be ordered from the most frequent to the least frequent using facet.sort=count, or lexicographically using facet.sort=index. Sometimes, it is useful for the frontend to have the complete list even if a term has no matching documents; in this case, we can set facet.mincount=0. The facet.missing=true parameter is related but different: it adds a count of the documents that have no value at all in the faceted field.
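When the query is built programmatically instead of typed as a curl command, the parameters above can be assembled and URL-encoded in one place. This is a minimal sketch using only the Python standard library; the helper name `facet_url` and its defaults are assumptions made for illustration, while the base URL and parameter values mirror the curl command used earlier.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def facet_url(base_url, field, mincount=2, sort="count"):
    # Same parameters as the curl example; urlencode escapes "*:*" for us.
    params = [
        ("q", "*:*"),
        ("rows", 0),                # no documents, only facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", -1),        # no cap on the number of terms
        ("facet.mincount", mincount),
        ("facet.sort", sort),       # "count" or "index"
        ("wt", "json"),
        ("json.nl", "map"),
    ]
    return base_url + "?" + urlencode(params)


url = facet_url("http://localhost:8983/solr/paintings/select", "artist_entity")

# Decode the query string again to check what would be sent.
query = parse_qs(urlsplit(url).query)
```

Calling `facet_url(..., mincount=0, sort="index")` would request the full alphabetical list of terms, including those without matching documents.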
