Declare the request handler and include the cluster search component

Define the request handler for clustering in solrconfig.xml:

<requestHandler name="/clustering" startup="lazy" enable="true" class="solr.SearchHandler">
 <lst name="defaults">
 <bool name="clustering">true</bool>
 <bool name="clustering.results">true</bool>
 <!-- Field name with the logical "title" of each document (optional) -->
 <str name="carrot.title">name</str>
 <!-- Field name with the logical "URL" of each document (optional) -->
 <str name="carrot.url">id</str>
 <!-- Field name with the logical "content" of each document (optional) -->
 <str name="carrot.snippet">features</str>
 <!-- Apply the highlighter to the title/content and use the result for clustering. -->
 <bool name="carrot.produceSummary">true</bool>
 <!-- the maximum number of labels per cluster -->
 <!--<int name="carrot.numDescriptions">5</int>-->
 <!-- produce sub clusters -->
 <bool name="carrot.outputSubClusters">false</bool>
 <!-- Configure the remaining request handler parameters. -->
 <str name="defType">edismax</str>
 <str name="qf">
 text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
 </str>
 <str name="q.alt">*:*</str>
 <str name="rows">100</str>
 <str name="fl">*,score</str>
 </lst>
 <arr name="last-components">
 <str>clustering</str>
 </arr>
</requestHandler>

The clustering configuration is now complete. To run clustering on the built-in techproducts example, set enable="true" in both the searchComponent and the requestHandler configuration. Alternatively, enable both by passing a JVM system property when starting Solr:

solr start -e techproducts -Dsolr.clustering.enabled=true

Let's run a query for electronics using the configured request handler /clustering and see the cluster response.
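The same request can also be sent from a script. The following is a minimal Python sketch, assuming a local Solr instance running the techproducts example on the default port. The names SOLR_BASE, clustering_url and fetch_clusters are illustrative and not part of Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr runs locally with the techproducts example core.
SOLR_BASE = "http://localhost:8983/solr/techproducts"


def clustering_url(query, rows=100, handler="/clustering"):
    """Build the request URL for the clustering request handler."""
    params = urlencode({"q": query, "rows": rows})
    return f"{SOLR_BASE}{handler}?{params}"


def fetch_clusters(query, rows=100):
    """Send the query and return the "clusters" list from the JSON response.

    wt=json is appended explicitly in case JSON is not the default
    response writer on the target Solr version.
    """
    with urlopen(clustering_url(query, rows) + "&wt=json") as resp:
        return json.load(resp).get("clusters", [])


if __name__ == "__main__":
    # Only prints the URL; call fetch_clusters("electronics") against a live Solr.
    print(clustering_url("electronics"))
```

Calling fetch_clusters("electronics") requires a running Solr instance with clustering enabled as described above.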

URL: http://localhost:8983/solr/techproducts/clustering?q=electronics&rows=100

{
  "responseHeader":{
    "status":0,
    "QTime":32},
  "response":{"numFound":14,"start":0,"maxScore":2.9029632,"docs":[
      ....
    ....
      ]
  },
  "clusters":[{
      "labels":["DDR"],
      "score":3.037927435185717,
      "docs":["TWINX2048-3200PRO",
        "VS1GB400C3",
        "VDBDB1A16"]},
    {
      "labels":["iPod"],
      "score":7.317758461138239,
      "docs":["F8V7067-APL-KIT",
        "IW-02",
        "MA147LL/A"]},
    {
      "labels":["Canon"],
      "score":6.785392802370259,
      "docs":["0579B002",
        "9885A004"]},
    {
      "labels":["Hard Drive"],
      "score":10.460153088070832,
      "docs":["SP2514N",
        "6H500F0"]},
    {
      "labels":["Retail"],
      "score":1.629540936033123,
      "docs":["TWINX2048-3200PRO",
        "VS1GB400C3"]},
    {
      "labels":["Video"],
      "score":10.060361253597023,
      "docs":["MA147LL/A",
        "100-435805"]},
    {
      "labels":["Other Topics"],
      "score":0.0,
      "other-topics":true,
      "docs":["EN7800GTX/2DHTV/256M",
        "3007WFP",
        "VA902B"]}]
}

Here we can see the clusters discovered for the query (q=electronics). Each cluster has a label and a score that indicates the goodness (quality) of the cluster. The score is specific to the clustering algorithm and is meaningful only in relation to the scores of other clusters in the same result set: a cluster with a higher score should be of better quality than one with a lower score.
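Because the scores are only comparable within one response, a natural way to read them is to rank the clusters against each other. The sketch below does this for the clusters returned above, copied into a Python list. It assumes that the "other-topics" flag marks a catch-all group for documents that fit no other cluster, so that group is left out of the ranking. The function name rank_clusters is illustrative.

```python
# The "clusters" section of the response above, copied as Python data.
clusters = [
    {"labels": ["DDR"], "score": 3.037927435185717,
     "docs": ["TWINX2048-3200PRO", "VS1GB400C3", "VDBDB1A16"]},
    {"labels": ["iPod"], "score": 7.317758461138239,
     "docs": ["F8V7067-APL-KIT", "IW-02", "MA147LL/A"]},
    {"labels": ["Canon"], "score": 6.785392802370259,
     "docs": ["0579B002", "9885A004"]},
    {"labels": ["Hard Drive"], "score": 10.460153088070832,
     "docs": ["SP2514N", "6H500F0"]},
    {"labels": ["Retail"], "score": 1.629540936033123,
     "docs": ["TWINX2048-3200PRO", "VS1GB400C3"]},
    {"labels": ["Video"], "score": 10.060361253597023,
     "docs": ["MA147LL/A", "100-435805"]},
    {"labels": ["Other Topics"], "score": 0.0, "other-topics": True,
     "docs": ["EN7800GTX/2DHTV/256M", "3007WFP", "VA902B"]},
]


def rank_clusters(clusters):
    """Return (label, score, doc_count) tuples, highest score first.

    Assumption: clusters flagged "other-topics" are a catch-all group
    and are excluded from the ranking.
    """
    real = [c for c in clusters if not c.get("other-topics", False)]
    ranked = sorted(real, key=lambda c: c["score"], reverse=True)
    return [(", ".join(c["labels"]), c["score"], len(c["docs"])) for c in ranked]


if __name__ == "__main__":
    for label, score, doc_count in rank_clusters(clusters):
        print(f"{label:<12} score={score:.2f} docs={doc_count}")
```

For this response, "Hard Drive" and "Video" rank highest and "Retail" lowest. The ranking says nothing about how these clusters would compare with clusters from a different query.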
