A short list of search components

We have already briefly introduced the idea behind the Solr components workflow, and we have seen that it's possible to choose among multiple query parsers at request time by specifying a defType parameter. The defType parameter lets us choose between the different query parsers, including our customized ones, if there are any. In the next paragraphs, we will focus on the standard and Dismax query parsers.

The defType and qt parameters that we have already seen look similar but serve different purposes: the qt parameter selects a request handler that is defined and customized in the solrconfig.xml file, while defType tells a search handler which type of query parser to use.

For example, to use Dismax, we can adopt any of the following equivalent syntaxes (qf lists the fields to query; its value here is a placeholder):

q={!dismax qf=query-filter-condition} some-query-terms
q={!type=dismax qf=query-filter-condition} some-query-terms
q=some-query-terms&defType=dismax&qf=query-filter-condition
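The equivalence between the local-params form and the defType form can be sketched with a few lines of Python that only build the request strings. This is a minimal illustration, not part of Solr: the helper names are hypothetical, and `title` is an assumed field name standing in for the placeholder above.

```python
from urllib.parse import urlencode, parse_qs


def local_params_query(terms, parser="dismax", **params):
    """Build a q value that selects the parser through local params."""
    extras = "".join(f" {key}={value}" for key, value in params.items())
    return f"{{!{parser}{extras}}}{terms}"


def deftype_query(terms, parser="dismax", **params):
    """Build request parameters that select the parser through defType."""
    return {"q": terms, "defType": parser, **params}


# 'title' is a hypothetical field name used only for illustration.
local_form = urlencode({"q": local_params_query("some terms", qf="title")})
deftype_form = urlencode(deftype_query("some terms", qf="title"))

print(local_form)    # URL-encoded q={!dismax qf=title}some terms
print(deftype_form)  # q, defType and qf as separate parameters
```

Either string can be appended to a select URL after the `?`. URL-encoding matters for the local-params form, because the braces, the exclamation mark, and the spaces are not safe in a raw query string.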

We have already seen how to select different response writers using a simple wt parameter.

Both request handlers and response writers are particular types of SolrPlugin, which are documented at http://wiki.apache.org/solr/SolrPlugins.

Tip

One of the possible ways to extend Solr components is to write a new Java class that implements one of the plugin interfaces defined in the org.apache.solr.util.plugin package.

Furthermore, if a plugin implements ResourceLoaderAware or SolrCoreAware, it can easily share some configurations handled by the main core.

For our purposes, it's important to gain more experience with the search-specific components (http://wiki.apache.org/solr/SearchComponent). To get a wider point of view, we can think about Solr's out-of-the-box capabilities for search and related actions. For your convenience, I have summarized some of the most important ones in the following table, with references to the chapters where we will have the chance to play with them:

| Reference | Parameter | Component | Chapters |
|---|---|---|---|
| Search Handlers: http://wiki.apache.org/solr/SearchHandler | | | |
| Query using some of the query parsers | query | Query component | 3, 4, 5 |
| Add the highlight results section | highlight | Highlight component | 5 |
| Add the more like this results section | mlt | MoreLikeThis component | 6 |
| Provide faceted results | facet | Facet component | 6 |
| Add some statistics to the results | stats | Stats component | 5 |
| Add a debug section for inspecting queries | debug | Debug component | 5 |
| Others | | | |
| Provide a spellchecker, also useful as an autosuggester | | SpellCheck component | 5 |
| Promote some term or phrase | | QueryElevation component | 5 |
| Provide Bloom index capability | | BloomIndex component | 5 |
| Expose the IDF and TF Lucene vector weights | | TermVector component and Terms component | 6 |
| Provide clustering capabilities for fields and documents | | Clustering component | 6 |

We will see faceted searches and related topics, such as grouping results or defining a "more like this" handler, in Chapter 6, Using Faceted Search – from Searching to Finding. Moreover, we will get an introductory idea of some advanced and still evolving topics, such as clustering terms and documents, and Lucene term vector weights, which are suitable for function and similarity queries.

Adding the Bloom filter and real-time Get

Lastly, a Bloom filter can be seen as a lookup for the existence of a specific value in a read-only collection; Solr uses this especially to verify the presence of a value in a distributed cloud instance. Adding a Bloom filter handler to our configuration is easy, as shown in the following code:

<searchComponent name="bloom" class="org.apache.solr.handler.component.BloomIndexComponent">
  <str name="field">uri</str>
</searchComponent>
<requestHandler name="/bloom" class="org.apache.solr.handler.component.SearchHandler">
  <arr name="components">
    <str>bloom</str>
  </arr>
</requestHandler>

Then all we need to do is check for the existence of a specific resource using the following command:

>> curl -X GET 'http://localhost:8080/solr/bloom?q=uri:http://dbpedia.org/resource/Drawing_Hands'

At the time of writing this book, there were issues in the current implementation of Bloom filters, so I suggest you not use them yet. The idea is still worth following: a Bloom filter answers either "definitely not present" or "maybe present", so it can return false positives but never false negatives, which makes it a cheap preliminary test when looking for your own resources.
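To make the "maybe present" behavior concrete, here is a minimal Bloom filter sketch in Python. It is only an illustration of the data structure, not the code used by the BloomIndexComponent; the class name, the sizes, and the choice of SHA-256 as the hash function are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes positions by salting the value with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means only "maybe present".
        return all(self.bits[pos] for pos in self._positions(value))


bloom = BloomFilter()
bloom.add("http://dbpedia.org/resource/Drawing_Hands")
print(bloom.might_contain("http://dbpedia.org/resource/Drawing_Hands"))  # True

# A deliberately undersized filter makes the false positive easy to see:
tiny = BloomFilter(size_bits=1, num_hashes=1)
tiny.add("a")
print(tiny.might_contain("b"))  # True, although "b" was never added
```

A larger bit array and a suitable number of hash functions reduce the false positive rate, but the filter can never remove the "maybe" from a positive answer.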

However, we don't need a Bloom filter if we want to look up a specific resource quickly, without the cost of searching the index (and reopening a new searcher). In this case, we can simply use the RealTimeGetHandler as follows:

<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog class="solr.FSUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>

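And we can verify whether a document has been added to the index by using the following command. The real-time Get handler looks documents up by their uniqueKey through the id parameter, so here we assume that uri is the uniqueKey field of our schema:

>> curl -G 'http://localhost:8080/solr/get' --data-urlencode 'id=http://dbpedia.org/resource/Drawing_Hands'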

Note that with this command we can tell whether a specific document has been added before a hard commit is performed, because the handler reads recently added documents from the update log without waiting for a new searcher to be opened. This can be very useful in many situations, especially when using sharding with SolrCloud, which we will see in Chapter 8, Indexing External Data Sources.
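The difference between a search and a real-time Get can be pictured with a small toy model in Python. This is a conceptual sketch only, under the assumption that uncommitted documents live in an update log keyed by id; the class and method names are hypothetical and do not correspond to Solr's internal classes.

```python
class ToyIndex:
    """Toy model: a search sees only committed documents, while a
    real-time Get also consults the log of uncommitted updates."""

    def __init__(self):
        self.update_log = {}   # documents added but not yet committed
        self.searchable = {}   # documents visible to the current searcher

    def add(self, doc_id, doc):
        self.update_log[doc_id] = doc

    def commit(self):
        # Make pending documents visible to searches and empty the log.
        self.searchable.update(self.update_log)
        self.update_log.clear()

    def search(self, doc_id):
        return self.searchable.get(doc_id)

    def realtime_get(self, doc_id):
        # Check the update log first, then fall back to the index.
        if doc_id in self.update_log:
            return self.update_log[doc_id]
        return self.searchable.get(doc_id)


index = ToyIndex()
uri = "http://dbpedia.org/resource/Drawing_Hands"
index.add(uri, {"uri": uri})

print(index.search(uri))        # None: not committed yet
print(index.realtime_get(uri))  # the document, served from the update log

index.commit()
print(index.search(uri))        # now visible to searches as well
```

The model also shows why the updateLog element is configured alongside the /get handler: without a log of pending updates, there would be nothing to consult before the commit.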
