Chapter 4. Searching the Example Data

In this chapter, we will proceed with short examples and cover the essential syntax, which can be expanded later in several ways.

Starting from a core configuration that is more or less identical to the one seen in Chapter 3, Indexing Example Data from DBpedia – Paintings, we will focus on the standard basic parameters. We will see how to pass a simple query with multiple terms, combining them with various types of Boolean conditions and allowing for incomplete or misspelled terms. We will also see how to shape results by projecting fields and paginating, and we will introduce the possibility of rearranging and sorting results.

We will continue with some more complex and specific requests in later chapters.

Looking at Solr's standard query parameters

The basic engine of Solr is Lucene, so Solr accepts a query syntax based on Lucene's. There are some minor differences, but they should not affect our experiments, because they involve more advanced behavior. You can find an explanation of the Solr query syntax on the wiki at: http://wiki.apache.org/solr/SolrQuerySyntax.

Let's see some examples of queries using the basic parameters. Before starting our tests, we need to configure a new core again in the usual way.

Adding a timestamp field for tracking the last modified time

We can define a new core using the same configuration seen in Chapter 3, Indexing Example Data from DBpedia – Paintings, for the paintings core. We can copy both the schema.xml and solrconfig.xml files and start from those.

Once the new core is created at the location /SolrStarterBook/chp04/solr-app/paintings/, we can add a new administrative metadata field in the schema.xml file, which describes the date and time of the last modification of a document:

<fieldType name="date" class="solr.TrieDateField"  />
<field name="timestamp" type="date" indexed="true" stored="true" />

We will use this field later in this chapter.

The field is defined with a specific datatype that uses a canonical representation of the date and time (ISO 8601 standard). We could add the current time while indexing a new document by simply posting the value NOW, which is used as a placeholder for the current time calculation:

<field name="timestamp">NOW</field>
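As an alternative to posting NOW, a client could compute the value itself and send it in the canonical form, which looks like YYYY-MM-DDThh:mm:ssZ in UTC. The following is a minimal Python sketch of that formatting; the function name solr_timestamp is hypothetical and only used for illustration:

```python
from datetime import datetime, timezone


def solr_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC string such as 2014-01-31T12:00:00Z.

    The helper name is an assumption made for this sketch; it is not part of Solr.
    """
    # Convert to UTC first, because the trailing Z denotes UTC.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # Print the current time in the canonical form.
    print(solr_timestamp(datetime.now(timezone.utc)))
```

The resulting string could be used as the content of the timestamp field in place of NOW.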

This method works well for a local Solr instance, but it can cause problems on the distributed instances of SolrCloud, because it can produce unexpected differences in time. A good practice is to modify the default update handler in solrconfig.xml, introducing a specific update chain for generating timestamps:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">timestamp</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain name="timestamp" default="true">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">timestamp</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

This example gives a preview of how a custom update chain works (we will see this in detail later). Notice how a specific component (TimestampUpdateProcessorFactory) is used to compute the current time. Moreover, when using distributed instances of Solr, every local update process (RunUpdateProcessorFactory) will have its own update chain executed, and a distributed update step (DistributedUpdateProcessorFactory) will be added to the chain by default.
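To make the idea of a chain more concrete, here is a toy Python model of the three steps configured above. It is only a conceptual sketch, not Solr's Java implementation: all function names are hypothetical, the index is a plain dictionary, and the assumption is that the timestamp step fills the field only when the document does not already carry a value.

```python
from datetime import datetime, timezone


def timestamp_processor(field_name):
    """Toy stand-in for the timestamp step of the chain."""
    def process(doc):
        # Assumption: only fill the field when no value was supplied.
        if field_name not in doc:
            now = datetime.now(timezone.utc)
            doc[field_name] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
        return doc
    return process


def log_processor(log):
    """Toy stand-in for the logging step: records which document was added."""
    def process(doc):
        log.append("add id=" + str(doc.get("id")))
        return doc
    return process


def run_processor(index):
    """Toy stand-in for the final step: stores the document in the index."""
    def process(doc):
        index[doc["id"]] = doc
        return doc
    return process


def run_chain(processors, doc):
    """Pass the document through every processor, in the configured order."""
    for process in processors:
        doc = process(doc)
    return doc


index, log = {}, []
chain = [timestamp_processor("timestamp"), log_processor(log), run_processor(index)]
run_chain(chain, {"id": "p1", "title": "hypothetical painting"})
print(index["p1"]["timestamp"])
```

The order matters: the document reaches the final step with the timestamp already set, which mirrors the order of the processor elements in the configuration.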

Sending Solr's query parameters over HTTP

It is important to remember that our queries to Solr are sent over the HTTP protocol, unless we are using Solr in embedded mode (we will see this later). With cURL, we can handle the HTTP encoding of parameters. For example, we can write the following command:

>> curl -X POST 'http://localhost:8983/solr/paintings/select?start=3&rows=2&fq=painting&wt=json&indent=true' --data-urlencode 'q=leonardo da vinci' --data-urlencode 'fl=artist title'

The previous command is used in place of the following command:

>> curl -X GET "http://localhost:8983/solr/paintings/select?q=leonardo%20da%20vinci&fq=painting&start=3&rows=2&fl=artist%20title&wt=json&indent=true"

In the example, notice how the --data-urlencode option lets us write parameter values that include characters that need to be encoded over HTTP. Each parameter gets its own --data-urlencode option, because everything after the first = sign is encoded, including any & character.
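The same encoding can be reproduced outside cURL. The following Python sketch builds the encoded URL from readable parameter values using the standard library; the helper name build_select_url is hypothetical, and the host, core, and parameter values are simply the ones from the commands above:

```python
from urllib.parse import quote, urlencode


def build_select_url(base_url, params):
    """Return base_url followed by the URL-encoded query string.

    The helper name is an assumption made for this sketch.
    """
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would emit +.
    return base_url + "?" + urlencode(params, quote_via=quote)


params = {
    "q": "leonardo da vinci",
    "fq": "painting",
    "start": 3,
    "rows": 2,
    "fl": "artist title",
    "wt": "json",
    "indent": "true",
}
url = build_select_url("http://localhost:8983/solr/paintings/select", params)
print(url)
```

Only the values containing spaces change when encoded; the other parameters pass through untouched.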

Testing HTTP parameters on browsers

On modern browsers such as Firefox or Chrome, you can inspect the parameters directly in the built-in developer console. For example, in Chrome you can open the console with the F12 keyboard shortcut:

[Figure: Testing HTTP parameters on browsers]

In the previous image, you can see that the parameters are listed under the Query String Parameters section on the right. We can easily switch between the encoded version of the values and the more readable unencoded one.

If you don't like using Chrome or Firefox and want a similar tool, you can try Firebug Lite (http://getfirebug.com/firebuglite). This is a JavaScript library conceived to port the Firebug plugin's functionality to every browser, at the cost of adding the library to your HTML page during the test process.

Choosing a format for the output

When sending a query to Solr directly (using the browser or cURL), we can ask for results in multiple formats. This includes, for example, JSON:

>> curl -X GET 'http://localhost:8983/solr/paintings/select?q=*:*&wt=json&indent=true'

Here, the indent=true parameter can be used to improve readability. We will see different options for wt (response writer) in the next chapter.
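Once the response arrives as JSON, a client can read it with any JSON parser. The sketch below parses a small sample payload in Python and extracts the number of matches and the titles. The payload is invented for illustration (the artist and title values are placeholders, not real index content), and the function name summarize is hypothetical; only the general layout with responseHeader and response.docs follows Solr's JSON output.

```python
import json

# Illustrative payload only: the documents below are placeholders.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"artist": "Example Artist A", "title": "Example Title A"},
      {"artist": "Example Artist B", "title": "Example Title B"}
    ]
  }
}
"""


def summarize(response_text):
    """Return (numFound, list of titles) from a JSON response body."""
    payload = json.loads(response_text)
    body = payload["response"]
    # A document may lack the title field, so use .get() instead of indexing.
    titles = [doc.get("title") for doc in body["docs"]]
    return body["numFound"], titles


found, titles = summarize(SAMPLE_RESPONSE)
print(found, titles)
```

Note that numFound counts all matching documents, while docs holds only the page selected by the start and rows parameters.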
