Solr's XML response format

The <response/> element wraps the entire response. Its first child element is <lst name="responseHeader">, which, as the name suggests, is the response header: it captures some basic metadata about the response. Some of the fields you'll find in the responseHeader include:

  • status: This is 0 for a successful request. If a Solr error occurs, then the HTTP response status code will reflect it and a plain HTML page will display the error.
  • QTime: This is the number of milliseconds Solr took to process the request on the server. Because of Solr's internal caching, this number should drop to a couple of milliseconds or so for subsequent requests of the same query. If subsequent identical searches are much faster, yet you see the same QTime, then your web browser (or an intermediate HTTP proxy) has cached the response. Solr's HTTP caching configuration will be discussed in Chapter 10, Scaling Solr.
  • Other data may be present depending on query parameters.
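To make the header concrete, here is a minimal Python sketch that reads status and QTime from an XML response using the standard library's xml.etree.ElementTree. The sample XML is a hand-written, abbreviated illustration (the QTime value and the echoed params are assumed), not output captured from a real server. In Solr's generic data representation, each value sits in a typed element (int, str, and so on) whose name attribute holds the key.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample response (illustrative values only).
SAMPLE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
</response>"""

# Map Solr's typed scalar element names to Python converters.
SCALARS = {"int": int, "long": int, "float": float, "double": float,
           "str": str, "bool": lambda s: s == "true"}

def read_header(xml_text):
    """Return the responseHeader's scalar entries as a dict."""
    root = ET.fromstring(xml_text)
    header = root.find("lst[@name='responseHeader']")
    values = {}
    for child in header:
        convert = SCALARS.get(child.tag)
        if convert is not None:  # skip nested lists such as params
            values[child.get("name")] = convert(child.text)
    return values

header = read_header(SAMPLE)
print(header)  # {'status': 0, 'QTime': 3}
```

Nested lists such as params are skipped here for brevity; a fuller reader would recurse into them.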

The main body of the response is the search result listing, enclosed by <result name="response" numFound="399182" start="0">; it contains a <doc> child element for each returned document. Its attributes are explained here:

  • numFound: This is the total number of documents matched by the query. It is not affected by the rows parameter, so it may be larger (but never smaller) than the number of child <doc> elements.
  • start: This is the same as the start request parameter (described shortly), which is the offset of the returned results into the query's result set.
  • maxScore: This is the highest score among all documents matched by the query (numFound). It appears only if you explicitly ask for the score in the field list using the fl request parameter (described shortly). Scoring will be described in the next chapter.
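The relationship between numFound, start, and rows is simple arithmetic, sketched below in Python. The helper names are hypothetical, and the examples assume rows=10 (Solr's default); numFound is taken from the response above.

```python
def docs_returned(num_found, start, rows):
    """How many <doc> elements a response can hold for a given page."""
    # numFound ignores rows; only the returned slice is bounded by rows.
    remaining = max(0, num_found - start)
    return min(rows, remaining)

def page_count(num_found, rows):
    """Number of pages needed to walk every match, rows at a time."""
    return -(-num_found // rows)  # ceiling division on integers

print(docs_returned(399182, 0, 10))       # 10
print(docs_returned(399182, 399180, 10))  # 2
print(docs_returned(399182, 500000, 10))  # 0
print(page_count(399182, 10))             # 39919
```

A start offset past numFound simply yields no documents, while numFound stays the same on every page.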

The contents of the <result> element are a list of <doc> elements, each representing a document in the index. The child elements of a <doc> element represent fields in the index and are named correspondingly. These elements use Solr's generic data representation, which was described earlier: a field that is not multi-valued in the schema is a simple value, while a multi-valued field is represented by an ordered array of simple values.
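The following Python sketch turns each <doc> into a dict, mapping a multi-valued <arr> to an ordered list. The sample XML and its field names (id, a_name, a_member_name) are hypothetical illustrations, not taken from a real index.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; field names and values are illustrative only.
SAMPLE = """<response>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">Artist:1</str>
      <str name="a_name">Example Band</str>
      <arr name="a_member_name">
        <str>First Member</str>
        <str>Second Member</str>
      </arr>
    </doc>
    <doc>
      <str name="id">Artist:2</str>
      <str name="a_name">Solo Act</str>
    </doc>
  </result>
</response>"""

SCALARS = {"int": int, "long": int, "float": float, "double": float,
           "str": str, "date": str, "bool": lambda s: s == "true"}

def convert(elem):
    """Convert one typed element; <arr> becomes an ordered list."""
    if elem.tag == "arr":
        return [convert(child) for child in elem]
    # Unknown types fall back to their text content.
    return SCALARS.get(elem.tag, str)(elem.text)

def read_docs(xml_text):
    """Return (numFound, start, list of document dicts)."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    docs = [{field.get("name"): convert(field) for field in doc}
            for doc in result.findall("doc")]
    return int(result.get("numFound")), int(result.get("start")), docs

num_found, start, docs = read_docs(SAMPLE)
print(num_found, start)          # 2 0
print(docs[0]["a_member_name"])  # ['First Member', 'Second Member']
```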

There was no data following the <result> element in our demonstration query. However, there can be, depending on query parameters that enable features such as faceting and highlighting. When we cover those features, the corresponding XML will be explained.

Parsing the URL

When the admin Query page form is submitted, the form parameters become the query string component of the URL. This URL can be seen at the top of the search results section. Take a good look at it; understanding the URL's structure is essential to grasping how searching Solr works:

http://localhost:8983/solr/mbartists/select?q=*%3A*&wt=xml

  • The /solr/ is the web application context where Solr is installed on the Java servlet engine. If you have a dedicated server for Solr, then you might opt to install it at the root. This would make it just /. How to do this is beyond the scope of this book, but letting it remain at /solr/ is fine.
  • After the web application context is a reference to the Solr core named mbartists. If you are experimenting with Solr's example setup, you won't see a core name because it has a default one. We'll see more about configuring Solr cores in Chapter 11, Deployment.
  • The /select is a reference to the Solr request handler. More on this is covered next in the Understanding request handlers section.
  • Following the ? is a set of unordered URL parameters, also known as query parameters in the context of Solr. They take the form of name=value pairs separated by &. As the form doesn't have an option for every query parameter, you will manually modify the URL in your browser to add query parameters as needed.
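The URL anatomy described above can be checked with Python's urllib.parse. This is a minimal sketch under the assumption that the web application context is the first path segment and the request handler is the last, with an optional core name in between; the helper name describe_select_url is hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

def describe_select_url(url):
    """Split a Solr query URL into context, core, handler and parameters.

    Assumes the context is the first path segment (/solr/) and the handler
    is the last; a core name, if present, sits between them.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    context = "/" + segments[0] + "/"
    handler = "/" + segments[-1]
    core = segments[1] if len(segments) == 3 else None
    # parse_qsl decodes escapes such as %3A and keeps the pairs in URL order,
    # although Solr itself treats the parameters as unordered.
    params = parse_qsl(parts.query)
    return context, core, handler, params

print(describe_select_url("http://localhost:8983/solr/mbartists/select?q=*%3A*&wt=xml"))
# ('/solr/', 'mbartists', '/select', [('q', '*:*'), ('wt', 'xml')])
```

With a URL that has no core name, such as one from the example setup, core comes back as None.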

Text in the URL must be UTF-8 encoded and then URL-escaped so that the URL complies with its specification. This concept should be familiar to anyone who has done web development. Depending on the context in which the URL is actually constructed, there are API calls you should use to ensure that this escaping happens properly. For example, in JavaScript, you could use encodeURIComponent(). In the previous URL, Solr interpreted %3A as a colon. The most common escaped character in URLs is a space, which is escaped as either + or %20. Fortunately, when experimenting with URLs, browsers are lenient and will permit some characters that should be escaped. For more information on URL encoding, see http://en.wikipedia.org/wiki/Percent-encoding.
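In Python, the standard library's urllib.parse plays the role that encodeURIComponent() plays in JavaScript. The sketch below rebuilds the example URL; select_url is a hypothetical helper, and safe="*" is chosen only so that * stays unescaped, as it does in the URL shown earlier.

```python
from urllib.parse import quote, quote_plus, unquote_plus, urlencode

def select_url(base, core, params):
    """Build a /select URL with UTF-8, URL-escaped parameters (hypothetical helper)."""
    # safe="*" leaves * unescaped, matching the URL shown above; ':' still becomes %3A.
    return f"{base}/solr/{core}/select?{urlencode(params, safe='*')}"

url = select_url("http://localhost:8983", "mbartists", {"q": "*:*", "wt": "xml"})
print(url)  # http://localhost:8983/solr/mbartists/select?q=*%3A*&wt=xml

# A space can be escaped either way; both decode to the same text.
print(quote_plus("Smashing Pumpkins"))  # Smashing+Pumpkins
print(quote("Smashing Pumpkins"))       # Smashing%20Pumpkins
print(unquote_plus("Smashing+Pumpkins") == unquote_plus("Smashing%20Pumpkins"))  # True

# Non-ASCII text is UTF-8 encoded before escaping.
print(quote("é"))  # %C3%A9
```

urlencode escapes with quote_plus by default, which is why spaces in parameter values come out as +.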
