In this chapter, we will index and query some local PDFs (a few examples are provided for your tests) as our first use cases, even if you do not yet have any knowledge of Solr.
We will get hands-on with both cURL and the browser. We will see how an index is made and how to interact with it in various ways, introducing the web user interface. We will also describe the main concepts behind an index and a core, which will be useful for the examples covered in the subsequent chapters.
The main component in Solr is the Lucene library, a full-text search library written in Java. Since Solr hides the Lucene layer from us, we don't have to study how Lucene works in detail now; you can study it in depth later. Yet it is important to have an idea of what a Lucene index is, and how it's made. Lucene's core concepts are as follows:
The best way to understand how a generic query works is to focus on documents and imagine how to search for them. When searching for the string Solr Book in the field title, if the index has been created and the fields in our query exist, we expect Lucene to look for matches of the name-value pair title:'Solr Book' by iterating over all the documents currently added to the index.
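The document-oriented picture above can be sketched in a few lines of Python. This is only a mental model, not how Lucene or Solr is implemented: the sample documents, their field values, and the helper name search are all made up for illustration.

```python
# Conceptual sketch only: each document is a plain dict of field name -> value.
# The documents and the search() helper are hypothetical, not part of any Solr or Lucene API.
documents = [
    {"id": "1", "title": "Solr Book", "author": "example author"},
    {"id": "2", "title": "Another Book", "author": "example author"},
    {"id": "3", "author": "no title here"},  # a document without a title field
]


def search(docs, field, value):
    """Return every document whose `field` equals `value`, scanning all documents."""
    return [doc for doc in docs if doc.get(field) == value]


# Equivalent in spirit to the name-value pair title:'Solr Book'.
matches = search(documents, "title", "Solr Book")
print([doc["id"] for doc in matches])
```

Note that the scan visits every document, including those that lack the requested field, so its cost grows with the size of the collection.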
This kind of document-oriented representation is often useful, as it is a common way of thinking about data that many people share. However, the real internal structure adopted for storing index data (and the actual process of searching over it) is less intuitive, and we will cover it later in this chapter.