Chapter 1. Getting Ready with the Essentials

In this chapter we will introduce Apache Solr, starting with what it is and when the project began.

We will prepare a local Solr installation, using the standard distribution to run our examples. We will also see how to start and stop Solr from the command line, how to index some example data, and how to perform a first query from the web interface.

We will also introduce some convenient tools such as cURL. It is a simple and effective way to play with our examples, and we will keep using it in the following chapters.

Understanding Solr

Apache Solr is an open source, enterprise full-text search server written in Java. It was started in 2004 at CNET (at that time, one of the best-known sites for technology news and reviews), became an Apache project in 2007, and has since been used for many projects and websites. It was initially conceived as a web application providing a wide range of full-text search capabilities, using and extending the power of the well-known Apache Lucene library. Since 2010 the two projects have been merged into a single development effort, with improved modularity.

Solr is designed to be a standalone web server, exposing full-text search and other functionality via its own REST-like services, which can be consumed in many different ways from nearly any platform or language. This is the most common use case, and we will focus on it.
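To give a first idea of what consuming these REST-like services looks like, here is a minimal Python sketch that builds a query URL and, optionally, fetches the result as JSON. It rests on several assumptions not stated so far: a Solr instance listening on localhost at port 8983, a core named collection1, and the /select request handler. The function names build_select_url and fetch_results are purely illustrative, and the values should be adjusted to match your own installation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(query, base_url="http://localhost:8983/solr",
                     core="collection1", rows=10):
    """Build the URL for a Solr search request.

    base_url and core are assumptions about a local installation;
    q, wt and rows are standard Solr query parameters.
    """
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{base_url}/{core}/select?{params}"


def fetch_results(query, **kwargs):
    """Send the query to Solr and return the parsed JSON response.

    This only works if a Solr instance is actually running at the
    assumed address.
    """
    with urlopen(build_select_url(query, **kwargs)) as response:
        return json.load(response)
```

Calling build_select_url("title:solr") only produces a plain HTTP URL, which is the point: the same request could just as well be sent from a browser, from cURL, or from any other language with an HTTP client.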

It can also be used as an embedded framework if needed, adding some of its functionality to our Java application through direct calls to its internal API. This is a special case, useful for example when you need its features inside a desktop application. We will only give some suggestions on how to start programming with an embedded Solr instance at the end of the book.

Moreover, Solr is not a database. It is very different from relational databases, as it is designed to manage indexes of the actual data (let's say, metadata useful for searching over the actual data) and not the data itself or the relations between them. However, this distinction can be very blurry in some contexts, and Solr itself is becoming a good NoSQL solution for some specific use cases. You can also see Solr as an open and evolving platform, with integrations to external third-party libraries for data acquisition, language processing, document clustering, and more. We will have the chance to mention some of these advanced topics where relevant throughout the book, to give a broader idea of the possible scenarios and to point to interesting further reading.
