Getting started

We will get started by downloading Solr, examining its directory structure, and then finally running it.

This will set you up for the next section, which tours a running Solr 5 server.

  1. Get Solr: You can download Solr from its website http://lucene.apache.org/solr/. This book assumes that you downloaded one of the binary releases (not the src (source) based distribution). In general, we recommend using the latest release since Solr and Lucene's code are extensively tested. For downloadable example source code, and book errata describing how future Solr releases affect the book content, visit our website http://www.solrenterprisesearchserver.com/.
  2. Get Java: The only prerequisite software needed to run Solr is Java 7 (that is, Java version 1.7). The latest version, however, is Java 8, and you should use that. Typing java -version at a command line will tell you exactly which version of Java you are using, if any.

    Java is available on all major platforms, including Windows, Solaris, Linux, and Mac OS X. Visit http://www.java.com to download the distribution for your platform. Java always comes with the Java Runtime Environment (JRE), and that's all Solr requires. The Java Development Kit (JDK) includes the JRE plus the Java compiler and various diagnostic utility programs. One such useful program is JConsole, which we'll discuss in Chapter 11, Deployment, and Chapter 10, Scaling Solr; for that reason, the JDK distribution is recommended.

    Note

    Solr is a Java-based web application, but you don't need to be particularly familiar with Java in order to use it. This book assumes no such knowledge on your part.

  3. Get the book supplement: This book includes a code supplement available at our website http://www.solrenterprisesearchserver.com/; you can also find it on Packt Publishing's website at http://www.packtpub.com/books/content/support. The software includes a Solr installation configured for data from MusicBrainz.org, a script that downloads and indexes that data into Solr (about 8 million documents in total), and of course various sample code and material organized by chapter. This supplement is not required to follow any of the material in the book. It will be useful if you want to experiment with searches using the same data used for the book's searches, or if you want to see the code referenced in a chapter. The majority of the code is for Chapter 9, Integrating Solr.
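
The Java check in step 2 can also be scripted. The following is a minimal sketch: version_ok is a hypothetical helper that parses a java -version style string and succeeds for Java 1.7 or later (Solr's minimum):

```shell
# Hypothetical helper: succeeds if a "java -version" style string
# reports Java 1.7 or later (Solr's minimum requirement).
version_ok() {
  # Extract the major.minor pair, e.g. 1.8 from 'java version "1.8.0_45"'
  ver=$(echo "$1" | sed -n 's/.*"\([0-9]*\.[0-9]*\)\..*/\1/p')
  minor=${ver#1.}
  [ "${minor:-0}" -ge 7 ]
}

# In practice you would feed it the real output (java -version
# prints to stderr, hence the redirect):
#   version_ok "$(java -version 2>&1)"
version_ok 'java version "1.8.0_45"' && echo "Java OK"
```

This is only a sketch of the manual check; on a real system, run java -version yourself and read the output.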

Solr's installation directory structure

When you unzip Solr after downloading it, you should find a relatively straightforward directory structure (differences between Solr 4 and 5 are briefly explained here):

  • contrib: The Solr contrib modules are extensions to Solr:
    • analysis-extras: This directory includes a few text analysis components that have large dependencies: International Components for Unicode (ICU) classes for multilingual support, a Chinese stemmer, and a Polish stemmer. You'll learn more about text analysis in the next chapter.
    • clustering: This directory contains an engine for clustering search results. There is a one-page overview in Chapter 8, Search Components.
    • dataimporthandler: The DataImportHandler (DIH) is a very popular contrib module that imports data into Solr from a database and some other sources. See Chapter 4, Indexing Data.
    • extraction: Integration with Apache Tika—a framework for extracting text from common file formats. This module is also called SolrCell and Tika is also used by the DIH's TikaEntityProcessor—both are discussed in Chapter 4, Indexing Data.
    • langid: This directory contains a contrib module that provides the ability to detect the language of a document before it's indexed. More information can be found on Solr's Language Detection wiki page at http://wiki.apache.org/solr/LanguageDetection.
    • map-reduce: This directory has utilities for working with Solr from Hadoop Map-Reduce. This is discussed in Chapter 9, Integrating Solr.
    • morphlines-core: This directory contains Kite Morphlines, a document ingestion framework that has support for Solr. The morphlines-cell directory has components related to text extraction. Morphlines is mentioned in Chapter 9, Integrating Solr.
    • uima: This directory contains a library for integration with Apache UIMA—a framework for extracting metadata out of text. For example, there are modules that identify proper names in text and identify the language. To learn more, see Solr's UIMA integration wiki at http://wiki.apache.org/solr/SolrUIMA.
    • velocity: This directory contains a simple search UI framework based on the Velocity templating language. See Chapter 9, Integrating Solr.
  • dist: In this directory, you will see Solr's core and contrib JAR files. In previous Solr versions, the WAR file was found here as well. The core JAR file is what you would use if you're embedding Solr within an application. The Solr test framework JAR and the /test-framework directory contain the libraries needed for testing Solr extensions. The SolrJ JAR and /solrj-lib are what you need to build Java-based clients for Solr.
  • docs: This directory contains documentation and Javadocs: the assets for the public Solr website, a quick tutorial, and of course Solr's API documentation.

    Note

    If you are looking for documentation outside of this book, you are best served by the Solr Reference Guide. The docs directory isn't very useful.

  • example: Before Solr 5, this was the complete Solr server, meant to be an example layout for deployment. It included the Jetty servlet engine (a Java web server), Solr, some sample data, and sample Solr configurations. With the introduction of Solr 5, only example-DIH and exampledocs are kept; the rest was moved to a new server directory.
    • example/example-DIH: These are DataImportHandler configuration files for the example Solr setup. If you plan on importing with DIH, some of these files may serve as good starting points.
    • example/exampledocs: These are sample documents to be indexed into the default Solr configuration, along with the post.jar program for sending the documents to Solr.
  • server: The files required to run Solr as a server process are located here. The interesting child directories are as follows:
    • server/contexts: This is Jetty's WebApp configuration for the Solr setup.
    • server/etc: This is Jetty's configuration. Among other things, here you can change the web port used from the presupplied 8983 to 80 (HTTP default).
    • server/logs: Logs are output here by default. Solr 5 also introduced the collection of JVM garbage collection metrics, which are output to solr_gc.log. They are a good source of information when you are trying to size your Solr setup.
    • server/resources: The configuration file for Log4j lives here. Edit it to change the behavior of Solr's logging (though you can also change logging levels at runtime through the admin console).
    • server/solr: The configuration files for running Solr are stored here. The solr.xml file, which provides the overall configuration of Solr, lives here, as does zoo.cfg, which is required by SolrCloud. The /configsets subdirectory stores example configurations that ship with Solr.
    • server/webapps: This is where Jetty expects to deploy Solr from. A copy of Solr's WAR file is here, which contains Solr's compiled code and all the dependent JAR files needed to run it.
    • server/solr-webapp: This is where Jetty deploys the unpacked WAR file.
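
The port change mentioned for server/etc is a one-line edit. The snippet below sketches it against an inline fragment standing in for the relevant line of server/etc/jetty.xml (the file path and exact XML are assumptions about your unpacked layout; verify them against your Solr version before editing):

```shell
# Sketch: Jetty's listen port is a SystemProperty with a default of 8983.
# The here-document stands in for the relevant line of server/etc/jetty.xml
# (path assumed); the sed substitution switches the default to 80.
cat > jetty-fragment.xml <<'EOF'
<Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
EOF
sed -i 's/default="8983"/default="80"/' jetty-fragment.xml
grep -o 'default="80"' jetty-fragment.xml
```

Note that binding to port 80 typically requires root privileges on *nix systems, which is one reason production deployments often put a proxy in front of Solr instead.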

Running Solr

Solr ships with a number of example collection configurations. We're going to run one called techproducts. This example will create a collection and insert some sample data.

Note

The addition of scripts for running Solr is one of the best enhancements in Solr 5. Previously, to start Solr, you directly invoked Java via java -jar start.jar. Deploying to production meant figuring out how to migrate into an existing servlet environment, which was the source of much frustration.

First, go to the bin directory and then run the main Solr command: on Windows, it is solr.cmd; on *nix systems, it is just solr. Type the following commands:

>>cd bin
>>./solr start -e techproducts

The >> notation is the command prompt and is not part of the command. You'll see a few lines of output as Solr starts, then the techproducts collection is created via an API call, and finally the sample data is loaded into Solr. When it's done, you'll be directed to the Solr admin interface at http://localhost:8983/solr.
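
Once Solr is up, you can confirm it is responding. The check below is sketched against a canned ping response so it is self-contained; in a live setup you would fetch the collection's ping handler instead, for example with curl against http://localhost:8983/solr/techproducts/admin/ping?wt=json (URL assembled from the defaults above; verify it for your setup):

```shell
# Sketch of a health check. In a live setup you would use something like:
#   response=$(curl -s 'http://localhost:8983/solr/techproducts/admin/ping?wt=json')
# Here a canned response stands in so the snippet runs anywhere.
response='{"responseHeader":{"status":0},"status":"OK"}'
case "$response" in
  *'"status":"OK"'*) echo "Solr is up" ;;
  *)                 echo "Solr is not responding" ;;
esac
```

The admin console reachable in your browser performs essentially the same kind of check when it reports a core as healthy.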

To stop Solr, use the same Solr command script:

>>./solr stop
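
If you later want to feed Solr your own documents the way exampledocs does, the XML update format those files use is simple. The file below is a minimal sketch (the id and name field names follow the example schema, and the post.jar invocation in the comment is one common form; flags vary by version, so check your release's documentation):

```shell
# A minimal document in Solr's XML update format, modeled on the files
# in example/exampledocs. One common way to send it (exact flags vary
# by Solr version):
#   java -jar post.jar mydoc.xml
cat > mydoc.xml <<'EOF'
<add>
  <doc>
    <field name="id">BOOK-1</field>
    <field name="name">Solr Enterprise Search Server</field>
  </doc>
</add>
EOF
grep -c '<field' mydoc.xml
```

Indexing in depth, including this XML format and the alternatives to it, is covered in Chapter 4, Indexing Data.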