Appendix A. Installing Lucene

The Java version of Lucene is just another JAR file, less than 1 MB in size. Using Lucene’s API in your code requires only this single JAR file on your build and runtime classpath; it has no dependencies. This appendix provides the specifics of where to obtain Lucene, how to work with the distribution contents, and how to build Lucene directly from its source code. If you’re using a port of Lucene in a language other than Java, refer to chapter 10 and the documentation provided with the port. If you’re using a contrib module, chapter 8 describes how they’re built. This appendix covers the core library of the Java version only.

A.1. Binary installation

To obtain the binary distribution of Lucene, follow these steps:

1.  Download the latest binary Lucene release from the download area of the Apache Lucene website: http://lucene.apache.org/java. As of this writing, the latest version is 3.0.1; the subsequent steps assume this version. Download either the .zip or .tar.gz file, whichever format is more convenient for your environment.

2.  Extract the archive to the directory of your choice. The archive contains a top-level directory named lucene-3.0.1, so it's safe to extract it to C:\ on Windows or to your home directory on Unix. On Windows, if you have WinZip handy, use it to open the .zip file and extract its contents to C:\. If you're on Unix or using Cygwin on Windows, extract the .tar.gz file in your home directory with tar zxvf lucene-3.0.1.tar.gz.

3.  Under the created lucene-3.0.1 directory, you’ll find lucene-core-3.0.1.jar. This is the only file required to introduce Lucene into your applications. How you incorporate Lucene’s JAR file into your application depends on your environment; there are numerous options. We recommend using Ant to build your application’s code. Be sure your code is compiled against the Lucene JAR using the classpath options of the <javac> task.

4.  Include Lucene’s JAR file in your application’s distribution appropriately. For example, a web application using Lucene would include lucene-core-3.0.1.jar in the WEB-INF/lib directory. For command-line applications, be sure Lucene is on the classpath when launching the JVM.
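If Lucene's classes aren't visible at runtime, the JVM fails with a NoClassDefFoundError as soon as your code touches them. The following is a minimal sketch, not part of Lucene's distribution, that checks whether Lucene's core JAR is on the runtime classpath. The class name LuceneClasspathCheck is hypothetical, and it uses org.apache.lucene.index.IndexWriter only as a probe class assumed to be present in lucene-core-3.0.1.jar.

```java
import java.net.URL;
import java.security.CodeSource;

public class LuceneClasspathCheck {

    // Probe class assumed to live in lucene-core-3.0.1.jar
    static final String PROBE_CLASS = "org.apache.lucene.index.IndexWriter";

    /** Returns true if the named class can be loaded, without initializing it. */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, LuceneClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        if (!isOnClasspath(PROBE_CLASS)) {
            System.out.println("Lucene core not found; add lucene-core-3.0.1.jar to the classpath");
            return;
        }
        Class<?> probe = Class.forName(PROBE_CLASS, false, LuceneClasspathCheck.class.getClassLoader());
        // CodeSource can be null for classes loaded by the bootstrap loader
        CodeSource source = probe.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        System.out.println("Lucene core loaded from: " + location);
    }
}
```

Launch it with the same classpath you use for your application. Printing the location the probe class came from helps when more than one Lucene JAR ends up on the classpath.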

The binary distribution includes a substantial amount of documentation, including Javadocs. The root of the documentation is docs/index.html, which you can open in a web browser. Lucene’s distribution also ships two demonstration applications. We apologize in advance for the crude state of these demos—they lack polish when it comes to ease of use—but the documentation (found in docs/demo.html) describes how to use them step by step; we also cover the basics of running them here.

A.2. Running the command-line demo

The command-line Lucene demo consists of two programs: one that indexes a directory tree of files and another that provides a simple search interface. They're contained in a separate JAR file, lucene-demos-3.0.1.jar, and are similar to the Indexer and Searcher examples we covered in chapter 1. The commands in this appendix use the Windows classpath separator (;); on Unix, use a colon (:) instead. To run this demo, set your current working directory to the directory where the binary distribution was expanded, and then run IndexFiles like this:

java -cp lucene-core-3.0.1.jar;lucene-demos-3.0.1.jar org.apache.lucene.demo.IndexFiles docs
...
 adding docs/queryparsersyntax.html
 adding docs/resources.html
 adding docs/systemproperties.html
 adding docs/whoweare.html
 9454 total milliseconds

This command indexes the entire docs directory tree into an index stored in the index subdirectory of the location where you executed the command.
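IndexFiles walks the docs directory tree and hands each file it reaches to the indexer. The traversal half of that job can be sketched with the JDK alone. TreeWalker is a hypothetical class, and it deliberately leaves out the Lucene calls (building a Document for each file and adding it to an IndexWriter), so it only prints the files that would be indexed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TreeWalker {

    /** Recursively collects every readable regular file under root. */
    public static List<File> collectFiles(File root) {
        List<File> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(File file, List<File> out) {
        if (!file.canRead()) {
            return; // skip missing or unreadable entries
        }
        if (file.isDirectory()) {
            String[] names = file.list();
            if (names == null) {
                return; // I/O error while listing the directory
            }
            Arrays.sort(names); // File.list() order is unspecified
            for (String name : names) {
                collect(new File(file, name), out);
            }
        } else {
            out.add(file); // every file, binary or not, is a candidate
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : "docs");
        long start = System.currentTimeMillis();
        for (File f : collectFiles(root)) {
            System.out.println("adding " + f.getPath());
        }
        System.out.println((System.currentTimeMillis() - start) + " total milliseconds");
    }
}
```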

 

Note

Literally every file in the docs directory tree is indexed, including binary files such as *.png and *.jpg. None of the files are parsed; instead, each file is indexed by streaming its bytes into StandardAnalyzer.

 

To search the index just created, execute SearchFiles in this manner:

java -cp lucene-core-3.0.1.jar;lucene-demos-3.0.1.jar org.apache.lucene.demo.SearchFiles

Query: IndexSearcher AND QueryParser
Searching for: +indexsearcher +queryparser
10 total matching documents
0. docs/api/index-all.html
1. docs/api/allclasses-frame.html
2. docs/api/allclasses-noframe.html
3. docs/api/org/apache/lucene/search/class-use/Query.html
4. docs/api/overview-summary.html
5. docs/api/overview-tree.html
6. docs/demo2.html
7. docs/demo4.html
8. docs/api/org/apache/lucene/search/package-summary.html
9. docs/api/org/apache/lucene/search/package-tree.html

SearchFiles displays a Query: prompt and reads your queries interactively. It uses QueryParser with StandardAnalyzer to create a Query from what you type. At most 10 hits are shown at a time; if there are more, you can page through them. Press Ctrl-C to exit the program.
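Paging through results only requires knowing which slice of the ranked hits belongs on the current screen. The arithmetic is sketched below without any Lucene dependency; HitPager and its methods are hypothetical names. In a Lucene 3.0 search loop, one common approach is to request enough top hits for the page being displayed with IndexSearcher.search(Query, int) and then show only the corresponding slice of the returned TopDocs.scoreDocs array.

```java
public class HitPager {

    /** Number of pages needed to show totalHits hits, hitsPerPage at a time. */
    public static int pageCount(int totalHits, int hitsPerPage) {
        if (hitsPerPage <= 0 || totalHits < 0) {
            throw new IllegalArgumentException("need hitsPerPage > 0 and totalHits >= 0");
        }
        return (totalHits + hitsPerPage - 1) / hitsPerPage;
    }

    /** Returns {start, end} hit positions (end exclusive) for a zero-based page. */
    public static int[] pageBounds(int totalHits, int hitsPerPage, int page) {
        if (hitsPerPage <= 0 || totalHits < 0 || page < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        int start = Math.min(page * hitsPerPage, totalHits);
        int end = Math.min(start + hitsPerPage, totalHits);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int totalHits = 23;   // hypothetical hit count
        int hitsPerPage = 10; // SearchFiles shows at most 10 hits at a time
        for (int page = 0; page < pageCount(totalHits, hitsPerPage); page++) {
            int[] bounds = pageBounds(totalHits, hitsPerPage, page);
            System.out.println("page " + page + ": hits " + bounds[0] + " to " + (bounds[1] - 1));
        }
    }
}
```

With 23 hits, main prints three pages covering hits 0 to 9, 10 to 19, and 20 to 22.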

Next, let’s look at the web demo.

A.3. Running the web application demo

The web demo takes a bit more work to set up and run properly. You need a web container; our instructions are for Tomcat 6.0.18. The docs/demo.html documentation provides detailed instructions for setting up and running the web application, but you can also follow the steps provided here.

The index used by the web application differs slightly from the one in the command-line demo. First, it indexes only .html, .htm, and .txt files. Second, each file it processes (including .txt files) is parsed using a rudimentary custom HTML parser. To build the index initially, execute IndexHTML:
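Restricting the index to .html, .htm, and .txt files amounts to a file filter applied during the directory walk. Here is a minimal sketch using java.io.FileFilter; WebDemoFileFilter is a hypothetical class, and it compares extensions case-insensitively, which is an assumption of this sketch and may differ from what IndexHTML actually does.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

public class WebDemoFileFilter implements FileFilter {

    /** True for names ending in .html, .htm, or .txt, ignoring case. */
    public static boolean hasIndexableExtension(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        return name.endsWith(".html") || name.endsWith(".htm") || name.endsWith(".txt");
    }

    @Override
    public boolean accept(File file) {
        // Accept directories so a recursive walk can still descend into them
        return file.isDirectory() || hasIndexableExtension(file.getName());
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "docs");
        File[] matches = dir.listFiles(new WebDemoFileFilter());
        if (matches == null) {
            System.out.println("not a readable directory: " + dir);
            return;
        }
        for (File f : matches) {
            System.out.println((f.isDirectory() ? "descend into " : "would index ") + f.getPath());
        }
    }
}
```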

java -cp lucene-core-3.0.1.jar;lucene-demos-3.0.1.jar org.apache.lucene.demo.IndexHTML -create -index webindex docs
...
adding docs/resources.html
adding docs/systemproperties.html
adding docs/whoweare.html
Optimizing index...
7220 total milliseconds

The -index webindex switch sets the location of the index directory. In a moment, you’ll need the full path to this directory to configure the web application. The final docs argument to IndexHTML is the directory tree to index. The -create switch creates an index from scratch. Remove this switch to update the index with files that have been added or changed since the last time the index was built.

Next, deploy luceneweb.war (from the root directory of the extracted distribution) into CATALINA_HOME/webapps. Start Tomcat and wait for the container to finish starting up; Tomcat should expand the .war file into a luceneweb directory automatically. Then open CATALINA_HOME/webapps/luceneweb/configuration.jsp in a text editor and change the value of indexLocation to the absolute path of the index you built with IndexHTML, as in this example:

String indexLocation =
      "/dev/LuceneInAction/install/lucene-3.0.1/webindex";

Now you’re ready to try the web application. Visit http://localhost:8080/luceneweb in your web browser, and you should see “Welcome to the Lucene Template application...” (you can also change the header and footer text in configuration.jsp). If all is well with your configuration, searching for Lucene-specific words such as "QueryParser AND Analyzer" should list valid results based on Lucene’s documentation.
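If searches fail or return nothing, a mistyped indexLocation is one thing worth ruling out. The following standalone check is not part of the demo, and IndexLocationCheck is a hypothetical name. It assumes only that a Lucene index directory contains a segments_N file, and it also verifies that the configured path is absolute. Note that on Windows, java.io.File treats a path as absolute only if it includes a drive letter or UNC prefix.

```java
import java.io.File;

public class IndexLocationCheck {

    /** Returns null if indexLocation looks like a usable index directory, else a description of the problem. */
    public static String problemWith(String indexLocation) {
        File dir = new File(indexLocation);
        if (!dir.isAbsolute()) {
            return "not an absolute path: " + indexLocation;
        }
        if (!dir.isDirectory()) {
            return "no such directory: " + indexLocation;
        }
        String[] names = dir.list();
        if (names != null) {
            for (String name : names) {
                // A Lucene index records its commit points in segments_N files
                if (name.startsWith("segments")) {
                    return null;
                }
            }
        }
        return "no segments file in " + indexLocation + "; was IndexHTML run with -index pointing here?";
    }

    public static void main(String[] args) {
        String location = args.length > 0 ? args[0] : "/dev/LuceneInAction/install/lucene-3.0.1/webindex";
        String problem = problemWith(location);
        System.out.println(problem == null ? "looks like a Lucene index: " + location : problem);
    }
}
```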

If you click one of the search result links, you may receive an error. IndexHTML indexes a url field, which in this case holds a relative path of docs/.... To make the result links work properly, copy the docs directory from the Lucene distribution to CATALINA_HOME/webapps/luceneweb.
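Any file manager or shell copy command does the job, but the copy can also be scripted. This is a minimal sketch using java.nio.file; CopyDocs is a hypothetical class, and it overwrites files that already exist at the destination.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class CopyDocs {

    /** Recursively copies the source tree into target, overwriting existing files. */
    public static void copyTree(Path source, Path target) throws IOException {
        // Files.walk visits each directory before its contents, so parents are created first
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CopyDocs <lucene-3.0.1/docs> <CATALINA_HOME/webapps/luceneweb/docs>");
            System.exit(1);
        }
        copyTree(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Pass the docs directory inside the expanded web application as the target, so that the relative docs/... paths stored in the url field resolve under the luceneweb context.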

 

Note

Now that you've built two indexes, one for the command-line demo and the other for the web application demo, it's a perfect time to try Luke. See section 8.1 for details on using Luke. Point it at either index, and surf around a bit to get a feel for Luke and the contents of the index.

 

Next you’ll see how to build Lucene from sources, which is useful if you’d like to start tinkering with your own changes to Lucene’s source code.

A.4. Building from source

Lucene's source code is freely and easily available from Apache's Subversion repository. To obtain and build Lucene from source, you need a Subversion client, the Java Development Kit (JDK), and Apache Ant. Follow these steps to build Lucene:

1.  Check out the source code from Apache's Subversion repository. Follow the instructions at the Lucene Java website (http://lucene.apache.org/java) to access the repository using anonymous read-only access. This boils down to executing the following command (from Cygwin on Windows, or a Unix shell):

svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene lucene-trunk

2.  Build Lucene with Ant. At the command prompt, set your current working directory to the directory where you checked out the Lucene Subversion repository (C:\apache\lucene-trunk, for example), and type ant. Lucene's JAR will be compiled to the build subdirectory. The JAR filename is lucene-core-<version>.jar, where <version> depends on the current state of the code you obtained. It will typically be the next minor release with a -dev suffix, for example 3.1-dev.

3.  Run the unit tests. If the Ant build succeeds, next run ant test (add JUnit’s JAR to ANT_HOME/lib if it isn’t already there) and ensure that all of Lucene’s unit tests pass.

Lucene uses a JFlex grammar for StandardTokenizer and JavaCC grammars for QueryParser and the demo HTMLParser. The generated .java versions of these grammars are already checked into Subversion, so neither JFlex nor JavaCC is needed for compilation. But if you wish to modify a parser grammar, you need the corresponding tool (JFlex or JavaCC), and you must run the ant jflex or ant javacc target to regenerate the .java files. You can find more details in the BUILD.txt file in the root directory of Lucene's Subversion repository.

A.5. Troubleshooting

We’d rather not try to guess what kinds of issues you may run into as you follow the steps to install Lucene, build Lucene, or run the demos. Checking the FAQ, searching the archives of the lucene-user email list, and using Lucene’s issue-tracking system are good first steps when you have questions or issues. You’ll find details at the Lucene website: http://lucene.apache.org/java.
