Chapter 1. Meet Lucene
Chapter 2. Building a search index
Listing 2.1. Adding documents to an index
Listing 2.2. Deleting documents from an index
Listing 2.3. Updating indexed Documents
Listing 2.4. Selectively boosting documents and fields
Listing 2.5. Using file-based locks to enforce a single writer at a time
Chapter 3. Adding search to your application
Listing 3.1. Simple searching with TermQuery
Listing 3.2. QueryParser, which makes it trivial to translate search text into a Query
Listing 3.3. Near-real-time search
Listing 3.4. The explain() method
Listing 3.6. Using BooleanQuery to combine required subqueries
Listing 3.7. Using BooleanQuery to combine optional subqueries
Chapter 4. Lucene’s analysis process
Listing 4.1. AnalyzerDemo: seeing analysis in action
Listing 4.2. AnalyzerUtils: delving into an analyzer
Listing 4.3. Seeing the term, offsets, type, and position increment of each token
Listing 4.4. Searching for words that sound like one another
Listing 4.5. TokenFilter that replaces tokens with their metaphone equivalents
Listing 4.6. Testing the synonym analyzer
Listing 4.7. SynonymAnalyzer implementation
Listing 4.8. SynonymFilter: buffering tokens and emitting one at a time
Listing 4.9. SynonymAnalyzerTest: showing that synonym queries work
Listing 4.10. Testing SynonymAnalyzer with QueryParser
Listing 4.11. Visualizing the position increment of each token
Listing 4.12. PositionalPorterStopAnalyzer: stemming and stop word removal
Listing 4.13. Using QueryParser to match part numbers
Listing 4.14. ChineseDemo: illustrates what analyzers do with Chinese text
Listing 4.15. NutchExample: demonstrating Nutch analysis and query parsing
Chapter 5. Advanced search techniques
Listing 5.1. Sorting search hits by field
Listing 5.2. Show results when sorting by different fields
Listing 5.3. Setting up an index to test MultiPhraseQuery
Listing 5.4. Using MultiPhraseQuery to match more than one term at each position
Listing 5.5. Mimicking MultiPhraseQuery using BooleanQuery
Listing 5.6. Using QueryParser to produce a MultiPhraseQuery
Listing 5.7. MultiFieldQueryParser, which searches on multiple fields at once
Listing 5.8. SpanQuery demonstration infrastructure
Listing 5.9. dumpSpans method, used to see all spans matched by any SpanQuery
Listing 5.10. Finding matches near one another using SpanNearQuery
Listing 5.11. Taking the union of two span queries using SpanOrQuery
Listing 5.12. Using TermRangeFilter to filter by title
Listing 5.13. Setting up an index to use for testing the security filter
Listing 5.14. Securing the search space with a filter
Listing 5.15. Using recency to boost search results
Listing 5.16. Testing recency boosting
Listing 5.17. Securing the search space with a filter
Listing 5.18. Finding similar books to a specific example book
Listing 5.19. Build category vectors by aggregating for each category
Listing 5.20. Aggregate term frequencies for each category
Listing 5.21. Finding the closest vector to match the best category
Listing 5.22. Computing term vector angles for a new book against a given category
Listing 5.23. Using TimeLimitingCollector to stop a slow search
Chapter 6. Extending search
Listing 6.1. Indexing geographic data
Listing 6.2. DistanceComparatorSource
Listing 6.3. Accessing custom sorting values for search results
Listing 6.4. Custom Collector: collects all book links
Listing 6.5. Testing the BookLinkCollector
Listing 6.6. A collector that gathers all matching documents and scores into a List
Listing 6.7. Disallowing wildcard and fuzzy queries
Listing 6.8. Using a custom QueryParser
Listing 6.9. Extending QueryParser to properly handle numeric fields
Listing 6.10. Extending QueryParser to handle date fields
Listing 6.11. Testing date range parsing
Listing 6.12. Using the client locale in a web application
Listing 6.13. Translating PhraseQuery to SpanNearQuery
Listing 6.14. Retrieving filter information from external source with SpecialsFilter
Listing 6.15. Using a FilteredQuery
Listing 6.16. Custom filter to add payloads to warning terms inside bulletin documents
Listing 6.17. Using payloads to boost certain term occurrences
Chapter 7. Extracting text with Tika
Listing 7.1. Class to extract text from arbitrary documents and index it with Lucene
Listing 7.2. XML snippet representing an address book entry
Listing 7.3. Using the SAX API to parse an address book entry
Chapter 8. Essential Lucene extensions
Listing 8.1. Creating combinations of adjacent letters with ngram filters
Listing 8.2. Highlighting terms using CSS
Listing 8.3. Highlighting matches in search results
Listing 8.4. Highlighting terms using FastVectorHighlighter
Listing 8.5. Creating the spellchecker index
Listing 8.6. Finding the list of candidates using the spellchecker index
Chapter 9. Further Lucene extensions
Listing 9.1. Base test case to see ChainedFilter in action
Listing 9.2. Storing an index in Berkeley DB, using JEDirectory
Listing 9.3. Looking up synonyms from a WordNet-based index
Listing 9.4. WordNetSynonymEngine generates synonyms from WordNet’s database
Listing 9.5. Search request handler using XML query parser
Listing 9.6. Using XSL to transform the user’s input into the corresponding XML query
Listing 9.7. Extending the XML query parser with a custom FilterBuilder
Listing 9.8. Indexing a document for spatial search
Listing 9.9. Sorting and filtering by spatial criteria
Listing 9.10. Finding restaurants near home with Spatial Lucene
Listing 9.11. SearchServer: a remote search server using RMI
Listing 9.12. SearchClient accesses RMI-exposed objects from SearchServer
Chapter 10. Using Lucene from other programming languages
Listing 10.1. Using CLucene’s IndexWriter and IndexSearcher API
Listing 10.2. C# code for indexing *.txt files with Lucene.Net
Chapter 11. Lucene administration and performance tuning
Listing 11.1. Testing indexing throughput using Wikipedia documents
Listing 11.2. Indexing with threads, compound, extra RAM, and larger mergeFactor
Listing 11.3. Drop-in IndexWriter class to use multiple threads for indexing
Listing 11.4. Adding a new custom task to contrib/benchmark
Listing 11.5. Safely reopening IndexSearcher in a multithreaded world
Listing 11.6. Drop-in replacement for FSDirectory to track open files
Chapter 13. Case study 2: SIREn
Listing 13.1. How SirenPayloadFilter processes the token stream
Listing 13.2. Interface of SirenIdIterator
Chapter 14. Case study 3: LinkedIn
Appendix C. Lucene/contrib benchmark
Listing C.1. Computing precision and recall statistics for your IndexSearcher