List of Listings

Chapter 1. Meet Lucene

Listing 1.1. Indexer, which indexes .txt files

Listing 1.2. Searcher, which searches a Lucene index

Chapter 2. Building a search index

Listing 2.1. Adding documents to an index

Listing 2.2. Deleting documents from an index

Listing 2.3. Updating indexed Documents

Listing 2.4. Selectively boosting documents and fields

Listing 2.5. Using file-based locks to enforce a single writer at a time

Chapter 3. Adding search to your application

Listing 3.1. Simple searching with TermQuery

Listing 3.2. QueryParser, which makes it trivial to translate search text into a Query

Listing 3.3. Near-real-time search

Listing 3.4. The explain() method

Listing 3.5. PrefixQuery

Listing 3.6. Using BooleanQuery to combine required subqueries

Listing 3.7. Using BooleanQuery to combine optional subqueries

Listing 3.8. PhraseQuery

Listing 3.9. WildcardQuery

Listing 3.10. Creating a TermRangeQuery using QueryParser

Chapter 4. Lucene’s analysis process

Listing 4.1. AnalyzerDemo: seeing analysis in action

Listing 4.2. AnalyzerUtils: delving into an analyzer

Listing 4.3. Seeing the term, offsets, type, and position increment of each token

Listing 4.4. Searching for words that sound like one another

Listing 4.5. TokenFilter that replaces tokens with their metaphone equivalents

Listing 4.6. Testing the synonym analyzer

Listing 4.7. SynonymAnalyzer implementation

Listing 4.8. SynonymFilter: buffering tokens and emitting one at a time

Listing 4.9. SynonymAnalyzerTest: showing that synonym queries work

Listing 4.10. Testing SynonymAnalyzer with QueryParser

Listing 4.11. Visualizing the position increment of each token

Listing 4.12. PositionalPorterStopAnalyzer: stemming and stop word removal

Listing 4.13. Using QueryParser to match part numbers

Listing 4.14. ChineseDemo: illustrates what analyzers do with Chinese text

Listing 4.15. NutchExample: demonstrating Nutch analysis and query parsing

Chapter 5. Advanced search techniques

Listing 5.1. Sorting search hits by field

Listing 5.2. Show results when sorting by different fields

Listing 5.3. Setting up an index to test MultiPhraseQuery

Listing 5.4. Using MultiPhraseQuery to match more than one term at each position

Listing 5.5. Mimicking MultiPhraseQuery using BooleanQuery

Listing 5.6. Using QueryParser to produce a MultiPhraseQuery

Listing 5.7. MultiFieldQueryParser, which searches on multiple fields at once

Listing 5.8. SpanQuery demonstration infrastructure

Listing 5.9. dumpSpans method, used to see all spans matched by any SpanQuery

Listing 5.10. Finding matches near one another using SpanNearQuery

Listing 5.11. Taking the union of two span queries using SpanOrQuery

Listing 5.12. Using TermRangeFilter to filter by title

Listing 5.13. Setting up an index to use for testing the security filter

Listing 5.14. Securing the search space with a filter

Listing 5.15. Using recency to boost search results

Listing 5.16. Testing recency boosting

Listing 5.17. Securing the search space with a filter

Listing 5.18. Finding similar books to a specific example book

Listing 5.19. Build category vectors by aggregating for each category

Listing 5.20. Aggregate term frequencies for each category

Listing 5.21. Finding the closest vector to match the best category

Listing 5.22. Computing term vector angles for a new book against a given category

Listing 5.23. Using TimeLimitingCollector to stop a slow search

Chapter 6. Extending search

Listing 6.1. Indexing geographic data

Listing 6.2. DistanceComparatorSource

Listing 6.3. Accessing custom sorting values for search results

Listing 6.4. Custom Collector: collects all book links

Listing 6.5. Testing the BookLinkCollector

Listing 6.6. A collector that gathers all matching documents and scores into a List

Listing 6.7. Disallowing wildcard and fuzzy queries

Listing 6.8. Using a custom QueryParser

Listing 6.9. Extending QueryParser to properly handle numeric fields

Listing 6.10. Extending QueryParser to handle date fields

Listing 6.11. Testing date range parsing

Listing 6.12. Using the client locale in a web application

Listing 6.13. Translating PhraseQuery to SpanNearQuery

Listing 6.14. Retrieving filter information from external source with SpecialsFilter

Listing 6.15. Using a FilteredQuery

Listing 6.16. Custom filter to add payloads to warning terms inside bulletin documents

Listing 6.17. Using payloads to boost certain term occurrences

Chapter 7. Extracting text with Tika

Listing 7.1. Class to extract text from arbitrary documents and index it with Lucene

Listing 7.2. XML snippet representing an address book entry

Listing 7.3. Using the SAX API to parse an address book entry

Listing 7.4. Using Apache Commons Digester to parse XML

Chapter 8. Essential Lucene extensions

Listing 8.1. Creating combinations of adjacent letters with ngram filters

Listing 8.2. Highlighting terms using CSS

Listing 8.3. Highlighting matches in search results

Listing 8.4. Highlighting terms using FastVectorHighlighter

Listing 8.5. Creating the spellchecker index

Listing 8.6. Finding the list of candidates using the spellchecker index

Listing 8.7. Using MoreLikeThis to find similar documents

Chapter 9. Further Lucene extensions

Listing 9.1. Base test case to see ChainedFilter in action

Listing 9.2. Storing an index in Berkeley DB, using JEDirectory

Listing 9.3. Looking up synonyms from a WordNet-based index

Listing 9.4. WordNetSynonymEngine generates synonyms from WordNet’s database

Listing 9.5. Search request handler using XML query parser

Listing 9.6. Using XSL to transform the user’s input into the corresponding XML query

Listing 9.7. Extending the XML query parser with a custom FilterBuilder

Listing 9.8. Indexing a document for spatial search

Listing 9.9. Sorting and filtering by spatial criteria

Listing 9.10. Finding restaurants near home with Spatial Lucene

Listing 9.11. SearchServer: a remote search server using RMI

Listing 9.12. SearchClient accesses RMI-exposed objects from SearchServer

Listing 9.13. Customizing the flexible query parser

Chapter 10. Using Lucene from other programming languages

Listing 10.1. Using CLucene’s IndexWriter and IndexSearcher API

Listing 10.2. C# code for indexing *.txt files with Lucene.Net

Listing 10.3. Searching an index with Lucene.Net

Listing 10.4. Creating an index with KinoSearch

Chapter 11. Lucene administration and performance tuning

Listing 11.1. Testing indexing throughput using Wikipedia documents

Listing 11.2. Indexing with threads, compound, extra RAM, and larger mergeFactor

Listing 11.3. Drop-in IndexWriter class to use multiple threads for indexing

Listing 11.4. Adding a new custom task to contrib/benchmark

Listing 11.5. Safely reopening IndexSearcher in a multithreaded world

Listing 11.6. Drop-in replacement for FSDirectory to track open files

Chapter 13. Case study 2: SIREn

Listing 13.1. How SirenPayloadFilter processes the token stream

Listing 13.2. Interface of SirenIdIterator

Listing 13.3. Creation of an entity description query

Listing 13.4. Integration of SIREn through Solr schema.xml

Chapter 14. Case study 3: LinkedIn

Listing 14.1. Indexing data events with Zoie

Listing 14.2. All indexing in Zoie is achieved through indexing requests

Listing 14.3. Distributed search with Zoie

Appendix C. Lucene/contrib benchmark

Listing C.1. Computing precision and recall statistics for your IndexSearcher
