Part 2. Applied Lucene

Lucene itself is just a JAR, with the real fun and power coming from what you build around it. Part 2 explores ways to leverage Lucene. Projects commonly demand full-text searching of Microsoft Office, PDF, HTML, XML, and other document formats. “Extracting text with Tika” (chapter 7) illuminates ways to index these document types into Lucene. So many extensions have been developed to augment and extend Lucene that we dedicate two chapters to them: “Essential Lucene extensions” (chapter 8) and “Further Lucene extensions” (chapter 9). Although Java is the primary language used with Lucene, the index format is language neutral. “Using Lucene from other programming languages” (chapter 10) explores Lucene usage from languages such as C++, C#, Python, Perl, and Ruby. “Lucene administration and performance tuning” (chapter 11, the final chapter in part 2) dives into the nitty-gritty details of managing Lucene’s consumption of resources such as memory, disk space, and file descriptors. You’ll also learn how to improve indexing and searching performance.
