Introduction

This book is for developers who are building or planning to build an enterprise search engine using Apache Solr. Chapters 1 and 3 can be read by anyone who wants to learn the basics of information retrieval, search engines, and Apache Solr in particular. Chapter 2 kick-starts development with Solr and will prove to be a great resource for Solr newbies and administrators. The remaining chapters explore Solr features and approaches for developing a practical and effective search engine.

This book covers use cases and examples from various domains such as e-commerce, legal, medical, and music, which will help you understand the need for certain features and how to approach the solution. While discussing the features, the book generally provides a snapshot of the required configuration, the command (using curl) to execute the feature, and a code snippet as required. The book dives into implementation details and writing plug-ins for integrating custom features.

This book doesn’t cover Solr performance tuning or optimizing Solr for high-speed indexing. It covers Solr features through release 5.3.1, the latest release at the time of this writing.

What This Book Covers

Chapter 1, Apache Solr: An Introduction, as the name states, starts with an introduction to Apache Solr and its ecosystem. It then discusses the features, the reasons for Solr’s popularity, its building blocks, and other information that will give you a holistic view of Solr. It also introduces related technologies and compares Solr with alternatives.

Chapter 2, Solr Setup and Administration, begins with Solr fundamentals and covers Solr setup and the steps for indexing and searching your first set of documents. It then describes the Solr administrative features and various management options.

Chapter 3, Information Retrieval, is dedicated to the concepts of information retrieval, content extraction, and text processing.

Chapter 4, Schema Design and Text Analysis, covers the schema design, text analysis, going schemaless, and managed schemas in Solr. It also describes common text-analysis techniques.

Chapter 5, Indexing Data, concentrates on the Solr indexing process by describing the indexing request flow, various indexing tools, supported document formats, and important update request processors. This is also the first chapter that provides the steps to write a Solr plug-in, a custom UpdateRequestProcessor in this case.

Chapter 6, Searching Data, describes the Solr searching process, various query types, important query parsers, supported request parameters, and steps for writing a custom SearchComponent.

Chapter 7, Searching Data: Part 2, continues where the previous chapter leaves off and covers local parameters, result grouping, statistics, faceting, reranking queries, and joins. It also dives into the details of function queries for deriving a practical relevance ranking and the steps for writing your own named function.

Chapter 8, Solr Scoring, explains the Solr scoring process, supported scoring models, the score computation, and steps for customizing similarity.

Chapter 9, Additional Features, explores Solr features including spell-checking, autosuggestion, document similarity, and sponsored search.

Chapter 10, Traditional Scaling and SolrCloud, covers the distributed architectures supported by Solr and the steps for setting up SolrCloud, creating a collection, distributed indexing and searching, shard splitting, and ZooKeeper.

Chapter 11, Semantic Search, introduces the concept of semantic search and covers the tools and techniques for integrating semantic capabilities in Solr.

What You Need for This Book

Apache Solr requires Java Runtime Environment (JRE) 1.7 or newer. The provided custom Java code is tested on Java Development Kit (JDK) 1.8 and requires Apache Maven.

The last chapter requires downloading the resources needed by Apache OpenNLP and WordNet.

Who This Book Is For

This book expects you to have a basic understanding of the Java programming language, which is essential if you want to execute the custom components.