Preface

If you need to add search capabilities to your server or application, you probably need Apache Solr. This is an enterprise search server designed to deliver good search experiences to users. A search experience should include common features such as full-text keyword search, spellchecking, autosuggestion, recommendations, and highlighting. But Solr does even more. It provides faceted search, and it can help us shape a user experience centered on faceted navigation. The platform is also open to integration with more advanced techniques, ranging from Named Entity Recognition to document clustering based on topic similarities between the documents in a collection.

However, this book is not a comprehensive guide to all of Solr's technical features; instead, it is designed to introduce you to simple, practical, easy-to-follow examples of the essential ones. You can follow the examples step by step and discuss them with your team if you want. The chapters follow a narrative path, from the basics to more complex topics, in order to give you a wide view of the context and suggest where to move next.

The examples use real data about paintings collected from DBpedia, data from the Web Gallery of Art site, and the recently released free dataset from the Tate gallery. These datasets are a good playground for experimentation because they contain lots of information, intuitive metadata, and even errors and noise that can be used for realistic testing. I hope you will have fun working with them, but you will also see how to index your own rich documents (PDF, Word, and others), so you will be able to use your own data for the examples if you want.

What this book covers

Chapter 1, Getting Ready with the Essentials, introduces Solr. We'll cite some well-known sites that are already using features and patterns we'd like to be able to manage with Solr. You'll also see how to install Java, Solr, and cURL and verify that everything is working fine with the first simple query.
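As a taste of the kind of verification described above, the check can be sketched with a single cURL call. This is a minimal, hypothetical example: the port 8983 and the core name `collection1` are common Solr defaults, but your installation may use different values.

```shell
# Hypothetical sanity check against a local Solr instance.
# Assumes Solr is listening on the default port 8983 with a core
# named "collection1" -- adjust both to match your setup.
SOLR_URL="http://localhost:8983/solr/collection1/select"

# Match all documents, return none (rows=0), ask for a JSON response.
QUERY="${SOLR_URL}?q=*:*&rows=0&wt=json"

# -s silences the progress bar; the fallback message keeps the
# script from failing when no local instance is running.
curl -s "$QUERY" || echo "Solr not reachable (is the server running?)"
```

If Solr is up, the JSON response includes a `responseHeader` with `status: 0` and a `numFound` count, which is enough to confirm the installation works.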

Chapter 2, Indexing with Local PDF Files, explains briefly how a Lucene index is made. Core concepts such as the inverted index, document, field, and tokenization will be introduced. You'll see how to write a basic configuration and test it over real data, indexing PDF files directly. At the end, there is a small list of useful commands that can be used during the development and maintenance of a Solr index.

Chapter 3, Indexing Example Data from DBpedia – Paintings, explains how to design an entity, and introduces the core types and concepts useful for writing a schema. You will write a basic text analysis, see how to post a new document using JSON, and acquire practical knowledge on how the update process works. Finally, you'll have the chance to create an index on real data collected from DBpedia.
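The JSON update workflow mentioned above can be sketched in a few lines. This is an illustrative example only: the core name `paintings` and the document fields are invented for the sketch, and the instance address assumes the default local installation.

```shell
# Hypothetical: post a new document to a Solr core via the JSON
# update endpoint. The core name "paintings" and the field names
# are illustrative -- replace them with your own schema.
UPDATE_URL="http://localhost:8983/solr/paintings/update?commit=true"

# A single document wrapped in a JSON array, as the update
# handler accepts a list of documents.
DOC='[{"uri":"http://example.org/painting/1","title":"Mona Lisa","artist":"Leonardo da Vinci"}]'

# commit=true makes the document searchable immediately; the
# fallback message avoids a hard failure when Solr is not running.
curl -s -X POST -H 'Content-Type: application/json' \
  --data-binary "$DOC" "$UPDATE_URL" || echo "Solr not reachable"
```

In practice you would usually batch several documents per request and commit less often, since frequent commits are expensive; the chapter discusses how the update process works in more detail.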

Chapter 4, Searching the Example Data, covers the basic and most important Solr query parameters. You'll also see how to use the HTTP query parameters by simulating remote queries with cURL. You'll see some basic types of queries, analyze the structure of the results, and see how to handle results in some commonly used ways.

Chapter 5, Extending Search, introduces different and more flexible query parsers, which can be used alongside the default Lucene one. You will see how to debug the different parsers. You'll also start using more advanced query components, for example, highlighting, spellchecking, and spatial search.

Chapter 6, Using Faceted Search – from Searching to Finding, introduces faceted search with different practical examples. You'll see how facets can be used to support the user experience during searches, as well as to expose suggestions useful for raw data analysis. Very common concepts such as matching and similarity will be introduced and used in practical examples on recommendation. You'll also work with filtering and grouping terms, and see how a query is actually parsed.

Chapter 7, Working with Multiple Entities, Multicores, and Distributed Search, explains how to work with distributed search. We will focus not only on how to use multiple cores on a local machine, but also on the pros and cons of using multiple entities on a single denormalized index, and we will eventually perform data analysis on it. You will also analyze different strategies, from a single index to a SolrCloud distributed search.

Chapter 8, Indexing External Data Sources, covers different practical examples of using the DataImportHandler components for indexing different data sources. You'll work with data from a relational database, and from the data collected before, as well as from remote sources on the Web by combining multiple sources in a single example.

Chapter 9, Introducing Customizations, explains how to customize text analysis for a specific language, and how to start writing new components using a language supported on the JVM. In particular, we'll see how simple it is to write a very basic Named Entity Recognizer for adding annotations to the text, and how to adopt an HTML5-compliant template directly as an alternate response writer. The examples will be presented using Java and Scala, and they will be tested using JUnit and Maven.

Appendix, Solr Clients and Integrations, introduces a short list of technologies that are currently using Solr, from CMS to external applications. You'll also see how Solr can be embedded inside a Java (or JVM) application, and how it's also possible to write a custom client combining SolrJ and one of the languages supported on the JVM.
