Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this Book

About the Authors

About the Cover Illustration

Chapter 1. The search relevance problem

1.1. Your goal: gaining the skills of a relevance engineer

1.2. Why is search relevance so hard?

1.2.1. What’s a “relevant” search result?

1.2.2. Search: there’s no silver bullet!

1.3. Gaining insight from relevance research

1.3.1. Information retrieval

1.3.2. Can we use information retrieval to solve relevance?

1.4. How do you solve relevance?

1.5. More than technology: curation, collaboration, and feedback

1.6. Summary

Chapter 2. Search—under the hood

2.1. Search 101

2.1.1. What’s a search document?

2.1.2. Searching the content

2.1.3. Exploring content through search

2.1.4. Getting content into the search engine

2.2. Search engine data structures

2.2.1. The inverted index

2.2.2. Other pieces of the inverted index

2.3. Indexing content: extraction, enrichment, analysis, and indexing

2.3.1. Extracting content into documents

2.3.2. Enriching documents to clean, augment, and merge data

2.3.3. Performing analysis

2.3.4. Indexing

2.4. Document search and retrieval

2.4.1. Boolean search: AND/OR/NOT

2.4.2. Boolean queries in Lucene-based search (MUST/MUST_NOT/SHOULD)

2.4.3. Positional and phrase matching

2.4.4. Enabling exploration: filtering, facets, and aggregations

2.4.5. Sorting, ranked results, and relevance

2.5. Summary

Chapter 3. Debugging your first relevance problem

3.1. Applications to Solr and Elasticsearch: examples in Elasticsearch

3.2. Our most prominent data set: TMDB

3.3. Examples programmed in Python

3.4. Your first search application

3.4.1. Your first searches of the TMDB Elasticsearch index

3.5. Debugging query matching

3.5.1. Examining the underlying query strategy

3.5.2. Taking apart query parsing

3.5.3. Debugging analysis to solve matching issues

3.5.4. Comparing your query to the inverted index

3.5.5. Fixing our matching by changing analyzers

3.6. Debugging ranking

3.6.1. Decomposing the relevance score with Lucene’s explain feature

3.6.2. The vector-space model, the relevance explain, and you

3.6.3. Practical caveats to the vector space model

3.6.4. Scoring matches to measure relevance

3.6.5. Computing weights with TF × IDF

3.6.6. Lies, damned lies, and similarity

3.6.7. Factoring in the search term’s importance

3.6.8. Fixing Space Jam vs. alien ranking

3.7. Solved? Our work is never over!

3.8. Summary

Chapter 4. Taming tokens

4.1. Tokens as document features

4.1.1. The matching process

4.1.2. Tokens, more than just words

4.2. Controlling precision and recall

4.2.1. Precision and recall by example

4.2.2. Analysis for precision or recall

4.2.3. Taking recall to extremes

4.3. Precision and recall—have your cake and eat it too

4.3.1. Scoring strength of a feature in a single field

4.3.2. Scoring beyond TF × IDF: multiple search terms and multiple fields

4.4. Analysis strategies

4.4.1. Dealing with delimiters

4.4.2. Capturing meaning with synonyms

4.4.3. Modeling specificity in search

4.4.4. Modeling specificity with synonyms

4.4.5. Modeling specificity with paths

4.4.6. Tokenize the world!

4.4.7. Tokenizing integers

4.4.8. Tokenizing geographic data

4.4.9. Tokenizing melodies

4.5. Summary

Chapter 5. Basic multifield search

5.1. Signals and signal modeling

5.1.1. What is a signal?

5.1.2. Starting with the source data model

5.1.3. Implementing a signal

5.1.4. Signal modeling: data modeling for relevance

5.2. TMDB—search, the final frontier!

5.2.1. Violating the prime directive

5.2.2. Flattening nested docs

5.3. Signal modeling in field-centric search

5.3.1. Starting out with best_fields

5.3.2. Controlling field preference in search results

5.3.3. Better best_fields with more-precise signals?

5.3.4. Letting losers share the glory: calibrating best_fields

5.3.5. Counting multiple signals using most_fields

5.3.6. Boosting in most_fields

5.3.7. When additional matches don’t matter

5.3.8. What’s the verdict on most_fields?

5.4. Summary

Chapter 6. Term-centric search

6.1. What is term-centric search?

6.2. Why do you need term-centric search?

6.2.1. Hunting for albino elephants

6.2.2. Finding an albino elephant in the Star Trek example

6.2.3. Avoiding signal discordance

6.2.4. Understanding the mechanics of signal discordance

6.3. Performing your first term-centric searches

6.3.1. Working with the term-centric ranking function

6.3.2. Running a term-centric query parser (into the ground)

6.3.3. Understanding field synchronicity

6.3.4. Field synchronicity and signal modeling

6.3.5. Query parsers and signal discordance

6.3.6. Tuning term-centric search

6.4. Solving signal discordance in term-centric search

6.4.1. Combining fields into custom all fields

6.4.2. Solving signal discordance with cross_fields

6.5. Combining field-centric and term-centric strategies: having your cake and eating it too

6.5.1. Grouping “like fields” together

6.5.2. Understanding the limits of like fields

6.5.3. Combining greedy naïve search and conservative amplifiers

6.5.4. Term-centric vs. field-centric, and precision vs. recall

6.5.5. Considering filtering, boosting, and reranking

6.6. Summary

Chapter 7. Shaping the relevance function

7.1. What do we mean by score shaping?

7.2. Boosting: shaping by promoting results

7.2.1. Boosting: the final frontier

7.2.2. When boosting—add or multiply? Boolean or function query?

7.2.3. You choose door A: additive boosting with Boolean queries

7.2.4. You choose door B: function queries using math for ranking

7.2.5. Hands-on with function queries: simple multiplicative boosting

7.2.6. Boosting basics: signals, signals everywhere

7.3. Filtering: shaping by excluding results

7.4. Score-shaping strategies for satisfying business needs

7.4.1. Search all the movies!

7.4.2. Modeling your boosting signals

7.4.3. Building the ranking function: adding high-value tiers

7.4.4. High-value tier scored with a function query

7.4.5. Ignoring TF × IDF

7.4.6. Capturing general-quality metrics

7.4.7. Achieving users’ recency goals

7.4.8. Combining the function queries

7.4.9. Putting it all together!

7.5. Summary

Chapter 8. Providing relevance feedback

8.1. Relevance feedback at the search box

8.1.1. Providing immediate results with search-as-you-type

8.1.2. Helping users find the best query with search completion

8.1.3. Correcting typos and misspellings with search suggestions

8.2. Relevance feedback while browsing

8.2.1. Building faceted browsing

8.2.2. Providing breadcrumb navigation

8.2.3. Selecting alternative results ordering

8.3. Relevance feedback in the search results listing

8.3.1. What information should be presented in listing items?

8.3.2. Relevance feedback through snippets and highlighting

8.3.3. Grouping similar documents

8.3.4. Helping the user when there are no results

8.4. Summary

Chapter 9. Designing a relevance-focused search application

9.1. Yowl! The awesome new start-up!

9.2. Gathering information and requirements

9.2.1. Understand users and their information needs

9.2.2. Understand business needs

9.2.3. Identify required and available information

9.3. Designing the search application

9.3.1. Visualize the user’s experience

9.3.2. Define fields and model signals

9.3.3. Combine and balance signals

9.4. Deploying, monitoring, and improving

9.4.1. Monitor

9.4.2. Identify problems and fix them!

9.5. Knowing when good is good enough

9.6. Summary

Chapter 10. The relevance-centered enterprise

10.1. Feedback: the bedrock of the relevance-centered enterprise

10.2. Why user-focused culture before data-driven culture?

10.3. Flying relevance-blind

10.4. Relevance feedback awakenings: domain experts and expert users

10.5. Relevance feedback maturing: content curation

10.5.1. The role of the content curator

10.5.2. The risk of miscommunication with the content curator

10.6. Relevance streamlined: engineer/curator pairing

10.7. Relevance accelerated: test-driven relevance

10.7.1. Understanding test-driven relevance

10.7.2. Using test-driven relevance with user behavioral data

10.8. Beyond test-driven relevance: learning to rank

10.9. Summary

Chapter 11. Semantic and personalized search

11.1. Personalizing search based on user profiles

11.1.1. Gathering user profile information

11.1.2. Tying profile information back to the search index

11.2. Personalizing search based on user behavior

11.2.1. Introducing collaborative filtering

11.2.2. Basic collaborative filtering using co-occurrence counting

11.2.3. Tying user behavior information back to the search index

11.3. Basic methods for building concept search

11.3.1. Building concept signals

11.3.2. Augmenting content with synonyms

11.4. Building concept search using machine learning

11.4.1. The importance of phrases in concept search

11.5. The personalized search—concept search connection

11.6. Recommendation as a generalization of search

11.6.1. Replacing search with recommendation

11.7. Best wishes on your search relevance journey

11.8. Summary

Appendix A. Indexing directly from TMDB

A.1. Setting the TMDB key and loading the IPython notebook

A.2. Setting up for the TMDB API

A.3. Crawling the TMDB API

A.4. Indexing TMDB movies to Elasticsearch

Appendix B. Solr reader’s companion

B.1. Chapter 4: taming Solr’s terms

B.1.1. Summary of Solr analysis and mapping features

B.1.2. Building custom analyzers in Solr

B.1.3. Using field mappings in Solr

B.2. Chapters 5 and 6: multifield search in Solr

B.2.1. Summary of query feature mappings

B.2.2. Understanding query differences between Solr and Elasticsearch

B.2.3. Querying Solr: the ergonomics

B.2.4. Term-centric and field-centric search with the edismax query parser

B.2.5. All fields and cross_fields search

B.3. Chapter 7: shaping Solr’s ranking function

B.3.1. Summary of boosting feature mappings

B.3.2. Solr’s Boolean boosting

B.3.3. Solr’s function queries

B.3.4. Multiplicative boosting in Solr

B.4. Chapter 8: relevance feedback

B.4.1. Summary of relevance feedback feature mappings

B.4.2. Solr autocomplete: match phrase prefix

B.4.3. Faceted browsing in Solr

B.4.4. Field collapsing

B.4.5. Suggestion and highlighting components

Index

List of Figures

List of Tables

List of Listings
