Book Description

Summary

Taming Text is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores how to automatically organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. The book guides you through examples illustrating each of these topics, as well as the foundations upon which they are built.

About this Book

There is so much text in our lives that we are practically drowning in it. Fortunately, there are innovative tools and techniques for managing unstructured information that can throw the smart developer a much-needed lifeline. You’ll find them in this book.

Taming Text is a practical, example-driven guide to working with text in real applications. This book introduces you to useful techniques like full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. You’ll explore real use cases as you systematically absorb the foundations upon which they are built.

Written in a clear and concise style, this book avoids jargon, explaining the subject in terms you can understand without a background in statistics or natural language processing. Examples are in Java, but the concepts can be applied in any language.

What's Inside

  • When to use text-taming techniques

  • Important open-source libraries like Solr and Mahout

  • How to build text-processing applications

About the Authors

Grant Ingersoll is an engineer, speaker, and trainer, a Lucene committer, and a cofounder of the Mahout machine-learning project. Thomas Morton is the primary developer of OpenNLP and Maximum Entropy. Drew Farris is a technology consultant, software developer, and contributor to Mahout, Lucene, and Solr.

Table of Contents

  1. Copyright
  2. Brief Table of Contents
  3. Table of Contents
  4. Foreword
  5. Preface
  6. Acknowledgments
  7. About this Book
  8. About the Cover Illustration
  9. Chapter 1. Getting started taming text
  10. Chapter 2. Foundations of taming text
  11. Chapter 3. Searching
  12. Chapter 4. Fuzzy string matching
  13. Chapter 5. Identifying people, places, and things
  14. Chapter 6. Clustering text
  15. Chapter 7. Classification, categorization, and tagging
  16. Chapter 8. Building an example question answering system
  17. Chapter 9. Untamed text: exploring the next frontier
  18. Index
  19. List of Figures
  20. List of Tables
  21. List of Listings