Preface

Getting quality labeled data for supervised learning is an important step toward training performant machine learning models. In many real-world projects, obtaining labeled data takes up a significant amount of time. Weak supervision is emerging as an important catalyst, enabling data science teams to fuse insights from heuristics and crowd-sourcing to produce weakly labeled datasets that can be used as inputs for machine learning and deep learning tasks.

Who Should Read This Book

The primary audience of this book is professional and citizen data scientists who are already working on machine learning projects and face the typical challenges of getting good-quality labeled data for these projects. They have working knowledge of the Python programming language and are familiar with machine learning libraries and tools.

Navigating This Book

This book is organized roughly as follows:

  • Chapter 1 provides a basic introduction to the field of Weak Supervision, and how data scientists and machine learning engineers can use it as part of the data science process.

  • Chapter 2 discusses how to get started with using Snorkel for weak supervision and introduces concepts in using Snorkel for data programming.

  • Chapter 3 describes how to use Snorkel for labeling, and provides code examples showing how to use Snorkel to label text and image datasets.

  • Chapters 4 and 5 give practitioners an end-to-end understanding of how to use a weakly labeled dataset for text and image classification.

  • Chapter 6 discusses practical considerations for using Snorkel with large datasets, and how to use Spark clusters to scale labeling.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This element signifies a tip or suggestion.

Note

This element signifies a general note.

Warning

This element indicates a warning or caution.

Using Code Examples

All the code in the book is available in the following GitHub repository: https://bit.ly/WeakSupervisionBook. The code shown in the chapters is correct but is only a subset of the overall codebase, and is meant to outline the concepts. To run the code yourself, we encourage you to clone the book's GitHub repository.

If you have a technical question or a problem using the code examples, please send email to .

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Practical Weak Supervision by Wee Hyong Tok, Amit Bahree, and Senja Filipi (O’Reilly). Copyright 2022 Wee Hyong Tok, Amit Bahree, and Senja Filipi, 978-1-492-07706-0.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

O’Reilly Online Learning

Note

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/practicalWeakSupervision.

Email to comment or ask technical questions about this book.

For news and information about our books and courses, visit http://oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

The authors would like to thank the following people, who helped us significantly in improving the content and code samples in the book:

  • Alex Ratner, Jared Dunnmon, Paroma Varma, Jason Fries, Stephen Bach, Braden Hancock, Fred Sala, and Devang Sachdev from Snorkel, for their valuable reviews, suggestions, and pointers. Their input helped us tremendously to improve the book.

  • Technical reviewers Siyu Yang, Hong Lu, and Juan Manuel Contreras, for their effort and insightful suggestions that helped us improve the content and code samples.

  • Jeff Bleiel, Kristen Brown, Rebecca Novack, and the rest of the O’Reilly team, for being part of this book-writing journey, from the initial book brainstorms to the countless hours spent on reviews, edits, and discussions. We could not have done it without the amazing support from the O’Reilly team.

Writing a book is a journey, and it would not have been possible without strong family support; the authors spent many weekends and holidays working on the book. We would like to thank our families for supporting us on this journey.

  • Thankful for my wife, Meenakshi, for her patience and for keeping the coffee on; my daughter Maya, for believing in me and correcting my grammar with minimal eye-rolling; and our dog Champ, for forgiving me no matter how buggy the code. And finally, to our readers, who looked up and took a leap of faith: may you have as wonderful a journey as I have had in creating this, with the long nights, the code, the data, and the primitive debugging experience! Don’t let the logic kill off the magic! — Amit

  • Grateful for all the inspiration I have had over the years from many friends and coworkers, for Llambi’s unconditional love and support, and for the joy and wisdom Ansen and Ennio added to our lives. — Senja

  • Dedicated to the wonderful love in my life — Juliet, Nathaniel, and Jayden. Love for the family is infinite. — Wee Hyong
