Human-in-the-Loop Machine Learning lays out methods for humans and machines to work together effectively. You’ll find best practices on selecting sample data for human feedback, quality control for human annotations, and designing annotation interfaces. You’ll learn to create training data for labeling, object detection, semantic segmentation, sequence labeling, and more. The book starts with the basics and progresses to advanced techniques like transfer learning and self-supervision within annotation workflows.

Table of Contents

  1. inside front cover
  2. Human-in-the-Loop Machine Learning
  3. Copyright
  4. brief contents
  5. contents
  6. front matter
    1. foreword
    2. preface
    3. acknowledgments
    4. about this book
    5. Who should read this book
    6. How this book is organized: A road map
    7. About the code
    8. liveBook discussion forum
    9. Other online resources
    10. about the author
  7. Part 1 First steps
  8. 1 Introduction to human-in-the-loop machine learning
    1. 1.1 The basic principles of human-in-the-loop machine learning
    2. 1.2 Introducing annotation
    3. 1.2.1 Simple and more complicated annotation strategies
    4. 1.2.2 Plugging the gap in data science knowledge
    5. 1.2.3 Quality human annotation: Why is it hard?
    6. 1.3 Introducing active learning: Improving the speed and reducing the cost of training data
    7. 1.3.1 Three broad active learning sampling strategies: Uncertainty, diversity, and random
    8. 1.3.2 What is a random selection of evaluation data?
    9. 1.3.3 When to use active learning
    10. 1.4 Machine learning and human–computer interaction
    11. 1.4.1 User interfaces: How do you create training data?
    12. 1.4.2 Priming: What can influence human perception?
    13. 1.4.3 The pros and cons of creating labels by evaluating machine learning predictions
    14. 1.4.4 Basic principles for designing annotation interfaces
    15. 1.5 Machine-learning-assisted humans vs. human-assisted machine learning
    16. 1.6 Transfer learning to kick-start your models
    17. 1.6.1 Transfer learning in computer vision
    18. 1.6.2 Transfer learning in NLP
    19. 1.7 What to expect in this text
    20. Summary
  9. 2 Getting started with human-in-the-loop machine learning
    1. 2.1 Beyond hacktive learning: Your first active learning algorithm
    2. 2.2 The architecture of your first system
    3. 2.3 Interpreting model predictions and data to support active learning
    4. 2.3.1 Confidence ranking
    5. 2.3.2 Identifying outliers
    6. 2.3.3 What to expect as you iterate
    7. 2.4 Building an interface to get human labels
    8. 2.4.1 A simple interface for labeling text
    9. 2.4.2 Managing machine learning data
    10. 2.5 Deploying your first human-in-the-loop machine learning system
    11. 2.5.1 Always get your evaluation data first
    12. 2.5.2 Every data point gets a chance
    13. 2.5.3 Select the right strategies for your data
    14. 2.5.4 Retrain the model and iterate
    15. Summary
  10. Part 2 Active learning
  11. 3 Uncertainty sampling
    1. 3.1 Interpreting uncertainty in a machine learning model
    2. 3.1.1 Why look for uncertainty in your model?
    3. 3.1.2 Softmax and probability distributions
    4. 3.1.3 Interpreting the success of active learning
    5. 3.2 Algorithms for uncertainty sampling
    6. 3.2.1 Least confidence sampling
    7. 3.2.2 Margin of confidence sampling
    8. 3.2.3 Ratio sampling
    9. 3.2.4 Entropy (classification entropy)
    10. 3.2.5 A deep dive on entropy
    11. 3.3 Identifying when different types of models are confused
    12. 3.3.1 Uncertainty sampling with logistic regression and MaxEnt models
    13. 3.3.2 Uncertainty sampling with SVMs
    14. 3.3.3 Uncertainty sampling with Bayesian models
    15. 3.3.4 Uncertainty sampling with decision trees and random forests
    16. 3.4 Measuring uncertainty across multiple predictions
    17. 3.4.1 Uncertainty sampling with ensemble models
    18. 3.4.2 Query by Committee and dropouts
    19. 3.4.3 The difference between aleatoric and epistemic uncertainty
    20. 3.4.4 Multilabeled and continuous value classification
    21. 3.5 Selecting the right number of items for human review
    22. 3.5.1 Budget-constrained uncertainty sampling
    23. 3.5.2 Time-constrained uncertainty sampling
    24. 3.5.3 When do I stop if I’m not time- or budget-constrained?
    25. 3.6 Evaluating the success of active learning
    26. 3.6.1 Do I need new test data?
    27. 3.6.2 Do I need new validation data?
    28. 3.7 Uncertainty sampling cheat sheet
    29. 3.8 Further reading
    30. 3.8.1 Further reading for least confidence sampling
    31. 3.8.2 Further reading for margin of confidence sampling
    32. 3.8.3 Further reading for ratio of confidence sampling
    33. 3.8.4 Further reading for entropy-based sampling
    34. 3.8.5 Further reading for other machine learning models
    35. 3.8.6 Further reading for ensemble-based uncertainty sampling
    36. Summary
  12. 4 Diversity sampling
    1. 4.1 Knowing what you don’t know: Identifying gaps in your model’s knowledge
    2. 4.1.1 Example data for diversity sampling
    3. 4.1.2 Interpreting neural models for diversity sampling
    4. 4.1.3 Getting information from hidden layers in PyTorch
    5. 4.2 Model-based outlier sampling
    6. 4.2.1 Use validation data to rank activations
    7. 4.2.2 Which layers should I use to calculate model-based outliers?
    8. 4.2.3 The limitations of model-based outliers
    9. 4.3 Cluster-based sampling
    10. 4.3.1 Cluster members, centroids, and outliers
    11. 4.3.2 Any clustering algorithm in the universe
    12. 4.3.3 K-means clustering with cosine similarity
    13. 4.3.4 Reduced feature dimensions via embeddings or PCA
    14. 4.3.5 Other clustering algorithms
    15. 4.4 Representative sampling
    16. 4.4.1 Representative sampling is rarely used in isolation
    17. 4.4.2 Simple representative sampling
    18. 4.4.3 Adaptive representative sampling
    19. 4.5 Sampling for real-world diversity
    20. 4.5.1 Common problems in training data diversity
    21. 4.5.2 Stratified sampling to ensure diversity of demographics
    22. 4.5.3 Represented and representative: Which matters?
    23. 4.5.4 Per-demographic accuracy
    24. 4.5.5 Limitations of sampling for real-world diversity
    25. 4.6 Diversity sampling with different types of models
    26. 4.6.1 Model-based outliers with different types of models
    27. 4.6.2 Clustering with different types of models
    28. 4.6.3 Representative sampling with different types of models
    29. 4.6.4 Sampling for real-world diversity with different types of models
    30. 4.7 Diversity sampling cheat sheet
    31. 4.8 Further reading
    32. 4.8.1 Further reading for model-based outliers
    33. 4.8.2 Further reading for cluster-based sampling
    34. 4.8.3 Further reading for representative sampling
    35. 4.8.4 Further reading for sampling for real-world diversity
    36. Summary
  13. 5 Advanced active learning
    1. 5.1 Combining uncertainty sampling and diversity sampling
    2. 5.1.1 Least confidence sampling with cluster-based sampling
    3. 5.1.2 Uncertainty sampling with model-based outliers
    4. 5.1.3 Uncertainty sampling with model-based outliers and clustering
    5. 5.1.4 Representative sampling with cluster-based sampling
    6. 5.1.5 Sampling from the highest-entropy cluster
    7. 5.1.6 Other combinations of active learning strategies
    8. 5.1.7 Combining active learning scores
    9. 5.1.8 Expected error reduction sampling
    10. 5.2 Active transfer learning for uncertainty sampling
    11. 5.2.1 Making your model predict its own errors
    12. 5.2.2 Implementing active transfer learning
    13. 5.2.3 Active transfer learning with more layers
    14. 5.2.4 The pros and cons of active transfer learning
    15. 5.3 Applying active transfer learning to representative sampling
    16. 5.3.1 Making your model predict what it doesn’t know
    17. 5.3.2 Active transfer learning for adaptive representative sampling
    18. 5.3.3 The pros and cons of active transfer learning for representative sampling
    19. 5.4 Active transfer learning for adaptive sampling
    20. 5.4.1 Making uncertainty sampling adaptive by predicting uncertainty
    21. 5.4.2 The pros and cons of ATLAS
    22. 5.5 Advanced active learning cheat sheets
    23. 5.6 Further reading for active transfer learning
    24. Summary
  14. 6 Applying active learning to different machine learning tasks
    1. 6.1 Applying active learning to object detection
    2. 6.1.1 Accuracy for object detection: Label confidence and localization
    3. 6.1.2 Uncertainty sampling for label confidence and localization in object detection
    4. 6.1.3 Diversity sampling for label confidence and localization in object detection
    5. 6.1.4 Active transfer learning for object detection
    6. 6.1.5 Setting a low object detection threshold to avoid perpetuating bias
    7. 6.1.6 Creating training data samples for representative sampling that are similar to your predictions
    8. 6.1.7 Sampling for image-level diversity in object detection
    9. 6.1.8 Considering tighter masks when using polygons
    10. 6.2 Applying active learning to semantic segmentation
    11. 6.2.1 Accuracy for semantic segmentation
    12. 6.2.2 Uncertainty sampling for semantic segmentation
    13. 6.2.3 Diversity sampling for semantic segmentation
    14. 6.2.4 Active transfer learning for semantic segmentation
    15. 6.2.5 Sampling for image-level diversity in semantic segmentation
    16. 6.3 Applying active learning to sequence labeling
    17. 6.3.1 Accuracy for sequence labeling
    18. 6.3.2 Uncertainty sampling for sequence labeling
    19. 6.3.3 Diversity sampling for sequence labeling
    20. 6.3.4 Active transfer learning for sequence labeling
    21. 6.3.5 Stratified sampling by confidence and tokens
    22. 6.3.6 Creating training data samples for representative sampling that are similar to your predictions
    23. 6.3.7 Full-sequence labeling
    24. 6.3.8 Sampling for document-level diversity in sequence labeling
    25. 6.4 Applying active learning to language generation
    26. 6.4.1 Calculating accuracy for language generation systems
    27. 6.4.2 Uncertainty sampling for language generation
    28. 6.4.3 Diversity sampling for language generation
    29. 6.4.4 Active transfer learning for language generation
    30. 6.5 Applying active learning to other machine learning tasks
    31. 6.5.1 Active learning for information retrieval
    32. 6.5.2 Active learning for video
    33. 6.5.3 Active learning for speech
    34. 6.6 Choosing the right number of items for human review
    35. 6.6.1 Active labeling for fully or partially annotated data
    36. 6.6.2 Combining machine learning with annotation
    37. 6.7 Further reading
    38. Summary
  15. Part 3 Annotation
  16. 7 Working with the people annotating your data
    1. 7.1 Introduction to annotation
    2. 7.1.1 Three principles of good data annotation
    3. 7.1.2 Annotating data and reviewing model predictions
    4. 7.1.3 Annotations from machine learning-assisted humans
    5. 7.2 In-house experts
    6. 7.2.1 Salary for in-house workers
    7. 7.2.2 Security for in-house workers
    8. 7.2.3 Ownership for in-house workers
    9. 7.2.4 Tip: Always run in-house annotation sessions
    10. 7.3 Outsourced workers
    11. 7.3.1 Salary for outsourced workers
    12. 7.3.2 Security for outsourced workers
    13. 7.3.3 Ownership for outsourced workers
    14. 7.3.4 Tip: Talk to your outsourced workers
    15. 7.4 Crowdsourced workers
    16. 7.4.1 Salary for crowdsourced workers
    17. 7.4.2 Security for crowdsourced workers
    18. 7.4.3 Ownership for crowdsourced workers
    19. 7.4.4 Tip: Create a path to secure work and career advancement
    20. 7.5 Other workforces
    21. 7.5.1 End users
    22. 7.5.2 Volunteers
    23. 7.5.3 People playing games
    24. 7.5.4 Model predictions as annotations
    25. 7.6 Estimating the volume of annotation needed
    26. 7.6.1 The orders-of-magnitude equation for number of annotations needed
    27. 7.6.2 Anticipate one to four weeks of annotation training and task refinement
    28. 7.6.3 Use your pilot annotations and accuracy goal to estimate cost
    29. 7.6.4 Combining types of workforces
    30. Summary
  17. 8 Quality control for data annotation
    1. 8.1 Comparing annotations with ground truth answers
    2. 8.1.1 Annotator agreement with ground truth data
    3. 8.1.2 Which baseline should you use for expected accuracy?
    4. 8.2 Interannotator agreement
    5. 8.2.1 Introduction to interannotator agreement
    6. 8.2.2 Benefits from calculating interannotator agreement
    7. 8.2.3 Dataset-level agreement with Krippendorff’s alpha
    8. 8.2.4 Calculating Krippendorff’s alpha beyond labeling
    9. 8.2.5 Individual annotator agreement
    10. 8.2.6 Per-label and per-demographic agreement
    11. 8.2.7 Extending accuracy with agreement for real-world diversity
    12. 8.3 Aggregating multiple annotations to create training data
    13. 8.3.1 Aggregating annotations when everyone agrees
    14. 8.3.2 The mathematical case for diverse annotators and low agreement
    15. 8.3.3 Aggregating annotations when annotators disagree
    16. 8.3.4 Annotator-reported confidences
    17. 8.3.5 Deciding which labels to trust: Annotation uncertainty
    18. 8.4 Quality control by expert review
    19. 8.4.1 Recruiting and training qualified people
    20. 8.4.2 Training people to become experts
    21. 8.4.3 Machine-learning-assisted experts
    22. 8.5 Multistep workflows and review tasks
    23. 8.6 Further reading
    24. Summary
  18. 9 Advanced data annotation and augmentation
    1. 9.1 Annotation quality for subjective tasks
    2. 9.1.1 Requesting annotator expectations
    3. 9.1.2 Assessing viable labels for subjective tasks
    4. 9.1.3 Trusting an annotator to understand diverse responses
    5. 9.1.4 Bayesian Truth Serum for subjective judgments
    6. 9.1.5 Embedding simple tasks in more complicated ones
    7. 9.2 Machine learning for annotation quality control
    8. 9.2.1 Calculating annotation confidence as an optimization task
    9. 9.2.2 Converging on label confidence when annotators disagree
    10. 9.2.3 Predicting whether a single annotation is correct
    11. 9.2.4 Predicting whether a single annotation is in agreement
    12. 9.2.5 Predicting whether an annotator is a bot
    13. 9.3 Model predictions as annotations
    14. 9.3.1 Trusting annotations from confident model predictions
    15. 9.3.2 Treating model predictions as a single annotator
    16. 9.3.3 Cross-validating to find mislabeled data
    17. 9.4 Embeddings and contextual representations
    18. 9.4.1 Transfer learning from an existing model
    19. 9.4.2 Representations from adjacent easy-to-annotate tasks
    20. 9.4.3 Self-supervision: Using inherent labels in the data
    21. 9.5 Search-based and rule-based systems
    22. 9.5.1 Data filtering with rules
    23. 9.5.2 Training data search
    24. 9.5.3 Masked feature filtering
    25. 9.6 Light supervision on unsupervised models
    26. 9.6.1 Adapting an unsupervised model to a supervised model
    27. 9.6.2 Human-guided exploratory data analysis
    28. 9.7 Synthetic data, data creation, and data augmentation
    29. 9.7.1 Synthetic data
    30. 9.7.2 Data creation
    31. 9.7.3 Data augmentation
    32. 9.8 Incorporating annotation information into machine learning models
    33. 9.8.1 Filtering or weighting items by confidence in their labels
    34. 9.8.2 Including the annotator identity in inputs
    35. 9.8.3 Incorporating uncertainty into the loss function
    36. 9.9 Further reading for advanced annotation
    37. 9.9.1 Further reading for subjective data
    38. 9.9.2 Further reading for machine learning for annotation quality control
    39. 9.9.3 Further reading for embeddings/contextual representations
    40. 9.9.4 Further reading for rule-based systems
    41. 9.9.5 Further reading for incorporating uncertainty in annotations into the downstream models
    42. Summary
  19. 10 Annotation quality for different machine learning tasks
    1. 10.1 Annotation quality for continuous tasks
    2. 10.1.1 Ground truth for continuous tasks
    3. 10.1.2 Agreement for continuous tasks
    4. 10.1.3 Subjectivity in continuous tasks
    5. 10.1.4 Aggregating continuous judgments to create training data
    6. 10.1.5 Machine learning for aggregating continuous tasks to create training data
    7. 10.2 Annotation quality for object detection
    8. 10.2.1 Ground truth for object detection
    9. 10.2.2 Agreement for object detection
    10. 10.2.3 Dimensionality and accuracy in object detection
    11. 10.2.4 Subjectivity for object detection
    12. 10.2.5 Aggregating object annotations to create training data
    13. 10.2.6 Machine learning for object annotations
    14. 10.3 Annotation quality for semantic segmentation
    15. 10.3.1 Ground truth for semantic segmentation annotation
    16. 10.3.2 Agreement for semantic segmentation
    17. 10.3.3 Subjectivity for semantic segmentation annotations
    18. 10.3.4 Aggregating semantic segmentation to create training data
    19. 10.3.5 Machine learning for aggregating semantic segmentation tasks to create training data
    20. 10.4 Annotation quality for sequence labeling
    21. 10.4.1 Ground truth for sequence labeling
    22. 10.4.2 Ground truth for sequence labeling in truly continuous data
    23. 10.4.3 Agreement for sequence labeling
    24. 10.4.4 Machine learning and transfer learning for sequence labeling
    25. 10.4.5 Rule-based, search-based, and synthetic data for sequence labeling
    26. 10.5 Annotation quality for language generation
    27. 10.5.1 Ground truth for language generation
    28. 10.5.2 Agreement and aggregation for language generation
    29. 10.5.3 Machine learning and transfer learning for language generation
    30. 10.5.4 Synthetic data for language generation
    31. 10.6 Annotation quality for other machine learning tasks
    32. 10.6.1 Annotation for information retrieval
    33. 10.6.2 Annotation for multifield tasks
    34. 10.6.3 Annotation for video
    35. 10.6.4 Annotation for audio data
    36. 10.7 Further reading for annotation quality for different machine learning tasks
    37. 10.7.1 Further reading for computer vision
    38. 10.7.2 Further reading for annotation for natural language processing
    39. 10.7.3 Further reading for annotation for information retrieval
    40. Summary
  20. Part 4 Human–computer interaction for machine learning
  21. 11 Interfaces for data annotation
    1. 11.1 Basic principles of human–computer interaction
    2. 11.1.1 Introducing affordance, feedback, and agency
    3. 11.1.2 Designing interfaces for annotation
    4. 11.1.3 Minimizing eye movement and scrolling
    5. 11.1.4 Keyboard shortcuts and input devices
    6. 11.2 Breaking the rules effectively
    7. 11.2.1 Scrolling for batch annotation
    8. 11.2.2 Foot pedals
    9. 11.2.3 Audio inputs
    10. 11.3 Priming in annotation interfaces
    11. 11.3.1 Repetition priming
    12. 11.3.2 Where priming hurts
    13. 11.3.3 Where priming helps
    14. 11.4 Combining human and machine intelligence
    15. 11.4.1 Annotator feedback
    16. 11.4.2 Maximizing objectivity by asking what other people would annotate
    17. 11.4.3 Recasting continuous problems as ranking problems
    18. 11.5 Smart interfaces for maximizing human intelligence
    19. 11.5.1 Smart interfaces for semantic segmentation
    20. 11.5.2 Smart interfaces for object detection
    21. 11.5.3 Smart interfaces for language generation
    22. 11.5.4 Smart interfaces for sequence labeling
    23. 11.6 Machine learning to assist human processes
    24. 11.6.1 The perception of increased efficiency
    25. 11.6.2 Active learning for increased efficiency
    26. 11.6.3 Errors can be better than absence to maximize completeness
    27. 11.6.4 Keep annotation interfaces separate from daily work interfaces
    28. 11.7 Further reading
    29. Summary
  22. 12 Human-in-the-loop machine learning products
    1. 12.1 Defining products for human-in-the-loop machine learning applications
    2. 12.1.1 Start with the problem you are solving
    3. 12.1.2 Design systems to solve the problem
    4. 12.1.3 Connecting Python and HTML
    5. 12.2 Example 1: Exploratory data analysis for news headlines
    6. 12.2.1 Assumptions
    7. 12.2.2 Design and implementation
    8. 12.2.3 Potential extensions
    9. 12.3 Example 2: Collecting data about food safety events
    10. 12.3.1 Assumptions
    11. 12.3.2 Design and implementation
    12. 12.3.3 Potential extensions
    13. 12.4 Example 3: Identifying bicycles in images
    14. 12.4.1 Assumptions
    15. 12.4.2 Design and implementation
    16. 12.4.3 Potential extensions
    17. 12.5 Further reading for building human-in-the-loop machine learning products
    18. Summary
  23. appendix Machine learning refresher
    1. A.1 Interpreting predictions from a model
    2. A.1.1 Probability distributions
    3. A.2 Softmax deep dive
    4. A.2.1 Converting the model output to confidences with softmax
    5. A.2.2 The choice of base/temperature for softmax
    6. A.2.3 The result from dividing exponentials
    7. A.3 Measuring human-in-the-loop machine learning systems
    8. A.3.1 Precision, recall, and F-score
    9. A.3.2 Micro and macro precision, recall, and F-score
    10. A.3.3 Taking random chance into account: Chance-adjusted accuracy
    11. A.3.4 Taking confidence into account: Area under the ROC curve (AUC)
    12. A.3.5 Number of model errors spotted
    13. A.3.6 Human labor cost saved
    14. A.3.7 Other methods for calculating accuracy in this book
  24. index
  25. inside back cover