Human-in-the-Loop Machine Learning lays out methods for humans and machines to work together effectively. You’ll find best practices on selecting sample data for human feedback, quality control for human annotations, and designing annotation interfaces. You’ll learn to create training data for labeling, object detection, semantic segmentation, sequence labeling, and more. The book starts with the basics and progresses to advanced techniques like transfer learning and self-supervision within annotation workflows.

Table of Contents

  1. inside front cover
  2. Human-in-the-Loop Machine Learning
  3. Copyright
  4. brief contents
  5. contents
  6. front matter
    1. foreword
    2. preface
    3. acknowledgments
    4. about this book
    5. Who should read this book
    6. How this book is organized: A road map
    7. About the code
    8. liveBook discussion forum
    9. Other online resources
    10. about the author
  7. Part 1 First steps
  8. 1 Introduction to human-in-the-loop machine learning
    1. 1.1 The basic principles of human-in-the-loop machine learning
    2. 1.2 Introducing annotation
    3. 1.2.1 Simple and more complicated annotation strategies
    4. 1.2.2 Plugging the gap in data science knowledge
    5. 1.2.3 Quality human annotation: Why is it hard?
    6. 1.3 Introducing active learning: Improving the speed and reducing the cost of training data
    7. 1.3.1 Three broad active learning sampling strategies: Uncertainty, diversity, and random
    8. 1.3.2 What is a random selection of evaluation data?
    9. 1.3.3 When to use active learning
    10. 1.4 Machine learning and human–computer interaction
    11. 1.4.1 User interfaces: How do you create training data?
    12. 1.4.2 Priming: What can influence human perception?
    13. 1.4.3 The pros and cons of creating labels by evaluating machine learning predictions
    14. 1.4.4 Basic principles for designing annotation interfaces
    15. 1.5 Machine-learning-assisted humans vs. human-assisted machine learning
    16. 1.6 Transfer learning to kick-start your models
    17. 1.6.1 Transfer learning in computer vision
    18. 1.6.2 Transfer learning in NLP
    19. 1.7 What to expect in this text
    20. Summary
  9. 2 Getting started with human-in-the-loop machine learning
    1. 2.1 Beyond hacktive learning: Your first active learning algorithm
    2. 2.2 The architecture of your first system
    3. 2.3 Interpreting model predictions and data to support active learning
    4. 2.3.1 Confidence ranking
    5. 2.3.2 Identifying outliers
    6. 2.3.3 What to expect as you iterate
    7. 2.4 Building an interface to get human labels
    8. 2.4.1 A simple interface for labeling text
    9. 2.4.2 Managing machine learning data
    10. 2.5 Deploying your first human-in-the-loop machine learning system
    11. 2.5.1 Always get your evaluation data first
    12. 2.5.2 Every data point gets a chance
    13. 2.5.3 Select the right strategies for your data
    14. 2.5.4 Retrain the model and iterate
    15. Summary
  10. Part 2 Active learning
  11. 3 Uncertainty sampling
    1. 3.1 Interpreting uncertainty in a machine learning model
    2. 3.1.1 Why look for uncertainty in your model?
    3. 3.1.2 Softmax and probability distributions
    4. 3.1.3 Interpreting the success of active learning
    5. 3.2 Algorithms for uncertainty sampling
    6. 3.2.1 Least confidence sampling
    7. 3.2.2 Margin of confidence sampling
    8. 3.2.3 Ratio sampling
    9. 3.2.4 Entropy (classification entropy)
    10. 3.2.5 A deep dive on entropy
    11. 3.3 Identifying when different types of models are confused
    12. 3.3.1 Uncertainty sampling with logistic regression and MaxEnt models
    13. 3.3.2 Uncertainty sampling with SVMs
    14. 3.3.3 Uncertainty sampling with Bayesian models
    15. 3.3.4 Uncertainty sampling with decision trees and random forests
    16. 3.4 Measuring uncertainty across multiple predictions
    17. 3.4.1 Uncertainty sampling with ensemble models
    18. 3.4.2 Query by Committee and dropouts
    19. 3.4.3 The difference between aleatoric and epistemic uncertainty
    20. 3.4.4 Multilabeled and continuous value classification
    21. 3.5 Selecting the right number of items for human review
    22. 3.5.1 Budget-constrained uncertainty sampling
    23. 3.5.2 Time-constrained uncertainty sampling
    24. 3.5.3 When do I stop if I’m not time- or budget-constrained?
    25. 3.6 Evaluating the success of active learning
    26. 3.6.1 Do I need new test data?
    27. 3.6.2 Do I need new validation data?
    28. 3.7 Uncertainty sampling cheat sheet
    29. 3.8 Further reading
    30. 3.8.1 Further reading for least confidence sampling
    31. 3.8.2 Further reading for margin of confidence sampling
    32. 3.8.3 Further reading for ratio of confidence sampling
    33. 3.8.4 Further reading for entropy-based sampling
    34. 3.8.5 Further reading for other machine learning models
    35. 3.8.6 Further reading for ensemble-based uncertainty sampling
    36. Summary
  12. 4 Diversity sampling
    1. 4.1 Knowing what you don’t know: Identifying gaps in your model’s knowledge
    2. 4.1.1 Example data for diversity sampling
    3. 4.1.2 Interpreting neural models for diversity sampling
    4. 4.1.3 Getting information from hidden layers in PyTorch
    5. 4.2 Model-based outlier sampling
    6. 4.2.1 Use validation data to rank activations
    7. 4.2.2 Which layers should I use to calculate model-based outliers?
    8. 4.2.3 The limitations of model-based outliers
    9. 4.3 Cluster-based sampling
    10. 4.3.1 Cluster members, centroids, and outliers
    11. 4.3.2 Any clustering algorithm in the universe
    12. 4.3.3 K-means clustering with cosine similarity
    13. 4.3.4 Reduced feature dimensions via embeddings or PCA
    14. 4.3.5 Other clustering algorithms
    15. 4.4 Representative sampling
    16. 4.4.1 Representative sampling is rarely used in isolation
    17. 4.4.2 Simple representative sampling
    18. 4.4.3 Adaptive representative sampling
    19. 4.5 Sampling for real-world diversity
    20. 4.5.1 Common problems in training data diversity
    21. 4.5.2 Stratified sampling to ensure diversity of demographics
    22. 4.5.3 Represented and representative: Which matters?
    23. 4.5.4 Per-demographic accuracy
    24. 4.5.5 Limitations of sampling for real-world diversity
    25. 4.6 Diversity sampling with different types of models
    26. 4.6.1 Model-based outliers with different types of models
    27. 4.6.2 Clustering with different types of models
    28. 4.6.3 Representative sampling with different types of models
    29. 4.6.4 Sampling for real-world diversity with different types of models
    30. 4.7 Diversity sampling cheat sheet
    31. 4.8 Further reading
    32. 4.8.1 Further reading for model-based outliers
    33. 4.8.2 Further reading for cluster-based sampling
    34. 4.8.3 Further reading for representative sampling
    35. 4.8.4 Further reading for sampling for real-world diversity
    36. Summary
  13. 5 Advanced active learning
    1. 5.1 Combining uncertainty sampling and diversity sampling
    2. 5.1.1 Least confidence sampling with cluster-based sampling
    3. 5.1.2 Uncertainty sampling with model-based outliers
    4. 5.1.3 Uncertainty sampling with model-based outliers and clustering
    5. 5.1.4 Representative sampling with cluster-based sampling
    6. 5.1.5 Sampling from the highest-entropy cluster
    7. 5.1.6 Other combinations of active learning strategies
    8. 5.1.7 Combining active learning scores
    9. 5.1.8 Expected error reduction sampling
    10. 5.2 Active transfer learning for uncertainty sampling
    11. 5.2.1 Making your model predict its own errors
    12. 5.2.2 Implementing active transfer learning
    13. 5.2.3 Active transfer learning with more layers
    14. 5.2.4 The pros and cons of active transfer learning
    15. 5.3 Applying active transfer learning to representative sampling
    16. 5.3.1 Making your model predict what it doesn’t know
    17. 5.3.2 Active transfer learning for adaptive representative sampling
    18. 5.3.3 The pros and cons of active transfer learning for representative sampling
    19. 5.4 Active transfer learning for adaptive sampling
    20. 5.4.1 Making uncertainty sampling adaptive by predicting uncertainty
    21. 5.4.2 The pros and cons of ATLAS
    22. 5.5 Advanced active learning cheat sheets
    23. 5.6 Further reading for active transfer learning
    24. Summary
  14. 6 Applying active learning to different machine learning tasks
    1. 6.1 Applying active learning to object detection
    2. 6.1.1 Accuracy for object detection: Label confidence and localization
    3. 6.1.2 Uncertainty sampling for label confidence and localization in object detection
    4. 6.1.3 Diversity sampling for label confidence and localization in object detection
    5. 6.1.4 Active transfer learning for object detection
    6. 6.1.5 Setting a low object detection threshold to avoid perpetuating bias
    7. 6.1.6 Creating training data samples for representative sampling that are similar to your predictions
    8. 6.1.7 Sampling for image-level diversity in object detection
    9. 6.1.8 Considering tighter masks when using polygons
    10. 6.2 Applying active learning to semantic segmentation
    11. 6.2.1 Accuracy for semantic segmentation
    12. 6.2.2 Uncertainty sampling for semantic segmentation
    13. 6.2.3 Diversity sampling for semantic segmentation
    14. 6.2.4 Active transfer learning for semantic segmentation
    15. 6.2.5 Sampling for image-level diversity in semantic segmentation
    16. 6.3 Applying active learning to sequence labeling
    17. 6.3.1 Accuracy for sequence labeling
    18. 6.3.2 Uncertainty sampling for sequence labeling
    19. 6.3.3 Diversity sampling for sequence labeling
    20. 6.3.4 Active transfer learning for sequence labeling
    21. 6.3.5 Stratified sampling by confidence and tokens
    22. 6.3.6 Creating training data samples for representative sampling that are similar to your predictions
    23. 6.3.7 Full-sequence labeling
    24. 6.3.8 Sampling for document-level diversity in sequence labeling
    25. 6.4 Applying active learning to language generation
    26. 6.4.1 Calculating accuracy for language generation systems
    27. 6.4.2 Uncertainty sampling for language generation
    28. 6.4.3 Diversity sampling for language generation
    29. 6.4.4 Active transfer learning for language generation
    30. 6.5 Applying active learning to other machine learning tasks
    31. 6.5.1 Active learning for information retrieval
    32. 6.5.2 Active learning for video
    33. 6.5.3 Active learning for speech
    34. 6.6 Choosing the right number of items for human review
    35. 6.6.1 Active labeling for fully or partially annotated data
    36. 6.6.2 Combining machine learning with annotation
    37. 6.7 Further reading
    38. Summary
  15. Part 3 Annotation
  16. 7 Working with the people annotating your data
    1. 7.1 Introduction to annotation
    2. 7.1.1 Three principles of good data annotation
    3. 7.1.2 Annotating data and reviewing model predictions
    4. 7.1.3 Annotations from machine learning-assisted humans
    5. 7.2 In-house experts
    6. 7.2.1 Salary for in-house workers
    7. 7.2.2 Security for in-house workers
    8. 7.2.3 Ownership for in-house workers
    9. 7.2.4 Tip: Always run in-house annotation sessions
    10. 7.3 Outsourced workers
    11. 7.3.1 Salary for outsourced workers
    12. 7.3.2 Security for outsourced workers
    13. 7.3.3 Ownership for outsourced workers
    14. 7.3.4 Tip: Talk to your outsourced workers
    15. 7.4 Crowdsourced workers
    16. 7.4.1 Salary for crowdsourced workers
    17. 7.4.2 Security for crowdsourced workers
    18. 7.4.3 Ownership for crowdsourced workers
    19. 7.4.4 Tip: Create a path to secure work and career advancement
    20. 7.5 Other workforces
    21. 7.5.1 End users
    22. 7.5.2 Volunteers
    23. 7.5.3 People playing games
    24. 7.5.4 Model predictions as annotations
    25. 7.6 Estimating the volume of annotation needed
    26. 7.6.1 The orders-of-magnitude equation for number of annotations needed
    27. 7.6.2 Anticipate one to four weeks of annotation training and task refinement
    28. 7.6.3 Use your pilot annotations and accuracy goal to estimate cost
    29. 7.6.4 Combining types of workforces
    30. Summary
  17. 8 Quality control for data annotation
    1. 8.1 Comparing annotations with ground truth answers
    2. 8.1.1 Annotator agreement with ground truth data
    3. 8.1.2 Which baseline should you use for expected accuracy?
    4. 8.2 Interannotator agreement
    5. 8.2.1 Introduction to interannotator agreement
    6. 8.2.2 Benefits from calculating interannotator agreement
    7. 8.2.3 Dataset-level agreement with Krippendorff’s alpha
    8. 8.2.4 Calculating Krippendorff’s alpha beyond labeling
    9. 8.2.5 Individual annotator agreement
    10. 8.2.6 Per-label and per-demographic agreement
    11. 8.2.7 Extending accuracy with agreement for real-world diversity
    12. 8.3 Aggregating multiple annotations to create training data
    13. 8.3.1 Aggregating annotations when everyone agrees
    14. 8.3.2 The mathematical case for diverse annotators and low agreement
    15. 8.3.3 Aggregating annotations when annotators disagree
    16. 8.3.4 Annotator-reported confidences
    17. 8.3.5 Deciding which labels to trust: Annotation uncertainty
    18. 8.4 Quality control by expert review
    19. 8.4.1 Recruiting and training qualified people
    20. 8.4.2 Training people to become experts
    21. 8.4.3 Machine-learning-assisted experts
    22. 8.5 Multistep workflows and review tasks
    23. 8.6 Further reading
    24. Summary
  18. 9 Advanced data annotation and augmentation
    1. 9.1 Annotation quality for subjective tasks
    2. 9.1.1 Requesting annotator expectations
    3. 9.1.2 Assessing viable labels for subjective tasks
    4. 9.1.3 Trusting an annotator to understand diverse responses
    5. 9.1.4 Bayesian Truth Serum for subjective judgments
    6. 9.1.5 Embedding simple tasks in more complicated ones
    7. 9.2 Machine learning for annotation quality control
    8. 9.2.1 Calculating annotation confidence as an optimization task
    9. 9.2.2 Converging on label confidence when annotators disagree
    10. 9.2.3 Predicting whether a single annotation is correct
    11. 9.2.4 Predicting whether a single annotation is in agreement
    12. 9.2.5 Predicting whether an annotator is a bot
    13. 9.3 Model predictions as annotations
    14. 9.3.1 Trusting annotations from confident model predictions
    15. 9.3.2 Treating model predictions as a single annotator
    16. 9.3.3 Cross-validating to find mislabeled data
    17. 9.4 Embeddings and contextual representations
    18. 9.4.1 Transfer learning from an existing model
    19. 9.4.2 Representations from adjacent easy-to-annotate tasks
    20. 9.4.3 Self-supervision: Using inherent labels in the data
    21. 9.5 Search-based and rule-based systems
    22. 9.5.1 Data filtering with rules
    23. 9.5.2 Training data search
    24. 9.5.3 Masked feature filtering
    25. 9.6 Light supervision on unsupervised models
    26. 9.6.1 Adapting an unsupervised model to a supervised model
    27. 9.6.2 Human-guided exploratory data analysis
    28. 9.7 Synthetic data, data creation, and data augmentation
    29. 9.7.1 Synthetic data
    30. 9.7.2 Data creation
    31. 9.7.3 Data augmentation
    32. 9.8 Incorporating annotation information into machine learning models
    33. 9.8.1 Filtering or weighting items by confidence in their labels
    34. 9.8.2 Including the annotator identity in inputs
    35. 9.8.3 Incorporating uncertainty into the loss function
    36. 9.9 Further reading for advanced annotation
    37. 9.9.1 Further reading for subjective data
    38. 9.9.2 Further reading for machine learning for annotation quality control
    39. 9.9.3 Further reading for embeddings/contextual representations
    40. 9.9.4 Further reading for rule-based systems
    41. 9.9.5 Further reading for incorporating uncertainty in annotations into the downstream models
    42. Summary
  19. 10 Annotation quality for different machine learning tasks
    1. 10.1 Annotation quality for continuous tasks
    2. 10.1.1 Ground truth for continuous tasks
    3. 10.1.2 Agreement for continuous tasks
    4. 10.1.3 Subjectivity in continuous tasks
    5. 10.1.4 Aggregating continuous judgments to create training data
    6. 10.1.5 Machine learning for aggregating continuous tasks to create training data
    7. 10.2 Annotation quality for object detection
    8. 10.2.1 Ground truth for object detection
    9. 10.2.2 Agreement for object detection
    10. 10.2.3 Dimensionality and accuracy in object detection
    11. 10.2.4 Subjectivity for object detection
    12. 10.2.5 Aggregating object annotations to create training data
    13. 10.2.6 Machine learning for object annotations
    14. 10.3 Annotation quality for semantic segmentation
    15. 10.3.1 Ground truth for semantic segmentation annotation
    16. 10.3.2 Agreement for semantic segmentation
    17. 10.3.3 Subjectivity for semantic segmentation annotations
    18. 10.3.4 Aggregating semantic segmentation to create training data
    19. 10.3.5 Machine learning for aggregating semantic segmentation tasks to create training data
    20. 10.4 Annotation quality for sequence labeling
    21. 10.4.1 Ground truth for sequence labeling
    22. 10.4.2 Ground truth for sequence labeling in truly continuous data
    23. 10.4.3 Agreement for sequence labeling
    24. 10.4.4 Machine learning and transfer learning for sequence labeling
    25. 10.4.5 Rule-based, search-based, and synthetic data for sequence labeling
    26. 10.5 Annotation quality for language generation
    27. 10.5.1 Ground truth for language generation
    28. 10.5.2 Agreement and aggregation for language generation
    29. 10.5.3 Machine learning and transfer learning for language generation
    30. 10.5.4 Synthetic data for language generation
    31. 10.6 Annotation quality for other machine learning tasks
    32. 10.6.1 Annotation for information retrieval
    33. 10.6.2 Annotation for multifield tasks
    34. 10.6.3 Annotation for video
    35. 10.6.4 Annotation for audio data
    36. 10.7 Further reading for annotation quality for different machine learning tasks
    37. 10.7.1 Further reading for computer vision
    38. 10.7.2 Further reading for annotation for natural language processing
    39. 10.7.3 Further reading for annotation for information retrieval
    40. Summary
  20. Part 4 Human–computer interaction for machine learning
  21. 11 Interfaces for data annotation
    1. 11.1 Basic principles of human–computer interaction
    2. 11.1.1 Introducing affordance, feedback, and agency
    3. 11.1.2 Designing interfaces for annotation
    4. 11.1.3 Minimizing eye movement and scrolling
    5. 11.1.4 Keyboard shortcuts and input devices
    6. 11.2 Breaking the rules effectively
    7. 11.2.1 Scrolling for batch annotation
    8. 11.2.2 Foot pedals
    9. 11.2.3 Audio inputs
    10. 11.3 Priming in annotation interfaces
    11. 11.3.1 Repetition priming
    12. 11.3.2 Where priming hurts
    13. 11.3.3 Where priming helps
    14. 11.4 Combining human and machine intelligence
    15. 11.4.1 Annotator feedback
    16. 11.4.2 Maximizing objectivity by asking what other people would annotate
    17. 11.4.3 Recasting continuous problems as ranking problems
    18. 11.5 Smart interfaces for maximizing human intelligence
    19. 11.5.1 Smart interfaces for semantic segmentation
    20. 11.5.2 Smart interfaces for object detection
    21. 11.5.3 Smart interfaces for language generation
    22. 11.5.4 Smart interfaces for sequence labeling
    23. 11.6 Machine learning to assist human processes
    24. 11.6.1 The perception of increased efficiency
    25. 11.6.2 Active learning for increased efficiency
    26. 11.6.3 Errors can be better than absence to maximize completeness
    27. 11.6.4 Keep annotation interfaces separate from daily work interfaces
    28. 11.7 Further reading
    29. Summary
  22. 12 Human-in-the-loop machine learning products
    1. 12.1 Defining products for human-in-the-loop machine learning applications
    2. 12.1.1 Start with the problem you are solving
    3. 12.1.2 Design systems to solve the problem
    4. 12.1.3 Connecting Python and HTML
    5. 12.2 Example 1: Exploratory data analysis for news headlines
    6. 12.2.1 Assumptions
    7. 12.2.2 Design and implementation
    8. 12.2.3 Potential extensions
    9. 12.3 Example 2: Collecting data about food safety events
    10. 12.3.1 Assumptions
    11. 12.3.2 Design and implementation
    12. 12.3.3 Potential extensions
    13. 12.4 Example 3: Identifying bicycles in images
    14. 12.4.1 Assumptions
    15. 12.4.2 Design and implementation
    16. 12.4.3 Potential extensions
    17. 12.5 Further reading for building human-in-the-loop machine learning products
    18. Summary
  23. appendix Machine learning refresher
    1. A.1 Interpreting predictions from a model
    2. A.1.1 Probability distributions
    3. A.2 Softmax deep dive
    4. A.2.1 Converting the model output to confidences with softmax
    5. A.2.2 The choice of base/temperature for softmax
    6. A.2.3 The result from dividing exponentials
    7. A.3 Measuring human-in-the-loop machine learning systems
    8. A.3.1 Precision, recall, and F-score
    9. A.3.2 Micro and macro precision, recall, and F-score
    10. A.3.3 Taking random chance into account: Chance-adjusted accuracy
    11. A.3.4 Taking confidence into account: Area under the ROC curve (AUC)
    12. A.3.5 Number of model errors spotted
    13. A.3.6 Human labor cost saved
    14. A.3.7 Other methods for calculating accuracy in this book
  24. index
  25. inside back cover