This practical book shows you how to employ machine learning models to extract information from images. ML engineers and data scientists will learn how to solve a variety of image problems, including classification, object detection, autoencoders, image generation, counting, and captioning, with proven ML techniques. This book provides a great introduction to end-to-end deep learning: dataset creation, data preprocessing, model design, model training, evaluation, deployment, and interpretability.

Google engineers Valliappa Lakshmanan, Martin Görner, and Ryan Gillard show you how to develop accurate and explainable computer vision ML models and put them into large-scale production using robust ML architecture in a flexible and maintainable way. You'll learn how to design, train, evaluate, and predict with models written in TensorFlow or Keras.

You'll learn how to:

  • Design ML architecture for computer vision tasks
  • Select a model (such as ResNet, SqueezeNet, or EfficientNet) appropriate to your task
  • Create an end-to-end ML pipeline to train, evaluate, deploy, and explain your model
  • Preprocess images to support data augmentation and learnability
  • Incorporate explainability and responsible AI best practices
  • Deploy image models as web services or on edge devices
  • Monitor and manage ML models

Table of Contents

  1. Preface
    1. Who Is This Book For?
    2. How to Use This Book
    3. Organization of the Book
    4. Conventions Used in This Book
    5. Using Code Examples
    6. O’Reilly Online Learning
    7. How to Contact Us
    8. Acknowledgments
  2. 1. Machine Learning for Computer Vision
    1. Machine Learning
    2. Deep Learning Use Cases
    3. Summary
  3. 2. ML Models for Vision
    1. A Dataset for Machine Perception
    2. 5-Flowers Dataset
    3. Reading Image Data
    4. Visualizing Image Data
    5. Reading the Dataset File
    6. A Linear Model Using Keras
    7. Keras Model
    8. Training the Model
    9. A Neural Network Using Keras
    10. Neural Networks
    11. Deep Neural Networks
    12. Summary
    13. Glossary
  4. 3. Image Vision
    1. Pretrained Embeddings
    2. Pretrained Model
    3. Transfer Learning
    4. Fine-Tuning
    5. Convolutional Networks
    6. Convolutional Filters
    7. Stacking Convolutional Layers
    8. Pooling Layers
    9. AlexNet
    10. The Quest for Depth
    11. Filter Factorization
    12. 1x1 Convolutions
    13. VGG19
    14. Global Average Pooling
    15. Modular Architectures
    16. Inception
    17. SqueezeNet
    18. ResNet and Skip Connections
    19. DenseNet
    20. Depth-Separable Convolutions
    21. Xception
    22. Neural Architecture Search Designs
    23. NASNet
    24. The MobileNet Family
    25. Beyond Convolution: The Transformer Architecture
    26. Choosing a Model
    27. Performance Comparison
    28. Ensembling
    29. Recommended Strategy
    30. Summary
  5. 4. Object Detection and Image Segmentation
    1. Object Detection
    2. YOLO
    3. RetinaNet
    4. Segmentation
    5. Mask R-CNN and Instance Segmentation
    6. U-Net and Semantic Segmentation
    7. Summary
  6. 5. Creating Vision Datasets
    1. Collecting Images
    2. Photographs
    3. Imaging
    4. Proof of Concept
    5. Data Types
    6. Channels
    7. Geospatial Data
    8. Audio and Video
    9. Manual Labeling
    10. Multilabel
    11. Object Detection
    12. Labeling at Scale
    13. Labeling User Interface
    14. Multiple Tasks
    15. Voting and Crowdsourcing
    16. Labeling Services
    17. Automated Labeling
    18. Labels from Related Data
    19. Noisy Student
    20. Self-Supervised Learning
    21. Bias
    22. Sources of Bias
    23. Selection Bias
    24. Measurement Bias
    25. Confirmation Bias
    26. Detecting Bias
    27. Creating a Dataset
    28. Splitting Data
    29. TensorFlow Records
    30. Reading TensorFlow Records
    31. Summary
  7. 6. Preprocessing
    1. Reasons for Preprocessing
    2. Shape Transformation
    3. Data Quality Transformation
    4. Improving Model Quality
    5. Size and Resolution
    6. Using Keras Preprocessing Layers
    7. Using the TensorFlow Image Module
    8. Mixing Keras and TensorFlow
    9. Model Training
    10. Training-Serving Skew
    11. Reusing Functions
    12. Preprocessing Within the Model
    13. Using tf.transform
    14. Data Augmentation
    15. Spatial Transformations
    16. Color Distortion
    17. Information Dropping
    18. Forming Input Images
    19. Summary
  8. 7. Training Pipeline
    1. Efficient Ingestion
    2. Storing Data Efficiently
    3. Reading Data in Parallel
    4. Maximizing GPU Utilization
    5. Saving Model State
    6. Exporting the Model
    7. Checkpointing
    8. Distribution Strategy
    9. Choosing a Strategy
    10. Creating the Strategy
    11. Serverless ML
    12. Creating a Python Package
    13. Submitting a Training Job
    14. Hyperparameter Tuning
    15. Deploying the Model
    16. Summary
  9. 8. Model Quality and Continuous Evaluation
    1. Monitoring
    2. TensorBoard
    3. Weight Histograms
    4. Device Placement
    5. Data Visualization
    6. Training Events
    7. Model Quality Metrics
    8. Metrics for Classification
    9. Metrics for Regression
    10. Metrics for Object Detection
    11. Quality Evaluation
    12. Sliced Evaluations
    13. Fairness Monitoring
    14. Continuous Evaluation
    15. Summary
  10. 9. Model Predictions
    1. Making Predictions
    2. Exporting the Model
    3. Using In-Memory Models
    4. Improving Abstraction
    5. Improving Efficiency
    6. Online Prediction
    7. TensorFlow Serving
    8. Modifying the Serving Function
    9. Handling Image Bytes
    10. Batch and Stream Prediction
    11. The Apache Beam Pipeline
    12. Managed Service for Batch Prediction
    13. Invoking Online Prediction
    14. Edge ML
    15. Constraints and Optimizations
    16. TensorFlow Lite
    17. Running TensorFlow Lite
    18. Processing the Image Buffer
    19. Federated Learning
    20. Summary
  11. 10. Trends in Production ML
    1. Machine Learning Pipelines
    2. The Need for Pipelines
    3. Kubeflow Pipelines Cluster
    4. Containerizing the Codebase
    5. Writing a Component
    6. Connecting Components
    7. Automating a Run
    8. Explainability
    9. Techniques
    10. Adding Explainability
    11. No-Code Computer Vision
    12. Why Use No-Code?
    13. Loading Data
    14. Training
    15. Evaluation
    16. Summary
  12. 11. Advanced Vision Problems
    1. Object Measurement
    2. Reference Object
    3. Segmentation
    4. Rotation Correction
    5. Ratio and Measurements
    6. Counting
    7. Density Estimation
    8. Extracting Patches
    9. Simulating Input Images
    10. Regression
    11. Prediction
    12. Pose Estimation
    13. PersonLab
    14. The PoseNet Model
    15. Identifying Multiple Poses
    16. Image Search
    17. Distributed Search
    18. Fast Search
    19. Better Embeddings
    20. Summary
  13. 12. Image and Text Generation
    1. Image Understanding
    2. Embeddings
    3. Auxiliary Learning Tasks
    4. Autoencoders
    5. Variational Autoencoders
    6. Image Generation
    7. Generative Adversarial Networks
    8. GAN Improvements
    9. Image-to-Image Translation
    10. Super-Resolution
    11. Modifying Pictures (Inpainting)
    12. Anomaly Detection
    13. Deepfakes
    14. Image Captioning
    15. Dataset
    16. Tokenizing the Captions
    17. Batching
    18. Captioning Model
    19. Training Loop
    20. Prediction
    21. Summary
  14. Afterword
  15. Index