
Get to grips with deep learning techniques for building image processing applications using PyTorch with the help of code notebooks and test questions

Key Features

  • Implement solutions to 50 real-world computer vision applications using PyTorch
  • Understand the theory and working mechanisms of neural network architectures and their implementation
  • Discover best practices using a custom library created especially for this book

Book Description

Deep learning is the driving force behind many recent advances in various computer vision (CV) applications. This book takes a hands-on approach to help you solve over 50 CV problems using PyTorch 1.x on real-world datasets.

You'll start by building a neural network (NN) from scratch using NumPy and PyTorch and discover best practices for tweaking its hyperparameters. You'll then perform image classification using convolutional neural networks and transfer learning and understand how they work. As you progress, you'll implement multiple use cases of 2D and 3D multi-object detection, segmentation, and human pose estimation by learning about the R-CNN family, SSD, YOLO, and U-Net architectures, as well as the Detectron2 platform. The book will also guide you in performing facial expression swapping, generating new faces, and manipulating facial expressions as you explore autoencoders and modern generative adversarial networks. You'll learn how to combine CV with NLP techniques, such as LSTMs and transformers, and with reinforcement learning (RL) techniques, such as deep Q-learning, to implement OCR, image captioning, object detection, and a self-driving car agent. Finally, you'll move your NN model to production on the AWS Cloud.

By the end of this book, you'll be able to leverage modern NN architectures to solve over 50 real-world CV problems confidently.

What you will learn

  • Train a NN from scratch with NumPy and PyTorch
  • Implement 2D and 3D multi-object detection and segmentation
  • Generate digits and DeepFakes with autoencoders and advanced GANs
  • Manipulate images using CycleGAN, Pix2PixGAN, StyleGAN2, and SRGAN
  • Combine CV with NLP to perform OCR, image captioning, and object detection
  • Combine CV with reinforcement learning to build agents that play Pong and drive a car autonomously
  • Deploy a deep learning model on an AWS server using FastAPI and Docker
  • Implement over 35 NN architectures and common OpenCV utilities

Who this book is for

This book is for PyTorch beginners and intermediate-level machine learning practitioners who want to become well-versed in computer vision techniques using deep learning and PyTorch. If you are just getting started with neural networks, you'll find the use cases in this book, accompanied by notebooks on GitHub, useful. Basic knowledge of the Python programming language and machine learning is all you need to get started with this book.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Modern Computer Vision with PyTorch
  3. Dedication
  4. About Packt
    1. Why subscribe?
  5. Contributors
    1. About the authors
    2. About the reviewer
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  7. Section 1 - Fundamentals of Deep Learning for Computer Vision
  8. Artificial Neural Network Fundamentals
    1. Comparing AI and traditional machine learning
    2. Learning about the artificial neural network building blocks
    3. Implementing feedforward propagation
    4. Calculating the hidden layer unit values
    5. Applying the activation function
    6. Calculating the output layer values
    7. Calculating loss values
    8. Calculating loss during continuous variable prediction
    9. Calculating loss during categorical variable prediction
    10. Feedforward propagation in code
    11. Activation functions in code
    12. Loss functions in code
    13. Implementing backpropagation
    14. Gradient descent in code
    15. Implementing backpropagation using the chain rule
    16. Putting feedforward propagation and backpropagation together
    17. Understanding the impact of the learning rate 
    18. Summarizing the training process of a neural network
    19. Summary
    20. Questions
  9. PyTorch Fundamentals
    1. Installing PyTorch
    2. PyTorch tensors
    3. Initializing a tensor
    4. Operations on tensors
    5. Auto gradients of tensor objects
    6. Advantages of PyTorch's tensors over NumPy's ndarrays
    7. Building a neural network using PyTorch
    8. Dataset, DataLoader, and batch size
    9. Predicting on new data points
    10. Implementing a custom loss function
    11. Fetching the values of intermediate layers
    12. Using a sequential method to build a neural network
    13. Saving and loading a PyTorch model
    14. state dict
    15. Saving
    16. Loading
    17. Summary
    18. Questions
  10. Building a Deep Neural Network with PyTorch
    1. Representing an image
    2. Converting images into structured arrays and scalars
    3. Why leverage neural networks for image analysis?
    4. Preparing our data for image classification
    5. Training a neural network
    6. Scaling a dataset to improve model accuracy
    7. Understanding the impact of varying the batch size
    8. Batch size of 32
    9. Batch size of 10,000
    10. Understanding the impact of varying the loss optimizer
    11. Understanding the impact of varying the learning rate
    12. Impact of the learning rate on a scaled dataset
    13. High learning rate
    14. Medium learning rate
    15. Low learning rate
    16. Parameter distribution across layers for different learning rates
    17. Impact of varying the learning rate on a non-scaled dataset
    18. Understanding the impact of learning rate annealing
    19. Building a deeper neural network
    20. Understanding the impact of batch normalization
    21. Very small input values without batch normalization
    22. Very small input values with batch normalization
    23. The concept of overfitting
    24. Impact of adding dropout
    25. Impact of regularization
    26. L1 regularization
    27. L2 regularization
    28. Summary
    29. Questions
  11. Section 2 - Object Classification and Detection
  12. Introducing Convolutional Neural Networks
    1. The problem with traditional deep neural networks
    2. Building blocks of a CNN
    3. Convolution
    4. Filter
    5. Strides and padding 
    6. Strides
    7. Padding
    8. Pooling
    9. Putting them all together
    10. How convolution and pooling help in image translation
    11. Implementing a CNN 
    12. Building a CNN-based architecture using PyTorch
    13. Forward propagating the output in Python
    14. Classifying images using deep CNNs
    15. Implementing data augmentation
    16. Image augmentations
    17. Affine transformations
    18. Changing the brightness
    19. Adding noise
    20. Performing a sequence of augmentations
    21. Performing data augmentation on a batch of images and the need for collate_fn
    22. Data augmentation for image translation
    23. Visualizing the outcome of feature learning
    24. Building a CNN for classifying real-world images
    25. Impact on the number of images used for training
    26. Summary
    27. Questions
  13. Transfer Learning for Image Classification
    1. Introducing transfer learning
    2. Understanding VGG16 architecture
    3. Understanding ResNet architecture
    4. Implementing facial key point detection
    5. 2D and 3D facial key point detection
    6. Multi-task learning – Implementing age estimation and gender classification
    7. Introducing the torch_snippets library
    8. Summary
    9. Questions
  14. Practical Aspects of Image Classification
    1. Generating CAMs
    2. Understanding the impact of data augmentation and batch normalization
    3. Coding up road sign detection
    4. Practical aspects to take care of during model implementation
    5. Dealing with imbalanced data
    6. The size of the object within an image
    7. Dealing with the difference between training and validation data
    8. The number of nodes in the flatten layer
    9. Image size
    10. Leveraging OpenCV utilities
    11. Summary
    12. Questions
  15. Basics of Object Detection
    1. Introducing object detection
    2. Creating a bounding box ground truth for training
    3. Installing the image annotation tool
    4. Understanding region proposals
    5. Leveraging SelectiveSearch to generate region proposals
    6. Implementing SelectiveSearch to generate region proposals
    7. Understanding IoU
    8. Non-max suppression
    9. Mean average precision
    10. Training R-CNN-based custom object detectors
    11. Working details of R-CNN
    12. Implementing R-CNN for object detection on a custom dataset
    13. Downloading the dataset
    14. Preparing the dataset
    15. Fetching region proposals and the ground truth of offset
    16. Creating the training data
    17. R-CNN network architecture
    18. Predict on a new image
    19. Training Fast R-CNN-based custom object detectors
    20. Working details of Fast R-CNN
    21. Implementing Fast R-CNN for object detection on a custom dataset
    22. Summary
    23. Questions
  16. Advanced Object Detection
    1. Components of modern object detection algorithms
    2. Anchor boxes
    3. Region Proposal Network
    4. Classification and regression
    5. Training Faster R-CNN on a custom dataset
    6. Working details of YOLO
    7. Training YOLO on a custom dataset
    8. Installing Darknet
    9. Setting up the dataset format
    10. Configuring the architecture
    11. Training and testing the model
    12. Working details of SSD
    13. Components in SSD code
    14. SSD300
    15. MultiBoxLoss
    16. Training SSD on a custom dataset
    17. Summary
    18. Test your understanding
  17. Image Segmentation
    1. Exploring the U-Net architecture
    2. Performing upscaling
    3. Implementing semantic segmentation using U-Net
    4. Exploring the Mask R-CNN architecture
    5. RoI Align
    6. Mask head
    7. Implementing instance segmentation using Mask R-CNN
    8. Predicting multiple instances of multiple classes
    9. Summary
    10. Questions
  18. Applications of Object Detection and Segmentation
    1. Multi-object instance segmentation
    2. Fetching and preparing data
    3. Training the model for instance segmentation
    4. Making inferences on a new image
    5. Human pose detection
    6. Crowd counting
    7. Coding up crowd counting
    8. Image colorization
    9. 3D object detection with point clouds
    10. Theory
    11. Input encoding
    12. Output encoding
    13. Training the YOLO model for 3D object detection
    14. Data format
    15. Data inspection
    16. Training
    17. Testing
    18. Summary
  19. Section 3 - Image Manipulation
  20. Autoencoders and Image Manipulation
    1. Understanding autoencoders
    2. Implementing vanilla autoencoders
    3. Understanding convolutional autoencoders
    4. Grouping similar images using t-SNE
    5. Understanding variational autoencoders
    6. Working of VAE
    7. KL divergence
    8. Building a VAE
    9. Performing an adversarial attack on images
    10. Performing neural style transfer
    11. Generating deep fakes
    12. Summary
    13. Questions
  21. Image Generation Using GANs
    1. Introducing GANs
    2. Using GANs to generate handwritten digits
    3. Using DCGANs to generate face images
    4. Implementing conditional GANs
    5. Summary
    6. Questions
  22. Advanced GANs to Manipulate Images
    1. Leveraging the Pix2Pix GAN
    2. Leveraging CycleGAN
    3. Leveraging StyleGAN on custom images
    4. Super-resolution GAN
    5. Architecture
    6. Coding SRGAN
    7. Summary
    8. Questions
  23. Section 4 - Combining Computer Vision with Other Techniques
  24. Training with Minimal Data Points
    1. Implementing zero-shot learning
    2. Coding zero-shot learning
    3. Implementing few-shot learning
    4. Building a Siamese network
    5. Coding Siamese networks
    6. Working details of prototypical networks
    7. Working details of relation networks
    8. Summary
    9. Questions
  25. Combining Computer Vision and NLP Techniques
    1. Introducing RNNs
    2. The idea behind the need for RNN architecture
    3. Exploring the structure of an RNN
    4. Why store memory?
    5. Introducing LSTM architecture
    6. The working details of LSTM
    7. Implementing LSTM in PyTorch
    8. Implementing image captioning
    9. Image captioning in code
    10. Transcribing handwritten images
    11. The working details of CTC loss
    12. Calculating the CTC loss value
    13. Handwriting transcription in code
    14. Object detection using DETR
    15. The working details of transformers
    16. Basics of transformers
    17. The working details of DETR
    18. Detection with transformers in code
    19. Summary
    20. Questions
  26. Combining Computer Vision and Reinforcement Learning
    1. Learning the basics of reinforcement learning
    2. Calculating the state value
    3. Calculating the state-action value
    4. Implementing Q-learning
    5. Q-value
    6. Understanding the Gym environment
    7. Building a Q-table
    8. Leveraging exploration-exploitation
    9. Implementing deep Q-learning
    10. Implementing deep Q-learning with the fixed targets model
    11. Coding up an agent to play Pong
    12. Implementing an agent to perform autonomous driving
    13. Installing the CARLA environment
    14. Install the CARLA binaries
    15. Installing the CARLA Gym environment
    16. Training a self-driving agent
    17. model.py
    18. actor.py
    19. Training DQN with fixed targets
    20. Summary
    21. Questions
  27. Moving a Model to Production
    1. Understanding the basics of an API
    2. Creating an API and making predictions on a local server
    3. Installing the API module and dependencies
    4. Serving an image classifier
    5. fmnist.py
    6. server.py
    7. Running the server
    8. Moving the API to the cloud
    9. Comparing Docker containers and Docker images
    10. Creating a Docker container
    11. Creating the requirements.txt file
    12. Creating a Dockerfile
    13. Building a Docker image and creating a Docker container
    14. Shipping and running the Docker container in the cloud
    15. Configuring AWS
    16. Creating a Docker repository on AWS ECR and pushing the image
    17. Creating an EC2 instance
    18. Pulling the image and building the Docker container
    19. Summary
  28. Using OpenCV Utilities for Image Analysis
    1. Drawing bounding boxes around words in an image
    2. Detecting lanes in an image of a road
    3. Detecting objects based on color
    4. Building a panoramic view of images
    5. Detecting the number plate of a car
    6. Summary
  29. Appendix
    1. Chapter 1 - Artificial Neural Network Fundamentals
    2. Chapter 2 - PyTorch Fundamentals
    3. Chapter 3 - Building a Deep Neural Network with PyTorch
    4. Chapter 4 - Introducing Convolutional Neural Networks
    5. Chapter 5 - Transfer Learning for Image Classification
    6. Chapter 6 - Practical Aspects of Image Classification
    7. Chapter 7 - Basics of Object Detection
    8. Chapter 8 - Advanced Object Detection
    9. Chapter 9 - Image Segmentation
    10. Chapter 11 - Autoencoders and Image Manipulation
    11. Chapter 12 - Image Generation Using GANs
    12. Chapter 13 - Advanced GANs to Manipulate Images
    13. Chapter 14 - Training with Minimal Data Points
    14. Chapter 15 - Combining Computer Vision and NLP Techniques
    15. Chapter 16 - Combining Computer Vision and Reinforcement Learning
  30. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think