NVIDIA's Full-Color Guide to Deep Learning: All You Need to Get Started and Get Results

"To enable everyone to be part of this historic revolution requires the democratization of AI knowledge and resources. This book is timely and relevant towards accomplishing these lofty goals."

--From the foreword by Dr. Anima Anandkumar, Bren Professor, Caltech, and Director of ML Research, NVIDIA

"Ekman uses a learning technique that in our experience has proven pivotal to successasking the reader to think about using DL techniques in practice. His straightforward approach is refreshing, and he permits the reader to dream, just a bit, about where DL may yet take us."

--From the foreword by Dr. Craig Clawson, Director, NVIDIA Deep Learning Institute

Deep learning (DL) is a key component of today's exciting advances in machine learning and artificial intelligence. Learning Deep Learning is a complete guide to DL. Illuminating both the core concepts and the hands-on programming techniques needed to succeed, this book is ideal for developers, data scientists, analysts, and others--including those with no prior machine learning or statistics experience.

After introducing the essential building blocks of deep neural networks, such as artificial neurons and fully connected, convolutional, and recurrent layers, Magnus Ekman shows how to use them to build advanced architectures, including the Transformer. He describes how these concepts are used to build modern networks for computer vision and natural language processing (NLP), including Mask R-CNN, GPT, and BERT. And he explains how to build a natural language translator and a system that generates natural language descriptions of images.

Throughout, Ekman provides concise, well-annotated code examples using TensorFlow with Keras. Corresponding PyTorch examples are provided online, and the book thereby covers the two dominant Python libraries for DL used in industry and academia. He concludes with an introduction to neural architecture search (NAS), exploring important ethical issues and providing resources for further learning.
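
For readers new to the framework, the following minimal sketch (not taken from the book; the data-loading step is assumed) illustrates the style of TensorFlow/Keras code such examples rely on: a small fully connected network for classifying 28x28-pixel digit images.

import tensorflow as tf

# Minimal illustrative sketch, not an example from the book.
# Assumes 28x28-pixel grayscale inputs and ten output classes.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),   # flatten each image into a vector
    tf.keras.layers.Dense(64, activation='relu'),    # hidden fully connected layer
    tf.keras.layers.Dense(10, activation='softmax'), # one output per class
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=5)    # assumes a dataset has been loaded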

  • Explore and master core concepts: perceptrons, gradient-based learning, sigmoid neurons, and backpropagation

  • See how DL frameworks make it easier to develop more complicated and useful neural networks

  • Discover how convolutional neural networks (CNNs) revolutionize image classification and analysis

  • Apply recurrent neural networks (RNNs) and long short-term memory (LSTM) to text and other variable-length sequences

  • Master NLP with sequence-to-sequence networks and the Transformer architecture

  • Build applications for natural language translation and image captioning

NVIDIA's invention of the GPU sparked the PC gaming market. The company's pioneering work in accelerated computing--a supercharged form of computing at the intersection of computer graphics, high-performance computing, and AI--is reshaping trillion-dollar industries, such as transportation, healthcare, and manufacturing, and fueling the growth of many others.

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

Table of Contents

  1. Cover Page
  2. About This eBook
  3. Halftitle Page
  4. Title Page
  5. Copyright Page
  6. Dedication Page
  7. Contents
  8. Foreword
  9. Foreword
  10. Preface
    1. What Is Deep Learning?
    2. Brief History of Deep Neural Networks
    3. Is This Book for You?
    4. Is DL Dangerous?
    5. Choosing a DL Framework
    6. Prerequisites for Learning DL
    7. About the Code Examples
    8. How to Read This Book
    9. Overview of Each Chapter and Appendix
  11. Acknowledgments
  12. About the Author
  13. Chapter 1. The Rosenblatt Perceptron
    1. Example of a Two-Input Perceptron
    2. The Perceptron Learning Algorithm
    3. Limitations of the Perceptron
    4. Combining Multiple Perceptrons
    5. Implementing Perceptrons with Linear Algebra
    6. Geometric Interpretation of the Perceptron
    7. Understanding the Bias Term
    8. Concluding Remarks on the Perceptron
  14. Chapter 2. Gradient-Based Learning
    1. Intuitive Explanation of the Perceptron Learning Algorithm
    2. Derivatives and Optimization Problems
    3. Solving a Learning Problem with Gradient Descent
    4. Constants and Variables in a Network
    5. Analytic Explanation of the Perceptron Learning Algorithm
    6. Geometric Description of the Perceptron Learning Algorithm
    7. Revisiting Different Types of Perceptron Plots
    8. Using a Perceptron to Identify Patterns
    9. Concluding Remarks on Gradient-Based Learning
  15. Chapter 3. Sigmoid Neurons and Backpropagation
    1. Modified Neurons to Enable Gradient Descent for Multilevel Networks
    2. Which Activation Function Should We Use?
    3. Function Composition and the Chain Rule
    4. Using Backpropagation to Compute the Gradient
    5. Backpropagation with Multiple Neurons per Layer
    6. Programming Example: Learning the XOR Function
    7. Network Architectures
    8. Concluding Remarks on Backpropagation
  16. Chapter 4. Fully Connected Networks Applied to Multiclass Classification
    1. Introduction to Datasets Used When Training Networks
    2. Training and Inference
    3. Extending the Network and Learning Algorithm to Do Multiclass Classification
    4. Network for Digit Classification
    5. Loss Function for Multiclass Classification
    6. Programming Example: Classifying Handwritten Digits
    7. Mini-Batch Gradient Descent
    8. Concluding Remarks on Multiclass Classification
  17. Chapter 5. Toward DL: Frameworks and Network Tweaks
    1. Programming Example: Moving to a DL Framework
    2. The Problem of Saturated Neurons and Vanishing Gradients
    3. Initialization and Normalization Techniques to Avoid Saturated Neurons
    4. Cross-Entropy Loss Function to Mitigate Effect of Saturated Output Neurons
    5. Different Activation Functions to Avoid Vanishing Gradient in Hidden Layers
    6. Variations on Gradient Descent to Improve Learning
    7. Experiment: Tweaking Network and Learning Parameters
    8. Hyperparameter Tuning and Cross-Validation
    9. Concluding Remarks on the Path Toward Deep Learning
  18. Chapter 6. Fully Connected Networks Applied to Regression
    1. Output Units
    2. The Boston Housing Dataset
    3. Programming Example: Predicting House Prices with a DNN
    4. Improving Generalization with Regularization
    5. Experiment: Deeper and Regularized Models for House Price Prediction
    6. Concluding Remarks on Output Units and Regression Problems
  19. Chapter 7. Convolutional Neural Networks Applied to Image Classification
    1. The CIFAR-10 Dataset
    2. Characteristics and Building Blocks for Convolutional Layers
    3. Combining Feature Maps into a Convolutional Layer
    4. Combining Convolutional and Fully Connected Layers into a Network
    5. Effects of Sparse Connections and Weight Sharing
    6. Programming Example: Image Classification with a Convolutional Network
    7. Concluding Remarks on Convolutional Networks
  20. Chapter 8. Deeper CNNs and Pretrained Models
    1. VGGNet
    2. GoogLeNet
    3. ResNet
    4. Programming Example: Use a Pretrained ResNet Implementation
    5. Transfer Learning
    6. Backpropagation for CNN and Pooling
    7. Data Augmentation as a Regularization Technique
    8. Mistakes Made by CNNs
    9. Reducing Parameters with Depthwise Separable Convolutions
    10. Striking the Right Network Design Balance with EfficientNet
    11. Concluding Remarks on Deeper CNNs
  21. Chapter 9. Predicting Time Sequences with Recurrent Neural Networks
    1. Limitations of Feedforward Networks
    2. Recurrent Neural Networks
    3. Mathematical Representation of a Recurrent Layer
    4. Combining Layers into an RNN
    5. Alternative View of RNN and Unrolling in Time
    6. Backpropagation Through Time
    7. Programming Example: Forecasting Book Sales
    8. Dataset Considerations for RNNs
    9. Concluding Remarks on RNNs
  22. Chapter 10. Long Short-Term Memory
    1. Keeping Gradients Healthy
    2. Introduction to LSTM
    3. Alternative View of LSTM
    4. Related Topics: Highway Networks and Skip Connections
    5. Concluding Remarks on LSTM
  23. Chapter 11. Text Autocompletion with LSTM and Beam Search
    1. Encoding Text
    2. Longer-Term Prediction and Autoregressive Models
    3. Beam Search
    4. Programming Example: Using LSTM for Text Autocompletion
    5. Bidirectional RNNs
    6. Different Combinations of Input and Output Sequences
    7. Concluding Remarks on Text Autocompletion with LSTM
  24. Chapter 12. Neural Language Models and Word Embeddings
    1. Introduction to Language Models and Their Use Cases
    2. Examples of Different Language Models
    3. Benefit of Word Embeddings and Insight into How They Work
    4. Word Embeddings Created by Neural Language Models
    5. Programming Example: Neural Language Model and Resulting Embeddings
    6. King – Man + Woman! = Queen
    7. King – Man + Woman ≠ Queen
    8. Language Models, Word Embeddings, and Human Biases
    9. Related Topic: Sentiment Analysis of Text
    10. Concluding Remarks on Language Models and Word Embeddings
  25. Chapter 13. Word Embeddings from word2vec and GloVe
    1. Using word2vec to Create Word Embeddings Without a Language Model
    2. Additional Thoughts on word2vec
    3. word2vec in Matrix Form
    4. Wrapping Up word2vec
    5. Programming Example: Exploring Properties of GloVe Embeddings
    6. Concluding Remarks on word2vec and GloVe
  26. Chapter 14. Sequence-to-Sequence Networks and Natural Language Translation
    1. Encoder-Decoder Model for Sequence-to-Sequence Learning
    2. Introduction to the Keras Functional API
    3. Programming Example: Neural Machine Translation
    4. Experimental Results
    5. Properties of the Intermediate Representation
    6. Concluding Remarks on Language Translation
  27. Chapter 15. Attention and the Transformer
    1. Rationale Behind Attention
    2. Attention in Sequence-to-Sequence Networks
    3. Alternatives to Recurrent Networks
    4. Self-Attention
    5. Multi-head Attention
    6. The Transformer
    7. Concluding Remarks on the Transformer
  28. Chapter 16. One-to-Many Network for Image Captioning
    1. Extending the Image Captioning Network with Attention
    2. Programming Example: Attention-Based Image Captioning
    3. Concluding Remarks on Image Captioning
  29. Chapter 17. Medley of Additional Topics
    1. Autoencoders
    2. Multimodal Learning
    3. Multitask Learning
    4. Process for Tuning a Network
    5. Neural Architecture Search
    6. Concluding Remarks
  30. Chapter 18. Summary and Next Steps
    1. Things You Should Know by Now
    2. Ethical AI and Data Ethics
    3. Things You Do Not Yet Know
    4. Next Steps
  31. Appendix A. Linear Regression and Linear Classifiers
    1. Linear Regression as a Machine Learning Algorithm
    2. Computing Linear Regression Coefficients
    3. Classification with Logistic Regression
    4. Classifying XOR with a Linear Classifier
    5. Classification with Support Vector Machines
    6. Evaluation Metrics for a Binary Classifier
  32. Appendix B. Object Detection and Segmentation
    1. Object Detection
    2. Semantic Segmentation
    3. Instance Segmentation with Mask R-CNN
  33. Appendix C. Word Embeddings Beyond word2vec and GloVe
    1. Wordpieces
    2. FastText
    3. Character-Based Method
    4. ELMo
    5. Related Work
  34. Appendix D. GPT, BERT, and RoBERTa
    1. GPT
    2. BERT
    3. RoBERTa
    4. Historical Work Leading Up to GPT and BERT
    5. Other Models Based on the Transformer
  35. Appendix E. Newton-Raphson versus Gradient Descent
    1. Newton-Raphson Root-Finding Method
    2. Relationship Between Newton-Raphson and Gradient Descent
  36. Appendix F. Matrix Implementation of Digit Classification Network
    1. Single Matrix
    2. Mini-Batch Implementation
  37. Appendix G. Relating Convolutional Layers to Mathematical Convolution
  38. Appendix H. Gated Recurrent Units
    1. Alternative GRU Implementation
    2. Network Based on the GRU
  39. Appendix I. Setting Up a Development Environment
    1. Python
    2. Programming Environment
    3. Programming Examples
    4. Datasets
    5. Installing a DL Framework
    6. TensorFlow Specific Considerations
    7. Key Differences Between PyTorch and TensorFlow
  40. Appendix J. Cheat Sheets
  41. Works Cited
  42. Index
  43. Code Snippets