
Ever since computers began beating us at chess, they've been getting better at a wide range of human activities, from writing songs and generating news articles to helping doctors provide healthcare.

Deep learning is the source of many of these breakthroughs, and its remarkable ability to find patterns hiding in data has made it the fastest-growing field in artificial intelligence (AI). Digital assistants on our phones use deep learning to understand and respond intelligently to voice commands; automotive systems use it to safely navigate road hazards; online platforms use it to deliver personalized suggestions for movies and books – the possibilities are endless.

Deep Learning: A Visual Approach is for anyone who wants to understand this fascinating field in depth, but without any of the advanced math and programming usually required to grasp its internals. If you want to know how these tools work and how to use them yourself, the answers are all within these pages. And if you’re ready to write your own programs, there are plenty of supplemental Python notebooks in the accompanying GitHub repository to get you going.

The book’s conversational style, extensive color illustrations, illuminating analogies, and real-world examples expertly explain the key concepts in deep learning, including:

• How text generators create novel stories and articles
• How deep learning systems learn to play and win at human games
• How image classification systems identify objects or people in a photo
• How to think about probabilities in a way that’s useful to everyday life
• How to use the machine learning techniques that form the core of modern AI

Intellectual adventurers of all kinds can use the powerful ideas covered in Deep Learning: A Visual Approach to build intelligent systems that help us better understand the world and everyone who lives in it. It’s the future of AI, and this book allows you to fully envision it.

Table of Contents

  1. Title Page
  2. Copyright
  3. Dedication
  4. About the Author
  5. Acknowledgments
  6. Introduction
    1. Who This Book Is For
    2. This Book Has No Complex Math and No Code
    3. There Is Code, If You Want It
    4. The Figures Are Available, Too!
    5. Errata
    6. About This Book
    7. Part I: Foundational Ideas
    8. Part II: Basic Machine Learning
    9. Part III: Deep Learning Basics
    10. Part IV: Beyond the Basics
    11. Final Words
  7. Part I: Foundational Ideas
    1. Chapter 1: An Overview of Machine Learning
    2. Expert Systems
    3. Supervised Learning
    4. Unsupervised Learning
    5. Reinforcement Learning
    6. Deep Learning
    7. Summary
    8. Chapter 2: Essential Statistics
    9. Describing Randomness
    10. Random Variables and Probability Distributions
    11. Some Common Distributions
    12. Continuous Distributions
    13. Discrete Distributions
    14. Collections of Random Values
    15. Expected Value
    16. Dependence
    17. Independent and Identically Distributed Variables
    18. Sampling and Replacement
    19. Selection with Replacement
    20. Selection Without Replacement
    21. Bootstrapping
    22. Covariance and Correlation
    23. Covariance
    24. Correlation
    25. Statistics Don’t Tell Us Everything
    26. High-Dimensional Spaces
    27. Summary
    28. Chapter 3: Measuring Performance
    29. Different Types of Probability
    30. Dart Throwing
    31. Simple Probability
    32. Conditional Probability
    33. Joint Probability
    34. Marginal Probability
    35. Measuring Correctness
    36. Classifying Samples
    37. The Confusion Matrix
    38. Characterizing Incorrect Predictions
    39. Measuring Correct and Incorrect
    40. Accuracy
    41. Precision
    42. Recall
    43. Precision-Recall Tradeoff
    44. Misleading Measures
    45. f1 Score
    46. About These Terms
    47. Other Measures
    48. Constructing a Confusion Matrix Correctly
    49. Summary
    50. Chapter 4: Bayes’ Rule
    51. Frequentist and Bayesian Probability
    52. The Frequentist Approach
    53. The Bayesian Approach
    54. Frequentists vs. Bayesians
    55. Frequentist Coin Flipping
    56. Bayesian Coin Flipping
    57. A Motivating Example
    58. Picturing the Coin Probabilities
    59. Expressing Coin Flips as Probabilities
    60. Bayes’ Rule
    61. Discussion of Bayes’ Rule
    62. Bayes’ Rule and Confusion Matrices
    63. Repeating Bayes’ Rule
    64. The Posterior-Prior Loop
    65. The Bayes Loop in Action
    66. Multiple Hypotheses
    67. Summary
    68. Chapter 5: Curves and Surfaces
    69. The Nature of Functions
    70. The Derivative
    71. Maximums and Minimums
    72. Tangent Lines
    73. Finding Minimums and Maximums with Derivatives
    74. The Gradient
    75. Water, Gravity, and the Gradient
    76. Finding Maximums and Minimums with Gradients
    77. Saddle Points
    78. Summary
    79. Chapter 6: Information Theory
    80. Surprise and Context
    81. Understanding Surprise
    82. Unpacking Context
    83. Measuring Information
    84. Adaptive Codes
    85. Speaking Morse
    86. Customizing Morse Code
    87. Entropy
    88. Cross Entropy
    89. Two Adaptive Codes
    90. Using the Codes
    91. Cross Entropy in Practice
    92. Kullback–Leibler Divergence
    93. Summary
  8. Part II: Basic Machine Learning
    1. Chapter 7: Classification
    2. Two-Dimensional Binary Classification
    3. 2D Multiclass Classification
    4. Multiclass Classification
    5. One-Versus-Rest
    6. One-Versus-One
    7. Clustering
    8. The Curse of Dimensionality
    9. Dimensionality and Density
    10. High-Dimensional Weirdness
    11. Summary
    12. Chapter 8: Training and Testing
    13. Training
    14. Testing the Performance
    15. Test Data
    16. Validation Data
    17. Cross-Validation
    18. k-Fold Cross-Validation
    19. Summary
    20. Chapter 9: Overfitting and Underfitting
    21. Finding a Good Fit
    22. Overfitting
    23. Underfitting
    24. Detecting and Addressing Overfitting
    25. Early Stopping
    26. Regularization
    27. Bias and Variance
    28. Matching the Underlying Data
    29. High Bias, Low Variance
    30. Low Bias, High Variance
    31. Comparing Curves
    32. Fitting a Line with Bayes’ Rule
    33. Summary
    34. Chapter 10: Data Preparation
    35. Basic Data Cleaning
    36. The Importance of Consistency
    37. Types of Data
    38. One-Hot Encoding
    39. Normalizing and Standardizing
    40. Normalization
    41. Standardization
    42. Remembering the Transformation
    43. Types of Transformations
    44. Slice Processing
    45. Samplewise Processing
    46. Featurewise Processing
    47. Elementwise Processing
    48. Inverse Transformations
    49. Information Leakage in Cross-Validation
    50. Shrinking the Dataset
    51. Feature Selection
    52. Dimensionality Reduction
    53. Principal Component Analysis
    54. PCA for Simple Images
    55. PCA for Real Images
    56. Summary
    57. Chapter 11: Classifiers
    58. Types of Classifiers
    59. k-Nearest Neighbors
    60. Decision Trees
    61. Using Decision Trees
    62. Overfitting Trees
    63. Splitting Nodes
    64. Support Vector Machines
    65. The Basic Algorithm
    66. The SVM Kernel Trick
    67. Naive Bayes
    68. Comparing Classifiers
    69. Summary
    70. Chapter 12: Ensembles
    71. Voting
    72. Ensembles of Decision Trees
    73. Bagging
    74. Random Forests
    75. Extra Trees
    76. Boosting
    77. Summary
  9. Part III: Deep Learning Basics
    1. Chapter 13: Neural Networks
    2. Real Neurons
    3. Artificial Neurons
    4. The Perceptron
    5. Modern Artificial Neurons
    6. Drawing the Neurons
    7. Feed-Forward Networks
    8. Neural Network Graphs
    9. Initializing the Weights
    10. Deep Networks
    11. Fully Connected Layers
    12. Tensors
    13. Preventing Network Collapse
    14. Activation Functions
    15. Straight-Line Functions
    16. Step Functions
    17. Piecewise Linear Functions
    18. Smooth Functions
    19. Activation Function Gallery
    20. Comparing Activation Functions
    21. Softmax
    22. Summary
    23. Chapter 14: Backpropagation
    24. A High-Level Overview of Training
    25. Punishing Error
    26. A Slow Way to Learn
    27. Gradient Descent
    28. Getting Started
    29. Backprop on a Tiny Neural Network
    30. Finding Deltas for the Output Neurons
    31. Using Deltas to Change Weights
    32. Other Neuron Deltas
    33. Backprop on a Larger Network
    34. The Learning Rate
    35. Building a Binary Classifier
    36. Picking a Learning Rate
    37. An Even Smaller Learning Rate
    38. Summary
    39. Chapter 15: Optimizers
    40. Error as a 2D Curve
    41. Adjusting the Learning Rate
    42. Constant-Sized Updates
    43. Changing the Learning Rate over Time
    44. Decay Schedules
    45. Updating Strategies
    46. Batch Gradient Descent
    47. Stochastic Gradient Descent
    48. Mini-Batch Gradient Descent
    49. Gradient Descent Variations
    50. Momentum
    51. Nesterov Momentum
    52. Adagrad
    53. Adadelta and RMSprop
    54. Adam
    55. Choosing an Optimizer
    56. Regularization
    57. Dropout
    58. Batchnorm
    59. Summary
  10. Part IV: Beyond the Basics
    1. Chapter 16: Convolutional Neural Networks
    2. Introducing Convolution
    3. Detecting Yellow
    4. Weight Sharing
    5. Larger Filters
    6. Filters and Features
    7. Padding
    8. Multidimensional Convolution
    9. Multiple Filters
    10. Convolution Layers
    11. 1D Convolution
    12. 1×1 Convolutions
    13. Changing Output Size
    14. Pooling
    15. Striding
    16. Transposed Convolution
    17. Hierarchies of Filters
    18. Simplifying Assumptions
    19. Finding Face Masks
    20. Finding Eyes, Noses, and Mouths
    21. Applying Our Filters
    22. Summary
    23. Chapter 17: Convnets in Practice
    24. Categorizing Handwritten Digits
    25. VGG16
    26. Visualizing Filters, Part 1
    27. Visualizing Filters, Part 2
    28. Adversaries
    29. Summary
    30. Chapter 18: Autoencoders
    31. Introduction to Encoding
    32. Lossless and Lossy Encoding
    33. Blending Representations
    34. The Simplest Autoencoder
    35. A Better Autoencoder
    36. Exploring the Autoencoder
    37. A Closer Look at the Latent Variables
    38. The Parameter Space
    39. Blending Latent Variables
    40. Predicting from Novel Input
    41. Convolutional Autoencoders
    42. Blending Latent Variables
    43. Predicting from Novel Input
    44. Denoising
    45. Variational Autoencoders
    46. Distribution of Latent Variables
    47. Variational Autoencoder Structure
    48. Exploring the VAE
    49. Working with the MNIST Samples
    50. Working with Two Latent Variables
    51. Producing New Input
    52. Summary
    53. Chapter 19: Recurrent Neural Networks
    54. Working with Language
    55. Common Natural Language Processing Tasks
    56. Transforming Text into Numbers
    57. Fine-Tuning and Downstream Networks
    58. Fully Connected Prediction
    59. Testing Our Network
    60. Why Our Network Failed
    61. Recurrent Neural Networks
    62. Introducing State
    63. Rolling Up Our Diagram
    64. Recurrent Cells in Action
    65. Training a Recurrent Neural Network
    66. Long Short-Term Memory and Gated Recurrent Networks
    67. Using Recurrent Neural Networks
    68. Working with Sunspot Data
    69. Generating Text
    70. Different Architectures
    71. Seq2Seq
    72. Summary
    73. Chapter 20: Attention and Transformers
    74. Embedding
    75. Embedding Words
    76. ELMo
    77. Attention
    78. A Motivating Analogy
    79. Self-Attention
    80. Q/KV Attention
    81. Multi-Head Attention
    82. Layer Icons
    83. Transformers
    84. Skip Connections
    85. Norm-Add
    86. Positional Encoding
    87. Assembling a Transformer
    88. Transformers in Action
    89. BERT and GPT-2
    90. BERT
    91. GPT-2
    92. Generators Discussion
    93. Data Poisoning
    94. Summary
    95. Chapter 21: Reinforcement Learning
    96. Basic Ideas
    97. Learning a New Game
    98. The Structure of Reinforcement Learning
    99. Step 1: The Agent Selects an Action
    100. Step 2: The Environment Responds
    101. Step 3: The Agent Updates Itself
    102. Back to the Big Picture
    103. Understanding Rewards
    104. Flippers
    105. L-Learning
    106. The Basics
    107. The L-Learning Algorithm
    108. Testing Our Algorithm
    109. Handling Unpredictability
    110. Q-Learning
    111. Q-Values and Updates
    112. Q-Learning Policy
    113. Putting It All Together
    114. The Elephant in the Room
    115. Q-Learning in Action
    116. SARSA
    117. The Algorithm
    118. SARSA in Action
    119. Comparing Q-Learning and SARSA
    120. The Big Picture
    121. Summary
    122. Chapter 22: Generative Adversarial Networks
    123. Forging Money
    124. Learning from Experience
    125. Forging with Neural Networks
    126. A Learning Round
    127. Why Adversarial?
    128. Implementing GANs
    129. The Discriminator
    130. The Generator
    131. Training the GAN
    132. GANs in Action
    133. Building a Discriminator and Generator
    134. Training Our Network
    135. Testing Our Network
    136. DCGANs
    137. Challenges
    138. Using Big Samples
    139. Mode Collapse
    140. Training with Generated Data
    141. Summary
    142. Chapter 23: Creative Applications
    143. Deep Dreaming
    144. Stimulating Filters
    145. Running Deep Dreaming
    146. Neural Style Transfer
    147. Representing Style
    148. Representing Content
    149. Style and Content Together
    150. Running Style Transfer
    151. Generating More of This Book
    152. Summary
    153. Final Thoughts
  11. References
    1. Chapter 1
    2. Chapter 2
    3. Chapter 3
    4. Chapter 4
    5. Chapter 5
    6. Chapter 6
    7. Chapter 7
    8. Chapter 8
    9. Chapter 9
    10. Chapter 10
    11. Chapter 11
    12. Chapter 12
    13. Chapter 13
    14. Chapter 14
    15. Chapter 15
    16. Chapter 16
    17. Chapter 17
    18. Chapter 18
    19. Chapter 19
    20. Chapter 20
    21. Chapter 21
    22. Chapter 22
    23. Chapter 23
  12. Image Credits
    1. Chapter 1
    2. Chapter 10
    3. Chapter 16
    4. Chapter 17
    5. Chapter 18
    6. Chapter 23
  13. Index
  14. Part V: Bonus Chapters
    1. Chapter B1: Scikit-Learn
    2. Python Conventions and Libraries
    3. Estimators
    4. Creation
    5. Learning with fit()
    6. Predicting with predict()
    7. Using decision_function() and predict_proba()
    8. Clustering
    9. Transformations
    10. Inverse Transformations
    11. Data Refinement
    12. Ensembles
    13. Automation
    14. Cross-Validation
    15. Hyperparameter Searching
    16. Exhaustive Grid Search
    17. Random Grid Search
    18. Pipelines
    19. Looking at the Decision Boundary
    20. Applying Pipelined Transformations
    21. Datasets
    22. Utilities
    23. Wrapping Up
    24. References
    25. Chapter B2: Keras Part 1
    26. The Structure of This Chapter
    27. Libraries, Programming, and Debugging
    28. Versions and Programming Style
    29. Python Programming and Debugging
    30. Running Externally
    31. A Workaround Note
    32. Overview
    33. Tensors and Arrays
    34. Setting Up Keras
    35. Shapes of Tensors Holding Images
    36. GPUs and Other Accelerators
    37. Getting Started
    38. Hello, World
    39. Preparing the Data
    40. Reshaping
    41. Loading the Data
    42. Looking at the Data
    43. Train-Test Splitting
    44. Fixing the Data Type
    45. Normalizing the Data
    46. Fixing the Labels
    47. Pre-Processing All in One Place
    48. Making the Model
    49. Turning Grids into Lists
    50. Creating the Model
    51. Compiling the Model
    52. Model Creation Summary
    53. Training the Model
    54. Training and Using Our Model
    55. Looking at the Output
    56. Prediction
    57. Analysis of Training History
    58. Saving and Loading
    59. Saving Everything in One File
    60. Saving Just the Weights
    61. Saving Just the Architecture
    62. Using Pre-Trained Models
    63. Saving the Pre-Processing Steps
    64. Callbacks
    65. Checkpoints
    66. Learning Rate
    67. Early Stopping
    68. Wrapping Up
    69. References
    70. Image Credits
    71. Chapter B3: Keras Part 2
    72. Improving the Model
    73. Counting Up Hyperparameters
    74. Changing One Hyperparameter
    75. Other Ways to Improve
    76. Adding Another Dense Layer
    77. Less Is More
    78. Adding Dropout
    79. Observations
    80. Using Scikit-Learn
    81. Keras Wrappers
    82. Cross-Validation
    83. Cross-Validation with Normalization
    84. Hyperparameter Searching
    85. Convolution Networks
    86. Utility Layers
    87. Preparing the Data for a CNN
    88. Convolution Layers
    89. Using Convolution for MNIST
    90. Patterns
    91. Image Data Augmentation
    92. Synthetic Data
    93. Parameter Searching for Convnets
    94. RNNs
    95. Generating Sequence Data
    96. RNN Data Preparation
    97. Building, Compiling, and Running the RNN
    98. Analyzing RNN Performance
    99. A More Complex Dataset
    100. Deep RNNs
    101. The Value of More Data
    102. Returning Sequences
    103. Stateful RNNs
    104. Time-Distributed Layers
    105. Generating Text
    106. The Functional API
    107. Input Layers
    108. Making a Functional Model
    109. Summary
    110. References
    111. Image Credits