Umberto Michelucci

Applied Deep Learning with TensorFlow 2

Learn to Implement Advanced Deep Learning Techniques with Python

2nd ed.


Umberto Michelucci

Dübendorf, Switzerland

ISBN 978-1-4842-8019-5
e-ISBN 978-1-4842-8020-1

https://doi.org/10.1007/978-1-4842-8020-1

Apress Standard

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress Media, LLC part of Springer Nature.

The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

To my daughter Caterina and my wife Francesca. You are the reason I do what I do.

Foreword

Without even realizing it, we have been buried by the data we generate every day, in all areas of technology applications, social life, and health. Buried in this huge amount of data that we have been storing for years in the most disparate formats and ways lies knowledge that we yearn for and that we have not yet uncovered.

After an initial phase of caution that lasted a few years, today we all agree that artificial intelligence is a very powerful way to extract this knowledge from data.

For example, in my daily activity as a professor of biomedical engineering at the Politecnico di Torino, I often find myself dealing with topics that have to do with the world of health. I realize how much the clinical world is fascinated by the potential of these new technologies and sometimes looks at them as if they were something mystical. In healthcare, artificial intelligence, machine learning, and deep learning are in the spotlight for their ability to predict disease risk and for their efficiency in automating several steps that make up the investigation phase of biomedical images or signals that support clinical decision-making.

However, the road to making these technologies a definitive support in clinical practice is still long and tortuous and includes a better understanding of the functioning of biological systems, much of which has yet to be clarified despite the countless discoveries in medicine, biology, biochemistry, and biophysics. Despite the incredible impact of these methods in everyday life, most users of AI-driven technologies have no idea how these technologies work, and this is still true even for many scientific disciplines.

Therefore, there is a great need to educate society about AI technologies at different levels of depth in order to make sure that the professionals of the future, at least those involved in scientific disciplines, can actively use these methods. In other words, machine learning techniques, or deep learning, should not be considered a solution to a specific problem, but a tool or set of tools to achieve a solution to a specific problem.

In this context, this book proposes an approach that helps readers interface with complex methodologies by providing original application examples that are gradually more complex and resemble real problems while maintaining a scholastic character. Dr. Michelucci puts in this book all his skills as an exceptional trainer, combining his ability to explain very complex concepts in a clear and understandable way, while maintaining a good degree of mathematical formalism, and his ability to stimulate critical thinking through the development of practical problems. This book also provides a quick guide to using working environments such as Jupyter Notebooks to create and share documents that contain equations, code, and text.

A journey that starts with a single neuron teaches readers how to build neural networks, how to train, test, and validate them with appropriate metrics, how to tune hyperparameters, and much more.

All these topics are covered by providing code examples that help the readers make concepts and methods their own so that they can customize them to specific problems.

Therefore, this book is a valuable aid for those who want not only to learn about deep learning but also to make it part of their methodological background.

I believe that Dr. Michelucci's work will be useful to engineers, physicists, and mathematicians who are interested in making the concepts and methods related to deep neural networks their own. The collaboration I undertook with Umberto Michelucci for several years was fundamental for my research group's professional growth, and I am sure that this book will be of support to many other passionate scientists.

Sincerely,

Marco A. Deriu, PhD

Introduction

This is the second edition of Applied Deep Learning and it has been updated for TensorFlow 2.X and expanded to cover additional advanced material, such as autoencoders and generative adversarial networks (GANs). The goal of this book is to teach you the necessary fundamentals of how neural networks work, how to train them, and how to implement them with Keras. We start by discussing what a neuron is and what you can achieve with just one, then move to multiple layers in feed-forward neural networks. You learn what regularization is and how to use it, how advanced optimizers (such as Adam) work, and how to do hyperparameter tuning. At the end of the book, we look at some advanced topics, such as autoencoders, metric analysis, and GANs.

If you are new to this subject, I suggest you read the chapters in order, but if you already have some experience and you want to learn about a specific topic, you can jump directly to the relevant chapter. The chapters are mostly self-contained, although each refers to concepts explained in previous chapters, so if you don’t know what a specific symbol or concept means, you can refer to the previous chapters. I worked hard to keep the mathematical notation and programming style as consistent as possible to make following the book easier. I only discuss very short code snippets (the ones I consider relevant), so you will not find complete code to copy and use, but don’t worry. This book has an online site where you can find lots of Jupyter Notebooks that will be updated regularly with new examples and topics. You can find them at https://adl.toelt.ai.

Anytime you want to see the complete code in action, go to that site and you will find complete examples that you can download or open in Google Colab to try. TensorFlow is updated often, so providing complete code examples in the book would make the book age very quickly! My suggestion is to study the concepts here in the book, and then go to the online site and try the complete code to see how what you learned works in practice.

At the end of each chapter there are exercises meant to make you think about what you learned and to give you interesting insights.

Who This Book Is For

To benefit from this book, you should have intermediate Python programming experience. It’s helpful if you understand how the NumPy library works, since it is used extensively with TensorFlow. You should also have a basic understanding of algebra and calculus. You should understand at least the following concepts:

What is a matrix.
How to do basic operations on matrices, such as multiplying them, inverting them, and so on.
What is a derivative (and what is a partial derivative).
How to calculate easy derivatives.
What a function is and what it means to minimize one.

If you understand those concepts, you should be able to follow the explanations in the book. I always give many practical hints in the book to make clear what the implications of the theoretical concepts are in practice. I hope this will help you with your real-life projects.

Do You Need to Know TensorFlow/Keras?

This is a tricky question. The more you know, the more you will be able to benefit from the book. The main goal of this book is not to teach you Keras, but to teach you how neural networks work and give you implementation examples in Keras. Let me stress it again: The focus is on understanding how neural networks work, not on how Keras works. This is not a book on Keras. The best way to learn all the particularities of Keras is to look at the official documentation (https://www.tensorflow.org/learn). It is always up-to-date and contains many examples. This book covers the necessary skills you need to understand basic examples, but if you want to understand all the subtleties, you should study the official documentation.

Note

You will probably be able to understand most of the concepts even without knowing how Keras works, but the more experience you have with Keras, the easier it will be for you to follow the explanations.

Which Version of TensorFlow Is Used in this Book?

The code developed in this book has been tested on TensorFlow 2.5. I try to use only the fundamental Keras features to make it as compatible as possible with older and future versions. If you are using a different TensorFlow version, you may find that some of the code will not work. If you are running the code from https://adl.toelt.ai locally and you encounter this problem, I suggest you create a virtual environment¹ with TensorFlow 2.5. Versions of other packages, such as NumPy or Pandas, should not matter much. Any relatively modern (let’s say from 2020 or 2021) version should work just fine.
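As a small illustration of the version advice above, the following sketch checks whether the TensorFlow installed in the current environment matches the 2.5 release the book was tested on. The helper names are hypothetical, and the lookup assumes the package is installed under the distribution name `tensorflow` (some platforms use other names, such as `tensorflow-cpu`), so treat it as a starting point rather than a definitive check.

```python
from importlib import metadata

# The book states that its code was tested on TensorFlow 2.5.
TESTED_VERSION = (2, 5)


def major_minor(version):
    """Return (major, minor) from a version string such as '2.5.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def matches_tested_version(version, tested=TESTED_VERSION):
    """True if the major and minor numbers equal the tested release."""
    return major_minor(version) == tested


def installed_tensorflow_version():
    """Return the installed version string, or None if not installed.

    Assumption: the distribution is named 'tensorflow'.
    """
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


installed = installed_tensorflow_version()
if installed is None:
    print("TensorFlow is not installed in this environment.")
elif matches_tested_version(installed):
    print(f"TensorFlow {installed} matches the tested 2.5 release.")
else:
    print(f"TensorFlow {installed} differs from 2.5; some code may not work.")
```

If the check reports a different version and some of the code fails, a virtual environment with TensorFlow 2.5, as suggested above, is the safer route.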

How to Try the Code in the Book

There are several ways to try the code discussed in this book. I worked very hard to make sure that you can run all the examples in the book in Google Colab (https://colab.research.google.com/), so that you don’t have to install anything on your personal laptop or PC. If you go to https://adl.toelt.ai, you can open all the examples directly in Google Colab. If you are on a page at https://adl.toelt.ai, simply hover the mouse over the small rocket icon on the top-right side of the examples (see Figure I-1). You have several options to open the notebook in an environment to try it.

You can simply choose Google Colab from the drop-down list. The notebook will open in a browser in Google Colab so that you can test the code directly. Additionally, by clicking the icon with the arrow pointing down shown on the right of Figure I-1, you can also download the code to your laptop and run it locally².

Note

You can run all the examples discussed in this book in Google Colab. You can find a direct link to open the notebooks online by going to https://adl.toelt.ai and hovering the mouse over the small rocket icon on the top-right side of the page.

If you don’t know how Google Colab works, I suggest you watch the very short introductory video at https://www.youtube.com/watch?v=inN8seMm7UI. Basically, Google Colab is an online Jupyter Notebook with the Python engine running on Google servers. If you have worked with Jupyter Notebooks before, you should be fine. If not, I suggest you go to the official project page at https://jupyter.org and study the many available tutorials. The Jupyter Notebook environment is widely used to do data science and is something that every practitioner should know about.

Contents of the Book

The first chapter discusses the problem of optimization in general and how it relates to neural networks. We look at how the most important minimization algorithm, gradient descent, works and how its mini-batch and stochastic variations function.
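To give a flavor of what these variations look like, here is a minimal sketch in plain Python (not the book's code). It fits a line y = w·x + b by minimizing the mean squared error, and the `batch_size` argument switches between the variants: the full dataset size gives batch gradient descent, 1 gives stochastic gradient descent, and anything in between gives mini-batch gradient descent. The function name, the toy data, and the default hyperparameters are assumptions chosen for illustration.

```python
import random


def minibatch_gd(xs, ys, batch_size, lr=0.1, epochs=300, seed=0):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Reshuffle so that each epoch sees different mini-batches.
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Gradients of the mean squared error over the current batch.
            grad_w = 2.0 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2.0 * sum(errors) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy, noise-free data generated from y = 2x + 1.
xs = [i / 50 for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = minibatch_gd(xs, ys, batch_size=10)
print(f"mini-batch estimate: w={w:.3f}, b={b:.3f}")
```

Smaller batches update the weights more often per epoch but with noisier gradient estimates, which is the trade-off behind choosing a mini-batch size.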

Chapter 2 looks at how one neuron is structured and at the most commonly used activation functions. Then it covers how to implement a neural network with one single neuron in Keras and how to do linear regression and logistic regression (classification) with it. We discuss the three fundamental components of any neural network model: the network architecture, the loss function, and the optimizer.

Chapter 3 moves to neural networks with multiple layers and many neurons. We discuss the concept of overfitting and how to do basic error analysis. Then we look at how to implement neural networks with multiple layers with Keras. Additionally, we look at weight initialization and discuss the various ways of doing it. Finally, we discuss how to estimate how much memory a neural network model implemented with Keras will need.

In Chapter 4, the concept of regularization is discussed. We look at the ℓ_p norm and at ℓ₂ and ℓ₁ regularization. Then we discuss how dropout works and how to implement it in Keras. Finally, we look at how early stopping works.

In Chapter 5, the Adam, momentum, and RMSProp optimizers are discussed, starting with what exponentially weighted averages are, a concept necessary for understanding advanced optimizers.
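As a quick preview of that concept, the sketch below computes an exponentially weighted average using one common definition, v_t = β·v_{t−1} + (1 − β)·x_t with v_0 = 0, plus the optional bias correction v_t / (1 − β^t). The function name and the notation are assumptions made for this example and may differ from what Chapter 5 uses.

```python
def exponentially_weighted_average(values, beta=0.9, bias_correction=False):
    """Return the running exponentially weighted average of a sequence.

    Uses v_t = beta * v_{t-1} + (1 - beta) * x_t, starting from v_0 = 0.
    With bias_correction=True each v_t is divided by (1 - beta**t), which
    compensates for the zero initialization in the first steps.
    """
    averaged = []
    v = 0.0
    for t, x in enumerate(values, start=1):
        v = beta * v + (1.0 - beta) * x
        averaged.append(v / (1.0 - beta ** t) if bias_correction else v)
    return averaged


signal = [1.0, 1.0, 1.0]
raw = exponentially_weighted_average(signal, beta=0.5)
corrected = exponentially_weighted_average(signal, beta=0.5, bias_correction=True)
print(raw)
print(corrected)
```

On a constant signal, the uncorrected average starts low and climbs toward the true value, while the corrected one matches it from the first step. Momentum, RMSProp, and Adam all maintain running averages of this kind over gradients or squared gradients.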

In Chapter 6, we discuss hyper-parameter tuning. We discuss grid and random search and coarse-to-fine optimization. Then Bayesian optimization is explained and discussed at length. Finally, we discuss sampling on a logarithmic scale. In Chapter 7, we discuss convolutional neural networks and how to implement them in Keras.
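The idea of sampling on a logarithmic scale, mentioned above for Chapter 6, can be sketched in a few lines. When a hyperparameter such as the learning rate spans several orders of magnitude, sampling its exponent uniformly gives each decade equal weight, whereas sampling the value itself uniformly would place almost all points in the top decade. The function name and the range used here are assumptions for illustration, not taken from the book.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10.0 ** exponent


rng = random.Random(42)
# Hypothetical search range for a learning rate: 1e-4 to 1e-1 (three decades).
samples = [sample_log_uniform(1e-4, 1e-1, rng) for _ in range(1000)]

# Each decade should receive roughly one third of the samples.
fraction_lowest_decade = sum(s < 1e-3 for s in samples) / len(samples)
print(f"fraction of samples below 1e-3: {fraction_lowest_decade:.2f}")
```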

Chapter 8 is a very short chapter with a very basic introduction to recurrent neural networks. In Chapter 9, we discuss autoencoders and their applications. Chapter 10 contains a discussion of metric and error analysis.

Chapter 11 is a brief introduction to generative adversarial networks. Appendixes A and B give a brief introduction to Keras and discuss how to customize it.

Final Words

I hope that this book gives you a clear curriculum to follow in order to study neural networks in the most structured and easy way. The topics are not easy and require effort and time, but you should not be discouraged. Unfortunately, real machine learning projects involve much more than simply copying and pasting from blogs on the Internet. Programming is only a part of it, and without knowing how the algorithms work, writing the code will be useless, and in the worst case will give you the wrong results.

I hope that you will find this book useful and that you will profit from it for your career and research projects.

Dübendorf, 1st January 2022

Acknowledgments

This book would not have been possible without the help of many people who read drafts and gave me feedback. Prof. Marco Deriu helped greatly with many projects, ideas, and discussions. Dr. Piga read drafts and gave me feedback and ideas about how to make the chapters better. In particular, I am deeply indebted to Michela Sperti. She worked without pause and updated almost all of the book’s code to TensorFlow 2. Not only that, but she also read all the chapters and gave me important feedback that made the book much better. Without her, the book would not be as good as it is. Of course, all the mistakes that are in the book are completely my fault.

I am also incredibly grateful to Aditee Mirashi, an untiring editor, Jojo John Moolayil, a wonderful technical editor, and Celestin John Suresh, the most wonderful acquisition editor one may want. Many thanks to a wonderful Apress editing team.

But more importantly, I am infinitely indebted to my daughter Caterina and my wife Francesca, who supported me during the entire process and had infinite patience with me while I was writing and updating this book. You are the reason I do what I do.

A last big thank you goes to all the readers of the first edition. I thank you for your trust and interest in what I wrote. You are my main motivation for updating the book to this second edition.

Table of Contents

Chapter 1: Optimization and Neural Networks

A Basic Understanding of Neural Networks

The Problem of Learning

A First Definition of Learning

A Definition of Learning for Neural Networks

Constrained vs. Unconstrained Optimization

Absolute and Local Minima of a Function

Optimization Algorithms

Choosing the Right Learning Rate

Variations of GD

How to Choose the Right Mini-Batch Size

[Advanced Section] SGD and Fractals

Exercises

Conclusion

Chapter 3: Feed-Forward Neural Networks

A Short Review of Network’s Architecture and Matrix Notation

Output of Neurons

A Short Summary of Matrix Dimensions

Hyper-Parameters in Fully Connected Networks

A Short Review of the Softmax Activation Function for Multiclass Classifications

A Brief Digression: Overfitting

A Practical Example of Overfitting

Basic Error Analysis

Implementing a Feed-Forward Neural Network in Keras

Multiclass Classification with Feed-Forward Neural Networks

The Zalando Dataset for the Real-World Example

Modifying Labels for the Softmax Function: One-Hot Encoding

The Feed-Forward Network Model

Gradient Descent Variations Performances

Examples of Wrong Predictions

Weight Initialization

Adding Many Layers Efficiently

Comparing Different Networks

Estimating the Memory Requirements of Models

General Formula for the Memory Footprint

Exercises

References

Chapter 4: Regularization

Complex Networks and Overfitting

What Is Regularization

About Network Complexity

ℓ_p Norm

ℓ₂ Regularization

ℓ₁ Regularization

Are the Weights Really Going to Zero?

Dropout

Early Stopping

Additional Methods

Exercises

References

Chapter 5: Advanced Optimizers

Available Optimizers in Keras in TensorFlow 2.5

Advanced Optimizers

Exponentially Weighted Averages

Momentum

RMSProp

Adam

Comparison of the Optimizers’ Performance

Small Coding Digression

Which Optimizer Should You Use?

Chapter 6: Hyper-Parameter Tuning

Black-Box Optimization

Notes on Black-Box Functions

The Problem of Hyper-Parameter Tuning

Sample Black-Box Problem

Grid Search

Random Search

Coarse to Fine Optimization

Bayesian Optimization

Sampling on a Logarithmic Scale

Hyper-Parameter Tuning with the Zalando Dataset

A Quick Note about the Radial Basis Function

Exercises

References

Chapter 7: Convolutional Neural Networks

Kernels and Filters

Convolution

Pooling

Padding

Building Blocks of a CNN

Convolutional Layers

Pooling Layers

Stacking Layers Together

An Example of a CNN

Conclusion

Exercises

References

Chapter 8: A Brief Introduction to Recurrent Neural Networks

Introduction to RNNs

Notation

The Basic Idea of RNNs

Why the Name Recurrent

Learning to Count

Conclusion

Further Readings

Chapter 10: Metric Analysis

Human-Level Performance and Bayes Error

A Short Story About Human-Level Performance

Human-Level Performance on MNIST

Bias

Metric Analysis Diagram

Training Set Overfitting

Test Set

How to Split Your Dataset

Unbalanced Class Distribution: What Can Happen

Datasets with Different Distributions

k-fold Cross Validation

Manual Metric Analysis: An Example

Exercises

References

Chapter 11: Generative Adversarial Networks (GANs)

Introduction to GANs

Training Algorithm for GANs

A Practical Example with Keras and MNIST

Conditional GANs

Conclusion

Appendix A: Introduction to Keras

Some History

Understanding the Sequential Model

Understanding Keras Layers

Setting the Activation Function

Using Functional APIs

Specifying Loss Functions and Metrics

Putting It All Together and Training

Modeling evaluate() and predict()

Using Callback Functions

Saving and Loading Models

Saving Your Weights Manually

Saving the Entire Model

Conclusion

Appendix B: Customizing Keras

Customizing Callback Classes

Example of a Custom Callback Class

Custom Training Loops

Calculating Gradients

Custom Training Loop for a Neural Network

Index

About the Author

Umberto Michelucci is the founder and the chief AI scientist of TOELT – Advanced AI LAB LLC, a company aiming to develop new and modern teaching, coaching, and research methods for AI, to make AI technologies and research accessible to every company and everyone. He’s an expert in numerical simulation, statistics, data science, and machine learning. In addition to several years of research experience at the George Washington University (USA) and the University of Augsburg (DE), he has 15 years of practical experience in the fields of data warehouse, data science, and machine learning. His first book, Applied Deep Learning: A Case-Based Approach to Understanding Deep Neural Networks, was published by Springer in 2018. He published a second book, Convolutional and Recurrent Neural Networks Theory and Applications, in 2019. He’s very active in artificial intelligence research. He publishes his research results regularly in leading journals and gives regular talks at international conferences. He also gives regular lectures on machine learning and statistics at various international universities. Umberto studied physics and mathematics. He holds a PhD in machine learning and physics, and he is also a Google Developer Expert in Machine Learning based in Switzerland.

About the Contributing Author

Michela Sperti is responsible for most of the code upgrades from TensorFlow 1 to TensorFlow 2 in this book. She is a second-year PhD student at Politecnico di Torino, Bioengineering department, with Prof. M. A. Deriu. She graduated in Biomedical Engineering at Politecnico di Torino in 2019 with a thesis on machine learning techniques for cardiovascular risk prediction in rheumatic patients. She worked for one year as a research assistant under the European-funded MSCA VIRTUOUS project (which aims to apply machine learning techniques to investigate taste and food properties). Currently, she is studying explainability techniques for machine learning and deep learning models applied in various fields (from cardiovascular risk to food organoleptic properties prediction), with the final aim of understanding complex mechanisms that underlie physiological processes. She is very passionate about teaching and is committed to communicating her results. She is the author of eight articles published in peer-reviewed journals and took part in three international workshops as both a teaching assistant and a speaker.

About the Technical Reviewer

Jojo Moolayil is an artificial intelligence professional and a published author of three books on machine learning, deep learning, and IoT. He is currently working with Amazon Web Services as a research scientist – A.I. in their Vancouver, BC office.

He was born and raised in Pune, India and graduated from the University of Pune with a major in Information Technology Engineering. His passion for problem-solving and data-driven decision-making led him to start a career with Mu Sigma Inc., the world’s largest pure-play analytics provider. There, he was responsible for developing machine learning and decision science solutions to large, complex problems for healthcare and telecom giants. He later worked with Flutura (an IoT Analytics startup) and General Electric with a focus on industrial A.I. in Bangalore, India.

In his current role with AWS, he works on researching and developing large-scale A.I. solutions for combating fraud and enriching the customer’s payment experience in the cloud. He is also actively involved as a technical reviewer and AI consultant with leading publishers and has reviewed over a dozen books on machine learning, deep learning, and business analytics.

You can reach out to Jojo at
