Training at Scale

So far in this book, the datasets we have used or looked at have ranged in size from tens of thousands of samples (MNIST) to just over a million (ImageNet). Although all of these datasets were considered huge when they were first released, and required state-of-the-art machines to work with, the rapid advance of technologies such as GPUs and cloud computing has made them quick and easy to train on, even for people with relatively low-powered machines.

However, some of the amazing power of deep neural networks comes from their ability to scale with the amount of data fed to them. In simple terms, this means that the more good, clean data you can use to train your model, the better the result is going to be. Researchers are aware of this, and we can see that the number of training samples in new public datasets has continued to increase.

As a result, if you start working on problems in industry, or even just on the latest Kaggle competition, you are likely to be dealing with datasets that contain many millions of elements. Handling datasets that large, and training your models on them efficiently, then becomes a real problem. The difference can mean waiting three days instead of a month for your model to finish training, so it isn't something you want to get wrong.

In this chapter, you will learn about some of the ways in which we can deal with the following problems:

  • Having a dataset too big to fit into memory (a sketch of one approach follows this list)
  • Scaling your training across multiple machines
  • Working with data too complex to be organized into the usual structure of folders and subfolders
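As a quick taste of the first problem, here is a minimal sketch (assuming PyTorch, and a hypothetical train.csv whose rows are comma-separated features followed by an integer label) of streaming a dataset from disk with torch.utils.data.IterableDataset instead of loading it all into memory at once:

    import torch
    from torch.utils.data import DataLoader, IterableDataset


    class StreamingCSVDataset(IterableDataset):
        """Yields samples lazily, one line at a time, so the full
        dataset never has to be loaded into memory."""

        def __init__(self, csv_path):
            self.csv_path = csv_path

        def __iter__(self):
            with open(self.csv_path) as f:
                for line in f:
                    *features, label = line.strip().split(",")
                    yield (
                        torch.tensor([float(x) for x in features]),
                        torch.tensor(int(label)),
                    )


    # Batches are assembled on the fly as lines are read from disk.
    loader = DataLoader(StreamingCSVDataset("train.csv"), batch_size=64)

Because the DataLoader pulls samples from the iterator as it needs them, memory use stays roughly constant no matter how large the file on disk is. We'll look at this and the other problems in more detail throughout the chapter.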
