Prefetching

Another way we can make an efficient data pipeline is by always having a batch of data ready to send to the GPU. Ideally, when training our model, we would like our GPU usage to be at 100% all the time. This way, we make maximum use of our expensive piece of hardware, keeping it busy computing forward and backward passes throughout training.

For this to happen though, we need our CPUs to load and prepare a batch of images, ready to pass to the GPU, during the time it takes to do a forward and backward pass of the model. Luckily, we can do this easily using a simple prefetch transformation after we collect our batch, as follows:

train_dataset = train_dataset.batch(128).prefetch(1)

Using prefetch ensures that our data pipeline prepares a whole batch of data while training is happening, ready to be loaded onto the GPU for the next iteration. This way, our pipeline is never slowed down waiting for a batch to be collected, and if fetching a batch takes less time than a forward and backward pass of the model, then our pipeline will be as efficient as it can be.

To be clear, prefetch(1) means we prefetch one element of the dataset. Because batching is the last step in the pipeline, that one element is a whole batch of data, which is why this ordering is most effective.
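To make the overlap concrete, here is a toy sketch of the idea behind prefetching, written in plain Python rather than TensorFlow: a background "CPU" thread prepares the next batch into a bounded buffer while the main "GPU" loop trains on the current one. The timings, batch counts, and function names are illustrative assumptions, not part of the tf.data API.

```python
# Toy illustration of prefetching (NOT the tf.data implementation):
# a producer thread overlaps batch preparation with the training loop.
import queue
import threading
import time

NUM_BATCHES = 4
LOAD_TIME = 0.05   # simulated time for the CPU to prepare one batch
TRAIN_TIME = 0.05  # simulated time for one forward/backward pass

def run_sequential():
    # No prefetching: load a batch, then train on it, one after the other.
    start = time.perf_counter()
    for _ in range(NUM_BATCHES):
        time.sleep(LOAD_TIME)   # load/augment batch
        time.sleep(TRAIN_TIME)  # train on batch
    return time.perf_counter() - start

def run_prefetched(buffer_size=1):
    # With prefetching: a background thread fills a bounded buffer,
    # so batch N+1 is prepared while the model trains on batch N.
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        for i in range(NUM_BATCHES):
            time.sleep(LOAD_TIME)  # prepare the next batch
            buf.put(i)             # blocks if the buffer is full
        buf.put(None)              # sentinel: no more batches

    start = time.perf_counter()
    t = threading.Thread(target=producer)
    t.start()
    while True:
        batch = buf.get()          # waits only if no batch is ready yet
        if batch is None:
            break
        time.sleep(TRAIN_TIME)     # train on the current batch
    t.join()
    return time.perf_counter() - start

sequential = run_sequential()
overlapped = run_prefetched(buffer_size=1)
print(f"sequential: {sequential:.2f}s, with prefetch: {overlapped:.2f}s")
```

Because loading and training overlap, the prefetched run takes roughly the training time plus a single load, instead of paying both costs for every batch, which is exactly the saving prefetch gives us in a real tf.data pipeline.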
