More data is always beneficial

In the paper Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, Google researchers constructed an internal dataset of 300 million observations, far larger than ImageNet. They then trained several state-of-the-art architectures on this dataset, increasing the amount of data shown to the model from 10 million to 30 million, 100 million, and finally 300 million observations. In doing so, they showed that model performance increased linearly with the logarithm of the number of training observations, telling us that more data always helps in the source domain.

But what about the target domain? We repeated the Google experiment using a few datasets that resemble the type we might use during transfer learning, including the Dogs versus Cats dataset that we will use later in this chapter. We found that in the target domain, model performance again increased linearly with the logarithm of the number of training observations, just as it did in the source domain. More data always helps.
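The measurement protocol behind both experiments is the same: train on progressively larger subsets of the data, score each model on a fixed held-out set, and look at how the score tracks the logarithm of the training-set size. Below is a minimal sketch of that protocol, assuming a synthetic dataset and a simple scikit-learn classifier as stand-ins for the image data and convolutional networks discussed in this chapter; the subset sizes are illustrative, not the ones used in the Google study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the real dataset; in practice you would
# subsample Dogs versus Cats (or your own target-domain data) instead.
X, y = make_classification(n_samples=200_000, n_features=50,
                           n_informative=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Train on progressively larger subsets and record held-out accuracy.
sizes = [1_000, 3_000, 10_000, 30_000, 100_000]
for n in sizes:
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"n={n:>7}  log10(n)={np.log10(n):.2f}  accuracy={acc:.4f}")
```

Plotting accuracy against log10(n) from a run like this is what reveals the roughly linear relationship described above.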
