Generalization

The fundamental goal of any ML project is to produce a model that performs well on data beyond what is available to you today. You want it to predict, classify, or estimate with minimal error, not so much on the data you already have, but on the data you will encounter once the resulting model is deployed.

The data that you already have to develop your ML model is called the training set. The training set should be representative of what you expect to find in the datasets that you will be applying the model to in the future. Even if you have a very large training set, it is unlikely that future datasets will be precisely the same. This great big, complex world produces a lot of variation.

The ability of an ML model to work as expected on a variety of future examples is called generalization. It is a key concept in ML, deep learning, and most other predictive modeling techniques. You want your ML model to predict accurately on the data you have today and, more importantly, on the data you will have in the future. You want the model to generalize beyond the dataset it was trained on. You should be willing to sacrifice some accuracy on the training set in order to increase the probability of accuracy on future, as yet unknown, datasets. After all, this is the whole point of developing an ML model in the first place.

However, since the only data you have today is the training set, and the underlying function that you are trying to approximate with the model is unknown, the model has no choice but to use the error on the training data itself as a proxy for the error on the datasets of tomorrow as it optimizes itself. This is dark and full of terrors. A model can score very well on the data it was fitted to and still perform poorly on data it has never seen, so training error alone tends to paint a falsely optimistic picture.
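To make the danger concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the synthetic data, the hidden rule in `true_label`, the 20% label noise, and the choice of a one-nearest-neighbor model, which simply memorizes the training set. It is not a method taken from this book. The point is only that the error measured on the training set can look perfect while the error on fresh data drawn from the same process is much worse.

```python
import random


def true_label(x):
    # Hypothetical underlying rule. In a real project this function is unknown.
    return 1 if x > 0.5 else 0


def make_dataset(n, rng, noise=0.2):
    # Draw n examples; flip each label with probability `noise`
    # to mimic the variation found in real-world data.
    data = []
    for _ in range(n):
        x = rng.random()
        y = true_label(x)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def predict_1nn(train, x):
    # One-nearest-neighbor: return the label of the closest training example.
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def error_rate(train, data):
    # Fraction of examples in `data` that the memorizing model gets wrong.
    wrong = sum(1 for x, y in data if predict_1nn(train, x) != y)
    return wrong / len(data)


rng = random.Random(0)
training_set = make_dataset(200, rng)
future_set = make_dataset(500, rng)  # stands in for "the datasets of tomorrow"

train_error = error_rate(training_set, training_set)
future_error = error_rate(training_set, future_set)

print(f"error on training set: {train_error:.3f}")
print(f"error on future set:   {future_error:.3f}")
```

Because each training example is its own nearest neighbor, the model reproduces every training label, noise included, and reports no training error. The future set comes from the same process but was never seen by the model, so its error is noticeably higher. That gap is exactly what the training error hides.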

Thankfully, there are some methods to protect yourself from falsely optimistic ML models. These will be covered later. For now, know that practical ML is based on a foundation of mistrust. This is one of the things that makes it so powerful. If done properly, ML models that pass the rigors of validation go on to very successful applications in the wild and woolly real world.
