Polynomial regression – the ultimate model?

As we saw, we can think of the line as a sub-model of the parabola when $\beta_2$ is equal to zero, and the line is also a sub-model of the cubic model when $\beta_2$ and $\beta_3$ are equal to zero. Of course, the parabola is a sub-model of the cubic one when $\beta_3 = 0$. OK, I will stop here, but I think you have already noticed the pattern. This suggests that we can, in principle, use polynomial regression to fit an arbitrarily complex model; we just build a polynomial of the right order. We could do this by increasing the order one by one until we do not observe any improvement in the fit, or we could build an infinite-order polynomial and then somehow set all the irrelevant coefficients to zero until we get a perfect fit to our data. To test this idea, we can start with a very simple example. Let's use the quadratic model to fit the third group of the Anscombe dataset. I will wait here while you do that... I am still waiting, don't worry...
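If you want to check your result, here is a minimal sketch of the exercise using an ordinary least-squares fit via NumPy's polyfit instead of a full Bayesian model; loading the Anscombe quartet through seaborn's bundled copy is an assumption about your setup, and the Bayesian version would use the same quadratic model we built earlier in the chapter:

import numpy as np
import seaborn as sns

# Anscombe's quartet ships with seaborn; group "III" is a straight line
# plus a single outlier.
ans = sns.load_dataset('anscombe')
g3 = ans[ans['dataset'] == 'III']

# Least-squares fit of an order-2 polynomial (a parabola).
# polyfit returns coefficients from the highest degree down.
b2, b1, b0 = np.polyfit(g3['x'], g3['y'], deg=2)
print(b0, b1, b2)  # b2 should come out small relative to the slope b1

In other words, the quadratic term contributes almost nothing, and the fitted parabola is essentially a line.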

OK, if you really did the exercise, you will have observed for yourself that it is possible to use a quadratic model to fit a line. While it may seem that this simple experiment validates the idea of building an infinite-order polynomial to fit the data, we should curb our enthusiasm. In general, using a high-order polynomial to fit data is not the best idea. Why? Because it does not matter which data we have; in principle, it is always possible to find a polynomial that fits the data perfectly! In fact, it is easy to compute the exact order the polynomial should have: a polynomial of order $N-1$ can pass exactly through $N$ data points (with distinct $x$ values). Why is fitting data perfectly problematic? Well, that's the subject of Chapter 6, Model Comparison, but spoiler alert! A model that fits your current data perfectly will, in general, do a very poor job of fitting/describing unobserved data. The reason for this is that any real dataset will contain noise and sometimes (hopefully) an interesting pattern. An arbitrarily over-complex model will fit the noise, leading to poor predictions. This is known as overfitting, and it is a pervasive phenomenon in statistics and ML. Polynomial regression makes for a convenient straw man when it comes to overfitting because it is easy to see the problem. This builds intuition that we can transfer to more complex models, which can overfit without us really noticing. Part of the job, when analyzing data, is to make sure that models are not overfitting it. We will discuss this topic in detail in Chapter 6, Model Comparison.
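A quick way to see overfitting in action is the following sketch, again using a plain least-squares fit rather than a Bayesian model; the true function, noise level, and seed are arbitrary choices made for illustration:

import numpy as np

rng = np.random.default_rng(123)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=len(x))

# An order N-1 polynomial has as many coefficients as there are points,
# so it can pass through all N of them exactly.
coef = np.polyfit(x, y, deg=len(x) - 1)
print(np.abs(np.polyval(coef, x) - y).max())  # residuals essentially zero

# Between the observed points, though, the curve can swing wildly
# compared with the spread of the data itself.
x_dense = np.linspace(0, 1, 200)
print(np.ptp(np.polyval(coef, x_dense)), np.ptp(y))

The order-9 polynomial reproduces the 10 noisy observations exactly, but it is fitting the noise: evaluated between the data points, it oscillates far more than the data would suggest, which is exactly the poor predictive behavior described above.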
