Predictive accuracy measures

In the previous example, it is relatively easy to see that the order 0 model is too simple and that the order 5 model is too complex for the data, but what about the other two? To answer this question, we need a more principled way of weighing accuracy on one side against simplicity on the other. To do so, we will need to introduce several new concepts. The first two are:

  • Within-sample accuracy: The accuracy measured with the data that's used to fit a model
  • Out-of-sample accuracy: The accuracy of the model that's measured on data that is not used for fitting the model—this is also known as predictive accuracy

For any combination of data and model, the within-sample accuracy will be, on average, larger than the out-of-sample accuracy. Thus, relying on within-sample accuracy can fool us into thinking that we have a better model than we truly have. For this reason, out-of-sample measures are preferred over within-sample measures. However, there is a practical problem with this: we need to be able to afford setting aside a portion of the data, not to fit the model, but to test it, and this is a luxury most analysts do not have. To circumvent this problem, people have spent a lot of effort devising methods to estimate the out-of-sample predictive accuracy using only within-sample data. Two such methods are:
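The gap between within-sample and out-of-sample accuracy can be seen with a small numeric experiment. The following sketch (not code from this book; the data, seed, and mean-squared-error measure are illustrative assumptions) fits polynomials of increasing order to a held-out split of noisy quadratic data; the within-sample error keeps shrinking as the order grows, while the out-of-sample error does not:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: a noisy quadratic relationship
x = np.linspace(0, 1, 60)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.3, size=x.size)

# Hold out ~30% of the points to measure out-of-sample accuracy
train = rng.random(x.size) < 0.7
x_train, y_train = x[train], y[train]
x_test, y_test = x[~train], y[~train]

for order in (0, 1, 2, 5):
    coeffs = np.polyfit(x_train, y_train, order)
    mse_in = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    mse_out = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"order {order}: within-sample MSE {mse_in:.3f}, "
          f"out-of-sample MSE {mse_out:.3f}")
```

Because higher-order polynomial fits nest the lower-order ones, the within-sample error is guaranteed not to increase with the order, which is exactly why it is a misleading guide to predictive accuracy.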

  • Cross-validation: This is an empirical strategy based on dividing the available data into subsets that are alternately used for fitting and for evaluation
  • Information criteria: This is an umbrella term for several relatively simple expressions that can be considered as ways to approximate the results we would have obtained by performing cross-validation
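The cross-validation strategy above can be sketched in a few lines. The following example (a minimal k-fold implementation written for illustration; the data, fold count, and `kfold_mse` helper are assumptions, not this book's code) estimates the out-of-sample error of each polynomial order using only the available data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the same kind of noisy quadratic relationship
x = np.linspace(0, 1, 60)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.3, size=x.size)

def kfold_mse(x, y, order, k=5, rng=rng):
    """Estimate out-of-sample MSE of a polynomial fit via k-fold CV."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        # Each fold takes a turn as the evaluation set;
        # the remaining folds are used for fitting
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], order)
        errors.append(np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2))
    return np.mean(errors)

for order in (0, 1, 2, 5):
    print(f"order {order}: CV-estimated MSE {kfold_mse(x, y, order):.3f}")
```

Unlike the within-sample error, this estimate does not automatically improve with model complexity, because every point is predicted by a model that never saw it. Information criteria aim to approximate this same quantity analytically, without refitting the model k times.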