Chapter 4. Nonlinear Methods

In the previous chapter, we looked at using R for the estimation of linear models, the groundwork upon which most data analysis is built. However, what is one supposed to do if the relationship between two variables is not linear? For this, we must use nonlinear methods, a topic to which this chapter is devoted.

We will start with extensions to linear regression and then move on to nonparametric regression. In this chapter, we'll cover the following topics:

  • Polynomial regression
  • Spline regression
  • General regression framework
  • Point-wise regression
  • Kernel density regression
  • Local polynomial regression

Nonparametric and parametric models

The exact role of nonlinear methods in statistics is perhaps still a bit contentious. They are largely used for the purposes of exploratory analyses as visual tools, and for this reason data visualization and nonlinear statistical methods are closely tied together. The question is whether or not nonlinear methods can also be used to truly develop statistical models.

Broadly speaking, nonlinear models can be grouped into nonparametric (or semi-parametric) and parametric models. The term "parametric" here has a very different meaning than it does in statistical hypothesis tests for differences between groups. Parametric tests of statistically significant differences are concerned with a parameterization of the sample distribution. For example, the t-test does not actually test for differences in the observed data but for differences in distributions whose parameters were computed from the observed data. That is to say, we only need to know the relevant distribution parameters for a given sample to perform a t-test; we don't actually need the original sample values if we know these parameters.
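To make this point concrete, here is a minimal sketch showing that the two-sample (Welch) t statistic can be computed from summary parameters alone, without the raw observations. The simulated data is purely illustrative:

```r
# A t-test needs only summary parameters, not the raw sample values.
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)
y <- rnorm(30, mean = 11, sd = 2)

# Welch t statistic computed from the summary parameters alone
m1 <- mean(x); m2 <- mean(y)
v1 <- var(x);  v2 <- var(y)
n1 <- length(x); n2 <- length(y)
t_from_params <- (m1 - m2) / sqrt(v1 / n1 + v2 / n2)

# The same statistic from the raw data via t.test()
t_from_data <- unname(t.test(x, y)$statistic)

all.equal(t_from_params, t_from_data)  # TRUE
```

Once the means, variances, and sample sizes are in hand, the original observations contribute nothing further to the test statistic.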

The term "parametric", when applied to regression, is concerned not with the distribution of the sample being studied but with the statistical model being developed. A parametric regression model is so called because it can be described completely by a few model parameters. For example, in linear regression, once we know that the model is linear, we need only two parameters, the intercept and the slope, to recreate the entire line. Alternatively, if we regress a curve of unknown algebraic form onto a cloud of data points, then we need to know the x and y values of every point on that curve to recreate it, making it a nonparametric model. Note that a nonparametric regression model can still make distributional assumptions about the sample.
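A short sketch of this idea in R, using the built-in cars dataset (chosen here purely for illustration): the fitted line of a linear model can be reconstructed from its two coefficients alone.

```r
# A parametric model is fully described by its parameters: for a straight
# line, the intercept and the slope are enough to recreate every fitted value.
fit <- lm(dist ~ speed, data = cars)
coef(fit)  # just two numbers: the intercept and the slope

# Recreate the fitted line from the two parameters alone
b <- coef(fit)
recreated <- b[1] + b[2] * cars$speed
all.equal(unname(recreated), unname(fitted(fit)))  # TRUE
```

Storing two numbers reproduces the whole model, which is exactly what makes it parametric.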

There are a number of advantages that parametric regression models have over their nonparametric counterparts. They can be easily interpreted and used to advance a general theory, which is not always possible with nonparametric regression models. The proportion of variance explained gives a convenient summary of how well a parametric model explains the data, whereas a similar statistic is not so readily available for nonparametric models. Finally, parametric models work well for smaller datasets or datasets with a low signal-to-noise ratio, since the strong assumption about the algebraic form of the model tempers the sensitivity of the model parameters to any small number of observations. The disadvantage of parametric models is that they require an assumption about the algebraic form of the relationship between two or more variables.

It is also worth noting that nonlinear parametric regression models, such as polynomial regression, can be thought of as linear regression onto nonlinear transformations of the predictor variables, and as such they can carry all of the same assumptions as linear regression. For this reason, the term "nonlinear regression" is sometimes used strictly to refer to nonparametric regression models, though in this book we will use nonlinear regression to refer to both parametric and nonparametric models.
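The claim that polynomial regression is linear regression on transformed predictors can be sketched directly in R, again using the built-in cars dataset for illustration:

```r
# A quadratic polynomial model is linear in its coefficients: regressing on
# speed and speed^2 with lm() is ordinary linear regression on transformed
# predictors.
fit_poly   <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)
fit_manual <- lm(dist ~ speed + I(speed^2), data = cars)

# Both parameterizations give identical fitted values
all.equal(unname(fitted(fit_poly)), unname(fitted(fit_manual)))  # TRUE
```

This is why such models remain "parametric": three coefficients fully describe the fitted curve, and the machinery of linear regression applies unchanged.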

Before getting into either the theory or the practical execution of these methods in R, it is important to first place these methods in context and address the question: What are such statistical methods useful for? There are two potential answers to this question, which have practical implications for how one might use R to execute these methods:

  • Nonparametric statistical models are simply exploratory tools and can be used to get an overall sense of how variables relate to one another (that is, they are not really models).
  • Nonparametric statistical models are powerful generalizations of linear models that allow us to model the relationships between two or more variables with high fidelity and without being bound to the equation for a line.