Visually exploring nonlinear relationships

Let's say that we want to understand the relationship between height, weight, age, and gender. We probably have some preconceived notions about how these variables relate to one another (for example, taller people probably tend to weigh more). Let's perform the following steps:

  1. Firstly, load the data and attach the data frame using the following code:
    body.measures <- read.csv('nhanes_body.txt')
    attach(body.measures)
  2. Next, plot the data. Looking at the data visually is an important first step in most analyses:
    plot(age, height, xlab = 'Age', ylab = 'Height', main = 'Height vs Age')
    [Figure: scatter plot of Height vs Age]
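`attach()` adds the data frame to the search path, where its columns can mask, or be masked by, other objects with the same names. For that reason some analysts prefer `with()`. The sketch below runs the same two steps that way. The NHANES file is not reproduced here, so the sketch first writes a small synthetic stand-in (simulated values, not NHANES data) to a temporary file. With the real file, you would call `read.csv('nhanes_body.txt')` directly.

```r
# Synthetic stand-in for nhanes_body.txt (simulated values, NOT real NHANES data)
set.seed(1)
n <- 500
age <- runif(n, 2, 80)
# Hypothetical growth pattern: steady increase in childhood, then a plateau
height <- ifelse(age < 18, 85 + 5 * age, 175) + rnorm(n, sd = 7)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(age = age, height = height), tmp, row.names = FALSE)
rm(age, height)  # drop the loose vectors so only the data frame is used below

# Step 1: load the data (with the real file: read.csv('nhanes_body.txt'))
body.measures <- read.csv(tmp)

# Step 2: plot height against age without attach(); with() evaluates
# the expression using the data frame's columns
with(body.measures,
     plot(age, height, xlab = 'Age', ylab = 'Height', main = 'Height vs Age'))
```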

This dataset has thousands of data points, so a simple plot with default graphics options looks messy. Even so, the plot reveals several features of the relationship between age and height. The most obvious is that the relationship is not linear, although it does have segments that might be modeled successfully as linear. As expected, people who are relatively young are shorter than those who have reached adulthood. There also appears to be a slight trend toward decreasing height as adults get older.
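One way to check the claim that parts of the curve look linear is to fit a separate straight line to each segment. The sketch below does this with `lm()` on simulated data. The cut point of 18 years and the generated values are assumptions chosen for illustration, not estimates from the NHANES sample.

```r
# Simulated data (NOT NHANES): growth in childhood, slight decline in adulthood
set.seed(2)
n <- 1000
age <- runif(n, 2, 80)
height <- ifelse(age < 18, 85 + 5 * age, 175 - 0.05 * (age - 18)) + rnorm(n, sd = 7)
d <- data.frame(age, height)

cut.age <- 18  # hypothetical breakpoint between the "growing" and "adult" segments

# Fit one straight line per segment
fit.young <- lm(height ~ age, data = subset(d, age < cut.age))
fit.adult <- lm(height ~ age, data = subset(d, age >= cut.age))

coef(fit.young)  # slope should be close to the simulated value of 5
coef(fit.adult)  # slope should be near zero, slightly negative

# Overlay both segment fits on the scatter plot
plot(d$age, d$height, col = 'gray', pch = 16,
     xlab = 'Age', ylab = 'Height', main = 'Height vs Age (segment fits)')
xs.young <- c(2, cut.age)
lines(xs.young, predict(fit.young, data.frame(age = xs.young)), lwd = 2)
xs.adult <- c(cut.age, 80)
lines(xs.adult, predict(fit.adult, data.frame(age = xs.adult)), lwd = 2)
```

If the segment lines track the points well, a piecewise description is reasonable. Where they clearly miss, a more flexible model is needed.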

Tip

Are there data visualization tools in R for examining high density data?

Several R packages, including hexbin and ggplot2, can visualize high-density data. These packages are also helpful for creating publication-quality plots.

The following three packages display high-density data on two-dimensional plots, shading plot areas to represent the number of observations in each region.

Using hexbin, as shown in the following code:

library(hexbin)
bin<-hexbin(age, height)
plot(bin, xlab = 'Age', ylab = 'Height', main = 'Height vs Age')

Using ggplot2, as shown in the following code:

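library(ggplot2)
# qplot() is deprecated in current ggplot2; ggplot() + geom_hex() is the idiomatic form.
# geom_hex() also requires the hexbin package to be installed.
ggplot(body.measures, aes(x = age, y = height)) +
  geom_hex(binwidth = c(5, 5)) +
  xlim(0, 80) + ylim(80, 200) +
  labs(x = 'Age', y = 'Height', title = 'Height vs Age')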

Using smoothScatter from the base graphics package, as shown in the following code:

smoothScatter(height ~ age, xlab = 'Age', ylab = 'Height', main = 'Height vs Age')
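All three approaches rest on the same idea: divide the plane into cells, count the observations in each cell, and shade each cell by its count. The base-R sketch below makes that idea explicit with rectangular cells 5 units wide and 5 units tall. hexbin and `geom_hex()` use hexagonal cells instead, and smoothScatter uses a kernel density estimate. The simulated data and cell sizes are assumptions for illustration.

```r
# Simulated data (NOT NHANES) so the sketch runs on its own
set.seed(3)
age <- runif(5000, 2, 80)
height <- ifelse(age < 18, 85 + 5 * age, 175) + rnorm(5000, sd = 7)

# Rectangular cells: 5 years wide, 5 height units tall
age.breaks <- seq(0, 80, by = 5)
height.breaks <- seq(50, 220, by = 5)

# Count observations per cell; table() keeps empty cells as zeros
counts <- table(cut(age, age.breaks), cut(height, height.breaks))

# Shade each cell by its count (darker = more observations)
image(age.breaks, height.breaks, unclass(counts),
      col = gray.colors(20, start = 0.95, end = 0.1),
      xlab = 'Age', ylab = 'Height', main = 'Height vs Age (binned counts)')
```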

To get a sense of the overall pattern visually, we can use the scatter.smooth command, which draws a scatter plot and overlays a LOESS smooth curve. It is not a sophisticated command with many options, but it gives a quick view of the relationship between two variables with minimal code. Setting the data points to a light color generally helps keep the smoothed line visible. The resulting plot is probably not publication-ready, and we may eventually want a better statistical model than the one this curve provides, but the following code is a quick-and-dirty way to visualize the data:

scatter.smooth(age, height, xlab = 'Age', ylab = 'Height', main = 'Height vs Age', col = 'gray', pch = 16)

The plot is shown in the following graph:

[Figure: Height vs Age scatter plot with smoothed curve]
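scatter.smooth() draws the curve computed by `loess.smooth()`, so calling `loess.smooth()` directly returns the coordinates of that curve for inspection. The sketch below spells out the defaults the two functions share (`span = 2/3`, `degree = 1`, `family = "symmetric"`, 50 evaluation points). It uses simulated data in place of the NHANES file.

```r
# Simulated data (NOT NHANES) standing in for age and height
set.seed(4)
age <- runif(2000, 2, 80)
height <- ifelse(age < 18, 85 + 5 * age, 175) + rnorm(2000, sd = 7)

# Same smoother that scatter.smooth() draws, with its defaults written out
smooth.curve <- loess.smooth(age, height, span = 2/3, degree = 1,
                             family = "symmetric", evaluation = 50)

# smooth.curve$x: 50 evenly spaced ages; smooth.curve$y: smoothed height at each
head(data.frame(age = smooth.curve$x, smoothed.height = smooth.curve$y))

# Re-create the quick plot, then overlay the extracted curve as a dashed line
scatter.smooth(age, height, xlab = 'Age', ylab = 'Height',
               main = 'Height vs Age', col = 'gray', pch = 16)
lines(smooth.curve, col = 'red', lty = 2)
```

The dashed line should sit on top of the solid curve drawn by scatter.smooth(), since both come from the same computation.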

Over the next few sections, we will make many judgments about how sensitive our models should be to individual data points. This amounts to choosing between being highly sensitive to small quirks in the data and avoiding modeling sample-specific random error; gaining one comes at the cost of the other. If we expect large changes in one variable with respect to another, and we believe these changes reflect a real phenomenon worth modeling, then we are assuming a high signal-to-noise ratio, and we will use methods that are very sensitive to changes in the data. Conversely, if we expect only small changes in one variable with respect to another and believe that large fluctuations are the result of random error, we will be better served by methods that are relatively insensitive to fluctuations.
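In LOESS, the `span` argument acts as one concrete version of this sensitivity dial: it sets the fraction of the data used for each local fit. The sketch below uses simulated data and overlays a small span, which is sensitive to local fluctuations, and a large span, which is insensitive to them. The two span values are arbitrary choices for illustration.

```r
# Simulated data (NOT NHANES)
set.seed(5)
age <- runif(2000, 2, 80)
height <- ifelse(age < 18, 85 + 5 * age, 175) + rnorm(2000, sd = 7)

# Small span: each local fit uses 10% of the points, so the curve follows local bumps
sensitive <- loess.smooth(age, height, span = 0.1)
# Large span: each local fit uses 90% of the points, averaging away fluctuations
# but also blurring the sharp change in slope near age 18
insensitive <- loess.smooth(age, height, span = 0.9)

plot(age, height, col = 'gray', pch = 16,
     xlab = 'Age', ylab = 'Height', main = 'Effect of span on the LOESS curve')
lines(sensitive, lwd = 2, lty = 1)
lines(insensitive, lwd = 2, lty = 2)
legend('bottomright', legend = c('span = 0.1', 'span = 0.9'),
       lty = c(1, 2), lwd = 2)
```

Neither setting is correct in general. The better choice depends on how much of the observed variation we believe is signal rather than noise.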

We will return to the nonparametric smoothing methods used to create these plots, but first we will discuss extensions of the linear model to nonlinear relationships.
