How to do it...

We will generate synthetic data for two models. The first is a fully linear model, in which each of the two independent variables has a linear effect. The second is a model in which one of the variables (x2) has a nonlinear, exponential effect:

First, we generate some data for regression. The relationship is linear, so there is nothing new up to this point, as the following code shows:

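library(SemiPar)                                # provides spm() and f()
set.seed(1)                                     # optional: makes the simulated data reproducible
x1 = rnorm(100, 20, 6)                          # 100 draws from N(mean = 20, sd = 6)
x2 = runif(100, 1, 8)                           # 100 draws from Uniform(1, 8)
y = x1 + x2 + rnorm(100, 0, 5)                  # both effects linear (slope 1) plus N(0, sd = 5) noise
data_sim = data.frame(x1 = x1, x2 = x2, y = y)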
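As a quick sanity check (an addition, not part of the original recipe), an ordinary least-squares fit with base R's lm() should recover slopes close to 1 for both variables, because that is how the data were generated. The block regenerates the data with an arbitrary seed so it runs on its own.

```r
# Sanity check (added illustration): with purely linear effects,
# ordinary least squares should recover slopes close to 1.
set.seed(1)  # arbitrary seed, only so this sketch is reproducible
x1 <- rnorm(100, 20, 6)
x2 <- runif(100, 1, 8)
y <- x1 + x2 + rnorm(100, 0, 5)
data_sim <- data.frame(x1 = x1, x2 = x2, y = y)

# Correctly specified linear model: y = b0 + b1*x1 + b2*x2 + error
fit_lm <- lm(y ~ x1 + x2, data = data_sim)
print(coef(fit_lm))

# 95% confidence intervals for the coefficients
print(confint(fit_lm))
```

The x2 slope is estimated less precisely than the x1 slope, because x2 varies over a narrower range (1 to 8) than x1 (standard deviation 6).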

Next, we set up our semiparametric model, assuming an additive relationship between the variables. Because we generated the data, we already know that no transformation of the variables is necessary. But let's assume we didn't know this and specify a general smooth function f() for each variable:

# f() asks spm() to estimate a general smooth function of each covariate
fit <- spm(data_sim$y ~ f(data_sim$x1) + f(data_sim$x2))
summary(fit)

The preceding command displays the following output:

Then, run the following code:

par(mfrow = c(1, 2))   # two plotting panels side by side
plot(fit)              # estimated effect of each f() term with its confidence band

The resulting plots show the estimated effect of each variable. SemiPar correctly estimates both effects as nearly linear. The gray band is the 95% confidence interval; it is slightly wider at small and large values of each independent variable, because fewer observations fall in those regions.
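SemiPar is built around penalized splines. As a hedged cross-check that swaps in a different package (mgcv, which the original recipe does not use), the sketch below fits the same fully additive model with gam() and s() smooth terms. It regenerates the simulated data with an arbitrary seed so that it runs on its own. An effective degrees of freedom (edf) close to 1 for a term indicates an essentially straight-line estimate, which is what we expect here.

```r
library(mgcv)  # recommended package shipped with most R installations

set.seed(1)  # arbitrary seed, only for reproducibility
x1 <- rnorm(100, 20, 6)
x2 <- runif(100, 1, 8)
y <- x1 + x2 + rnorm(100, 0, 5)
data_sim <- data.frame(x1 = x1, x2 = x2, y = y)

# One smooth term per covariate; smoothness selected by REML
fit_gam <- gam(y ~ s(x1) + s(x2), data = data_sim, method = "REML")

# Effective degrees of freedom per smooth: values near 1 mean "almost a straight line"
edf <- summary(fit_gam)$edf
names(edf) <- c("s(x1)", "s(x2)")
print(edf)

# Side-by-side plots of the estimated smooths with shaded confidence bands
par(mfrow = c(1, 2))
plot(fit_gam, shade = TRUE)
```

The seed and `method = "REML"` are illustrative choices, not settings taken from SemiPar.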

We can now generate a new dataset with the same structure, but this time x2 enters through an exponential transformation, 150*exp(-x2). Ideally, SemiPar should detect this nonlinearity, as shown in the following example:

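x1 = rnorm(100, 20, 6)                          # same x1 distribution as before
x2 = runif(100, 1, 8)                           # same x2 distribution as before
y = x1 + 150*exp(-x2) + rnorm(100, 0, 5)        # x2 now acts through 150*exp(-x2)
data_sim = data.frame(x1 = x1, x2 = x2, y = y)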
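A quick base-R comparison (an illustration added here, not part of the original recipe) shows why a linear term for x2 is no longer adequate. A model that uses the true transformation exp(-x2) should explain noticeably more variance than one that treats x2 as linear. The data are regenerated with an arbitrary seed so that the block is self-contained.

```r
set.seed(2)  # arbitrary seed, only for reproducibility
x1 <- rnorm(100, 20, 6)
x2 <- runif(100, 1, 8)
y <- x1 + 150 * exp(-x2) + rnorm(100, 0, 5)
data_sim <- data.frame(x1 = x1, x2 = x2, y = y)

# Misspecified model: treats the effect of x2 as a straight line
fit_linear <- lm(y ~ x1 + x2, data = data_sim)

# Correctly specified model: uses the true transformation exp(-x2)
fit_true <- lm(y ~ x1 + I(exp(-x2)), data = data_sim)

r2 <- c(linear = summary(fit_linear)$r.squared,
        true_form = summary(fit_true)$r.squared)
print(r2)
```

In practice we rarely know the true transformation, which is exactly the situation where a smooth f() term is useful.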

We now fit the SemiPar model with the same structure as before:

# Same specification as before: a smooth f() term for each covariate
fit <- spm(data_sim$y ~ f(data_sim$x1) + f(data_sim$x2))
summary(fit)

Take a look at the following output:

Then, run the following code:

plot(fit)

The model correctly finds that x2 affects y through a functional form resembling exp(-x2).
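To check the shape numerically rather than by eye, the sketch below swaps in mgcv (not used in the original recipe) and compares the estimated x2 component with the true curve 150*exp(-x2). The seed is arbitrary, and the data are regenerated so that the block runs on its own. mgcv constrains each smooth term to sum to zero over the data, so the values from `predict(..., type = "terms")` are centred. The true curve is centred the same way before the two are compared.

```r
library(mgcv)

set.seed(2)  # arbitrary seed, only for reproducibility
x1 <- rnorm(100, 20, 6)
x2 <- runif(100, 1, 8)
y <- x1 + 150 * exp(-x2) + rnorm(100, 0, 5)
data_sim <- data.frame(x1 = x1, x2 = x2, y = y)

fit_gam <- gam(y ~ s(x1) + s(x2), data = data_sim, method = "REML")

# Per-observation contribution of each smooth term (centred by mgcv's constraint)
terms_hat <- predict(fit_gam, type = "terms")
f2_hat <- terms_hat[, "s(x2)"]

# True effect of x2, centred the same way so the two curves are comparable
f2_true <- 150 * exp(-x2)
f2_true <- f2_true - mean(f2_true)

agreement <- cor(f2_hat, f2_true)
print(agreement)

# Overlay: estimated smooth (points) against the true centred curve (red line)
ord <- order(x2)
plot(x2[ord], f2_hat[ord], xlab = "x2", ylab = "effect of x2")
lines(x2[ord], f2_true[ord], col = "red")
```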

Both models presented so far are actually fully nonparametric, because we specified a general function for each variable; we have not yet constrained any variable to have an explicitly linear effect. Let's now build a truly semiparametric model, containing a linear (parametric) part and a nonlinear (nonparametric) part. In this setting, we get a coefficient for x1 that can be used for interpretation. As expected, the estimated coefficient for x1 is close to 1, the value we used to generate the data, as demonstrated in the following code:

# x1 enters linearly (parametric part); x2 keeps a smooth f() term (nonparametric part)
fit <- spm(data_sim$y ~ data_sim$x1 + f(data_sim$x2))
summary(fit)

The preceding code displays the following output of the linear and nonlinear components:
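For comparison, the same partially linear idea can be sketched in base R. This is an illustration under stated assumptions, not SemiPar's method: x1 enters lm() linearly, while x2 is expanded in a natural cubic spline basis from the splines package. SemiPar's f() terms are penalized, with the amount of smoothing chosen automatically by default. This sketch instead fixes the flexibility with a hypothetical choice of df = 5, which makes it an unpenalized regression spline.

```r
library(splines)  # ships with base R; provides ns() natural cubic spline bases

set.seed(2)  # arbitrary seed, only for reproducibility
x1 <- rnorm(100, 20, 6)
x2 <- runif(100, 1, 8)
y <- x1 + 150 * exp(-x2) + rnorm(100, 0, 5)
data_sim <- data.frame(x1 = x1, x2 = x2, y = y)

# Parametric (linear) part for x1, nonparametric (spline) part for x2.
# df = 5 is a hypothetical, fixed choice; it is not estimated from the data.
fit_pl <- lm(y ~ x1 + ns(x2, df = 5), data = data_sim)

# The x1 coefficient keeps its usual linear-model interpretation
print(coef(summary(fit_pl))["x1", ])
print(confint(fit_pl)["x1", ])
```

Because x1 is generated independently of x2, its coefficient should stay close to 1 for any reasonable spline flexibility.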
