How to do it...

In this recipe, we will investigate how robust logistic regression handles extreme observations. The mechanics are the same as those we used for robust linear regression: we generate data, contaminate it, and see how our estimates change.

  1. First, load the robust package:
library(robust)
  1. We first generate 1,000 observations from a logistic model with an intercept of 2 and coefficients of 2 for x1 and 5 for x2, and then fit both a standard and a robust logistic regression. In the absence of contamination, both methods should yield estimates close to these true values:
set.seed(1000)
x1 = rnorm(1000)
x2 = rnorm(1000)
link_val = 2 + 2*x1 + 5*x2
pr = 1/(1+exp(-link_val))
y = rbinom(1000,1,pr)
df = data.frame(y=y,x1=x1,x2=x2)
glm(y~x1+x2,data=df,family="binomial")
robust::glmRob(y~x1+x2,data=df,family="binomial")

Estimated results: non-robust (top) and robust (bottom). Because there is no contamination here, both yield similar results.
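To read the two printouts against the data-generating model, it can help to line the estimates up next to the true coefficients (2, 2, and 5 in link_val). The following is a minimal sketch rather than part of the original recipe: the names true_coef and comparison are illustrative, and it assumes that coef() returns the fitted coefficients of a glmRob object in the same way it does for glm. Depending on the simulated data, glm() may warn that some fitted probabilities are numerically 0 or 1.

```r
# Sketch: compare estimated coefficients with the true values used to simulate y.
set.seed(1000)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
link_val <- 2 + 2 * x1 + 5 * x2
pr <- 1 / (1 + exp(-link_val))
y  <- rbinom(n, 1, pr)
df <- data.frame(y = y, x1 = x1, x2 = x2)

# True coefficients, read off the definition of link_val above.
true_coef <- c("(Intercept)" = 2, x1 = 2, x2 = 5)

fit_glm    <- glm(y ~ x1 + x2, data = df, family = "binomial")
comparison <- data.frame(true = true_coef, glm = coef(fit_glm))

# Add the robust fit only if the robust package is installed.
if (requireNamespace("robust", quietly = TRUE)) {
  fit_rob <- robust::glmRob(y ~ x1 + x2, data = df, family = "binomial")
  comparison$glmRob <- coef(fit_rob)  # assumption: coef() works on glmRob objects
}

print(round(comparison, 2))
```

Each row of comparison is one coefficient, so a large gap between the true column and an estimate column is easy to spot.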

  1. We now contaminate 10% of the observations (the first 100 of the 1,000 values of x1) and repeat the procedure. The standard logistic regression model no longer works properly: x1 has an estimated coefficient of 0.08, which is very far from the true value (2). x2 is also wrong, with an estimated coefficient of 3.26 instead of 5. The robust method does better here, although its estimates are still pulled toward zero: x1 is 0.63 instead of 2, and x2 is 3.65 instead of 5:
x1 = rnorm(1000)  
x2 = rnorm(1000)
link_val = 2 + 2*x1 + 5*x2
pr = 1/(1+exp(-link_val))
y = rbinom(1000,1,pr)
x1[1:100] = 10*rnorm(100)
df = data.frame(y=y,x1=x1,x2=x2)
glm(y~x1+x2,data=df,family="binomial")
robust::glmRob(y~x1+x2,method="cubif",data=df,family="binomial")

Standard and robust results. The standard estimates are strongly affected by the outliers.
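The contamination itself comes from the line x1[1:100] = 10*rnorm(100). It overwrites the first 100 predictor values after y has already been drawn, so those rows pair the original response with an unrelated x1 whose spread is about ten times larger. The short sketch below checks this; the seed and the names x1_clean and x1_bad are illustrative choices, not taken from the recipe.

```r
# Sketch: inspect what the contamination step does to x1.
set.seed(1000)                      # illustrative seed
n <- 1000
x1_clean <- rnorm(n)

x1_bad <- x1_clean
x1_bad[1:100] <- 10 * rnorm(100)    # same contamination as in the recipe

share_changed <- mean(x1_bad != x1_clean)   # fraction of overwritten rows
sd_bad_rows   <- sd(x1_bad[1:100])          # spread of the contaminated rows
sd_clean_rows <- sd(x1_bad[101:n])          # spread of the untouched rows

cat("share of contaminated rows:", share_changed, "\n")
cat("sd of contaminated rows:   ", round(sd_bad_rows, 2), "\n")
cat("sd of clean rows:          ", round(sd_clean_rows, 2), "\n")
```

Because the overwritten values carry no information about y, a classical fit that treats every row equally is pulled toward a coefficient of zero for x1.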

  1. We now repeat the same exercise, but contaminating 20% of the data (the first 200 values of x1). The conclusions are quite similar: the classical GLM has serious problems, whereas the robust one holds up better:
x1 = rnorm(1000)  
x2 = rnorm(1000)
link_val = 2 + 2*x1 + 5*x2
pr = 1/(1+exp(-link_val))
y = rbinom(1000,1,pr)
x1[1:200] = 10*rnorm(200)
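df = data.frame(y=y,x1=x1,x2=x2)
glm(y~x1+x2,data=df,family="binomial")
robust::glmRob(y~x1+x2,method="cubif",data=df,family="binomial")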

Classical and robust GLM results with 20% contamination.
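The contamination experiments in this recipe differ only in how many rows are overwritten, so they can be wrapped in one helper and run for several contamination levels. This is a sketch under stated assumptions: fit_contaminated() is a hypothetical helper, the seed is arbitrary, and only the classical glm() fit is shown. The robust::glmRob() call from the recipe could be added alongside it in the same way.

```r
# Sketch: classical logistic fit for different numbers of contaminated rows.
# fit_contaminated() is a hypothetical helper, not part of the robust package.
fit_contaminated <- function(n_bad, n = 1000) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  pr <- 1 / (1 + exp(-(2 + 2 * x1 + 5 * x2)))   # true coefficients: 2, 2, 5
  y  <- rbinom(n, 1, pr)
  if (n_bad > 0) {
    # Overwrite the first n_bad predictor values after y has been generated.
    x1[seq_len(n_bad)] <- 10 * rnorm(n_bad)
  }
  df <- data.frame(y = y, x1 = x1, x2 = x2)
  coef(glm(y ~ x1 + x2, data = df, family = "binomial"))
}

set.seed(1000)                       # arbitrary seed
n_bad_values <- c(0, 100, 200)

# One row per contamination level, one column per coefficient.
results <- t(sapply(n_bad_values, fit_contaminated))
rownames(results) <- paste0("n_bad = ", n_bad_values)
print(round(results, 2))
```

Reading down the x1 column shows how far the classical estimate moves away from the true value of 2 once contaminated rows are present.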
