Regularization with the lasso

In the previous chapter on linear regression, we used the glmnet package to perform regularization with ridge regression and the lasso. Since we've seen that removing some of our features might be a good idea, we'll apply the lasso to our data set and assess the results. First, we'll train a series of regularized models with glmnet(); then, we'll use cv.glmnet() to estimate a suitable value for λ. Finally, we'll examine the coefficients of the regularized model at this value of λ:

> library(glmnet)
> heart_train_mat <- model.matrix(OUTPUT ~ ., heart_train)[,-1]
> lambdas <- 10 ^ seq(8, -4, length = 250)
> heart_models_lasso <- glmnet(heart_train_mat, 
  heart_train$OUTPUT, alpha = 1, lambda = lambdas, family = "binomial")
> lasso.cv <- cv.glmnet(heart_train_mat, heart_train$OUTPUT, alpha = 1, 
  lambda = lambdas, family = "binomial")
> lambda_lasso <- lasso.cv$lambda.min
> lambda_lasso
[1] 0.01057052

> predict(heart_models_lasso, type = "coefficients", s = lambda_lasso)
19 x 1 sparse Matrix of class "dgCMatrix"
                       1
(Intercept) -4.980249537
AGE          .          
SEX          1.029146139
CHESTPAIN2   0.122044733
CHESTPAIN3   .          
CHESTPAIN4   1.521164330
RESTBP       0.013456000
CHOL         0.004190012
SUGAR       -0.587616822
ECG1         .          
ECG2         0.338365613
MAXHR       -0.010651758
ANGINA       0.807497991
DEP          0.211899820
EXERCISE2    0.351797531
EXERCISE3    0.081846313
FLUOR        0.947928099
THAL6        0.083440880
THAL7        1.501844677
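
For reference, the quantity being minimized for each value of λ when alpha = 1 and family = "binomial" can be sketched as follows. This is our reading of the objective described in the glmnet documentation. The notation is our own: N is the number of training observations, x_i is the feature vector of observation i, y_i is its 0/1 output, and β_0 is the unpenalized intercept.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
           - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The first term is the average negative log-likelihood of logistic regression, and the second is the lasso penalty. Unlike the squared penalty used in ridge regression, the absolute-value penalty can shrink individual coefficients exactly to zero.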

We see that some of the coefficients, shown as dots in the output, are exactly zero, so the corresponding terms (AGE, CHESTPAIN3, and ECG1) have effectively been removed from the model. If we now use this model to measure classification accuracy on our training and test sets, we observe slightly better performance in both cases. Even though the difference is small, remember that we have achieved it with three fewer terms in the model:

> lasso_train_predictions <- predict(heart_models_lasso, s = lambda_lasso, newx = heart_train_mat, type = "response")
> lasso_train_class_predictions <- 
  as.numeric(lasso_train_predictions > 0.5)
> mean(lasso_train_class_predictions == heart_train$OUTPUT)
[1] 0.8913043
> heart_test_mat <- model.matrix(OUTPUT ~ ., heart_test)[,-1]
> lasso_test_predictions <- predict(heart_models_lasso, s = lambda_lasso, newx = heart_test_mat, type = "response")
> lasso_test_class_predictions <- 
  as.numeric(lasso_test_predictions > 0.5)
> mean(lasso_test_class_predictions == heart_test$OUTPUT)
[1] 0.925
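
The accuracy computation above follows the same two steps for both data sets: threshold the predicted probabilities at 0.5, then take the proportion of predictions that match the observed outputs. As a minimal sketch, the steps can be wrapped in two small helper functions. The names classify() and accuracy(), and the toy probabilities and labels, are hypothetical and used only for illustration; they are not part of glmnet or of our heart data.

```r
# Hypothetical predicted probabilities and true 0/1 labels (illustrative only)
toy_probabilities <- c(0.91, 0.12, 0.55, 0.43, 0.78)
toy_labels <- c(1, 0, 0, 0, 1)

# Convert probabilities to 0/1 class labels using a chosen cutoff
classify <- function(probabilities, cutoff = 0.5) {
  as.numeric(probabilities > cutoff)
}

# Accuracy is the fraction of predictions that match the true labels
accuracy <- function(predicted, actual) {
  mean(predicted == actual)
}

toy_classes <- classify(toy_probabilities)
toy_accuracy <- accuracy(toy_classes, toy_labels)
print(toy_classes)
print(toy_accuracy)
```

With these toy values, the third observation is misclassified at a cutoff of 0.5, so the accuracy is 0.8. Making the cutoff an argument is a design choice that lets us try other thresholds without repeating the code.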