Regression models range from the commonly used linear, logistic, and multiple regression algorithms of classical statistics to Ridge and Lasso regression, which penalize coefficients to improve model performance.
In our earlier examples, we saw an application of linear regression when we created trend lines. Multiple linear regression refers to a model built with more than one independent variable.
For instance:
Total Advertising Cost = β0 + β1 × Print Ads would be a simple linear regression; whereas
Total Advertising Cost = β0 + β1 × Print Ads + β2 × Radio Ads + β3 × TV Ads, due to the presence of more than one independent variable (Print, Radio, and TV), would be a multiple linear regression.
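A multiple linear regression of this kind can be sketched in a few lines. The ad-spend figures below are made up purely for illustration, and the total cost is generated from known coefficients so the least-squares fit recovers them:

```python
import numpy as np

# Hypothetical ad-spend data (made up for illustration):
# columns are Print, Radio, and TV spend; total cost is generated
# as 100 + 2*Print + 3*Radio + 5*TV so the fit recovers these values.
X = np.array([
    [10.0,  5.0, 20.0],
    [12.0,  7.0, 25.0],
    [ 8.0,  3.0, 15.0],
    [15.0, 10.0, 30.0],
    [11.0,  4.0, 22.0],
])
y = 100 + 2 * X[:, 0] + 3 * X[:, 1] + 5 * X[:, 2]

# Add an intercept column and solve the least-squares problem.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(np.round(beta, 4))  # intercept followed by one coefficient per channel
```

With only one predictor column, the same code reduces to the simple linear regression case above.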
Logistic regression is another commonly used statistical regression modelling technique that predicts a discrete categorical outcome, mainly for cases where the outcome variable is dichotomous (for example, 0 or 1, Yes or No, and so on). There can, however, be more than two discrete outcomes (for example, a state: NY, NJ, or CT), and this type of logistic regression is known as multinomial logistic regression.
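To make the dichotomous case concrete, here is a minimal sketch that fits a logistic regression by gradient descent on made-up pass/fail data (the hours-studied numbers and labels are invented for illustration):

```python
import numpy as np

# Made-up dichotomous outcome: predict pass (1) or fail (0) from hours studied.
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
passed = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

X = np.column_stack([np.ones_like(hours), hours])  # intercept + feature
beta = np.zeros(2)

# Gradient descent on the log-loss of the logistic model.
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ beta))            # predicted P(pass)
    beta -= 0.1 * X.T @ (p - passed) / len(passed)  # log-loss gradient step

# Classify as 1 whenever the predicted probability exceeds 0.5.
pred = (1.0 / (1.0 + np.exp(-X @ beta)) > 0.5).astype(int)
print(pred)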
Ridge and Lasso regression add a regularization term (λ) to the objective of linear regression. In Ridge regression, the regularization term shrinks the β coefficients toward zero (thus 'penalizing' the coefficients). In Lasso, the regularization term can reduce some of the coefficients exactly to 0, thus eliminating the effect of those variables on the final model:
# Load mlbench and create a regression model of glucose (the outcome/dependent variable) with pressure, triceps, and insulin as the independent variables.
> library("mlbench")
> data(PimaIndiansDiabetes)
> lm_model <- lm(glucose ~ pressure + triceps + insulin, data = PimaIndiansDiabetes[1:100, ])
> plot(lm_model)
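The contrast between Ridge and Lasso can be seen directly in a small sketch. The data below is synthetic (invented for illustration): the outcome depends only on the first two columns, so Lasso should drive the third coefficient to exactly 0, while Ridge merely shrinks it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends on the first two columns only; the third is pure noise.
X = rng.normal(size=(200, 3))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lam = 10.0  # regularization strength (lambda)

# Ridge: closed-form solution (X'X + lambda*I)^-1 X'y
ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Lasso: coordinate descent with soft-thresholding.
def soft(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

lasso = np.zeros(3)
for _ in range(200):                # sweeps over the coordinates
    for j in range(3):
        r = y - X @ lasso + X[:, j] * lasso[j]   # partial residual without x_j
        z = X[:, j] @ r
        lasso[j] = soft(z, lam) / (X[:, j] @ X[:, j])

print(np.round(ridge, 3))  # all three coefficients shrunk, none exactly zero
print(np.round(lasso, 3))  # third coefficient driven exactly to 0
```

In R, the glmnet package provides both penalties through a single interface (alpha = 0 for Ridge, alpha = 1 for Lasso); the numpy version above is only meant to expose the mechanics.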