Model selection tools – the stepwise regression

Stepwise regression changed in Minitab v17. In previous versions it was a separate tool; it is now an option within the Fit Regression Model… tools, including those for binary logistic regression and Poisson regression. The same stepwise options are available in all three tools.

We will use the sleep dataset with stepwise regression to select predictors for a regression model.

We will use the total sleep column for the study and specify using the Box-Cox transformation on the response before running the stepwise regression on all two-way interactions.

Getting ready

The data is available at the following link from StatSci.org:

http://www.statsci.org/data/general/sleep.html

The data is tab delimited and can be copied directly into the worksheet. Once the data is in Minitab, calculate the natural log of total sleep; see steps 1 to 5 in the Model selection tools – the best subsets regression recipe.

How to do it…

The following steps use stepwise regression to identify a regression model within the Fit Regression Model… function:

  1. Navigate to Stat | Regression | Regression and select Fit Regression Model….
  2. Enter TotalSleep in the Responses: section.
  3. In Continuous predictors:, enter BodyWt, BrainWt, Lifespan, Gestation, Predation, Exposure, and Danger.
  4. To add interactions to the study, click on the Model… button and highlight Predictors: in the top-left section of the dialog box. Then click on the Add button next to Interactions through order: 2.
  5. Click on OK and then select the Stepwise… button.
  6. From the Method: drop down, select Stepwise and click on OK.
  7. Click on the Graphs… button, select the Four in one residual plots, and then click on OK in each dialog box to run the model.
  8. Check the results in the session window to observe the fitted model, as shown in the following screenshot:
    [Screenshot: session window output showing the fitted model]
  9. From the results of the preceding stepwise regression, we should observe that the model converged on a solution of Gestation, Predation, Danger, Exposure, and Predation*Danger.
  10. To observe predicted responses, navigate to Stat | Regression | Regression and select Factorial Plots.
  11. Select all available predictors to put them in the plots and click on OK.

    Tip

    Use the >> arrow to move everything into the charts.

  12. To observe the interaction of Predation and Danger, generate a contour plot. Navigate to Stat | Regression | Regression and select Contour Plot.
  13. Change the X Axis: section to Danger and click on OK.
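
Step 4 above offers every two-way interaction among the seven predictors to the stepwise procedure. As a rough illustration of how many candidate terms that creates, the following Python sketch lists them. It is not part of Minitab; the variable names are hypothetical, and it assumes that Interactions through order: 2 adds only cross-products of distinct predictors.

```python
from itertools import combinations

# Continuous predictors entered in step 3 of the recipe.
predictors = ["BodyWt", "BrainWt", "Lifespan", "Gestation",
              "Predation", "Exposure", "Danger"]

# Assumption: each unordered pair of distinct predictors gives one
# two-way interaction term (no squared terms).
interactions = [f"{a}*{b}" for a, b in combinations(predictors, 2)]

# Candidate terms offered to the stepwise procedure:
# the main effects plus the two-way interactions.
candidate_terms = predictors + interactions

print(len(interactions))                   # 21
print(len(candidate_terms))                # 28
print("Predation*Danger" in interactions)  # True
```

With 28 candidate terms, fitting every possible model by hand is impractical, which is why an automated selection method is used here.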

How it works…

Stepwise regression starts by running forwards, adding terms to the regression model one at a time. The first round selects the predictor with the lowest P-value. The second round continues in the same way, adding whichever remaining term would be the best addition to the model. This continues round by round until no more terms can be added.

The default stepwise method works both forwards and backwards. If the P-value of a term added in a previous round rises above the decision level, that term is removed in the next round. Because terms are added and removed one at a time, stepwise regression does not examine all possible models, unlike the Best Subsets… tool.

For example, if the results showed Exposure at round three with a P-value of 0.815, the fourth round would remove this term.

The selection method can be changed to forward selection, where terms are added but cannot be removed, or to backward elimination, where all terms are included in round 1 and each subsequent round uses the P-value to exclude a term.
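
The forward-and-backward logic described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Minitab's implementation: the function names, the synthetic data, and the alpha values of 0.15 are all hypothetical choices, and the P-values use a normal approximation to the t distribution to keep the sketch free of third-party libraries.

```python
import math
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [row[n:] for row in m]

def p_values(cols, y):
    """Least-squares fit with an intercept; approximate two-sided P-value per column."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    # Assumption: normal approximation to the t distribution (large-sample shortcut).
    return [math.erfc(abs(beta[a] / math.sqrt(s2 * inv[a][a])) / math.sqrt(2))
            for a in range(1, k)]

def stepwise(data, y, alpha_enter=0.15, alpha_remove=0.15, max_rounds=50):
    """Add the best candidate each round, then drop any term that is no longer significant."""
    selected = []
    for _ in range(max_rounds):  # guard against cycling
        changed = False
        # Forward step: find the unused term with the lowest P-value when added.
        best = None
        for name in data:
            if name not in selected:
                p = p_values([data[t] for t in selected + [name]], y)[-1]
                if best is None or p < best[1]:
                    best = (name, p)
        if best is not None and best[1] < alpha_enter:
            selected.append(best[0])
            changed = True
        # Backward step: remove the weakest term if its P-value is above the threshold.
        if selected:
            ps = p_values([data[t] for t in selected], y)
            worst = max(range(len(selected)), key=lambda i: ps[i])
            if ps[worst] > alpha_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return selected

# Synthetic data (not the sleep dataset): y depends on x1 and x2 only.
random.seed(1)
n = 60
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [3 * a - 2 * b + random.gauss(0, 0.5) for a, b in zip(data["x1"], data["x2"])]

chosen = stepwise(data, y)
print(chosen)  # x1 and x2 should be selected; x3 enters only by chance
```

Dropping the backward step from the loop gives forward selection. Starting with every term in `selected` and skipping the forward step gives backward elimination.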

With the steps here, we do not see the decision at each step of the regression; we see only the final fitted model. The Stepwise… options can be set to include the details of each step in the session window. This reveals the terms included or removed at each step, along with R-sq, R-sq(adj), R-sq(pred), and Mallows' Cp.
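
To make the step statistics more concrete, the following Python sketch computes R-sq, R-sq(adj), and R-sq(pred) for a one-predictor model using their standard textbook definitions. The function name and the data values are made up for illustration, and Mallows' Cp is left out because it needs a full reference model. Minitab reports these values as percentages, whereas the sketch returns proportions.

```python
def fit_statistics(x, y):
    """Return (R-sq, R-sq(adj), R-sq(pred)) for a one-predictor least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)              # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)     # total sum of squares

    # Leverage of each point; e / (1 - h) is the leave-one-out prediction error,
    # and PRESS is the sum of those squared errors.
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, lev))

    p = 1  # number of predictors in the model
    r_sq = 1 - sse / sst
    r_sq_adj = 1 - (sse / (n - p - 1)) / (sst / (n - 1))
    r_sq_pred = 1 - press / sst
    return r_sq, r_sq_adj, r_sq_pred

# Made-up example values, roughly y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r_sq, r_sq_adj, r_sq_pred = fit_statistics(x, y)
print(r_sq, r_sq_adj, r_sq_pred)
```

R-sq(adj) penalizes extra terms, and R-sq(pred) estimates how well the model predicts observations it was not fitted to, so a large gap between R-sq and R-sq(pred) across steps can hint at overfitting.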

Each decision is based on the alpha risk, that is, the P-value threshold for adding or removing a term. The default alpha values can be changed, and the method can also be set to maintain model hierarchy.

See also

  • The Model selection tools – the best subsets regression recipe
  • The Multiple regression with linear predictors recipe