Time for action - modelling with multiple linear regression

Multiple linear regression is one step removed from simple linear regression. It adheres to the same principles, but uses additional independent variables to predict the outcome of a dependent variable.

Let us build upon our previous head to head combat model using multiple regression. This time, we will include both the number of Shu and Wei soldiers engaged as predictors of battle performance:

  1. Create a multiple regression model that predicts Rating using both the number of Shu and Wei soldiers engaged:
    > #create a multiple linear regression model using the lm(formula, data) function
    > #predict the rating of a head to head battle using the number of Shu and Wei soldiers engaged
    > #columns are referred to by name; the data argument tells lm() where to find them
    > lmHeadToHeadRating_ShuWeiSoldiers <- lm(Rating ~ ShuSoldiersEngaged + WeiSoldiersEngaged, data = subsetHeadToHead)
    
  2. Create a summary of the model:
    > #model summary
    > lmHeadToHeadRating_ShuWeiSoldiers_Summary <- summary(lmHeadToHeadRating_ShuWeiSoldiers)
    
  3. Display your linear model summary in the R console:
    > #display the summary
    > lmHeadToHeadRating_ShuWeiSoldiers_Summary
    

What just happened?

We used multiple linear regression to create a second model for predicting the performance rating of the Shu army in a head to head conflict. This model incorporated both the number of Shu and number of Wei soldiers engaged in combat as predictors. We can interpret a multiple linear regression model in a similar manner to a simple linear regression model. We can also compare our new model to the one that we previously created.

Interpreting the summary output

Review the summary output for our multiple regression model. The summary should be similar to the following screenshot:


From the Estimate column, we can derive our regression equation:

Rating = 33 + 0.0011 * ShuSoldiersEngaged - 0.00007 * WeiSoldiersEngaged
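
We can check how the Estimate column maps onto this equation with the coef() function. The following is a minimal sketch rather than our actual battle data: the exampleBattles data frame and all of its values are made up, and Rating is constructed to follow the equation exactly, so the fitted coefficients should come back as roughly 33, 0.0011, and -0.00007.

```r
# hypothetical data: these values are invented for illustration only
exampleBattles <- data.frame(
  ShuSoldiersEngaged = c(10000, 20000, 30000, 15000, 25000, 40000),
  WeiSoldiersEngaged = c(50000, 30000, 60000, 20000, 45000, 35000)
)

# build Rating so that it follows the regression equation exactly (no noise)
exampleBattles$Rating <- 33 +
  0.0011 * exampleBattles$ShuSoldiersEngaged -
  0.00007 * exampleBattles$WeiSoldiersEngaged

# fit a multiple regression model with two predictors
exampleModel <- lm(Rating ~ ShuSoldiersEngaged + WeiSoldiersEngaged,
                   data = exampleBattles)

# coef() returns the estimates as a named numeric vector:
# the intercept first, then one slope per predictor
exampleCoefficients <- coef(exampleModel)
exampleCoefficients
```

Each name in the returned vector corresponds to one term of the regression equation.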

Again, both our overall model (p < .001) and our independent variable coefficients (p < .001) are statistically significant. Moreover, R-squared increased compared to our previous model: the new model explains 51% of the variance in the Shu army's performance rating.
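
These quantities can also be pulled out of a summary object directly. The sketch below uses randomly generated stand-in data (exampleBattles and every value in it are assumptions, not our battle data), so its numbers will not match the ones reported here. It only shows where each quantity lives.

```r
# hypothetical stand-in data, generated at random for illustration only
set.seed(42)
exampleBattles <- data.frame(
  ShuSoldiersEngaged = runif(40, min = 5000, max = 50000),
  WeiSoldiersEngaged = runif(40, min = 5000, max = 100000)
)
exampleBattles$Rating <- 33 +
  0.0011 * exampleBattles$ShuSoldiersEngaged -
  0.00007 * exampleBattles$WeiSoldiersEngaged +
  rnorm(40, mean = 0, sd = 10)

exampleSummary <- summary(lm(Rating ~ ShuSoldiersEngaged + WeiSoldiersEngaged,
                             data = exampleBattles))

# proportion of variance explained by the model
exampleRSquared <- exampleSummary$r.squared

# p-value for each coefficient (the Pr(>|t|) column of the summary table)
exampleCoefficientPValues <- exampleSummary$coefficients[, "Pr(>|t|)"]

# overall model p-value, computed from the F-statistic and its degrees of freedom
fStat <- exampleSummary$fstatistic
exampleModelPValue <- pf(fStat["value"], fStat["numdf"], fStat["dendf"],
                         lower.tail = FALSE)
```

The same expressions should work on lmHeadToHeadRating_ShuWeiSoldiers_Summary, since it is also the result of calling summary() on an lm() model.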

Let us use our multiple regression model to predict the performance of a 25,000 soldier Shu army against a 25,000 soldier Wei army, as follows:

Rating = 33 + 0.0011 * 25000 - 0.00007 * 25000
Rating = 33 + 27.5 - 1.75
Rating = 58.75
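
The same arithmetic can be carried out in R. This sketch hard-codes the rounded coefficients from our regression equation. The function name predictRating is hypothetical and is not part of this chapter's code.

```r
# hypothetical helper that applies the rounded regression equation from the text
predictRating <- function(shuSoldiers, weiSoldiers) {
  33 + 0.0011 * shuSoldiers - 0.00007 * weiSoldiers
}

# predicted rating for 25,000 Shu soldiers against 25,000 Wei soldiers
predictedRating <- predictRating(25000, 25000)
predictedRating
```

With the fitted model itself, predict(lmHeadToHeadRating_ShuWeiSoldiers, newdata = data.frame(ShuSoldiersEngaged = 25000, WeiSoldiersEngaged = 25000)) should give a similar result from the unrounded coefficients, assuming the model formula refers to its columns by name (Rating ~ ShuSoldiersEngaged + WeiSoldiersEngaged) rather than with the subsetHeadToHead$ prefix.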

Recall that our Rating variable ranges from 0 to 100 and that our past victories have achieved ratings of 80 or higher. Our predicted rating of about 59 suggests that the Shu army would likely not be victorious in this hypothetical conflict. However, also recall that our model explains only 51% of the variance in head to head battle performance. Furthermore, our initial inference at the beginning of this chapter revealed that the Wei forces tend to enter a given battle with many more soldiers than the Shu, so an evenly matched battle may be unlike the battles in our data. For these reasons, our model, as well as our hypothetical example, may not have sufficient practical relevance.

Explaining model differences

The increase in R-squared from our simple regression model to our multiple regression model can be attributed to the new model including more information that is relevant to predicting head to head battle performance. Our multiple regression model factors in the size of both armies when determining the Shu army's rating. Since the Shu army's ability to perform well depends to some extent on the opposing forces, including both armies yields a much stronger basis for prediction than the single army approach that our original model took.
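
One caveat when comparing the two models: plain R-squared never decreases when a predictor is added, so it helps to also look at adjusted R-squared or at an F-test between the nested models. The following is a minimal sketch on randomly generated stand-in data. The names exampleBattles, exampleSimple, and exampleMultiple are hypothetical, and the simple model is assumed here to use ShuSoldiersEngaged alone.

```r
# hypothetical stand-in data, generated at random for illustration only
set.seed(7)
exampleBattles <- data.frame(
  ShuSoldiersEngaged = runif(40, min = 5000, max = 50000),
  WeiSoldiersEngaged = runif(40, min = 5000, max = 100000)
)
exampleBattles$Rating <- 33 +
  0.0011 * exampleBattles$ShuSoldiersEngaged -
  0.00007 * exampleBattles$WeiSoldiersEngaged +
  rnorm(40, mean = 0, sd = 10)

# simple model (one predictor) and multiple model (two predictors)
exampleSimple <- lm(Rating ~ ShuSoldiersEngaged, data = exampleBattles)
exampleMultiple <- lm(Rating ~ ShuSoldiersEngaged + WeiSoldiersEngaged,
                      data = exampleBattles)

# plain R-squared: the multiple model can only match or exceed the simple one
rSquaredComparison <- c(
  simple = summary(exampleSimple)$r.squared,
  multiple = summary(exampleMultiple)$r.squared
)

# adjusted R-squared penalizes each additional predictor
adjustedRSquaredComparison <- c(
  simple = summary(exampleSimple)$adj.r.squared,
  multiple = summary(exampleMultiple)$adj.r.squared
)

# F-test of whether adding WeiSoldiersEngaged significantly improves the fit
exampleComparison <- anova(exampleSimple, exampleMultiple)
exampleComparison
```

If the adjusted R-squared rises and the F-test is significant, the added predictor is contributing more than we would expect from an irrelevant variable.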

The key to developing useful predictive regression models is to include only the most relevant data. While 51% is a large improvement in predictive power over our preceding model, it still may not be enough to make us confident in making critical strategy decisions for the Shu army. Certainly, we are encouraged to explore the full range of our data before settling on a particular model.

Pop quiz

  1. Which of the following is most likely to increase the statistical significance of a multiple regression model?

    a. Including more independent variables.

    b. Including fewer independent variables.

    c. Including more relevant and fewer irrelevant independent variables.

    d. Including more irrelevant and fewer relevant independent variables.

  2. Which of the following is most likely to increase the practical significance of a multiple regression model?

    a. Including more independent variables.

    b. Including fewer independent variables.

    c. Including more relevant and fewer irrelevant independent variables.

    d. Including more irrelevant and fewer relevant independent variables.

Have a go hero

Create a new simple linear regression model that uses DurationInDays alone to predict the Shu army's performance in a head to head conflict. Then create two new multiple linear regression models that expand upon the previous model by incorporating ShuSoldiersEngaged and WeiSoldiersEngaged respectively. Generate and interpret the model summaries. Once complete, you should have three new regression models:

  • lmHeadToHeadRating_Duration
  • lmHeadToHeadRating_DurationShuSoldiers
  • lmHeadToHeadRating_DurationSoldiers

You should also have three accompanying summaries saved in your R workspace. What do these models tell you about the importance of the duration of battle in predicting the outcome of head to head conflicts?
