Time for action - comparing and choosing models

At the moment, we have several models that attempt to predict the performance rating of the Shu army in head to head battles based on the duration and number of soldiers engaged in that battle. Yet, we do not have answers regarding which model is best and the relative contribution that each model makes above and beyond the preceding models.

We can use the process of hierarchical linear regression (HLR) to compare our models. Let us look at how HLR can be used to compare the three models that we have made thus far:

  1. Display a summary of each model:
    > #use HLR to compare different models
    > #first consider the models individually
    > #simple regression model using duration to predict battle rating
    > lmHeadToHeadRating_Duration_Summary
    
  2. This should produce a result as shown in the following screenshot:
    [Screenshot: summary of the simple regression model]
    > #multiple regression model using duration, Shu soldiers, and Wei soldiers to predict battle rating
    > lmHeadToHeadRating_DurationSoldiers_Summary
    
    • This should give a summary similar to the following:
      [Screenshot: summary of the multiple regression model]
      > #interaction model using duration, Shu soldiers, Wei soldiers, and the interaction between Shu and Wei soldiers to predict battle rating
      > lmHeadToHeadRating_DurationSoldiersShuWeiInteraction_Summary
      
    • This produces the following summary:
      [Screenshot: summary of the interaction model]
  3. Use anova(object, ...) to compare the relative contribution of each model:
    > #use anova(object, ...) to compare the relative contribution of multiple models
    > #compare the three head to head combat models using ANOVA
    > anovaHeadToHeadRatingModelComparison <-
        anova(lmHeadToHeadRating_Duration,
              lmHeadToHeadRating_DurationSoldiers,
              lmHeadToHeadRating_DurationSoldiersShuWeiInteraction)
    
  4. Display the anova results in the R console:
    > #display the anova results
    > anovaHeadToHeadRatingModelComparison

    [Screenshot: ANOVA table comparing the three models]

What just happened?

You have the data that you need to complete a hierarchical linear regression (HLR) analysis. To be thorough, you should consider both the individual models (summaries) and the relative contribution of each model (ANOVA).

Interpreting the model summaries

You are already familiar with interpreting model summaries. These are the best places to start when conducting an HLR analysis. You can check the summaries to see if each overall model and its coefficients are statistically significant. You should also take note of each model's R-squared value.

Our simple regression model is statistically significant on all accounts and has a respectable R-squared value of 77%. Likewise, all of the variables in our multiple regression model, as well as the model itself, are statistically significant, and the model has an R-squared value of 86%. Our interaction model is also statistically significant, with an R-squared of 87%, but two of its predictor variables are not statistically significant. Although these summaries tell us a great deal about the individual merits of each model, it is best to make a decision only after considering the results of an ANOVA test.
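The quantities discussed above can also be pulled out of a summary object directly, which makes side-by-side comparison easier. The following is a minimal sketch that assumes simulated stand-in data: the data frame `battles`, its column names, and the model names are hypothetical and are not the chapter's actual objects.

```r
# Simulated stand-in data; names and values are hypothetical,
# not the chapter's battle data set.
set.seed(42)
n <- 60
battles <- data.frame(
  duration    = runif(n, 10, 100),
  shuSoldiers = runif(n, 1000, 5000),
  weiSoldiers = runif(n, 1000, 5000)
)
battles$rating <- 20 + 0.5 * battles$duration +
  0.01 * battles$shuSoldiers - 0.008 * battles$weiSoldiers +
  rnorm(n, sd = 5)

# A simple model and a multiple regression model
simpleModel   <- lm(rating ~ duration, data = battles)
multipleModel <- lm(rating ~ duration + shuSoldiers + weiSoldiers,
                    data = battles)

simpleSummary   <- summary(simpleModel)
multipleSummary <- summary(multipleModel)

# R-squared value of each model
simpleSummary$r.squared
multipleSummary$r.squared

# Coefficient table: estimate, standard error, t value, and p-value
coef(multipleSummary)

# p-value of the overall model, computed from its F statistic
fstat <- multipleSummary$fstatistic
overallP <- unname(pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                      lower.tail = FALSE))
overallP
```

Reading these values programmatically is optional; the printed summaries contain the same information.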

Interpreting the ANOVA results

In general, analysis of variance (ANOVA) is a statistical procedure that compares the means of multiple groups and determines whether they differ significantly from one another. In HLR, ANOVA is used to compare nested regression models: it determines whether the coefficient(s) that each successive model adds to the overall regression equation make a statistically significant contribution above and beyond the coefficients that preceded them.
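To make the comparison concrete, the F test that anova() reports for two nested linear models can be reproduced by hand from their residual sums of squares. The sketch below uses simulated data, and all variable names in it are hypothetical.

```r
# Simulated data; x2 has a real effect on y by construction
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 1.5 * x2 + rnorm(n)

modelA <- lm(y ~ x1)        # smaller model
modelB <- lm(y ~ x1 + x2)   # adds one predictor

# Residual sums of squares and residual degrees of freedom
rssA <- sum(residuals(modelA)^2)
rssB <- sum(residuals(modelB)^2)
dfA  <- df.residual(modelA)
dfB  <- df.residual(modelB)

# F statistic: reduction in RSS per added coefficient,
# relative to the residual variance of the larger model
fManual <- ((rssA - rssB) / (dfA - dfB)) / (rssB / dfB)
pManual <- pf(fManual, dfA - dfB, dfB, lower.tail = FALSE)

# The same comparison through anova()
comparison <- anova(modelA, modelB)
comparison$F[2]
comparison$`Pr(>F)`[2]
```

With exactly two models, the hand-computed values should match the second row of the ANOVA table. When more than two models are passed, anova() takes the residual variance from the largest model, so a pairwise calculation like this one can differ slightly.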

Consider the following three models:

A: Y = X1
B: Y = X1 + X2
C: Y = X1 + X2 + X3

The difference between successive models is that each one adds a new predictor to the regression equation. Model B adds X2 to model A, whereas model C adds X3 to model B. ANOVA determines whether these successive additions are statistically significant. For instance, if model B were found to be statistically significant through ANOVA, then including X2 in the regression model is likely to add value. Conversely, if model C were not found to be statistically significant, then including X3 probably does not add much value, and X3 is a candidate for removal. By comparing successive models in this manner, we are able to determine, in a statistical sense, whether our coefficients are adding value to the overall model. Thus, our decisions to include valuable coefficients and eliminate excess ones are informed.
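The A, B, C scenario can be tried out on simulated data. In this sketch, which is an illustration under stated assumptions rather than part of the chapter's analysis, X2 is given a real effect on Y while X3 is pure noise, so the step from A to B should come out significant and the step from B to C usually will not.

```r
# Simulated predictors; X3 is unrelated to Y by construction
set.seed(123)
n  <- 100
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- rnorm(n)
Y  <- 3 + 2 * X1 + 1.5 * X2 + rnorm(n)

# Three nested models, each adding one predictor
A <- lm(Y ~ X1)
B <- lm(Y ~ X1 + X2)
C <- lm(Y ~ X1 + X2 + X3)

# Row 2 tests B against A; row 3 tests C against B
modelComparison <- anova(A, B, C)
modelComparison
```

Because X3 is random, it can occasionally appear significant by chance, which is one reason to weigh practical reasoning alongside the p-values.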

Of course, we have to be mindful of practical significance at all times. When selecting independent variables for our model, we should use our understanding of the data and the situation to select only the best predictors. Although we could, it is inappropriate to haphazardly test numerous arbitrary combinations of variables in an attempt to find the supposed best statistical model. In fact, partaking in such practice is likely to lead to a model that is both meaningless in a practical sense and incapable of predicting valid answers to the questions that motivated the use of regression modeling in the first place. Therefore, always keep in mind the practical implications of every statistical analysis.

anova(object, ...)

R's anova(object, ...) is a variable-argument function that can be used to conduct ANOVA on several objects. Each object of comparison is entered into the function as its own argument. For example:

anova(A, B, C)

Here, we are telling R to compare three objects (A, B, and C) using ANOVA.

The anova(object, ...) function yields an ANOVA table, which details the results of the analysis. For the purposes of comparing successive models using HLR, we are only concerned with the p-values (the Pr(>F) column). The p-value beside each model indicates whether or not it is statistically significant above and beyond its preceding model. It does not, however, indicate the individual statistical significance of the model, which is why we also considered the individual model summaries.
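The ANOVA table is a data frame, so its p-values can be read programmatically as well as by eye. This short sketch uses R's built-in mtcars data set instead of the chapter's battle data, and the model names are hypothetical. For linear models, the p-value column is named Pr(>F).

```r
# Three nested models on the built-in mtcars data set
first  <- lm(mpg ~ wt, data = mtcars)
second <- lm(mpg ~ wt + hp, data = mtcars)
third  <- lm(mpg ~ wt + hp + qsec, data = mtcars)

anovaTable <- anova(first, second, third)

# Column names of the table; the last one holds the p-values
colnames(anovaTable)

# One p-value per model; the first is NA because the first
# model has no preceding model to be compared against
pValues <- anovaTable$`Pr(>F)`
names(pValues) <- c("first", "second vs first", "third vs second")
pValues

# Flag the steps that are significant at the 0.05 level
significantStep <- !is.na(pValues) & pValues < 0.05
significantStep
```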


The ANOVA table from our activity indicates that our multiple regression model is statistically significant above and beyond our simple regression model. However, our interaction model does not make a statistically significant contribution above and beyond our multiple regression model. This suggests, from a statistical standpoint, that our interaction coefficient should be removed. Recall that we did not formulate a logical basis for the interaction between the number of Shu and Wei soldiers engaged in head to head combat. Without a statistical or practical reason to include the interaction coefficient, it is best removed from the model. In other words, our HLR analysis suggests that, out of the models that we analyzed, the multiple regression model is best.

Pop quiz

  1. Which of the following best explains the meaning of a statistically significant result in an ANOVA table generated during an HLR analysis?

    a. The regression models' coefficients are statistically significant.

    b. The overall regression model is statistically significant.

    c. The contribution that the model makes is statistically significant.

    d. The contribution that the model makes above and beyond the preceding model is statistically significant.

Have a go hero

Using the techniques that we explored in this chapter, analyze the remaining battle methods (surround, ambush, and fire) and create regression models for each that predict the performance rating of the Shu army. Be sure to use your practical knowledge of the combat strategies to choose appropriate predictors for your regression models. Once you have found a few reasonably predictive models for each method, use HLR to compare them. Ultimately, come to a statistically and practically justifiable conclusion about the best regression model to use for each battle method. Remember to save your R workspace and console text to preserve the content that you created during this chapter.
