Constructing indexes to measure offensive and defensive strength

At this point, we have clean datasets and a decent understanding of our data fields. Now it's time to put the data and knowledge to use! In this section, we will build offensive and defensive indexes out of several of the statistics just analyzed.

Indexes are descriptive statistics that combine information from multiple data fields to give an observer a sense of what is going on without the observer needing to drill down into the components of the index. Sticking with the professional football theme, a quarterback is assigned a passer rating that is designed to communicate his passing ability (relative to other quarterbacks), without someone having to drill down into his completion percentage, yards per completion, touchdowns, and so on.

In this section, we will use the underlying team-level statistics to construct indexes for the offensive and defensive strengths of each team. The offensive strength index will depend on each team's passing and rushing strength, and the defensive strength index will depend on each team's ability to defend against the pass and the rush. This will allow us to compare the different aspects of each team's game with those of other teams and will let us arrive at a winner and a loser in our simulated games later on.

Getting ready

The recipes in this chapter are cumulative. If you completed the previous recipes, you should have everything you need to continue.

How to do it…

Perform the following steps to construct the offensive and defensive strength indexes:

  1. The first thing that we will do in this section is calculate an offensive passing strength score. The most useful field we have to inform us about passing strength is the PassYds/G (average passing yards per game) field. The higher this number, the stronger the team's passing game:
    offense$OPassStrength <- max(offense[,5])-offense[,5]
    offense$OPassStrength <- (1-(offense$OPassStrength/max(offense$OPassStrength)))*100
    

    First, we calculated the difference between each team and the team with the most passing yards per game. Then, we divided each difference by the largest difference to normalize it, subtracted the result from one (since the higher the difference from the max, the worse the team's passing game), and multiplied by 100 so that we end up with values between 0 and 100.

    Note

    Due to the way we normalized this, the team with the strongest statistic will always get a 100, and the team with the weakest statistic will always get a 0. Had we simply divided each team's value by the maximum value, our index would not have this characteristic.

  2. Next, we will do the exact same thing for offensive rushing strength. The field we will use to calculate this is RushYds/G (average rushing yards per game):
    offense$ORushStrength <- max(offense[,6])-offense[,6]
    offense$ORushStrength <- (1-(offense$ORushStrength/max(offense$ORushStrength)))*100
    
  3. Let's calculate index values for a couple more fields before aggregating them into a single offensive strength value. For example, let's choose points and yards per game:
    offense$OPPGStrength <- max(offense[,3])-offense[,3]
    offense$OPPGStrength <- (1-(offense$OPPGStrength/max(offense$OPPGStrength)))*100
    
    offense$OYPGStrength <- max(offense[,4])-offense[,4]
    offense$OYPGStrength <- (1-(offense$OYPGStrength/max(offense$OYPGStrength)))*100
    
    offense$OffStrength <- (offense$OPassStrength+offense$ORushStrength+offense$OPPGStrength+offense$OYPGStrength)/4
    

    In this last line of code, we simply took the average of the four index values we calculated previously to come up with the offensive strength index.

  4. We will now follow the same steps for our defense dataset, starting with a passing defense strength index calculated from the passing yards allowed per game:
    defense$DPassStrength <- max(defense[,6])-defense[,6]
    defense$DPassStrength <- defense$DPassStrength/max(defense$DPassStrength)*100
    
  5. Next, we'll do the same thing with rushing defense strength:
    defense$DRushStrength <- max(defense[,5])-defense[,5]
    defense$DRushStrength <- defense$DRushStrength/max(defense$DRushStrength)*100
    
  6. As with offense, we will calculate indexes using points allowed per game and total yards allowed per game before averaging all four to arrive at an overall defensive strength index:
    defense$DPPGStrength <- max(defense[,3])-defense[,3]
    defense$DPPGStrength <- defense$DPPGStrength/max(defense$DPPGStrength)*100
    
    defense$DYPGStrength <- max(defense[,4])-defense[,4]
    defense$DYPGStrength <- defense$DYPGStrength/max(defense$DYPGStrength)*100
    
    defense$DefStrength <- (defense$DPassStrength+defense$DRushStrength+defense$DPPGStrength+defense$DYPGStrength)/4
    

    Note

    One difference to note between the offense and defense calculations is that we are not subtracting from 1 in the second step of each set of formulas. This is because for defense, lower numbers indicate more strength, whereas for offense, higher numbers do.
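
The normalization used in the steps above can be wrapped in a small helper. The following is a minimal sketch and not part of the original recipe: the function name `strength_index`, its `higher_is_better` argument, and the toy yardage values are assumptions made purely for illustration. It reproduces both the offense and the defense formulas and shows why simply dividing by the maximum would not pin the weakest team to 0.

```r
# Hypothetical helper (not from the original recipe): rescales a statistic
# to 0-100 so that the strongest team gets 100 and the weakest gets 0.
strength_index <- function(x, higher_is_better = TRUE) {
  gap <- max(x) - x                # distance of each team from the maximum
  scaled <- gap / max(gap) * 100   # 0 for the largest value, 100 for the smallest
  if (higher_is_better) 100 - scaled else scaled
}

# Toy per-game yardage for four made-up teams
yards <- c(300, 250, 225, 200)

# Offense: more yards gained is better
offense_index <- strength_index(yards, higher_is_better = TRUE)

# Defense: fewer yards allowed is better
defense_index <- strength_index(yards, higher_is_better = FALSE)

# Naive alternative: divide by the maximum only
naive_index <- yards / max(yards) * 100

print(offense_index)
print(defense_index)
print(naive_index)
```

With the naive version, the weakest team still scores well above 0, so the scale is not anchored at both ends. Note that if every team had the same value, `max(gap)` would be 0 and the division would yield `NaN`, so real data may need a guard for that case.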

How it works…

As mentioned previously, the purpose of indexes is to simplify and standardize the underlying statistics so that they can easily be interpreted and compared, and this is essentially what we did in this recipe. We boiled down several of the offensive and defensive statistics to a single value for each team.
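
Once each team has a single index value, comparing teams comes down to sorting on one column. The snippet below is a small sketch of that idea; the data frame `toy_offense`, the team names, and the index values are made up for illustration and do not come from the recipe's dataset.

```r
# Made-up teams and index values, for illustration only
toy_offense <- data.frame(
  Team = c("Team A", "Team B", "Team C"),
  OffStrength = c(42.5, 88.1, 63.0)
)

# Sort from strongest to weakest offense
ranked <- toy_offense[order(toy_offense$OffStrength, decreasing = TRUE), ]
print(ranked)
```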

We kept the examples relatively simple for illustrative purposes, but you can incorporate many more figures into the index values. We also kept the aggregation of the individual indexes as simple as possible, choosing to just take the average of the four that we calculated. A more sophisticated approach is to weight each index according to how important you consider it to be. For example, if you wanted to give the most weight to the offensive ability to score, followed by the passing strength, the ability to gain yards, and the rushing strength, respectively, instead of simply dividing by 4, you could assign the weights as follows:

offense$OffStrength <- (offense$OPPGStrength * 0.4) + (offense$OPassStrength * 0.25) + (offense$OYPGStrength * 0.2) + (offense$ORushStrength * 0.15)

This way, the values that you believe to be more important will contribute more toward the overall offensive or defensive index, while the less important values still count toward the result.
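
The weighted version can be checked on a small example. This is a sketch under assumptions: the `toy` data frame and its values are invented for illustration, and storing the weights in a named vector is just one convenient way to organize them. It uses the weights from the text and confirms that they sum to 1, which keeps the combined index on the same 0 to 100 scale as its components.

```r
# Toy component indexes for three made-up teams (values are illustrative only)
toy <- data.frame(
  OPPGStrength  = c(100, 60, 0),
  OPassStrength = c(80, 100, 0),
  OYPGStrength  = c(100, 70, 0),
  ORushStrength = c(0, 100, 50)
)

# Weights from the text, keyed by the column they apply to
weights <- c(OPPGStrength = 0.4, OPassStrength = 0.25,
             OYPGStrength = 0.2, ORushStrength = 0.15)

# Weights that sum to 1 keep the combined index between 0 and 100
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# A matrix product applies the weights to every row at once
toy$OffStrength <- as.vector(as.matrix(toy[, names(weights)]) %*% weights)
print(toy)
```

Keeping the weights in a named vector makes it easy to experiment with different weightings without rewriting the formula each time.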

See also
