At this point, we have clean datasets and a decent understanding of our data fields. Now it's time to put the data and knowledge to use! In this section, we will build offensive and defensive indexes out of several of the statistics just analyzed.
Indexes are descriptive statistics that combine information from multiple data fields to give an observer a sense of what is going on without the observer needing to drill down into the components of the index. Sticking with the professional football theme, a quarterback is assigned a passer rating that is designed to communicate his passing ability (relative to other quarterbacks), without someone having to drill down into his completion percentage, yards per completion, touchdowns, and so on.
In this section, we will use the underlying team-level statistics to construct indexes for the offensive and defensive strengths of each team. The offensive strength index will depend on each team's passing and rushing strength, and the defensive strength index will depend on each team's ability to defend against the pass and the rush. This will allow us to compare the different aspects of each team's game to those of other teams and will let us arrive at a winner and a loser in our simulated games later on.
The recipes in this chapter are cumulative. If you completed the previous recipes, you should have everything you need to continue.
Perform the following steps to construct the offensive and defensive strength indexes:
Start with the passing offense, using the PassYds/G (average passing yards per game) field. The higher this number, the stronger the team's passing game:
offense$OPassStrength <- max(offense[,5])-offense[,5]
offense$OPassStrength <- (1-(offense$OPassStrength/max(offense$OPassStrength)))*100
First, we calculated the difference between each team's value and that of the team with the most passing yards per game. Then, we divided each difference by the largest difference to normalize it, subtracted the result from one (since the larger the difference from the maximum, the weaker the team's passing game), and multiplied by 100 so that we end up with values between 0 and 100.
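To see the normalization on small numbers, here is a minimal sketch using a made-up vector of passing yards per game. The values and the names pass_yds, gap, and strength are illustrative assumptions and do not come from the dataset:

```r
# Hypothetical passing yards per game for four teams (illustrative values only)
pass_yds <- c(300, 250, 200, 100)

# Step 1: gap between each team and the team with the most passing yards
gap <- max(pass_yds) - pass_yds

# Step 2: scale by the largest gap, subtract from one, and express out of 100
strength <- (1 - gap / max(gap)) * 100

print(strength)  # 100 75 50 0
```

The team with the most yards scores 100 and the team with the fewest scores 0. Algebraically, this is the same as min-max scaling, (x - min(x)) / (max(x) - min(x)) * 100.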
Apply the same calculation to the RushYds/G (average rushing yards per game) field to get the rushing strength:
offense$ORushStrength <- max(offense[,6])-offense[,6]
offense$ORushStrength <- (1-(offense$ORushStrength/max(offense$ORushStrength)))*100
offense$OPPGStrength <- max(offense[,3])-offense[,3]
offense$OPPGStrength <- (1-(offense$OPPGStrength/max(offense$OPPGStrength)))*100
offense$OYPGStrength <- max(offense[,4])-offense[,4]
offense$OYPGStrength <- (1-(offense$OYPGStrength/max(offense$OYPGStrength)))*100
offense$OffStrength <- (offense$OPassStrength+offense$ORushStrength+offense$OPPGStrength+offense$OYPGStrength)/4
In this last line of code, we simply took the average of each of the index values we calculated previously to come up with the offensive strength index.
defense$DPassStrength <- max(defense[,6])-defense[,6]
defense$DPassStrength <- defense$DPassStrength/max(defense$DPassStrength)*100
defense$DRushStrength <- max(defense[,5])-defense[,5]
defense$DRushStrength <- defense$DRushStrength/max(defense$DRushStrength)*100
defense$DPPGStrength <- max(defense[,3])-defense[,3]
defense$DPPGStrength <- defense$DPPGStrength/max(defense$DPPGStrength)*100
defense$DYPGStrength <- max(defense[,4])-defense[,4]
defense$DYPGStrength <- defense$DYPGStrength/max(defense$DYPGStrength)*100
defense$DefStrength <- (defense$DPassStrength+defense$DRushStrength+defense$DPPGStrength+defense$DYPGStrength)/4
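The defensive calculations omit the "subtract from one" step. A plausible reading is that for a defense, fewer yards or points allowed is better, so the team furthest below the maximum should score highest. The following is a minimal sketch of both directions, using a hypothetical helper named scale_index and made-up values, neither of which appears in the recipe itself:

```r
# Hypothetical helper (not part of the recipe): rescale a numeric vector to 0-100.
# higher_is_better = TRUE  -> the largest value scores 100 (offensive statistics)
# higher_is_better = FALSE -> the smallest value scores 100 (defensive statistics)
scale_index <- function(x, higher_is_better = TRUE) {
  gap <- max(x) - x                  # distance below the maximum
  scaled <- gap / max(gap) * 100     # 0 for the maximum, 100 for the minimum
  if (higher_is_better) 100 - scaled else scaled
}

# Illustrative yards allowed per game for four defenses (made-up values)
yds_allowed <- c(180, 220, 260, 280)

def_index <- scale_index(yds_allowed, higher_is_better = FALSE)
off_index <- scale_index(yds_allowed, higher_is_better = TRUE)

print(def_index)  # 100 60 20 0: the defense allowing the fewest yards scores 100
print(off_index)  # 0 40 80 100: the same numbers scored as if they were offense
```

Note that this sketch, like the recipe's code, divides by the largest gap, so it returns NaN if every team has the same value.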
As mentioned previously, the purpose of indexes is to simplify and standardize the underlying statistics so that they can easily be interpreted and compared, and this is essentially what we did in this recipe. We boiled down several of the offensive and defensive statistics to a single value for each team.
We kept the examples relatively simple for illustrative purposes, but you can incorporate many more statistics into the index values. We also aggregated the individual indexes as simply as possible, by taking the average of the four that we calculated. A more sophisticated approach is to weight each index according to how important you consider it to be. For example, if you wanted to weight the offense's ability to score the highest, followed by the passing strength, the ability to gain yards, and the rushing strength, in that order, then instead of simply dividing by 4, you could assign the weights as follows:
offense$OffStrength <- (offense$OPPGStrength * 0.4) + (offense$OPassStrength * 0.25) + (offense$OYPGStrength * 0.2) + (offense$ORushStrength * 0.15)
This way, the values that you believe to be more important contribute more to the overall offensive or defensive index, while less important values still count toward it.
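As a small illustration of the weighting idea, the sketch below applies the weights 0.4, 0.25, 0.2, and 0.15 to a made-up data frame. The data frame offense_demo, its values, and the weight vector w are assumptions for demonstration only:

```r
# Illustrative component indexes for three teams (made-up values)
offense_demo <- data.frame(
  OPPGStrength  = c(100, 60, 0),
  OPassStrength = c(80, 100, 0),
  OYPGStrength  = c(90, 100, 0),
  ORushStrength = c(0, 50, 100)
)

# Weights for each component, named after the columns they apply to
w <- c(OPPGStrength = 0.4, OPassStrength = 0.25,
       OYPGStrength = 0.2, ORushStrength = 0.15)

# The weights should sum to 1 so that the result stays on the 0-100 scale
stopifnot(isTRUE(all.equal(sum(w), 1)))

# Weighted sum of the component columns for each team
offense_demo$OffStrength <- as.numeric(as.matrix(offense_demo[, names(w)]) %*% w)

print(offense_demo$OffStrength)  # 78.0 76.5 15.0
```

Keeping the weights in a named vector makes it easy to check that they sum to 1 and to adjust them in one place.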