How to do it... 

In this recipe, we will attempt to predict FC Barcelona's goals throughout the 2017/18 season, using some covariates and the club's previous goals. This exposes how difficult it is to model sports results: even with several important covariates, we won't do any better than simply using FC Barcelona's average goals per game for that season (2.60):

  1. First, we load the dataset and create a dummy variable for home/away. We convert the dates so that we can compute the number of days between two consecutive games (we suspect that more days between games has a positive effect on the number of goals):
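library("tscount")  # tsglm() for count time series
library(dummy)
library(dplyr)      # mutate(), lag() and %>%
# Read the space-separated match data
data = read.table("./E1.txt", sep = " ", header = TRUE)
# Dummy variable: 1 = home game, 0 = away game
data$home_away = ifelse(data$ha == "H", 1, 0)
# Parse the dates and compute the days elapsed since the previous game
data$date = as.Date(data$date, format = "%m/%d/%Y")
data = data %>% mutate(diff_days = as.numeric(date - lag(date)))
# Replace missing values (including the first game's diff_days) with 0
data[is.na(data)] = 0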
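
To make the date handling concrete, here is a minimal sketch on a hypothetical three-game data frame (the dates and the `toy` object are invented for illustration and are not taken from `E1.txt`). It uses base R's `diff()` instead of dplyr's `lag()`, which should give the same day differences:

```r
# Hypothetical toy data: three games with dates in month/day/year format
toy <- data.frame(date = c("08/20/2017", "08/26/2017", "09/09/2017"),
                  ha   = c("H", "A", "H"),
                  stringsAsFactors = FALSE)

# Dummy variable: 1 = home game, 0 = away game
toy$home_away <- ifelse(toy$ha == "H", 1, 0)

# Parse the strings into Date objects
toy$date <- as.Date(toy$date, format = "%m/%d/%Y")

# Days since the previous game; the first game has no predecessor, so we use 0
toy$diff_days <- c(0, as.numeric(diff(toy$date)))

print(toy)
```

Setting the first difference to 0 mirrors what the recipe does when it replaces missing values with 0.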
  1. We formulate our model using the following variables: whether Barcelona was playing at home or away; the opposing team's position in the league; the difference in days since the previous game; and whether there was a Champions League game after that game (we suspect that the best players are not fielded in such games) or before that game (players can get injured in Champions League games). We specify lags 1, 2, and 3 of the dependent variable. The idea is the following: after a game with lots of goals, we would expect the next game to also see lots of goals (a positive autocorrelation). But the results are not encouraging: all the coefficients have large standard errors, so the confidence intervals in the last columns of the output all include zero. The beta coefficients relate to the lagged values of the number of goals (three lags, three coefficients). The eta coefficients relate to the regressors: for example, eta_1 corresponds to whether the game was played at home or away. If its 95% confidence interval did not include zero, we would be able to say that playing at home makes the team more likely to score, since the coefficient is positive. Because none of the coefficients is significant, we can't really make any inference here:
seatbeltsfit <- tsglm(data$Goals, model = list(past_obs = 1:3),
                      link = "log", distr = "poisson",
                      xreg = cbind(data$home_away, data$pos, data$diff_days,
                                   data$champions_next_days_after, data$champions_next_days_before))
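
For reference, the model fitted by this call can be written out as follows. This is our reading of the log-linear specification described in the tscount documentation for `link = "log"` with three lagged observations and no lagged conditional means; the symbols $Y_t$ (goals in game $t$), $\lambda_t$ (conditional mean), and $X_{t,r}$ (the $r$-th regressor) are notation we introduce here:

```latex
Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t), \qquad
\log \lambda_t = \beta_0
  + \sum_{k=1}^{3} \beta_k \,\log\!\left(Y_{t-k} + 1\right)
  + \sum_{r=1}^{5} \eta_r \, X_{t,r}
```

Under this form, a one-unit increase in regressor $r$ multiplies the expected number of goals by $\exp(\eta_r)$, which is why a positive eta_1 would point to more goals at home if its interval excluded zero.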

The following output shows the estimated model coefficients:

  1. Let's assume we want to predict the next Barcelona game, played at home against a team that ranks seventh in the domestic league, with a Champions League game before and after it. We get 1.86 goals, and for the same game played away we get 1.43 (which makes sense, because teams tend to score more when playing at home). And what happens if that away game is against the very same team, but with no Champions League game before or after it? Then, we get 1.76 goals. It is important to realize that these are conditional mean predictions for a Poisson distribution (the lambda parameter), so they are not whole numbers and should be rounded if we want to present them as goal counts. Here, we are just taking the point prediction, but we also get prediction intervals that we can use (all of them are bounded below by zero). If we wanted to predict the next two games, we would need to pass a matrix with two rows (remember that the predicted values at t+k are reused to compute the predicted ones at t+k+1):
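# Home game, opponent ranked 7th, 10 days since the last game, both Champions League flags set
J = matrix(c(1,7,10,1,1), nrow = 1)
predict(seatbeltsfit, n.ahead = 1, level = 0.9, global = TRUE, B = 2000, newxreg = J)$pred
# Same game, played away
J = matrix(c(0,7,10,1,1), nrow = 1)
predict(seatbeltsfit, n.ahead = 1, level = 0.9, global = TRUE, B = 2000, newxreg = J)$pred
# Away game with no Champions League game before or after
J = matrix(c(0,7,10,0,0), nrow = 1)
predict(seatbeltsfit, n.ahead = 1, level = 0.9, global = TRUE, B = 2000, newxreg = J)$pred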
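
Because each prediction is the lambda of a Poisson distribution, we can turn it into probabilities for each goal count rather than just rounding it. The following is a minimal sketch that takes the 1.86 value quoted above as given; it relies only on the Poisson assumption and ignores the uncertainty in the estimated coefficients, so it is not a substitute for the intervals returned by `predict()`:

```r
# Conditional mean quoted in the text for the home game
lambda_home <- 1.86

# Probability of scoring 0, 1, ..., 5 goals under Poisson(lambda_home)
goal_probs <- dpois(0:5, lambda = lambda_home)
names(goal_probs) <- 0:5
print(round(goal_probs, 3))

# Most likely goal count (the mode of the distribution)
most_likely <- as.integer(names(goal_probs)[which.max(goal_probs)])

# Central 90% range of goal counts implied by the Poisson distribution alone
range_90 <- qpois(c(0.05, 0.95), lambda = lambda_home)

print(most_likely)
print(range_90)
```

Note that the most likely goal count and the rounded mean do not have to coincide, which is worth keeping in mind when presenting a rounded prediction.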
  1. Finally, we can plot the fitted versus the actual values. As we would expect, the fit is very poor: the model is unable to predict the games in which Barcelona scores more than two goals:
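library(ggplot2)
# Observed goals and rounded fitted values from the model fitted above, indexed by game number
frame = data.frame(game = seq_along(seatbeltsfit$response), true_vals = as.numeric(seatbeltsfit$response), fit = round(as.numeric(seatbeltsfit$fitted.values)))
ggplot(frame, aes(x = game)) + geom_line(aes(y = true_vals, colour = "Observed goals")) + geom_line(aes(y = fit, colour = "Predicted goals"))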
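
To check the claim that the model does no better than the season average, we can compare the errors of the fitted values with those of a constant prediction equal to the mean. The helper below, `compare_to_mean()`, is a hypothetical function written for this sketch (it is not part of tscount), and the toy vectors are invented purely to show how it is called:

```r
# Compare fitted values against a constant mean baseline.
# observed: vector of actual goal counts
# fitted:   vector of model predictions for the same games
compare_to_mean <- function(observed, fitted) {
  stopifnot(length(observed) == length(fitted))
  baseline <- rep(mean(observed), length(observed))
  rmse <- function(pred) sqrt(mean((observed - pred)^2))
  mae  <- function(pred) mean(abs(observed - pred))
  data.frame(predictor = c("model", "mean baseline"),
             rmse = c(rmse(fitted), rmse(baseline)),
             mae  = c(mae(fitted),  mae(baseline)))
}

# Invented toy values, only to demonstrate the call
toy_observed <- c(2, 0, 5, 1, 3)
toy_fitted   <- c(2.5, 2.4, 2.7, 2.6, 2.5)
result <- compare_to_mean(toy_observed, toy_fitted)
print(result)
```

With the model from this recipe, the call would be `compare_to_mean(as.numeric(seatbeltsfit$response), as.numeric(seatbeltsfit$fitted.values))`. Keep in mind that this is an in-sample comparison, so it tends to flatter the model relative to the baseline.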

The following output shows the predicted and observed goals for FC Barcelona:
