Regression analysis is one of the most commonly used statistical methods, and it is covered in most undergraduate and graduate statistics courses. However, the method discussed in these courses is the standard multiple regression model with a single response variable. In this chapter, we will introduce multivariate time series regression models with several response variables and illustrate the methods with many examples.
In this chapter, we will discuss several different formulations of multivariate time series regression models. Multiple regression is one of the most commonly used statistical models, so we will start with its multivariate representation in the next section. Other extensions and representations will be introduced in Sections 3.3 and 3.4. They include a representation adapted from the vector autoregressive models, which we will refer to as vector time series regression models, and a further extension, the VARX model. We will discuss the similarities and differences among these extensions and representations.
In a multiple regression model, a response variable Y is related to k predictor variables, X1, X2, …, Xk, as follows:

Y = β0 + β1X1 + β2X2 + … + βkXk + ξ, (3.1)

where ξ is assumed to be uncorrelated white noise, often taken to be i.i.d. N(0, σ2). When time series data are used to fit a multiple regression model, we often write Eq. (3.1) as

Yt = β0 + β1X1,t + β2X2,t + … + βkXk,t + ξt, (3.2)
where t refers to time,
and in time series regression ξt is normally assumed to follow a time series model such as AR(p).
When we have time series data from time t = 1 to t = n, we can present Eq. (3.2) in the matrix form

Y = Xβ + ξ, (3.3)

where

Y = [Y1, Y2, …, Yn]′, β = [β0, β1, …, βk]′, ξ = [ξ1, ξ2, …, ξn]′,

X is the n × (k + 1) matrix whose tth row is [1, X1,t, X2,t, …, Xk,t], and ξ follows an n‐dimensional multivariate normal distribution N(0, Σ). Given Σ, the generalized least squares (GLS) estimator

β̂ = (X′Σ−1X)−1X′Σ−1Y

is known to be the best unbiased estimator in the sense that, for any constant vector c, the estimator c′β̂ has the smallest possible variance among all unbiased estimators of c′β that are linear in Y.
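When Σ is known, the GLS computation itself is only a few lines of linear algebra. The following numpy sketch illustrates it; the simulated data, the AR(1) error covariance, and all numerical settings are illustrative assumptions, not part of the text:

```python
import numpy as np

# Minimal sketch of the GLS estimator (X' Sigma^-1 X)^-1 X' Sigma^-1 Y,
# assuming Sigma is known.  Data are simulated purely for illustration.
rng = np.random.default_rng(0)
n, k = 200, 2

# Design matrix with an intercept column, as in the matrix form of the model
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])

# AR(1) errors imply Sigma with (i, j) entry phi^|i-j| / (1 - phi^2)
phi = 0.6
idx = np.arange(n)
Sigma = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)

# Draw correlated errors and form the response
xi = rng.multivariate_normal(np.zeros(n), Sigma)
Y = X @ beta_true + xi

# GLS estimate: solve (X' Sigma^-1 X) beta = X' Sigma^-1 Y
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
print(beta_gls)  # should be close to beta_true
```

Using `np.linalg.solve` rather than forming the inverse of X′Σ−1X explicitly is the standard numerically preferable choice.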
Now, suppose that instead of one response variable in Eq. (3.2), we have m response time series variables related to these k predictor time series variables, that is,

Yi,t = βi,0 + βi,1X1,t + βi,2X2,t + … + βi,kXk,t + ξi,t, i = 1, 2, …, m,

or, in matrix form,

Yt = β0 + βXt + ξt, (3.5)

where

Yt = [Y1,t, Y2,t, …, Ym,t]′, Xt = [X1,t, X2,t, …, Xk,t]′, ξt = [ξ1,t, ξ2,t, …, ξm,t]′,

and

β0 = [β1,0, β2,0, …, βm,0]′ is the m × 1 vector of intercepts and β = [βi,j] is the m × k matrix of regression coefficients.
For i = 1, 2, …, m and time t = 1 to t = n, let

Y(i) = [Yi,1, Yi,2, …, Yi,n]′, β(i) = [βi,0, βi,1, …, βi,k]′, and ξ(i) = [ξi,1, ξi,2, …, ξi,n]′.

The matrix form of the multiple regression for the ith response variable is

Y(i) = Xβ(i) + ξ(i), (3.6)

which, as expected, is exactly the same as Eq. (3.3). Putting all the multiple regressions for the m response variables together from t = 1 to t = n, we have

Y = Xβ + ξ, (3.7)

where

Y = [Y(1), Y(2), …, Y(m)] is the n × m matrix of responses and β = [β(1), β(2), …, β(m)] is the (k + 1) × m matrix of regression coefficients,

and

ξ = [ξ(1), ξ(2), …, ξ(m)] is the n × m matrix of errors.
Each ξ(i) follows an n‐dimensional multivariate normal distribution N(0, Σ(i)), i = 1, …, m, and ξ(i) and ξ(j) are uncorrelated for i ≠ j. We will call the model given in Eq. (3.7) the multivariate multiple time series regression model.
As noted in Eq. (3.6), the ith response Y(i) actually follows the general multiple time series regression model

Yi,t = βi,0 + βi,1X1,t + βi,2X2,t + … + βi,kXk,t + ξi,t,

or, in matrix form,

Y(i) = Xβ(i) + ξ(i),

where ξ(i) = [ξi,1, ξi,2, …, ξi,n]′ follows an n‐dimensional multivariate normal distribution N(0, Σ(i)). In time series regression, ξi,t is often assumed to follow a time series model such as AR(p). From the results of multiple regression, we know that when Σ(i) is known, the GLS estimator

β̂(i) = (X′Σ(i)−1X)−1X′Σ(i)−1Y(i)

is the best unbiased estimator.
Normally, we will not know the variance–covariance matrix Σ(i) of ξ(i). Even if ξi,t follows a time series model such as AR(p) or ARMA(p, q), the structure of Σ(i) is not known because the related time series model parameters are usually unknown. In this case, we use the following iterative GLS procedure suggested in Wei (2006, Chapter 15):

1. Fit the regression model by ordinary least squares (OLS) and compute the residuals.
2. Fit an appropriate time series model, such as AR(p), to the residuals.
3. Use the fitted time series model to form an estimate of the variance–covariance matrix Σ(i).
4. Refit the regression model by GLS using the estimated Σ(i).
5. Compute the residuals from the GLS model fitting in step 4, and repeat steps 1 through 4 until a convergence criterion (such as the maximum absolute change in the estimates between iterations becoming less than a specified quantity) is reached.
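The iterative procedure above can be sketched for a single response with AR(1) errors. In this numpy illustration, the simulated data, the AR(1) choice for the residual model, the maximum iteration count, and the convergence tolerance are all illustrative assumptions:

```python
import numpy as np

# Sketch of the iterative GLS procedure with AR(1) errors.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Simulate AR(1) errors xi_t = 0.7 xi_{t-1} + a_t and the response
xi = np.zeros(n)
a = rng.normal(size=n)
for t in range(1, n):
    xi[t] = 0.7 * xi[t - 1] + a[t]
Y = 1.0 + 2.0 * x + xi

beta = np.linalg.lstsq(X, Y, rcond=None)[0]                 # step 1: OLS fit
idx = np.arange(n)
for _ in range(50):
    resid = Y - X @ beta                                    # residuals
    phi = resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1])  # step 2: AR(1) fit
    Sigma = phi ** np.abs(idx[:, None] - idx[None, :])      # step 3: Sigma up to scale
    Si = np.linalg.inv(Sigma)
    beta_new = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)  # step 4: GLS fit
    if np.max(np.abs(beta_new - beta)) < 1e-8:              # step 5: convergence check
        beta = beta_new
        break
    beta = beta_new
print(beta, phi)
```

Note that Σ(i) only needs to be estimated up to a scalar multiple here, because the GLS estimator is unchanged when Σ(i) is rescaled.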
Combining the estimates for i = 1, …, m, we get

β̂ = [β̂(1), β̂(2), …, β̂(m)],

where

β̂(i) = (X′Σ̂(i)−1X)−1X′Σ̂(i)−1Y(i),

and the estimate Σ̂(i) of the variance–covariance matrix Σ(i) of ξ(i) is given by step 3 in the last GLS iteration.
It should be pointed out that although the error term in a time series regression model can be autocorrelated, it should be stationary. A nonstationary error structure can produce a spurious regression, in which a significant regression is obtained for totally unrelated series, as pointed out by Abraham and Ledolter (2006), Chatterjee, Hadi, and Price (2006), Draper and Smith (1998), Granger and Newbold (1986), and Phillips (1986).
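The spurious regression phenomenon is easy to reproduce by simulation. The following numpy sketch regresses one random walk on another, independently generated random walk; the sample size, number of replications, and the R2 threshold of 0.3 are illustrative choices:

```python
import numpy as np

# Spurious regression illustration: two independent random walks often
# produce a seemingly "strong" fit even though they are unrelated.
rng = np.random.default_rng(42)
n, reps = 200, 500
high_r2 = 0
for _ in range(reps):
    y = np.cumsum(rng.normal(size=n))   # nonstationary series 1 (random walk)
    x = np.cumsum(rng.normal(size=n))   # independent nonstationary series 2
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    high_r2 += r2 > 0.3
print(high_r2 / reps)  # fraction of unrelated pairs with R^2 above 0.3
```

Repeating the experiment with stationary white-noise series instead of random walks makes such large R2 values essentially disappear, which is the point stressed by Granger and Newbold (1986).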
Recall from Chapter 2 that the m‐dimensional vector autoregressive model, VAR(p), is given by

Yt = θ0 + Φ1Yt − 1 + Φ2Yt − 2 + … + ΦpYt − p + at, (3.16)

where θ0 is an m × 1 constant vector, the Φi are m × m parameter coefficient matrices, and at is an m‐dimensional vector white noise process, VWN(0, Σ). Eq. (3.16) can be extended to the following model,

Yt = θ0 + Φ1X1,t + Φ2X2,t + … + ΦkXk,t + ξt, (3.17)

where a response vector Yt is related to k predictor vectors, X1,t, …, Xk,t, and the error vector, ξt, is an m‐dimensional Gaussian vector white noise process, VWN(0, Σ). To make the model in Eq. (3.17) more general, some or all of the predictor vectors need not have the same dimension as the response vector Yt. For example, instead of dimension m, Xi,t can have dimension r. In such a case, the associated parameter coefficient matrix Φi will be m × r, which is no longer a square matrix like those in the VAR(p) model.
For multivariate time series regression, some software packages, such as MATLAB (2017), use the following model,

Yt = θ0 + Φ1Yt − 1 + Φ2Yt − 2 + … + ΦpYt − p + Xtβ + at, (3.18)

where Xt is an m × r design matrix for the r exogenous variables. Since the model combines the VAR structure for Yt with the predictor matrix Xt, it is known as a VARX model. However, it should be noted that in Eq. (3.18) the regression coefficient vector β corresponding to the r exogenous variables is r × 1, which implies that the column entries of Xt share a common regression coefficient for all t; this is relatively restrictive. In this formulation, the VARX model in Eq. (3.18) without the lagged response vectors Yt − j does not reduce to the multivariate multiple regression model given in Eq. (3.5).
Another representation of the VARX model is given by

Yt = θ0 + Φ1Yt − 1 + … + ΦpYt − p + Θ0Xt + Θ1Xt − 1 + … + ΘsXt − s + ξt,

where Φi is an m × m parameter matrix for Yt − i, Xt is an r‐dimensional time series vector of the r exogenous variables, and Θi is an m × r parameter matrix for Xt − i. This representation is used by other software, such as SAS, where it is called the VARX(p, s) model, and it is the form we recommend. The parameters of vector time series regression models are estimated by either least squares (LS) or maximum likelihood (ML), similar to the estimation of the vector time series models introduced in Chapter 2. Once the model is fitted, it can be used to forecast Yt + ℓ through the recursion

Ŷt(ℓ) = θ̂0 + Φ̂1Ŷt(ℓ − 1) + … + Φ̂pŶt(ℓ − p) + Θ̂0X̂t(ℓ) + Θ̂1X̂t(ℓ − 1) + … + Θ̂sX̂t(ℓ − s),

with Ŷt(j) = Yt + j for j ≤ 0, where a separate vector time series model of Xt may need to be constructed to obtain the forecasts X̂t(j) for j ≥ 0.
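The forecast recursion is simple to carry out once the parameter matrices are available. The numpy sketch below runs two steps of a VARX(1, 0) recursion; the parameter values, the last observation, and the forecasts of the exogenous vector are made up for illustration rather than estimated from data:

```python
import numpy as np

# Minimal sketch of the VARX(1, 0) forecast recursion with m = 2, r = 1.
theta0 = np.array([0.1, -0.2])              # constant vector
Phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.4]])               # m x m coefficient for Y_{t-1}
Theta0 = np.array([[0.3],
                   [0.2]])                  # m x r coefficient for X_t

y_t = np.array([1.0, 0.5])                  # last observed Y_t
x_fore = [np.array([0.2]), np.array([0.2])]  # forecasts of X_{t+1}, X_{t+2}

forecasts = []
y_prev = y_t
for ell in range(2):
    y_hat = theta0 + Phi1 @ y_prev + Theta0 @ x_fore[ell]
    forecasts.append(y_hat)
    y_prev = y_hat                          # feed the forecast back in
print(forecasts[0])  # one-step-ahead forecast
```

In practice the coefficient matrices come from LS or ML estimation, and the exogenous forecasts in `x_fore` come from a separate model fitted to Xt.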
The forecasting procedures are exactly the same as those discussed in Chapter 2. Rather than repeating them, we will look at some useful empirical examples instead.