Regression analysis is one of the most commonly used statistical methods, and it is covered in most undergraduate and graduate statistics courses. However, the method discussed in these courses is the standard multiple regression model with a single response variable. In this chapter, we will introduce multivariate time series regression models with several response variables and illustrate the methods with many examples.
In this chapter, we will discuss several different formulations of multivariate time series regression models. Multiple regression is one of the most commonly used statistical models, so we will start with its multivariate representation in the next section. Other extensions and representations will be introduced in Sections 3.3 and 3.4. They include a representation adapted from the vector autoregressive models, which we will refer to as vector time series regression models, and a further extension, the VARX model. We will discuss the similarities and differences among these extensions and representations.
In a multiple regression model, a response variable Y is related to k predictor variables, X1, X2, …, Xk, as follows:

Y = β0 + β1X1 + β2X2 + … + βkXk + ξ, (3.1)

where ξ is assumed to be uncorrelated white noise, often taken to be i.i.d. N(0, σ2). When time series data are used to fit a multiple regression model, we often write Eq. (3.1) as

Yt = β0 + β1X1,t + β2X2,t + … + βkXk,t + ξt, (3.2)
where t refers to time,
and in time series regression ξt is normally assumed to follow a time series model such as AR(p).
When we have time series data from time t = 1 to t = n, we can present Eq. (3.2) in the matrix form

Y = Xβ + ξ, (3.3)

where

Y = [Y1, Y2, …, Yn]′, β = [β0, β1, …, βk]′, ξ = [ξ1, ξ2, …, ξn]′,

X is the n × (k + 1) matrix whose tth row is [1, X1,t, X2,t, …, Xk,t], and ξ follows an n‐dimensional multivariate normal distribution N(0, Σ). Given Σ, the generalized least squares (GLS) estimator

β̂ = (X′Σ−1X)−1X′Σ−1Y

is known to be the best unbiased estimator in the sense that, for any constant vector c, the estimator c′β̂ has the smallest possible variance among all unbiased estimators of c′β that are linear in Y.
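When Σ is known, the GLS computation itself is only a few lines of linear algebra. The following numpy sketch illustrates it; the simulated data, the AR(1) error covariance, and all numerical settings are illustrative assumptions, not part of the text:

```python
import numpy as np

# Minimal sketch of the GLS estimator (X' Sigma^-1 X)^-1 X' Sigma^-1 Y,
# assuming Sigma is known.  Data are simulated purely for illustration.
rng = np.random.default_rng(0)
n, k = 200, 2

# Design matrix with an intercept column, as in the matrix form of the model
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])

# AR(1) errors imply Sigma with (i, j) entry phi^|i-j| / (1 - phi^2)
phi = 0.6
idx = np.arange(n)
Sigma = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)

# Draw correlated errors and form the response
xi = rng.multivariate_normal(np.zeros(n), Sigma)
Y = X @ beta_true + xi

# GLS estimate: solve (X' Sigma^-1 X) beta = X' Sigma^-1 Y
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
print(beta_gls)  # should be close to beta_true
```

Using `np.linalg.solve` rather than forming the inverse of X′Σ−1X explicitly is the standard numerically preferable choice.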
Now, suppose that instead of one response variable in Eq. (3.2), we have m response time series variables related to these k predictor time series variables, that is,

Yi,t = βi,0 + βi,1X1,t + βi,2X2,t + … + βi,kXk,t + ξi,t, i = 1, 2, …, m,

or, in matrix form,

Yt = β0 + βXt + ξt, (3.5)

where

Yt = [Y1,t, Y2,t, …, Ym,t]′, Xt = [X1,t, X2,t, …, Xk,t]′, ξt = [ξ1,t, ξ2,t, …, ξm,t]′,

and

β0 = [β1,0, β2,0, …, βm,0]′ is the m × 1 vector of intercepts and β = [βi,j] is the m × k matrix of regression coefficients.
For i = 1, 2, …, m and time t = 1 to t = n, let

Y(i) = [Yi,1, Yi,2, …, Yi,n]′, β(i) = [βi,0, βi,1, …, βi,k]′, and ξ(i) = [ξi,1, ξi,2, …, ξi,n]′.

The matrix form of the multiple regression for the ith response variable is

Y(i) = Xβ(i) + ξ(i), (3.6)

which, as expected, is exactly the same as Eq. (3.3). Putting all the multiple regressions for the m response variables together from t = 1 to t = n, we have

Y = Xβ + ξ, (3.7)

where

Y = [Y(1), Y(2), …, Y(m)] is the n × m matrix of responses and β = [β(1), β(2), …, β(m)] is the (k + 1) × m matrix of regression coefficients,

and

ξ = [ξ(1), ξ(2), …, ξ(m)] is the n × m matrix of errors.
Each ξ(i) follows an n‐dimensional multivariate normal distribution N(0, Σ(i)), i = 1, …, m, and ξ(i) and ξ(j) are uncorrelated for i ≠ j. We will call the model given in Eq. (3.7) the multivariate multiple time series regression model.
As noted in Eq. (3.6), the ith response Y(i) actually follows the general multiple time series regression model

Yi,t = βi,0 + βi,1X1,t + βi,2X2,t + … + βi,kXk,t + ξi,t,

or, in matrix form,

Y(i) = Xβ(i) + ξ(i),

where ξ(i) = [ξi,1, ξi,2, …, ξi,n]′ follows an n‐dimensional multivariate normal distribution N(0, Σ(i)). In time series regression, ξi,t is often assumed to follow a time series model such as AR(p). From the results of multiple regression, we know that when Σ(i) is known, the GLS estimator

β̂(i) = (X′Σ(i)−1X)−1X′Σ(i)−1Y(i)

is the best unbiased estimator.
Normally, we will not know the variance–covariance matrix Σ(i) of ξ(i). Even if ξi,t follows a time series model such as AR(p) or ARMA(p, q), the structure of Σ(i) is not known because the related time series model parameters are usually unknown. In this case, we use the following iterative GLS procedure suggested in Wei (2006, Chapter 15):

1. Fit the regression model by ordinary least squares (OLS) and compute the residuals.
2. Fit an appropriate time series model, such as AR(p), to the residuals.
3. Use the fitted time series model to form an estimate of the variance–covariance matrix Σ(i).
4. Refit the regression model by GLS using the estimated Σ(i).
5. Compute the residuals from the GLS model fitting in step 4, and repeat steps 1 through 4 until a convergence criterion (such as the maximum absolute change in the estimates between iterations becoming less than a specified quantity) is reached.
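The iterative procedure above can be sketched for a single response with AR(1) errors. In this numpy illustration, the simulated data, the AR(1) choice for the residual model, the maximum iteration count, and the convergence tolerance are all illustrative assumptions:

```python
import numpy as np

# Sketch of the iterative GLS procedure with AR(1) errors.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Simulate AR(1) errors xi_t = 0.7 xi_{t-1} + a_t and the response
xi = np.zeros(n)
a = rng.normal(size=n)
for t in range(1, n):
    xi[t] = 0.7 * xi[t - 1] + a[t]
Y = 1.0 + 2.0 * x + xi

beta = np.linalg.lstsq(X, Y, rcond=None)[0]                 # step 1: OLS fit
idx = np.arange(n)
for _ in range(50):
    resid = Y - X @ beta                                    # residuals
    phi = resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1])  # step 2: AR(1) fit
    Sigma = phi ** np.abs(idx[:, None] - idx[None, :])      # step 3: Sigma up to scale
    Si = np.linalg.inv(Sigma)
    beta_new = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)  # step 4: GLS fit
    if np.max(np.abs(beta_new - beta)) < 1e-8:              # step 5: convergence check
        beta = beta_new
        break
    beta = beta_new
print(beta, phi)
```

Note that Σ(i) only needs to be estimated up to a scalar multiple here, because the GLS estimator is unchanged when Σ(i) is rescaled.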
Combining the estimates for i = 1, …, m, we get

β̂ = [β̂(1), β̂(2), …, β̂(m)],

where

β̂(i) = (X′Σ̂(i)−1X)−1X′Σ̂(i)−1Y(i),

and the estimate Σ̂(i) of the variance–covariance matrix Σ(i) of ξ(i) is given by step 3 in the last GLS iteration.
It should be pointed out that although the error term in a time series regression model can be autocorrelated, it should be stationary. A nonstationary error structure can produce a spurious regression, in which a significant regression is obtained for totally unrelated series, as pointed out by Abraham and Ledolter (2006), Chatterjee, Hadi, and Price (2006), Draper and Smith (1998), Granger and Newbold (1986), and Phillips (1986).
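The spurious regression phenomenon is easy to reproduce by simulation. The following numpy sketch regresses one random walk on another, independently generated random walk; the sample size, number of replications, and the R2 threshold of 0.3 are illustrative choices:

```python
import numpy as np

# Spurious regression illustration: two independent random walks often
# produce a seemingly "strong" fit even though they are unrelated.
rng = np.random.default_rng(42)
n, reps = 200, 500
high_r2 = 0
for _ in range(reps):
    y = np.cumsum(rng.normal(size=n))   # nonstationary series 1 (random walk)
    x = np.cumsum(rng.normal(size=n))   # independent nonstationary series 2
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    high_r2 += r2 > 0.3
print(high_r2 / reps)  # fraction of unrelated pairs with R^2 above 0.3
```

Repeating the experiment with stationary white-noise series instead of random walks makes such large R2 values essentially disappear, which is the point stressed by Granger and Newbold (1986).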
Recall from Chapter 2 that the m‐dimensional vector autoregressive model, VAR(p), is given by

Yt = θ0 + Φ1Yt − 1 + Φ2Yt − 2 + … + ΦpYt − p + at, (3.16)

where θ0 is an m × 1 constant vector, the Φi are m × m parameter coefficient matrices, and at is an m‐dimensional vector white noise process, VWN(0, Σ). Eq. (3.16) can be extended to the following model,

Yt = θ0 + Φ1X1,t + Φ2X2,t + … + ΦkXk,t + ξt, (3.17)

where a response vector Yt is related to k predictor vectors, X1,t, …, Xk,t, and the error vector, ξt, is an m‐dimensional Gaussian vector white noise process, VWN(0, Σ). To make the model in Eq. (3.17) more general, some or all of the predictor vectors need not have the same dimension as the response vector Yt. For example, instead of dimension m, Xi,t can have dimension r. In such a case, the associated parameter coefficient matrix Φi will be m × r, which is no longer a square matrix like those in the VAR(p) model.
For multivariate time series regression, some software packages, such as MATLAB (2017), use the following model,

Yt = θ0 + Φ1Yt − 1 + Φ2Yt − 2 + … + ΦpYt − p + Xtβ + at, (3.18)

where Xt is an m × r design matrix for the r exogenous variables. Since the model combines the VAR structure for Yt with the predictor matrix Xt, it is known as a VARX model. However, it should be noted that in Eq. (3.18) the regression coefficient vector β corresponding to the r exogenous variables is r × 1, which implies that the column entries of Xt share a common regression coefficient for all t; this is relatively restrictive. In this formulation, the VARX model in Eq. (3.18) without the lagged response vectors Yt − j does not reduce to the multivariate multiple regression model given in Eq. (3.5).
Another representation of the VARX model is given by

Yt = θ0 + Φ1Yt − 1 + … + ΦpYt − p + Θ0Xt + Θ1Xt − 1 + … + ΘsXt − s + ξt,

where Φi is an m × m parameter matrix for Yt − i, Xt is an r‐dimensional time series vector of the r exogenous variables, and Θi is an m × r parameter matrix for Xt − i. This representation is used by other software, such as SAS, where it is called the VARX(p, s) model, and it is the form we recommend. The parameters of vector time series regression models are estimated by either least squares (LS) or maximum likelihood (ML), similar to the estimation of the vector time series models introduced in Chapter 2. Once the model is fitted, it can be used to forecast Yt + ℓ through the recursion

Ŷt(ℓ) = θ̂0 + Φ̂1Ŷt(ℓ − 1) + … + Φ̂pŶt(ℓ − p) + Θ̂0X̂t(ℓ) + Θ̂1X̂t(ℓ − 1) + … + Θ̂sX̂t(ℓ − s),

with Ŷt(j) = Yt + j for j ≤ 0, where a separate vector time series model of Xt may need to be constructed to obtain the forecasts X̂t(j) for j ≥ 0.
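The forecast recursion is simple to carry out once the parameter matrices are available. The numpy sketch below runs two steps of a VARX(1, 0) recursion; the parameter values, the last observation, and the forecasts of the exogenous vector are made up for illustration rather than estimated from data:

```python
import numpy as np

# Minimal sketch of the VARX(1, 0) forecast recursion with m = 2, r = 1.
theta0 = np.array([0.1, -0.2])              # constant vector
Phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.4]])               # m x m coefficient for Y_{t-1}
Theta0 = np.array([[0.3],
                   [0.2]])                  # m x r coefficient for X_t

y_t = np.array([1.0, 0.5])                  # last observed Y_t
x_fore = [np.array([0.2]), np.array([0.2])]  # forecasts of X_{t+1}, X_{t+2}

forecasts = []
y_prev = y_t
for ell in range(2):
    y_hat = theta0 + Phi1 @ y_prev + Theta0 @ x_fore[ell]
    forecasts.append(y_hat)
    y_prev = y_hat                          # feed the forecast back in
print(forecasts[0])  # one-step-ahead forecast
```

In practice the coefficient matrices come from LS or ML estimation, and the exogenous forecasts in `x_fore` come from a separate model fitted to Xt.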
The forecasting procedures are exactly the same as those discussed in Chapter 2. Rather than repeating them, we will look at some useful empirical examples instead.