Chapter 8: Models for Multivariate Time Series

Introduction

Multivariate Time Series

VARMA Models

Infinite-Order Representations

Correlation Matrix at Lag 0

VARMAX Models

VARMAX Building in Practice

Conclusion

Introduction

In this chapter, you will learn the basic theory for multivariate time series. The purpose is to introduce the simplest theoretical model behind the many tools offered by the VARMAX procedure, because most of them are extensions or refinements of this basic model. The idea is not to give a thorough introduction to the theory, because this subject is far too extensive to include in a book that is specific to SAS. For more information about multivariate time series analysis, consult ordinary textbooks like Lütkepohl (1993) or others listed in the references in the SAS Help for the VARMAX procedure.

In later chapters, the basic VARMAX model is extended in various ways. These chapters will introduce the theory of such extensions, together with the SAS coding for examples.

Multivariate Time Series

A multivariate time series consists of many (in this chapter, k) univariate time series. The observation for the jth series at time t is denoted Xjt, j = 1, . . . , k and t = 1, …, T. The length of the time series—that is, the number of observations—is, as in the chapters for the univariate models, denoted as T. In matrix notation, the k-dimensional observation is written as a column vector Xt:

$$X_t = \begin{pmatrix} X_{1t} \\ \vdots \\ X_{kt} \end{pmatrix}$$

The idea is to model these k series simultaneously because they can interact in ways that cannot be adequately captured by separate univariate models for each individual series.

A fundamental property of multivariate time series is that all series should be simultaneously stationary. This means that their joint distribution should be constant over time. This concept is a direct generalization of the univariate case. The extension of the definition of stationarity to more than one time series states that a lagged dependence of one series on another series, if present, is constant over the whole data period. It also means that no trends should be present in the series.

If the series is not stationary, differencing often transforms the series into stationarity, just as for the univariate models. For instance, price indices for many countries might be trending due to inflation, but the series of year-to-year changes in price levels might be rather constant, having a mean value that corresponds to the average annual inflation rate in the observed countries.
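As a minimal sketch, first differences can be specified directly in the MODEL statement of PROC VARMAX with the DIF= option; the data set and variable names in this example are hypothetical:

proc varmax data=prices;
   /* both price series are differenced once before a VAR(1) model is fitted */
   model cpi_dk cpi_de / p=1 dif=(cpi_dk(1) cpi_de(1));
run;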

A time series (univariate or multivariate) that becomes stationary after differencing is called integrated. This property is the I in the name of the ARIMA models. In Chapters 13 and 14, this issue is considered in more detail, because stationarity can be obtained in other ways for two nonstationary series, leading to the notion of cointegration: a stationary relationship between two nonstationary series.

VARMA Models

If the multivariate series is stationary, then a Vector Autoregressive Moving Average (VARMA) model is a direct generalization of the Autoregressive Moving Average (ARMA) models that were introduced in Chapter 6. The VARMA(p, q) model is defined as follows:

$$X_t - \varphi_1 X_{t-1} - \ldots - \varphi_p X_{t-p} = c + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \ldots - \theta_q \varepsilon_{t-q}$$

This formula simply replicates the usual univariate definition of an ARMA model. The only difference is that all terms are now vectors or matrices, not just numbers. The model is for this reason well established and intuitively appealing to everybody familiar with univariate time series modeling. The arguments for the relevance of this class of models are direct replications of the arguments for the corresponding univariate models. The interpretation of the multivariate model is also a straightforward generalization of the interpretation of the univariate model.

The parameter vector c in this parameterization is a k-dimensional column vector. Only if p = 0 is it the mean value for each of the k series. If p > 0, then the mean vector μ is given as follows:

$$\mu = (I - \varphi_1 - \ldots - \varphi_p)^{-1} c$$

The coefficients in the definition of a VARMA(p, q) model are k × k matrices, so each of them generally includes k² parameters, as seen here:

$$\varphi_m = \begin{pmatrix} \varphi_{m11} & \cdots & \varphi_{m1k} \\ \vdots & \varphi_{mij} & \vdots \\ \varphi_{mk1} & \cdots & \varphi_{mkk} \end{pmatrix}$$

The expression given by the model formulation for a specific component $X_{jt}$ is quite involved, even for small values of the model orders p and q. It involves lagged values (up to lag p) of all observed components $X_{it}$, i = 1, …, k, of the time series and, moreover, lagged values (up to lag q) of all error components $\varepsilon_{it}$, i = 1, …, k.

In the syntax of PROC VARMAX, these coefficients are denoted in the code by ordinary Latin letters and symbols in plain text, like “ar(m, i, j)” for the coefficient $\varphi_{mij}$, the entry (i, j), i, j = 1, …, k, of the autoregressive parameter matrix $\varphi_m$ for lag m, m = 1, …, p.

Similarly, the entry (i, j) of the moving average parameter matrix $\theta_m$ for lag m, m = 1, …, q, is denoted as “ma(m, i, j)” for the coefficient $\theta_{mij}$.
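As an illustration, the following sketch shows how such a coefficient name is referenced in a RESTRICT statement; the data set, the variable names, and the chosen model orders are hypothetical:

proc varmax data=mydata;
   model y1 y2 / p=1 q=1 method=ml;
   /* ar(1,1,2) is the lag-1 effect of series y2 on series y1; here it is fixed to 0 */
   restrict AR(1,1,2)=0;
run;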

The models often include many parameters that could easily lead to over-parameterization. Many of the refinements are invented merely to reduce the number of parameters. For this reason, various ways of interpreting the model emerge.

The dependencies among different series with lagged effects are described by the off-diagonal elements of the coefficient matrices φm and θm. The diagonal elements of the coefficient matrices φm and θm correspond to univariate ARMA models for the individual series.

Infinite-Order Representations

In the theory of stationary processes, it is proved that a stationary time series under some assumptions can be represented both as an autoregression of infinite order and as a moving average of infinite order:

$$X_t = \pi_1 X_{t-1} + \pi_2 X_{t-2} + \ldots + \varepsilon_t$$

and

$$X_t = \varepsilon_t + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2} + \ldots$$

All VARMA models can be written in this way if the roots of the corresponding autoregressive and moving average polynomials are larger than 1 in absolute value, that is, if the process is stationary and invertible.

In this parameterization, the (i, j) entry of $\pi_m$ (the parameter $\pi_{mij}$) directly gives the effect of the jth component of $X_{t-m}$ on the ith component of $X_t$, in the same way as it would for an input variable in an ordinary regression model. Similarly, the parameter $\psi_{mij}$ represents the effect of a sudden shock $\varepsilon_{jt-m}$ to the jth series at time t − m on the ith series, $X_{it}$, m time periods later at time t.

These representations are used to elucidate the meaning of the fitted models; see, for example, Chapter 10.
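As a hedged sketch, these infinite-order coefficient matrices can be requested from a fitted model through the PRINT= option of the MODEL statement; the data set, variable names, and model orders are hypothetical:

proc varmax data=mydata;
   /* iarr prints the estimated infinite-order autoregressive (pi) matrices,   */
   /* and impulse prints the moving average (psi) impulse response matrices    */
   model y1 y2 / p=1 q=1 print=(iarr impulse) lagmax=8;
run;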

Correlation Matrix at Lag 0

The error series (see below) is assumed to be white noise in the sense that the entries of $\varepsilon_t$ and $\varepsilon_{t-m}$ at two different points in time are independent for all integers m ≠ 0.

$$\varepsilon_t = \begin{pmatrix} \varepsilon_{1t} \\ \vdots \\ \varepsilon_{kt} \end{pmatrix}$$

But at lag 0, the entries are not necessarily independent. The k × k covariance matrix of the vector $\varepsilon_t$ has this form:

$$\operatorname{var}(\varepsilon_t) = \Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1k} \\ \vdots & \ddots & \vdots \\ \sigma_{k1} & \cdots & \sigma_{kk} \end{pmatrix}$$

The diagonal elements of this matrix are the error variances for the individual series in the model. The off-diagonal elements are the covariances between two components of the error series. When normalized, these covariances become correlations that tell us about the degree of dependence between two series at the same point in time.

In a VARMA model, the immediate dependence between two of the components of $X_t$ is parameterized only by the correlation between the two corresponding components of $\varepsilon_t$. As a correlation, this dependence has no specific direction; in other words, it says nothing about causality as such. But it is possible to derive the conditional distribution of one component conditioned on another component of the series. This means that if, for some reason, the ith component $X_{it}$ is observed or assumed known, it is possible to calculate the conditional expectation of another component $X_{jt}$, which could be applied as a forecast.
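As a brief illustration of this reasoning, here is a sketch under the assumption of Gaussian errors, using only the quantities defined above. Conditioning on the ith error component gives

$$E(\varepsilon_{jt} \mid \varepsilon_{it}) = \frac{\sigma_{ji}}{\sigma_{ii}}\, \varepsilon_{it}$$

so a known deviation of $X_{it}$ from its one-step forecast shifts the conditional expectation of $X_{jt}$ by the factor $\sigma_{ji}/\sigma_{ii}$ times that deviation.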

VARMAX Models

The letter X in the procedure name VARMAX comes from the word exogenous. An exogenous variable is a variable that enters the model but is not itself modeled. A typical example is seasonal factors, such as monthly dummy variables in a model for monthly sales: the weather and the holiday season are not at all determined by sales, but they have a great impact on sales.

For example, a VARMAX model with monthly dummy variables is written as follows:

$$X_t - \varphi_1 X_{t-1} - \ldots - \varphi_p X_{t-p} = c + D^{Jan}_t \delta^{Jan} + \ldots + D^{Nov}_t \delta^{Nov} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \ldots - \theta_q \varepsilon_{t-q}$$

The dummy variables $D^{Jan}_t, \ldots, D^{Nov}_t$ are k × k matrices whose entries all equal 0 unless the month t is the right one. If month t is January, the matrix $D^{Jan}_t$ is the identity matrix; otherwise, it is just a 0 matrix. The parameter vector c in this parameterization corresponds to the December level. The parameters $\delta^{Jan}, \ldots, \delta^{Nov}$ are k-dimensional column vectors; the entry $\delta^{Nov}_i$ is the monthly effect for the ith series, i = 1, …, k. The November effect, $\delta^{Nov}_i$, is in fact the difference between the November level and the December level, so that the actual November level is $c + \delta^{Nov}$.
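In PROC VARMAX, such seasonal dummy variables do not have to be constructed by hand. Here is a minimal sketch using the NSEASON= option of the MODEL statement; the data set and variable names are hypothetical:

proc varmax data=sales;
   id date interval=month;          /* monthly time index                              */
   model y1 y2 / p=1 nseason=12;    /* nseason=12 generates 11 monthly dummy variables */
run;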

In econometrics, the concept of exogeneity is important. The question is whether a variable can be treated as exogenous or not. In some cases, it is rather obvious. An example is the economy of a small country like Denmark. The Danish economy cannot have any impact on the price of oil, so the price of oil can be treated as exogenous in a model for the Danish economy. In Chapter 12, this subject of testing exogeneity with multivariate time series models and PROC VARMAX in SAS is discussed with an example.

VARMAX Building in Practice

PROC VARMAX in SAS makes the selection of the precise orders, p and q, of a VARMA(p, q) model easy. The assumption of stationarity is tested by means of the Dickey-Fuller test and similar tests for differencing as opposed to stationarity. Then, PROC VARMAX offers an automatic model-selection algorithm that fits candidate models for many possible orders and selects the best one according to a relevant criterion.
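As a hedged sketch, both facilities are available as options in the MODEL statement; the data set, variable names, and the maximum orders searched are hypothetical:

proc varmax data=mydata;
   /* dftest requests Dickey-Fuller unit root tests for each series;              */
   /* minic=(type=aicc p=5 q=5) searches all orders up to p=5 and q=5 by the AICc */
   model y1 y2 / dftest minic=(type=aicc p=5 q=5);
run;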

The model parameters are estimated by the method of maximum likelihood, which assumes that the error terms are Gaussian. The estimation is rather complicated because models for multivariate time series often include many parameters, so the numerical algorithms have to be chosen with care. This is, however, not a problem that the user encounters frequently, because PROC VARMAX includes modern algorithms. Nevertheless, it happens now and then that the estimation algorithm fails. In such cases, the estimation procedure can be fine-tuned by detailed options for the numerical iterative process. In this book, however, the point of view is that an estimation process that fails is a sign of a poorly specified model. So the user should preferably address the underlying problem rather than insist on estimating the parameters of an incorrectly formulated model.
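A minimal sketch of how the estimation method is stated and how the numerical optimization can be fine-tuned, assuming the NLOPTIONS statement is used for that purpose; the data set, variable names, and settings are hypothetical:

proc varmax data=mydata;
   model y1 y2 / p=2 q=1 method=ml;   /* maximum likelihood estimation                */
   nloptions maxiter=500;             /* allow more iterations if convergence is slow */
run;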

The parameters can, alternatively, be estimated by the method of least squares. This method is more robust, but it tends to be biased toward 0: the numerical value of, for instance, an autoregressive parameter is typically reduced.

The criterion for model selection includes a term that rewards model fit. It is given by a formula that involves the maximized likelihood value in this form:

$$-2\log(\hat{L})$$

A good fit corresponds to a large likelihood value and hence a small value of this term. Note that, in the univariate case, this value of the likelihood function is related to the residual variance (up to an additive constant) as follows:

$$-2\log(\hat{L}) \approx T\log(\hat{\sigma}^2)$$

See Chapter 6.

But the criterion also includes a term that penalizes a large number of parameters and thereby rewards parsimony. The number of estimated parameters is here denoted r. In a VARMA(p, q) model, it is r = (p + q)k².
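For example, a three-dimensional VARMA(2, 1) model (k = 3, p = 2, q = 1) already contains r = (2 + 1) · 3² = 27 autoregressive and moving average parameters.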

The Akaike Information Criterion (AIC) is defined as follows:

$$\mathrm{AIC} = 2r - 2\log(\hat{L})$$

Another criterion is Schwarz’s Bayesian Criterion (SBC), which also depends on the number of observations, T:

$$\mathrm{SBC} = \log(T)\, r - 2\log(\hat{L})$$

SBC has a more severe penalty for the number of parameters, which leads to models with fewer parameters, because log(T) > 2 for all realistic sample sizes (T ≥ 8).

The default method in PROC VARMAX is the corrected Akaike Criterion (AICc), which is defined by adding a further punishment to the AIC:

$$\mathrm{AICc} = \mathrm{AIC} + \frac{2r(r+1)}{T-r-1} = \frac{2rT}{T-r-1} - 2\log(\hat{L})$$

With this model-selection procedure, it is easy to find at least a good order for the model as a starting point. But usually the selected model includes too many parameters because all elements in the autoregressive and moving average coefficient matrices are estimated. These matrices include many entries and therefore many parameters, and many of these parameters in practice turn out to be insignificant. They must be omitted from the model in order to gain precision in terms of degrees of freedom. This increase in precision is accomplished by tests for the significance of the individual parameters. It is also possible to test the hypothesis that more than one parameter could be left out of the model.
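A hedged sketch of how such a joint hypothesis can be tested in PROC VARMAX, using the parameter naming convention described earlier; the data set, variable names, and model orders are hypothetical:

proc varmax data=mydata;
   model y1 y2 / p=2 q=1 method=ml;
   /* joint test that the lag-1 and lag-2 cross effects of y2 on y1 can be omitted */
   test AR(1,1,2)=0, AR(2,1,2)=0;
run;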

The fit of a model is tested in different ways. A VARMA model is specified in order to end up with an error series εt, which has no autocorrelation or cross-correlations other than correlations among the entries of εt at lag 0. The model is tested by way of the hypotheses that all these correlations equal 0.

This hypothesis can be tested for each individual autocorrelation or cross-correlation. This possibility is relevant for lags of special interest, like lag 1 or lag 12 for monthly observations. The estimated correlations can all be considered approximately normally distributed, having mean 0 and variance equal to the inverse of the number of observations, T⁻¹. For small lags, the variance is a bit smaller. The tests are easily performed by a quick glance at a plot of the estimated correlations with confidence bounds, as produced by PROC VARMAX.

If many such hypotheses are tested at a 5% test level, some of the tests would lead to rejection of the model fit despite the model's being perfect. This situation is precisely the meaning of the 5% test level: the probability of rejection of the hypothesis is 5% even if the hypothesis is true. In this multivariate context, with many possible dimensions for lack of fit in VARMA models, this problem is more apparent than in other contexts.

The simultaneous hypothesis that many autocorrelations and cross-correlations are 0 is tested by means of portmanteau tests. A portmanteau test is basically defined as the summed squares of many correlations, but with some minor corrections to improve the approximating distribution. It gives statistics that are approximately chi-square distributed, with degrees of freedom equal to the number of terms in the sum of squares adjusted for the number of estimated parameters.
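A minimal sketch of how these residual diagnostics can be requested; the LAGMAX= option controls how many lags enter the portmanteau tests and the residual cross-correlation output, and the data set, variable names, and model orders are hypothetical:

proc varmax data=mydata plots=all;
   /* lagmax=12 computes residual cross-correlations and portmanteau tests up to lag 12; */
   /* print=(diagnose) prints the residual diagnostics                                   */
   model y1 y2 / p=1 q=1 lagmax=12 print=(diagnose);
run;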

Conclusion

In this chapter, univariate time series models are generalized to multivariate series. This extension is straightforward because coefficients, which are simply numbers in the univariate case, are replaced by matrices in the multivariate model. The resulting models, the VARMAX models, give the name to the procedure PROC VARMAX, which is the main subject of this book.
