Chapter 12: Bayesian Vector Autoregressive Models

Introduction

The Prior Covariance of the Autoregressive Parameter Matrices

The Prior Distribution for the Diagonal Elements

The Prior Distribution for the Off-Diagonal Elements

The BVAR Model in PROC VARMAX

Specific Parameters in the Prior Distribution

Further Shrinkage toward Zero

Application of the BVAR(1) Model

BVAR Models for the Egg Market

Conclusion

Introduction

One way to reduce the number of parameters in a Vector Autoregressive Moving Average, VARMA(p,q), model that has no moving average terms, q = 0, is to consider Bayesian estimation. The idea is that an informative prior is applied to the autoregressive parameters, usually in order to shrink them toward zero. The prior distribution reflects the intuition that the parameters for lag 1 are more natural to include in the model than autoregressive parameters for higher lags. In Bayesian terms, this shrinkage is obtained by a prior that concentrates more around zero for higher lags, so that the evidence from the data has to be stronger in order to have a significant estimate at lags higher than lag 1. This class of models is denoted Bayesian Vector Autoregressive, BVAR(p) models.

For many series encountered in practice, this is an easier way to determine the order of a model than the automatic order selection methods that were applied in Chapters 8 and 9. The automatic order selection methods often lead to models that are difficult to handle in practice because of a high order and over-parameterization.

The Prior Covariance of the Autoregressive Parameter Matrices

The VARMAX procedure applies the prior distribution for the parameters in a vector autoregressive model as proposed by Litterman (1986). All coefficients in the autoregressive matrices are, a priori, supposed to be normally distributed with a mean of zero. The variance in the prior distribution decreases with the lag size. Moreover, it reduces the off-diagonal entries more than the diagonal entries in the autoregressive matrices. This is relevant because the diagonal entries correspond to univariate dependence, which often is believed to be more natural than dependence between series.

Consider now the matrix φm at lag m with entries φm,ij. Here, the sub-indices are i, j = 1, ..., k for a time series of dimension k, and the index m runs over m = 1, ..., p for an autoregressive model of order p.

$$\varphi_m = \begin{pmatrix} \varphi_{m,11} & \cdots & \varphi_{m,1k} \\ \vdots & \varphi_{m,ij} & \vdots \\ \varphi_{m,k1} & \cdots & \varphi_{m,kk} \end{pmatrix}$$

The off-diagonal element φm,ij is the coefficient for the i'th variable Xi,t on the left-hand side and the lagged j'th variable Xj,t−m on the right-hand side.

The Prior Distribution for the Diagonal Elements

The prior variance for the diagonal element φm,ii at lag m is specified as follows:

$$\mathrm{var}(\varphi_{m,ii}) = \left(\frac{\lambda}{m}\right)^2$$

The prior standard deviation of φm,ii is thus damped for increasing lags m in the form λ/m. The constant λ can be specified by the user. The default value in PROC VARMAX is λ = 1.
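As a small numerical illustration (the numbers are not taken from the book), the prior standard deviation λ/m for the diagonal elements decreases with the lag as follows for the default λ = 1 and for the smaller value λ = .5:

$$\lambda = 1:\ \ \frac{\lambda}{m} = 1,\ 0.50,\ 0.33,\ 0.25 \qquad\quad \lambda = 0.5:\ \ \frac{\lambda}{m} = 0.50,\ 0.25,\ 0.17,\ 0.13 \qquad\quad m = 1, \ldots, 4$$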

In maximum likelihood estimation, the variance of an estimated autoregressive parameter is of order 1/T, where, as usual, T denotes the number of observations. For the simplest example of a univariate AR(1) model, the variance can be shown to be as follows:

$$\frac{1-\varphi_1^2}{T}$$

This is very small when φ1 is numerically close to one. The prior variance for λ = 1 is large compared to the maximum likelihood variance even for a moderate number of observations T, so the prior is considered non-informative.
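As a worked example (the numbers are illustrative and not from the book), take φ1 = .9 and T = 100 observations. The maximum likelihood variance is then

$$\frac{1-\varphi_1^2}{T} = \frac{1-0.81}{100} = 0.0019$$

corresponding to a standard deviation of about .044, whereas the prior standard deviation at lag 1 with λ = 1 is 1. The prior is therefore more than 20 times wider than the sampling uncertainty and adds almost no information.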

The diagonal elements in autoregressive matrices are bound to an interval around zero. The exact bounds depend on the full model, but for the simplest example of a univariate AR(1) model, the parameter is bound to the interval ]−1, 1[. This point again shows that the prior distribution with λ = 1 is non-informative in practice because the prior is in no way restricted to this interval.

The examples later in this chapter demonstrate that the user should preferably apply values smaller than the default λ = 1 in order to obtain the intended shrinkage toward zero. The value λ = .5 better reflects that the parameter is restricted to an interval around zero.

The Prior Distribution for the Off-Diagonal Elements

The prior variance for an off-diagonal element is basically the same as for the diagonal elements, with the same shrinkage toward zero proportional to the inverse lag length:

$$\mathrm{var}(\varphi_{m,ij}) = \left(\frac{\lambda}{m}\,\theta\,\frac{\sigma_{ii}}{\sigma_{jj}}\right)^2$$

The off-diagonal element φm,ij is the coefficient for the i'th variable Xi,t on the left-hand side and the lagged j'th variable Xj,t−m on the right-hand side. The coefficient φm,ij can be considered a regression coefficient. As in ordinary regression analysis, the estimated regression coefficient is scaled according to the standard deviations of the two variables. This scaling is reflected in the prior variance through the factor shown below, which scales the prior variance in the same way as the estimated regression coefficients are scaled.

$$\frac{\sigma_{ii}^2}{\sigma_{jj}^2}$$

For the diagonal terms, i = j, the two series are identical, and the factor simply reduces to one.

The value θ in the prior variance for the off-diagonal element is a number between 0 and 1. A value of θ less than 1 reflects a prior belief that off-diagonal entries in the autoregressive matrices are less likely than diagonal elements. The parameter θ is by default set to θ = .1 by PROC VARMAX.
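To make the role of θ concrete, consider as an illustration (not taken from the book) two series with equal scale, σii = σjj, and the default λ = 1. At lag 1, the prior standard deviation of an off-diagonal element is then

$$\mathrm{sd}(\varphi_{1,ij}) = \frac{\lambda}{1}\,\theta\,\frac{\sigma_{ii}}{\sigma_{jj}} = \theta = 0.1$$

with the default θ = .1, which is one tenth of the corresponding diagonal prior standard deviation of 1.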

For series that are known to interact heavily, a much larger value of θ is often preferred. If interactions among the series in the data set are more intuitive than dependence on the series' own past values, you should apply a value of θ close to 1.

The reported variance of the estimated parameters in a Bayesian framework is a mixture of the prior variance and the variance obtained by maximum likelihood estimation. The resulting variance is a bit smaller than the maximum likelihood variance for estimated parameters unless the prior distribution is totally diffuse, because the prior distribution contributes extra information to the estimation. The variance is smaller when a very informative prior distribution is applied—that is, when the parameter λ is close to 0. This smaller variance for the estimated parameters can lead to significance of parameters that are insignificant in maximum likelihood estimation.

This Bayesian method only applies to vector autoregressive models—that is, models with no moving average terms. No moving average parameters are allowed if the Bayesian framework is applied, but deterministic terms like trends and seasonal dummies can be applied. The models are denoted BVAR(p), where p, as usual, denotes the order of the model.

The deterministic terms in a VARMAX model all have diffuse prior variances. This means that no prior assumptions are applied to, for instance, constant terms, coefficients of seasonal dummies, and all exogenous right-hand-side variables.

The BVAR Model in PROC VARMAX

The PRIOR option in the MODEL statement specifies the BVAR model with the default values θ = .1 and λ = 1. In Program 12.1, the BVAR model is applied to the Danish wage and price indices that were also considered in Chapters 9 and 10. Remember that the automatic order selection led to models of high order that were heavily over-parameterized. Following the results of Chapter 9, the parameters of a fourth-order autoregressive model are estimated by Program 12.1. The order 4 is specified by the option P=4.

The specified fourth-order autoregressive model is turned into the fourth-order BVAR model using the default values θ = .1 and λ = 1 by simply adding the option PRIOR to the MODEL statement.

Program 12.1: Estimation Using the Default BVAR Prior

PROC VARMAX DATA=SASMTS.WAGEPRICE PRINTALL PLOTS=ALL;

    MODEL LP LW/DIF=(LP(1) LW(1)) P=4 PRIOR;

RUN;

The result is that only one entry in the second-order autoregressive parameter matrix is significant; it is marked by a minus sign (−), which says that the term is negative. All other entries of the lag 2 matrix and all entries of the lag 3 and lag 4 matrices are insignificant, as denoted by periods (.). Moreover, all four parameters for lag 1 are significant, as denoted by plus signs (+). This is quickly concluded from the schematic presentation of the autoregressive matrices in Output 12.1.

Output 12.1: Significance of Estimated Parameters of a BVAR(4) Model


This application of the BVAR model seems to have done the job of reducing the order of the autoregressive model. The only significant parameter at lag 2 is found at the entry (2,2), which is the second-order autoregressive parameter for the univariate wage series. Based on this result, it is natural to consider the second-order autoregressive model, p = 2, with the three remaining entries in the second-order matrix restricted to zero; a possible formulation of this restricted model is sketched below.
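The restricted model is not programmed in this chapter. The following sketch shows one possible formulation, assuming that the three insignificant lag 2 entries can be fixed at zero with the RESTRICT statement of PROC VARMAX. The sketch is written under ordinary (maximum likelihood) estimation; whether a RESTRICT statement can be combined with the PRIOR option should be checked against the VARMAX documentation.

PROC VARMAX DATA=SASMTS.WAGEPRICE PRINTALL PLOTS=ALL;
    /* Second-order model for the differenced wage and price indices */
    MODEL LP LW/DIF=(LP(1) LW(1)) P=2;
    /* Fix the three insignificant lag 2 entries at zero;
       AR(2,2,2), the univariate lag 2 wage parameter, is kept free */
    RESTRICT AR(2,1,1)=0, AR(2,1,2)=0, AR(2,2,1)=0;
RUN;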

The final BVAR(2) model includes fewer parameters than the final model from Chapter 9, which was obtained by testing the significance of individual parameters. In Chapter 9, the final second-order AR(2) model was only accepted by the principle of parsimony. This allowed for ignoring some parameters at lag 4 because they were only of minor importance and were unnatural in an economic context. This principle is exactly the idea behind the Bayesian framework in the BVAR model.

Specific Parameters in the Prior Distribution

It is possible to change the prior distribution by specifying the parameters of the distribution. According to the definition of the prior distribution, the prior becomes wider when λ and θ increase. For large values of λ and θ, the result of the Bayesian estimation approaches the result of the maximum likelihood estimation, and more parameters become significant. On the other hand, the shrinkage toward zero is strengthened when smaller values of λ and θ are used.

In this example, many economists believe that the auto-dependence represented by the diagonal entries in the autoregressive matrices is of no special importance compared to the interdependence between the two series. This argument leads to the choice of a larger value of θ; even θ = 1 is natural in this context. However, the value θ = 1 is not allowed, and therefore the value θ = 0.9 is used instead.

In the next application, Program 12.2, the values θ = .9 and λ = 1 are used. These values of θ and λ are stated in the parentheses following the PRIOR= option in the MODEL statement.

Program 12.2: Estimation by a User-Defined BVAR Prior

PROC VARMAX DATA=SASMTS.WAGEPRICE PRINTALL PLOTS=ALL;

    MODEL LP LW/DIF=(LP(1) LW(1)) P=4 PRIOR=(THETA=0.9 LAMBDA=1);

RUN;

The result is that both off-diagonal parameters for lag 2 and one off-diagonal entry of the lag 4 autoregressive matrix are significant, in addition to the parameters that were already significant in Output 12.1. The rather lengthy Output 12.2 gives the individual significance of the estimated autoregressive parameters.

Output 12.2: Estimated Parameters in the BVAR(4) Model


Further Shrinkage toward Zero

The number of significant parameters at lag 2 and the single significant parameter at lag 4 are perhaps not what was initially expected. In order to increase the shrinkage toward zero, the value λ = .1 is applied in Program 12.3, even though this prior is very concentrated around zero, also for lag 1.
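For orientation (an illustrative calculation, not from the book), λ = .1 gives diagonal prior standard deviations of

$$\frac{\lambda}{m} = 0.100,\ 0.050,\ 0.033,\ 0.025 \qquad m = 1, \ldots, 4$$

so even the lag 1 coefficients must be strongly supported by the data in order to move far away from zero.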

Program 12.3: Estimation by a User-Defined BVAR Prior

PROC VARMAX DATA=SASMTS.WAGEPRICE PRINTALL PLOTS=ALL;

    MODEL LP LW/DIF=(LP(1) LW(1)) P=4 PRIOR=(THETA=0.9 LAMBDA=0.1);

RUN;

The result (Output 12.3) for the schematic representation is that all entries of the autoregressive parameter matrices for lags larger than 1 are insignificant.

Output 12.3: Significant Parameters of a BVAR(4) Model Using an Informative Prior


Application of the BVAR(1) Model

The model is now reduced to a BVAR(1) model with all four entries of the autoregressive matrix being significantly positive. The parameters in this model are presented in Output 12.4 where the order of the model is reduced by the option P=1. The same prior distribution is applied.
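The program that produces Output 12.4 is not printed here. A minimal sketch, assuming that the prior from Program 12.3 (θ = .9, λ = .1) is kept unchanged, could look as follows:

PROC VARMAX DATA=SASMTS.WAGEPRICE PRINTALL PLOTS=ALL;
    /* First-order model; only the option P=1 differs from Program 12.3 */
    MODEL LP LW/DIF=(LP(1) LW(1)) P=1 PRIOR=(THETA=0.9 LAMBDA=0.1);
RUN;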

Output 12.4: Parameter Estimates in the BVAR(1) Model


The covariance matrix for the error process is shown in Output 12.5. The covariance corresponds to a correlation of .54 between the two univariate error processes. This is a rather strong immediate reaction of each series to an error in the other series within the same year. This value is close to the value of .48 reported in Output 10.3 for the VARMA(2,0) model.

Output 12.5: Covariance Matrix for the Error Process


BVAR Models for the Egg Market

Program 12.4 gives a first application of the Bayesian method to the egg example. The data set EGG includes four time series: indices for the produced quantity and the price of eggs and indices for the quantity and the price of total agricultural production in Denmark. (See Chapter 11.) The order p = 2 is applied for the autoregressive model because the number of parameters in the model for this four-dimensional time series is very large. As in Chapter 11, a dummy variable for the influence of the Danish entry into the European Union on January 1, 1973, is applied to the price series for the total agricultural production. Moreover, seasonal dummies are applied for these monthly time series with the option NSEASON=12.

Program 12.4: Estimation of a BVAR(2) Model for a Four-Dimensional Series

DATA DUMMY;

    SET SASMTS.EGG;

    EUDUMMY=0;

    IF YEAR(DATE)=1973 AND MONTH(DATE)=1 THEN EUDUMMY=1;

RUN;

PROC VARMAX DATA=DUMMY PRINT=ALL PLOTS=ALL ;

    MODEL QEGG PEGG QTOT PTOT=EUDUMMY/NSEASON=12 P=2 XLAG=3 PRIOR

        DIF=(QEGG(1) PEGG(1) QTOT(1) PTOT(1)) ;

    CAUSAL GROUP1=(QTOT PTOT) GROUP2=(QEGG PEGG);

RUN;

The result is seen in Output 12.6, which presents the two estimated 4 × 4 autoregressive parameter matrices on top of each other.

Output 12.6: The Parameters in the BVAR(2) Model


This application gives almost no significant autoregressive parameters for the egg series. Only the parameter that describes the lag 1 effect of the production of eggs on the price of eggs, −.86, is of significant magnitude. Moreover, the diagonal parameters for the total production series, QTOT, numerically exceed .5.

The conclusion is that all entries in the lower left 2 × 2 corner of the two autoregressive coefficient matrices are zero. This finding is the same as saying that the quantity and price series for total agricultural production Granger-cause the two egg series. This conclusion is confirmed by the causality test specified by the CAUSAL statement in Program 12.4. (See Output 12.7.) This discussion of causality is similar to the discussion in Chapter 11, which gave a similar result. (See Output 11.3.)

Output 12.7: Result of the Causality Test


In Program 12.5, the previous causality finding is included in the model. This makes the number of parameters much smaller because the independent variables on the right-hand side are no longer modeled. For example, the EU dummy variable is no longer needed. In this model, the XLAG=2 option specifies that the independent variables are included with lags up to 2.

Program 12.5: Estimation in the BVAR(2) Model with Exogenous Variables

PROC VARMAX DATA=SASMTS.EGG PRINT=ALL PLOTS=ALL;

    MODEL QEGG PEGG = QTOT PTOT/DIF=(QEGG(1) PEGG(1) QTOT(1) PTOT(1))

        NSEASON=12 P=2 XLAG=2 PRIOR;

    CAUSAL GROUP1=(QEGG) GROUP2=(PEGG);

RUN;

The output confirms the finding from Chapter 11 that the production of eggs affects the price of eggs with a lag of one month. This is seen from the significant negative coefficient in the first-order autoregressive matrix while all other autoregressive coefficients are insignificant. (See Output 12.8.) The causality test also supports this conclusion.

Output 12.8: The Parameters in the BVAR(2) Model with Exogenous Variables


The final model is estimated by Program 12.6.

Program 12.6: The Final BVAR Model for the Egg Example

PROC VARMAX DATA=SASMTS.EGG PRINTALL PLOTS=ALL;

    MODEL PEGG = QEGG PTOT/DIF=(QEGG(1) PEGG(1) PTOT(1))

         NSEASON=12 P=2 Q=0 XLAG=2 LAGMAX=25 PRIOR;

RUN;

The final model is almost identical to the final model in Chapter 11, as seen in Output 11.10. (Also see Output 12.9.)

Output 12.9: Significant Parameters of the Final Model


Conclusion

In this chapter, the BVAR model is demonstrated to provide an easy way to establish models for multidimensional time series without too many insignificant parameters. The parameter reduction is made on the assumption that autoregressive parameters for higher lags are less likely than parameters for low lags. This intuitive idea is formulated by an informative prior distribution, as suggested by Litterman (1986).

Bayesian estimation provides an easy way to perform a model reduction when compared to manual testing for insignificant parameters and subjective discarding of some significant parameters at higher lags. Compared to application of information criteria like the Akaike Information Criterion or the Schwarz Bayesian Criterion, this Bayesian approach has the advantage that it is possible to distinguish between the importance of diagonal and off-diagonal elements in the autoregressive matrices.

Bayesian estimation is easily performed with PROC VARMAX, and it is shown to work well in the two examples in this chapter. The estimation and test results are almost the same as in the previous chapters.
