How to do it...

We will use the same house prices example that we used in the previous recipes on linear regression. We have a price variable and several regressors, such as the number of bedrooms, bathrooms, and so on:

  1. We first run our example, but with 5,000 iterations and different priors. Let's use a Gaussian prior on each beta coefficient and a lower boundary equal to 0:
library(rstan)
library(coda)
library(DrBats)
data = read.csv("/Users/admin/Documents/R_book/chapter3/house_prices.csv")
model ="
data {
real y[125];
real x[125,6];
}
parameters {
real <lower=0> beta[6];
real <lower=0> sigma; // the standard deviation must be positive
real alpha;
}
model {
beta ~ normal(5,20);
for (n in 1:125)
y[n] ~ normal(alpha + beta[1]*x[n,1] + beta[2]*x[n,2] + beta[3]*x[n,3] + beta[4]*x[n,4] + beta[5]*x[n,5] + beta[6]*x[n,6], sigma);
}"
xy = list(y = data[,1], x = as.matrix(data[,2:7])) # pass the regressors as a numeric matrix, as declared in the data block
fit = stan(model_code = model, data = xy, warmup = 500, iter = 5000, chains = 4, cores = 2, thin = 1,verbose=FALSE)
  2. We will now cast our result stored in the fit variable into an MCMC object that can be used by the CODA library:
coda_obj = coda.obj(fit)
  3. The autocorrelation plot shows the autocorrelation at different lags (remember that MCMC algorithms generate samples that are autocorrelated). The first column is the autocorrelation of order zero (which will always be equal to one, so it is not analyzed), and the others refer to the successive lags. Some authors argue that even moderate autocorrelation here is not too bad. A chain with a lag-1 autocorrelation of almost 0.9, however, will have lots of nearly repeated values; in that case, we say that the mixing is bad (and this would flag an obvious convergence problem). What should we do when the autocorrelation is moderate, as it is here? We can run MCMC with more iterations, or a larger number of warm-up iterations, hoping that the autocorrelation fades out, or we can increase the thinning parameter (thinning = k means that we keep every kth draw). In this example, it would be a good idea to rerun this code with more iterations (and perhaps also more warm-up iterations):
autocorr.plot(coda_obj)

Take a look at the following screenshot:

  4. The cross-correlation plot shows a matrix containing the cross-correlations between the different parameters. In general, this does not flag any problem per se, unless there are pairs of variables that should not be correlated. In those cases, it could indicate poor mixing of the chain (situations where the chains get stuck in some places and do not correctly explore the posterior density). Here, we see that the intercept's posterior density (the row that contains the blue squares) is negatively correlated with the other variables. It would be wise to rerun this model with more iterations to make sure there is no convergence problem for the intercept:
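This plot can presumably be generated with coda's crosscorr.plot function; a minimal sketch, using the coda_obj object created in step 2 (crosscorr returns the same information as a numeric matrix):
crosscorr.plot(coda_obj) # matrix of pairwise cross-correlations between the parameters
crosscorr(coda_obj) # the underlying correlation matrix, if numbers are preferred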

  5. The cumuplot function plots the running quantiles (y axis: quantiles; x axis: iterations). If the stationary distribution has been reached, these lines should be very stable towards the tail of the plot (the lower and upper lines show the confidence bands). As we can see here, all of them stabilize after the first 300-500 iterations:
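A sketch of the call, using the coda_obj object created in step 2 (the probs values shown are coda's defaults):
cumuplot(coda_obj, probs = c(0.025, 0.5, 0.975)) # running quantiles for each parameter and chain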

  6. effectiveSize is the sample size corrected for autocorrelation. If the autocorrelation is large, effectiveSize goes to zero; if it is negligible, it will be equal to the number of iterations. As we would expect, effectiveSize for alpha is much lower than for the rest; this happens because there is still substantial autocorrelation of order one and two for it (we have seen this in the autocorrelation plot). One way of interpreting this value is to think of it as the number of samples that effectively carry new information (autocorrelation means that information is shared between successive values).
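A sketch of the call, again using the coda_obj object:
effectiveSize(coda_obj) # effective number of independent samples for each parameter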

The following screenshot shows the effective sizes:

  7. The Gelman plot shows the Gelman statistic (the shrink factor) as a function of the iterations. When the stationary distribution is achieved, it should be equal to one. Most of these get close to one after 2,000 iterations (except perhaps beta[2], which takes longer to converge to 1). Since the Gelman plot uses all the chains, we only get one plot per variable.
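The plot, and a printed version of the statistic, can presumably be obtained with coda's gelman.plot and gelman.diag; a minimal sketch, using the coda_obj object:
gelman.plot(coda_obj) # shrink factor as a function of the iterations, one panel per parameter
gelman.diag(coda_obj) # point estimates and upper confidence limits of the statistic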

The following screenshot shows the Gelman plots: 

  8. Geweke's statistic is based on the following idea: if convergence to the stationary distribution has been achieved, the last part of the chain will be correct. The open question is whether the rest of the chain (without that last part) has converged as well.

The idea is to check whether the mean of the chain over the first 10%, 20%, and so on, matches the mean of the chain over the last part of the samples. This is useful for determining an appropriate burn-in: if we reject Geweke's test for the first 30% of the data, we should assign that 30% as burn-in (and discard those samples). The test statistic compares, for each parameter and chain, the mean over a subset of the chain with the mean over the last part of the chain (this is essentially a Z-test).
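Both the test statistic and the plot discussed in the next paragraph are presumably produced with coda's geweke.diag and geweke.plot; a minimal sketch, using the coda_obj object (the fractions shown are coda's defaults):
geweke.diag(coda_obj, frac1 = 0.1, frac2 = 0.5) # Z-scores comparing the first 10% with the last 50% of each chain
geweke.plot(coda_obj) # the statistic recomputed as the start of the chain is progressively discarded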

The plot presented here shows what happens to Geweke's statistic as observations are removed: the further we move to the right along the x-axis, the more samples we are discarding from the beginning of the chain. We can look at these plots and find the minimum number of samples that need to be removed in order to get Geweke's statistic between -2 and 2. It seems that here we would need to increase our burn-in to almost 2,000. The following screenshot shows the Geweke plots:

  9. heidel.diag (Heidelberger and Welch's convergence diagnostic) tests whether a part of the chain comes from a stationary distribution. The test is run first on the full chain, then after discarding the first 10% of the samples, then 20%, and so on, until either the null hypothesis of stationarity is not rejected, or 50% of the sample has been discarded; in the latter case the function reports a failure, and a longer chain (more iterations) is needed. As we can see here, the p-values are all large, so there is no problem.
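A sketch of the call, using the coda_obj object:
heidel.diag(coda_obj) # stationarity and half-width tests for each parameter and chain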

The following screenshot shows Heidelberger and Welch's convergence diagnostic (printed only for the first chain):

  10. The Highest Posterior Density Interval (HPD interval) is actually not a diagnostic tool, but a summary that CODA can calculate. It is the shortest interval of the posterior density that has a probability of at least k (usually 95%), and it has the property that any point inside it has a higher density than any point outside it.
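The intervals can presumably be computed with coda's HPDinterval function; a minimal sketch, using the coda_obj object:
HPDinterval(coda_obj, prob = 0.95) # 95% HPD interval for each parameter (one matrix per chain)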

The following screenshot shows the HPD intervals for each parameter (printed only for the first chain):

  11. raftery.diag (Raftery and Lewis' diagnostic) estimates the number of iterations needed to estimate a particular quantile q with a specific precision. We get a diagnostic for each chain. Burn-in is the number of samples that need to be discarded (in order to avoid the initial dependence on the starting values). Total is the number of iterations needed to achieve an accuracy of 0.005 for the 0.025 quantile (the default values for this function). Lower bound indicates how many samples would be needed if they were totally independent (no serial correlation). The dependence factor measures how much autocorrelation we have (it plays a similar role to the effective sample size). A value larger than five is considered dangerous and requires some adjustment.
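A sketch of the call, using the coda_obj object (the arguments shown are coda's defaults, matching the quantile and accuracy mentioned above):
raftery.diag(coda_obj, q = 0.025, r = 0.005, s = 0.95) # iterations needed to estimate the 0.025 quantile within +/- 0.005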

The following screenshot shows the Raftery and Lewis diagnostic (printed only for the first chain):

The outputs suggest that convergence to the stationary distribution has been achieved. Nevertheless, the posterior correlation between alpha and the other coefficients that we saw earlier suggests that rerunning the model with more iterations would be advisable. We have also seen that the Geweke statistic suggests increasing the burn-in to around 2,000.
