In this next example, we are going to fit a normal distribution to the precipitation dataset that we worked with in the previous chapter. We will wrap up with a Bayesian analogue to the one-sample t-test.
The results we want from this analysis are credible values of the true population mean of the precipitation data. Refer back to the previous chapter to recall that the sample mean was 34.89. In addition, we will also be determining credible values of the standard deviation of the precipitation data. Since we are interested in the credible values of two parameters, our posterior distribution is a joint distribution.
Our model will look a little different now:
the.model <- "
model {
  mu ~ dunif(0, 60)       # prior
  stddev ~ dunif(0, 30)   # prior
  tau <- pow(stddev, -2)

  for(i in 1:theLength){
    samp[i] ~ dnorm(mu, tau)   # likelihood function
  }
}"
This time, we have to set two priors: one for the mean of the Gaussian curve that describes the precipitation data (mu), and one for the standard deviation (stddev). We also have to create a variable called tau that describes the precision (the inverse of the variance) of the curve, because dnorm in JAGS takes the mean and the precision as parameters (and not the mean and the standard deviation, as dnorm in R does). We specify that our prior for the mu parameter is uniformly distributed from 0 inches of rain to 60 inches of rain, far above any reasonable value for the population precipitation mean. We also specify that our prior for the standard deviation is a flat one from 0 to 30. If this were part of any meaningful analysis and not just a pedagogical example, our priors would be informed in part by precipitation data from other regions like the US or by precipitation data from previous years. JAGS comes chock full of different families of distributions for expressing different priors.
Next, we specify that the variable samp (which will hold the precipitation data) is distributed normally with unknown parameters mu and tau.
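The relationship between the standard deviation and the precision can be checked directly in R. The following is a minimal sketch; the value 14 is only an illustrative standard deviation, and the names stddev.again and density.at.35 are made up for this example.

```r
# Illustrative value only; any positive standard deviation works here.
stddev <- 14

# JAGS-style precision: the inverse of the variance.
tau <- 1 / stddev^2

# Converting back from the precision recovers the standard deviation.
stddev.again <- 1 / sqrt(tau)

# R's dnorm() is parameterized by the standard deviation, so a precision
# has to be converted before it can be passed in.
density.at.35 <- dnorm(35, mean = 35, sd = 1 / sqrt(tau))
```

Passing tau straight to R's dnorm as if it were a standard deviation would silently describe a much narrower curve, which is why the conversion is worth keeping in mind when moving between R and JAGS.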
Then, we construct an R list to hold the variables to send to JAGS:
the.data <- list(
  samp = precip,
  theLength = length(precip)
)
Cool, let's run it! On my computer, this takes 5 seconds.
> results <- autorun.jags(the.model,
+                         data=the.data,
+                         n.chains = 3,
+                         # now we care about two parameters
+                         monitor = c('mu', 'stddev'))
Let's plot the results directly like before, being careful to show both the trace plot and the histogram for both parameters by increasing the layout argument in the call to the plot function.
> plot(results,
+      plot.type=c("histogram", "trace"),
+      layout=c(2,2))
Figure 7.14 shows the distribution of credible values of the mu parameter without reference to the stddev parameter. This is called a marginal distribution.
Recall that, in the last chapter, we wanted to determine whether the US's mean precipitation was significantly discrepant from the (hypothetical) known population mean precipitation of the rest of the world, 38 inches. If we take any value outside the 95% credible interval to indicate significance, then, just as with the NHST t-test, we cannot conclude that there is significantly more or less rain in the US than in the rest of the world: 38 inches falls inside the 95% credible interval for mu (roughly 31.6 to 38.2 in the summary printed later in this section), although only barely.
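The interval check described above can be sketched in R. The helper hpd_interval below is a hypothetical function written for this illustration (it is not part of runjags), and it finds the shortest interval that contains 95% of the draws. The vector mu.draws is a simulated stand-in for the sampled values of mu so that the sketch runs without JAGS; in the real analysis, the mu column of the MCMC samples would take its place.

```r
# Hypothetical helper: the shortest interval containing `prob` of the draws.
hpd_interval <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  width <- ceiling(prob * n)        # number of draws the interval must cover
  starts <- seq_len(n - width + 1)  # every candidate left edge
  spans <- sorted[starts + width - 1] - sorted[starts]
  best <- which.min(spans)          # the narrowest candidate wins
  c(lower = sorted[best], upper = sorted[best + width - 1])
}

# Stand-in draws (an assumption for illustration, NOT real JAGS output).
set.seed(1)
mu.draws <- rnorm(30000, mean = 34.87, sd = 1.66)

interval <- hpd_interval(mu.draws)

# TRUE means the reference value of 38 inches is a credible value of mu.
reference.is.credible <- interval["lower"] <= 38 && 38 <= interval["upper"]
```

A value that falls outside the interval would be treated as discrepant under the decision rule used in this section.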
Before we move on to the next example, you may be interested in credible values for both the mean and the standard deviation at the same time. A great type of plot for depicting this information is a contour plot, which illustrates the shape of a three-dimensional surface by showing a series of lines for which there is equal height. In Figure 7.15, each line shows the edges of a slice of the posterior distribution that all have equal probability density.
> results.matrix <- as.matrix(results$mcmc)
>
> library(MASS)
> # we need to make a kernel density
> # estimate of the 3-d surface
> z <- kde2d(results.matrix[,'mu'],
+            results.matrix[,'stddev'],
+            n=50)
>
> plot(results.matrix)
> contour(z, drawlabels=FALSE,
+         nlevels=11, col=rainbow(11),
+         lwd=3, add=TRUE)
The purple contours (the innermost contours) show the region of the posterior with the highest probability density. These correspond to the most likely values of our two parameters. As you can see, the most likely values of the parameters for the normal distribution that best describes our present knowledge of US precipitation are a mean of a little less than 35 and a standard deviation of a little less than 14. We can corroborate the results of our visual inspection by directly printing the results variable:
> print(results)

JAGS model summary statistics from 30000 samples (chains = 3; adapt+burnin = 5000):

       Lower95 Median Upper95   Mean     SD   Mode
mu      31.645 34.862  38.181 34.866 1.6639 34.895
stddev  11.669 13.886  16.376 13.967 1.2122 13.773

           MCerr MC%ofSD SSeff      AC.10   psrf
mu      0.012238     0.7 18484   0.002684 1.0001
stddev 0.0093951     0.8 16649 -0.0053588 1.0001

Total time taken: 5 seconds
which also shows other summary statistics from our MCMC samples and some information about the MCMC process.