Fitting distributions the Bayesian way

In this next example, we are going to fit a normal distribution to the precipitation dataset that we worked with in the previous chapter. We will wrap up with a Bayesian analogue to the one-sample t-test.

The results we want from this analysis are credible values of the true population mean of the precipitation data. Refer back to the previous chapter to recall that the sample mean was 34.89. In addition, we will also be determining credible values of the standard deviation of the precipitation data. Since we are interested in the credible values of two parameters, our posterior distribution is a joint distribution.

Our model will look a little different now:

the.model <- "
model {
  mu ~ dunif(0, 60)        # prior
  stddev ~ dunif(0, 30)    # prior
  tau <- pow(stddev, -2)  

  for(i in 1:theLength){
    samp[i] ~ dnorm(mu, tau)   # likelihood function
  }
}"

This time, we have to set two priors: one for the mean of the Gaussian curve that describes the precipitation data (mu), and one for its standard deviation (stddev). We also have to create a variable called tau that describes the precision (the inverse of the variance) of the curve, because dnorm in JAGS takes the mean and the precision as parameters (and not the mean and standard deviation, as in R). We specify that our prior for the mu parameter is uniformly distributed from 0 inches of rain to 60 inches of rain, far above any reasonable value for the population precipitation mean. We also specify that our prior for the standard deviation is a flat one from 0 to 30. If this were part of a meaningful analysis and not just a pedagogical example, our priors would be informed in part by precipitation data from other regions like the US, or by US precipitation data from previous years. JAGS comes chock full of different families of distributions for expressing different priors.
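To make the relationship between the standard deviation and the precision concrete, here is a minimal sketch in plain R. The helper names sd_to_precision and precision_to_sd are made up for this illustration; they are not part of JAGS or of any package.

```r
# Hypothetical helpers (not part of JAGS or runjags) for converting
# between a standard deviation and a precision (1 / variance).
sd_to_precision <- function(stddev) 1 / stddev^2   # same idea as pow(stddev, -2)
precision_to_sd <- function(tau) 1 / sqrt(tau)

stddev <- 14
tau <- sd_to_precision(stddev)        # 1 / 196

# R's dnorm expects a standard deviation, so we convert back first.
density_at_mean <- dnorm(35, mean = 35, sd = precision_to_sd(tau))
```

A large standard deviation corresponds to a small precision, which is why the tau values in models like this one tend to be tiny numbers.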

Next, we specify that the variable samp (which will hold the precipitation data) is distributed normally with unknown parameters mu and tau.

Then, we construct an R list to hold the variables to send to JAGS:

  the.data <- list( 
    samp = precip,
    theLength = length(precip)
  )

Cool, let's run it! On my computer, this takes 5 seconds.

  > results <- autorun.jags(the.model, 
  +                         data=the.data,
  +                         n.chains = 3,
  +                         # now we care about two parameters
  +                         monitor = c('mu', 'stddev'))

Let's plot the results directly like before, taking care to show both the trace plot and the histogram for both parameters by enlarging the layout argument in the call to the plot function.

  > plot(results,
  +      plot.type=c("histogram", "trace"),
  +      layout=c(2,2))

Figure 7.13: Output plots from the MCMC result of fitting a normal curve to the built-in precipitation data set

Figure 7.14 shows the distribution of credible values of the mu parameter without reference to the stddev parameter. This is called a marginal distribution.


Figure 7.14: Marginal distribution of posterior for parameter 'mu'. Dashed line shows hypothetical population mean within 95% credible interval
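The idea of marginalizing a joint posterior can also be seen without MCMC. The following sketch swaps in a different technique, a brute-force grid approximation, purely for illustration; it is not what JAGS does internally. It assumes the same flat priors as our model and uses the built-in precip data. The grid resolutions are arbitrary choices.

```r
# Grid approximation of the joint posterior (a stand-in for MCMC,
# used here only to illustrate what "joint" and "marginal" mean).
mu_grid     <- seq(0, 60, length.out = 601)     # support of the dunif(0, 60) prior
stddev_grid <- seq(0.1, 30, length.out = 300)   # support of dunif(0, 30), avoiding 0

# Log-likelihood of the precip data at every (mu, stddev) pair.
log_lik <- outer(mu_grid, stddev_grid,
                 Vectorize(function(m, s) sum(dnorm(precip, m, s, log = TRUE))))

# Flat priors only contribute a constant, so on this grid the posterior
# is proportional to the likelihood.
joint <- exp(log_lik - max(log_lik))
joint <- joint / sum(joint)            # normalize so the grid sums to 1

# Marginalizing: sum over the parameter we want to ignore.
marginal_mu     <- rowSums(joint)
marginal_stddev <- colSums(joint)

# Most credible value of mu on the grid, ignoring stddev.
mu_grid[which.max(marginal_mu)]
```

With MCMC samples, the same marginalization happens implicitly: we simply look at the mu column of the samples and ignore the stddev column.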

Recall that, in the last chapter, we wanted to determine whether the mean precipitation of the US was significantly discrepant from the (hypothetical) known population mean precipitation of the rest of the world, 38 inches. If we take any value outside the 95% credible interval to indicate significance, then, just as with the NHST t-test, we cannot conclude that there is significantly more or less rain in the US than in the rest of the world: the hypothesized value of 38 falls inside the interval, if only barely.
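A check like this can be written as a small function. The sketch below is only an illustration: in_credible_interval is a made-up helper, it uses an equal-tailed interval from quantile (which may differ slightly from the interval that runjags reports), and the samples are simulated stand-ins rather than real MCMC output.

```r
# Hypothetical helper: is `value` inside the equal-tailed credible
# interval of a vector of posterior samples?
in_credible_interval <- function(samples, value, level = 0.95) {
  tail_prob <- (1 - level) / 2
  bounds <- quantile(samples, c(tail_prob, 1 - tail_prob), names = FALSE)
  value >= bounds[1] && value <= bounds[2]
}

# Stand-in draws for illustration only. With a real fit, we would use
# the mu column of as.matrix(results$mcmc) instead.
set.seed(1)
fake_mu_samples <- rnorm(30000, mean = 34.87, sd = 1.66)

in_credible_interval(fake_mu_samples, 38)
```

Treating "outside the 95% interval" as the decision rule is a convention we chose for the comparison with the t-test, not a requirement of Bayesian analysis.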

Before we move on to the next example, you may be interested in credible values for both the mean and the standard deviation at the same time. A great type of plot for depicting this information is a contour plot, which illustrates the shape of a three-dimensional surface by showing a series of lines of equal height. In Figure 7.15, each line traces the edge of a slice through the posterior distribution along which the probability density is equal.

  > results.matrix <- as.matrix(results$mcmc)
  > 
  > library(MASS)
  > # we need to make a kernel density
  > # estimate of the 3-d surface
  > z <- kde2d(results.matrix[,'mu'],
  +            results.matrix[,'stddev'],
  +            n=50)
  > 
  > plot(results.matrix)
  > contour(z, drawlabels=FALSE,
  +         nlevels=11, col=rainbow(11),
  +         lwd=3, add=TRUE)

Figure 7.15: Contour plot of the joint posterior distribution. The purple contour corresponds to the region with the highest probability density

The purple contours (the inner-most contours) show the region of the posterior with the highest probability density. These correspond to the most likely values of our two parameters. As you can see, the most likely values of the parameters for the normal distribution that best describes our present knowledge of US precipitation are a mean of a little less than 35 and a standard deviation of a little less than 14. We can corroborate the results of our visual inspection by directly printing the results variable:

> print(results)

JAGS model summary statistics from 30000 samples (chains = 3; adapt+burnin = 5000):
                                                  

       Lower95 Median Upper95   Mean     SD   Mode
mu      31.645 34.862  38.181 34.866 1.6639 34.895
stddev  11.669 13.886  16.376 13.967 1.2122 13.773
                                                
           MCerr MC%ofSD SSeff      AC.10   psrf
mu      0.012238     0.7 18484   0.002684 1.0001
stddev 0.0093951     0.8 16649 -0.0053588 1.0001

Total time taken: 5 seconds

which also shows other summary statistics from our MCMC samples and some information about the MCMC process.
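Two of those columns can be checked by hand. Assuming that MCerr is the posterior standard deviation divided by the square root of the effective sample size (SSeff), and that MC%ofSD is MCerr expressed as a percentage of SD, the printed values are reproduced, up to rounding, by the following arithmetic:

```r
# Values copied from the SD and SSeff columns of the printed summary.
post_sd <- c(mu = 1.6639, stddev = 1.2122)
ss_eff  <- c(mu = 18484,  stddev = 16649)

# Monte Carlo standard error: how much the estimate would wobble
# between runs because we only have a finite number of samples.
mc_err <- post_sd / sqrt(ss_eff)

# The same error as a percentage of the posterior SD.
mc_pct_sd <- 100 * mc_err / post_sd

round(mc_err, 4)
round(mc_pct_sd, 1)
```

The smaller these numbers, the less the reported summaries depend on the particular random samples that the chains happened to draw.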
