Although it's a bit silly to break out MCMC for the single-parameter career recommendation analysis that we discussed earlier, working through this simple example will make it easier to apply the method to more complicated models.
In order to get started, you need to install a software program called JAGS, which stands for Just Another Gibbs Sampler (a Gibbs sampler is a type of MCMC sampler). This program is independent of R, but we will be using R packages to communicate with it. After installing JAGS, you will need to install the R packages rjags, runjags, and modeest. As a reminder, you can install all three with this command:
> install.packages(c("rjags", "runjags", "modeest"))
To make sure everything is installed properly, load the runjags package, and run the function testjags(). My output looks something like this:
> library(runjags)
> testjags()
You are using R version 3.2.1 (2015-06-18)
on a unix machine, with the RStudio GUI
The rjags package is installed
JAGS version 3.4.0 found successfully using the command
'/usr/local/bin/jags'
The first step is to create the model that describes our problem. This model is written in an R-like syntax and stored in a string (character vector) that will get sent to JAGS to interpret. For this problem, we will store the model in a string variable called our.model, and the model looks like this:
our.model <- "
model {
    # likelihood function
    numSuccesses ~ dbinom(successProb, numTrials)

    # prior
    successProb ~ dbeta(1, 1)

    # parameter of interest
    theta <- numSuccesses / numTrials
}"
Note that the JAGS syntax allows for R-style comments, which I included for clarity.
In the first few lines of the model, we are specifying the likelihood function. As we know, the likelihood function can be described with a binomial distribution. The line:
numSuccesses ~ dbinom(successProb, numTrials)
says that the variable numSuccesses is distributed according to the binomial distribution with parameters given by the variables successProb and numTrials.
In the next relevant line, we are specifying our choice of the prior distribution. In keeping with our previous choice, this line reads, roughly: the successProb variable (referred to in the previous relevant line) is distributed in accordance with the beta distribution with hyper-parameters 1 and 1.
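A beta distribution with both hyper-parameters set to 1 is just the uniform distribution on [0, 1], so this prior assigns equal credibility to every possible success probability. A quick base-R sanity check (no JAGS required) confirms this:

```r
# dbeta(x, 1, 1) is perfectly flat: the density is 1
# everywhere on the unit interval
dbeta(c(0.1, 0.5, 0.9), 1, 1)
# -> 1 1 1

# and its quantiles match those of the uniform distribution
qbeta(c(.025, .975), 1, 1)
# -> 0.025 0.975
```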
In the last line, we are specifying that the parameter we are really interested in is the proportion of successes (the number of successes divided by the number of trials). We are calling that theta. Notice that we used the deterministic assignment operator (<-) instead of the distributed-according-to operator (~) to assign theta.
The next step is to define the successProb and numTrials variables for shipping to JAGS. We do this by stuffing these variables into an R list, as follows:
our.data <- list(
    numTrials = 40,
    successProb = 36/40
)
Great! We are all set to run the MCMC.
> results <- autorun.jags(our.model,
+                         data=our.data,
+                         n.chains = 3,
+                         monitor = c('theta'))
The function that runs the MCMC sampler and automatically stops at convergence is autorun.jags. The first argument is the string specifying the JAGS model. Next, we tell the function where to find the data that JAGS will need. After this, we specify that we want to run 3 independent MCMC chains; this will help guarantee convergence and, if we run them in parallel, drastically cut down on the time we have to wait for our sampling to be done. (To see some of the other options available, as always, you can run ?autorun.jags.) Lastly, we specify that we are interested in the variable 'theta'.
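If you want to take advantage of the parallelism just mentioned, autorun.jags accepts a method argument for this purpose. Treat the following as a sketch: it assumes your machine has at least three processor cores available, and it reuses the our.model and our.data variables defined above.

```r
library(runjags)

# run the three chains on separate processor cores
# instead of sequentially
results <- autorun.jags(our.model,
                        data=our.data,
                        n.chains = 3,
                        monitor = c('theta'),
                        method = 'parallel')
```

On a multi-core machine, this can cut the wall-clock sampling time roughly in proportion to the number of chains.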
After this is done, we can directly plot the results variable, where the results of the MCMC are stored. The output of this command is shown in Figure 7.11.
> plot(results,
+      plot.type=c("histogram", "trace"),
+      layout=c(2,1))
The first of these plots is called a trace plot. It shows the sampled values of theta as the chain got longer. The fact that all three chains are overlapping around the same set of values is, at least in this case, a strong indication that all three chains have converged. The bottom plot is a bar plot that depicts the relative credibility of different values of theta. It is shown here as a bar plot, and not a smooth curve, because the binomial likelihood function is discrete. If we want a continuous representation of the posterior distribution, we can extract the sample values from the results and plot them as a density plot with a sufficiently large bandwidth:
> # mcmc samples are stored in mcmc attribute
> # of results variable
> results.matrix <- as.matrix(results$mcmc)
>
> # extract the samples for 'theta'
> # the only column, in this case
> theta.samples <- results.matrix[,'theta']
>
> plot(density(theta.samples, adjust=5))
And we can add the bounds of the 95% credible interval to the plot as before:
> quantile(theta.samples, c(.025, .975))
 2.5% 97.5%
0.800 0.975
> lines(c(.8, .975), c(0.1, 0.1))
> lines(c(.8, .8), c(0.15, 0.05))
> lines(c(.975, .975), c(0.15, 0.05))
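Incidentally, if you want a point estimate to accompany the interval, the posterior mode (the single most credible value of theta) can be read off the same density estimate. This sketch assumes you still have the theta.samples vector from the previous step:

```r
# the x-value at which the estimated density peaks is an
# approximation of the posterior mode
posterior.density <- density(theta.samples, adjust=5)
posterior.mode <- posterior.density$x[which.max(posterior.density$y)]
posterior.mode
```

(The modeest package we installed at the start of this section offers more principled mode estimators; see ?mlv.)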
Rest assured that the two credible intervals' bounds disagree in this example only because the likelihood function is discrete, so the MCMC could only sample discrete values from the posterior. This will not occur in the other examples in this chapter. Regardless, the two methods seem to be in agreement about the shape of the posterior distribution and the credible values of theta. It is all but certain that my recommendations are better than chance. Go me!
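As one more sanity check on those bounds, recall that the beta distribution is the conjugate prior for the binomial likelihood: a Beta(1, 1) prior updated with 36 successes in 40 trials yields a Beta(1 + 36, 1 + 4) posterior, whose exact 95% credible interval we can compute without any sampling at all:

```r
# exact 95% credible interval from the conjugate
# Beta(1 + 36, 1 + 4) posterior
qbeta(c(.025, .975), 1 + 36, 1 + 4)
```

The bounds this returns are in the same neighborhood as the MCMC-derived interval, which is reassuring.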