Exercises

  1. Check the following definition of a probabilistic model. Identify the likelihood, the prior, and the posterior:

  2. For the model in exercise 1, how many parameters does the posterior have? In other words, how many dimensions does it have?
  3. Write down Bayes' theorem for the model in exercise 1.
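For reference, Bayes' theorem in its general form (with θ the parameters and y the data) is:

```latex
\underbrace{p(\theta \mid y)}_{\text{posterior}} =
\frac{\overbrace{p(y \mid \theta)}^{\text{likelihood}}\;
      \overbrace{p(\theta)}^{\text{prior}}}
     {\underbrace{p(y)}_{\text{marginal likelihood}}}
```

Writing it down for exercise 1 amounts to substituting that model's likelihood and priors into this template.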
  4. Check the following model. Identify the linear model and the likelihood. How many parameters does the posterior have?

  5. For the model in exercise 1, assume that you have a dataset with 57 data points coming from a Gaussian with a mean of 4 and a standard deviation of 0.5. Using PyMC3, compute:
    • The posterior distribution
    • The prior distribution
    • The posterior predictive distribution
    • The prior predictive distribution

     Tip: Besides pm.sample(), PyMC3 has other functions to compute samples.
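As a sanity check for this exercise, the posterior for a Gaussian mean with known standard deviation has a closed form under a conjugate Normal prior. A minimal NumPy sketch, assuming a Normal(0, 10) prior on the mean (the prior choice is an assumption, not specified by the exercise):

```python
import numpy as np

# Hypothetical setup mirroring the exercise: 57 draws from a Gaussian
# with mean 4 and standard deviation 0.5.
rng = np.random.default_rng(123)
sigma = 0.5
data = rng.normal(4, sigma, size=57)

# Assumed conjugate Normal(0, 10) prior on the mean (sigma treated as
# known), which gives the posterior in closed form.
mu0, tau0 = 0.0, 10.0
n = len(data)
post_var = 1 / (1 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + data.sum() / sigma**2)

print(f"posterior: mean={post_mean:.3f}, sd={post_var**0.5:.3f}")
```

In PyMC3, pm.sample_prior_predictive() and pm.sample_posterior_predictive() are the kind of functions the tip alludes to; the closed-form result above gives you something to compare the sampled posterior against.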

  6. Execute model_g using NUTS (the default sampler) and then using Metropolis.
    Compare the results using ArviZ functions like plot_trace and plot_pair. Center the variable x and repeat the exercise. What conclusion can you draw from this?
  7. Use the howell dataset (available at https://github.com/aloctavodia/BAP) to create a linear model of the weight (x) against the height (y). Exclude subjects that are younger than 18. Explain the results.
  8. For four subjects, we get the weights (45.73, 65.8, 54.2, 32.59), but not their heights. Using the model from the previous exercise, predict the height for each subject, together with their 50% and 94% HPDs.

     Tip 1: Check the coal mining disaster example in the PyMC3 documentation.

     Tip 2: Use shared variables.
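One way to sketch the prediction step with NumPy only, using hypothetical posterior samples in place of a real trace (all numeric values below are assumptions, not fitted results), and equal-tailed percentile intervals as a simple stand-in for HPDs:

```python
import numpy as np

# Hypothetical posterior samples for alpha (intercept), beta (slope),
# and epsilon (noise sd); in practice these come from the fitted trace.
rng = np.random.default_rng(0)
n_samples = 2000
alpha = rng.normal(155.0, 1.0, n_samples)    # assumed values, in cm
beta = rng.normal(0.9, 0.05, n_samples)      # assumed values, cm per kg
epsilon = np.abs(rng.normal(5.0, 0.5, n_samples))

new_weights = np.array([45.73, 65.8, 54.2, 32.59])

for w in new_weights:
    # Posterior predictive draws for one subject's height
    heights = rng.normal(alpha + beta * w, epsilon)
    lo50, hi50 = np.percentile(heights, [25, 75])
    lo94, hi94 = np.percentile(heights, [3, 97])
    print(f"weight {w}: 50% [{lo50:.1f}, {hi50:.1f}], "
          f"94% [{lo94:.1f}, {hi94:.1f}]")
```

ArviZ also provides a function to compute proper HPD intervals from the samples, which is the better choice for the exercise itself.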

  9. Repeat exercise 7, this time including subjects younger than 18. Explain the results.
  10. It is known for many species that the height does not scale linearly with the weight, but with the logarithm of the weight. Use this information to fit the howell data (including subjects of all ages). Build one more model, this time without the logarithm but using a second-order polynomial instead. Compare and explain both results.
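To build intuition before writing the Bayesian versions, a quick ordinary least-squares comparison on synthetic data (the data-generating values here are made up and are not the howell data):

```python
import numpy as np

# Synthetic stand-in: heights that grow with the logarithm of weight,
# plus noise (purely illustrative values).
rng = np.random.default_rng(42)
weight = rng.uniform(5, 65, 200)
height = 60 + 25 * np.log(weight) + rng.normal(0, 3, 200)

# Linear fit on log(weight) versus a second-order polynomial on raw weight.
log_coef = np.polyfit(np.log(weight), height, 1)
poly_coef = np.polyfit(weight, height, 2)

log_resid = height - np.polyval(log_coef, np.log(weight))
poly_resid = height - np.polyval(poly_coef, weight)
print(f"residual sd: log fit {np.std(log_resid):.2f}, "
      f"polynomial fit {np.std(poly_resid):.2f}")
```

Comparing the residuals of the two fits (and where along the weight axis they concentrate) previews what you should look for in the Bayesian models.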
  11. Think about a model that's able to fit the first three datasets from the Anscombe quartet. Also, think about a model to fit the fourth dataset.
  12. See in the accompanying code the model model_t2 (and the data associated with it). Experiment with priors for ν, like the non-shifted exponential and gamma priors (they are commented out in the code). Plot the prior distributions to ensure that you understand them. An easy way to do this is to comment out the likelihood in the model and check the trace plot. A more efficient way is to use the pm.sample_prior_predictive() function instead of pm.sample().
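You can also inspect candidate priors for ν directly with NumPy before touching the model; the parameter values below are assumptions for illustration (the point is whether the prior places mass at very small ν, where the Student's t-distribution has very heavy tails):

```python
import numpy as np

# Drawing directly from candidate priors for nu (the Student's t
# degrees of freedom); the specific parameter values are assumptions.
rng = np.random.default_rng(1)
exp_prior = rng.exponential(scale=30, size=5000)       # Exponential(1/30)
shifted = rng.exponential(scale=30, size=5000) + 1     # shifted by 1
gamma_prior = rng.gamma(shape=2, scale=5, size=5000)   # Gamma(2, 1/5)

for name, draws in [("exponential", exp_prior),
                    ("shifted exponential", shifted),
                    ("gamma", gamma_prior)]:
    print(f"{name}: mean={draws.mean():.1f}, "
          f"1st percentile={np.percentile(draws, 1):.2f}")
```

The non-shifted priors allow draws arbitrarily close to 0, while the shifted version keeps ν ≥ 1; comparing the low percentiles makes the practical difference visible.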
  13. For the unpooled_model, change the value of sd of the β prior; try values of 1 and 100. Explore how the estimated slopes change for each group. Which group is most affected by this change?
  14. Using hierarchical_model, repeat Figure 3.18, the one with the eight groups and the eight lines, but this time add the uncertainty to the linear fit.
  15. Re-run the model_mlr example, this time without centering the data. Compare the uncertainty in the α parameter for one case and the other. Can you explain these results?

     Tip: Remember the definition of the α parameter (also known as the intercept).
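A least-squares sketch (on synthetic data, not model_mlr itself) of why centering matters for the intercept:

```python
import numpy as np

# With raw x, the intercept is the value of y at x = 0, often far
# outside the data; with centered x, it is the value of y at the mean
# of x, right in the middle of the data. Data here are synthetic.
rng = np.random.default_rng(7)
x = rng.uniform(100, 200, 100)
y = 2.5 * x + 10 + rng.normal(0, 5, 100)

slope_raw, intercept_raw = np.polyfit(x, y, 1)
x_centered = x - x.mean()
slope_c, intercept_c = np.polyfit(x_centered, y, 1)

# The slope is unchanged, but the centered intercept equals the mean of y.
print(f"raw intercept: {intercept_raw:.1f}, "
      f"centered intercept: {intercept_c:.1f}, y mean: {y.mean():.1f}")
```

The same logic carries over to the Bayesian model: without centering, α is an extrapolation far from the data, so its posterior is wider and strongly correlated with the slopes.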

  16. Read and run the following notebook from PyMC3's documentation: https://pymc-devs.github.io/pymc3/notebooks/LKJ.html.
  17. Choose a dataset that you find interesting and use it with the simple linear regression model. Be sure to explore the results using ArviZ functions and to compute the Pearson correlation coefficient. If you do not have an interesting dataset, try searching online, for example, at http://data.worldbank.org/ or http://www.stat.ufl.edu/~winner/datasets.html.
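For the Pearson coefficient, np.corrcoef is one quick way to compute it (the toy data below is purely illustrative):

```python
import numpy as np

# Pearson correlation for a small toy dataset (values are made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns the full correlation matrix; the off-diagonal
# entry is Pearson's r between x and y.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")
```

Comparing this frequentist point estimate with the posterior of the slope (suitably standardized) is a useful exercise in itself.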