Exercises

  1. Rerun the first model using the petal length and then the petal width variables. What are the main differences in the results? How wide or narrow is the 95% HPD interval in each case? (A starter sketch appears after this list.)
  2. Repeat exercise 1, this time using a Student's t-distribution as a weakly informative prior. Try different values of ν (see the sketch after this list).
  3. Go back to the first example, the logistic regression for classifying setosa or versicolor given sepal length. Try to solve the same problem using a simple linear regression model, as we saw in Chapter 3, Modeling with Linear Regression. How useful is linear regression compared to logistic regression? Can the result be interpreted as a probability? Tip: check whether the values of μ are restricted to the [0, 1] interval (see the sketch after this list).
  4. In the example from the Interpreting the coefficients of a logistic regression section, we changed sepal_length by 1 unit. Using Figure 4.6, corroborate that the value of log_odds_versicolor_i corresponds to the value of probability_versicolor_i. Do the same for log_odds_versicolor_f and probability_versicolor_f. Just by noting that log_odds_versicolor_i is negative, what can you say about the probability? Use Figure 4.6 to help you. Is this result clear to you from the definition of log-odds? (A sketch covering this and the next exercise follows the list.)
  5. Use the same example as in the previous exercise. For model_1, check how much the log-odds change when increasing sepal_length from 5.5 to 6.5 (spoiler: it should be 4.66). How much does the probability change? How does this increase compare to the one you get when increasing sepal_length from 4.5 to 5.5?
  6. In the example for dealing with unbalanced data, change df = df[45:] to df = df[22:78]. This will keep roughly the same number of data points, but now the classes will be balanced. Compare the new result with the previous ones. Which one is more similar to the example using the complete dataset?
  7. Suppose that instead of a softmax regression, we use a simple linear model by coding setosa = 0, versicolor = 1, and virginica = 2. Under the simple linear regression model, what will happen if we switch the coding? Will we get the same or different results? (See the sketch after this list.)
  8. Compare the likelihood of the logistic model versus the likelihood of the LDA model. Use the sample_posterior_predictive function to generate predicted data and compare the types of data you get in both cases. Be sure you understand the difference between the types of data the two models predict (see the sketch after this list).
  9. Using the fish data, extend the ZIP_reg model to include the persons variable as part of a linear model. Use this variable to model the number of extra zeros. You should get a model that includes two linear models: one connecting the number of children and the presence/absence of a camper to the Poisson rate (as in the example we saw), and another connecting the number of persons to the ψ variable. For the second case, you will need a logistic inverse link! (See the sketch after this list.)
  10. Use the data from the robust logistic example to feed a non-robust logistic regression model and check that the outliers actually affected the results. You may want to add or remove outliers to better understand how they affect the estimates of a standard logistic regression and the robustness of the model introduced in this chapter (see the sketch after this list).
  11. Read and run the related notebooks from PyMC3's documentation.
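
For exercise 1, here is a minimal starter sketch, assuming PyMC3 and ArviZ, with the iris data loaded through seaborn for self-containment (the chapter loads it from a CSV file). Only the predictor column changes between the two runs:

```python
import pandas as pd
import pymc3 as pm
import arviz as az
import seaborn as sns

# keep only the setosa and versicolor rows, coded as 0 and 1
iris = sns.load_dataset('iris')
df = iris.query("species == ('setosa', 'versicolor')")
y_0 = pd.Categorical(df['species']).codes

for predictor in ['petal_length', 'petal_width']:
    x_0 = df[predictor].values
    with pm.Model() as model_0:
        alpha = pm.Normal('alpha', mu=0, sd=10)
        beta = pm.Normal('beta', mu=0, sd=10)
        theta = pm.math.sigmoid(alpha + beta * x_0)
        yl = pm.Bernoulli('yl', p=theta, observed=y_0)
        trace_0 = pm.sample(1000)
    print(predictor)
    # the hdi columns report the interval bounds; pass hdi_prob=0.95
    # (credible_interval=0.95 in older ArviZ) for 95% intervals
    print(az.summary(trace_0))
```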
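For exercise 2, only the prior family changes; a sketch reusing x_0 and y_0 from the previous block, with the degrees of freedom ν exposed so the model can be rerun with different values:

```python
import pymc3 as pm

for nu in [1, 3, 30]:  # try other values as well
    with pm.Model() as model_t:
        alpha = pm.StudentT('alpha', nu=nu, mu=0, sd=10)
        beta = pm.StudentT('beta', nu=nu, mu=0, sd=10)
        theta = pm.math.sigmoid(alpha + beta * x_0)
        yl = pm.Bernoulli('yl', p=theta, observed=y_0)
        trace_t = pm.sample(1000)
```

Smaller values of ν put more mass on extreme coefficient values, so the prior becomes heavier-tailed as ν decreases.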
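For exercise 3, a sketch of a Chapter 3-style linear regression fitted directly to the 0/1 labels (df and y_0 as in the first block); the last lines count how often the posterior values of μ fall outside [0, 1], which is the tip in the exercise:

```python
import pymc3 as pm

x_0 = df['sepal_length'].values
with pm.Model() as model_lin:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10)
    eps = pm.HalfCauchy('eps', 5)
    mu = pm.Deterministic('mu', alpha + beta * x_0)
    yl = pm.Normal('yl', mu=mu, sd=eps, observed=y_0)
    trace_lin = pm.sample(1000)

mu_post = trace_lin['mu']
print('fraction of mu values outside [0, 1]:',
      ((mu_post < 0) | (mu_post > 1)).mean())
```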
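For exercises 4 and 5, converting between log-odds and probabilities only requires the logistic function; a sketch assuming trace_1 holds the posterior from the chapter's model_1:

```python
from scipy.special import expit  # logistic function, the inverse of the logit

alpha_m = trace_1['alpha'].mean()
beta_m = trace_1['beta'].mean()

for x_i, x_f in [(4.5, 5.5), (5.5, 6.5)]:
    log_odds_i = alpha_m + beta_m * x_i
    log_odds_f = alpha_m + beta_m * x_f
    print(f'{x_i} -> {x_f}: log-odds change = {log_odds_f - log_odds_i:.2f}, '
          f'probability change = {expit(log_odds_f) - expit(log_odds_i):.2f}')
```

Note that the model is linear on the log-odds scale, so a 1-unit increase of sepal_length always changes the log-odds by the same amount (β), while the corresponding change in probability depends on where on the logistic curve you start.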
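For exercise 7, a quick least-squares stand-in (np.polyfit) makes the effect of permuting the codes visible without refitting the full Bayesian model; the same comparison can then be repeated with the Chapter 3 model:

```python
import numpy as np
import seaborn as sns

iris = sns.load_dataset('iris')
x = iris['sepal_length'].values
codings = {'original': {'setosa': 0, 'versicolor': 1, 'virginica': 2},
           'switched': {'setosa': 0, 'virginica': 1, 'versicolor': 2}}
for name, coding in codings.items():
    y = iris['species'].map(coding).values
    beta, alpha = np.polyfit(x, y, 1)  # slope and intercept
    print(f'{name}: alpha = {alpha:.2f}, beta = {beta:.2f}')
```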
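For exercise 8, what matters is the type of data each model generates; in the sketch below, model_logistic/trace_logistic, model_lda/trace_lda, and the observed-variable names 'yl' and 'sl' are all placeholders to be replaced with the names used in your own code:

```python
import pymc3 as pm

with model_logistic:
    ppc_log = pm.sample_posterior_predictive(trace_logistic)
with model_lda:
    ppc_lda = pm.sample_posterior_predictive(trace_lda)

print(ppc_log['yl'][:2])  # the logistic model predicts class labels (0s and 1s)
print(ppc_lda['sl'][:2])  # the LDA model predicts sepal-length values (continuous)
```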
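For exercise 9, a sketch of the extended ZIP model with a second linear model for ψ; the file path and the column names (count, child, camper, persons) are assumptions based on the fish data used in the chapter:

```python
import pandas as pd
import pymc3 as pm

fish_data = pd.read_csv('fish.csv')  # adjust the path to your copy

with pm.Model() as ZIP_reg2:
    # linear model for the Poisson rate, as in the chapter's example
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=2)
    lam = pm.math.exp(alpha + beta[0] * fish_data['child'].values +
                      beta[1] * fish_data['camper'].values)

    # second linear model: persons -> psi, through the logistic inverse link
    alpha_psi = pm.Normal('alpha_psi', mu=0, sd=10)
    beta_psi = pm.Normal('beta_psi', mu=0, sd=10)
    psi = pm.math.sigmoid(alpha_psi + beta_psi * fish_data['persons'].values)

    yl = pm.ZeroInflatedPoisson('yl', psi=psi, theta=lam,
                                observed=fish_data['count'].values)
    trace_zip2 = pm.sample(1000)
```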
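For exercise 10, the non-robust model is just the plain logistic regression, that is, the robust model without the π mixture; a sketch assuming x_out and y_out hold the outlier-contaminated data from the robust logistic example:

```python
import pymc3 as pm

with pm.Model() as model_nonrobust:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10)
    # no pi mixture: every point is assumed to lie on the logistic curve
    theta = pm.math.sigmoid(alpha + beta * x_out)
    yl = pm.Bernoulli('yl', p=theta, observed=y_out)
    trace_nonrobust = pm.sample(1000)
```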