Part IV

Data Modeling

Chapter 13 Linear Regression (Continuous Outcome Variable)

Chapter 14 Generalized Linear Models

Chapter 15 Survival Analysis

Chapter 16 Model Diagnostics

Chapter 17 Regularization

Chapter 18 Clustering

This part of the book follows the methods described in Jared Lander’s R for Everyone. The rationale is that, since you have learned how to manipulate data in Python using Pandas, you can save the cleaned data set and load it into another analytics language whenever you need a method from that language.
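A minimal sketch of that hand-off, assuming a small hypothetical DataFrame named `cleaned` and a hypothetical file name `cleaned_data.csv`: `DataFrame.to_csv()` writes a plain-text file that R's `read.csv()`, or most other analytics tools, can load. Passing `index=False` keeps the row index out of the file so only the data columns are saved.

```python
import pandas as pd

# A small, already-cleaned DataFrame (hypothetical example data)
cleaned = pd.DataFrame({
    "age": [23, 35, 41],
    "income": [42000.0, 58000.5, 61000.0],
    "smoker": ["no", "yes", "no"],
})

# Write the data columns only (no row index) to a CSV file;
# in R the file could then be read with read.csv("cleaned_data.csv")
cleaned.to_csv("cleaned_data.csv", index=False)

# Round-trip check: read the file back into pandas
reloaded = pd.read_csv("cleaned_data.csv")
print(reloaded)
```

CSV is chosen here only because nearly every language can read it; it does not preserve every pandas dtype (categoricals, for example, come back as plain strings).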

This part covers many of the basic modeling techniques and serves as an introduction to data analytics and machine learning. Other great references are:

  • Andreas Müller and Sarah Guido’s Introduction to Machine Learning with Python

  • Sebastian Raschka and Vahid Mirjalili’s Python Machine Learning

  • Mark Fenner’s Machine Learning with Python for Everyone

  • Andrew Kelleher and Adam Kelleher’s Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications

Many of the techniques covered so far in the book help us figure out what kind of information is stored in our columns, in particular, the variable we are trying to model or predict. If our data has an outcome variable, we can use supervised modeling techniques. If our variable of interest is continuous, we would use a linear regression model (Chapter 13). If our outcome variable is binary, we would use a logistic regression model; if it is count data, we would use a Poisson model (Chapter 14). Survival models are used when we are modeling whether and when an outcome of interest occurs, but some observations are censored, that is, the outcome was not observed for them during the study (Chapter 15). When we fit models for prediction, we sometimes need a way to pick the “best” model; this is when we compare model diagnostics (Chapter 16).
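The decision rules in the paragraph above can be sketched as a small function. This is a hypothetical helper (`suggest_model` is not part of pandas or of this book), it uses only simple heuristics on the column's values, and it does not detect censoring, so survival models are out of its scope.

```python
import pandas as pd


def suggest_model(outcome: pd.Series) -> str:
    """Hypothetical helper: map an outcome column to the model family
    described in this part of the book. Censoring is not checked."""
    values = outcome.dropna()
    # Exactly two distinct values: treat as binary -> logistic regression
    if values.nunique() == 2:
        return "logistic regression (Chapter 14)"
    if pd.api.types.is_numeric_dtype(values):
        # Non-negative whole numbers: treat as counts -> Poisson regression
        if (values >= 0).all() and (values % 1 == 0).all():
            return "Poisson regression (Chapter 14)"
        # Any other numeric outcome: treat as continuous -> linear regression
        return "linear regression (Chapter 13)"
    # Non-numeric with more than two levels: no rule above applies
    return "no suggestion; inspect the column"


print(suggest_model(pd.Series([0, 1, 1, 0])))       # logistic regression (Chapter 14)
print(suggest_model(pd.Series([0, 3, 5, 2, 1])))    # Poisson regression (Chapter 14)
print(suggest_model(pd.Series([1.5, 2.7, 3.1])))    # linear regression (Chapter 13)
```

These heuristics are deliberately crude: a continuous measurement that happens to be recorded in whole numbers would be labeled as counts, so the choice of model should still come from what the variable actually measures.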

If we are solely interested in prediction, and not inference, we can employ regularization techniques to make our model more numerically stable (Chapter 17). If we do not have an outcome variable to test our model against, we would use an unsupervised modeling technique, such as clustering (Chapter 18).
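As a sketch of why regularization improves numerical stability, assume two perfectly collinear predictors simulated with NumPy. Ordinary least squares solves the normal equations $X^\top X \beta = X^\top y$, but $X^\top X$ cannot be inverted when the columns of $X$ are linearly dependent. Ridge regression, one common regularization technique, instead solves $(X^\top X + \lambda I)\beta = X^\top y$, which has a unique solution for any $\lambda > 0$. The seed, the sample size, $\lambda = 1$, and the omission of an intercept are arbitrary choices for illustration; Chapter 17 may use a library implementation rather than this hand-written version.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)

# Two identical columns: the predictors are perfectly collinear (no intercept)
X = np.column_stack([x1, x1])
y = 3 * x1 + rng.normal(scale=0.1, size=n)

# X has rank 1, so X^T X is singular and the OLS normal equations
# have no unique solution
print(np.linalg.matrix_rank(X))  # 1

# Ridge regression: add lam * I to X^T X, making it invertible for lam > 0
lam = 1.0
xtx = X.T @ X
beta_ridge = np.linalg.solve(xtx + lam * np.eye(X.shape[1]), X.T @ y)
print(beta_ridge)
```

Because the two columns are identical, the ridge penalty splits the effect evenly between them, so each coefficient is a little under 1.5 and their sum is close to the true slope of 3.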
