Many problems we find in science, engineering, and business are of the following form. We have a variable and we want to model/predict a variable . Importantly, these variables are paired like . In the most simple scenario, known as simple linear regression, both and are uni-dimensional continuous random variables. By continuous, we mean a variable represented using real numbers (or floats, if you wish), and using NumPy, you will represent the variables or as one-dimensional arrays. Because this is a very common model, the variables get proper names. We call the variables the dependent, predicted, or outcome variables, and the variables the independent, predictor, or input variables. When is a matrix (we have different variables), we have what is known as multiple linear regression. In this and the following chapter, we will explore these and other linear regression models.
Some typical situations where linear regression models can be used are:
- Model the relationship between factors like rain, soil salinity, and the presence/absence of fertilizer in crop productivity. Then, answer questions such as: is the relationship linear? How strong is this relationship? Which factors have the strongest effect?
- Find a relationship between average chocolate consumption by country and the number of Nobel laureates in that country, and then understand why this relationship could be spurious.
- Predict the gas bill (used for heating and cooking) of your house by using the sun radiation from the local weather report. How accurate is this prediction?