Scenario
Sometimes, two variables are highly interdependent, and you'd like to study how they vary together in greater detail and then fit a model to them. For example, Facebook might be interested in studying the correlation between the number of Facebook friends and the age of a user, to find out which age group utilizes Facebook the most. They might also be interested in finding out whether the relationship is linear.
Aim
To make a scatter plot of the most highly correlated variables and then fit a linear regression model to them.
Steps for Completion
- Make a subset of the loan dataset.
- Use cor on the preceding loan data subset to compute the pairwise correlations, and then choose two highly correlated variables in the loan dataset.
- Make a scatter plot for the preceding pairs for grade A, then fit a linear regression model.
- Determine the correlations of the preceding pairs.
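The steps above can be sketched roughly as follows. This is a minimal illustration, not the activity's solution: it builds a small synthetic data frame in place of the loan dataset, and every column name in it (grade, loan_amnt, funded_amnt, int_rate, annual_inc) is a hypothetical stand-in. It uses base R graphics and picks only the single most correlated pair.

```r
# Synthetic stand-in for the loan dataset; all column names are hypothetical.
set.seed(42)
n <- 300
loan_amnt <- runif(n, 1000, 35000)
loans <- data.frame(
  grade       = sample(c("A", "B", "C"), n, replace = TRUE),
  loan_amnt   = loan_amnt,
  funded_amnt = loan_amnt + rnorm(n, 0, 500),  # nearly linear in loan_amnt
  int_rate    = runif(n, 5, 25),               # unrelated to the other columns
  annual_inc  = rlnorm(n, meanlog = 11, sdlog = 0.5)
)

# Step 1: make a subset containing only the numeric columns of interest.
num_cols <- c("loan_amnt", "funded_amnt", "int_rate", "annual_inc")
loan_subset <- loans[, num_cols]

# Step 2: compute pairwise Pearson correlations, ignore the diagonal,
# and pick the pair with the largest absolute correlation.
cor_mat <- cor(loan_subset)
diag(cor_mat) <- NA
idx <- which(abs(cor_mat) == max(abs(cor_mat), na.rm = TRUE), arr.ind = TRUE)[1, ]
x_var <- rownames(cor_mat)[idx["row"]]
y_var <- colnames(cor_mat)[idx["col"]]

# Step 3: scatter plot for grade A only, with a fitted regression line.
grade_a <- loans[loans$grade == "A", ]
fit <- lm(reformulate(x_var, response = y_var), data = grade_a)
plot(grade_a[[x_var]], grade_a[[y_var]],
     xlab = x_var, ylab = y_var, main = "Grade A loans")
abline(fit, col = "red")

# Step 4: correlation of the chosen pair within grade A.
r_a <- cor(grade_a[[x_var]], grade_a[[y_var]])
print(c(x = x_var, y = y_var))
print(r_a)
```

With the real loan data, you would replace the synthetic data frame with the loaded dataset and adjust num_cols to the columns you actually want to compare. Repeating steps 3 and 4 for a second pair gives a second plot.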
Here are screenshots of the output plots:
Analysis
Both of these plots reveal an approximately linear relationship between the variables in each pair, which is consistent with the high correlation values that we obtained with the cor command.
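One way to see why the plots and the cor output agree: for a simple linear regression with one predictor, the R-squared of the fit equals the square of the Pearson correlation, and the fitted slope equals the correlation scaled by the ratio of the standard deviations. The following sketch checks both identities on synthetic data; the variables x and y are made up for illustration and are not taken from the loan dataset.

```r
# Synthetic, roughly linear data (illustrative only).
set.seed(1)
x <- runif(200, 0, 10)
y <- 3 + 2 * x + rnorm(200, sd = 1.5)

r   <- cor(x, y)   # Pearson correlation
fit <- lm(y ~ x)   # simple linear regression of y on x

r_squared <- summary(fit)$r.squared
slope     <- unname(coef(fit)["x"])

# For one predictor, R^2 is the squared correlation ...
stopifnot(isTRUE(all.equal(r^2, r_squared)))
# ... and the slope is r * sd(y) / sd(x).
stopifnot(isTRUE(all.equal(slope, r * sd(y) / sd(x))))

print(c(r = r, r_squared = r_squared, slope = slope))
```

A correlation close to 1 or -1 therefore means the fitted line explains most of the variance in the scatter plot. It does not by itself rule out curvature, so the plot is still worth inspecting.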