So far in the book, we have had an analyst at the heart of the data analysis. In a
perfect environment, the analyst may go through the following steps:
Step 1. Analyze data on a certain business problem. Let us say that the analyst is trying to predict which insurance customers will make a claim. The analyst will usually start with past data to see – perhaps through a variant of regression analysis – whether the chance of a customer making a claim can be explained using existing data.

Step 2. Build a model for business predictions and decisions in the area. The analyst may use the analysis in Step 1 to build a predictive model that predicts claims based on other data (such as gender or age).

Step 3. Evaluate the model against new data. As new claims come in, the analyst would evaluate how accurately the model is making predictions.

Step 4. If necessary, change the model and reapply it. If the model seems to need elaboration or tweaking, the analyst may change it to see whether it yields better predictions of insurance claims than before.

Step 5. Repeat Steps 3 and 4 as many times as necessary in the constant search for better results.
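The five steps above can be sketched in code. The following is a minimal illustration only: the data is synthetic, the "claim" rule that generates it is invented, and the hand-rolled logistic regression stands in for whatever analysis the analyst actually chooses – none of it comes from a real insurance dataset.

```python
import math
import random

random.seed(0)

# Hypothetical insurance data: each customer is (age, prior_claims), labeled 1
# if they later made a claim. The generating rule below is an assumption.
def make_customer():
    age = random.uniform(18, 80)
    prior = random.randint(0, 3)
    p = 1 / (1 + math.exp(-(0.05 * (age - 50) + 0.8 * prior - 0.5)))
    return (age, prior), 1 if random.random() < p else 0

past = [make_customer() for _ in range(500)]   # Step 1: historical data
new = [make_customer() for _ in range(200)]    # Step 3: newly arriving claims

def fit(data, features, steps=500, lr=0.01):
    """Step 2: fit a logistic-regression model by gradient descent."""
    w = [0.0] * (len(features) + 1)            # one weight per feature + intercept
    for _ in range(steps):
        for x, y in data:
            z = w[-1] + sum(wi * f(x) for wi, f in zip(w, features))
            p = 1 / (1 + math.exp(-z))
            err = p - y
            for i, f in enumerate(features):
                w[i] -= lr * err * f(x)
            w[-1] -= lr * err
    return w

def accuracy(w, data, features):
    """Step 3: fraction of customers whose claim/no-claim is predicted correctly."""
    hits = 0
    for x, y in data:
        z = w[-1] + sum(wi * f(x) for wi, f in zip(w, features))
        hits += (z > 0) == (y == 1)
    return hits / len(data)

# Steps 2-3: a first model using age alone, evaluated on the new data.
feats_v1 = [lambda x: (x[0] - 50) / 30]
acc_v1 = accuracy(fit(past, feats_v1), new, feats_v1)

# Steps 4-5: tweak the model by also using prior claims, then re-evaluate.
feats_v2 = feats_v1 + [lambda x: x[1]]
acc_v2 = accuracy(fit(past, feats_v2), new, feats_v2)

print(f"age only: {acc_v1:.2f}, age + prior claims: {acc_v2:.2f}")
```

In practice the analyst would iterate Steps 3 and 4 many more times, trying further features and model forms rather than stopping after one tweak.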
Increasingly, however, we are automating this serious thought and decision making, leaving the analysis to computers. The process above, when performed by a computer, is one example of machine learning.
In the insurance example, we might automate the analysis of the datasets by telling the computer what analyses to apply to the data and what to look for – in this case, better prediction of claims.
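In this spirit, a toy sketch of "telling the computer what to try and what to look for" follows. The candidate rules, the synthetic data, and the rule that labels it are all invented for illustration; the point is only that the computer, not the analyst, scores the candidates and keeps the best one.

```python
import random

random.seed(1)

# Synthetic stand-in for insurance records: (age, prior_claims, made_claim).
# The labeling rule is assumed purely for illustration.
rows = [(random.randint(18, 80), random.randint(0, 3)) for _ in range(400)]
data = [(age, prior, 1 if prior >= 2 or age > 65 else 0) for age, prior in rows]
train, test = data[:300], data[100:]

# The "analyses to apply": candidate claim-prediction rules we hand the computer.
candidates = {
    "age > 60": lambda age, prior: age > 60,
    "prior claims >= 2": lambda age, prior: prior >= 2,
    "age > 65 or prior >= 2": lambda age, prior: age > 65 or prior >= 2,
}

def accuracy(rule, rows):
    return sum(rule(a, p) == bool(y) for a, p, y in rows) / len(rows)

# The "what to look for": the computer keeps whichever candidate best
# predicts claims on past data, with no analyst in the loop.
best = max(candidates, key=lambda name: accuracy(candidates[name], train))
print(best, round(accuracy(candidates[best], test), 2))
```

Real machine-learning systems search far richer spaces than three hand-written rules, but the division of labor is the same: people specify the search and the goal, and the machine does the evaluating.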
The next section discusses a few types of machine learning.