Specifying the phases of classifiers

Once the labeled data is prepared, developing a classifier involves three phases: training, evaluation, and deployment. These three phases are shown in the CRISP-DM (Cross-Industry Standard Process for Data Mining) life cycle in the following diagram (the CRISP-DM life cycle was explained in more detail in Chapter 5, Graph Algorithms).

In the first two phases of implementing a classifier, the training and evaluation phases, we use labeled data. The labeled data is divided into two partitions: a larger partition called the training data and a smaller partition called the testing data. A random sampling technique is used to divide the labeled data into these two partitions, ensuring that both contain consistent patterns. Note that, as the preceding diagram shows, there is first a training phase, in which the training data is used to train a model. Once the training phase is over, the trained model is evaluated using the testing data, and various performance metrics are used to quantify its performance. Once the model is evaluated, we move to the model deployment phase, where the trained model is deployed and used for inference, that is, to solve real-world problems by labeling unlabeled data.
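The split-train-evaluate workflow described above can be sketched in a few lines of Python. This is a minimal illustration, assuming scikit-learn is available and using its bundled iris dataset and a decision tree as a stand-in model; the specific dataset and classifier choices here are not from the text.

```python
# A minimal sketch of the training and evaluation phases, assuming
# scikit-learn and its built-in iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Random sampling divides the labeled data into a larger training
# partition and a smaller testing partition.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Training phase: fit a model on the training partition.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Evaluation phase: quantify performance on the held-out testing data.
accuracy = accuracy_score(y_test, model.predict(X_test))

# Deployment/inference: the trained model labels previously unseen data.
predictions = model.predict(X_test[:3])
```

Here `accuracy_score` stands in for whichever performance metric suits the problem; later sections on evaluation discuss alternatives.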

Now, let's look at some classification algorithms.

We will look at the following classification algorithms in the subsequent sections:

  • The decision tree algorithm
  • The XGBoost algorithm
  • The random forest algorithm
  • The logistic regression algorithm
  • The Support Vector Machine (SVM) algorithm
  • The naive Bayes algorithm

Let's start with the decision tree algorithm.
