Specifying the phases of classifiers

Once the labeled data is prepared, developing a classifier involves three phases: training, evaluation, and deployment. These three phases are shown in the CRISP-DM (Cross-Industry Standard Process for Data Mining) life cycle in the following diagram (the CRISP-DM life cycle was explained in more detail in Chapter 5, Graph Algorithms).

In the first two phases of implementing a classifier, the training and evaluation phases, we use labeled data. The labeled data is divided into two partitions: a larger partition called the training data and a smaller partition called the testing data. A random sampling technique is used to divide the labeled data into these two partitions, ensuring that both contain consistent patterns. Note that, as the preceding diagram shows, there is first a training phase, in which the training data is used to train a model. Once the training phase is over, the trained model is evaluated using the testing data, and various performance metrics are used to quantify its performance. Once the model is evaluated, we move to the model deployment phase, where the trained model is deployed and used for inference, that is, to solve real-world problems by labeling unlabeled data.
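The split-train-evaluate workflow described above can be sketched in a few lines of Python. This is a minimal illustration, assuming scikit-learn is available and using its bundled iris dataset and a decision tree as a stand-in model; the specific dataset and classifier choices here are not from the text.

```python
# A minimal sketch of the training and evaluation phases, assuming
# scikit-learn and its built-in iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Random sampling divides the labeled data into a larger training
# partition and a smaller testing partition.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Training phase: fit a model on the training partition.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Evaluation phase: quantify performance on the held-out testing data.
accuracy = accuracy_score(y_test, model.predict(X_test))

# Deployment/inference: the trained model labels previously unseen data.
predictions = model.predict(X_test[:3])
```

Here `accuracy_score` stands in for whichever performance metric suits the problem; later sections on evaluation discuss alternatives.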

Now, let's look at some classification algorithms.

We will look at the following classification algorithms in the subsequent sections:

  • The decision tree algorithm
  • The XGBoost algorithm
  • The random forest algorithm
  • The logistic regression algorithm
  • The Support Vector Machine (SVM) algorithm
  • The naive Bayes algorithm

Let's start with the decision tree algorithm.
