Case study

In this section, we will perform a case study with a real-world machine learning dataset to illustrate some of the concepts from Bayesian networks.

We will use the UCI Adult dataset, also known as the Census Income dataset (http://archive.ics.uci.edu/ml/datasets/Census+Income). This dataset was extracted from the United States Census Bureau's 1994 census data. The donors of the data are Ronny Kohavi and Barry Becker, who were with Silicon Graphics at the time. The dataset consists of 48,842 instances with 14 attributes, a mix of categorical and continuous types. The target class is binary.

Business problem

The problem consists of predicting the income of members of a population based on census data, specifically, whether their income is greater than $50,000.

Machine learning mapping

This is a classification problem, and this time around we will train Bayesian network classifiers to develop predictive models. As we have done in experiments in previous chapters, we will also use linear, non-linear, and ensemble algorithms.

Data sampling and transformation

In the original dataset, there are 3,620 examples with missing values and six duplicate or conflicting instances. Here we include only examples with no missing values. This set, without unknowns, is divided into 30,162 training instances and 15,060 test instances.
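The following is a minimal Weka sketch of this filtering step. The file name adult-train.arff is an assumption; the UCI distribution ships adult.data and adult.test, which need to be converted to ARFF (or loaded with Weka's CSVLoader) first:

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class AdultPreprocessing {
        public static void main(String[] args) throws Exception {
            // Load the training split (file name assumed; convert the
            // UCI adult.data file to ARFF first)
            Instances data = new DataSource("adult-train.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1); // income is the last attribute

            // Drop every example with at least one missing value, iterating
            // backwards so deletions do not shift the remaining indices
            for (int i = data.numInstances() - 1; i >= 0; i--) {
                if (data.instance(i).hasMissingValue()) {
                    data.delete(i);
                }
            }
            System.out.println("Instances without unknowns: " + data.numInstances());
        }
    }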

Feature analysis

The features and their descriptions are given in Table 3:

Feature | Type information
--------|-----------------
age | continuous
workclass | Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked
fnlwgt | continuous
education | Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool
education-num | continuous
marital-status | Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse
occupation | Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces
relationship | Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried
race | White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black
sex | Female, Male
capital-gain | continuous
capital-loss | continuous
hours-per-week | continuous
native-country | United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands

Table 3. UCI Adult dataset – features

The label distribution of the dataset is 24.78% (>50K) versus 75.22% (<=50K). Summary statistics of key features are given in Figure 25:


Figure 25. Feature summary statistics

Models, results, and evaluation

We will perform a detailed analysis of the Adult dataset using different flavors of Bayes network structures along with regular linear, non-linear, and ensemble algorithms. Weka also has an option to visualize the graph of the trained model using the menu item shown in Figure 26. This is very useful when a domain expert wants to understand the assumptions and the structure of the graph model. If the domain expert wants to change or alter the network, this can be done easily and saved using the Bayes Network editor.


Figure 26. Weka Explorer – visualization menu

Figure 27 shows the visualization of the trained Bayes Network model's graph structure:


Figure 27. Visualization of the learned structure of the Bayesian network
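The learned structure can also be exported programmatically rather than through the GUI: BayesNet implements Weka's Drawable interface, and its graph() method returns the network in XML BIF format, which the Bayes Network editor can open. A minimal sketch, assuming the training ARFF file from earlier:

    import java.io.FileWriter;
    import weka.classifiers.bayes.BayesNet;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ExportGraph {
        public static void main(String[] args) throws Exception {
            Instances train = new DataSource("adult-train.arff").getDataSet();
            train.setClassIndex(train.numAttributes() - 1);

            // Train with default settings (K2 search, simple estimation)
            BayesNet classifier = new BayesNet();
            classifier.buildClassifier(train);

            // graph() returns the learned network as XML BIF, which the
            // Bayes Network editor can load, modify, and save again
            try (FileWriter out = new FileWriter("adult-bayesnet.xml")) {
                out.write(classifier.graph());
            }
        }
    }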

The algorithms used for the experiments are listed below; a sketch of configuring them through the Weka API follows the list:

  • Bayesian network classifiers:
    • Naïve Bayes with default kernel estimation on continuous data
    • Naïve Bayes with supervised discretization on continuous data
    • Tree augmented network (TAN) with search-and-score structure and parameter learning, using the K2 algorithm and a choice of three parents per node
    • Bayesian network with search and score:
      • Searching using Hill Climbing and K2
      • Scoring using Simple Estimation
      • Choice of parents changed from two to three to illustrate the effect on metrics
  • Non-Bayesian algorithms:
    • Logistic Regression (default parameters)
    • KNN (IBk with 10 neighbors)
    • Decision Tree (J48, default parameters)
    • AdaBoostM1 (DecisionStump and default parameters)
    • Random Forest (default parameters)
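
A sketch of how the Bayesian learners can be configured through the Weka API is shown below. Class and method names are from Weka 3.8; the non-Bayesian baselines use their default Weka classes, so only the Bayesian configurations are spelled out:

    import weka.classifiers.bayes.BayesNet;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.bayes.net.estimate.SimpleEstimator;
    import weka.classifiers.bayes.net.search.local.HillClimber;
    import weka.classifiers.bayes.net.search.local.K2;
    import weka.classifiers.bayes.net.search.local.TAN;

    public class AdultBayesianLearners {
        public static void main(String[] args) throws Exception {
            // Naive Bayes with kernel density estimation on continuous attributes
            NaiveBayes nbKernel = new NaiveBayes();
            nbKernel.setUseKernelEstimator(true);

            // Naive Bayes with supervised discretization of continuous attributes
            NaiveBayes nbDiscretized = new NaiveBayes();
            nbDiscretized.setUseSupervisedDiscretization(true);

            // Tree augmented network: TAN structure search with simple estimation
            BayesNet tan = new BayesNet();
            tan.setSearchAlgorithm(new TAN());
            tan.setEstimator(new SimpleEstimator());

            // BayesNet with K2 search and score, up to three parents per node
            K2 k2 = new K2();
            k2.setMaxNrOfParents(3);
            BayesNet bnK2 = new BayesNet();
            bnK2.setSearchAlgorithm(k2);
            bnK2.setEstimator(new SimpleEstimator());

            // BayesNet with Hill Climbing search, up to three parents per node
            HillClimber hc = new HillClimber();
            hc.setMaxNrOfParents(3);
            BayesNet bnHC = new BayesNet();
            bnHC.setSearchAlgorithm(hc);
            bnHC.setEstimator(new SimpleEstimator());
        }
    }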

Table 4 presents the evaluation metrics for all the learners used in the experiments, including Bayesian network classifiers as well as the non-Bayesian algorithms:

Algorithm | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area
----------|---------|---------|-----------|--------|-----------|-----|----------|---------
Naïve Bayes (Kernel Estimator) | 0.831 | 0.391 | 0.821 | 0.831 | 0.822 | 0.494 | 0.891 | 0.906
Naïve Bayes (Discretized) | 0.843 | 0.191 | 0.861 | 0.843 | 0.848 | 0.600 | 0.917 | 0.930
TAN (K2, 3 Parents, Simple Estimator) | 0.859 | 0.273 | 0.856 | 0.859 | 0.857 | 0.600 | 0.916 | 0.931
BayesNet (K2, 3 Parents, Simple Estimator) | 0.863 | 0.283 | 0.858 | 0.863 | 0.860 | 0.605 | 0.934 | 0.919
BayesNet (K2, 2 Parents, Simple Estimator) | 0.858 | 0.283 | 0.854 | 0.858 | 0.855 | 0.594 | 0.917 | 0.932
BayesNet (Hill Climbing, 3 Parents, Simple Estimator) | 0.862 | 0.293 | 0.857 | 0.862 | 0.859 | 0.602 | 0.918 | 0.933
Logistic Regression | 0.851 | 0.332 | 0.844 | 0.851 | 0.845 | 0.561 | 0.903 | 0.917
KNN (10) | 0.834 | 0.375 | 0.824 | 0.834 | 0.826 | 0.506 | 0.867 | 0.874
Decision Tree (J48) | 0.858 | 0.300 | 0.853 | 0.858 | 0.855 | 0.590 | 0.890 | 0.904
AdaBoostM1 | 0.841 | 0.415 | 0.833 | 0.841 | 0.826 | 0.513 | 0.872 | 0.873
Random Forest | 0.848 | 0.333 | 0.841 | 0.848 | 0.843 | 0.555 | 0.896 | 0.913

Table 4. Classifier performance metrics
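
The metrics in Table 4 are the weighted averages reported by Weka's Evaluation class. A sketch of the evaluation loop for a single classifier, assuming the train/test ARFF files from earlier (method names are from Weka 3.8):

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.BayesNet;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class EvaluateAdult {
        public static void main(String[] args) throws Exception {
            Instances train = new DataSource("adult-train.arff").getDataSet();
            Instances test = new DataSource("adult-test.arff").getDataSet();
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);

            // Any classifier from the list above can be substituted here
            Classifier classifier = new BayesNet();
            classifier.buildClassifier(train);

            // Evaluate on the held-out test set and print the weighted
            // metrics corresponding to the columns of Table 4
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(classifier, test);
            System.out.printf("TP Rate:   %.3f%n", eval.weightedTruePositiveRate());
            System.out.printf("FP Rate:   %.3f%n", eval.weightedFalsePositiveRate());
            System.out.printf("Precision: %.3f%n", eval.weightedPrecision());
            System.out.printf("Recall:    %.3f%n", eval.weightedRecall());
            System.out.printf("F-Measure: %.3f%n", eval.weightedFMeasure());
            System.out.printf("MCC:       %.3f%n", eval.weightedMatthewsCorrelation());
            System.out.printf("ROC Area:  %.3f%n", eval.weightedAreaUnderROC());
            System.out.printf("PRC Area:  %.3f%n", eval.weightedAreaUnderPRC());
        }
    }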

Analysis of results

Naïve Bayes with supervised discretization performs noticeably better than Naïve Bayes with kernel estimation. This is a useful hint that discretization, which is needed in most Bayes networks, plays an important role.

The results in the table show steady improvement as Bayes network complexity increases. Naïve Bayes with discretization, which assumes the features are conditionally independent given the class, shows a TP rate of 0.843; the TAN algorithm, where each feature may have one additional parent, shows a TP rate of 0.859; and the BayesNet with up to three parents shows the best TP rate of 0.863. This clearly indicates that a more complex BN, with some nodes having up to three parents, can capture the domain knowledge and encode it well enough to predict on unseen test data.

The Bayes network whose structure is learned using search and score (K2 search with three parents and the Bayesian score) and whose parameters are learned using simple estimation performs best on almost all of the evaluation metrics, as seen in the BayesNet (K2, 3 Parents, Simple Estimator) row of Table 4.

There is a very small difference between the Bayes networks whose structures are learned by Hill Climbing and by K2, showing that even a local search algorithm can find a near-optimal structure on this dataset.

The Bayes network with a three-parent structure beats most of the linear, non-linear, and ensemble methods, such as AdaBoostM1 and Random Forest, on almost all the metrics on unseen test data. This shows the strength of BNs not only in learning the structure and parameters of the domain, but also in predicting well on unseen data and beating other sophisticated algorithms.
