In this section, we will perform a case study with real-world machine learning datasets to illustrate some of the concepts from Bayesian networks.
We will use the UCI Adult dataset, also known as the Census Income dataset (http://archive.ics.uci.edu/ml/datasets/Census+Income). This dataset was extracted from the United States Census Bureau's 1994 census data and donated by Ronny Kohavi and Barry Becker, who were with Silicon Graphics at the time. The dataset consists of 48,842 instances with 14 attributes, a mix of categorical and continuous types. The target class is binary.
The problem consists of predicting the income of members of a population based on census data, specifically, whether their income is greater than $50,000.
This is a classification problem, and this time around we will be training Bayesian networks to develop predictive models, alongside the linear, non-linear, and ensemble algorithms we have used in experiments in previous chapters.
In the original dataset, there are 3,620 examples with missing values and six duplicate or conflicting instances. Here we include only examples with no missing values. This set, without unknowns, is divided into 30,162 training instances and 15,060 test instances.
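The unknown-free subsets can be produced with a simple filter. The sketch below (Python, standard library only) relies on the fact that the Adult data files mark missing values with `?`; the two sample rows are illustrative, with fields in the order documented for the dataset.

```python
import csv
import io

# Two sample rows in the Adult dataset's comma-separated format; "?" marks
# an unknown value (here, workclass and occupation in the second row).
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, ?, 209642, HS-grad, 9, Married-civ-spouse, ?, Husband, White, "
    "Male, 0, 0, 45, United-States, >50K\n"
)

def drop_unknowns(file_obj):
    """Keep only instances that contain no missing ('?') values."""
    reader = csv.reader(file_obj, skipinitialspace=True)
    return [row for row in reader if row and "?" not in row]

rows = drop_unknowns(sample)
print(len(rows))  # prints 1: the second sample row has unknowns and is dropped
```

Applying the same filter to the full training and test files yields the 30,162 and 15,060 instance counts quoted above.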
The features and their descriptions are given in Table 3:
Table 3. UCI Adult dataset – features
The dataset is split by label as 24.78% (>50K) to 75.22% (<= 50K). Summary statistics of key features are given in Figure 25:
We will perform a detailed analysis on the Adult dataset using different flavors of Bayes network structures, alongside the regular linear, non-linear, and ensemble algorithms. Weka also offers an option to visualize the graph of a trained Bayes network model from the result list's context menu, as shown in Figure 26. This is very useful when a domain expert wants to understand the assumptions and structure of the graph model. If the domain expert wants to change or alter the network, this can be done easily, and the result saved, using the Bayes Network editor.
Figure 27 shows the visualization of the trained Bayes Network model's graph structure:
Table 4 presents the evaluation metrics for all the algorithms used in the experiments, including the Bayesian network classifiers as well as the non-Bayesian baselines:
| Algorithms | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area |
|---|---|---|---|---|---|---|---|---|
| Naïve Bayes (Kernel Estimator) | 0.831 | 0.391 | 0.821 | 0.831 | 0.822 | 0.494 | 0.891 | 0.906 |
| Naïve Bayes (Discretized) | 0.843 | 0.191 | 0.861 | 0.843 | 0.848 | 0.600 | 0.917 | 0.930 |
| TAN (K2, 3 Parents, Simple Estimator) | 0.859 | 0.273 | 0.856 | 0.859 | 0.857 | 0.600 | 0.916 | 0.931 |
| BayesNet (K2, 3 Parents, Simple Estimator) | 0.863 | 0.283 | 0.858 | 0.863 | 0.860 | 0.605 | 0.934 | 0.919 |
| BayesNet (K2, 2 Parents, Simple Estimator) | 0.858 | 0.283 | 0.854 | 0.858 | 0.855 | 0.594 | 0.917 | 0.932 |
| BayesNet (Hill Climbing, 3 Parents, Simple Estimator) | 0.862 | 0.293 | 0.857 | 0.862 | 0.859 | 0.602 | 0.918 | 0.933 |
| Logistic Regression | 0.851 | 0.332 | 0.844 | 0.851 | 0.845 | 0.561 | 0.903 | 0.917 |
| KNN (10) | 0.834 | 0.375 | 0.824 | 0.834 | 0.826 | 0.506 | 0.867 | 0.874 |
| Decision Tree (J48) | 0.858 | 0.300 | 0.853 | 0.858 | 0.855 | 0.590 | 0.890 | 0.904 |
| AdaBoostM1 | 0.841 | 0.415 | 0.833 | 0.841 | 0.826 | 0.513 | 0.872 | 0.873 |
| Random Forest | 0.848 | 0.333 | 0.841 | 0.848 | 0.843 | 0.555 | 0.896 | 0.913 |
Table 4. Classifier performance metrics
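For reference, the per-class metrics in Table 4 can all be computed from the counts in a 2x2 confusion matrix, as in the sketch below. The counts used here are made up for illustration; note that Weka's output reports averages weighted over both classes, while this shows the single-class definitions.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Positive-class metrics from a 2x2 confusion matrix."""
    tp_rate = tp / (tp + fn)              # a.k.a. recall / sensitivity
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    # Matthews correlation coefficient: balanced even for skewed classes
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return dict(tp_rate=tp_rate, fp_rate=fp_rate, precision=precision,
                recall=tp_rate, f_measure=f_measure, mcc=mcc)

# Hypothetical counts, for illustration only.
m = binary_metrics(tp=90, fp=10, fn=10, tn=90)
print(m["precision"], m["mcc"])  # prints 0.9 0.8 for this symmetric example
```

ROC Area and PRC Area are not point metrics; they require ranking the test instances by predicted probability and integrating over all thresholds, which is why they are reported separately by the evaluation tooling.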
Naïve Bayes with supervised discretization performs noticeably better than Naïve Bayes with kernel estimation. This is a useful hint that discretization, which most Bayes network implementations require for continuous features, plays an important role.
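To make the discretized variant concrete, here is a minimal sketch of Naïve Bayes over binned features. Everything in it is illustrative: the data is synthetic, and equal-width binning stands in for Weka's supervised (Fayyad-Irani MDL-based) discretization, which chooses cut points using the class labels and usually works better.

```python
import math
from collections import Counter, defaultdict

def equal_width_bins(values, k=4):
    """Stand-in discretizer: equal-width bins (not Weka's supervised MDL)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0
    return [min(int((v - lo) / width), k - 1) for v in values]

class DiscreteNB:
    """Naive Bayes over discrete features, with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors = {c: y.count(c) / len(y) for c in self.classes}
        self.counts = defaultdict(Counter)   # (feature, class) -> value counts
        self.vocab = defaultdict(set)        # feature -> values seen
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(j, c)][v] += 1
                self.vocab[j].add(v)
        return self

    def predict(self, row):
        def log_posterior(c):
            lp = math.log(self.priors[c])
            for j, v in enumerate(row):
                cnt = self.counts[(j, c)]
                # Laplace-smoothed P(value | class)
                lp += math.log((cnt[v] + 1) /
                               (sum(cnt.values()) + len(self.vocab[j])))
            return lp
        return max(self.classes, key=log_posterior)

# Tiny synthetic data: age alone separates the two income labels.
ages = [20, 25, 30, 35, 60, 65, 70, 75]
labels = ["<=50K"] * 4 + [">50K"] * 4
X = [[b] for b in equal_width_bins(ages, k=2)]
nb = DiscreteNB().fit(X, labels)
print(nb.predict([0]), nb.predict([1]))  # prints: <=50K >50K
```

The key design point is that once features are discrete, every conditional distribution is just a smoothed count table, which is exactly the form of parameter a Bayes network node requires.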
The results in the table show steady improvement as the complexity of the Bayes network increases. For example, Naïve Bayes with discretization, which assumes all features are independent given the class, shows a TP rate of 0.843; the TAN algorithm, where each feature is allowed one additional parent, shows a TP rate of 0.859; and the BNs with up to three parents show the best TP rates of 0.862–0.863. This clearly indicates that a more complex BN, with some nodes having up to three parents, can capture the domain knowledge and encode it well enough to predict on unseen test data.
The Bayes network whose structure is learned by search-and-score (K2 search with up to three parents, scoring with the Bayes score) and whose parameters are estimated with the simple estimator performs best on almost all of the evaluation metrics.
There is only a very small difference between the Bayes network whose structure is learned by hill climbing and the one learned by K2, showing that even local search algorithms can find a near-optimal structure.
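K2's greedy, fixed-ordering search-and-score idea can be illustrated with a toy reimplementation. This is not Weka's code (Weka's version lives in `weka.classifiers.bayes.net.search.local.K2`): it handles binary variables only, uses a BIC-penalized log-likelihood in place of the Bayes score, and runs on synthetic data where B is a noisy copy of A and C is independent noise.

```python
import math
import random

def bic_score(data, child, parents):
    """BIC-penalized log-likelihood of `child` given `parents`.
    `data` is a list of dicts mapping variable name -> 0/1."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, [0, 0])[row[child]] += 1
    ll = 0.0
    for c0, c1 in counts.values():
        total = c0 + c1
        for c in (c0, c1):
            if c:
                ll += c * math.log(c / total)
    n_params = 2 ** len(parents)   # one Bernoulli parameter per parent config
    return ll - 0.5 * n_params * math.log(len(data))

def k2_search(data, order, max_parents=3):
    """Greedy K2-style search: each node may only take parents that
    precede it in `order`; add the best-scoring parent until no gain."""
    structure = {}
    for i, node in enumerate(order):
        parents, best = [], bic_score(data, node, [])
        candidates = list(order[:i])
        while candidates and len(parents) < max_parents:
            score, cand = max((bic_score(data, node, parents + [c]), c)
                              for c in candidates)
            if score <= best:
                break
            best, parents = score, parents + [cand]
            candidates.remove(cand)
        structure[node] = parents
    return structure

rng = random.Random(42)
data = []
for _ in range(500):
    a = rng.randint(0, 1)
    b = a if rng.random() > 0.1 else 1 - a   # B depends strongly on A
    data.append({"A": a, "B": b, "C": rng.randint(0, 1)})

structure = k2_search(data, order=["A", "C", "B"])
print(structure)  # A should appear among B's parents
```

Note how the ordering constraint keeps the graph acyclic by construction; that restriction is what makes K2 fast, and it is also why a good variable ordering matters.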
The Bayes network with a three-parent structure beats most of the linear, non-linear, and ensemble methods, such as AdaBoostM1 and Random Forest, on almost all of the metrics on unseen test data. This shows the strength of BNs: they not only learn structure and parameters from data, even in domains with missing values, but also predict well enough on unseen data to beat other sophisticated algorithms.