Should I take the job?

You have just been offered a new job, and you need to decide whether or not you are going to take it. There are a few things that are important for you to consider, so we will use those as input variables, or features, for our decision tree. Here is what is important to you: pay, benefits, company culture, and of course, Can I work from home?

Rather than load data from disk storage, we are going to create an in-memory database and add our features this way. We will create DataTable and create Columns as features as shown here:

DataTable data = new DataTable("Should I Go To Work For Company
X");
data.Columns.Add("Scenario");
data.Columns.Add("Pay");
data.Columns.Add("Benefits");
data.Columns.Add("Culture");
data.Columns.Add("WorkFromHome");
data.Columns.Add("ShouldITakeJob");

After this, we will load several rows of data, each with a different set of features, and our last column, ShouldITakeJob being a Yes or No, as our final decision:

data.Rows.Add("D1", "Good", "Good", "Mean", "Yes", "Yes");
data.Rows.Add("D2", "Good", "Good", "Mean", "No", "Yes");
data.Rows.Add("D3", "Average", "Good", "Good", "Yes", "Yes");
data.Rows.Add("D4", "Average", "Good", "Good", "No", "Yes");
data.Rows.Add("D5", "Bad", "Good", "Good", "Yes", "No");
data.Rows.Add("D6", "Bad", "Good", "Good", "No", "No");
data.Rows.Add("D7", "Good", "Average", "Mean", "Yes", "Yes");
data.Rows.Add("D8", "Good", "Average", "Mean", "No", "Yes");
data.Rows.Add("D9", "Average", "Average", "Good", "Yes", "No");
data.Rows.Add("D10", "Average", "Average", "Good", "No", "No");
data.Rows.Add("D11", "Bad", "Average", "Good", "Yes", "No");
data.Rows.Add("D12", "Bad", "Average", "Good", "No", "No");
data.Rows.Add("D13", "Good", "Bad", "Mean", "Yes", "Yes");
data.Rows.Add("D14", "Good", "Bad", "Mean", "No", "Yes");
data.Rows.Add("D15", "Average", "Bad", "Good", "Yes", "No");
data.Rows.Add("D16", "Average", "Bad", "Good", "No", "No");
data.Rows.Add("D17", "Bad", "Bad", "Good", "Yes", "No");
data.Rows.Add("D18", "Bad", "Bad", "Good", "No", "No");
data.Rows.Add("D19", "Good", "Good", "Good", "Yes", "Yes"); data.Rows.Add("D20", "Good", "Good", "Good", "No", "Yes");

Once all of the data is created and in our table, we need to put our previous features into a form of representation that our computer can understand. Since all our features are categories, it doesn't really matter how we represent them, if we are consistent. Since numbers are easier, we will convert our features (categories) to a code book through a process known as Codification. This codebook effectively transforms every value into an integer. Notice that we will pass in our data categories as input:

Codification codebook = new Codification(data);

Next, we need to create decision variables for our decision tree to use. The tree will be trying to help us determine if we should take our new job offer or not. For this decision, there will be several categories of inputs, which we will specify in our decision variable array, and two possible decisions, Yes or No.

The DecisionVariable array will hold the name of each category as well as the total count of possible attributes for this category. For example, the Pay category has three possible values, Good, Average, or  Poor. So, we specify the category name and the number 3. We then repeat this for all our other categories except the last one, which is our decision:

DecisionVariable[] attributes =
{
new DecisionVariable("Pay", 3),
new DecisionVariable("Benefits", 3),
new DecisionVariable("Culture", 3),
new DecisionVariable("WorkFromHome", 2)
};
int outputValues = 2; // 2 possible output values: yes or no
DecisionTree tree = new DecisionTree(attributes, outputValues);

Now that we have our decision tree created, we have to teach it the problem we are trying to solve. At this point, it really knows nothing. In order to do that, we will have to create a learning algorithm for the tree to use. In our case, that would be the ID3 algorithm we discussed earlier. Since we have only categorical values for this sample, the ID3 algorithm is the simplest choice. Please feel free to replace it with C4.5, C5.0, or whatever you want to try:

ID3Learning id3 = new ID3Learning(tree);
Now, with our tree fully created and ready to go, we
are ready to classify new samples.
// Translate our training data into integer symbols using our codebook:
DataTable symbols = codebook.Apply(data);
int[][] inputs = symbols.ToArray<int>("Pay", "Benefits", "Culture",
"WorkFromHome");
int[] outputs = symbols.ToIntArray("ShouldITakeJob").GetColumn(0);
// Learn the training instances!
id3.Run(inputs, outputs);

Once the learning algorithm has been run, it is trained and ready to use. We simply provide a sample dataset to the algorithm so that it can give us back an answer. In this case, the pay is good, the company culture is good, the benefits are good, and I can work from home. If the decision tree is trained properly, the answer is a resounding Yes:

int[] query = codebook.Translate("Good", "Good", "Good", "Yes");
int output = tree.Compute(query);
string answer = codebook.Translate("ShouldITakeJob", output);
// answer will be "Yes".

Next, we will turn our attention to using the numl open source machine learning package to show you another example of training and using a decision tree.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.229.111