Variable binning or discretizing continuous data

Binning a continuous variable is often an appropriate step to take before including the variable in a model. Take the fuel tank capacity of a car from the Cars93 dataset as an example. Based on the fuel tank capacity, we can create a categorical variable with the levels low, lower medium, medium, and high:

> range(Cars93$Fuel.tank.capacity)
[1] 9.2 27.0
> cat
[1] 9.2 13.2 17.2 21.2 25.2
> options(digits = 2)
> t<-cut(Cars93$Fuel.tank.capacity,cat)
> as.data.frame(cbind(table(t)))
            V1
(9.2,13.2]  19
(13.2,17.2] 33
(17.2,21.2] 36
(21.2,25.2]  3

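The range of fuel tank capacity is 9.2 to 27.0. A class width of 4 is then used to arrive at the break points, which define the interval that each value of the variable is assigned to. The final table shows 4 groups; the highest fuel tank capacity class, (21.2,25.2], contains only 3 cars. The counts sum to 91 rather than 93 because cut() uses left-open intervals by default, so the minimum value of 9.2 is excluded from the first class, and the maximum of 27.0 lies above the last break point of 25.2.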
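The transcript prints the break points stored in cat but never shows how they were created. The following is a minimal sketch of one way to build them, assuming they come from seq() with a step of 4 starting at the minimum, and assuming Cars93 is loaded from the MASS package. The names tank, breaks, breaks_all, bins and bins_all are illustrative and not from the original; a name other than cat is used to avoid masking the base function cat(). The second half is a variant, not the book's code, that keeps every car by closing the first interval on the left and stretching the last break to the maximum.

```r
library(MASS)  # Cars93 ships with the MASS package

tank <- Cars93$Fuel.tank.capacity

# Assumed reconstruction of the break points: start at the minimum, step by 4.
breaks <- seq(min(tank), max(tank), by = 4)  # 9.2 13.2 17.2 21.2 25.2

# Same binning as in the text; values outside the intervals become NA.
bins <- cut(tank, breaks)
print(table(bins, useNA = "ifany"))

# Variant (an assumption, not the book's code): replace the last break with
# the maximum and include the lowest value, so that no car is dropped.
breaks_all <- c(head(breaks, -1), max(tank))
bins_all <- cut(tank, breaks_all, include.lowest = TRUE,
                labels = c("low", "lower medium", "medium", "high"))
print(table(bins_all))
```

With labels supplied, the result is a factor whose levels match the four categories named earlier, which may be more convenient as a model input than the raw interval notation.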

Variable binning or discretization not only helps in decision tree construction but is also useful for logistic regression models and other machine learning models.
