Variable binning or discretizing continuous data

Binning a continuous variable is often an appropriate step to take before including that variable in a model. Take the fuel tank capacity of a car from the Cars93 dataset as an example. Based on the fuel tank capacity, we can create a categorical variable with four levels: low, lower medium, medium, and high:

> range(Cars93$Fuel.tank.capacity)

[1] 9.2 27.0

> cat

[1] 9.2 13.2 17.2 21.2 25.2

> options(digits = 2)

> t<-cut(Cars93$Fuel.tank.capacity,cat)

> as.data.frame(cbind(table(t)))

            V1
(9.2,13.2]  19
(13.2,17.2] 33
(17.2,21.2] 36
(21.2,25.2]  3

The range of fuel tank capacity is 9.2 to 27.0. A class width of 4 is then used to derive the class boundaries, and those boundaries determine which group each value of the variable is assigned to. The resulting table shows 4 groups; the group with the highest fuel tank capacity contains only 3 cars. The counts sum to 91, so observations outside the boundaries, such as the minimum of 9.2 (excluded by the left-open first interval) and values above 25.2, are not assigned to any group.
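The transcript above prints the break vector without showing how it was built. The following is a minimal sketch of one way to reproduce the binning, assuming the breaks start at the minimum and advance in steps of 4, and assuming Cars93 is loaded from the MASS package. The names `capacity`, `breaks`, `bins`, `full_breaks`, and `labelled` are illustrative; they replace `cat` and `t`, which are also the names of base R functions. The second half shows one possible way to cover the full range and attach the four labels, not necessarily the approach used in the text.

```r
# Cars93 is provided by the MASS package.
library(MASS)

capacity <- Cars93$Fuel.tank.capacity

# Assumed reconstruction of the breaks: start at the minimum, step by 4.
breaks <- seq(min(capacity), max(capacity), by = 4)

# By default cut() builds left-open, right-closed intervals, so the minimum
# value and anything above the last break are returned as NA.
bins <- cut(capacity, breaks)
table(bins, useNA = "ifany")

# One option for covering every car: stretch the last break to the maximum
# and close the first interval on the left, then name the four groups.
full_breaks <- c(head(breaks, -1), max(capacity))
labelled <- cut(capacity, full_breaks,
                include.lowest = TRUE,
                labels = c("low", "lower medium", "medium", "high"))
table(labelled)
```

Passing `useNA = "ifany"` makes the unassigned observations visible in the frequency table instead of silently dropping them.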

Variable binning or discretization not only helps in decision tree construction but is also useful for logistic regression models and other machine learning models.
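As a rough illustration of how a binned variable enters a logistic regression, the sketch below fits a model on a four-level factor built from fuel tank capacity. The target `non_usa` (derived from the `Origin` column of Cars93) and the names `cars`, `tank_bin`, and `fit` are hypothetical choices made only for this example, and the bin boundaries are an assumed scheme with a width of 4 and the last bin stretched to the maximum.

```r
library(MASS)  # provides Cars93

cars <- Cars93
capacity <- cars$Fuel.tank.capacity

# Assumed four-bin scheme: width 4, last bin extended to the maximum value.
breaks <- c(seq(min(capacity), by = 4, length.out = 4), max(capacity))
cars$tank_bin <- cut(capacity, breaks,
                     include.lowest = TRUE,
                     labels = c("low", "lower medium", "medium", "high"))

# Hypothetical binary target, used only to have something to model.
cars$non_usa <- as.integer(cars$Origin == "non-USA")

# glm() expands the factor into dummy variables, with "low" as the
# reference level, so each coefficient compares one bin against "low".
fit <- glm(non_usa ~ tank_bin, data = cars, family = binomial)
coef(fit)
```

Because one bin holds only a few cars, its coefficient is likely to be poorly estimated; merging sparse bins is a common remedy.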
