Binarizing features

Finally, we might not care much about the exact feature values of the data. Instead, we might just want to know whether a feature is present or absent. We can binarize the data by thresholding the feature values. Let's quickly remind ourselves of our feature matrix, X:

In [9]: X
Out[9]: array([[ 1., -2.,  2.],
               [ 3.,  0.,  0.],
               [ 0.,  1., -1.]])

Let's assume that these numbers represent thousands of dollars in our bank accounts. If an account holds more than 0.5 thousand dollars, we consider the person rich, which we represent with a 1. Otherwise (including a value of exactly 0.5), we put a 0. This is akin to thresholding the data with threshold=0.5:

In [10]: binarizer = preprocessing.Binarizer(threshold=0.5)
...      X_binarized = binarizer.transform(X)
...      X_binarized
Out[10]: array([[ 1.,  0.,  1.],
                [ 1.,  0.,  0.],
                [ 0.,  1.,  0.]])

The result is a matrix made up entirely of ones and zeros.
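To make the thresholding step concrete, here is a minimal sketch that reproduces the same result with plain NumPy instead of the Binarizer class. The names X_manual and boundary are introduced here for illustration only and are not part of the original session; the sketch assumes the same matrix X and the same threshold of 0.5.

```python
import numpy as np

# Same feature matrix as in the session above
X = np.array([[1., -2., 2.],
              [3., 0., 0.],
              [0., 1., -1.]])

threshold = 0.5

# Strict comparison: entries greater than the threshold become 1,
# everything else (including entries equal to it) becomes 0.
# X_manual is a hypothetical name used only in this sketch.
X_manual = (X > threshold).astype(X.dtype)
print(X_manual)

# A value sitting exactly on the threshold is mapped to 0
boundary = (np.array([0.5]) > threshold).astype(float)
print(boundary)
```

The comparison X > threshold yields a Boolean array, and casting it back to the dtype of X gives a matrix of ones and zeros that should match X_binarized from the session above.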
