String indexer

Let's assume that we have a DataFrame df containing a column called colors of categorical labels: red, green, and blue. We want to encode them as integer or float values. This is where StringIndexer kicks in. It automatically determines the cardinality of the category set and assigns each category a distinct index. By default, indices start at 0 and follow descending label frequency, so the most frequent label gets 0. In our example, a list of categories such as red, red, green, red, blue, green should therefore be transformed into 0.0, 0.0, 1.0, 0.0, 2.0, 1.0:

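import org.apache.spark.ml.feature.StringIndexer
val indexer = new StringIndexer()
  .setInputCol("colors")
  .setOutputCol("colorsIndexed")

val indexed = indexer.fit(df).transform(df)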

The result of this transformation is a DataFrame called indexed that, in addition to the colors column of the String type, now contains a column called colorsIndexed of type double.
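
To make the index assignment concrete, here is a minimal plain-Scala sketch of the idea. It is not Spark's implementation: it only mimics the default behavior of StringIndexer, which numbers labels from 0.0 in order of descending frequency. The names LabelIndexSketch and buildIndex are hypothetical, and the alphabetical tie-break is a choice made for this sketch.

```scala
// Plain-Scala illustration of frequency-based label indexing.
// NOT Spark's implementation; all names here are hypothetical.
object LabelIndexSketch {

  // Build a label -> index map: the most frequent label gets 0.0,
  // the next most frequent gets 1.0, and so on.
  // Ties are broken alphabetically (an assumption of this sketch).
  def buildIndex(labels: Seq[String]): Map[String, Double] =
    labels
      .groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }
      .toSeq
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), idx) => label -> idx.toDouble }
      .toMap

  def main(args: Array[String]): Unit = {
    val colors = Seq("red", "red", "green", "red", "blue", "green")
    val index = buildIndex(colors)

    // Look up each label's index, preserving the original order.
    val encoded = colors.map(index)
    println(encoded.mkString(", ")) // prints: 0.0, 0.0, 1.0, 0.0, 2.0, 1.0
  }
}
```

Here red occurs three times, green twice, and blue once, so they receive 0.0, 1.0, and 2.0 respectively. The number of distinct keys in the map is the cardinality of the category set.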
