VectorAssembler

Before we start with the actual machine learning algorithm, we need to apply one final transformation: we have to create one additional feature column that combines all the columns we want the machine learning algorithm to consider. This is done with org.apache.spark.ml.feature.VectorAssembler as follows:

import org.apache.spark.ml.feature.VectorAssembler
// Combine the listed input columns into a single vector column named "features"
val vectorAssembler = new VectorAssembler()
  .setInputCols(Array("colorVec", "field2", "field3", "field4"))
  .setOutputCol("features")

This transformer adds a single column called features, of the org.apache.spark.ml.linalg.Vector type, to the resulting DataFrame. In other words, the features column created by the VectorAssembler contains all the specified input columns (in this case, colorVec, field2, field3, and field4) encoded in a single vector object for each row. This is the format that the Apache SparkML algorithms expect.
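To see the assembled column, the transformer has to be applied to a DataFrame with transform. The following is a minimal sketch in spark-shell style, not the book's own dataset: the two-row DataFrame is made up for illustration, and colorVec is assumed to be a two-element one-hot vector such as a OneHotEncoder might produce.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists; getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .appName("VectorAssemblerSketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical toy data: colorVec stands in for a one-hot encoded column,
// field2 to field4 are plain numeric columns.
val df = spark.createDataFrame(Seq(
  (Vectors.dense(1.0, 0.0), 1.0, 2.0, 3.0),
  (Vectors.dense(0.0, 1.0), 4.0, 5.0, 6.0)
)).toDF("colorVec", "field2", "field3", "field4")

val vectorAssembler = new VectorAssembler()
  .setInputCols(Array("colorVec", "field2", "field3", "field4"))
  .setOutputCol("features")

// transform() keeps the original columns and appends the features column.
val assembled = vectorAssembler.transform(df)
assembled.printSchema()
assembled.select("features").show(false)
```

Under these assumptions each features vector has five entries: the two elements of colorVec followed by field2, field3, and field4, in the order given to setInputCols.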
