VectorAssembler

Before we start with the actual machine learning algorithm, we need to apply one final transformation: creating a single feature column that combines all the columns we want the algorithm to consider. This is done with org.apache.spark.ml.feature.VectorAssembler as follows:

import org.apache.spark.ml.feature.VectorAssembler

// Combine the listed input columns into a single vector column named "features"
val vectorAssembler = new VectorAssembler()
        .setInputCols(Array("colorVec", "field2", "field3", "field4"))
        .setOutputCol("features")

This transformer adds a single column, called features, to the resulting DataFrame. The column is of the org.apache.spark.ml.linalg.Vector type: for each row, it holds the values of all the configured input columns (in this case, colorVec, field2, field3, and field4) encoded in one vector object. This is the input format that the Apache Spark ML algorithms expect.
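To make the effect concrete, here is a minimal sketch of applying such an assembler to a DataFrame. The two sample rows, their values, the local SparkSession settings, and the names df, assembled, and firstFeatures are assumptions made up for illustration; only the column names and the assembler configuration come from the text above. It is meant to be pasted into spark-shell or run as a Scala script with Spark ML on the classpath.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation (spark-shell already provides one).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("VectorAssemblerSketch")
  .getOrCreate()

// Hypothetical toy data: colorVec stands in for an already encoded vector column,
// field2..field4 are plain numeric columns.
val df = spark.createDataFrame(Seq(
  (Vectors.dense(1.0, 0.0), 1.5, 2.0, 3.0),
  (Vectors.dense(0.0, 1.0), 4.0, 5.0, 6.0)
)).toDF("colorVec", "field2", "field3", "field4")

val vectorAssembler = new VectorAssembler()
  .setInputCols(Array("colorVec", "field2", "field3", "field4"))
  .setOutputCol("features")

// transform() keeps the original columns and appends the "features" column.
val assembled = vectorAssembler.transform(df)
assembled.show(false)

// Each row's vector concatenates the input columns in the order given to setInputCols:
// the two entries of colorVec followed by field2, field3, and field4.
val firstFeatures = assembled.head().getAs[Vector]("features")
println(s"features vector length: ${firstFeatures.size}")
```

Note that the vector column passed in is flattened into the output rather than nested, so the length of features is the sum of the widths of all input columns.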
