Estimator Transformer

An Estimator Transformer turns an input dataset into an output dataset in two steps. First, the Estimator is fitted on the input dataset, which produces a Transformer (a model). Then that Transformer processes the input data, reading the input column and adding the output column to produce the output dataset.

Such Transformers are invoked as shown next:

val transformer = estimator.fit(inputDF)
val outputDF = transformer.transform(inputDF)
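To illustrate the fit-then-transform contract without a Spark session, here is a minimal plain-Scala sketch. The trait names echo Spark ML's Estimator and Transformer, but Record, Table, and MeanCenterer are hypothetical stand-ins over ordinary collections, not Spark's API. Here fit() learns a statistic (a column mean) from the whole input dataset and returns a Transformer that uses it to add an output column.

```scala
// Hypothetical, Spark-free sketch of the Estimator/Transformer pattern.
// A "record" maps column names to values; a "table" is a sequence of records.
type Record = Map[String, Double]
type Table = Seq[Record]

trait Transformer {
  // Reads the input column and returns a new table with an output column added.
  def transform(input: Table): Table
}

trait Estimator {
  // Inspects the whole input table and returns a Transformer built from it.
  def fit(input: Table): Transformer
}

// Hypothetical estimator: fit() learns the mean of inputCol, and the returned
// Transformer writes (inputCol - mean) into outputCol for every record.
class MeanCenterer(inputCol: String, outputCol: String) extends Estimator {
  def fit(input: Table): Transformer = {
    val values = input.map(_(inputCol))
    val mean = values.sum / values.size
    new Transformer {
      def transform(data: Table): Table =
        data.map(row => row + (outputCol -> (row(inputCol) - mean)))
    }
  }
}

val inputTable: Table = Seq(Map("x" -> 1.0), Map("x" -> 2.0), Map("x" -> 6.0))
val transformer = new MeanCenterer("x", "xCentered").fit(inputTable)
val outputTable = transformer.transform(inputTable)
```

The returned Transformer captures what fit() learned, so it can later be applied to other datasets as well. In Spark ML, IDF, LDA, and Word2Vec behave the same way: their fit() methods return an IDFModel, an LDAModel, and a Word2VecModel respectively.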

Examples of Estimator Transformers include the following:

IDF
LDA
Word2Vec
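IDF is the simplest of these to sketch. The following plain-Scala example is a toy illustration, not Spark's implementation. It assumes documents are already tokenized into word sequences (Spark's IDF instead operates on term-frequency vectors, typically produced by HashingTF or CountVectorizer), and the names IdfEstimator and IdfModel are hypothetical. fit() counts how many documents contain each term and computes the smoothed weight idf(t) = log((m + 1) / (df(t) + 1)) given in the Spark MLlib documentation, where m is the number of documents. The returned model then multiplies each term's count by its weight.

```scala
// Hypothetical, Spark-free toy IDF over pre-tokenized documents.
// Estimator half: fit() learns one weight per term from the whole corpus.
class IdfEstimator {
  def fit(docs: Seq[Seq[String]]): IdfModel = {
    val m = docs.size
    // docFreq(t): number of documents that contain term t at least once
    val docFreq: Map[String, Int] =
      docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    // Smoothed idf(t) = log((m + 1) / (docFreq(t) + 1))
    IdfModel(docFreq.map { case (t, d) => t -> math.log((m + 1.0) / (d + 1.0)) })
  }
}

// Transformer half: the learned model weights each term count by idf(t).
// Terms never seen during fit() get weight 0.0 (a choice made for this sketch).
case class IdfModel(idf: Map[String, Double]) {
  def transform(docs: Seq[Seq[String]]): Seq[Map[String, Double]] =
    docs.map { words =>
      words.groupBy(identity).map { case (t, occ) => t -> occ.size * idf.getOrElse(t, 0.0) }
    }
}

val docs = Seq(
  Seq("coffee", "is", "hot"),
  Seq("book", "and", "coffee"),
  Seq("book", "store", "book")
)
val model = new IdfEstimator().fit(docs)
val weighted = model.transform(docs)
```

Because "coffee" and "book" each appear in two of the three documents, they receive a lower weight (log(4/3)) than a term such as "hot" that appears in only one (log 2). Down-weighting terms that are common across documents is the purpose of IDF.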

The following diagram shows an Estimator Transformer, where the input column of an input dataset is transformed into an output column, generating the output dataset:

In the next few sections, we will look deeper into text analytics using a simple example dataset, which consists of lines of text (sentences), as shown in the following screenshot:

The following code loads the text data into the input dataset.

Initialize a sequence called lines, in which each element is a pair of an ID and a sentence, as shown next.

scala> val lines = Seq(
     |   (1, "Hello there, how do you like the book so far?"),
     |   (2, "I am new to Machine Learning"),
     |   (3, "Maybe i should get some coffee before starting"),
     |   (4, "Coffee is best when you drink it hot"),
     |   (5, "Book stores have coffee too so i should go to a book store")
     | )
lines: Seq[(Int, String)] = List((1,Hello there, how do you like the book so far?), (2,I am new to Machine Learning), (3,Maybe i should get some coffee before starting), (4,Coffee is best when you drink it hot), (5,Book stores have coffee too so i should go to a book store))

Next, call spark.createDataFrame() to create a DataFrame from lines. Because the elements are plain tuples, the resulting columns would be named _1 and _2 by default, so toDF("id", "sentence") renames them.

scala> val sentenceDF = spark.createDataFrame(lines).toDF("id", "sentence")
sentenceDF: org.apache.spark.sql.DataFrame = [id: int, sentence: string]

Calling show(false) displays the newly created sentenceDF, which contains two columns, id and sentence; passing false turns off truncation of long values.

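scala> sentenceDF.show(false)
+---+----------------------------------------------------------+
|id |sentence                                                  |
+---+----------------------------------------------------------+
|1  |Hello there, how do you like the book so far?             |
|2  |I am new to Machine Learning                              |
|3  |Maybe i should get some coffee before starting            |
|4  |Coffee is best when you drink it hot                      |
|5  |Book stores have coffee too so i should go to a book store|
+---+----------------------------------------------------------+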
