Standard Transformer

A standard Transformer converts an input dataset into an output dataset by explicitly applying a transformation function to the input data. It has no dependency on the input data beyond reading the input column and generating the output column.

Such Transformers are invoked as shown next:

outputDF = transformer.transform(inputDF)
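
To make the call pattern concrete, the following is a minimal plain-Python sketch of what a standard Transformer does. It does not use Spark: the class name `WhitespaceTokenizer`, the representation of a dataset as a list of dictionaries, and the column names `sentence` and `words` are all assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class WhitespaceTokenizer:
    """Hypothetical stand-in for a standard Transformer (not Spark's own class)."""

    def __init__(self, input_col: str, output_col: str) -> None:
        self.input_col = input_col
        self.output_col = output_col

    def transform(self, rows: List[Row]) -> List[Row]:
        # Read only the input column and add the output column.
        # New rows are returned, so the input dataset is left unchanged.
        return [
            {**row, self.output_col: row[self.input_col].lower().split()}
            for row in rows
        ]


input_df = [
    {"id": 1, "sentence": "Hello Spark ML"},
    {"id": 2, "sentence": "Transformers add columns"},
]

transformer = WhitespaceTokenizer(input_col="sentence", output_col="words")
output_df = transformer.transform(input_df)
```

Note that nothing is learned from the data before `transform` is called: the same function is applied to every row independently, which is the property that makes this a standard Transformer.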

Examples of standard Transformers are as follows and will be explained in detail in the subsequent sections:

  • Tokenizer: This splits sentences into words using space as the delimiter
  • RegexTokenizer: This splits sentences into words using a regular expression
  • StopWordsRemover: This removes commonly used stop words from the list of words
  • Binarizer: This converts numeric values to binary 0/1 values based on a threshold
  • NGram: This creates N-word phrases from the sentences
  • HashingTF: This creates term frequency counts, using hashing to index the words
  • SQLTransformer: This implements the transformations, which are defined by SQL statements
  • VectorAssembler: This combines a given list of columns into a single vector column
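
The behavior of a few of these Transformers can be sketched as plain Python functions. This is only an illustration of the ideas, not Spark's implementation: the function names are hypothetical, and the hashing example uses `zlib.crc32` as a stand-in hash function, which is an assumption and will not reproduce the indices Spark computes.

```python
import zlib
from collections import Counter
from typing import Dict, List, Set


def remove_stop_words(words: List[str], stop_words: Set[str]) -> List[str]:
    # StopWordsRemover idea: drop words that appear in the stop-word set.
    return [w for w in words if w not in stop_words]


def ngrams(words: List[str], n: int) -> List[str]:
    # NGram idea: join every run of n consecutive words into one phrase.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def binarize(values: List[float], threshold: float) -> List[float]:
    # Binarizer idea: values above the threshold become 1.0, all others 0.0.
    return [1.0 if v > threshold else 0.0 for v in values]


def hashing_tf(words: List[str], num_features: int) -> Dict[int, int]:
    # HashingTF idea: hash each word to a bucket index and count occurrences.
    # crc32 is an illustrative stand-in for the hash function.
    return dict(Counter(zlib.crc32(w.encode("utf-8")) % num_features for w in words))


words = ["the", "quick", "brown", "fox"]
filtered = remove_stop_words(words, {"the", "a", "is"})
bigrams = ngrams(filtered, 2)
flags = binarize([0.1, 0.5, 0.9], threshold=0.5)
term_counts = hashing_tf(filtered, num_features=16)
```

Each function depends only on its arguments, so applying it row by row to an input column yields the output column without any prior pass over the data.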

The diagram of a standard Transformer is as follows, where the input column from an input dataset is transformed into an output column generating the output dataset:
