There's more...

There has been a change in how a DataFrame is registered as a table. Refer to the following:

  • For versions prior to Spark 2.0.0: registerTempTable()
  • For Spark 2.0.0 and later: createOrReplaceTempView() (a sketch using this call follows the import list at the end of this section)

Pre-Spark 2.0.0, to register a DataFrame as a SQL-table-like artifact:

Before we can use a DataFrame for queries via SQL, we have to register it as a temp table so that SQL statements can refer to it without any Scala/Spark syntax. This step can cause confusion for beginners: we are not creating any table (temp or permanent); the registerTempTable() call simply creates a name in SQL land that SQL statements can refer to, without additional UDFs or any domain-specific query language (a sample query follows the registration steps below).

  • Register the custDF DataFrame under a name that SQL statements recognize as customers:
custDF.registerTempTable("customers")
  • Register the prodDF DataFrame under a name that SQL statements recognize as product:
prodDF.registerTempTable("product")
  • Register the saleDF DataFrame under a name that SQL statements recognize as sales:
saleDF.registerTempTable("sales")

To ensure completeness, we include the import statements that we used prior to Spark 2.0.0 to run the code (namely, Spark 1.5.2):

import org.apache.spark._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.mllib.linalg._
import org.apache.spark.util._
import Array._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}
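
For Spark 2.0.0 and later, the same registration goes through a SparkSession and createOrReplaceTempView() instead of an SQLContext and registerTempTable(). The following is a minimal sketch, assuming the same three DataFrames (custDF, prodDF, saleDF) have already been built:

import org.apache.spark.sql.SparkSession

// Spark 2.0.0+ entry point; a separate SQLContext is no longer needed
val spark = SparkSession.builder().appName("SalesReport").getOrCreate()

// createOrReplaceTempView() replaces registerTempTable()
custDF.createOrReplaceTempView("customers")
prodDF.createOrReplaceTempView("product")
saleDF.createOrReplaceTempView("sales")

// Queries now run through spark.sql() instead of sqlContext.sql()
val customers = spark.sql("SELECT * FROM customers")
customers.show()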