Standalone programs

So far, we have been using Spark SQL and DataFrames through the Spark shell, where the SQLContext is created for us. To use them in a standalone program, you will need to create the SQLContext explicitly, from a SparkContext:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("applicationName")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

Additionally, importing the implicits object nested in sqlContext enables implicit conversion of RDDs to DataFrames (for instance, via the toDF method):

import sqlContext.implicits._
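
Putting these pieces together, a minimal standalone program might look like the following sketch; the Person case class, the sample data, and the object name are illustrative assumptions, not part of any library API:

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative case class; defined at the top level so that Spark can
// derive a DataFrame schema from it by reflection.
case class Person(name: String, age: Int)

object StandaloneExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("applicationName")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    // toDF, made available by the implicits import, converts an RDD of
    // case classes into a DataFrame with named, typed columns.
    val people = sc.parallelize(Seq(Person("Alice", 34), Person("Bob", 45))).toDF()
    people.filter($"age" > 40).show()

    sc.stop()
  }
}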

We will use DataFrames extensively in the next chapter to prepare data for use with MLlib.
