This chapter covers how to create a SparkContext for your cluster. The SparkContext class represents the connection to a Spark cluster and provides the entry point for interacting with Spark. We need to create a SparkContext instance so that we can interact with Spark and distribute our jobs. In Chapter 2, Using the Spark Shell, we interacted with Spark through the Spark shell, which created a SparkContext for us. With a SparkContext, you can create RDDs, broadcast variables, accumulators, and so on, and actually do fun things with your data. The Spark shell itself serves as an example of interacting with the Spark cluster through SparkContext; see ./repl/src/main/scala/spark/repl/SparkILoop.scala.
The following code snippet creates a SparkContext instance using the MASTER environment variable (or local, if none is set), names the application Spark shell, and doesn't specify any dependencies. This is because the Spark shell is built into Spark and, as such, has no JAR files that need to be distributed.
def createSparkContext(): SparkContext = {
  val master = this.master match {
    case Some(m) => m
    case None =>
      val prop = System.getenv("MASTER")
      if (prop != null) prop else "local"
  }
  sparkContext = new SparkContext(master, "Spark shell")
  sparkContext
}
For a client to establish a connection to the Spark cluster, the SparkContext object needs some basic information: the master URL, the application name, an optional Spark home path, and an optional list of JAR files to distribute. In a Scala program, you can create a SparkContext instance using the following code:
val sparkContext = new SparkContext(master_path, "application name", ["optional spark home path"], ["optional list of jars"])
While you can hardcode all of these values, it's better to read them from the environment with reasonable defaults. This approach provides maximum flexibility to run the code in a changing environment without having to recompile it. Using local as the default value for the master makes it easy to launch your application in a local test environment. By carefully selecting the defaults, you can avoid having to over-specify them. An example would be as follows:
import spark.SparkContext
import spark.SparkContext._
import scala.util.Properties

val master = Properties.envOrElse("MASTER", "local")
val sparkHome = Properties.envOrElse("SPARK_HOME", null)
val myJars = Seq(Properties.envOrElse("JARS", ""))
val sparkContext = new SparkContext(master, "my app", sparkHome, myJars)
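Once the context is created, you can use it right away. The following is a minimal sketch, assuming a SparkContext named sparkContext has been created as shown above and that Spark is on the classpath; it distributes a local range as an RDD and sums it on the cluster:

```scala
// Sketch only: assumes sparkContext was constructed as in the example above.
val data = sparkContext.parallelize(1 to 100) // turn a local collection into an RDD
val total = data.reduce(_ + _)                // sum the elements across the cluster
println(total)
sparkContext.stop()                           // release cluster resources when done
```

Calling stop() when you are finished is good practice, since it cleanly disconnects your application from the cluster.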