Installing and getting started

A convenient way to use SparkR is from RStudio. You can also connect your R program to a Spark cluster from the R shell, Rscript, or other R IDEs.

Option 1. Set SPARK_HOME in the environment (see the R documentation for Sys.getenv at https://stat.ethz.ch/R-manual/R-devel/library/base/html/Sys.getenv.html), load the SparkR package, and call sparkR.session as follows. sparkR.session checks for a Spark installation and, if none is found, downloads and caches one automatically:

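# Fall back to a default location if SPARK_HOME is not set
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/spark")
}
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g"))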
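You can wrap the fallback logic from Option 1 in a small function and reuse the same path resolution across scripts. The following is a minimal sketch in base R. The function name spark_r_lib_dir and the default path are hypothetical. The function only builds the library path; it does not download or verify Spark.

```r
# Hypothetical helper (not part of SparkR): mirrors the Option 1 logic.
# If SPARK_HOME is unset or empty, fall back to a default location,
# then return the directory that holds the SparkR package bundled with Spark.
spark_r_lib_dir <- function(default_home = "/home/spark") {
  home <- Sys.getenv("SPARK_HOME")
  if (nchar(home) < 1) {
    Sys.setenv(SPARK_HOME = default_home)
    home <- default_home
  }
  file.path(home, "R", "lib")
}

# Usage (requires a real Spark distribution at the resolved location):
# library(SparkR, lib.loc = spark_r_lib_dir())
```

The function sets SPARK_HOME as a side effect, just as the original snippet does, so later calls to sparkR.session see the same location.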

Option 2. You can also configure SparkR in RStudio manually. To do so, create an R script and run the following lines of R code in RStudio:

# Path to the unpacked Spark distribution (relative to the working directory)
Sys.setenv(SPARK_HOME = "spark-2.1.0-bin-hadoop2.7")
# Memory for the driver JVM (older environment-variable mechanism)
Sys.setenv(SPARK_MEM = "2g")
# Directory that contains the SparkR package bundled with Spark
SPARK_R_LIB <- file.path(Sys.getenv("SPARK_HOME"), "R", "lib")
.libPaths(c(SPARK_R_LIB, .libPaths()))

Then load the SparkR package from that directory:

library(SparkR, lib.loc = SPARK_R_LIB)
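Before calling library(), it can help to confirm that SPARK_HOME really points at an unpacked Spark distribution. That way, a wrong relative path fails with a clear message instead of a package-not-found error. The sketch below assumes the package sits under R/lib/SparkR inside the distribution. The function name sparkr_available is hypothetical.

```r
# Hypothetical check (not part of SparkR): returns TRUE if spark_home is
# non-empty and contains the bundled SparkR package directory.
sparkr_available <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  nchar(spark_home) > 0 &&
    dir.exists(file.path(spark_home, "R", "lib", "SparkR"))
}

# Usage, before loading the package:
# if (!sparkr_available()) stop("SparkR not found under SPARK_HOME")
```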

As in Scala, Java, and PySpark, the entry point to a SparkR program is a Spark session, which you create by calling sparkR.session:

sparkR.session(appName = "Hello, Spark!", master = "local[*]")

You can also specify certain Spark driver properties. Normally, these application properties and runtime environment settings cannot be set programmatically, because the driver JVM process has already started by the time your code runs. In this case, SparkR handles this for you. To set them, pass them in the sparkConfig argument to sparkR.session(), just as you would pass other configuration properties:

sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g")) 
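Because sparkConfig is an ordinary named R list, you can keep project-wide defaults in one list and override individual properties per session. This sketch uses only base R (utils::modifyList). The property values are illustrative assumptions. The sparkR.session call is shown as a comment because it needs a working SparkR installation.

```r
# Illustrative defaults and overrides (values are assumptions for this sketch)
default_conf <- list(spark.driver.memory = "1g",
                     spark.sql.shuffle.partitions = "8")
overrides    <- list(spark.driver.memory = "2g")

# modifyList() replaces entries whose names match and keeps the rest
spark_conf <- utils::modifyList(default_conf, overrides)

# With SparkR loaded, pass the merged list as sparkConfig:
# sparkR.session(master = "local[*]", sparkConfig = spark_conf)
```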

In addition, the following Spark driver properties can be set in sparkConfig with sparkR.session from RStudio:

Figure 19: Spark driver properties can be set in sparkConfig with sparkR.session from RStudio