How to do it...

Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.

Set up the package location where the program will reside:

package spark.ml.cookbook.chapter3

Import the necessary packages:

import breeze.numerics.pow 
import org.apache.spark.sql.SparkSession 
import Array._

Import the packages for setting the log4j logging level. This step is optional, but we highly recommend it (change the level as appropriate as you move through the development cycle).

import org.apache.log4j.Logger 
import org.apache.log4j.Level

Set the logging level to ERROR to cut down on output. See the previous step for package requirements.

Logger.getLogger("org").setLevel(Level.ERROR) 
Logger.getLogger("akka").setLevel(Level.ERROR)

Set up the Spark session and application parameters so Spark can run.

val spark = SparkSession 
  .builder 
  .master("local[*]") 
  .appName("myRDD") 
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()

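Set up the data structures and the RDDs for the example:

val num: Array[Double] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
val odd: Array[Double] = Array(1, 3, 5, 7, 9, 11, 13)
val even: Array[Double] = Array(2, 4, 6, 8, 10, 12)
val numRDD = spark.sparkContext.parallelize(num)
val oddRDD = spark.sparkContext.parallelize(odd)
val evenRDD = spark.sparkContext.parallelize(even)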

We apply the intersection() function to the RDDs to demonstrate the transformation:

val intersectRDD = numRDD.intersection(oddRDD)
intersectRDD.collect.sorted.foreach(println)

On running the previous code, you will get the following output:

1.0
3.0
5.0
7.0
9.0
11.0
13.0
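
One property worth checking at this step: intersection() also removes duplicates, so an element that occurs several times in both inputs appears only once in the result. The following is a minimal script-style sketch (for spark-shell or a worksheet), assuming a local SparkSession; the names left and right and their contents are made up for illustration and are not part of the recipe:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session, configured like the one earlier in this recipe.
val spark = SparkSession.builder.master("local[*]").appName("intersectionSketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical inputs that deliberately contain repeated values.
val left = sc.parallelize(Seq(1.0, 1.0, 3.0, 5.0, 8.0))
val right = sc.parallelize(Seq(1.0, 1.0, 3.0, 3.0, 7.0))

// Only values present in both RDDs survive, each exactly once.
val common = left.intersection(right).collect()

println(s"common elements: ${common.toSet}")
```

Because intersection() shuffles data, the order returned by collect() is not guaranteed, which is why the sketch inspects the result as a set.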

We apply the union() function to the RDDs to demonstrate the transformation:

val unionRDD = oddRDD.union(evenRDD)
unionRDD.collect.sorted.foreach(println)

On running the previous code, you will get the following output:

1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
11.0
12.0
13.0
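
The odd and even inputs share no elements, so this example hides one detail: union() concatenates the two RDDs and keeps duplicates. The sketch below assumes a local SparkSession and re-creates two of the recipe's RDDs so that it can run on its own; it unions the full number set with the odd numbers and then applies distinct() to show the difference:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session, configured like the one earlier in this recipe.
val spark = SparkSession.builder.master("local[*]").appName("unionSketch").getOrCreate()
val sc = spark.sparkContext

val numRDD = sc.parallelize(Array[Double](1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13))
val oddRDD = sc.parallelize(Array[Double](1, 3, 5, 7, 9, 11, 13))

// union() keeps every element of both inputs, so the odd values appear twice.
val withDuplicates = numRDD.union(oddRDD)
val unionCount = withDuplicates.count()                // 13 + 7 elements

// distinct() collapses the repeated values again.
val distinctCount = withDuplicates.distinct().count()

println(s"after union: $unionCount, after distinct: $distinctCount")
```

If set semantics are needed, chaining distinct() after union() is the usual approach, at the cost of a shuffle.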

We apply the subtract() function to the RDDs to demonstrate the transformation:

val subtractRDD = numRDD.subtract(oddRDD)
subtractRDD.collect.sorted.foreach(println)

On running the previous code, you will get the following output:

2.0
4.0
6.0
8.0
10.0
12.0
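
Unlike intersection() and union(), subtract() depends on the order of its operands: it returns the elements of the RDD it is called on that do not appear in the argument. A small sketch, assuming a local SparkSession and re-creating the recipe's RDDs so that it is self-contained, compares both directions:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session, configured like the one earlier in this recipe.
val spark = SparkSession.builder.master("local[*]").appName("subtractSketch").getOrCreate()
val sc = spark.sparkContext

val numRDD = sc.parallelize(Array[Double](1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13))
val oddRDD = sc.parallelize(Array[Double](1, 3, 5, 7, 9, 11, 13))

// Elements of numRDD that are not in oddRDD: the even numbers.
val evenOnly = numRDD.subtract(oddRDD).collect().toSet

// Reversed operands: every odd number is also in numRDD, so nothing is left.
val nothingLeft = oddRDD.subtract(numRDD).collect()

println(s"num - odd = $evenOnly")
println(s"odd - num has ${nothingLeft.length} elements")
```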

We apply the distinct() function to the RDDs to demonstrate the transformation:

val namesRDD = spark.sparkContext.parallelize(List("Ed", "Jain", "Laura", "Ed"))
val distinctRDD = namesRDD.distinct()
distinctRDD.collect.sorted.foreach(println)

On running the previous code, you will get the following output:

Ed
Jain
Laura

We apply the cartesian() function to the RDDs to demonstrate the transformation:

val cartesianRDD = oddRDD.cartesian(evenRDD) 
cartesianRDD.collect.foreach(println)

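On running the previous code, you will get output like the following (abbreviated; the full result has 42 pairs, and their order depends on how the RDDs are partitioned):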

(1.0,2.0)
(1.0,4.0)
(1.0,6.0)
(3.0,2.0)
(3.0,4.0)
(3.0,6.0)
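
The size of a Cartesian product is the product of the input sizes, which is worth checking before calling collect on larger RDDs. As a rough sketch, assuming a local SparkSession and re-creating the recipe's two RDDs, the following counts the pairs without pulling them all to the driver and prints only a small sample:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session, configured like the one earlier in this recipe.
val spark = SparkSession.builder.master("local[*]").appName("cartesianSketch").getOrCreate()
val sc = spark.sparkContext

val oddRDD = sc.parallelize(Array[Double](1, 3, 5, 7, 9, 11, 13))
val evenRDD = sc.parallelize(Array[Double](2, 4, 6, 8, 10, 12))

// Every odd value is paired with every even value.
val cartesianRDD = oddRDD.cartesian(evenRDD)

// 7 odd values times 6 even values.
val pairCount = cartesianRDD.count()

// take() fetches only a few pairs instead of the whole product.
cartesianRDD.take(3).foreach(println)
println(s"total pairs: $pairCount")
```

Since the result grows multiplicatively, count() and take() are safer than collect for anything beyond toy inputs.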
