How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
  1. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter3
  1. Import the necessary packages:
import breeze.numerics.pow 
import org.apache.spark.sql.SparkSession 
import Array._
  1. Import the packages for setting the log4j logging level. This step is optional, but we highly recommend it (adjust the level as you move through the development cycle).
import org.apache.log4j.Logger 
import org.apache.log4j.Level 
  1. Set the logging level to ERROR to cut down on output. See the previous step for the required packages.
Logger.getLogger("org").setLevel(Level.ERROR) 
Logger.getLogger("akka").setLevel(Level.ERROR) 
  1. Set up the Spark session and application parameters so Spark can run.
val spark = SparkSession 
  .builder 
  .master("local[*]") 
  .appName("myRDD") 
  .config("spark.sql.warehouse.dir", ".") 
  .getOrCreate() 
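  1. Set up the data structures and RDDs for the example:
val num : Array[Double]    = Array(1,2,3,4,5,6,7,8,9,10,11,12,13) 
val odd : Array[Double]    = Array(1,3,5,7,9,11,13) 
val even : Array[Double]    = Array(2,4,6,8,10,12) 
val numRDD = spark.sparkContext.parallelize(num) 
val oddRDD = spark.sparkContext.parallelize(odd) 
val evenRDD = spark.sparkContext.parallelize(even) 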
  1. We apply the intersection() function to the RDDs to demonstrate the transformation:
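val intersectRDD = numRDD.intersection(oddRDD) 
intersectRDD.collect.foreach(println) 

On running the previous code, you will get output similar to the following (abridged; intersection() shuffles the data, so the order of the values may vary):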

1.0
3.0
5.0
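To make the semantics of intersection() concrete without a cluster, here is a minimal sketch that models it with plain Scala collections. The object name IntersectionSketch is hypothetical, and the sketch only models which values survive, not partitioning or ordering. Spark documents that the result of intersection() contains no duplicates even if the inputs do, so the sketch dedupes explicitly.

```scala
// Hypothetical local analogue of RDD.intersection() using plain Scala collections.
// Assumption: only the surviving values are modelled, not Spark's partitioning.
object IntersectionSketch {
  // Keep values of `a` that also occur in `b`, then drop duplicates,
  // mirroring the "no duplicates in the output" behaviour of intersection().
  def intersection(a: Seq[Double], b: Seq[Double]): Seq[Double] = {
    val right = b.toSet
    a.filter(right.contains).distinct
  }

  def main(args: Array[String]): Unit = {
    val num = (1 to 13).map(_.toDouble)
    val odd = (1 to 13 by 2).map(_.toDouble)
    // filter keeps the order of `a`, so this local output is deterministic,
    // unlike the shuffled output of the Spark version.
    intersection(num, odd).foreach(println)
  }
}
```

Running main prints every odd value from 1.0 to 13.0, which is the full result that the Spark output above abridges.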
  1. We apply the union() function to the RDDs to demonstrate the transformation:
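val unionRDD = oddRDD.union(evenRDD) 
unionRDD.collect.foreach(println) 

On running the previous code, you will get the following output (union() concatenates the two RDDs, so the odd values come first, followed by the even values):

1.0
3.0
5.0
7.0
9.0
11.0
13.0
2.0
4.0
6.0
8.0
10.0
12.0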
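Unlike intersection(), union() neither sorts nor removes duplicates: the Spark API documentation notes that identical elements appear multiple times and suggests distinct() to eliminate them. The following sketch (hypothetical object UnionSketch, plain Scala collections standing in for RDDs) illustrates that behaviour locally.

```scala
// Hypothetical local analogue of RDD.union(): plain concatenation.
// No shuffle, no sorting and no de-duplication is performed.
object UnionSketch {
  def union(a: Seq[Double], b: Seq[Double]): Seq[Double] = a ++ b

  def main(args: Array[String]): Unit = {
    val odd  = (1 to 13 by 2).map(_.toDouble)
    val even = (2 to 12 by 2).map(_.toDouble)
    // Prints the odd values first, then the even values.
    union(odd, even).foreach(println)
    // Duplicates survive; apply .distinct to the result if you need set semantics.
    println(union(Seq(1.0, 2.0), Seq(2.0, 3.0))) // List(1.0, 2.0, 2.0, 3.0)
  }
}
```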
  
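  1. We apply the subtract() function to the RDDs to demonstrate the transformation:
val subtractRDD = numRDD.subtract(oddRDD) 
subtractRDD.collect.foreach(println) 

On running the previous code, you will get output similar to the following (abridged; subtract() shuffles the data, so the order of the values may vary):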

2.0
4.0
6.0
8.0
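subtract() keeps the elements of the first RDD whose values do not appear in the second one. The sketch below models this with plain Scala collections (the object SubtractSketch is hypothetical). It assumes, based on our reading of Spark's implementation via subtractByKey, that every copy of a matching value is removed, which differs from Scala's own Seq.diff.

```scala
// Hypothetical local analogue of RDD.subtract().
// Assumption: all copies of a value found in `b` are dropped from `a`,
// whereas Seq.diff would remove only one occurrence per match.
object SubtractSketch {
  def subtract(a: Seq[Double], b: Seq[Double]): Seq[Double] = {
    val remove = b.toSet
    a.filterNot(remove.contains)
  }

  def main(args: Array[String]): Unit = {
    val num = (1 to 13).map(_.toDouble)
    val odd = (1 to 13 by 2).map(_.toDouble)
    // Prints the even values 2.0 through 12.0, one per line.
    subtract(num, odd).foreach(println)
  }
}
```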
  
  1. We apply the distinct() function to the RDDs to demonstrate the transformation:
val namesRDD = spark.sparkContext.parallelize(List("Ed","Jain", "Laura", "Ed")) 
val distinctRDD = namesRDD.distinct() 
distinctRDD.collect.foreach(println) 

On running the previous code, you will get the following output (distinct() shuffles the data, so the order of the names may vary):

Ed
Jain
Laura
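The effect of distinct() can be checked locally with Scala's own Seq.distinct, as in this minimal sketch (the object DistinctSketch is hypothetical). Note that the comparison is exact and case-sensitive, so "Ed" and "ED" would be treated as different values.

```scala
// Hypothetical local stand-in for RDD.distinct(): keep one copy of each value.
// Spark implements distinct() with a shuffle, so its output order is not
// guaranteed; Seq.distinct keeps the order of first occurrence.
object DistinctSketch {
  def distinctNames(names: Seq[String]): Seq[String] = names.distinct

  def main(args: Array[String]): Unit =
    distinctNames(List("Ed", "Jain", "Laura", "Ed")).foreach(println)
}
```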
  
  1. We apply the cartesian() function to the RDDs to demonstrate the transformation:
val cartesianRDD = oddRDD.cartesian(evenRDD) 
cartesianRDD.collect.foreach(println) 

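On running the previous code, you will get output similar to the following (abridged; the full result contains 7 × 6 = 42 pairs, and their order depends on how the data is partitioned):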

(1.0,2.0)
(1.0,4.0)
(1.0,6.0)
(3.0,2.0)
(3.0,4.0)
(3.0,6.0)   
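cartesian() pairs every element of the first RDD with every element of the second, so the result size is the product of the two input sizes. This minimal sketch (hypothetical object CartesianSketch, plain Scala collections instead of RDDs) uses a for-comprehension to build the same pairs and confirms the count of 42 for the odd and even arrays used above.

```scala
// Hypothetical local analogue of RDD.cartesian(): all (x, y) pairs with
// x from `a` and y from `b`, giving a.size * b.size elements.
object CartesianSketch {
  def cartesian(a: Seq[Double], b: Seq[Double]): Seq[(Double, Double)] =
    for { x <- a; y <- b } yield (x, y)

  def main(args: Array[String]): Unit = {
    val odd   = (1 to 13 by 2).map(_.toDouble)
    val even  = (2 to 12 by 2).map(_.toDouble)
    val pairs = cartesian(odd, even)
    println(pairs.size)            // 42 = 7 odd values * 6 even values
    pairs.take(3).foreach(println) // (1.0,2.0) (1.0,4.0) (1.0,6.0), one per line
  }
}
```

Because the output grows multiplicatively, cartesian() on large RDDs can be very expensive; it is best reserved for small inputs.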