DoubleRDD

A DoubleRDD is an RDD whose elements are double values. Because every element is numeric, statistical functions such as mean, min, max, and stdev are available on it.

The following example creates an RDD from a sequence of double values and calls several statistical functions on it:

scala> val rdd_one = sc.parallelize(Seq(1.0,2.0,3.0))
rdd_one: org.apache.spark.rdd.RDD[Double] = ParallelCollectionRDD[52] at parallelize at <console>:25

scala> rdd_one.mean
res62: Double = 2.0

scala> rdd_one.min
res63: Double = 1.0

scala> rdd_one.max
res64: Double = 3.0

scala> rdd_one.stdev
res65: Double = 0.816496580927726
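
Note that stdev returns the population standard deviation (it divides by n), which is why the result for these three values is about 0.816 rather than 1.0. The following is a minimal sketch, assuming a local Spark session; the names DoubleRddStats and summarize are hypothetical and used only for illustration. It calls stats(), which returns a StatCounter holding the count, mean, min, max, and both the population and the sample standard deviation:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.StatCounter

object DoubleRddStats {
  // Hypothetical helper: builds an RDD[Double] from the given values in a
  // local Spark session and returns the StatCounter produced by stats().
  def summarize(values: Seq[Double]): StatCounter = {
    val spark = SparkSession.builder()
      .appName("DoubleRddStats")
      .master("local[*]") // assumption: run locally, not on a cluster
      .getOrCreate()
    try {
      spark.sparkContext.parallelize(values).stats()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val s = summarize(Seq(1.0, 2.0, 3.0))
    println(s"count=${s.count} mean=${s.mean} min=${s.min} max=${s.max}")
    println(s"population stdev=${s.stdev}")   // divides by n
    println(s"sample stdev=${s.sampleStdev}") // divides by n - 1
  }
}
```

In spark-shell, where sc already exists, the equivalent call is simply rdd_one.stats.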

The following is a diagram of the DoubleRDD and how you can run a sum() function on the DoubleRDD:
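
As a minimal sketch of the sum() call itself, assuming a local Spark session (the names DoubleRddSum and sumOf are hypothetical), the values are distributed across partitions by parallelize and then added up by sum():

```scala
import org.apache.spark.sql.SparkSession

object DoubleRddSum {
  // Hypothetical helper: distributes the values as an RDD[Double]
  // and returns their total using sum().
  def sumOf(values: Seq[Double]): Double = {
    val spark = SparkSession.builder()
      .appName("DoubleRddSum")
      .master("local[*]") // assumption: run locally, not on a cluster
      .getOrCreate()
    try {
      spark.sparkContext.parallelize(values).sum()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    println(sumOf(Seq(1.0, 2.0, 3.0))) // prints 6.0
  }
}
```

In spark-shell the same result comes from rdd_one.sum.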
