count

count() is an action that counts the number of elements in the RDD and returns the total to the Driver.

The following is an example of this function. We create an RDD from a sequence of integers using the parallelize function of SparkContext, and then call count on the RDD to get the number of elements in it.

scala> val rdd_one = sc.parallelize(Seq(1,2,3,4,5,6))
rdd_one: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[26] at parallelize at <console>:24

scala> rdd_one.count
res24: Long = 6

The following is an illustration of count(). The Driver asks each task to count the number of elements in the partition that the task is handling, and then adds up the counts from all the tasks at the Driver level.
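
The per-partition counting described above can be approximated by hand with mapPartitions. This is a minimal sketch of the idea, not necessarily how count is implemented internally. The name rdd_parts and the partition count of 3 are assumptions chosen for illustration, and the snippet expects to run in spark-shell, where sc is already defined.

```scala
// Run in spark-shell, where `sc` (the SparkContext) is already available.
// The partition count of 3 is an arbitrary choice for illustration.
val rdd_parts = sc.parallelize(Seq(1, 2, 3, 4, 5, 6), 3)

// Each task counts the elements of its own partition and emits a single Long.
val perPartition = rdd_parts.mapPartitions(iter => Iterator(iter.size.toLong)).collect()

// The Driver adds the per-partition counts together.
val total = perPartition.sum

// The manual total should agree with the built-in action.
assert(total == rdd_parts.count)
```

Here perPartition holds one count per partition, so only a handful of numbers travel back to the Driver rather than the elements themselves.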
