How it works...

In this example, we created the numbers one through twelve and placed them in three partitions. We then split them into odd and even values using a simple modulo operation, and used groupBy() to aggregate them into two groups, odd and even. This is a typical aggregation problem that should look familiar to SQL users. Later in this chapter, we revisit this operation using DataFrame, which also takes advantage of the better optimization techniques provided by the SparkSQL engine. In the latter part of the recipe, we demonstrate the similarity of groupBy() and reduceByKey(). We set up an array of letters (that is, a and b) and convert it into an RDD. We then aggregate the elements by key (that is, by unique letter; there are only two in this case) and print the total for each group.
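The two aggregations described above can be sketched without a Spark cluster. The following is a plain-Python illustration of the semantics only, not the recipe's Spark code: group_by and reduce_by_key are hypothetical helper functions written for this sketch, the contents of the letters list are assumed, and partitioning is ignored entirely.

```python
from collections import defaultdict

def group_by(items, key_fn):
    """Collect items into lists keyed by key_fn(item).

    Hypothetical helper that mimics the result of grouping by a key
    function; it is not part of any Spark API.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key_fn(item)].append(item)
    return dict(groups)

def reduce_by_key(pairs, fn):
    """Merge the values of each key with the binary function fn.

    Hypothetical helper that mimics the result of reducing (key, value)
    pairs by key; it is not part of any Spark API.
    """
    totals = {}
    for key, value in pairs:
        totals[key] = fn(totals[key], value) if key in totals else value
    return totals

# The numbers one through twelve, split into odd/even with a modulo test.
numbers = range(1, 13)
odd_even = group_by(numbers, lambda n: "even" if n % 2 == 0 else "odd")

# An assumed array of the letters a and b; the actual contents are not
# given in the text above.
letters = ["a", "b", "a", "b", "a", "b"]

# Pair each letter with 1, then sum the ones per key to get a total.
letter_counts = reduce_by_key(((c, 1) for c in letters), lambda x, y: x + y)

print(odd_even)
print(letter_counts)
```

Grouping keeps every element in its group, whereas reducing by key collapses each group to a single value. That difference is why the two operations look similar in use yet produce results of a different shape.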
