Sorted word count

With a slight modification to the same script, we can add one more call and get sorted results. The script now looks like this:

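import pyspark
# Reuse the notebook's SparkContext if one already exists; otherwise create one
if 'sc' not in globals():
    sc = pyspark.SparkContext()
# Read the file as an RDD of lines
text_file = sc.textFile("Spark File Words.ipynb")
# Split into words, pair each word with 1, sum the 1s per word, then sort by word
sorted_counts = (text_file.flatMap(lambda line: line.split(" "))
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b)
                 .sortByKey())
# collect() returns the sorted (word, count) pairs to the driver
for x in sorted_counts.collect():
    print(x)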

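Here, we have added one more call to the RDD pipeline: sortByKey(). After the map/reduce steps have produced a list of (word, count) pairs, sortByKey() orders those pairs by their key, which is the word, in ascending order by default. The results are therefore sorted alphabetically by word, not by count.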

The resultant output looks like this:

Sorted word count
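To make the effect of each stage concrete without a Spark cluster, here is a minimal plain-Python sketch that mimics the same flatMap, map, reduceByKey and sortByKey sequence on an in-memory list of lines. The function name `sorted_word_count` and the sample lines are hypothetical, chosen only for illustration; this is an emulation of what the RDD transformations compute, not how Spark executes them across partitions.

```python
from collections import defaultdict
from itertools import chain


def sorted_word_count(lines):
    """Hypothetical plain-Python stand-in for flatMap -> map -> reduceByKey -> sortByKey."""
    # flatMap: split every line on single spaces and flatten into one stream of words
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map: pair each word with a count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: sum the counts of pairs that share the same word
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    # sortByKey: order the (word, count) pairs by word, ascending
    return sorted(totals.items(), key=lambda kv: kv[0])


if __name__ == "__main__":
    sample = ["the cat sat", "the dog sat down"]
    for pair in sorted_word_count(sample):
        print(pair)
```

Running it prints `('cat', 1)`, `('dog', 1)`, `('down', 1)`, `('sat', 2)` and `('the', 2)`, one per line: the pairs come out in alphabetical order of the word, while the counts play no part in the ordering. Note also that `split(" ")` yields an empty string wherever two spaces are adjacent, so the Spark script above can count `''` as a "word" too.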
