wholeTextFiles

wholeTextFiles() loads multiple text files into a pair RDD of <filename, textOfFile> pairs, where each pair holds a file's path and that file's entire content. This is useful when loading many small text files. It differs from the textFile API, which produces one record per line: with wholeTextFiles(), the entire content of each file is loaded as a single record. In PySpark, the signature is:

sc.wholeTextFiles(path, minPartitions=None, use_unicode=True)
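
To make the contrast with textFile concrete, here is a minimal sketch in Scala. It assumes Spark 2.x or later on the classpath and a local master. The object name WholeTextFilesSketch, the run() helper, and the temporary files a.txt and b.txt are hypothetical names chosen for illustration; they do not come from the original example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical sketch: compares record counts from textFile and wholeTextFiles.
object WholeTextFilesSketch {

  // Returns (records seen by textFile, records seen by wholeTextFiles).
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .appName("wholeTextFiles-sketch")
      .master("local[1]") // assumption: run locally, not on a cluster
      .getOrCreate()
    val sc = spark.sparkContext

    try {
      // Create two small files (three lines in total) in a temporary directory.
      val dir = Files.createTempDirectory("whole-text-files-demo")
      Files.write(dir.resolve("a.txt"),
        "line one\nline two\n".getBytes(StandardCharsets.UTF_8))
      Files.write(dir.resolve("b.txt"),
        "line three\n".getBytes(StandardCharsets.UTF_8))

      // textFile: one record per line across all files in the directory.
      val byLine = sc.textFile(dir.toString)

      // wholeTextFiles: one (path, content) record per file.
      val byFile = sc.wholeTextFiles(dir.toString)

      // Show which file each record came from and how large its content is.
      byFile.collect().foreach { case (path, content) =>
        println(s"$path -> ${content.length} characters")
      }

      (byLine.count(), byFile.count())
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (lineRecords, fileRecords) = run()
    println(s"textFile records: $lineRecords")
    println(s"wholeTextFiles records: $fileRecords")
  }
}
```

With the two files above, textFile should report three records (one per line) and wholeTextFiles two (one per file). Because each file becomes a single in-memory string, this approach suits many small files rather than a few very large ones.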

The following is an example of loading a text file into an RDD using wholeTextFiles() in the Scala shell:

scala> val rdd_whole = sc.wholeTextFiles("wiki1.txt")
rdd_whole: org.apache.spark.rdd.RDD[(String, String)] = wiki1.txt MapPartitionsRDD[37] at wholeTextFiles at <console>:25

scala> rdd_whole.take(10)
res56: Array[(String, String)] =
Array((file:/Users/salla/spark-2.1.1-bin-hadoop2.7/wiki1.txt,Apache Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data