Reading CSV using SparkSession

In Chapter 5, Working with Data and Storage, we read CSV using SparkSession in the form of a Java RDD. This time, however, we will read the CSV in the form of a dataset. Consider a CSV file with the following content:

emp_id,emp_name,emp_dept
1,Foo,Engineering
2,Bar,Admin

The SparkSession can be used to read this CSV file as follows:

Dataset<Row> csv = sparkSession.read().format("csv").option("header","true").load("C:\\Users\\sgulati\\Documents\\my_docs\\book\\testdata\\emp.csv");

Similar to the collect() function on an RDD, a dataset provides the show() function, which can be used to print the content of the dataset to the console:

csv.show();

Executing this function displays the content of the CSV file along with its header, formatted like a relational table. The content of a dataset can be transformed and filtered using Spark SQL, which is discussed in the next sections.
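The steps above can be put together as a small, self-contained sketch. Note that the class name ReadCsvExample, the local[*] master setting, and the use of a temporary file (instead of the book's Windows path) are assumptions made here so the example runs on its own:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadCsvExample {

    // Reads a header-bearing CSV file into a Dataset<Row>
    static Dataset<Row> readCsv(SparkSession spark, String path) {
        return spark.read()
                .format("csv")
                .option("header", "true")
                .load(path);
    }

    public static void main(String[] args) throws Exception {
        // Write the sample CSV to a temporary file so the example is
        // self-contained; in practice you would point load() at your own file
        Path csvPath = Files.createTempFile("emp", ".csv");
        Files.write(csvPath, Arrays.asList(
                "emp_id,emp_name,emp_dept",
                "1,Foo,Engineering",
                "2,Bar,Admin"));

        SparkSession spark = SparkSession.builder()
                .appName("read-csv-example")
                .master("local[*]") // local mode, assumed for demonstration
                .getOrCreate();

        Dataset<Row> csv = readCsv(spark, csvPath.toString());
        csv.show(); // prints the rows as a table with the header columns

        spark.stop();
    }
}
```

Because option("header", "true") is set, Spark takes the column names from the first line of the file; without it, the header row would be read as data and the columns would get default names like _c0 and _c1.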
