Reading CSV using SparkSession

In Chapter 5, Working with Data and Storage, we read a CSV file using SparkSession in the form of a Java RDD. This time, we will read the CSV file as a dataset. Suppose you have a CSV file with the following content:

emp_id,emp_name,emp_dept
1,Foo,Engineering
2,Bar,Admin

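The SparkSession can be used to read this CSV file as follows. Setting the header option to true tells Spark to treat the first line as column names rather than data: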

Dataset<Row> csv = sparkSession.read().format("csv").option("header", "true").load("C:\\Users\\sgulati\\Documents\\my_docs\\book\\testdata\\emp.csv");

Similar to the collect() function on an RDD, a dataset provides the show() function, which can be used to view the content of the dataset. By default, it prints the first 20 rows:

csv.show();

Executing this function prints the content of the CSV file along with its headers, in a tabular layout that resembles a relational table. The content of a dataset can be transformed or filtered using Spark SQL, which is discussed in the next sections.
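
The steps in this section can be put together as a minimal, self-contained sketch. The class name ReadCsvExample, the helper readCsvWithHeader, the application name, and the local[*] master are assumptions made for illustration and do not come from the book. The sketch writes the sample rows to a temporary file so that it does not depend on a machine-specific path:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadCsvExample {

    // Hypothetical helper: reads a CSV file whose first line holds the column names.
    static Dataset<Row> readCsvWithHeader(SparkSession sparkSession, String path) {
        return sparkSession.read()
                .format("csv")
                .option("header", "true")
                .load(path);
    }

    public static void main(String[] args) throws IOException {
        // Write the sample rows to a temporary file (assumption: a local filesystem is used).
        Path csvFile = Files.createTempFile("emp", ".csv");
        Files.write(csvFile, Arrays.asList(
                "emp_id,emp_name,emp_dept",
                "1,Foo,Engineering",
                "2,Bar,Admin"));

        // Assumption: Spark runs in local mode inside this JVM.
        SparkSession sparkSession = SparkSession.builder()
                .appName("ReadCsvExample")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> csv = readCsvWithHeader(sparkSession, csvFile.toString());

        // Without the inferSchema option, every column is read as a string.
        csv.printSchema();

        // Prints the rows as a table, with the header line as column names.
        csv.show();

        sparkSession.stop();
    }
}
```

With the sample data above, show() should print a table along these lines:

```
+------+--------+-----------+
|emp_id|emp_name|   emp_dept|
+------+--------+-----------+
|     1|     Foo|Engineering|
|     2|     Bar|      Admin|
+------+--------+-----------+
```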
