Reading a CSV file

Let's start with loading, parsing, and viewing simple flight data. First, download the NYC flights dataset as a CSV file from https://s3-us-west-2.amazonaws.com/sparkr-data/nycflights13.csv. Then load and parse the dataset using the spark.read API of PySpark:

# Creating a DataFrame from a data file in CSV format
# (assumes an existing SparkSession named spark)
df = (spark.read.format("com.databricks.spark.csv")
      .option("header", "true")  # treat the first line as column names
      .load("data/nycflights13.csv"))

This is similar to reading the libsvm format. You can now inspect the resulting DataFrame's structure as follows:

df.printSchema() 

The output is as follows:

Figure 8: Schema of the NYC flight dataset

Now let's look at a snapshot of the dataset using the show() method, which displays the first 20 rows by default:

df.show() 

The output is as follows:

Figure 9: Sample of the NYC flight dataset