Using external data source APIs

As mentioned earlier, we can also create a DataFrame using external data source APIs. The following example uses the com.databricks.spark.csv API:

flightDF <- read.df(dataPath,
                    header = "true",
                    source = "com.databricks.spark.csv",
                    inferSchema = "true")
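
For comparison, Spark 2.0 and later ship a built-in CSV data source, so the same kind of read can be sketched without the external package. The following is a minimal sketch, not the book's own listing: it assumes a local Spark 2.x installation with SparkR available, and both the application name and the file path are hypothetical placeholders.

```r
library(SparkR)

# Start (or reuse) a SparkR session; assumes Spark 2.0 or later.
# The application name is an arbitrary placeholder.
sparkR.session(appName = "csv-source-example")

# Hypothetical path; point this at your own CSV file.
dataPath <- "path/to/flights.csv"

# Same options as before, but with the built-in "csv" source
# instead of com.databricks.spark.csv.
flightDF <- read.df(dataPath,
                    source = "csv",
                    header = "true",
                    inferSchema = "true")

# Inspect the inferred column names and types.
printSchema(flightDF)
```

With inferSchema enabled, Spark makes an extra pass over the file to guess column types, which can be slow on large files.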

Let's see the structure by exploring the schema of the DataFrame:

printSchema(flightDF)

The output is as follows:

Figure 23: The same schema of the NYC flight dataset, obtained using the external data source API

Now let's see the first 10 rows of the DataFrame:

showDF(flightDF, numRows = 10)

The output is as follows:

Figure 24: The same sample data from the NYC flight dataset, obtained using the external data source API

As you can see, the schema and the sample rows are the same as before. Well done! Now it's time to explore something more, such as data manipulation using SparkR.
