Loading datasets

Spark SQL can read data from external storage systems such as files, Hive tables, and JDBC databases through the DataFrameReader interface.

The general form of the API call is spark.read.<inputtype>, where <inputtype> is one of the following supported formats (illustrated in the sketch after this list):

  • Parquet
  • CSV
  • Hive Table
  • JDBC
  • ORC
  • Text
  • JSON
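As a rough sketch, the calls for each of these formats look like the following in the Scala shell; the file names, table name, and JDBC connection settings are hypothetical placeholders, not files that accompany this example:

scala> val parquetDF = spark.read.parquet("users.parquet")   // hypothetical path
scala> val csvDF     = spark.read.option("header", "true").csv("users.csv")
scala> val hiveDF    = spark.read.table("default.users")     // a table registered in the Hive metastore
scala> val orcDF     = spark.read.orc("users.orc")
scala> val textDF    = spark.read.text("notes.txt")
scala> val jsonDF    = spark.read.json("people.json")
scala> val jdbcDF    = spark.read.format("jdbc").
     |   option("url", "jdbc:postgresql://localhost:5432/testdb").  // hypothetical connection
     |   option("dbtable", "public.users").
     |   option("user", "test").
     |   option("password", "secret").
     |   load()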

Let's look at a couple of simple examples of reading CSV files into DataFrames:

scala> val statesPopulationDF = spark.read.option("header", "true").option("inferSchema", "true").option("sep", ",").csv("statesPopulation.csv")
statesPopulationDF: org.apache.spark.sql.DataFrame = [State: string, Year: int ... 1 more field]

scala> val statesTaxRatesDF = spark.read.option("header", "true").option("inferSchema", "true").option("sep", ",").csv("statesTaxRates.csv")
statesTaxRatesDF: org.apache.spark.sql.DataFrame = [State: string, TaxRate: double]
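Once loaded, a quick way to verify the result is to inspect the inferred schema and a few rows with the standard DataFrame methods (output omitted here):

scala> statesPopulationDF.printSchema()
scala> statesPopulationDF.show(5)
scala> statesTaxRatesDF.printSchema()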