Implicit schema

Let us look at an example of loading a CSV (comma-separated values) file into a DataFrame. When a text file contains a header line, the read API can take the column names from it, and with the inferSchema option enabled it also infers each column's type from the data. We also have the option to specify the separator used to split the lines of the text file.

We read the CSV file, inferring the schema with the help of the header line and using a comma (,) as the separator. We also show the use of the schema and printSchema commands to verify the schema of the input file.

scala> val statesDF = spark.read.option("header", "true")
         .option("inferSchema", "true")
         .option("sep", ",")
         .csv("statesPopulation.csv")

statesDF: org.apache.spark.sql.DataFrame = [State: string, Year: int ... 1 more field]

scala> statesDF.schema
res92: org.apache.spark.sql.types.StructType = StructType(
StructField(State,StringType,true),
StructField(Year,IntegerType,true),
StructField(Population,IntegerType,true))

scala> statesDF.printSchema
root
|-- State: string (nullable = true)
|-- Year: integer (nullable = true)
|-- Population: integer (nullable = true)
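
To make the roles of the header and inferSchema options easier to see, here is a minimal standalone sketch. It is only an illustration under stated assumptions: the object name InferSchemaSketch, the temporary file, and the two made-up rows are hypothetical and are not the contents of statesPopulation.csv. It also uses a pipe (|) as the separator to show the sep option with a non-default value.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical standalone program; in spark-shell the `spark` session already exists.
object InferSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("InferSchemaSketch")
      .master("local[*]")
      .getOrCreate()

    // Made-up sample data written to a temporary file (assumption, for illustration only).
    val path = Files.createTempFile("states", ".csv")
    val sample = "State|Year|Population\nStateA|2010|1000\nStateB|2011|2000\n"
    Files.write(path, sample.getBytes("UTF-8"))

    // header only: column names come from the first line,
    // but every column is read as a string.
    val namesOnly = spark.read
      .option("header", "true")
      .option("sep", "|")
      .csv(path.toString)
    namesOnly.printSchema()

    // header + inferSchema: Spark additionally scans the data
    // to choose a type for each column.
    val typed = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("sep", "|")
      .csv(path.toString)
    typed.printSchema()

    spark.stop()
  }
}
```

With the header option alone, all three columns should come back as strings. Adding inferSchema costs an extra pass over the data, so it is a trade-off between convenience and load time on large files.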