Explicit schema

A schema is described using StructType, which is a collection of StructField objects.

StructType and StructField belong to the org.apache.spark.sql.types package.
Data types such as IntegerType and StringType also belong to the org.apache.spark.sql.types package.

Using these imports, we can define a custom explicit schema.

First, import the necessary classes:

scala> import org.apache.spark.sql.types.{StructType, IntegerType, StringType}
import org.apache.spark.sql.types.{StructType, IntegerType, StringType}

Define a schema with two columns (fields): an Integer followed by a String:

scala> val schema = new StructType().add("i", IntegerType).add("s", StringType)
schema: org.apache.spark.sql.types.StructType = StructType(StructField(i,IntegerType,true), StructField(s,StringType,true))

It's easy to print the newly created schema:

scala> schema.printTreeString
root
 |-- i: integer (nullable = true)
 |-- s: string (nullable = true)
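An explicit schema like this is typically applied when constructing a DataFrame. A minimal sketch (the local SparkSession setup and the sample rows are assumptions, not from the original text):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StructType, IntegerType, StringType}

// Assumed: a local session purely for demonstration
val spark = SparkSession.builder().master("local[*]").appName("schema-demo").getOrCreate()

val schema = new StructType().add("i", IntegerType).add("s", StringType)

// Apply the explicit schema to an RDD of Rows
val rdd = spark.sparkContext.parallelize(Seq(Row(1, "one"), Row(2, "two")))
val df = spark.createDataFrame(rdd, schema)

df.printSchema()
```

Because the schema is supplied explicitly, Spark skips inference and the resulting DataFrame has exactly the declared column names and types.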

You can also render the schema as JSON using the prettyJson method:

scala> schema.prettyJson
res85: String =
{
"type" : "struct",
"fields" : [ {
"name" : "i",
"type" : "integer",
"nullable" : true,
"metadata" : { }
}, {
"name" : "s",
"type" : "string",
"nullable" : true,
"metadata" : { }
} ]
}
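The JSON form is not just for display: a schema serialized this way can be parsed back with DataType.fromJson, which is useful for storing schemas outside the application. A brief sketch:

```scala
import org.apache.spark.sql.types.{DataType, StructType, IntegerType, StringType}

val schema = new StructType().add("i", IntegerType).add("s", StringType)

// Serialize to compact JSON (prettyJson produces the indented variant shown above)
val json = schema.json

// Parse the JSON back into a StructType
val restored = DataType.fromJson(json).asInstanceOf[StructType]
assert(restored == schema)
```

DataType.fromJson returns a DataType, so the cast back to StructType reflects the knowledge that the serialized value was a struct.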

All the data types of Spark SQL are located in the package org.apache.spark.sql.types. You can access them by doing:

import org.apache.spark.sql.types._
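With the wildcard import in scope, the same schema can equivalently be built from StructField objects directly rather than with the add method, which makes the nullable flag and per-field metadata explicit:

```scala
import org.apache.spark.sql.types._

// Equivalent to new StructType().add("i", IntegerType).add("s", StringType)
val schema = StructType(Seq(
  StructField("i", IntegerType, nullable = true),
  StructField("s", StringType, nullable = true)
))
```

Both constructions produce the same StructType; the add form is more concise, while the StructField form is convenient when the fields are generated programmatically.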