Practical examples

To make these components easier to understand, we'll work through some practical examples in Scala. Consider data about clients and accounts with the following schemas:

case class Client(
  age: Long,
  countryCode: String,
  familyName: String,
  id: String,
  name: String
)

As you might know, a case class is roughly the Scala equivalent of a JavaBean and is meant for implementing data transfer objects (DTOs) without further logic built into the class. Case classes are compared by value, whereas instances of ordinary classes are compared by reference by default. Finally, the copy method makes it easy to duplicate a DTO, optionally changing individual fields along the way.
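
To make the value-equality and copy behavior concrete, here is a minimal sketch in plain Scala (no Spark required). The Client class is repeated so that the snippet compiles on its own; PlainClient, the object name CaseClassDemo, and all field values are made up for illustration only.

```scala
// Same shape as the Client schema above, repeated so the sketch is self-contained.
case class Client(
  age: Long,
  countryCode: String,
  familyName: String,
  id: String,
  name: String
)

// An ordinary (non-case) class for contrast; PlainClient is a hypothetical name.
class PlainClient(val id: String)

object CaseClassDemo {
  def main(args: Array[String]): Unit = {
    // Illustrative values only, not taken from any real data set.
    val a = Client(42L, "CH", "Muster", "c-1", "Hans")
    val b = Client(42L, "CH", "Muster", "c-1", "Hans")

    println(a == b) // true: case classes compare field by field
    println(a eq b) // false: a and b are still two distinct objects

    // Ordinary classes fall back to reference equality unless equals is overridden.
    println(new PlainClient("c-1") == new PlainClient("c-1")) // false

    // copy duplicates the DTO and lets us override selected fields.
    val older = a.copy(age = 43L)
    println(older) // Client(43,CH,Muster,c-1,Hans)
  }
}
```

Note that copy returns a new instance and leaves the original untouched, which is why case classes work well as immutable row types.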

This case class specifies the schema for rows of the Client relation. Let's have a look at the schema for the Account relation:

case class Account(
  balance: Long,
  id: String,
  clientId: String
)

Now we can create Datasets from different files and file types in order to see how the optimizer reacts:

// Encoders needed by .as[...]; already in scope when running inside spark-shell
import spark.implicits._

// JSON-backed Datasets
val clientDs = spark.read.json("client.json").as[Client]
val clientDsBig = spark.read.json("client_big.json").as[Client]
val accountDs = spark.read.json("account.json").as[Account]
val accountDsBig = spark.read.json("account_big.json").as[Account]

// Parquet-backed Datasets
val clientDsParquet = spark.read.parquet("client.parquet").as[Client]
val clientDsBigParquet = spark.read.parquet("client_big.parquet").as[Client]
val accountDsParquet = spark.read.parquet("account.parquet").as[Account]
val accountDsBigParquet = spark.read.parquet("account_big.parquet").as[Account]

Then we register all of them as temporary views so that we can write ordinary SQL statements against them. First, we'll do this for the JSON-backed Datasets:

clientDs.createOrReplaceTempView("client")
clientDsBig.createOrReplaceTempView("clientbig")
accountDs.createOrReplaceTempView("account")
accountDsBig.createOrReplaceTempView("accountbig")

Then we do the same for the Parquet-backed Datasets:

clientDsParquet.createOrReplaceTempView("clientparquet")
clientDsBigParquet.createOrReplaceTempView("clientbigparquet")
accountDsParquet.createOrReplaceTempView("accountparquet")
accountDsBigParquet.createOrReplaceTempView("accountbigparquet")
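
With the views in place, a query can be issued through spark.sql and its plans printed with explain(true). The following is only a sketch under stated assumptions: it needs the Spark SQL library on the classpath, it starts its own local SparkSession, and it uses two tiny in-memory Datasets instead of the files above so that it runs without any input data. The object name ExplainSketch, the helper joinQuery, the sample rows, and the join itself are hypothetical and chosen purely for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Schemas repeated from above so the sketch compiles on its own.
case class Client(age: Long, countryCode: String, familyName: String, id: String, name: String)
case class Account(balance: Long, id: String, clientId: String)

object ExplainSketch {

  // Registers two small views and returns a join over them (hypothetical helper).
  def joinQuery(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Made-up sample rows standing in for client.json and account.json.
    val clientDs = Seq(Client(42L, "CH", "Muster", "c-1", "Hans")).toDS()
    val accountDs = Seq(Account(100L, "a-1", "c-1")).toDS()

    clientDs.createOrReplaceTempView("client")
    accountDs.createOrReplaceTempView("account")

    spark.sql(
      """SELECT c.familyName, a.balance
        |FROM client c
        |JOIN account a ON c.id = a.clientId""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; on a cluster, master is usually set externally.
    val spark = SparkSession.builder()
      .appName("explain-sketch")
      .master("local[*]")
      .getOrCreate()

    val joined = joinQuery(spark)

    // Prints the parsed, analyzed, and optimized logical plans plus the physical plan.
    joined.explain(true)
    joined.show()

    spark.stop()
  }
}
```

Replacing the view names in the query with clientbig, clientparquet, and so on is one way to compare the plans chosen for the different file sizes and formats.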