Schema: structure of data

A schema describes the structure of your data. It can be either implicit or explicit.

Since DataFrames are internally based on RDDs, there are two main methods of converting an existing RDD into a Dataset (in the Scala API, a DataFrame is a Dataset of rows). The first method uses reflection to infer the schema of the RDD. The second method is a programmatic interface: you take an existing RDD and supply a schema, which converts the RDD into a Dataset with that schema.
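As a minimal sketch of the first method, the example below lets Spark infer a schema by reflection from a case class. The Person class, its fields, the sample values, and the local master setting are all illustrative assumptions, not taken from the text.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type: its field names and types become the inferred schema.
// It is defined at the top level so that Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object ReflectionSchemaExample {
  // Builds a small RDD of Person objects and converts it to a DataFrame.
  def peopleDf(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings the toDF() conversion into scope
    val peopleRdd = spark.sparkContext.parallelize(
      Seq(Person("Ann", 34), Person("Bob", 45)) // made-up sample data
    )
    peopleRdd.toDF() // schema is inferred from the case class fields
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reflection-schema-sketch")
      .master("local[*]") // assumption: run locally for illustration
      .getOrCreate()

    // Prints the column names and types derived from Person.
    peopleDf(spark).printSchema()

    spark.stop()
  }
}
```

No schema is written out anywhere in this sketch. The column names and types come entirely from the definition of the case class.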

To create a DataFrame from an RDD by inferring the schema through reflection, the Scala API for Spark supports case classes, which can be used to define the schema of the table. When case classes are not practical, the DataFrame is instead created programmatically from the RDD. For instance, writing a case class for a 1,000-column table is time-consuming.
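For a wide table, the programmatic route might look like the following sketch, which builds the schema from a list of column names instead of a hand-written case class. The column names, the comma-separated input format, and the choice to treat every column as a nullable string are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ProgrammaticSchemaExample {
  // Builds a schema from column names. Every column is a nullable string here (assumption).
  def schemaFor(columnNames: Seq[String]): StructType =
    StructType(columnNames.map(name => StructField(name, StringType, nullable = true)))

  // Splits comma-separated lines into Rows and applies the generated schema.
  def toDataFrame(spark: SparkSession, columnNames: Seq[String], lines: Seq[String]): DataFrame = {
    val rowRdd = spark.sparkContext
      .parallelize(lines)
      .map(line => Row.fromSeq(line.split(",", -1).toSeq))
    spark.createDataFrame(rowRdd, schemaFor(columnNames))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("programmatic-schema-sketch")
      .master("local[*]") // assumption: run locally for illustration
      .getOrCreate()

    // Hypothetical 1,000-column table: names and values are generated, not typed by hand.
    val columnNames = (1 to 1000).map(i => s"col$i")
    val line = (1 to 1000).map(i => s"v$i").mkString(",")

    val df = toDataFrame(spark, columnNames, Seq(line))
    println(df.columns.length) // number of columns in the resulting DataFrame

    spark.stop()
  }
}
```

Because the schema is an ordinary value, it can be generated from a header row or a metadata file. This is what makes the approach workable when a case class would need hundreds of fields.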
