Dataset

The Dataset API was introduced in Spark 1.6. It combines the strengths of RDDs and DataFrames: the compile-time type safety and object-oriented programming style of RDDs, together with the optimizations of DataFrames. A Dataset is therefore an immutable, strongly typed collection of objects that uses a schema to describe its data. It relies on Tungsten's efficient off-heap storage mechanism, and its queries are turned into optimized plans by Spark's Catalyst optimizer.

Datasets also introduced the concept of encoders. Encoders translate between JVM objects and Spark's internal binary format, which stores the tabular, schema-described representation of the data. Spark ships with various built-in encoders, along with an encoder API for JavaBeans. Because encoders allow operations to run directly on serialized data, individual attributes can be accessed without deserializing an entire object, which reduces serialization effort and load.
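To make the idea of reading one attribute from serialized data more concrete, here is a minimal sketch in plain Java. It does not use Spark and does not reproduce Spark's actual binary format. The class `BinaryRowSketch`, its method names, and the byte layout are all hypothetical, chosen only to illustrate how a known layout lets code read a single field from a byte array without rebuilding the whole object.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: a hand-rolled binary row, NOT Spark's internal format.
final class BinaryRowSketch {
    // Assumed layout: [age: 4-byte int][name length: 4-byte int][name: UTF-8 bytes]
    static final int AGE_OFFSET = 0;
    static final int NAME_LENGTH_OFFSET = 4;
    static final int NAME_OFFSET = 8;

    // Plays the "encoder" role: turns object fields into the binary layout.
    static byte[] encode(String name, int age) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(NAME_OFFSET + nameBytes.length);
        buf.putInt(age);
        buf.putInt(nameBytes.length);
        buf.put(nameBytes);
        return buf.array();
    }

    // Reads only the age field; the name bytes are never touched or decoded.
    static int readAge(byte[] row) {
        return ByteBuffer.wrap(row).getInt(AGE_OFFSET);
    }

    // Decodes the name only when it is actually requested.
    static String readName(byte[] row) {
        int length = ByteBuffer.wrap(row).getInt(NAME_LENGTH_OFFSET);
        return new String(row, NAME_OFFSET, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row = encode("Ann", 34);
        System.out.println(readAge(row));
        System.out.println(readName(row));
    }
}
```

A filter on age over many such rows would only ever call `readAge`, so no `String` is allocated for rows that are discarded. That is the kind of saving the paragraph above attributes to encoders.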
