When to use RDDs, Datasets, and DataFrames?

The following table summarizes which API to use in common scenarios:

| Scenario | What to use? |
|---|---|
| Programming in Python | RDDs or DataFrames |
| Programming in R | DataFrames |
| Programming in Java or Scala | RDDs, Datasets, or DataFrames |
| Unstructured data, such as images and videos | RDDs |
| Low-level transformations and actions, with programmatic control over the data flow | RDDs |
| High-level, domain-specific APIs | Datasets or DataFrames |
| Processing data with functional programming constructs | RDDs |
| High-level expressions, including SQL queries | Datasets or DataFrames |
| No schema needs to be imposed, and the optimizations available for structured data are not needed | RDDs |
| Compile-time type safety and rich optimizations are both needed | Datasets |
| Rich optimizations are needed, but compile-time type safety is not | DataFrames |
| A unified API across Spark libraries is needed | Datasets or DataFrames |
