Architecture of Spark SQL

Spark SQL is a library on top of the Spark core execution engine, as shown in Figure 4.2. It exposes SQL interfaces over JDBC/ODBC for data warehousing applications and through a command-line console for executing queries interactively, so any Business Intelligence (BI) tool can connect to Spark SQL and perform analytics at memory speeds. It also exposes the Dataset API, supported in Java and Scala, and the DataFrame API, supported in Java, Scala, Python, and R. Spark SQL users can use the Data Sources API to read and write data from and to a variety of sources, creating a DataFrame or a Dataset. Figure 4.2 also shows the traditional way of creating and operating on RDDs directly from the programming languages against the Spark core engine.


Figure 4.2: Spark SQL architecture
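
To make this concrete, here is a minimal sketch of the path just described, assuming a local SparkSession and a hypothetical /tmp/people.json file with name and age fields: the Data Sources API reads the file into a DataFrame, which is then queried through the SQL interface.

import org.apache.spark.sql.SparkSession

object SparkSqlQuickstart {
  def main(args: Array[String]): Unit = {
    // Entry point for Spark SQL functionality.
    val spark = SparkSession.builder()
      .appName("SparkSqlQuickstart")
      .master("local[*]")
      .getOrCreate()

    // Read JSON through the Data Sources API to create a DataFrame;
    // the path and schema (name, age) are hypothetical.
    val people = spark.read.json("/tmp/people.json")

    // Expose the DataFrame to the SQL interface and query it.
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name, age FROM people WHERE age > 21").show()

    spark.stop()
  }
}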

Spark SQL also extends the Dataset API, DataFrame API, and Data Sources API to the other Spark libraries, such as SparkR, Spark Streaming, Structured Streaming, MLlib, and GraphX, as shown in Figure 4.3. Once a Dataset or DataFrame is created, it can be used in any of these libraries; the two are interoperable and can be converted to traditional RDDs.


Figure 4.3: Spark ecosystem with Data Sources API and DataFrame API
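
As a small sketch of that interoperability (the Person case class and its fields are illustrative), the following starts from a traditional RDD, lifts it into a DataFrame, views it as a typed Dataset, and converts it back to an RDD:

import org.apache.spark.sql.SparkSession

// Illustrative case class giving the Dataset a typed schema.
case class Person(name: String, age: Long)

object ApiInterop {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ApiInterop")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The traditional path: build an RDD directly on the core engine.
    val rdd = spark.sparkContext.parallelize(
      Seq(Person("Ann", 30), Person("Bob", 25)))

    val df = rdd.toDF()      // RDD -> untyped DataFrame
    val ds = df.as[Person]   // DataFrame -> typed Dataset
    val back = ds.rdd        // Dataset -> traditional RDD again

    println(back.map(_.name).collect().mkString(", "))
    spark.stop()
  }
}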

Spark SQL introduced an extensible optimizer called Catalyst to support a wide range of data sources and algorithms. Catalyst makes it easy to add new data sources, optimization rules, and data types for domains such as machine learning. It uses Scala's pattern-matching features to express rules, and it offers a general framework for transforming trees, which is used to perform analysis, planning, and runtime code generation.
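
The rule-writing style this enables can be sketched with a toy expression tree; the types below are illustrative stand-ins, not Spark's internal Catalyst classes. A constant-folding rule is expressed as a Scala pattern match and applied by recursively transforming the tree:

// Toy expression tree and rewrite rule in the spirit of Catalyst;
// these types are illustrative, not Spark's internal Catalyst classes.
sealed trait Expr
case class Literal(value: Int) extends Expr
case class Attribute(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object ConstantFolding {
  // Pattern matching expresses the rule: adding two literals
  // collapses into a single literal; everything else is rebuilt as-is.
  def fold(e: Expr): Expr = e match {
    case Add(l, r) =>
      (fold(l), fold(r)) match {
        case (Literal(a), Literal(b)) => Literal(a + b)
        case (fl, fr)                 => Add(fl, fr)
      }
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // (1 + 2) + x folds to Add(Literal(3), Attribute(x))
    println(fold(Add(Add(Literal(1), Literal(2)), Attribute("x"))))
  }
}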
