Using Kubernetes for provisioning containerized Spark applications

So what's in it for Apache Spark? Let's assume we have a set of powerful nodes in our local data center. What is the advantage of deploying Apache Spark on Kubernetes rather than installing it directly on bare metal? Let's turn the question around and look at the disadvantages of using Kubernetes in this scenario. Actually, there are hardly any.

A 2014 IBM Research report (http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf) found that performance inside a Docker container is nearly identical to performance on bare metal.

This means that the only real disadvantage is the effort of installing and maintaining Kubernetes. In return, you gain the following:

Easy installation and updates of Apache Spark and additional software packages (such as Apache Flink, Jupyter, or Zeppelin)
Easy switching between different versions
Parallel deployment of multiple clusters for different users or user groups
Fair resource assignment to users and user groups
Straightforward hybrid cloud integration, since the very same setup can be run on any cloud provider supporting Kubernetes as a service
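To make the "fair resource assignment" point a little more concrete, here is a minimal sketch of one way it could be done. It assumes (this is not from the original text) that each user group gets its own Kubernetes namespace with a ResourceQuota, and that "fair" simply means an equal split of the cluster's CPU and memory. The helper name `fair_share_quotas` and the `spark-<group>` namespace naming scheme are hypothetical; the script only builds the manifests as plain dictionaries and does not contact a cluster.

```python
import json


def fair_share_quotas(groups, total_cpu, total_memory_gi):
    """Hypothetical helper: split cluster capacity evenly across user groups.

    Returns a list of Kubernetes manifests (plain dicts): one Namespace and
    one ResourceQuota per group. Integer division is used, so any remainder
    that does not divide evenly is simply left unassigned.
    """
    if not groups:
        raise ValueError("need at least one user group")
    cpu_share = total_cpu // len(groups)
    mem_share = total_memory_gi // len(groups)

    manifests = []
    for group in groups:
        namespace = f"spark-{group}"  # assumed naming scheme
        manifests.append({
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": namespace},
        })
        manifests.append({
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "fair-share", "namespace": namespace},
            "spec": {
                "hard": {
                    # Caps on the sum of requests/limits of all pods in the namespace
                    "requests.cpu": str(cpu_share),
                    "requests.memory": f"{mem_share}Gi",
                    "limits.cpu": str(cpu_share),
                    "limits.memory": f"{mem_share}Gi",
                }
            },
        })
    return manifests


if __name__ == "__main__":
    items = fair_share_quotas(["analytics", "research"], total_cpu=32, total_memory_gi=128)
    # Wrap everything in a v1 List so the output can be saved to one file
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Saved to a file, the printed list could be applied with `kubectl apply -f`. An equal split is only the simplest policy; weighted shares per group would be a straightforward variation.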

So how do we get started? The following section provides a step-by-step example of how to set up a single-node Kubernetes installation on your machine and deploy an Apache Spark cluster, including Zeppelin, on it, so stay tuned!
