Using Kubernetes for provisioning containerized Spark applications

So what's in it for Apache Spark here? Let's assume we have a set of powerful nodes in our local data center. What is the advantage of deploying with Kubernetes over installing Apache Spark directly on bare metal? Let's approach the question from the other direction and look at the disadvantages of using Kubernetes in this scenario. As it turns out, there is practically none, at least as far as performance is concerned.

A 2014 IBM Research paper (http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf) shows that performance within a Docker container is nearly identical to bare metal.

This means that the only real disadvantage is the effort you invest in installing and maintaining Kubernetes. In return, you gain the following:

  • Easy installation and updates of Apache Spark and other additional software packages (such as Apache Flink, Jupyter, or Zeppelin)
  • Easy switching between different versions
  • Parallel deployment of multiple clusters for different users or user groups
  • Fair resource assignment to users and user groups (see the sketch after this list)
  • Straightforward hybrid cloud integration, since the very same setup can be run on any cloud provider supporting Kubernetes as a service

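To make the last two points concrete, here is a minimal sketch of how they can be realized with standard Kubernetes objects: a dedicated namespace per user group isolates that group's Spark cluster, and a ResourceQuota in the namespace caps what the group's pods may consume. The namespace name team-a and the quota values are purely illustrative:

```yaml
# Hypothetical namespace for one user group; each group gets its own,
# so multiple Spark clusters can run side by side on the same nodes.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# ResourceQuota enforcing a fair share of cluster resources for this
# group. All CPU, memory, and pod-count limits here are example values.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"        # total CPU the group's pods may request
    requests.memory: 32Gi    # total memory the group's pods may request
    pods: "20"               # upper bound on concurrently running pods
```

Applying this manifest with kubectl apply -f is enough for the scheduler to reject any pod that would push team-a beyond its quota, so one group cannot starve the others.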
So how do we get started? The following section provides a step-by-step example of how to set up a single-node installation of Kubernetes on your machine and how to deploy an Apache Spark cluster, including Zeppelin, on it, so stay tuned!
