So what's in it for Apache Spark here? Let's assume we have a set of powerful nodes in our local data center. What is the advantage of using Kubernetes for deployment over just installing Apache Spark on bare metal? Let's turn the question around and look at the disadvantages of using Kubernetes in this scenario. Actually, there is no technical disadvantage at all.
The only cost is the effort you invest in installing and maintaining Kubernetes itself. What you gain in return is the following:
- Easy installation and updates of Apache Spark and other additional software packages (such as Apache Flink, Jupyter, or Zeppelin)
- Easy switching between different versions
- Parallel deployment of multiple clusters for different users or user groups
- Fair resource assignment to users and user groups
- Straightforward hybrid cloud integration, since the very same setup can be run on any cloud provider supporting Kubernetes as a service
So how do we get started? The following section provides a step-by-step example of how to set up a single-node installation of Kubernetes on your machine and how to deploy an Apache Spark cluster, including Zeppelin, on it, so stay tuned!