Cloud-based deployments

There are three different abstraction levels in the cloud computing paradigm:

  • Infrastructure as a Service (aka IaaS)
  • Platform as a Service (aka PaaS)
  • Software as a Service (aka SaaS)

IaaS provides the computing infrastructure in the form of empty virtual machines, on which you install and operate your own software, which may itself be offered as SaaS. This also applies to running Apache Spark on OpenStack.

The advantage of OpenStack is that it can be used across multiple cloud providers, since it is an open platform built on open source software. You can even run OpenStack in a local data center, and transparently and dynamically move workloads between local, dedicated, and public cloud data centers.

PaaS, in contrast, relieves you of the burden of installing and operating an Apache Spark cluster, because the cluster is provided as a service. In other words, you can think of it as a layer that plays a role similar to that of your operating system.

Sometimes, you can even Dockerize your Spark application and deploy it to the cloud in a platform-independent manner. There is an ongoing discussion about whether Docker is IaaS or PaaS; in our opinion, it is just a form of lightweight, preinstalled virtual machine, so it sits closer to IaaS.

Finally, SaaS is the application layer, provided and managed entirely by the cloud provider. With SaaS, you do not see or have to worry about the first two layers (IaaS and PaaS).
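
The split of responsibilities across the three levels can be summarized in a small sketch. The layer names in `STACK` and the cut-off points in `FIRST_CUSTOMER_LAYER` are illustrative assumptions based on the descriptions above, not any particular provider's contract; real offerings may draw the lines differently.

```python
# Illustrative only: the layer names and cut-off points are assumptions
# that mirror the descriptions in the text, not a provider's actual terms.
STACK = ["hardware", "virtualization", "operating system",
         "Spark cluster", "application"]

# Index into STACK of the first layer that you (the customer) manage.
FIRST_CUSTOMER_LAYER = {"IaaS": 2, "PaaS": 4, "SaaS": 5}


def managed_by_you(model):
    """Return the layers the customer still installs and operates."""
    return STACK[FIRST_CUSTOMER_LAYER[model]:]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "->", managed_by_you(model) or "nothing")
```

Read this way, moving from IaaS to PaaS to SaaS simply shifts the cut-off point upward, so that fewer layers remain your responsibility.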

Google Cloud, Amazon AWS, Digital Ocean, and Microsoft Azure are good examples of cloud computing services that provide these three layers as services. We will show an example of how to deploy your Spark cluster in the cloud using Amazon AWS later in this chapter.
