From the big data hype to Kubernetes

About 10 years ago, the biggest buzz in the IT industry was the term big data. Every major enterprise was racing to harness the mystical powers of massive, yet supposedly manageable, silos of data. Equipped with big data, no problem would prove insurmountable, and all forecasts would be met.

But lately, these forecasts appear to have faded, and the worst-kept secret in the IT industry is that big data is dead, at least as we knew it. This doesn't mean that the volume or growth of data has slowed down; quite the opposite. What has changed is the underlying technology, and with it the architecture of the applications that use big data.

Take Hadoop, the icon of the big data hype, as an example. It was designed around a set of assumptions that changed dramatically in a short time. One of these assumptions was that, when processing a large batch of data, network latency was the main enemy and cloud-native storage simply wasn't an option. At that time, most of the IT industry's data was on-premises, so the focus was on avoiding moving big sets of information around. This meant that data had to be co-located with the compute resources in order to process it efficiently.

Today, this scenario has changed quite a bit: most applications still use large amounts of data, but data is now processed on the fly. That is to say, we now stream data instead of processing the whole dataset multiple times.
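The difference between the two styles can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular framework works: the function names `batch_mean` and `streaming_mean` and the sample `readings` are hypothetical. The batch version needs the whole dataset before it can answer, while the streaming version updates its result as each value arrives and keeps only a counter and the current mean.

```python
from typing import Iterable, Iterator, List


def batch_mean(dataset: List[float]) -> float:
    """Batch style: the full dataset must be available and is scanned in one go."""
    return sum(dataset) / len(dataset)


def streaming_mean(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated mean after every incoming value."""
    count = 0
    mean = 0.0
    for value in events:
        count += 1
        # Incremental update: only the previous mean and the count are kept,
        # so earlier values never need to be read again.
        mean += (value - mean) / count
        yield mean


# Hypothetical sample data, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]
running = list(streaming_mean(readings))
```

With this sample input, the last value produced by `streaming_mean` matches what `batch_mean` computes over the full list, but every intermediate result was available as soon as its value arrived.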

Besides this, network latency has become less of a barrier for cloud providers, and there are now multiple cloud providers to choose from. Companies also have the option to deploy their own private cloud on-premises, leading to new scenarios such as hybrid clouds.

Therefore, what has really changed is the focus: today, big data does not merely mean a large quantity of data, but also flexible storage options for that data.

This is where containers and, specifically, Kubernetes fit in. In a nutshell, you can think of a container as a packaged application that contains just the libraries that are needed to run it, and of Kubernetes as an orchestration system that makes sure all the containers have the appropriate resources while managing their life cycle.
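To make the idea of an orchestrator matching containers to resources more concrete, here is a deliberately simplified sketch. It is a toy first-fit placement and not the algorithm Kubernetes actually uses; the `Container`, `Node`, and `place` names, as well as the node and container values, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Container:
    name: str
    cpu: float   # requested CPU cores
    memory: int  # requested memory in MiB


@dataclass
class Node:
    name: str
    cpu: float   # total CPU cores on the node
    memory: int  # total memory in MiB
    containers: List[Container] = field(default_factory=list)

    def fits(self, c: Container) -> bool:
        """Check whether the node still has room for the container's requests."""
        used_cpu = sum(x.cpu for x in self.containers)
        used_mem = sum(x.memory for x in self.containers)
        return used_cpu + c.cpu <= self.cpu and used_mem + c.memory <= self.memory


def place(containers: List[Container], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Toy first-fit placement: assign each container to the first node with room."""
    placement: Dict[str, Optional[str]] = {}
    for c in containers:
        target = next((n for n in nodes if n.fits(c)), None)
        if target is not None:
            target.containers.append(c)
        # None means no node could satisfy the request.
        placement[c.name] = target.name if target is not None else None
    return placement


# Hypothetical cluster and workload, used only for illustration.
nodes = [Node("node-a", cpu=2.0, memory=4096), Node("node-b", cpu=1.0, memory=2048)]
workload = [
    Container("web", cpu=1.5, memory=1024),
    Container("db", cpu=1.0, memory=2048),
    Container("cache", cpu=0.5, memory=512),
    Container("batch", cpu=2.0, memory=1024),
]
placement = place(workload, nodes)
```

In this sketch, a container whose requests fit on no node is simply left unplaced. A real orchestrator does far more, such as restarting failed containers and rescheduling them, but the core idea of matching resource requests to available capacity is the same.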

Kubernetes runs images and manages containers through a container engine. Docker was long the usual choice, but Kubernetes can use other engines too (for example, containerd or CRI-O; rkt, an earlier alternative, has since been discontinued). Since we will be building our applications on top of Kubernetes, we will provide a short overview of its architecture in the next section.
