Chapter 3. Hadoop in the Cloud

Hadoop in the cloud can be implemented with very low initial investment and is well suited for proofs of concept and data systems with variable IT resource requirements. In this chapter, I will discuss the story of Hadoop in the cloud and how Hadoop can be implemented in the cloud for banks.

I will cover the full data life cycle of a risk simulation project using Hadoop in the cloud.

  • Data collection—ingesting the data into the cloud
  • Data transformation—iterating simulations with the given algorithms
  • Data analysis—analyzing the output results

I recommend you refer to your Hadoop cloud provider documentation if you need to dive deeper.

The big data cloud story

In the last few years, cloud computing has grown significantly within banks as they strive to improve the performance of their applications, increase agility, and, most importantly, reduce their IT costs. Because moving applications into the cloud reduces operational cost and IT complexity, it lets banks focus on their core business instead of spending resources on technology support.

The Hadoop-based big data platform is just like any other cloud computing platform, and a few financial organizations have implemented projects with Hadoop in the cloud.

The why

As far as banks are concerned, especially investment banks, business fluctuates a lot and is driven by the market. Fluctuating business means fluctuating trade volumes and therefore variable IT resource requirements. As shown in the following figure, traditional on-premises implementations have a fixed number of servers sized for peak IT capacity, but actual IT capacity needs vary:


As shown in the following figure, if a bank provisions more IT capacity than its maximum usage (a must for banks), capacity is wasted; but if it provisions capacity equal to the average of the fluctuating requirements, work piles up in processing queues, leading to customer dissatisfaction:


With cloud computing, financial organizations pay only for the IT capacity they use. This elastic capacity, and thus elastic pricing, is the number-one reason for using Hadoop in the cloud.
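The capacity trade-off described above can be made concrete with a small simulation. The sketch below is a minimal illustration, not a sizing tool: it assumes a hypothetical hourly demand series measured in nodes, treats unprocessed work as a queue that carries over to the next hour, and counts cost in node-hours rather than in any real price. The names fixed_plan, elastic_plan, and PlanResult are illustrative only, and the elastic cluster is idealized as resizing instantly every hour.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanResult:
    capacity: Optional[int]  # nodes provisioned each hour; None means elastic
    paid_node_hours: int     # node-hours owned or billed
    idle_node_hours: int     # provisioned node-hours that did no work
    peak_backlog: int        # largest queue of unprocessed node-hours
    final_backlog: int       # work still queued when the window ends


def fixed_plan(demand: List[int], capacity: int) -> PlanResult:
    """Simulate a fixed cluster of `capacity` nodes against hourly demand."""
    backlog = peak_backlog = idle = 0
    for need in demand:
        work = need + backlog            # this hour's work plus the carried-over queue
        done = min(work, capacity)       # a fixed cluster cannot exceed its size
        idle += capacity - done
        backlog = work - done            # whatever is left waits for the next hour
        peak_backlog = max(peak_backlog, backlog)
    return PlanResult(capacity, capacity * len(demand), idle, peak_backlog, backlog)


def elastic_plan(demand: List[int]) -> PlanResult:
    """An idealized elastic cluster that resizes to demand every hour."""
    return PlanResult(None, sum(demand), 0, 0, 0)


if __name__ == "__main__":
    demand = [4, 4, 6, 10, 20, 18, 8, 4]  # hypothetical hourly demand, in nodes
    plans = {
        "fixed at peak": fixed_plan(demand, max(demand)),
        "fixed at average": fixed_plan(demand, math.ceil(sum(demand) / len(demand))),
        "elastic": elastic_plan(demand),
    }
    for name, r in plans.items():
        print(f"{name}: paid={r.paid_node_hours} idle={r.idle_node_hours} "
              f"peak_backlog={r.peak_backlog} final_backlog={r.final_backlog}")
```

With this hypothetical demand, provisioning for the peak (20 nodes) pays for 160 node-hours, 86 of them idle. Provisioning for the rounded-up average (10 nodes) halves that to 80 node-hours, but the queue grows to 18 node-hours and 10 node-hours of work are still unprocessed at the end of the window. The elastic plan pays for exactly the 74 node-hours used. Real cloud and on-premises rates per node-hour differ, so these totals compare usage, not money.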

The second reason is proof of concept. Before adopting Hadoop technologies, every financial institution faced the same dilemma: "Is it really worth it?" or "Should I really spend on Hadoop hardware and software when the technology is still not completely mature?" In the cloud, you can create Hadoop clusters within minutes, run a small proof of concept, and validate the benefits. Then, either scale up your cloud deployment with more use cases or move on-premises if that is what you prefer.

The when

Have a look at the following questions. If you answer yes to any of these for your big data problem, Hadoop in the cloud could be the way forward:

  • Is your data operation very intensive but unpredictable?
  • Do you want to do a small proof of concept without buying the hardware and software up front?
  • Do you want your operational expense to be very low or managed by external vendors?

What's the catch?

If the cloud solves all big data problems, why isn't every bank implementing it?

  • The biggest concern is, and will remain for the foreseeable future, the security of data in the cloud, especially customers' private data. The moment senior managers think of security, they want to play it safe and drop the idea of implementing Hadoop in the cloud.
  • Performance is still not as good as that of an on-premises installation. Disk I/O is a bottleneck in virtual machine environments, and the impact is especially noticeable when mixed workloads, such as MapReduce and Spark jobs, run on the same cluster with several concurrent users.
  • Once the data is in the cloud, vendors manage the day-to-day administrative tasks, including operations. Implementing Hadoop in the cloud therefore tends to merge the development and operations roles, which runs slightly against the usual departmental structure of banks.

In the next section, I will pick up one of the most popular use cases: implementing Hadoop in the cloud for the risk division of a bank.
