Microsoft HDInsight

Big Data is creating a tsunami in the current IT world. Every firm and product company is investing heavily in this space. Microsoft is no exception in the race for an open source Big Data technology platform.

As Microsoft shifted its strategy towards open source in recent years, it focused its Big Data efforts on a partnership with Hortonworks. The resulting product is named HDInsight, which has been generally available since October 2013.

Looking back at the history, Google published its MapReduce research paper in 2004, which made the design available for everyone to study and build on. Along similar lines, Doug Cutting was working on the open source Apache Nutch project. Inspired by Google's research paper, Doug released the Hadoop framework with the support of Yahoo. Since then, the Big Data framework has been quite popular in the industry.

As Microsoft was already ahead in the cloud space with Microsoft Azure, it was easy for the company to launch its Big Data suite, HDInsight, on its own cloud platform:

As depicted in the preceding diagram, Hadoop is hosted on cloud-based virtual machines on the Microsoft Azure platform, running either a Windows or a Linux distribution. There are three layers in this HDInsight design, which are as follows:

  • Storage (top layer of Azure storage elements)
  • Infrastructure (middle layer of Windows Azure VMs)
  • Process (HDInsight HaaS, Hadoop as a Service)

In computing, Process and Storage are the two fundamental building blocks of any design, and the underlying platform is termed Infrastructure. In this section, we are going to analyze these three items:

  • Storage: A traditional RDBMS persists structured content in Table objects and unstructured content in Blob objects. By design, the HDInsight ecosystem supports two core storage models, namely the Azure Storage system and the Hadoop Distributed File System (HDFS). As Hadoop is the industry-popular stack, HDFS content is accessible through the interoperable HDFS API. Though Azure Storage is a separate element, WASB (Windows Azure Storage Blob) is designed to provide storage interoperability between HDFS and Azure Blob storage.
  • Infrastructure: In terms of infrastructure, Microsoft provides Azure, a powerful and widely adopted cloud platform. The architecture and design of the Azure platform are capable of supporting the next generation of scalable Big Data platforms.
  • Process: In terms of processing, the HDInsight service is built entirely on the open source Hadoop software from the Apache Software Foundation. By doing so, the HDInsight ecosystem leverages standard, open source Hadoop concepts and technologies. In turn, this helps the end user to learn the system and deploy to it easily. On top of that, HDInsight supports Windows PowerShell scripting for easier deployment. Fundamentally, the ecosystem is designed to scale elastically with the business needs of the end customer on Microsoft's cloud-based Azure.
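
To make the WASB interoperability described in the Storage item more concrete, the following is a minimal Python sketch of how a blob can be addressed as a Hadoop-style path. It assumes the commonly documented URI layout wasb[s]://<container>@<account>.blob.core.windows.net/<path> on the public Azure cloud. The helper names build_wasb_uri and parse_wasb_uri, as well as the account and container names in the usage note, are hypothetical and exist only for illustration; they are not part of any HDInsight or Hadoop API.

```python
from urllib.parse import urlparse

# Assumption: public Azure cloud endpoint suffix; other Azure clouds differ.
BLOB_SUFFIX = "blob.core.windows.net"


def build_wasb_uri(account, container, path, secure=True):
    """Build a WASB URI (hypothetical helper, for illustration only).

    Layout: wasb[s]://<container>@<account>.blob.core.windows.net/<path>
    """
    scheme = "wasbs" if secure else "wasb"  # wasbs requests TLS
    clean_path = path.lstrip("/")           # avoid a double slash after the host
    return f"{scheme}://{container}@{account}.{BLOB_SUFFIX}/{clean_path}"


def parse_wasb_uri(uri):
    """Split a WASB URI back into its account, container, and path parts."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri!r}")
    if not parsed.username or not parsed.hostname:
        raise ValueError(f"missing container or account in: {uri!r}")
    return {
        "secure": parsed.scheme == "wasbs",
        "container": parsed.username,               # the part before '@'
        "account": parsed.hostname.split(".")[0],   # first label of the host
        "path": parsed.path,                        # keeps the leading '/'
    }
```

For example, build_wasb_uri("myaccount", "data", "logs/app.log") produces a path that a Hadoop job could use in the same place where it would otherwise use an hdfs:// path, which is the sense in which WASB bridges HDFS and Azure Blob storage.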