HDInsight ecosystem

In my experience, a Big Data solution requires several products and technologies, because no single product on the market delivers an end-to-end solution. There is no silver bullet that resolves all Big Data challenges. Hadoop and HDInsight are therefore critical, but not sufficient, technologies in a modern Big Data solution. The HDInsight platform has three essential elements, described as follows:

  • Data management: This is the initial layer of the HDInsight ecosystem. It extracts and loads the source data feed with built-in tools such as Sqoop. If the business use case has a real-time feed, Microsoft StreamInsight is the live data-streaming engine that ingests the source feed into the main application.
  • Data enrichment: Its key objective is to turn raw source data into clean, understandable data. Microsoft SQL Server includes a Data Quality Services (DQS) component, which cleanses data from multiple sources so that it is ready for analysis.
  • Data analytics: Technology needs to enable the business. To achieve this, a Big Data solution must deliver actionable insights through a rich set of analytical tools, including Business Intelligence (BI) and advanced analytics such as data mining, machine learning, and graph mining.
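
The three elements above can be pictured as stages of a small pipeline. The following Python sketch is only an illustration under stated assumptions: it works on a handful of in-memory records rather than on Sqoop, StreamInsight, or DQS, and the function names (ingest, enrich, analyze) and the sample feed are hypothetical, chosen for this example.

```python
from collections import defaultdict

# Hypothetical raw feed (date, city, amount); a real system would receive
# this through an import tool or a streaming engine, not a Python list.
RAW_FEED = [
    "2014-01-01, Seattle , 120",
    "2014-01-01,seattle,80",
    "2014-01-02,London,not-a-number",
    "2014-01-02, london,50",
]


def ingest(lines):
    """Data management: split each raw line into its fields."""
    return [line.split(",") for line in lines]


def enrich(rows):
    """Data enrichment: trim whitespace, normalize city names, drop bad amounts."""
    clean = []
    for date, city, amount in rows:
        try:
            value = int(amount.strip())
        except ValueError:
            continue  # skip records whose amount is not an integer
        clean.append((date.strip(), city.strip().title(), value))
    return clean


def analyze(rows):
    """Data analytics: compute the total amount per city."""
    totals = defaultdict(int)
    for _date, city, value in rows:
        totals[city] += value
    return dict(totals)


totals = analyze(enrich(ingest(RAW_FEED)))
print(totals)
```

Each function stands in for one layer, so the enrichment step can be changed (for example, stricter validation) without touching ingestion or analysis. That separation is the point of treating the three elements as distinct layers.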

The following design diagram proposes an ideal end-to-end solution for an enterprise Big Data system:

In short, HDInsight is a powerful Hadoop distribution that opens up new opportunities for developing Hadoop applications on Azure, the Microsoft cloud platform. With HDInsight, a user can deploy a Hadoop platform in less than 20 minutes. It also supports common development platforms such as Java and .NET, so Hadoop applications can be built quickly with familiar tools.
