Azure HDInsight

Azure HDInsight is a complete cloud-based distribution of Apache Hadoop, equivalent to the Hortonworks Data Platform (HDP) Hadoop distribution. Apache Hadoop is a framework for the distributed processing and analysis of large datasets across clusters of computers.

Azure HDInsight currently supports the following cluster types:

Apache Hadoop: Clusters based on Apache Hadoop use the Hadoop Distributed File System (HDFS), YARN resource management, and the MapReduce programming model. A Hadoop cluster is suited to the parallel processing and analysis of batch data.
Apache Spark: Apache Spark is a framework for parallel processing that supports in-memory processing to increase the performance of applications that analyze large amounts of data. Spark supports SQL queries, stream processing, and machine learning workloads.
Apache HBase: Apache HBase is a Hadoop-based NoSQL database that provides random access and strong consistency for large amounts of unstructured and semi-structured data, potentially in tables of billions of rows by millions of columns.
Machine Learning Server (formerly known as Microsoft R Server): Machine Learning Server hosts and manages parallel, distributed R processes. It gives data scientists, statisticians, and R programmers on-demand access to scalable, distributed analysis methods in HDInsight.
Apache Storm: Apache Storm is a distributed real-time computation system for the fast processing of large data streams. It is offered as a managed cluster in HDInsight.
Apache Interactive Hive (Interactive Query): This cluster type uses in-memory caching to make Hive queries faster and more interactive.
Apache Kafka: Apache Kafka is an open source platform for building streaming data pipelines and applications. It also provides message queue functionality that allows you to publish and subscribe to data streams.
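
To make the MapReduce programming model mentioned for Hadoop clusters more concrete, the following is a minimal single-process sketch of a word count in plain Python. It is not HDInsight or Hadoop code: the function names map_phase, shuffle_phase, reduce_phase, and word_count are hypothetical, chosen only for illustration, and a real cluster would distribute each phase across many nodes rather than run them in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence (illustrative, single process)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop processes big batches"]
    print(word_count(sample))
```

The map and reduce steps work on one line or one key at a time and share no state, which is the property that lets a Hadoop cluster run them in parallel over batch data.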
