In this chapter, we explore processing, analyzing, and visualizing data that lands in the Azure cloud in greater depth than the introduction provided in Chapter 2. Our goal is to help you understand how each of the platforms and tools described is best utilized as you consider its inclusion in your own architecture. You should gain insight into where and how to deploy them.
As data coming from IoT devices is most often semi-structured, we focus the data management discussion in this chapter on Azure HDInsight and Cosmos DB. Data warehouses are also often part of the architecture as they enable business intelligence and analytics solutions where the data lines up neatly into rows and columns. We’ll address how they and associated tools can fit into this architecture in Chapter 7 when we consider integration with legacy data solutions.
Among the topics we cover in this chapter are the following:
Azure Stream Analytics
Time Series Insights
Azure Databricks
Semi-structured Data Management (Azure HDInsight and Cosmos DB)
Azure Machine Learning
Cognitive Services
Data Visualization and Power BI
Azure Bot Service and Bot Framework
Azure Stream Analytics
Azure Stream Analytics is an in-memory streaming analytics and event-processing engine designed to run transformation queries against input coming from IoT Hubs, Event Hubs, and Azure Blob Storage. It can be deployed in Azure or at the edge in containers deployed to devices.
Transformation queries are based on SQL and are used for filtering, sorting, aggregating, and joining streaming data, or for applying geospatial functions. You can also define calls to the Azure Machine Learning service and/or create user-defined JavaScript or C# functions to run in your jobs. Stream Analytics jobs can be created using the Azure Portal, Azure PowerShell, or Visual Studio.
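A typical transformation query computes a windowed aggregate, such as the average reading per device over a tumbling window. The following plain-Python sketch (not the Stream Analytics engine itself; the event shape and field names are illustrative) mimics what a query grouping by device ID and a 60-second tumbling window computes:

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds=60):
    """Average each device's temperature per tumbling window,
    mimicking GROUP BY deviceId, TumblingWindow(second, 60)."""
    buckets = defaultdict(list)
    for e in events:
        # Each event falls into exactly one non-overlapping window
        window_start = (e["ts"] // window_seconds) * window_seconds
        buckets[(e["deviceId"], window_start)].append(e["temp"])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

events = [
    {"deviceId": "d1", "ts": 0,  "temp": 20.0},
    {"deviceId": "d1", "ts": 30, "temp": 22.0},
    {"deviceId": "d1", "ts": 65, "temp": 25.0},  # lands in the next window
    {"deviceId": "d2", "ts": 10, "temp": 18.0},
]
print(tumbling_window_avg(events))
```

In the real service, the engine handles event-time ordering, partitioning, and late arrivals for you; the sketch only conveys the grouping semantics.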
Stream Analytics can process millions of events every second in Azure. Through partitioning, complex queries can be parallelized and executed on multiple nodes. The Stream Analytics SLA guarantees 99.9 percent availability, evaluated over one-minute intervals. Built-in checkpoints provide recoverability if delivery of an event fails.
Output can be sent to a monitored queue (such as to an Azure Service Bus, Azure Functions, or Azure Event Hubs) to trigger alerts or custom workflows. Data can be stored in downstream Azure data management solutions such as Azure Data Lake Storage, Cosmos DB, SQL Database, or SQL Data Warehouse and is often visualized in Power BI.
When you create a new job using the Azure Portal, you begin by defining the job name, choosing a subscription and resource group, choosing a location, and indicating the hosting environment and (for cloud deployments) the number of streaming units that provide a pool of computation resources.
Streaming inputs can be defined coming from IoT Hubs, Event Hubs, or Blob Storage. Reference inputs can be defined coming from Blob Storage or a SQL Database. Outputs can be designated to Event Hubs, SQL Database, Blob Storage, Table storage, Service Bus topics, Service Bus queues, Cosmos DB, Power BI, Azure Data Lake Storage, or Azure Functions.
Time Series Insights
IoT devices commonly send telemetry messages to the cloud in a time series (i.e., the data is timestamped). The data initially lands in Azure in the Azure IoT Hub or Azure Event Hub. Time Series Insights connects to Azure IoT Hubs and Azure Event Hubs and parses JSON from these incoming messages. Metadata is joined with telemetry, and the data is indexed in a columnar store. The data is stored in memory and SSDs for up to 400 days. It can be queried using the Time Series Insights explorer or using APIs in custom applications.
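Conceptually, Time Series Insights enriches each timestamped telemetry record with the metadata for its device so that queries can slice by both. A minimal sketch in plain Python (the field names and metadata shape are illustrative, not the TSI API):

```python
# Per-device metadata, as might be joined on ingress
metadata = {
    "d1": {"building": "HQ", "floor": 2},
    "d2": {"building": "Plant-A", "floor": 1},
}

# Timestamped JSON telemetry as parsed from the hub
telemetry = [
    {"deviceId": "d1", "timestamp": "2019-05-01T00:00:00Z", "humidity": 41},
    {"deviceId": "d2", "timestamp": "2019-05-01T00:00:05Z", "humidity": 55},
]

def enrich(events, meta):
    """Join each telemetry event with its device's metadata."""
    return [{**e, **meta.get(e["deviceId"], {})} for e in events]

for row in enrich(telemetry, metadata):
    print(row)
```

TSI then indexes the enriched records in a columnar store; the sketch shows only the join step.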
You begin deployment by defining a Time Series Insights environment to be used. The Azure Portal prompts you for an environment name, subscription, location, and pricing tier (where tiers selected define ingress rates in millions of events per day and storage capacity in millions of events). Next, you define the event source by providing a name and source type (IoT Hub or Event Hub). You then select a hub (usually an existing hub) and apply an IoT Hub access policy name. For IoT Hubs, you also set a consumer group parameter and can create an event source timestamp property name. You can then create the Time Series Insights environment.
Time series data can be monitored to determine the health of your devices. You can apply perspective views and discern patterns when performing root cause analysis. Azure Stream Analytics might also be inserted into the data flow to help you find anomalies and send alerts.
Azure Databricks
Azure Databricks enables a fully managed Apache Spark cluster in the cloud. You can program in Python, R, Scala, SQL, and Java and utilize the Spark Core API. As the entire Spark ecosystem is provided, you can use Spark SQL to work with tabular data stored in DataFrames, process and analyze streaming data in real time (with integration to HDFS, Flume, and Kafka), utilize GraphX, and access the MLlib machine learning library that includes classification, regression, clustering, collaborative filtering, and dimensionality reduction algorithms.
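Spark programs express computation as chained transformations over distributed collections. The same filter/map/reduce shape can be sketched with plain Python built-ins (a local stand-in for illustration, not actual PySpark; in Databricks you would apply the equivalent transformations to an RDD or DataFrame across the cluster):

```python
from functools import reduce

# (deviceId, temperature) readings; values above 50.0 are out of range
readings = [("d1", 21.5), ("d2", 99.0), ("d1", 22.5), ("d2", 101.0)]

# filter -> map -> reduce pipeline, the shape of a Spark Core job
hot = filter(lambda r: r[1] > 50.0, readings)   # keep out-of-range readings
temps = map(lambda r: r[1], hot)                # project the temperature
total = reduce(lambda a, b: a + b, temps)       # aggregate across partitions
print(total)  # 200.0
```

The difference in Spark is that each stage runs in parallel across worker nodes, with the runtime handling partitioning and shuffles.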
The Databricks Runtime is built upon this Spark base and can be deployed as serverless. It can also be utilized with datastores that support Spark such as Azure Data Lake Storage, Blob Storage, Cosmos DB, and Azure SQL Data Warehouse.
Through the Azure Portal, you begin by creating an Azure Databricks workspace (providing a workspace name, subscription, resource group, location, and pricing tier). You are then ready to create a Databricks cluster.
Databricks cluster creation begins with you providing a cluster name and defining the cluster mode (standard or high concurrency). You select the Databricks runtime version that you wish to deploy as well as the Python version that will be used. You next select whether you want autoscaling turned on and how many minutes of inactivity should elapse before the cluster is terminated. Next, you select the minimum and maximum number of worker nodes and the type of hardware used. You also select the type of hardware used for the driver. Advanced options can be applied including Spark configuration options, tags, logging, and init scripts.
Within notebooks, you can provide code in R, Python, Scala, or SQL and provide supporting commentary and documentation. You can visualize data using tools such as Matplotlib, ggplot, or d3. Power BI provides additional data visualization capabilities as described later in this chapter.
Semi-structured Data Management
In addition to processing and analyzing data at the edge or within the data stream, machine learning models are often developed through analysis of historical data over lengthy time periods. Such data needs to land in a data management system designed for storage and analysis at that scale.
NoSQL databases are ideal for semi-structured data. At the beginning of this century, Hadoop established itself as a popular open-source historical data store. The Hadoop version available in a PaaS offering from Microsoft is Azure HDInsight. More recently, NoSQL databases that are globally distributed have proven their ability to scale to enormous sizes. Microsoft’s PaaS offering here is Cosmos DB.
In this section of the chapter, we’ll describe Azure HDInsight and Cosmos DB. Either can be created through the Azure Portal, Azure CLI, and PowerShell. We’ll describe the creation of these data management systems using the Azure Portal.
Azure HDInsight
Azure HDInsight is Microsoft’s cloud-based offering that consists of Apache Hadoop components in the Hortonworks Data Platform (HDP). HDInsight clusters enable deployment of Hadoop, Spark for in-memory processing, Interactive Query (Hive with low-latency analytical processing, or LLAP), Kafka and Storm for processing streaming data, HBase (a NoSQL database), and/or ML Services.
Clusters are monitored using Apache Ambari and Azure Monitor. Cluster health and availability, cluster resource utilization, performance across the entire cluster, and YARN job queues are monitored with Ambari. Resource utilization at the virtual machine level is monitored using Azure Monitor. Information about the workloads being run is present in the YARN resource manager and in Azure Monitor logs.
Languages native to Hadoop include Pig Latin, HiveQL, and Spark SQL. Programming languages supported include Java, Python, .NET, and Go. Other languages, such as Scala, can be deployed in Java Virtual Machines. Typical development environments that are used include Visual Studio, Visual Studio Code, Eclipse, and IntelliJ for Scala.
Earlier releases of the distribution were deployed to either Azure Data Lake Storage (ADLS) Gen1, which features a hierarchical file system, or to Blob Storage. ADLS Gen2 combines the hierarchical file system with Blob Storage capabilities and is now commonly selected for deployment of HDInsight clusters.
An Azure Blob File System (ABFS) driver is provided with HDInsight, as well as with Databricks, providing access to this storage. If you are going to use Azure Data Lake Storage in the deployment, ADLS must be created first.
Note
Using the Azure Portal to create ADLS, you first select a subscription and resource group for the storage account, give it a name, and set the location. You can also specify performance, account kind, replication, and access tier. Next, under the Advanced options, you can set security and virtual network fields (if not satisfied with the defaults provided). In the Data Lake Storage Gen2 section, you set the hierarchical namespace to enabled.
Deploying HDInsight is a three-step process using the Azure Portal. You begin by defining basic properties including a name for the Hadoop cluster, subscription to be used, cluster login name and password, secure shell (SSH) username, password for SSH, resource group for the cluster and dependent storage account, and location. You also select the cluster type and select the version of HDInsight that you want to deploy.
Next, you select the storage type (either Azure Blob Storage or Azure Data Lake Storage) and the storage account (from your subscriptions or from another subscription by providing an access key). You can choose to preserve metadata outside of the cluster by linking a SQL database for Hive and/or Oozie.
In the third step, you receive a summary of your selections and can edit those selections. When satisfied with the choices made, you next create the cluster. Clusters can take up to 20 minutes to be created.
A common means of moving data into and out of HDInsight when connected to the IoT Hub is to use Apache Kafka. You would begin by installing the IoT Hub Connector on an edge node in the HDInsight cluster. You would then get the IoT Hub connection information, configure the connector to serve as a sink and/or source for data movement, and start the connector.
Cosmos DB
Cosmos DB is a globally distributed multi-model database. The database can manage key-value, columnar, document, and graph data. Indexing of all data is automatic, and no schema or secondary indexes are required. Data can be made accessible using SQL, the MongoDB API, the Cassandra API, the Azure Table Storage API, or the Gremlin API.
Storage and throughput are elastically scaled across regions, making it possible to handle hundreds of millions of requests per second. Since the data is globally distributed, SLAs guarantee that read and write requests complete within 10 milliseconds at the 99th percentile in the region closest to the user. SLAs of 99.999 percent for high availability can also be attained.
Cosmos DB lets you choose among five consistency levels, trading consistency against availability and latency:
Strong Consistency. Only when an operation is complete is it visible to all.
Bounded Staleness Consistency. Read operations will lag writes by a bounded number of versions (consistent prefixes) or time interval; this level preserves 99.99 percent availability.
Session Consistency. Consistent prefixes are applied with predictable consistency for a session, featuring high read throughput and low latency.
Consistent Prefix Consistency. Reads will never see out-of-order writes.
Eventual Consistency. Provides the lowest cost for reads; however, reads can see out-of-order data.
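The trade-off between the strongest and weakest of these levels can be illustrated with a toy two-replica store (purely didactic; Cosmos DB's actual replication protocol is far more sophisticated):

```python
class TwoReplicaStore:
    """Toy model: a write lands on replica 0 immediately and reaches
    replica 1 only after replicate() runs (the replication 'lag')."""
    def __init__(self):
        self.replicas = [{}, {}]
        self.pending = []

    def write(self, key, value):
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def replicate(self):
        for key, value in self.pending:
            self.replicas[1][key] = value
        self.pending.clear()

    def read_eventual(self, key):
        # May serve a stale replica: cheap, but can miss recent writes
        return self.replicas[1].get(key)

    def read_strong(self, key):
        # Visible only once all replicas agree: waits out the lag
        self.replicate()
        return self.replicas[1].get(key)

store = TwoReplicaStore()
store.write("temp", 21.0)
print(store.read_eventual("temp"))  # None: the write has not replicated yet
print(store.read_strong("temp"))    # 21.0: strong reads wait for replication
```

Eventual reads are the cheapest precisely because they never wait; strong reads pay the coordination cost to guarantee visibility.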
In IoT solutions, data typically arrives in Cosmos DB in one of several ways:
Loading data from the Databricks in-memory engine (where data initially landed in Azure in the IoT Hub and was then loaded into Databricks)
Creating stored procedures and Logic Apps, triggered through an Event Grid deployed on the IoT Hub, that write data into Cosmos DB
Deploying Azure Functions in IoT Hub message routing that write data to Cosmos DB
Azure Machine Learning
Microsoft’s machine learning offerings in Azure include the following:
Azure Machine Learning Studio
Azure Machine Learning service (including development environments)
Azure Machine Learning Studio
Azure Machine Learning Studio is an online development environment providing a drag-and-drop interface that is used in building, testing, and deploying predictive analytics solutions. At the time this book was published, experiments were limited to training sets of no more than 10 GB in size. However, a visual interface based on ML Studio integrated with the Azure Machine Learning service was in preview, enabling preparation, training, and deployment with the much larger datasets typically used by data scientists.
Drag-and-drop modules and functions are provided for building experiments that include saved datasets, trained models, transforms, data format conversions, data transformation, feature selection, machine learning, OpenCV library modules, Python language modules, R language modules, statistical functions, text analytics, time series anomaly detection, and web services. The machine learning category includes functions used in evaluation, initializing the model using anomaly detection, classification, clustering, or regression algorithms, scoring, and training. Statistical functions include math operations, linear correlation, probability distribution functions, t-test, and descriptive statistics reporting.
In the figure, we see a typical experiment data flow that begins with data input containing known outcomes, then preparing the data, splitting it for model training purposes, testing various mathematical models against the data, scoring them, and evaluating them for accuracy. Once we’re satisfied with a specific model, we convert the training experiment into a predictive experiment and can deploy it as a web service. Sample code is also provided in C#, Python, and R.
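The split/train/score/evaluate flow that ML Studio wires up graphically can be sketched in a few lines of plain Python; here a trivial one-feature least-squares model stands in for the algorithm modules, and the synthetic dataset is our own:

```python
import random

def fit_line(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Data input with known outcomes: y = 2x + 1
data = [(x, 2 * x + 1) for x in range(20)]
random.seed(0)
random.shuffle(data)

# Split the data for model training and testing
train, test = data[:15], data[15:]

# Train, then score the held-out test set and evaluate accuracy (MSE)
a, b = fit_line(train)
mse = sum((a * x + b - y) ** 2 for x, y in test) / len(test)
print(round(a, 3), round(b, 3), round(mse, 6))
```

In ML Studio, each of these steps is a module you drag onto the canvas; converting to a predictive experiment replaces the training path with the fitted model behind a web service endpoint.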
Azure Machine Learning Service
The Azure Machine Learning service is Microsoft’s PaaS offering used to train, deploy, and manage machine learning models at scales that data scientists typically work with. It is an open framework and can be used with open-source libraries that include MXNet, PyTorch, scikit-learn, and TensorFlow.
You begin by first generating a Machine Learning service workspace, typically through the Azure Portal. You provide a workspace name, subscription, resource group, and Azure region location for the workspace to be run.
You’ll have access to “Getting Started in Azure Notebooks,” a Forum, samples in GitHub, and the documentation when you enter the workspace. You will also have access to other features under public preview.
Azure Notebooks is a free cloud service, with packages preinstalled, that supports up to 4 GB of memory and 1 GB of data. To remove these limits, you can attach a Notebooks project to a VM running the Jupyter server or to the Azure Data Science Virtual Machine.
The Azure Data Science VM includes popular data science and related tools preinstalled and pre-configured and comes in Linux Ubuntu and Windows editions. Some of the tools that you will find here include Microsoft R/Open, Microsoft ML Server (with support for R and Python), Anaconda Python, various data management servers, Spark-based big data platforms used for development and testing, a Jupyter Notebook Server, IDE support for R Studio and Visual Studio, data movement and management tools, machine learning tools, and deep learning tools.
Microsoft developers will be happy to find that Visual Studio can also be used for building, testing, and deploying Azure Machine Learning service solutions. The code editor highlights syntax, provides intelligent code completion (known as Intellisense), and provides auto text formatting. You can debug your code locally by installing appropriate Python versions and libraries and the deep learning frameworks that you are using in your project.
Cognitive Services
Azure Cognitive Services provides APIs, SDKs, and services enabling software developers to add cognitive features into applications. As noted in Chapter 2, these services focus in the areas of vision, speech, language, search, and decision. In the building of IoT applications, vision and decision are most often considered for deployment.
The Computer Vision Service provides advanced algorithms for processing images and returning information about them. The Custom Vision Service enables building of custom image classifiers. Both services are typically used with smart cameras that capture images at the edge and perform local analysis or transmit images to the cloud where the algorithms process the data.
The Computer Vision Service has several visual features relevant in IoT applications. It can be used to detect brands, assign images to categories based on taxonomies that you define, determine accent and dominant colors, provide descriptions, detect objects, and apply tagging.
The Custom Vision Service provides an image training environment. You begin by tagging a set of training images using tags that are consistent with what you are trying to detect. For example, if you are trying to train the service to detect the types of crops in a farm field, you’d first assemble a training set of images that are tagged with the crop types you wish to detect.
Next you train the model and set a probability threshold for accuracy. The default is a goal of reaching 50 percent accuracy or above. You begin the training by simply hitting the train button shown in the previous figure.
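The probability threshold simply decides which predicted tags count as detections. A minimal sketch (the tag names and scores are invented for illustration):

```python
def detected_tags(predictions, threshold=0.5):
    """Keep only tags whose predicted probability meets the threshold,
    as the Custom Vision probability threshold setting does."""
    return [tag for tag, p in predictions if p >= threshold]

# Hypothetical per-tag probabilities returned for one image
predictions = [("corn", 0.92), ("soybeans", 0.48), ("wheat", 0.07)]

print(detected_tags(predictions))                 # default 50 percent threshold
print(detected_tags(predictions, threshold=0.4))  # lower threshold admits more tags
```

Raising the threshold trades recall for precision: fewer tags are reported, but with higher confidence.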
Custom Vision can have many other use cases. For example, models might be produced for use in visual inspection of the condition of utility lines to determine the need for their replacement, analyzing medical images for possible anomalies where further diagnosis might be needed, and determining whether components are properly aligned as they are placed into parts on a manufacturing assembly line.
Among the decision APIs, the Anomaly Detector is particularly relevant to IoT applications. You can use these RESTful APIs to detect anomalies in streaming data, leveraging previously seen data points. The APIs can also generate models that detect anomalies in JSON formatted time series datasets created in batch processes.
The APIs can provide details about the data including expected values, anomaly boundaries, and positions. Anomaly boundaries are automatically set. However, you can manually adjust the boundaries if you prefer more (or less) sensitivity in identifying anomalies.
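The idea of expected values with adjustable boundaries can be illustrated with a simple mean-plus-or-minus-k-sigma detector over a JSON-like series. This is a didactic stand-in, not the Anomaly Detector algorithm, and the `k` parameter here is our own analog of a sensitivity setting:

```python
import statistics

def find_anomalies(series, k=2.0):
    """Flag indices whose value falls outside mean +/- k*stddev.
    Lowering k tightens the boundaries and flags more points."""
    values = [p["value"] for p in series]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    lo, hi = mu - k * sigma, mu + k * sigma
    return [i for i, v in enumerate(values) if not (lo <= v <= hi)]

# Timestamped values with one spike, shaped like a time series payload
series = [{"timestamp": f"2019-05-01T00:0{i}:00Z", "value": v}
          for i, v in enumerate([10, 11, 10, 12, 11, 60, 10, 11])]

print(find_anomalies(series))         # flags the spike at index 5
print(find_anomalies(series, k=3.0))  # wider boundaries miss the spike
```

The real service fits far richer models to the series, but the boundary-adjustment behavior is analogous: widening the boundaries reports fewer anomalies.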
Data Visualization and Power BI
Power BI is a business intelligence platform from Microsoft used in visualizing, aggregating, analyzing, and sharing data and data analysis. The Power BI service is deployed in the Microsoft cloud. The Power BI Desktop is free, downloadable software for your personal computer providing an environment to connect to data sources, develop data models, create visuals, and combine visuals into reports. Once created, you can publish these reports to the Power BI service.
When starting in Power BI Desktop, you likely will first download a sample of data to begin development. As development progresses and/or you deploy to the Power BI service, you can use Direct Query to analyze and report on the full live dataset.
In IoT scenarios, typical data sources include Blob Storage, Azure Data Lake Storage, HDInsight (HDFS, Interactive Query, and Spark), and Cosmos DB. Relational database sources that can be accessed include Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Microsoft SQL Server and SQL Server Analysis Services, IBM DB2, Informix, Netezza, MySQL, Oracle, PostgreSQL, SAP HANA and Business Warehouse, Snowflake, and any database supporting ODBC. Online services such as Dynamics and Salesforce can be accessed. Additionally, file types such as Excel, XML, JSON, PDF, and text or CSV can be leveraged.
Once loaded into Power BI Desktop, you might choose to transform data in the data model. For example, you can rename tables, update data types, append tables together and cleanse data so that similar sets can be combined, and rename groups of data.
As you create the report, you can select from many different data visualizations provided. Examples of available visualizations include stacked bar charts, stacked column charts, clustered bar charts, clustered column charts, 100 percent stacked bar charts, 100 percent stacked column charts, line charts, area charts, stacked area charts, line and stacked column charts, line and clustered column charts, ribbon charts, waterfall charts, scatter charts, pie charts, donut charts, treemaps, filled maps, funnels, gauges, cards, multi-row cards, KPIs, slicers, tables, matrices, R script visuals, Python visuals, ArcGIS Maps, globe maps, tornado charts, and custom visuals that you import.
Whereas reports show data from a single dataset, dashboards can display data present from a variety of datasets and reports. As such, they can provide a more holistic view as to how a business is functioning and leverage data from IoT devices and lines of business systems.
Dashboards are created only in the Power BI service (not through the Desktop). The dashboards can be created from scratch directly from datasets, by pinning reports, or by modifying existing dashboards.
The Power BI service can also automatically surface quick insights from your datasets, including the following:
Category outliers (top and/or bottom)
Change points in a time series
Correlation
Low variance
Major factors (e.g., most of a total value comes from a single factor)
Overall trends in time series
Seasonality in time series
Steady share
Time series outliers
Note
Power BI users can be granted access to Azure Machine Learning models developed by data scientists. Power Query discovers the models the user has access to and exposes them as dynamic Power Query functions. At the time this book was published, this capability was supported in Power BI dataflows and in Power Query online in the Power BI service.
You can collaborate with others in the creation of reports and dashboards by sharing workspaces. Once created, access to reports and dashboard tiles can be made available through Microsoft Teams by adding Power BI Tabs to channels and pointing to the report or tile link. Reports can also be printed (including as PDFs) or embedded into portals.
Reports and dashboards in the Power BI service can also be shared directly to e-mail addresses where the individuals will have the same access as the publisher (unless row-level security applied to the dataset restricts them). When granting access, the publisher can choose to allow the recipient to also share the report or dashboard or build new content using the underlying dataset.
Azure Bot Service and Bot Framework
Bots provide a question and answer or natural language interface akin to talking to a human or intelligent robot. The Azure Bot Service and Bot Framework provide tools used in building, testing, deploying, and managing intelligent bots. Microsoft provides an extensible framework that includes the SDK, tools, templates, and AI services.
You can extend your bot’s functionality by using Microsoft’s QnA Maker to set up a knowledge base to answer questions. Natural language understanding is accomplished by leveraging LUIS in Cognitive Services. Multiple models can be managed and leveraged during a bot conversation. Graphics, menus, cards, and buttons can be added to text to complete the experience.
For example, you might use QnA Maker as a front end for users that then pushes SQL to backend data management systems. You might also use a bot to push a command to an IoT edge device.
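A bot front end ultimately maps recognized intents and entities to backend actions such as a parameterized query or a device command. A minimal dispatch sketch in plain Python (the intent names, entity shape, and command format are invented for illustration, not the Bot Framework SDK):

```python
def dispatch(intent, entities):
    """Route a recognized intent (e.g., from LUIS) to a backend action."""
    if intent == "GetTemperature":
        # Would be sent to the data tier as a parameterized query
        return ("sql", "SELECT temp FROM readings WHERE device_id = ?",
                (entities["device"],))
    if intent == "RebootDevice":
        # Would be sent to the device via IoT Hub as a direct method call
        return ("device-command", entities["device"], "reboot")
    return ("fallback", "Sorry, I did not understand that.")

print(dispatch("GetTemperature", {"device": "d1"}))
print(dispatch("RebootDevice", {"device": "d2"}))
```

Keeping this mapping layer separate from the conversation logic makes it easy to test the bot's actions without a live channel.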
Microsoft provides a Bot Framework Emulator useful in debugging and interrogating your bot. Once you have configured your bot in the Azure Portal, the bot can also be reached through a web chat interface for testing. When testing is complete, you can publish your bot to Azure or a web service.
Once deployed, you can gather data in the Azure Portal related to traffic, latency, users, messages, and channels. You can use this data to determine how best to improve the capabilities and performance of your bot.