In the previous chapters, you were introduced to Azure Synapse Analytics and Synapse Studio. Azure Synapse Analytics offers a lot of features that are flexible and highly scalable. Some of the extensibility features of Azure Synapse Analytics are the ability to use the modern data warehouse approach to serve meaningful reports and a data source for machine learning.
There are many ways to process and visualize data using Azure and the wider set of tools offered by Microsoft. This book will focus on using Power BI as a tool to create reports and dashboards. Power BI offers a wide range of products and services. This chapter will take a look at using Power BI Desktop on Windows to create meaningful dashboards as well as using Power BI workspaces to host and share published reports.
This book will also give you an introduction to the artificial intelligence and machine learning ecosystem within Azure and Microsoft. We will tackle a specific machine learning platform in later sections using Azure Machine Learning.
These technologies work seamlessly with Azure Synapse Analytics as a default approach to visualizing and gaining insights from a unified data warehouse.
Power BI is a suite of tools that enables users to visualize data and share insights across teams and organizations, or embed dashboard analytics in their websites or applications. It supports different data sources (both structured and unstructured data types) and helps analysts and end users create live dashboards and reports about business data on-demand. An example of this is visualizing company sales for recent months and determining the city that sold the most items.
What makes Power BI different from spreadsheet software such as Microsoft Excel is that it is designed to be a hosted user interface, often a live dashboard, where users don't need to frequently store a file in their local machine and open it. With Power BI, you can leverage the power of the cloud to harness complex data and present it through rich graphs or charts, letting the server run all the computations rather than your own machine. Imagine a scenario where your data size was to grow from 500 megabytes to several gigabytes. Most general-purpose machines, such as personal computers with a limited amount of memory, would struggle to load such an Excel file; however, with Power BI, it is just like opening a web page as it is a hosted service:
Figure 3.1: Power BI architecture
As seen in Figure 3.1, Power BI is composed of different components that can perform different functions:
At a high level, Power BI offers the following benefits:
Power BI is an intuitive tool that often just requires clicking or dragging-and-dropping in order to quickly access and visualize data. The authoring tool, Power BI Desktop, is equipped with many built-in features to derive analytics. It is smart enough to suggest a visualization model based on the fields of your choice.
Power BI dashboards and reports are highly customizable and allow you to personalize your experience depending on your branding. You can select themes, use custom charts, create labels, insert drawings and images, and a lot more.
Compared to sending an email with a PowerPoint file attached, Power BI allows an open collaboration between analysts and other members of the company just by sharing a centralized dashboard. You can access reports using major web browsers or by means of mobile applications that you can download from the Apple App Store and Google Play Store. People can send comments and annotations about the reports, creating a faster feedback loop with the use of alerts and notifications.
Power BI is secure in different facets and areas. For one, when authoring a report, it is ensured that you can only access data sources that you have been granted access to. This is backed by Row-Level Security (RLS). For example, analysts can only access data that is local to their region, making sure they don't have access to another city or country's data. Once you are ready to share the report, you can quickly save it to your personal workspace. You can select whomever you want to share the report with across your organization, or invite users from external tenants.
If you wish to start small while learning Power BI, you can start by just using Excel files as your data source. There are scenarios where analysts receive a CSV file from data engineers because the size of the dataset is not too large.
In an enterprise scenario, engineers and analysts have to deal with various data sources and platforms to create a unified dashboard. This is often a challenge, especially when data is not available for analysts to perform their exploration. This is where a combination of Power BI and Azure Synapse Analytics can be very useful.
Power BI is a very flexible platform that can work with multiple data sources. It could be a source as simple as an Excel file, a database on-premises or in the cloud, or a data warehouse. One of the best practices for having a modern data warehouse is separating the duty of processing data from data visualization. That is where Azure Synapse Analytics comes in.
Azure Synapse Analytics, with the use of a dedicated SQL Pool, can perform complex queries for all data sources and create a semantic model for Power BI. A semantic data model is a structured data format that Power BI can easily interpret for data visualization. This approach is scalable and eases a lot of the complexities of managing data across various teams.
There are a lot of benefits to combining Power BI and Azure Synapse Analytics. Although there is no one-size that fits all, nor one tool to solve all of your data problems, the combination of Azure Synapse Analytics and Power BI can bring robust, elastic, and more maintainable data operations and lifecycles to your team:
In this book, we will use Power BI to generate reports from the queries performed by the SQL Pool in a Synapse workspace. This chapter will focus on how you can leverage Power BI datasets in Synapse Studio.
There are different approaches to performing robust data visualization, but the main reason for using this approach (Power BI and Azure Synapse Analytics) is elasticity and the separation of duties. This combination facilitates a streamlined reporting capability at scale, even if you have billions of rows in your databases.
To understand further how Azure Synapse Analytics and Power BI work hand in hand, let's start with a simple quick start guide.
In this quick start guide, we will explore how we can use Azure Synapse Analytics with the use of an SQL Pool to capture COVID-19 data from Bing and display relevant reports to Power BI.
In order to perform this activity, you need the following:
Let's get started.
Integrating Power BI with Azure Synapse Analytics
This activity will act as a step-by-step guide to integrating Azure Synapse Analytics with Power BI. We will be using the power of Azure Synapse Analytics to perform queries that will allow Power BI to visualize data without the need to perform complex transformations within your Power BI environment:
Figure 3.2: Integrating Power BI with Azure Synapse Analytics
Figure 3.3: Creating a linked service
Follow these steps to create a database for storing data:
Figure 3.4: Creating an SQL database in Synapse Studio
Figure 3.5: Choosing an SQL pool type
When setting up your Azure Synapse Analytics workspace, it comes with a built-in serverless SQL pool for you.
Upon the successful creation of the database, it should now appear in your workspace's list of databases.
Figure 3.6: List of created databases in the Synapse workspace
Follow these steps for linking a dataset:
Figure 3.7: Datasets gallery
You will then be brought to the terms and conditions page. Please read the licensing agreements to make sure you follow the fair use of these datasets. It will also show you a preview of some of the rows of that file. Click on Add dataset.
The dataset should now appear in your Linked data tab under Azure Blob Storage | Sample Datasets.
Figure 3.8: Adding the dataset to Azure Synapse Analytics using Azure Blob Storage
Transferring data to your database and running a query
After getting the sample dataset, it's now time to transfer that data to the SQL database that you created earlier:
Figure 3.9: Creating a new query for selecting top100 rows
This will prompt you to a pre-filled SQL script development workspace. There are a couple of things that we will modify in this query.
Figure 3.10: Choosing an SQL pool and database
We will modify the query to create a view based on a select statement from the Bing COVID-19 dataset:
-- Drop existing "Covid19View"
DROP VIEW IF EXISTS Covid19View;
GO
-- Create a View named "Covid19View"
CREATE VIEW Covid19View
AS
-- Select ALL data from the dataset however for this specific report.
-- we only want country level reports and excluding "Worldwide" and country regions.
SELECT
*
FROM
OPENROWSET(
BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet',
FORMAT = 'parquet'
)
AS [result]
WHERE [country_region] != 'Worldwide'
And ( [admin_region_1] is NULL
Or [admin_region_1] = '' )
In the actual report later in Power BI, you might see that other countries are missing. This is intended to simplify the exercise.
Figure 3.11: SQL script properties
Once the query is finished, it should display the following message:
Figure 3.12: Query execution status
Let's check whether the view has been created in the database:
Figure 3.13: View created for the COVID-19 dataset
Figure 3.14: Querying the top 100 rows of Covid19View
Figure 3.15: Executing the query to get the top 100 rows
Creating and publishing the Power BI report
Now that we have populated the Azure Synapse Analytics database with data, it's time to create a report in Power BI:
Figure 3.16: Adding a new Power BI dataset
Figure 3.17: Selecting a data source for the Power BI dataset
Figure 3.18: Downloading a linked Power BI dataset
After downloading the file, open the .pbids file. The file should open Power BI Desktop on your Windows machine. If you're not yet authenticated, it will prompt you to choose a method to connect to the database:
Figure 3.19: Adding an Azure Synapse Analytics SQL database as a data source in Power BI
Once you are authenticated, it will then display a Navigator screen for you to select data sources you want to capture. Make sure that the view where we stored our data is ticked. Then hit Load:
Figure 3.20: Loading the COVID-19 dataset into Power BI
You will then be prompted with an option to choose either Import or DirectQuery:
Figure 3.21: Selecting the Power BI connection settings
By choosing the Import option, the data will be stored in Power BI Desktop. You will need to perform a refresh in order to get an update of that data. Import is the preferred approach if the data size is not that big and if your machine has enough memory (RAM).
If you choose DirectQuery, no data will be stored in Power BI Desktop. DirectQuery will only create a reference from the data source and will query that data each time you update your workspace. DirectQuery is preferred if you have complex external data sources.
After selecting an option, it will then load the data to add as your Power BI fields.
Figure 3.22: The Visualizations tab
This should give you a date selector like the one here:
Figure 3.23: Power BI Desktop with a date matrix
Drag and drop the following fields onto the map:
This should create a world map that is filled.
Figure 3.24: Displaying country-wise COVID-19 statistics
Feel free to explore the customizations you might want to implement.
After further customization, let's publish the report so that other people can see it.
Figure 3.25: Choosing a destination for the report
Figure 3.26: Opening the published report in Power BI
Once it's published, it will be available to your Power BI workspace. In the workspace, you can share this report with any of your team members within your tenant by clicking on the Share button:
Figure 3.27: Viewing a report in the Power BI workspace
This report can also be viewed on tablet and mobile devices:
Figure 3.28: Viewing the Power BI report on a tablet/mobile device
Congratulations on creating your Power BI dashboard! In this activity, we covered how to use Synapse Studio to transform data into a meaningful semantic model that Power BI can visualize. In the next section, we will explore how we can use machine learning with Azure Synapse Analytics.
There are multiple ways to perform machine learning on Azure. Microsoft enables data science to be more accessible to all types of users and empowers data scientists to be more productive. Microsoft provides a suite of technologies for developers, database engineers, and data scientists to create machine learning algorithms. Whatever your level of proficiency and expertise in data science, there is a useful Microsoft service, tool, or framework that can accelerate your machine learning journey.
Figure 3.29 depicts a machine learning landscape within the Microsoft Azure ecosystem. You can use pre-trained models with Azure Cognitive Services and directly integrate them with your applications without the need to set up a data pipeline. You can use popular frameworks such as TensorFlow and Keras in Azure, whether that's by installing them on a virtual machine or using a machine learning workspace. You can choose different platforms such as Azure Machine Learning or Azure Databricks to prepare and run your machine learning experiments:
Figure 3.29: Microsoft Azure features and services for machine learning
By using Azure to drive your computation for machine learning analysis, you are provided with specialized hardware that can speed up your experiments. With accelerated hardware such as fast graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), you can read billions of database table row entries and try various models concurrently to achieve faster results for your machine learning experiments.
The following sections will give an overview of the major technologies and platforms that implement machine learning and artificial intelligence within the Microsoft Azure ecosystem.
ML.NET is an open-source cross-platform framework for .NET developers. By using ML.NET, you and your team can leverage the skills, libraries, and frameworks that are already present within the .NET ecosystem. You can create web applications using ASP.NET, mobile applications using Xamarin, Desktop applications using WPF, and even the Internet of Things (IoT) using Windows IoT. You can also extend your machine learning model creation capability by using TensorFlow and ONNX. ML.NET provides out-of-the-box support for algorithms that enable sentiment analysis, product recommendation, object detection, sales forecasting, and many more real-world business scenarios.
For tasks such as regression and classification, both training and consumption can be carried out using ML.NET. Other than this, it also supports core data types, extensible pipelines, data structures, tooling support, advanced performance mathematics, and more.
ML.NET can be installed from nuget.org. nuget.org is a public repository of downloadable .NET packages, libraries, and frameworks that you can easily add to your .NET project.
Automated Machine Learning (Automated ML, or AutoML) enables data professionals to build models code free within Azure Synapse Analytics powered by Azure Machine Learning. AutoML is designed to automatically detect the best machine learning models for you by selecting the right algorithm and helping tune hyperparameters for forecasting, classification, and regression. This is very useful if you do not have a data scientist on your team.
AutoML helps users (developers, analysts, and even data scientists) to implement machine learning without a high barrier of entry relating to programming languages, libraries, frameworks, and data science concepts. It allows companies to innovate, thanks to faster time-to-market by means of an iterative process, and to leverage data science best practices when running experiments.
AutoML runs are executed on Azure Synapse Analytics serverless Apache Spark pools and are tracked in the Azure Machine Learning service.
Azure Cognitive Services is a suite of cloud-based, general-purpose, pre-trained models and APIs that can be consumed and extended for further training for specific use cases. If, for example, you want to create an object detection AI that understands what a banana is, you might need to feed in more data to help the AI understand that an image contains a banana. Consuming cognitive services is done via HTTP and is platform-agnostic, meaning you can use any programming language and operating system. There are five main categories of cognitive services: decision, vision, speech, search, and language. You can readily integrate AI and ML with your mobile, web, desktop, or even IoT applications using Cognitive Services.
The speech-to-text and speaker-recognition capabilities of the Speech Services API are good examples of cognitive services. These capabilities allow you to transform speech data to text, translate it to other languages, and recognize the identity of the speaker without setting up a machine learning workspace that involves millions of datasets and a series of machine learning model experiments.
Using Cognitive Services is the best approach for those who want to easily integrate AI and machine learning in their applications with minimum data science knowledge. Microsoft offers very flexible pricing options where you only pay for what you use, and most of the services have free tiers for you to explore. There are currently pre-trained models available for text analytics (sentiment analysis) and anomaly detection, but more models will be available in the future.
You can learn more about Cognitive Services at https://azure.microsoft.com/services/cognitive-services/.
Microsoft Bot Framework enables applications to build intelligent bots (often used for chatbots) to automate workflows. The Bot Framework is closely associated with Cognitive Services such as Language Understanding Intelligence Service (LUIS) and QnA Maker. QnA Maker is a natural language processing (NLP) service that accelerates the creation of conversation-based AI such as chatbots. With the Bot Framework, developers can easily create a conversational AI that learns through training from utterances and intents. This framework also allows developers to easily publish bots to channels such as Microsoft Teams, Cortana, and Slack.
The Bot Framework is now widely adopted by large corporations such as banks and retail conglomerates for their first-level support. For example, the Bank of Beirut used the Bot Framework to create the Digi Bob chatbot, which assists users in applying for loans and availing themselves of other banking services. To learn more, read about this use case at https://customers.microsoft.com/story/740194-bank-of-beirut-banking-azure-dynamics365-lebanon.
Using the Bot Framework, developers can deploy intelligent enterprise-grade bots that can easily translate inquiries and messages (intents) from users and respond with meaningful actions. These actions can include querying a data source or orchestrating a command to a system. You can learn more about the Bot Framework at https://azure.microsoft.com/services/bot-services/.
There are more machine learning tools and products within the Microsoft ecosystem, such as:
This book is not able to cover all of the technologies mentioned here, so we instead focus on Azure Machine Learning. For more information on these services, visit https://azure.microsoft.com/overview/ai-platform/.
Azure Machine Learning offers a variety of features and lots of flexibility to users of various backgrounds and expertise. Azure Machine Learning can integrate into your existing data pipeline to perform tasks such as leveraging data from Azure Data Lake or Azure Synapse Analytics and serving the models directly to Power BI. You can also use Azure Databricks to further automate the hardware clusters where you are running your machine learning experiments.
Azure Machine Learning provides an end-to-end workspace to run your machine learning operations. With Azure Machine Learning, you can create experiments using AutoML, a visual interface, or the Software Development Kit (SDK) in your machine learning notebook. You can also create a portable data model that can run in a container. This model can then be published to Azure Container Instances (ACI).
Azure Machine Learning serves a Python SDK that fully supports mature frameworks such as MXNet, TensorFlow, PyTorch, and scikit-learn. You can import the SDK into your experiments using Jupyter Notebooks, Azure Notebooks, or even Visual Studio Code.
You can also use a visual interface (with minimal coding being required) within Azure Machine Learning called Designer to create and run experiments. The experience uses a low-code/no-code approach with drag-and-drop tools to create and connect entities. This is an intuitive way to connect data sources and create a machine learning model to train and serve.
As discussed earlier, AutoML is a mechanism that suggests the best algorithm to use in your experiments. It is a baked-in feature of Azure Machine Learning. You can automate away time-intensive tasks such as data cleaning and choosing the right algorithms for your model. With AutoML, you can rapidly iterate over many combinations of algorithms and hyperparameters to find the best model for your desired outcome.
Microsoft and Azure do not limit your model deployment options. Even if you are managing your workspace and performing your analysis in the cloud, you are not locked into just deploying the outcome of your experiments to Azure. You have the option to deploy them on-premises and in edge environments by using containers.
In a modern data warehouse, the combination of Azure Databricks and Azure Machine Learning can accelerate your machine learning operations. Azure Machine Learning can provide you with an end-to-end workspace where you can connect data from various sources with Azure Synapse Analytics, prepare and train data models, deploy them to consumers such as Power BI, and then monitor and retrain them to improve accuracy:
Figure 3.30: Preparation, experimentation, and deployment in Azure Machine Learning
With Azure Machine Learning you can use Azure Databricks to prepare the data for your experiments. You can then use either Jupyter Notebooks or Visual Studio Code to author your experiments; alternatively, you can also use the built-in Azure Notebooks feature of Azure Machine Learning. You will then run your experiments to train and test your machine learning model by leveraging computers to run complex data science algorithms. Azure Machine Learning will create a machine learning model with the ONNX format, which is highly portable and can easily be deployed to a container such as Azure Container Instance. You also have the option to run it on Azure Kubernetes Services (AKS) or even on edge devices that support Docker.
The ONNX model format enables data professionals working in Azure Synapse Analytics to bring a variety of models into Synapse securely without the need for data movement outside of the Azure Synapse Analytics security boundaries.
This book will not cover the use of Azure Databricks as the compute cluster of Azure Machine Learning, but there are advantages of having this combination. If you are already using Azure Databricks to derive real-time analytics on your modern data warehouse, you might also consider using it to run your machine learning experiments in Azure Machine Learning. You can read more about this at https://docs.microsoft.com/azure/machine-learning/how-to-configure-environment#aml-databricks.
Azure Machine Learning and Azure Synapse Analytics can work hand in hand as they solve different problems. You use Azure Synapse Analytics for your modern data warehouse to make a unified data pipeline for your disparate data sources and eventually model and serve that data for a client consumer. Azure Machine Learning on the other hand, is used to create a machine learning model that can eventually be used for your applications to create meaningful inferences (assumptions).
A practical example of using Azure Machine Learning and Azure Synapse Analytics together could be in a physical retail store. Azure Synapse Analytics can aggregate multiple data sources such as beacon and CCTV data (unstructured), NoSQL databases, and SQL databases, then query them all to serve meaningful reports for Power BI such as "number of goods sold versus audience traffic." Using Azure Machine Learning on the other hand, you can run through experiments from various data sources, including the ones generated by Azure Synapse Analytics, to create a recommendation engine for people coming to the store based on their previous activities and correlation among the rest of the customers.
It is also possible to combine Azure Synapse Analytics and Azure Machine Learning workspaces as a linked service. An Azure Machine Learning linked service can be created from within an Azure Synapse Analytics workspace and enables much simpler collaboration between the two technologies.
In this quick start guide on Azure Machine Learning, we will look at how to get started with the platform without any coding required. We will be using a sample called Image Classification using DenseNet using the Designer feature of Azure Machine Learning to create a pipeline for your machine learning model.
In order to perform this activity, you need the following:
Figure 3.31: Creating an Azure Machine Learning workspace
For this specific exercise, you can create a new instance or use an existing one for:
Figure 3.32: Azure Machine Learning workspace details
Figure 3.33: Managing the machine learning model through Launch studio
This will redirect you to the Azure Machine Learning studio, where we will create our machine learning model.
Figure 3.34: Creating a machine learning pipeline using Designer
This sample will create a series of steps in the pipeline as shown in Figure 3.35:
Figure 3.35: Image classification pipeline
In summary, this pipeline takes a set of images from the Animal images Dataset repository to use for training the model. It will then apply series of steps and techniques (refer to the flow chart) to train the model until it's ready to be scored.
Scoring the model means creating a measurement on how accurate this model can be if fed with a new image. The last part is the model evaluation, wherein the performance of this model is evaluated. This exercise is what sets your model apart from other models in terms of accuracy and reliability.
For this exercise, create a new compute target by ticking the Create new radio button and assigning it a name. Click on Save. It will create that compute target:
Figure 3.36: Setting a compute target
In Azure Machine Learning, you can also add and configure existing virtual machines as a compute target.
Once the compute target has been created, you are now ready to run the experiment. Try to click the Submit button again to trigger the experiment.
Figure 3.37: Creating a new experiment
It will now run your experiment, and this will take some time. You can see the progress of the experiment visually depending on the status of each step: queue, not started, or completed.
Once the experiment is complete, all boxes should be highlighted in green, as shown here:
Figure 3.38: Animal imaging experiment pipeline
Figure 3.39: Visualizing the scored model
The result visualization shows the probability of an image being a cat, dog, or frog. As you can see, this model is better at detecting cats, where it scores 99.57% accuracy, compared with recognizing dogs, where it scores only 70.01%:
Figure 3.40: Scored model visualization result
This exercise gave you a glimpse of how easy it is to get started on Azure Machine Learning. You may choose to deploy this model to ACI or AKS to get a real-time inference on an image that you want to test, but that's beyond the scope of this book. Do not forget to clean up or delete your resources if you won't be needing them anymore.
In this chapter, we discussed how Azure Synapse Analytics can serve a unified data model to Power BI. We then explored the use of Power BI Desktop to create reports. Power BI enables you to create rich and meaningful graphs that derive business insights. We then published a report in order to collaborate on it across different media.
We have also learned that there are many available tools and technologies to implement machine learning and artificial intelligence in Azure. We explored Azure Machine Learning, including its features and benefits, and performed machine learning on Azure to create a model that classifies animals. In the next chapter, we will explore different business use cases for Azure Synapse Analytics and related technologies.
13.58.60.192