Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse, is not a mere data warehouse anymore. Azure Synapse is an amalgamation of big data analytics with an enterprise data warehouse. It provides two different types of compute environments for different workloads: one is the SQL compute environment, which is called a SQL pool, and the other one is the Spark compute environment, which is called a Spark pool. Now developers can choose their compute environment as per their business needs. Azure Synapse also provides a unified portal called Synapse Studio for developers that creates a workspace for data prep, data management, data exploration, data warehousing, big data, and AI tasks.
This chapter introduces Azure Synapse and guides you through getting started with Synapse Studio. You will learn how to create an Azure Synapse workspace and get acquainted with the components of Azure Synapse. You can start using Synapse with the sample data and queries provided in the Azure portal itself.
In this chapter, our topics will include the following:
In this chapter, you are going to learn how to create your first Synapse workspace in the Azure portal. Before you start working with Azure Synapse, there are certain prerequisites.
It would be beneficial to have basic knowledge of the Azure portal, as well as an understanding of SQL and Spark. Knowledge of Azure Data Factory and Power BI would be helpful but not essential.
You must have your own Azure subscription or access to an Azure subscription with appropriate permissions. If you are new to Azure, you can go through the following link to create a free Azure account: https://azure.microsoft.com/en-us/free/.
Once you have your Azure subscription created, you can proceed further with the main topics of this chapter.
Azure Synapse is a limitless analytics service on the Azure platform. It bundles together data warehousing and big data analytics with deep integration of Azure Machine Learning and Power BI. Azure Synapse brings together relational and non-relational data and helps in querying files in the data lake without looking for any other service.
One of the best features that has been introduced with Azure Synapse is code-free data orchestration where you can build ETL/ELT processes to bring data to Synapse from various sources.
Important note
Synapse provides various layers of security for the data stored; however, you need to follow the security guidelines to keep your data secured. For example, do not expose the username and password in any publicly accessible place – you will invite the biggest threat to your data by doing so. It is important to understand that Azure gives you the power to secure your data, but it is in your hands to best use that power.
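One common way to follow this guideline is to keep credentials out of scripts and notebooks and read them from the environment at runtime. The following is a minimal sketch of that idea, not an official Azure pattern; the variable names SYNAPSE_SQL_USER and SYNAPSE_SQL_PASSWORD and the function load_sql_credentials are hypothetical names chosen for illustration:

```python
import os


def load_sql_credentials(environ=os.environ):
    """Return (user, password) read from environment variables.

    SYNAPSE_SQL_USER and SYNAPSE_SQL_PASSWORD are hypothetical names
    used only in this sketch; Azure does not define them.
    """
    required = ("SYNAPSE_SQL_USER", "SYNAPSE_SQL_PASSWORD")
    # Collect every variable that is unset or empty so the error is complete.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return environ["SYNAPSE_SQL_USER"], environ["SYNAPSE_SQL_PASSWORD"]
```

With this approach, the script that is shared or checked into source control never contains the secret itself; a managed secret store would be a stronger option for production use.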
What happens when we embrace a new technology in an organization?
We need to find people who already have knowledge of it, which brings extra costs on top of the cost of the technical implementation. However, Azure Synapse supports various programming languages, such as T-SQL, Python, Scala, Spark SQL, and .NET, making it easy to pick up for people who are already familiar with those languages. In this chapter, we will show a demo for T-SQL, but we will cover examples for other languages in upcoming chapters.
The following diagram represents all the components of Azure Synapse and how all these components are tied together within Synapse Analytics:
The preceding diagram represents all components of Azure Synapse, which includes Analytics runtimes, supported languages, form factors, data integration, and Power BI workspaces. We will cover all these topics in upcoming chapters.
Important note
Although Azure Synapse is deeply integrated with Spark, Azure ML, and Power BI, you do not need to pay for all these services. You will pay only for the features/services that you use. If you are using an Azure Synapse workspace only for enterprise data warehousing, you will be charged only for that. You can find complete pricing details in Microsoft's documentation: https://azure.microsoft.com/en-us/pricing/details/synapse-analytics/.
A Synapse workspace provides an integrated console to manage, monitor, and administer all the components and services of Azure Synapse Analytics. To get started with Azure Synapse Analytics, we need to create an Azure Synapse workspace, which gives us access to the different features of Azure Synapse Analytics.
You can create a Synapse workspace in the Azure portal just by providing some basic details. Follow these steps to create your first Azure Synapse workspace:
Important note
All resources in a subscription are billed together.
Important note
This name must be unique, so it is better to keep it specific to your team/project.
Important note
A storage account name must be between 3 and 24 characters in length and use numbers and lowercase letters only.
Provide SQL administrator credentials that can be used for administrator access to the workspace's SQL pools. We will talk about SQL pools in future chapters:
This deployment takes just a couple of minutes and creates a workspace that bundles Synapse analytics, ETL, reporting, modeling, and analysis together under one umbrella. Now you are ready to build your enterprise-level solution!
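The storage account naming rule mentioned in the steps above (3 to 24 characters, numbers and lowercase letters only) is easy to check before you submit the form. The following is a small sketch of such a check; the function name is_valid_storage_account_name is our own, and the check covers only the format, not whether the name is already taken:

```python
import re

# 3 to 24 characters, lowercase letters and digits only.
_STORAGE_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if name satisfies the length and character rule."""
    return _STORAGE_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_storage_account_name("synapsedatalake01")` returns `True`, whereas a name such as `"My-Storage"` is rejected because it contains an uppercase letter and a hyphen.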
A data lake is a storage repository that allows you to store your data in native format without having to first structure the data at any scale.
Azure Data Lake Storage provides secure, scalable, cost-effective storage for big data analytics. There are two generations of Azure Data Lake, Gen1 and Gen2; however, we will focus on Gen2 only throughout this chapter. Azure Data Lake Gen2 converges the capabilities of Azure Data Lake Gen1 with the capabilities of Azure Blob Storage, adding a Hierarchical Namespace to Blob Storage. Because it builds on Azure Blob Storage, you get a high availability/disaster recovery solution for your data lake at a low cost.
The new Azure Blob File System (ABFS) driver is available within Azure HDInsight, Azure Databricks, and Azure Synapse Analytics, which can be used to access the data in a similar way to Hadoop Distributed File System (HDFS).
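With the ABFS driver, data in a Data Lake Storage Gen2 account is addressed through a URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The following sketch simply assembles such a URI from its parts; the helper name abfss_uri and the account, container, and file names in the example are made up for illustration:

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an ABFS (TLS) URI for a path in a Data Lake Storage Gen2 account.

    account   -- storage account name (hypothetical in the example below)
    container -- container (file system) name
    path      -- path inside the container; a leading slash is optional
    """
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"
```

For instance, `abfss_uri("mydatalake", "raw", "sales/2021.csv")` yields `abfss://raw@mydatalake.dfs.core.windows.net/sales/2021.csv`, which is the kind of path a Spark pool can read from, much as it would read an HDFS path.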
To use Data Lake Storage Gen2's capabilities, you need to create a storage account that has a hierarchical namespace. You can go through the following steps to create your Azure Data Lake Storage Gen2 account:
Now that you have created your Azure Data Lake Gen2 account, you can use this account with Azure Synapse Analytics. We will learn how to read data from Data Lake in later chapters, but for now, we will learn about Azure Synapse Studio, and how it provides a unified experience when working with various resources under one roof.
Synapse Studio is a unified experience for data preparation, data management, data warehousing, and big data analytics. Synapse Studio is a one-stop-shop for developers, data engineers, data scientists, and report analysts.
Before we explore Synapse Studio further, we should know how to get to it from the Azure portal. There are a couple of ways to open Synapse Studio, and both start from your Synapse workspace in the Azure portal. In Figure 1.12, you can see Workspace web URL, which is highlighted. You can either click on that URL or copy it and paste it into your browser to access Synapse Studio:
Another simple approach is to just click on the Open Synapse Studio link under the Getting started section of the Synapse workspace.
You will need to provide credentials to access Synapse Studio. After successful authentication, you will see Synapse Studio opened in a new tab. You will find a direct link to various hubs integrated in Synapse Studio:
As you can see in Figure 1.13, Synapse Studio has six different hubs. We will learn about all these hubs in brief here:
In this section, we got an introduction to Synapse Studio; in the following chapters, we are going to explore it in more detail.
In this chapter, we covered an introduction to Azure Synapse and how you can create your first Azure Synapse workspace. After going through the sample scripts, you should have a fairly good idea about how Azure Synapse Studio works, and some of the different languages supported by Azure Synapse. We also discussed the differences between Azure SQL Data Warehouse and Azure Synapse. You learned about pausing and resuming a SQL pool, as well as automatic pausing of a Spark pool, which will save you some money if implemented.
In the next chapter, we will begin to look at the specific analytics runtimes you need to understand, and you will create your first Spark and SQL pools.