Azure Data Lake

One of the biggest problems that mid-sized and enterprise organizations face is that data resides everywhere. Over the years, data has usually been accumulated by different systems, whether third-party or in-house developed applications. Many vendors have required that their database servers be segregated in order to ensure the performance, security, and manageability of their systems. Also, third-party vendors did not, or do not, want to take responsibility for their systems in a shared environment.

Organizations are realizing that consolidation is a must, both from a cost perspective and for easier manageability. However, in many cases, the vendors or developers are no longer to be found, which makes it very hard to decide whether to upgrade and/or migrate to the cloud. What can complicate things even further is that shared or centralized data may be replicated everywhere, and there may not even be one source of truth for it.

The bottom line is that if you have managed to complete a successful consolidation project with one source of truth, you are lucky and one of the few who have been able to achieve this goal!

On the other hand, this is the era of data analytics and reports that span multiple systems. Single sources of truth are becoming more and more important, and there are huge amounts of data to scan, summarize, and analyze.

And so Microsoft came up with Azure Data Lake, which is, in a nutshell, a cloud offering for big data that integrates with other Azure services such as SQL Database, SQL Server, SQL Data Warehouse, Machine Learning, Power BI, and Cortana. It also allows us to import and export data from almost any data source. Its main goals are ease of use and cost-effectiveness. The service has two main components:

  • Data Lake Store (static)
  • Data Lake Analytics (paid on demand)

According to Microsoft at https://azure.microsoft.com/en-ca/solutions/data-lake/:

"The Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications. We've drawn on the experience of working with enterprise customers and running some of the largest scale processing and analytics in the world for Microsoft businesses like Office 365, Xbox Live, Azure, Windows, Bing, and Skype. Azure Data Lake solves many of the productivity and scalability challenges that prevent you from maximizing the value of your data assets with a service that's ready to meet your current and future business needs."

This chapter focuses primarily on the components of Azure Data Lake and a basic implementation of those components.

While the examples in this chapter will give you an overview of typical scenarios, we encourage you to adapt them to your own needs.

We will cover the following:

  • Creating and configuring the Data Lake Store resource
  • Creating and configuring the Data Lake Analytics resource
  • Using Data Factory to create and configure Data Lake Store/Analytics
  • Uploading data from an SQL Server Azure VM database into the data lake using blob storage
  • Calling a U-SQL script to summarize data into a new file in the data lake using blob storage (this part requires Data Lake Analytics)
  • Running U-SQL from a Data Lake Analytics job to summarize data into a new file in the data lake using blob storage (this part requires Data Lake Analytics)