Preface

Azure Data Lake, the modern data warehouse architecture, and related data services on Azure enable organizations to build their own customized analytical platform to fit any analytical requirements in terms of volume, speed, and quality.

This book is your guide to learning all the features and capabilities of Azure data services for storing, processing, and analyzing data (structured, unstructured, and semi-structured) of any size. You will explore key techniques for ingesting and storing data and perform batch, streaming, and interactive analytics. The book also shows you how to overcome various challenges and complexities relating to productivity and scaling. You will be able to develop and run massive data workloads to perform different actions. Using a cloud-based modern data warehouse setup for big data analytics, you will also be able to build secure, scalable data estates for enterprises. Finally, you will learn not only how to develop a data warehouse but also how to implement enterprise-grade security and auditing for big data programs.

By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform to meet enterprise needs.

Who this book is for

This book is for data architects, ETL developers, or anyone who wants to get well-versed in Azure data services to implement an analytical data estate for their enterprise. The book will also appeal to data scientists and data analysts who want to explore all the capabilities of Azure data services, which can be used to store, process, and analyze any kind of data. A beginner-level understanding of data analysis and streaming will be required.

What this book covers

Chapter 1, Balancing the Benefits of Data Lakes over Data Warehouses, explores the evolution of data lakes in the analytical world, and also helps us understand the value of data warehouses.

Chapter 2, Connecting Requirements and Technology, focuses on the architecture of the modern data warehouse, introduces various Azure services, and guides you in choosing the right ones for your needs.

Chapter 3, Understanding the Data Lake Storage Layer, examines the setup and organization of the Data Lake Gen2 storage. You'll learn how to access data and monitor your storage account. You will also learn about backups and disaster recovery, and examine various security and networking options for your storage.

Chapter 4, Understanding Synapse SQL Pools and SQL Options, explores MPP in a cloud PaaS service. You'll also explore the replication and distribution of data in a database. You'll learn about various evolutionary steps of SQL pools in Synapse and other components. You'll also check out various alternative SQL database services in Azure and how you can use them.

Chapter 5, Integrating Data into Your Modern Data Warehouse, shows how to implement ETL/ELT pipelines with Synapse pipelines or Azure Data Factory. You'll examine various source connectors and work on integration jobs. You'll also learn how to monitor your integration environment.

Chapter 6, Using Synapse Spark Pools, discusses Synapse Spark pools and how to implement them on Azure. You will examine how to implement notebooks and Spark jobs and integrate additional libraries with your clusters. Finally, we will examine security features and see how to monitor our environment.

Chapter 7, Using Databricks Spark Clusters, examines Azure Databricks. We will learn how to work with it and perform various operations. We'll also learn how to create and use dashboards and run ETL jobs. Finally, you'll learn how to set up Databricks with VNets and implement access control within the workspace.

Chapter 8, Streaming Data into Your MDWH, explores Azure Stream Analytics and how it can be used for analysis. You'll learn how to set up and use the service, and you'll learn about various SQL queries with windowing functions and pattern recognition to detect and highlight various events. You'll also build an online dashboard with Power BI that monitors data streaming in real time.

Chapter 9, Integrating Azure Cognitive Services and Machine Learning, examines various machine learning models that you can use as services in Azure. You'll then explore the Azure Machine Learning service and learn how to implement your own model using the graphical user interface there.

Chapter 10, Loading the Presentation Layer, shows you how to load data into your presentation layer using various tools, such as PolyBase, the COPY command, and Synapse pipelines. We'll also check out how to implement SQL in your data lake. Lastly, we'll explore some options for exchanging metadata between various compute components to improve efficiency.

Chapter 11, Developing and Maintaining the Presentation Layer, examines how to use Azure Synapse, and particularly Synapse Studio, when you implement your presentation layer. You will see how to integrate Azure Synapse with Azure DevOps and how you can automate your deployments. In your role as a modern data warehouse developer, you will also enjoy the developer productivity features that Synapse Studio offers. You'll also dive into disaster recovery and some security aspects of your environment.

Chapter 12, Distributing Data, shows you ways to create data marts to distribute insights in your modern data warehouse with Power BI. You will see how to use Power BI data models and the options to visualize and publish their content and even use the data with other tools. We will also examine Azure Data Share as another option to provide datasets to others.

Chapter 13, Introducing Industry Data Models, showcases various industry data models that you can utilize in your projects using Microsoft's CDM tool. We'll also explore a service in Azure called Industry Data Workbench.

Chapter 14, Establishing Data Governance, takes you through the options that the Azure Purview preview offers for scanning your data and qualifying it. You will see how you can benefit from predefined and custom search patterns and how Purview helps you to find information in your data estate. You will also see how to integrate with other Azure services such as Azure Synapse Analytics or Data Factory.

To get the most out of this book

You will need a system with a good internet connection and an Azure account.

Please note: all the services that you use during the exercises in this book will incur costs within your Azure subscription.

Try to always scale down the services where possible or even delete them after you have finished going through the exercises.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Cloud-Scale-Analytics-with-Azure-Data-Services. If the code is updated, the changes will appear in the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800562936_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "You can now start entering an alias for your input connection. Name it something such as airdelaystreaminginput."

A block of code is set as follows:

SELECT
    t1.Cartype,
    SUM(t2.mgNOx/60) as SumNOx
FROM
    Cartraffic as t1 TIMESTAMP BY ObservedT
JOIN
    CarStats as t2
ON
    t1.Cartype = t2.Cartype
GROUP BY
    t1.Cartype,
    TUMBLINGWINDOW(minute, 10)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

SELECT
    CensusStation,
    COUNT(*) as Amount
FROM
    Cartraffic
TIMESTAMP BY
    ObservedT
GROUP BY
    CensusStation,
    System.Timestamp()

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Please click Create to start the provisioning of your configuration."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Share Your Thoughts

Once you've read Cloud Scale Analytics with Azure Data Services, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
