A practical guide to implementing a scalable and fast state-of-the-art analytical data estate

Key Features

  • Store and analyze data with enterprise-grade security and auditing
  • Perform batch, streaming, and interactive analytics to optimize your big data solutions with ease
  • Develop and run parallel data processing programs using real-world enterprise scenarios

Book Description

Azure Data Lake, the modern data warehouse architecture, and related data services on Azure enable organizations to build their own customized analytical platform to fit any analytical requirements in terms of volume, speed, and quality.

This book is your guide to learning all the features and capabilities of Azure data services for storing, processing, and analyzing data (structured, unstructured, and semi-structured) of any size. You will explore key techniques for ingesting and storing data and performing batch, streaming, and interactive analytics. The book also shows you how to overcome various challenges and complexities relating to productivity and scaling. Next, you will learn to develop and run massive data workloads to perform a variety of actions. Using a cloud-based setup that combines big data, a modern data warehouse, and analytics, you will be able to build secure, scalable data estates for enterprises. Finally, you will not only learn how to develop a data warehouse but also understand how to apply enterprise-grade security and auditing to your big data programs.

By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform to meet enterprise needs.

What you will learn

  • Implement data governance with Azure services
  • Use integrated monitoring in the Azure portal and integrate Azure Data Lake Storage with Azure Monitor
  • Explore the serverless feature for ad-hoc data discovery, logical data warehousing, and data wrangling
  • Implement networking with Synapse Analytics and Spark pools
  • Create and run Spark jobs with Databricks clusters
  • Implement streaming using Azure Stream Analytics, a serverless runtime environment on Azure
  • Explore the predefined ML services in Azure and use them in your app

Who this book is for

This book is for data architects, ETL developers, and anyone who wants to become well-versed with Azure data services in order to implement an analytical data estate for their enterprise. The book will also appeal to data scientists and data analysts who want to explore the full capabilities of Azure data services for storing, processing, and analyzing data of any kind. A beginner-level understanding of data analysis and streaming is required.

Table of Contents

  1. Cloud Scale Analytics with Azure Data Services
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
    9. Share Your Thoughts
  6. Section 1: Data Warehousing and Considerations Regarding Cloud Computing
  7. Chapter 1: Balancing the Benefits of Data Lakes Over Data Warehouses
    1. Distinguishing between Data Warehouses and Data Lakes
    2. Understanding Data Warehouse patterns
    3. Investigating ETL/ELT
    4. Understanding Data Warehouse layers
    5. Implementing reporting and dashboarding
    6. Loading bigger amounts of data
    7. Starting with Data Lakes
    8. Understanding the Data Lake ecosystem
    9. Comparing Data Lake zones
    10. Discovering caveats
    11. Understanding the opportunities of modern cloud computing
    12. Understanding Infrastructure-as-a-Service
    13. Understanding Platform-as-a-Service
    14. Understanding Software-as-a-Service
    15. Examining the possibilities of virtual machines
    16. Understanding Serverless Functions
    17. Looking at the importance of containers
    18. Exploring the advantages of scalable environments
    19. Implementing elastic storage and compute
    20. Exploring the benefits of AI and ML
    21. Understanding ML challenges
    22. Sorting ML into the Modern Data Warehouse
    23. Understanding responsible ML/AI
    24. Answering the question
    25. Summary
  8. Chapter 2: Connecting Requirements and Technology
    1. Formulating your requirements
    2. Asking in the right direction
    3. Understanding basic architecture patterns
    4. Examining the scalable storage component
    5. Looking at data integration
    6. Sorting in compute
    7. Adding a presentation layer
    8. Planning for dashboard/reporting
    9. Adding APIs/API management
    10. Relying on SSO/MFA/networking
    11. Not forgetting DevOps and CI/CD
    12. Finding the right Azure tool for the right purpose
    13. Understanding Industry Data Models
    14. Thinking about different sizes
    15. Planning for S size
    16. Planning for M size
    17. Planning for L size
    18. Understanding the supporting services
    19. Requiring data governance
    20. Establishing security
    21. Establishing DevOps and CI/CD
    22. Summary
    23. Questions
  9. Section 2: The Storage Layer
  10. Chapter 3: Understanding the Data Lake Storage Layer
    1. Technical requirements
    2. Setting up your Cloud Big Data Storage
    3. Provisioning a standard storage account instead
    4. Creating an Azure Data Lake Gen2 storage account
    5. Organizing your data lake
    6. Talking about zones in your data lake
    7. Creating structures in your data lake
    8. Planning the leaf level
    9. Understanding data life cycles
    10. Investigating storage tiers
    11. Planning for criticality
    12. Setting up confidentiality
    13. Using filetypes
    14. Implementing a data model in your Data Lake
    15. Understanding interconnectivity between your data lake and the presentation layer
    16. Examining key implementation and usage
    17. Monitoring your storage account
    18. Creating alerts for Azure storage accounts
    19. Talking about backups
    20. Configuring delete locks for the storage service
    21. Backing up your data
    22. Implementing access control in your Data Lake
    23. Understanding RBAC
    24. Understanding ACLs
    25. Understanding the evaluation sequence of RBAC and ACLs
    26. Understanding Shared Key authorization
    27. Understanding Shared Access Signature authorization
    28. Setting the networking options
    29. Understanding storage account firewalls
    30. Adding Azure virtual networks
    31. Using private endpoints with Data Lake Storage
    32. Discovering additional knowledge
    33. Summary
    34. Further reading
  11. Chapter 4: Understanding Synapse SQL Pools and SQL Options
    1. Uncovering MPP in the cloud – the power of 60
    2. Understanding the control node
    3. Understanding compute nodes
    4. Understanding the data movement service
    5. Understanding distributions
    6. Provisioning a Synapse dedicated SQL pool
    7. Connecting to your database for the first time
    8. Distributing, replicating, and round-robin
    9. Understanding CCI
    10. Talking about partitioning
    11. Implementing workload management
    12. Understanding concurrency and memory settings
    13. Using resource classes
    14. Implementing workload classification
    15. Adding workload importance
    16. Understanding workload isolation
    17. Scaling the database
    18. Using PowerShell to handle scaling and start/stop
    19. Using T-SQL to scale your database
    20. Loading data
    21. Using the COPY statement
    22. Maintaining statistics
    23. Understanding other SQL options in Azure
    24. Summary
    25. Further reading
    26. Additional links
    27. Static resource classes and concurrency slots
    28. Dynamic resource classes, memory allocation, and concurrency slots
    29. Effective values for REQUEST_MIN_RESOURCE_GRANT_PERCENT
  12. Section 3: Cloud-Scale Data Integration and Data Transformation
  13. Chapter 5: Integrating Data into Your Modern Data Warehouse
    1. Technical requirements
    2. Setting up Azure Data Factory
    3. Creating the Data Factory service
    4. Examining the authoring environment
    5. Understanding the Author section
    6. Understanding the Monitor section
    7. Understanding the Manage section
    8. Understanding the object types
    9. Using wizards
    10. Working with parameters
    11. Using variables
    12. Adding data transformation logic
    13. Understanding mapping flows
    14. Understanding wrangling flows
    15. Understanding integration runtimes
    16. Integrating with DevOps
    17. Summary
    18. Further reading
  14. Chapter 6: Using Synapse Spark Pools
    1. Technical requirements
    2. Setting up a Synapse Spark pool
    3. Bringing your Spark cluster live for the first time
    4. Examining the Synapse Spark architecture
    5. Understanding the Synapse Spark pool and its components
    6. Running a Spark job
    7. Examining Synapse Spark instances
    8. Understanding Spark pools and Spark instances
    9. Understanding resource usage
    10. Programming with Synapse Spark pools
    11. Understanding Synapse Spark notebooks
    12. Running Spark applications
    13. Benefiting from the Synapse metadata exchange
    14. Using additional libraries with your Spark pool
    15. Using public libraries
    16. Adding your own packages
    17. Handling security
    18. Monitoring your Synapse Spark pools
    19. Summary
    20. Further reading
  15. Chapter 7: Using Databricks Spark Clusters
    1. Technical requirements
    2. Provisioning Databricks
    3. Examining the Databricks workspace
    4. Understanding the Databricks components
    5. Creating Databricks clusters
    6. Managing clusters
    7. Using Databricks notebooks
    8. Using Databricks Spark jobs
    9. Adding dependent libraries to a job
    10. Creating Databricks tables
    11. Understanding Databricks Delta Lake
    12. Having a glance at Databricks SQL Analytics
    13. Adding libraries
    14. Adding dashboards
    15. Setting up security
    16. Examining access controls
    17. Understanding secrets
    18. Understanding networking
    19. Monitoring Databricks
    20. Summary
    21. Further reading
  16. Chapter 8: Streaming Data into Your MDWH
    1. Technical requirements
    2. Provisioning ASA
    3. Implementing an ASA job
    4. Integrating sources
    5. Writing to sinks
    6. Understanding ASA SQL
    7. Understanding windowing
    8. Using window functions in your SQL
    9. Delivering to more than one output
    10. Adding reference data to your query
    11. Adding functions to your ASA job
    12. Understanding streaming units
    13. Resuming your job
    14. Using Structured Streaming with Spark
    15. Security in your streaming solution
    16. Connecting to sources and sinks
    17. Understanding ASA clusters
    18. Monitoring your streaming solution
    19. Using Azure Monitor
    20. Summary
    21. Further reading
  17. Chapter 9: Integrating Azure Cognitive Services and Machine Learning
    1. Technical requirements
    2. Understanding Azure Cognitive Services
    3. Examining available Cognitive Services
    4. Getting in touch with Cognitive Services
    5. Using Cognitive Services with your data
    6. Understanding the Azure Text Analytics cognitive service
    7. Implementing the call to your Text Analytics cognitive service with Spark
    8. Examining Azure Machine Learning
    9. Browsing the different Azure ML tools
    10. Examining Azure Machine Learning Studio
    11. Understanding the ML designer
    12. Creating a linear regression model with the designer
    13. Publishing your trained model for usage
    14. Using Azure Machine Learning with your modern data warehouse
    15. Connecting the services
    16. Understanding further options to integrate Azure ML with your modern data warehouse
    17. Summary
    18. Further reading
  18. Chapter 10: Loading the Presentation Layer
    1. Technical requirements
    2. Understanding the loading strategy with Synapse-dedicated SQL pools
    3. Loading data into Synapse-dedicated SQL pools
    4. Examining PolyBase
    5. Loading data into a dedicated SQL pool using COPY
    6. Adding data with Synapse pipelines/Data Factory
    7. Using Synapse serverless SQL pools
    8. Browsing data ad hoc
    9. Using a serverless SQL pool to ELT
    10. Building a virtual data warehouse layer with Synapse serverless SQL pools
    11. Integrating data with Synapse Spark pools
    12. Reading and loading data
    13. Exchanging metadata between computes
    14. Summary
    15. Further reading
  19. Section 4: Data Presentation, Dashboarding, and Distribution
  20. Chapter 11: Developing and Maintaining the Presentation Layer
    1. Developing with Synapse Studio
    2. Integrating Synapse Studio with Azure DevOps
    3. Understanding the development life cycle
    4. Automating deployments
    5. Understanding developer productivity with Synapse Studio
    6. Using the Copy Data Wizard
    7. Integrating Spark notebooks with Synapse pipelines
    8. Analyzing data ad hoc with Azure Synapse Spark pools
    9. Creating Spark tables
    10. Enriching Spark tables
    11. Enriching dedicated SQL pool tables
    12. Creating new integration datasets
    13. Starting serverless SQL analysis
    14. Backing up and DR in Azure Synapse
    15. Backing up data
    16. Backing up dedicated SQL pools
    17. Monitoring your MDWH
    18. Understanding security in your MDWH
    19. Implementing access control
    20. Implementing networking
    21. Summary
    22. Further reading
  21. Chapter 12: Distributing Data
    1. Technical requirements
    2. Building data marts with Power BI
    3. Understanding the Power BI ecosystem
    4. Understanding Power BI object types
    5. Understanding Power BI offerings
    6. Acquiring data
    7. Optimizing the columnstore database in Power BI
    8. Building business logic with Data Analysis Expressions
    9. Visualizing data
    10. Publishing insights
    11. Creating data models with Azure Analysis Services
    12. Developing AAS models
    13. Distributing data using Azure Data Share
    14. Summary
    15. Further reading
  22. Chapter 13: Introducing Industry Data Models
    1. Understanding Common Data Model
    2. Examining the basics of the SDK
    3. Understanding solutions and the manifest file
    4. Examining and leveraging predefined entities
    5. Finding CDM definitions
    6. Using the APIs of CDM
    7. Introducing Dataverse
    8. Discovering Azure Industry Data Workbench
    9. Summary
    10. Further reading
  23. Chapter 14: Establishing Data Governance
    1. Technical requirements
    2. Discovering Azure Purview
    3. Provisioning the service
    4. Connecting to your data
    5. Scanning data
    6. Searching your catalog
    7. Browsing assets
    8. Examining assets
    9. Classifying data
    10. Creating a custom classification
    11. Creating a custom classification rule
    12. Using custom classifications
    13. Integrating with Azure services
    14. Integrating with Synapse
    15. Integrating with Power BI
    16. Integrating with Azure Data Factory
    17. Using data lineage
    18. Discovering Insights
    19. Discovering more Purview
    20. Summary
    21. Further reading
    22. Why subscribe?
  24. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts