Solve real-world data problems and create data-driven workflows for easy data movement and processing at scale with Azure Data Factory

Key Features

  • Learn how to load and transform data from various sources, both on-premises and in the cloud
  • Use Azure Data Factory's visual environment to build and manage hybrid ETL pipelines
  • Discover how to prepare, transform, process, and enrich data to generate key insights

Book Description

Azure Data Factory (ADF) is a modern data integration tool available on Microsoft Azure. This Azure Data Factory Cookbook helps you get up and running by showing you how to create and execute your first job in ADF. You'll learn how to branch and chain activities, create custom activities, and schedule pipelines. This book will help you to discover the benefits of cloud data warehousing, Azure Synapse Analytics, and Azure Data Lake Storage Gen2, which are frequently used for big data analytics. With practical recipes, you'll learn how to actively engage with analytical tools from Azure Data Services and leverage your on-premises infrastructure with cloud-native tools to get relevant business insights. As you advance, you'll be able to integrate the most commonly used Azure services into ADF and understand how Azure services can be useful in designing ETL pipelines. The book will take you through the common errors that you may encounter while working with ADF and show you how to use the Azure portal to monitor pipelines. You'll also understand error messages and resolve problems in connectors and data flows with the debugging capabilities of ADF.

By the end of this book, you'll be able to use ADF as the main ETL and orchestration tool for your data warehouse or data platform projects.

What you will learn

  • Create an orchestration and transformation job in ADF
  • Develop, execute, and monitor data flows using Azure Synapse
  • Create big data pipelines using Azure Data Lake and ADF
  • Build a machine learning app with Apache Spark and ADF
  • Migrate on-premises SSIS jobs to ADF
  • Integrate ADF with commonly used Azure services such as Azure ML, Azure Logic Apps, and Azure Functions
  • Run big data compute jobs within HDInsight and Azure Databricks
  • Copy data from AWS S3 and Google Cloud Storage to Azure Storage using ADF's built-in connectors

Who this book is for

This book is for ETL developers, data warehouse and ETL architects, software professionals, and anyone who wants to learn about the common and not-so-common challenges faced while developing traditional and hybrid ETL solutions using Microsoft's Azure Data Factory. You'll also find this book useful if you are looking for recipes to improve or enhance your existing ETL pipelines. Basic knowledge of data warehousing is expected.

Table of Contents

  1. Azure Data Factory Cookbook
  2. Why subscribe?
  3. Contributors
  4. About the authors
  5. About the reviewers
  6. Packt is searching for authors like you
  7. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Sections
    8. Getting ready
    9. How to do it…
    10. How it works…
    11. There's more…
    12. See also
    13. Get in touch
    14. Reviews
  8. Chapter 1: Getting Started with ADF
    1. Introduction to the Azure data platform
    2. Getting ready
    3. How to do it...
    4. How it works...
    5. Creating and executing our first job in ADF
    6. Getting ready
    7. How to do it...
    8. How it works...
    9. There's more...
    10. See also
    11. Creating an ADF pipeline by using the Copy Data tool
    12. Getting ready
    13. How to do it...
    14. How it works...
    15. There's more...
    16. Creating an ADF pipeline using Python
    17. Getting ready
    18. How to do it...
    19. How it works...
    20. There's more...
    21. See also
    22. Creating a data factory using PowerShell
    23. Getting ready
    24. How to do it…
    25. How it works...
    26. There's more...
    27. See also
    28. Using templates to create ADF pipelines
    29. Getting ready
    30. How to do it...
    31. How it works...
    32. See also
  9. Chapter 2: Orchestration and Control Flow
    1. Technical requirements
    2. Using parameters and built-in functions
    3. Getting ready
    4. How to do it…
    5. How it works…
    6. There's more…
    7. See also
    8. Using Metadata and Stored Procedure activities
    9. Getting ready
    10. How to do it…
    11. How it works…
    12. There's more…
    13. Using the ForEach and Filter activities
    14. Getting ready
    15. How to do it…
    16. How it works…
    17. Chaining and branching activities within a pipeline
    18. Getting ready
    19. How to do it…
    20. There's more…
    21. Using the Lookup, Web, and Execute Pipeline activities
    22. Getting ready
    23. How to do it…
    24. How it works…
    25. There's more…
    26. See also
    27. Creating event-based triggers
    28. Getting ready
    29. How to do it…
    30. How it works…
    31. There's more…
    32. See also
  10. Chapter 3: Setting Up a Cloud Data Warehouse
    1. Technical requirements
    2. Connecting to Azure Synapse Analytics
    3. Getting ready
    4. How to do it…
    5. How it works…
    6. There's more…
    7. Loading data to Azure Synapse Analytics using SSMS
    8. Getting ready
    9. How to do it…
    10. How it works…
    11. There's more…
    12. Loading data to Azure Synapse Analytics using Azure Data Factory
    13. Getting ready
    14. How to do it…
    15. How it works…
    16. There's more…
    17. Pausing/resuming an Azure SQL pool from Azure Data Factory
    18. Getting ready
    19. How to do it…
    20. How it works…
    21. There's more…
    22. Creating an Azure Synapse workspace
    23. Getting ready
    24. How to do it…
    25. There's more…
    26. Loading data to Azure Synapse Analytics using bulk load
    27. Getting ready
    28. How to do it…
    29. How it works…
    30. Copying data in Azure Synapse Orchestrate
    31. Getting ready
    32. How to do it…
    33. How it works…
    34. Using SQL on-demand
    35. Getting ready
    36. How to do it…
    37. How it works…
  11. Chapter 4: Working with Azure Data Lake
    1. Technical requirements
    2. Setting up Azure Data Lake Storage Gen2
    3. Getting ready
    4. How to do it...
    5. Connecting Azure Data Lake to Azure Data Factory and loading data
    6. Getting ready
    7. How to do it...
    8. How it works...
    9. Creating big data pipelines using Azure Data Lake and Azure Data Factory
    10. Getting ready
    11. How to do it...
    12. How it works...
  12. Chapter 5: Working with Big Data – HDInsight and Databricks
    1. Technical requirements
    2. Setting up an HDInsight cluster
    3. Getting ready
    4. How to do it…
    5. How it works…
    6. Processing data from Azure Data Lake with HDInsight and Hive
    7. Getting ready
    8. How to do it…
    9. How it works…
    10. Processing big data with Apache Spark
    11. Getting ready
    12. How to do it…
    13. How it works…
    14. Building a machine learning app with Databricks and Azure Data Lake Storage
    15. Getting ready
    16. How to do it…
    17. How it works…
  13. Chapter 6: Integration with MS SSIS
    1. Technical requirements
    2. Creating a SQL Server database
    3. Getting ready
    4. How to do it…
    5. How it works…
    6. Building an SSIS package
    7. Getting ready
    8. How to do it…
    9. How it works…
    10. Running SSIS packages from ADF
    11. Getting ready
    12. How to do it…
    13. How it works…
  14. Chapter 7: Data Migration – Azure Data Factory and Other Cloud Services
    1. Technical requirements
    2. Copying data from Amazon S3 to Azure Blob storage
    3. Getting ready
    4. How to do it…
    5. How it works…
    6. Copying large datasets from S3 to ADLS
    7. Getting ready
    8. How to do it…
    9. How it works…
    10. See also
    11. Copying data from Google Cloud Storage to Azure Data Lake
    12. Getting ready
    13. How to do it…
    14. How it works…
    15. See also
    16. Copying data from Google BigQuery to Azure Data Lake Store
    17. Getting ready
    18. How to do it…
    19. Migrating data from Google BigQuery to Azure Synapse
    20. Getting ready
    21. How to do it…
    22. See also
    23. Moving data to Dropbox
    24. Getting ready
    25. How to do it…
    26. How it works…
    27. There's more…
    28. See also
  15. Chapter 8: Working with Azure Services Integration
    1. Technical requirements
    2. Triggering your data processing with Logic Apps
    3. Getting ready
    4. How to do it…
    5. How it works…
    6. There's more…
    7. Using the web activity to call an Azure logic app
    8. Getting ready
    9. How to do it…
    10. How it works…
    11. There's more…
    12. Adding flexibility to your pipelines with Azure Functions
    13. Getting ready…
    14. How to do it…
    15. How it works…
    16. There's more…
    17. Automatically building ML models with speed and scale
    18. Getting ready
    19. How to do it...
    20. How it works…
    21. There's more...
    22. Transforming and preparing your data via Azure Databricks
    23. Getting ready
    24. How to do it…
    25. How it works…
    26. There's more…
  16. Chapter 9: Managing Deployment Processes with Azure DevOps
    1. Technical requirements
    2. Setting up Azure DevOps
    3. Getting ready
    4. How to do it...
    5. How it works...
    6. Publishing changes to ADF
    7. Getting ready
    8. How to do it...
    9. How it works...
    10. Deploying your features into the master branch
    11. Getting ready
    12. How to do it...
    13. How it works...
    14. Getting ready for the CI/CD of ADF
    15. Getting ready
    16. How to do it...
    17. How it works...
    18. Creating an Azure pipeline for CD
    19. Getting ready
    20. How to do it...
    21. How to do it...
    22. There's more...
  17. Chapter 10: Monitoring and Troubleshooting Data Pipelines
    1. Technical requirements
    2. Monitoring pipeline runs and integration runtimes
    3. Getting ready
    4. How to do it…
    5. How it works…
    6. Investigating failures – running in debug mode
    7. Getting ready
    8. How to do it…
    9. How it works…
    10. There's more…
    11. See also
    12. Rerunning activities
    13. Getting ready
    14. How to do it…
    15. How it works…
    16. Configuring alerts for your Data Factory runs
    17. Getting ready
    18. How to do it…
    19. How it works…
    20. There's more…
    21. See also
  18. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think