
6. Managing Flow

Sudhir Rawat (Bangalore, India) and Abhishek Narain (Shanghai, China)

In previous chapters, you focused on the Azure Data Factory features and learned how to build an end-to-end pipeline. The focus of this chapter will be how to set up a pipeline flow and why it’s important for any organization.

Why Managing Flow Is Important

So far, you can build an Azure Data Factory pipeline and run it. This chapter discusses the expressions, functions, and activities that control the data flow in an ADF pipeline. Why do you need to manage the flow? An SSIS developer already knows the importance of control flow; if you are new to this world, consider an example. As a data developer, you build an ADF pipeline to move data from an on-premises SQL Server instance to an Azure SQL Database instance. Today you are asked to move five tables, and later six; every time the organization needs data from a different set of tables, you would have to edit the pipeline. Another example is moving delta/incremental data, which is not a straightforward flow. In both cases you want to change the pipeline's behavior without editing and redeploying it, because every redeployment is a chance to introduce errors. That's where you need a mechanism to manage the ADF pipeline from the outside: you control the application flow from configuration that resides outside of the application.

Azure Data Factory provides various activities to manage the pipeline flow. Let’s get started with some action.

Expressions

Programming languages need to be able to get values during runtime to decide on the code flow. In Azure Data Factory, when the pipeline executes, you can capture parameter values or system variables to decide on the flow of the data. There are various system variables and functions to help achieve this. You can write an expression and evaluate conditions. Figure 6-1 lists the system variables provided in Azure Data Factory.
Figure 6-1: System variables
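For example, the following expression (a minimal sketch) combines two of these system variables to label a pipeline run; @pipeline().DataFactory returns the data factory name, and @pipeline().RunId returns the GUID of the current run:

    @concat(pipeline().DataFactory, '-', pipeline().RunId)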

Functions

There are many functions provided in Azure Data Factory to be used in an expression. For example, there are various types of functions that developers can use to get a value, to check whether a dictionary contains a key, to get a string representation of a data URI, to get an index of a value in a string, to concatenate strings, and so on. Figure 6-2 shows the different types of functions available in Azure Data Factory.
Figure 6-2: Functions
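A few illustrative examples of these functions inside expressions (the literal values here are made up, not from this chapter's solution):

    @concat('blob-', utcnow(), '.txt')
    -- builds a file name from the current UTC timestamp

    @indexOf('AdventureWorks', 'Works')
    -- returns 9, the zero-based position of the substring

    @contains(json('{"name":"ADF"}'), 'name')
    -- returns true because the dictionary contains the key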

Activities

All programming languages have constructs to control the code flow, such as for loops, if conditions, until loops, and so on. These help decide which part of the code executes. In Azure Data Factory, the control flow activities set the direction of the data pipeline execution. For example, the If Condition activity decides which activity to execute based on an expression, as sketched next.
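Here is a trimmed sketch of how an If Condition activity might look in pipeline JSON; the activity names and the row-count check are illustrative assumptions, not part of this chapter's solution:

    {
        "name": "CheckRowCount",
        "type": "IfCondition",
        "typeProperties": {
            "expression": {
                "value": "@greater(activity('LookupRowCount').output.firstRow.cnt, 0)",
                "type": "Expression"
            },
            "ifTrueActivities": [ { "name": "CopyNewRows", "type": "Copy" } ],
            "ifFalseActivities": [ { "name": "SendNoDataAlert", "type": "WebActivity" } ]
        }
    }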

Let’s Build the Flow

Let’s build a solution for AdventureWorks to understand how to use the features discussed.

AdventureWorks wants to share incremental/delta data with a vendor. The data is stored in different tables of an Azure SQL Database instance, and new data arrives daily. On the first run, you'll capture all the data and store it in Azure Blob Storage. On each subsequent day, you'll capture only the delta data from the different tables and store it in Azure Blob Storage. This blob is shared with the vendor. At the end, an e-mail informs the administrator about the pipeline execution. Figure 6-3 shows the architecture you will build in this chapter.

Figure 6-3: Azure Data Factory pipeline design for delta data loading

Let’s start building this architecture.

Build the Source Database

Here are the steps:
1) Go to the Azure portal at https://portal.azure.com.

2) Click “Create a resource.”

3) Click Databases.

4) Click SQL Database (see Figure 6-4).

Figure 6-4: Selecting SQL Database

5) Use ADFControlFlow for “Database name.”

6) Select the subscription where you want to deploy Azure SQL Server.

7) Create or select a resource group.

8) Select “Blank database” for “Select source.”

9) For Server, either create a new server or select an existing server.

10) Select “Not now” for “Want to use SQL elastic pool?”

11) Select the pricing tier you need.

12) Keep the default Collation value or provide your own.

13) Click Create (see Figure 6-5).

Figure 6-5: SQL Database setup information

14) Once the database is set up, click “Query editor (preview),” as shown in Figure 6-6, or, if you are familiar with SQL Server Management Studio, execute all the scripts there.

Figure 6-6: SQL query editor

15) Click Login.

16) Select “SQL server authentication” for “Authorization type.”

17) Provide a login and a password.

18) Click OK (see Figure 6-7).

Figure 6-7: SQL query editor login screen

19) Run the scripts shown in Figures 6-8 through 6-11 one by one. (A hedged sketch of these scripts follows the figure captions.)

Figure 6-8: SQL script for table creation and data insertion

Figure 6-9: SQL script for table creation and data insertion, continued

Figure 6-10: SQL script for table creation and data insertion, continued

Figure 6-11: SQL script for stored procedure creation
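Because Figures 6-8 through 6-11 are screenshots, here is a hedged sketch of what such scripts could look like. The table names (Employee, Books), the watermark column (RecordModifiedDate), the Config table columns (Table_Name, WatermarkValue), and the procedure [dbo].[spupdatewatermark] with its two parameters all come from later steps in this chapter; the remaining columns and sample rows are assumptions for illustration, and the exact scripts in the figures may differ:

    CREATE TABLE Employee (
        EmployeeID INT,                 -- illustrative column
        EmployeeName VARCHAR(100),      -- illustrative column
        RecordModifiedDate DATETIME     -- watermark column used by the pipeline
    );

    CREATE TABLE Books (
        BookID INT,                     -- illustrative column
        BookName VARCHAR(100),          -- illustrative column
        RecordModifiedDate DATETIME
    );

    -- One row per source table, holding the last watermark that was copied.
    CREATE TABLE Config (
        Table_Name VARCHAR(100),
        WatermarkValue DATETIME
    );

    INSERT INTO Employee VALUES (1, 'John Doe', GETDATE());
    INSERT INTO Books VALUES (1, 'Understanding Azure Data Factory', GETDATE());
    INSERT INTO Config VALUES ('Employee', '2019-01-01'), ('Books', '2019-01-01');

    -- Called by the pipeline after each copy to advance the watermark.
    CREATE PROCEDURE [dbo].[spupdatewatermark]
        @RecordModifiedtime DATETIME,
        @TableName VARCHAR(100)
    AS
    BEGIN
        UPDATE Config
        SET WatermarkValue = @RecordModifiedtime
        WHERE Table_Name = @TableName;
    END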

Build Azure Blob Storage as the Destination

Follow these steps:
1) Go to the Azure portal at https://portal.azure.com.

2) Click “Create a resource.”

3) Click Storage.

4) Click “Storage account - blob, file, table, queue” (see Figure 6-12).

Figure 6-12: Azure Blob Storage service selection

5) Provide all the requested information to set up Azure Blob Storage and click Create (see Figure 6-13).

Figure 6-13: Azure Blob Storage selection

6) Once the Azure Blob Storage setup is done, click “Storage Explorer (preview).”

7) Right-click Blob Containers and click Create Blob Container (see Figure 6-14).

Figure 6-14: Accessing Storage Explorer (preview)

8) Provide a name and public access level (see Figure 6-15).

Figure 6-15: Container name and access level screen

9) Click OK.

Build the Azure Logic App

Follow these steps:
1) Go to the Azure portal at https://portal.azure.com.

2) Click “Create a resource,” then Integration, and then Logic App (see Figure 6-16).

Figure 6-16: Azure Logic App service selection

3) Provide a name, select your subscription, create or select a resource group, select the right location, enable or disable Log Analytics, and click Create (see Figure 6-17).

Figure 6-17: Azure Logic App service creation

4) Once the Azure Logic App is created, click Edit (see Figure 6-18).

Figure 6-18: Clicking Edit

5) Select “When a HTTP request is received” in the Logic Apps Designer (see Figure 6-19).

Figure 6-19: Azure Logic App trigger selection

6) Click “+ New step” (see Figure 6-20).

Figure 6-20: Azure Logic Apps Designer

7) Click Office 365 Outlook. If you prefer another e-mail provider, such as Gmail, you can use it instead (see Figure 6-21).

Figure 6-21: Azure Logic App action selection

8) Click “Send an email” (see Figure 6-22).

Figure 6-22: Azure Logic App action configuration

9) Click “Sign in” (see Figure 6-23).

Figure 6-23: Azure Logic App Outlook authentication link
This opens a new page to authenticate.
10) Configure the e-mail settings and click Save (see Figure 6-24).

Figure 6-24: Azure Logic App Office 365 Outlook e-mail configuration

11) Once the Logic App is saved, you can view the HTTP POST URL (see Figure 6-25).

Figure 6-25: Azure Logic App HTTP POST URL

12) Add the value shown in Figure 6-26 in Request Body JSON Schema. (A sketch of such a schema appears after these steps.)

Figure 6-26: JSON schema

13) The screen will look like Figure 6-27 after entering the value.

Figure 6-27: Azure Logic App HTTP request body configuration

14) In the “Send an email” activity, compose a custom message by adding dynamic content, as shown in Figure 6-28.

Figure 6-28: Azure Logic App adding dynamic content
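Figure 6-26 shows the schema only as a screenshot. A hedged sketch of such a Request Body JSON Schema, matching the two properties that the pipeline's Web activity posts later in this chapter, might look like this:

    {
        "type": "object",
        "properties": {
            "pipeline_run_time": {
                "type": "string"
            },
            "data_factory_name": {
                "type": "string"
            }
        }
    }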

Build the Azure Data Factory Pipeline

Follow these steps:
1) From the Azure portal, open the Azure Data Factory service and click Author & Monitor. If you haven’t set up Azure Data Factory yet, please refer to the previous chapters to set up the ADF service.

2) In the Author & Monitor UI, click Connections and then + New (see Figure 6-29).

Figure 6-29: Azure Data Factory new connection

3) Create two connections: one for Azure SQL Database (the service created earlier) and another for Azure Blob Storage (the service created earlier). Please refer to Chapter 5 if you are not sure how to create connections. Once you have created the connections, the screen will look like Figure 6-30.

Figure 6-30: Azure Data Factory connections

4) Let’s create datasets. Click + and then Dataset (see Figure 6-31).

Figure 6-31: Azure Data Factory Dataset option

5) Select Azure SQL Database and click Finish (see Figure 6-32).

Figure 6-32: Azure Data Factory Azure SQL Database selection

6) On the General tab, provide a name and add a description.

7) On the Connection tab, select the connection you created earlier for “Linked service.” Don’t choose any value for Table (see Figure 6-33).

Figure 6-33: Azure Data Factory Azure SQL Database configuration

8) Let’s create a dataset for Azure Blob Storage. Click + and then Dataset (see Figure 6-34).

Figure 6-34: Azure Data Factory Dataset option

9) Select Azure Blob Storage and click Finish (see Figure 6-35).

Figure 6-35: Azure Data Factory Azure Blob Storage dataset selection

10) On the General tab, provide a name and add a description.

11) On the Parameters tab, click New, provide a variable name for Name, select String for Type, and leave Default Value blank (see Figure 6-36).

Figure 6-36: Azure Data Factory dataset configuration

12) On the Connection tab, select the linked service you created earlier.

13) Provide a container in “File path” and click the file name area to add the parameter.

14) Select the parameter name and click Finish (see Figure 6-37).

Figure 6-37: Azure Data Factory parameter listing

15) Select “Text format” for “File format.” The screen will look like Figure 6-38.

Figure 6-38: Azure Data Factory dataset configuration

16) Let’s create a dataset for the config table in Azure SQL. Click + and then Dataset (see Figure 6-39).

Figure 6-39: Azure Data Factory Dataset option

17) On the General tab, provide a name and add a description.

18) On the Connection tab, select the Azure SQL connection created earlier. Provide [dbo].[config] for Table (see Figure 6-40).

Figure 6-40: Azure Data Factory dataset configuration

19) Once the datasets are set up, let’s create a pipeline. Click + and then Pipeline.

20) On the General tab, provide a name and add a description.

21) On the Parameters tab, click + New and create a new parameter, as shown in Figure 6-41.

Figure 6-41: Azure Data Factory parameter setting

22) Drag and drop a ForEach activity (under Iteration & Conditionals), as shown in Figure 6-42.

Figure 6-42: Adding a ForEach activity

23) On the General tab, provide a name and add a description.

24) In Settings, provide “@pipeline().parameters.tablenames” for Items (see Figure 6-43).

Figure 6-43: Azure Data Factory activity configuration

25) Under Activities (0), click “Add activity.”

26) Drag and drop the Lookup activity (see Figure 6-44).

Figure 6-44: Adding a Lookup activity

27) On the General tab, provide a name (LookupNewwatermark) and add a description.

28) In Settings, select “Azure SQL dataset” for Source Dataset, and select Query for Use Query. Provide the following code in the Query area to get the new watermark:

    select MAX(@{item().WaterMark_Column}) as NewWatermarkvalue from @{item().TABLE_NAME}

29) Select “First row only.” The screen will look like Figure 6-45.

Figure 6-45: Azure Data Factory activity configuration

30) Drag and drop another Lookup activity (see Figure 6-46).

Figure 6-46: Adding another Lookup activity

31) On the General tab, provide a name (LookupOldwatermark) and add a description. Let’s use the default values for the rest of the properties.

32) On the Settings tab, select the “watermark” dataset for Source Dataset.

33) Select Query for Use Query.

34) Provide the following query in the Query area:

    select Table_Name, WatermarkValue from Config where Table_Name = '@{item().TABLE_NAME}'

35) Select “First row only.” The screen will look like Figure 6-47.

Figure 6-47: Azure Data Factory activity configuration

36) Drag and drop the Copy Data activity (under Move & Transform). Connect both previous activities to the Copy Data activity (see Figure 6-48).

Figure 6-48: Adding a Copy Data activity

37) On the General tab, provide a name (getData) and add a description. Let’s use the default values for the rest of the properties.

38) On the Source tab, select “Azure SQL dataset” for Source Dataset. Select Query for Use Query.

39) Provide the following query for Query:

    select * from @{item().TABLE_NAME} where @{item().WaterMark_Column} > '@{activity('LookupOldwatermark').output.firstRow.WatermarkValue}' and @{item().WaterMark_Column} <= '@{activity('LookupNewwatermark').output.firstRow.NewWatermarkvalue}'
The screen will look like Figure 6-49.

Figure 6-49: Copy Data activity source configuration
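Under the covers, the designer generates pipeline JSON. A trimmed, hedged sketch of what the ForEach and old-watermark Lookup defined in the preceding steps might look like in that JSON (the dataset reference name is illustrative, and the source type can differ by ADF version):

    {
        "name": "ForEachSourceTable",
        "type": "ForEach",
        "typeProperties": {
            "items": {
                "value": "@pipeline().parameters.tablenames",
                "type": "Expression"
            },
            "activities": [
                {
                    "name": "LookupOldwatermark",
                    "type": "Lookup",
                    "typeProperties": {
                        "source": {
                            "type": "SqlSource",
                            "sqlReaderQuery": "select Table_Name, WatermarkValue from Config where Table_Name = '@{item().TABLE_NAME}'"
                        },
                        "dataset": {
                            "referenceName": "WatermarkDataset",
                            "type": "DatasetReference"
                        },
                        "firstRowOnly": true
                    }
                }
            ]
        }
    }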

40) On the Sink tab, select the Azure Blob Storage dataset for Sink Dataset.

41) Provide the following expression for the file name parameter; it concatenates the table name and the pipeline run ID so that each table gets a unique file per run:

    @CONCAT(item().TABLE_NAME, pipeline().RunId, '.txt')

The screen will look like Figure 6-50.

Figure 6-50: Copy Data activity sink configuration

42) Drag and drop the Stored Procedure activity and connect it from the Copy Data (getData) activity (see Figure 6-51).

Figure 6-51: Stored Procedure activity

43) On the General tab, provide a name (UpdateConfigTable) and add a description. Let’s use the default values for the rest of the properties.

44) Under SQL Account, select the Azure SQL connection for “Linked service” (see Figure 6-52).

Figure 6-52: Stored Procedure activity configuration

45) Provide [dbo].[spupdatewatermark] for “Stored procedure name.”

46) Click + New for “Stored procedure parameters.” Create the parameters listed in Table 6-1.

Table 6-1: Azure Data Factory Parameter Configuration

    Name                 Type      Value
    RecordModifiedtime   DateTime  @{activity('LookupNewwatermark').output.firstRow.NewWatermarkvalue}
    TableName            String    @{activity('LookupOldwatermark').output.firstRow.Table_Name}

47) After creating the parameters, the screen will look like Figure 6-53.

Figure 6-53: Stored Procedure activity parameter configuration
48) Drag and drop the Web activity and connect it to the Stored Procedure (UpdateConfigTable) activity (see Figure 6-54).

Figure 6-54: Adding a Web activity

49) On the General tab, provide a name (for example, SendEmail; activity names must be unique within a pipeline, so don’t reuse UpdateConfigTable) and add a description. Let’s use the default values for the rest of the properties.

50) On the Settings tab, provide the URL (copied from the Azure Logic App).

51) Select POST for Method.

52) Add the following value in Body:

    {
        "pipeline_run_time": "@{pipeline().TriggerTime}",
        "data_factory_name": "@{pipeline().DataFactory}"
    }
The screen will look like Figure 6-55.

Figure 6-55: Azure Data Factory web activity configuration
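In pipeline JSON, a Web activity configured this way might look like the following hedged sketch; the activity name is illustrative, and the url placeholder stands in for the HTTP POST URL copied from the Logic App:

    {
        "name": "SendEmail",
        "type": "WebActivity",
        "typeProperties": {
            "url": "<HTTP POST URL from the Logic App>",
            "method": "POST",
            "body": {
                "pipeline_run_time": "@{pipeline().TriggerTime}",
                "data_factory_name": "@{pipeline().DataFactory}"
            }
        }
    }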

53) The final pipeline will look like Figure 6-56.

Figure 6-56: Azure Data Factory pipeline

54) Click Publish All.

55) Click Trigger and then “Trigger now.”

56) Provide the following value for the tablenames parameter:

    [
        {
            "TABLE_NAME": "Employee",
            "WaterMark_Column": "RecordModifiedDate"
        },
        {
            "TABLE_NAME": "Books",
            "WaterMark_Column": "RecordModifiedDate"
        }
    ]

The screen will look like Figure 6-57.

Figure 6-57: Azure Data Factory parameter passing

57) Click Finish.

58) Click Monitor and drill down to see each activity run. All activities except the main activity (ForEachSourceTable) will run twice because you passed two tables to load data (see Figure 6-58).

Figure 6-58: Azure Data Factory monitoring

59) In Azure Blob Storage, you will find two files, as shown in Figure 6-59.

Figure 6-59: Azure Data Factory output

60) Open the files to look at the data (see Figure 6-60).

Figure 6-60: Azure Data Factory output

61) Check the e-mail account; you should see an e-mail like the one in Figure 6-61.

Figure 6-61: Azure Data Factory pipeline execution report e-mail

62) In Azure SQL Database, check the WatermarkValue column of the config table (see Figure 6-62).

Figure 6-62: Watermark value update
Let’s insert some more records into the tables.

1) Run the code shown in Figure 6-63 to insert records into the Azure SQL tables. (A hedged sketch of such a script follows these steps.)

Figure 6-63: SQL script for data insertion

2) Run the Azure Data Factory pipeline again.

3) Look at Azure Blob Storage, and you will find two more files (selected in Figure 6-64).

Figure 6-64: Azure Data Factory output

4) Open the files to see the new data (see Figure 6-65).

Figure 6-65: Azure Data Factory output data

5) Check the watermark values in the config table (see Figure 6-66).

Figure 6-66: Watermark value update
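Since Figure 6-63 is a screenshot, here is a hedged sketch of the kind of insert script it shows. The column values are assumptions consistent with the earlier table sketch; GETDATE() stamps the rows with a modification time later than the stored watermark, which is why the pipeline picks them up as delta:

    INSERT INTO Employee VALUES (2, 'Jane Roe', GETDATE());
    INSERT INTO Books VALUES (2, 'Pro Azure Administration', GETDATE());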

You built this solution to see how flow can be managed within Azure Data Factory. There are other functions and activities that you can apply on a case-by-case basis.

Summary

In this chapter, you learned how to manage the data pipeline flow in Azure Data Factory using expressions, functions, and control flow activities.
