In the previous chapter, you worked with various activities to build a solution that met several data analytics requirements. On Microsoft Azure, you will notice that many services are available for storage and compute. There is rarely a single right or wrong choice of service; there will always be trade-offs when choosing one service over another. So, as a data professional, you need to be clear about what the business requires now and, more important, what it will require in the future.
Most data professionals want to know whether they can schedule their open source code for data transformation. That is the question this chapter answers, by building a pipeline that processes data using Python.
Before diving into writing more code, let's take a look at two terms that are frequently used.
Data Warehouse to Modern Data Warehouse
In the traditional sense, a data warehouse is a central repository that consolidates data from different sources such as a file system, a customer relationship management (CRM) system, SQL Server, and so on. The data is cleaned and transformed, values are calculated, and the data is stored for historical analysis. This approach has served organizations well for decades.
Figure 5-1. Modern data warehouse pattern
ETL vs. ELT
Extract-transform-load (ETL) and extract-load-transform (ELT) are not new to data professionals. The two techniques differ in where the transformation happens: in ETL, the data is transformed before loading, either on the source side or on a compute layer in between, whereas in ELT, the raw data is loaded first and the transformation happens on the target. If you are processing Big Data, you may want to use ETL with an Azure Databricks Spark cluster so the data is transformed in an optimized, low-latency environment. If your destination has enough compute power of its own, you can instead load the raw data first and leverage the power of a system like Azure SQL Data Warehouse to transform it in parallel; since ELT performs the transformation after the raw data lands on the target, this massively parallel approach is a natural fit.
As there is no right or wrong way to process data, you should weigh parameters such as the data source and destination, latency, scalability, performance, cost, and your team's skill set to decide between ETL and ELT. This chapter shows how to apply both approaches using Azure Data Factory.
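To make the ELT side concrete, here is a minimal sketch, not taken from the book's code, that assumes the raw data has already been copied into a staging table and then runs the transformation on Azure SQL Data Warehouse itself from Python. The server, database, credentials, and table names are placeholders, and the pyodbc package used here is an assumption (it is not part of the chapter's prerequisites).

```python
# ELT sketch: land raw rows first, then transform inside the warehouse.
# All object names and credentials below are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net;DATABASE=yourdwh;"
    "UID=youradmin;PWD=yourpassword"
)
cursor = conn.cursor()

# E + L: raw data is assumed to be in dbo.SalesStaging already
# (for example, loaded by an Azure Data Factory Copy activity).

# T: the transformation runs on the target, in parallel, using CTAS.
cursor.execute("""
    CREATE TABLE dbo.SalesClean
    WITH (DISTRIBUTION = HASH(ProductKey))
    AS
    SELECT ProductKey,
           OrderDate,
           SUM(LineTotal) AS TotalSales
    FROM dbo.SalesStaging
    GROUP BY ProductKey, OrderDate
""")
conn.commit()
```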
Azure Databricks
Figure 5-2. Azure Databricks architecture on Azure
Azure Databricks provides the following benefits:

- Optimized environment
- Ease of setup and management
- Enterprise security through Azure AD integration
- Collaboration
- Power BI integration
- Integration with other Azure services
When should you choose HDInsight Spark versus Azure Databricks? Both are optimized to run Spark workloads. HDInsight provides a platform-as-a-service (PaaS) experience where organizations can run multiple types of workloads, such as Kafka, Spark, MapReduce, Storm, HBase, and Hive LLAP. Azure Databricks supports only Spark clusters but provides a software-as-a-service (SaaS) experience and makes it easy for different people within the organization to collaborate. The pricing of the two services is another consideration.
Build and Implement Use Case
In this use case, you will focus on transforming data using Azure Data Factory. This will be an example of ETL.
You will use the following services:

- Microsoft Azure Data Factory
- Microsoft Azure SQL Data Warehouse (DWH)
- Microsoft Azure Databricks
- Microsoft Azure Blob Storage
One of the ways to load dimension data is to use Azure Data Factory’s Copy activity to transfer dimension data to Azure SQL Data Warehouse. The other way is to leverage Azure Databricks to do it (as shown in Figure 5-3). You can do various kinds of transformations such as managing slowly changing dimensions (SCD-1, SCD-2, SCD-3) and checking for data anomalies.
Figure 5-3. Architecture to feed dimension data
Figure 5-4. Dimension/master data files
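The architecture above mentions slowly changing dimensions. As an illustration of the simplest case, SCD-1 (overwrite), the following PySpark sketch merges incoming dimension rows over the existing ones. It is not the book's notebook; the table name, file path, and `ProductID` business key are assumptions.

```python
# SCD-1 sketch for a Databricks notebook (Python): incoming rows overwrite
# existing rows with the same business key. Names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

existing = spark.table("dim_product")                  # current dimension
updates = spark.read.csv("/mnt/dimdata/product.csv",   # incoming master data
                         header=True, inferSchema=True)

# Assumes both DataFrames share the same schema.
# Keep existing rows whose key is NOT in the update set, then add the updates.
unchanged = existing.join(updates, "ProductID", "left_anti")
merged = unchanged.unionByName(updates)

merged.write.mode("overwrite").saveAsTable("dim_product_new")
```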
- 1) Go to https://portal.azure.com.
- 2) Click "Create a resource."
- 3) Click Databases.
- 4) Click SQL Data Warehouse (see Figure 5-5).
- 5) Provide the database name.
- 6) Select your subscription.
- 7) Create or select a resource group.
- 8) Set "Select source" to "Blank database."
- 9) Create a new server.
- 10) Select the performance level you need.
- 11) Leave Collation at its default value.
- 12) Click Create (see Figure 5-6).

Figure 5-6. Providing values to set up the SQL Data Warehouse service

- 13) Once created, click SQL Data Warehouse.
- 14) Click "Query editor (preview)," as shown in Figure 5-7.

Figure 5-7. Opening the query editor to run queries

- 15) Log in with the SQL server admin credentials (see Figure 5-8).
- 16) Enter the following command. (The Azure Databricks connector for SQL Data Warehouse stages data through Azure Blob Storage using PolyBase, which is why a database master key is required.)

```sql
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'adfBook@123';
GO
```

- 17) Click Run (see Figure 5-9).

Figure 5-9. Executing scripts in the query editor
- 1) Click "Create a resource."
- 2) Click Analytics.
- 3) Click Azure Databricks (see Figure 5-10).

Figure 5-10. Selecting the Azure Databricks service

- 4) Provide the workspace name.
- 5) Select your subscription.
- 6) Create or select a resource group.
- 7) Select your location.
- 8) Select the pricing tier you need. For this demonstration, let's select Standard.
- 9) Click Create (see Figure 5-11).

Figure 5-11. Providing values to set up Azure Databricks

- 10) Click Azure Databricks Service in the Azure dashboard.
- 11) Click Launch Workspace (see Figure 5-12).
- 12) In the workspace, click New Notebook (see Figure 5-13).
- 13) Provide the name.
- 14) For Language, select Scala (see Figure 5-14).

Figure 5-14. Creating a notebook

- 15) Click Create.
- 16) Once the notebook is created, paste in the code shown in Figures 5-15 through 5-18.

Figure 5-15. Azure Databricks Scala code
Figure 5-16. Azure Databricks Scala code, continued
Figure 5-17. Azure Databricks Scala code, continued
Figure 5-18. Azure Databricks Scala code, continued
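The book's notebook is in Scala (Figures 5-15 through 5-18). As a rough PySpark equivalent of the core flow only, the sketch below reads dimension files from Blob Storage and writes them to SQL Data Warehouse with Databricks' built-in `com.databricks.spark.sqldw` connector. It assumes it runs inside a Databricks notebook (where `spark` is predefined), and the storage account, container, server, table, and credentials are all placeholders.

```python
# Sketch of the dimension load in PySpark (the book's figures show Scala).
# All account/table names and keys below are placeholders.
spark.conf.set(
    "fs.azure.account.key.yourstorage.blob.core.windows.net",
    "<storage-account-key>")

dim = (spark.read
       .option("header", "true")
       .csv("wasbs://dimdata@yourstorage.blob.core.windows.net/"))

# The SQL DW connector stages data in Blob Storage and loads it via PolyBase,
# which is why the database master key was created earlier.
(dim.write
 .format("com.databricks.spark.sqldw")
 .option("url", "jdbc:sqlserver://yourserver.database.windows.net:1433;"
                "database=yourdwh;user=youradmin;password=<password>")
 .option("forwardSparkAzureStorageCredentials", "true")
 .option("dbTable", "dbo.DimProduct")
 .option("tempDir", "wasbs://temp@yourstorage.blob.core.windows.net/stage")
 .save())
```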
- 1) Switch to the Azure Data Factory Author & Monitor UI.
- 2) Drag and drop a Notebook activity onto the designer (see Figure 5-19).

Figure 5-19. Setting up an activity in Azure Data Factory

- 3) Name the activity (see Figure 5-20).

Figure 5-20. Setting up an activity

- 4) Click the Azure Databricks tab.
- 5) Click +New (see Figure 5-21).

Figure 5-21. Creating a Databricks linked service

- 6) Provide the name and add a description.
- 7) Leave the default value for "Connect via integration runtime."
- 8) Select From Azure Subscription for "Account selection method."
- 9) Select your Azure subscription.
- 10) Select the Databricks workspace created earlier for "Databricks workspace."
- 11) Select "New job cluster" for "Select cluster."
- 12) Domain/Region will populate automatically; the value was set when you created the Databricks workspace.
- 13) Select Access Token as the authentication type.
- 14) For "Access token," click the text box; it will take you to the Azure Databricks account to generate a token. Copy the token and paste it here.
- 15) Select the cluster node type.
- 16) Select 4.1 (which includes Apache Spark 2.3.0 and Scala 2.11) for "Cluster version."
- 17) Select 1 for Workers (see Figure 5-22).

Figure 5-22. Setting up a linked service

- 18) Click Test Connection to validate the settings (see Figure 5-23).
- 19) Click Finish.
- 20) Click Settings.
- 21) Provide the notebook path. You can get this path from the Azure Databricks workspace (see Figure 5-24).

Figure 5-24. Setting the notebook path

- 22) Click Publish All (see Figure 5-25).
- 23) Click Trigger and then Trigger Now (see Figure 5-26).

Figure 5-26. Running an Azure Data Factory pipeline

- 24) Click Finish.
- 25) Click the Monitor icon on the left side to monitor the pipeline execution (see Figure 5-27).
- 26) Once the pipeline has executed successfully, switch to the Azure SQL Data Warehouse query editor to view the tables and data. Let's first look at the error records (see Figure 5-28).

Figure 5-28. Querying SQL Data Warehouse to check error data
Figure 5-29. Querying SQL Data Warehouse to verify dimension data
Figure 5-30. Architecture to load sales data
- 1) Switch to the Azure SQL Data Warehouse query editor and create a fact table using the following script (see Figure 5-31):

```sql
CREATE TABLE [dbo].[FactStoreSales](
    [SalesOrderID] [int] NULL,
    [StoreId] [int] NULL,
    [OrderDate] [date] NULL,
    [SubTotal] [decimal](18, 2) NULL,
    [Taxperc] [int] NULL,
    [TaxAmt] [decimal](18, 2) NULL,
    [Freightperc] [int] NULL,
    [Freight] [decimal](18, 2) NULL,
    [TotalDue] [decimal](18, 2) NULL,
    [SalesOrderDetailID] [int] NULL,
    [ProductKey] [bigint] NULL,
    [OrderQty] [int] NULL,
    [UnitPrice] [decimal](18, 2) NULL,
    [UnitPriceDiscount] [decimal](18, 2) NULL,
    [LineTotal] [decimal](18, 2) NULL
)
```

Figure 5-31. Building the fact table
- 2) Upload the raw sales data files (provided) to Azure Blob Storage (see Figure 5-32).
- 3) Figure 5-33 shows the Databricks script that will be executed from Azure Data Factory. Create a notebook (Python) in your Azure Databricks workspace and put the code there; a hedged sketch of such a notebook follows.

Figure 5-33. Azure Databricks Python code
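Figure 5-33 holds the actual script. As a sketch only, a Python notebook of this shape typically reads the raw sales files, resolves the product surrogate key against the dimension, derives the remaining fact columns, and writes to the FactStoreSales table created in step 1. The file paths, join key, and the LineTotal derivation below are assumptions, not the book's code.

```python
# Hedged sketch of the sales-fact notebook; the real code is in Figure 5-33.
from pyspark.sql import functions as F

JDBC_URL = ("jdbc:sqlserver://yourserver.database.windows.net:1433;"
            "database=yourdwh;user=youradmin;password=<password>")
TEMP_DIR = "wasbs://temp@yourstorage.blob.core.windows.net/stage"

sales = (spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("wasbs://salesdata@yourstorage.blob.core.windows.net/"))

dim = (spark.read
       .format("com.databricks.spark.sqldw")
       .option("url", JDBC_URL)
       .option("tempDir", TEMP_DIR)
       .option("forwardSparkAzureStorageCredentials", "true")
       .option("dbTable", "dbo.DimProduct")
       .load())

fact = (sales
        .join(dim, "ProductID")   # resolve the surrogate ProductKey
        .withColumn("LineTotal",  # one plausible derivation
                    F.col("OrderQty") * F.col("UnitPrice")
                    * (1 - F.col("UnitPriceDiscount"))))

(fact.select("SalesOrderID", "StoreId", "OrderDate", "SubTotal", "TaxAmt",
             "Freight", "TotalDue", "SalesOrderDetailID", "ProductKey",
             "OrderQty", "UnitPrice", "UnitPriceDiscount", "LineTotal")
 .write.format("com.databricks.spark.sqldw")
 .option("url", JDBC_URL)
 .option("tempDir", TEMP_DIR)
 .option("forwardSparkAzureStorageCredentials", "true")
 .option("dbTable", "dbo.FactStoreSales")
 .mode("append")
 .save())
```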
- 4) Switch to the Azure Data Factory Author & Monitor UI.
- 5) Drag and drop a Notebook activity, as shown in Figure 5-34.

Figure 5-34. Building the Azure data pipeline

- 6) On the General tab, provide a name and description for the activity.
- 7) On the Azure Databricks tab, create or use the existing Databricks linked service.
- 8) If you are creating a new Databricks linked service, provide the name, add a description, select your subscription, select the Databricks workspace, generate a token if you don't have one, and add the cluster details, as shown in Figure 5-35.

Figure 5-35. Parameter selection for the Azure Databricks linked service

- 9) Click Finish.
- 10) On the Settings tab, provide the notebook path, as shown in Figure 5-36.

Figure 5-36. Setting a notebook path

- 11) Drag and drop a Copy Data activity and connect it with the Notebook activity, as shown in Figure 5-37.

Figure 5-37. Setting up the Azure Data Factory pipeline

- 12) On the Source tab, provide the link to the location of the raw files (see Figure 5-38).

Figure 5-38. Setting a connection to Azure Blob Storage

- 13) Similarly, provide a linked service to the Azure Blob Storage location where you want to store the archive files.
- 14) Click Publish All.
- 15) Click Trigger and then Trigger Now.
- 16) Click Finish.
- 17) Click the Monitor icon on the left side to monitor the pipeline execution.
- 18) Once it has completed successfully, you can query the data in Azure SQL Data Warehouse (see Figure 5-39).

Figure 5-39. Querying the fact table in SQL Data Warehouse

- 19) Click Azure Blob Storage to see the files copied into Azure Blob Storage (see Figure 5-40).

Figure 5-40. Verifying the file movement
Stored Procedure
Running a stored procedure is another way to transform data into deeper insights and predictions. The benefit of running a stored procedure is that it provides compute close to the data, which means the data doesn't need to travel for processing. You can invoke a Stored Procedure activity against Azure SQL Database, Azure SQL Data Warehouse, or a SQL Server database on-premises or in a virtual machine (VM). In the case of on-premises storage, you need to install the self-hosted integration runtime. Organizations use stored procedures to clean data and move it from the staging to the production database. This is a classic example of ELT. Since the stored procedure runs on the database server, you'll need to validate whether this heavy lifting causes any performance issues there.
Let's say AdventureWorks wants to evaluate how to transform data using the ELT methodology. In this scenario, assume the data is available in Azure Blob Storage (in CSV format). Using Azure Data Factory, you will move the data to Azure SQL Database and then run a stored procedure to clean it (primarily removing duplicate records); a sketch of such a procedure appears below.
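The actual table and procedure scripts ship with the book. As a rough illustration of the dedup idea only, a procedure of this shape could be created from Python (the pyodbc package and the partition columns are assumptions; the table and procedure names match those used later in this walkthrough):

```python
# Hypothetical sketch: the book ships its own scripts; the dedup logic and
# partition columns below are assumed, not the book's actual code.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net;DATABASE=yourdb;"
    "UID=youradmin;PWD=yourpassword"
)

create_proc = """
CREATE PROCEDURE dbo.adfstoredprocactivity
AS
BEGIN
    -- Keep one row per logical record; delete the duplicates.
    WITH dedup AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY SalesOrderID, SalesOrderDetailID
                   ORDER BY SalesOrderID) AS rn
        FROM dbo.All_Sales_Records_Raw
    )
    DELETE FROM dedup WHERE rn > 1;
END
"""
cursor = conn.cursor()
cursor.execute(create_proc)
conn.commit()
```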
- 1) Go to https://portal.azure.com.
- 2) Click "Create a resource."
- 3) Click Databases.
- 4) Click SQL Database (see Figure 5-41).
- 5) Fill in the following details and click Create to set up the new Azure SQL Server instance (see Figure 5-42).

Figure 5-42. Inputting values to set up Azure SQL Database

- 6) Use SQL Server Management Studio (SSMS) to connect to the Azure SQL server, or use the query editor (see Figure 5-43).

Figure 5-43. Switching to the query editor

- 7) If you plan to use SSMS, add your machine's IP address to the Azure SQL firewall so that you can access the server from your computer, as shown in Figure 5-44.

Figure 5-44. Accessing Azure SQL Server from the client tool

- 8) Create the SQL Server table and stored procedure as provided. Figure 5-45 shows the structure after running the table and stored procedure scripts.

Figure 5-45. Artifacts in Azure SQL Database

- 9) Upload the data file (provided) to Azure Blob Storage (see Figure 5-46).

Figure 5-46. Data file in Azure Blob Storage
- 1) Switch to the Azure Data Factory Author & Monitor UI.
- 2) Let's create the SQL and Azure Blob Storage connections. Click Connections (see Figure 5-47).

Figure 5-47. Setting up a connection

- 3) Click +New (see Figure 5-48).

Figure 5-48. Creating new linked services

- 4) Select Azure Blob Storage (see Figure 5-49).
- 5) Click Continue.
- 6) Provide information about the storage account where you uploaded your data file (see Figure 5-50).

Figure 5-50. Azure Blob Storage linked service options

- 7) Click Finish.
- 8) Let's create a SQL connection. Click +New.
- 9) Select Azure SQL Database (see Figure 5-51).
- 10) Click Continue.
- 11) Provide information about the Azure SQL Server instance created in the earlier steps (see Figure 5-52).

Figure 5-52. Setting up the Azure SQL Database linked service

- 12) Click Finish.

Figure 5-53. List of available linked services
- 18) Click + and then Dataset.
- 19) Select Azure SQL Database.
- 20) Click Finish.
- 21) On the Connection tab, select the Azure SQL Server connection created in the previous step. For Table, select All_Sales_Records_Raw (see Figure 5-57).

Figure 5-57. Inputting values to set up Azure SQL Database

- 22) Click Publish All.
- 23) Click + and then Pipeline.
- 24) Drag and drop a Copy Data activity onto the designer (see Figure 5-58).
- 25) On the Source tab, select the Azure Blob Storage dataset created earlier (see Figure 5-59).

Figure 5-59. Source selection for the Copy activity

- 26) On the Sink tab, select the Azure SQL dataset created earlier (see Figure 5-60).

Figure 5-60. Sink selection for the Copy activity

- 27) On the Mapping tab, click Import Schemas. Since the column names are the same on the source and target, the mapping is done automatically (see Figure 5-61).

Figure 5-61. Field mapping

- 28) Drag and drop a Stored Procedure activity onto the designer (see Figure 5-62).
- 29) Select the Copy Data activity, click the "Add activity on:" option, select Success, and then drag the arrow to the Stored Procedure activity. This means the next activity runs only if the current one succeeds (see Figure 5-63).

Figure 5-63. Setting up a link between two activities

- 30) Select the Stored Procedure activity (see Figure 5-64).

Figure 5-64. Setting values for the stored procedure

- 31) On the SQL Account tab, select the Azure SQL Database connection for "Linked service."
- 32) On the Stored Procedure tab, select the stored procedure (adfstoredprocactivity) created earlier (see Figure 5-65).

Figure 5-65. Selecting the stored procedure

- 33) Click Publish All.
- 34) Click Trigger and then Trigger Now.
- 35) Click Finish.
- 36) Click the Monitor icon on the left side to monitor the pipeline execution (see Figure 5-66).
- 37) Once the pipeline has executed successfully, query the database (see Figure 5-67).

Figure 5-67. SQL query to validate the transformation
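For instance, assuming the raw table and key columns named above, a quick row-count comparison from Python (a sketch using the pyodbc package, not the book's script) can confirm that the duplicates were removed:

```python
# Hedged validation sketch: compare total rows against distinct logical rows.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net;DATABASE=yourdb;"
    "UID=youradmin;PWD=yourpassword"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.All_Sales_Records_Raw")
total = cursor.fetchone()[0]
cursor.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT SalesOrderID, SalesOrderDetailID
        FROM dbo.All_Sales_Records_Raw) t
""")
distinct_rows = cursor.fetchone()[0]
print("duplicates remaining:", total - distinct_rows)  # expect 0 after the proc
```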
Custom Activity
So far, you have seen various activities that can be used in Azure Data Factory for data transformation. Why would you need a custom activity?
You won't always use the built-in activities to transform data. In many scenarios, developers want to add their own logic in a programming language for transformation. For example, you might want to read a document, extract specific information and store it in a database, or call an API to retrieve data and store it; these are not built-in tasks. In a nutshell, if Azure Data Factory doesn't support the transformation you are looking for, you can use a Custom activity to write your own.
Figure 5-68. Reference architecture
- 1) Go to https://apps.twitter.com/.
- 2) Click Create New App.
- 3) Fill out the information (see Figure 5-69).

Figure 5-69. Setting up a Twitter application

- 4) Click Create New Application.
- 5) Once the application is created, go to Keys and Access Tokens (see Figure 5-70).

Figure 5-70. Application settings

- 6) Click "Create my access token" and save the token to use in a later step.
- 1) Switch to https://portal.azure.com.
- 2) Click Azure Active Directory on the left side.
- 3) Click "App registrations."
- 4) Click "New application registration" (see Figure 5-71).

Figure 5-71. Azure AD app registration option

- 5) Provide a name.
- 6) Select the application type; here select "Web app / API."
- 7) Provide the sign-on URL. This does not need to be a site that exists (you can put http://test1.adventureworks.com), as shown in Figure 5-72.

Figure 5-72. AD app registration values

- 8) Click Create.
- 9) Once the app is registered, click it.
- 10) Copy the Application ID for later use, and then click Settings (see Figure 5-73).
- 11) Click Keys in the Password section, provide a description and expiration date, and click Save; the password will be shown (see Figure 5-74). Copy it into a notepad.

Figure 5-74. Setting app keys
- 1) Click "Create a resource" and search for Azure Key Vault (see Figure 5-75).

Figure 5-75. Azure Key Vault

- 2) Select Key Vault and click Create.
- 3) Enter a name, select your subscription, enter or create a resource group, select the right location, and leave the defaults for "Pricing tier," "Access policies," and "Virtual Network Access (preview)," as shown in Figure 5-76.

Figure 5-76. Inputting values to set up Azure Key Vault

- 4) Click Create.
- 1) Click "Create a resource," click Databases, and then click Azure Cosmos DB (see Figure 5-77).

Figure 5-77. Setting up Azure Cosmos DB

- 2) Provide an ID and the API, select your subscription, select or create a resource group, select the right location, leave the other settings at the defaults, and click Create (see Figure 5-78).

Figure 5-78. Inputting values to set up the Azure Cosmos DB account

- 3) At the end, you will see three services added to your Azure subscription (see Figure 5-79).

Figure 5-79. Services set up for this demo so far

- 4) Click the Azure Cosmos DB account (created in the earlier step).
- 5) Click Data Explorer and then New Database (see Figure 5-80).

Figure 5-80. Creating a new Azure Cosmos DB database

- 6) Provide a database ID and click OK (see Figure 5-81).

Figure 5-81. Inputting values to set up the Azure Cosmos DB database

- 7) Click New Collection (see Figure 5-82).
- 8) Select "Use existing" and choose the database ID created in the previous step.
- 9) Provide the collection ID.
- 10) Select Fixed (10 GB) as the storage capacity.
- 11) Click OK.
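If you prefer, the database and collection can also be created from Python with the pydocumentdb package listed in the prerequisites; a minimal sketch, in which the account URI, key, and the `tweetsdb`/`tweets` names are placeholders:

```python
# Optional sketch: create the Cosmos DB database and collection from Python
# instead of the portal. URI, key, and names are placeholders.
import pydocumentdb.document_client as document_client

client = document_client.DocumentClient(
    'https://<your-account>.documents.azure.com:443/',
    {'masterKey': '<your-cosmos-key>'})

db = client.CreateDatabase({'id': 'tweetsdb'})
collection = client.CreateCollection(
    db['_self'], {'id': 'tweets'}, {'offerThroughput': 400})
```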
- 1) Click "Create a resource."
- 2) Click Compute.
- 3) Click Batch Service (see Figure 5-83).

Figure 5-83. Setting up the Azure Batch service

- 4) Provide an account name.
- 5) Select your subscription.
- 6) Select or create a new resource group.
- 7) Select the right location.
- 8) Select or create a new storage account.
- 9) Select "Batch service" for "Pool allocation mode" (see Figure 5-84).

Figure 5-84. Setting up a new Azure Batch account

- 10) Click Create.
- 11) Once the Azure Batch service is set up, you will see the services shown in Figure 5-85 on the Azure dashboard (if you chose to pin them to the dashboard).

Figure 5-85. Services set up for this demo
Prerequisites
Package Name | Description
---|---
Python 2.7 | Install Python 2.7 from https://www.python.org/downloads/.
python-pip | Install python-pip so that you can install the Python packages below.
tweepy | Install with `pip install tweepy`. Connects to Twitter.
pydocumentdb | Install with `pip install pydocumentdb`. Accesses Azure Cosmos DB.
azure-keyvault | Install with `pip install azure-keyvault`. Accesses Azure Key Vault.
- 1) Click the Azure Batch service.
- 2) Click Pools.
- 3) Click Add (see Figure 5-86).

Figure 5-86. Setting up pools

- 4) Add the information in Table 5-2 (see Figure 5-87).

Table 5-2. Setting Values to Set Up the Pool
Property | Value
---|---
Pool ID | Name of the pool.
Display name | Description (optional).
Image Type | Custom image (Linux/Windows).
Custom VM Image | Select the custom image created earlier.
Operating System | Windows.
OS Distribution | WindowsServer.
OS version | Microsoft Windows Server 2008 R2 SP1 (latest).
Caching | None.
Container configuration | None.
Metered licenses for rendering | Don't change; you're not doing any rendering for this demo.
Node pricing tier | Standard A1 (1 core, 1.75 GB). This demo is not a compute-intensive job, so basic compute works; however, you can opt for more compute.
Mode | Fixed. The service also offers an "Auto scale" option, which increases/decreases compute based on a formula, so the organization doesn't have to worry about scaling out and scaling in.
Target dedicated nodes | Set it to 1.
Low-priority nodes | 0. This option reduces compute cost. Low-priority nodes take advantage of surplus capacity in Azure; use them when the job takes less time or for batch processing. The trade-off is that the VMs may not be available for allocation or may be preempted at any time, depending on the available capacity.
Resize timeout | 15 minutes. This is how long the process waits for resizing.
Start task | Disabled. Specifies a task to run first when a VM is added to the pool.
Max tasks per node | 1. You can specify the maximum number of tasks that can run on the VM; be cautious about the VM size you choose.
User accounts | Default.
Task scheduling policy | Pack. This defines how tasks are distributed between VMs in the pool.
Inter-node communication | No.
Application packages | 0. Use these in case your application requires packages to run successfully.
Certificates | 0.
Pool endpoint configuration | Default.
Network configuration | Default. Not required for this demo.
Subnet | Default. Not required for this demo.

Figure 5-87. Inputting values to set up a pool
- 5) Click OK.
- 6) Once the pool is created, click the pool (created in the previous step) and then Nodes to make sure a VM was created (see Figure 5-88).

Figure 5-88. Available nodes
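The same pool can be scripted with the azure-batch Python SDK instead of clicking through the portal. The sketch below reflects the 4.x-era SDK; the account name, key, URL, and pool ID are placeholders, and for brevity it uses a marketplace Windows Server 2008 R2 SP1 image rather than the custom image from Table 5-2.

```python
# Hedged sketch: create an equivalent Azure Batch pool programmatically.
# Account name, key, URL, and pool ID are placeholders.
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

credentials = SharedKeyCredentials('mybatchaccount', '<batch-account-key>')
client = BatchServiceClient(
    credentials, base_url='https://mybatchaccount.<region>.batch.azure.com')

pool = batchmodels.PoolAddParameter(
    id='adfpool',
    vm_size='Standard_A1',
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher='MicrosoftWindowsServer',
            offer='WindowsServer',
            sku='2008-R2-SP1'),
        node_agent_sku_id='batch.node.windows amd64'),
    target_dedicated_nodes=1,   # matches "Target dedicated nodes" in Table 5-2
    max_tasks_per_node=1)       # matches "Max tasks per node"
client.pool.add(pool)
```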
- 1) Switch to Azure Key Vault.
- 2) Add all the secrets, such as the Azure Cosmos DB and Twitter API details, to Azure Key Vault. Switch to the respective services to capture the keys (see Figure 5-89).

Figure 5-89. Setting secrets

- 3) In Azure Key Vault, click "Access policies."
- 4) Click +Add New (see Figure 5-90).

Figure 5-90. Setting access policies

- 5) Select Principal. This is the application registered in Azure Active Directory.
- 6) Grant the application permission on secrets (at minimum, Get), as shown in Figures 5-91 and 5-92.
- 7) Click OK.
- 8) Now the environment is set.
Figure 5-93. Python code
Figure 5-94. Python code, continued
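Figures 5-93 and 5-94 contain the actual getTweets.py. The following is a compressed sketch of the same flow only: read the command-line arguments, pull secrets from Key Vault, search Twitter with tweepy, and store each tweet in Cosmos DB with pydocumentdb. The vault URL, secret names, database/collection names, and AD app credentials are placeholders, not the book's values.

```python
# Hedged sketch of getTweets.py; the book's real script is in Figures 5-93/5-94.
import sys
import tweepy
import pydocumentdb.document_client as document_client
from azure.keyvault import KeyVaultClient
from azure.common.credentials import ServicePrincipalCredentials

# Arguments as passed by the Custom activity:
# getTweets.py <hashtag> <since-date> <run-id>
hashtag, since_date, run_id = sys.argv[1], sys.argv[2], sys.argv[3]

# Authenticate to Key Vault with the AD app registered earlier.
credentials = ServicePrincipalCredentials(
    client_id='<application-id>', secret='<app-key>', tenant='<tenant-id>',
    resource='https://vault.azure.net')
vault = KeyVaultClient(credentials)
VAULT_URL = 'https://<your-vault>.vault.azure.net/'

def secret(name):
    return vault.get_secret(VAULT_URL, name, '').value  # '' = latest version

# Twitter client (keys stored as Key Vault secrets).
auth = tweepy.OAuthHandler(secret('twitter-consumer-key'),
                           secret('twitter-consumer-secret'))
auth.set_access_token(secret('twitter-access-token'),
                      secret('twitter-access-secret'))
api = tweepy.API(auth)

# Cosmos DB client.
cosmos = document_client.DocumentClient(
    secret('cosmos-uri'), {'masterKey': secret('cosmos-key')})
collection_link = 'dbs/tweetsdb/colls/tweets'

for tweet in tweepy.Cursor(api.search, q='#' + hashtag,
                           since=since_date.replace('/', '-')).items(100):
    cosmos.CreateDocument(collection_link, {
        'id': str(tweet.id),
        'text': tweet.text,
        'user': tweet.user.screen_name,
        'created_at': str(tweet.created_at),
        'runid': run_id,
    })
```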
- 1) Switch to the Azure Data Factory Author & Monitor UI.
- 2) Drag and drop a Custom activity onto the designer (see Figure 5-95).
- 3) Provide a name and add a description for the activity (see Figure 5-96).

Figure 5-96. Setting up a Custom activity

- 4) On the Azure Batch tab, click +New (see Figure 5-97).

Figure 5-97. Setting up an Azure Batch linked service

- 5) Provide the Azure Batch account details; you can retrieve all the information from the Azure Batch account service (see Figure 5-98).

Figure 5-98. Options to set up the new linked service

- 6) Click Finish.
- 7) Click Settings.
- 8) Provide the command in the Command text area; it consists of the values described in Table 5-3 (that is, something like `python getTweets.py Azure 2018/07/28 Todayrunid`).

Table 5-3. Values for the Custom Activity
Parameter | Description
---|---
getTweets.py | getTweets is the name of the Python program to execute.
Azure | The hashtag for which to get tweets.
2018/07/28 | Read since: the date from which you want the application to capture tweets.
Todayrunid | Any text value, passed for testing or debugging purposes.
Figure 5-99. Setting values for the Custom activity
- 9) Select "Resource linked service." This is the Azure Storage account where the Python code is uploaded.
- 10) Select "Folder path." This is the folder location where the Python code is uploaded.
- 11) Click Publish All.
- 12) Click Trigger and then Trigger Now.
- 13) Click Finish.
- 14) Go to the Monitoring page and wait until the pipeline executes successfully (see Figure 5-100).

Figure 5-100. Monitoring pipeline progress
Figure 5-101. Azure Cosmos DB
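To verify the results programmatically rather than in Data Explorer, you can query the collection with the same pydocumentdb package; the account URI, key, and database/collection names remain the placeholders used above.

```python
# Query stored tweets back out of Cosmos DB (names are placeholders).
import pydocumentdb.document_client as document_client

client = document_client.DocumentClient(
    'https://<your-account>.documents.azure.com:443/',
    {'masterKey': '<your-cosmos-key>'})

query = {'query': "SELECT c.id, c.text FROM c WHERE c.runid = 'Todayrunid'"}
for doc in client.QueryDocuments('dbs/tweetsdb/colls/tweets', query):
    print(doc['id'], doc['text'])
```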
Figure 5-102. Inputting values to set up Azure SQL Database
In this hands-on chapter, you explored the Notebook, Stored Procedure, and Custom activities to run various workloads. Azure Data Factory lets you build an end-to-end data pipeline, whether it runs on Microsoft or open source platforms.