We discussed how to get data into Einstein Analytics in Chapter 2, including a hands-on CSV file data load exercise and a hands-on exercise trending a Salesforce report; both exercises created a new dataset. But we have not yet discussed dataflow, one of the most important tools for bringing Salesforce data into Einstein Analytics, so let’s start now.
Trust me, this chapter will be fun. We will build a dataflow from scratch and by using the dataset builder. You can architect the data transformation and data manipulation to produce the datasets required for your dashboard. At the end of this chapter, you will also learn how to use recipes to enrich datasets.
Getting Started with Dataflow
As mentioned in Chapter 1, Einstein Analytics datasets are stored on the Einstein Analytics platform. This makes sense because if the data were stored outside Einstein Analytics, performance would suffer badly. In Chapter 2, we discussed multiple methods to get data into Einstein Analytics, including CSV file data load and trending a Salesforce report. But so far dataflow has not been discussed. Because of its power and importance, I’ve chosen to treat it separately in this chapter.
Dataflow is typically used for data transformation from an existing dataset, from Salesforce data (whether or not sync is enabled), and from external data sources (for which sync must be enabled).
Tip
You can use a change set to deploy a blank dataflow from an org with sync enabled to an org without sync enabled. Change set deployment is only applicable between orgs related to the same production org.
Dataflow User Interface
Run Dataflow: When the dataflow is ready, click this button to put the dataflow into the run queue; you can then navigate to the Monitor tab to see the progress.
Upload JSON: Use this button to upload a dataflow file in JSON format from your local computer. It can be an original backup or a modified file.
Download JSON: Use this button to download the dataflow in JSON format to your local computer, either as a backup or to modify the dataflow locally.
As a backup file – if something goes wrong while you are modifying a running dataflow, you can restore the dataflow from this backup file.
The other purpose of downloading the dataflow JSON file is to update the dataflow manually. As a best practice, always make a copy of the original file before making changes: open the downloaded JSON file with a text editor such as Notepad++, make the changes as necessary, then upload the changed JSON file back.
Preview JSON: A dataflow is stored in JSON format; click this icon to view the dataflow as JSON. You can also download the dataflow JSON from here.
The Search nodes text box is useful when you have a lot of nodes in a dataflow; you can type in it to filter the nodes you are looking for. For this to work well, you should practice a good naming convention for each type of transformation node.
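To give you a feel for what you will see when previewing or downloading the JSON, a dataflow is simply a JSON map of node names to node definitions. Here is a minimal sketch (the node names, object, and fields are illustrative):

```json
{
  "Extract_Account": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Account",
      "fields": [
        { "name": "Id" },
        { "name": "Name" }
      ]
    }
  },
  "Register_Account": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "Extract_Account",
      "alias": "Account_Simple",
      "name": "Account_Simple"
    }
  }
}
```

Each node has an action (the transformation type) and parameters; nodes reference each other by name through parameters such as source.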
Dataset Builder
The dataset builder is meant to guide you through extracting data from one or many local Salesforce objects and storing the result as a dataset in Einstein Analytics. It works in a wizard style: you select something and click Next to continue. Once the dataflow is built, you can open it and edit it manually. To make this more interesting, let’s dive into this hands-on dataset builder exercise.
- 1.
Open Analytics Studio.
- 2.
Navigate to Data Manager, click Dataflows & Recipes tab, then click Create Dataflow button.
- 3.
Name the dataflow All Account with Owner, then click the Create button.
- 4.
Click the datasetBuilder icon at the top left. (Steps 2 to 3 are equivalent to clicking Create ➤ Dataset ➤ Salesforce Data from Analytics Studio, selecting “Add to new dataflow,” and entering the dataflow name All Account with Owner.)
- 5.
Type a name for the new dataset – I will name my dataset Account_Owner – and click the Continue button.
- 6.
Select Account as the object to start.
- 7.
Hover your mouse over the Account box that was created and click the + icon next to the box.
- 8.
Select the fields you want to extract from the Salesforce Account object into Einstein Analytics; I’ll select Account Name, Account ID, and Account Type. You can type in the text box to search for a field.
- 9.Next, click the Relationships tab, and click Join next to Owner ID. Your dataflow should now look similar to what is shown in Figure 3-3.
- 10.
Click + icon next to User box, and select First Name and Last Name.
- 11.
Now we have four fields selected from Account and two fields from the User object.
- 12.
Click the Next button to continue.
- 13.When finished, we have successfully built a dataflow (see Figure 3-4).
- 14.
Click the Update Dataflow button at the top right screen to save as new dataflow or update existing dataflow.
- 15.
Click the Update Dataflow button again on pop-up window to confirm.
- 16.
Now the Update Dataflow button will change to Run Dataflow.
- 17.
Click the Run Dataflow button to put the dataflow in the run queue; click Go to Data Monitor to check the progress and result.
- 18.From my testing, it should take only 15 seconds or so to extract all Accounts from replicated data and store the result as a dataset. You can see all the nodes and number of rows processed from Monitor in Figure 3-5.
- 19.
Go back to Analytics Studio, and click Datasets tab.
- 20.
Look for the dataset name that you entered at step 5 in this exercise. In my case, the dataset name is Account_Owner.
- 21.
Notice that the dataset is stored in the Shared App – you can move the dataset to another app; we will discuss datasets in detail in Chapter 4.
- 22.
Click the dataset name to open it as a new lens, which will be presented as a bar chart without any grouping; the bar length represents the number of rows.
- 23.Click Fields to see all available fields for this dataset. These should match the fields selected earlier in this exercise (see Figure 3-6).
From this exercise, with just a few clicks, we easily created a dataflow that extracts data from Salesforce into Einstein Analytics. The generated dataflow contains four nodes:
Two sfdcDigest nodes: these extract the Account and User data.
One augment node: this looks up data between nodes.
One sfdcRegister node: this stores the data into a dataset.
Tips
If your org is not enabled for sync, you can create a new dataflow from Analytics Studio: click Create ➤ Dataset ➤ Salesforce Data, and select “Add to new dataflow.”
We will walk through each type of node in a dataflow in the next section.
Dataflow Transformation
datasetBuilder
We explained the purpose and usage of the dataset builder in the previous hands-on exercise, so I will not repeat it here.
sfdcDigest
Use this node to extract data from an object in the local Salesforce org. You must specify the object name and select the fields needed; you can also add a filter if you do not need all records. You can extract from all custom objects and the main standard objects. However, some system objects are not supported, so definitely check out this list: https://help.salesforce.com/articleView?id=bi_integrate_salesforce_extract_transformation_unsupported_objects_fields.htm&type=5
You can have many sfdcDigest nodes in a dataflow, but for better performance when running the dataflow, avoid having more than one node extracting from the same object whenever possible.
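In the dataflow JSON, an sfdcDigest node looks roughly like this sketch; the node name, fields, and filter values are illustrative, and the optional filterConditions block is one documented way to filter at extraction time:

```json
{
  "sfdcDigest_Opportunity": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Opportunity",
      "fields": [
        { "name": "Id" },
        { "name": "Name" },
        { "name": "StageName" },
        { "name": "Amount" }
      ],
      "filterConditions": [
        { "field": "Amount", "operator": ">=", "value": "0" }
      ]
    }
  }
}
```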
digest
This node is similar to sfdcDigest, but it supports data extraction from connected (synced) data sources, both local Salesforce data and external data sources.
edgemart
Use this node to extract existing datasets in Einstein Analytics.
append
Once the data has been extracted from the sources, you can use the append node to combine rows from multiple sources. All source data should have the same structure (both field names and types); otherwise, you must enable “Allow disjoint schema,” which creates all available fields from all nodes.
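An append node in JSON might look like the sketch below; the source node names are illustrative, and enableDisjointedSourceMerge is the JSON parameter behind the “Allow disjoint schema” option:

```json
{
  "Append_Opportunities": {
    "action": "append",
    "parameters": {
      "sources": [ "Extract_Opportunity_US", "Extract_Opportunity_EU" ],
      "enableDisjointedSourceMerge": true
    }
  }
}
```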
augment
This is one of the most popular nodes in Einstein Analytics data transformation. Think of it as similar to VLOOKUP() in Microsoft Excel. In the previous dataset builder exercise, we used the augment node to get the Account Owner Name: because the Account object only stores the Owner Id, we matched the OwnerId field from Account with the Id field from User.
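In JSON, the augment from the dataset builder exercise looks roughly like this (node names are illustrative; left and right play the same roles as the lookup table and key in the VLOOKUP analogy):

```json
{
  "augment_Account_User": {
    "action": "augment",
    "parameters": {
      "left": "sfdcDigest_Account",
      "left_key": [ "OwnerId" ],
      "relationship": "Owner",
      "right": "sfdcDigest_User",
      "right_key": [ "Id" ],
      "right_select": [ "FirstName", "LastName" ],
      "operation": "LookupSingleValue"
    }
  }
}
```

The relationship value becomes the prefix of the looked-up fields in the output (e.g., Owner.FirstName).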
computeExpression
This is another very popular node in the dataflow; it lets you create new fields based on the values of other fields. Think of it as adding a formula field in Salesforce, but without needing the field to physically exist in Salesforce.
computeRelative
computeExpression calculates new field values based on other field values in the same row; use the computeRelative transformation to create a new field by comparing a value with other rows (for example, the previous row within a partition).
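A computeRelative node sketch, assuming an OpportunityHistory extract (names are illustrative): partitionBy groups the rows, orderBy defines the row order within each partition, and offset selects which row the value is read from:

```json
{
  "compute_PreviousAmount": {
    "action": "computeRelative",
    "parameters": {
      "source": "sfdcDigest_OpportunityHistory",
      "partitionBy": [ "OpportunityId" ],
      "orderBy": [
        { "name": "CreatedDate", "direction": "asc" }
      ],
      "computedFields": [
        {
          "name": "PreviousAmount",
          "expression": {
            "sourceField": "Amount",
            "offset": "previous()",
            "default": "0"
          }
        }
      ]
    }
  }
}
```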
dim2mea
Use this transformation node to create a new measure (number) field by converting a dimension (string) field.
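A dim2mea node sketch (field names illustrative): dimension is the source string field, measure is the new number field, and measureDefault is used when a value cannot be converted:

```json
{
  "Convert_Score": {
    "action": "dim2mea",
    "parameters": {
      "source": "sfdcDigest_Lead",
      "dimension": "Score_Text",
      "measure": "Score",
      "measureDefault": "0",
      "measureType": "long"
    }
  }
}
```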
flatten
Use this node to flatten hierarchical data (e.g., role hierarchy), so you can use it for other purposes (e.g., to implement row-level security).
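A flatten node sketch for the role hierarchy use case (names illustrative): self_field and parent_field define the hierarchy, and the node emits a multi-value field of ancestors plus a path field:

```json
{
  "Flatten_UserRole": {
    "action": "flatten",
    "parameters": {
      "source": "sfdcDigest_UserRole",
      "self_field": "Id",
      "parent_field": "ParentRoleId",
      "multi_field": "Roles",
      "path_field": "RolePath",
      "include_self_id": true
    }
  }
}
```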
filter
Use this node to filter out unnecessary rows, so you keep only the data needed for the dashboard. You have the option to use a SAQL filter or a standard filter. You may have noticed that the sfdcDigest node also lets you filter out data when fetching it, but adding a filter in sfdcDigest can cause issues if you have data sync enabled, so it is better to use a filter node to filter data out.
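A filter node using a SAQL filter looks like this (the source node name is illustrative; the filter expression is the one we will use in the hands-on exercise later in this chapter):

```json
{
  "filter_Opportunity": {
    "action": "filter",
    "parameters": {
      "source": "sfdcDigest_Opportunity",
      "saqlFilter": "(Amount > 0) && (StageName != \"Closed Lost\")"
    }
  }
}
```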
sliceDataset
In a scenario where you have more fields in the dataflow than your dashboard needs, you can use this node to specify fields to be dropped or to be kept.
update
Use this transformation to update the value of a field, based on lookup from another dataset. You need to provide the key field for each dataset to match.
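An update node sketch (names illustrative): left is the dataset being updated, right supplies the new values, the keys define the match, and update_columns maps a left field to the right field that overwrites it:

```json
{
  "Update_AccountName": {
    "action": "update",
    "parameters": {
      "left": "sfdcDigest_Opportunity",
      "right": "sfdcDigest_Account",
      "left_key": [ "AccountId" ],
      "right_key": [ "Id" ],
      "update_columns": { "AccountName": "Name" }
    }
  }
}
```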
sfdcRegister
This is the node that registers (writes) the transformed data into a dataset. Every time the dataflow runs, it overwrites the existing dataset values. Once the dataset is registered, you can use it in a dashboard or lens. The sfdcRegister node will not overwrite the existing XMD defined for the dataset.
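An sfdcRegister node sketch (names illustrative): alias is the dataset API name and name is its display label:

```json
{
  "Register_Opportunities": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "compute_Flags",
      "alias": "OpenOpportunities",
      "name": "Open Opportunities"
    }
  }
}
```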
export
This node makes data in Analytics available to Einstein Discovery. It creates a data file and a schema file from the data in a specified source node in the dataflow. This node is only available when you have an Einstein Discovery license.
Creating a Dataflow from Scratch
Okay, this will be fun. We are going to create a new dataflow from scratch to produce a dataset that will be used for a dashboard. Building it ourselves, we are free to architect the dataflow, including filters and so on.
Use Case
Opportunity Id
Opportunity Name
Opportunity Owner
Account Name
Is New Business? if Type contains “New Business,” then “Yes,” else “No”
Opportunity Stage
Amount
Only Open and Closed Won opportunity
Amount must be > $0
The dataflow also needs to include Opportunity Contact Roles – only if the role is Decision Maker. If more than one contact is Decision Maker, randomly pick only one. If there is no Decision Maker, set the label as “Not Available.”
Building Concept
- 1.Determine how many objects are involved. For this use case, it would be five, which are
Opportunity
User
Account
OpportunityContactRole
Contact
- 2.
Determine the main object, which will determine the number of rows before the filter. For this use case, the main object is Opportunity.
- 3.Determine filters for each object:
Opportunity Stage <> “Closed Lost”
Opportunity Amount > 0
Opportunity Contact Role = “Decision Maker”
- 4.Determine fields to extract from each object:
- a)Opportunity
Id
Name
OwnerId
AccountId
Type
Amount
Stage
- b)User
Id
Name
- c)Account
Id
Name
- d)OpportunityContactRole
ContactId
OpportunityId
Role
- e)Contact
Id
Name
- 5.Determine the relationship of each object:
Opportunity.AccountId with Account.Id
Opportunity.OwnerId with User.Id
OpportunityContactRole.OpportunityId with Opportunity.Id
OpportunityContactRole.ContactId with Contact.Id
- 6.
Once you have all the preceding answers, we are ready to rock!
Hands-on
- 1.
Open Analytics Studio and then Data Manager.
- 2.
Click Dataflows & Recipes tab, then click Create Dataflow button.
- 3.
Name the dataflow – I will name it as Open Opportunity – then click Create button.
- 4.Add five sfdcDigest nodes, one for each object; I will name them
sfdcDigest_Opportunity
sfdcDigest_User
sfdcDigest_Account
sfdcDigest_OCR (for Opportunity Contact Role)
sfdcDigest_Contact
- 5.
Select the relevant object and fields for each object as in Building Concept step number 4.
- 6.Before we start the transformation, let us use the filter transformation to keep only the data we want. Per Building Concept step number 3, we have two objects to filter:
Click the filter node, and name it filter_Opportunity.
Source = sfdcDigest_Opportunity.
Select “SAQL Filter,” and type in the following filter: (Amount > 0) && (StageName != "Closed Lost").
Tip
When building a dataflow, you can temporarily add register nodes to debug and check the data at a particular point.
- 7.Do another filter transformation for Opportunity Contact Role:
Click the filter node, and name it filter_OCR.
Source = sfdcDigest_OCR.
Select “SAQL Filter,” and type in the following filter: (Role == "Decision Maker").
- 8.Because Opportunity only has OwnerId, let us enrich Opportunity with the Owner Name using augment transformation. Name the augment as augOpportunity_User:
Left Source = filter_Opportunity
Left Key = OwnerId
Relationship = “Owner”
Right Source = sfdcDigest_User
Right Key = Id
Right Fields = Id, Name
Click the Output Fields tab, and now you should see two additional fields starting with Owner (see Figure 3-8).
- 9.As in step 8, we are now going to get the Account Name. Name the augment as augOpportunity_Account:
Left Source = augOpportunity_User
Left Key = AccountId
Relationship = “Account”
Right Source = sfdcDigest_Account
Right Key = Id
Right Fields = Id, Name
Click the Output Fields tab; now you should see two additional fields starting with Account.
- 10.Now we are going to get Decision Maker name from Opportunity Contact Role. Again, add augment node; name it as augOpportunity_OCR:
Left Source = augOpportunity_Account
Left Key = Id
Relationship = “ContactRole”
Right Source = filter_OCR
Right Key = OpportunityId
Right Fields = Name
- 11.Following the same pattern as the previous steps, we are now going to get the Contact Name for the Opportunity Contact Role (see Figure 3-9). Name the augment as augOpportunity_Contact:
Left Source = augOpportunity_OCR
Left Key = ContactRole.ContactId
Relationship = “Contact”
Right Source = sfdcDigest_Contact
Right Key = Id
Right Fields = Name
Tip
When building a dataflow, remember to click “Update Dataflow” from time to time to save the updated dataflow; this makes sure your hard work is not lost if something goes wrong with the web browser or the Internet connection. Also, back up the dataflow JSON file frequently when you are testing the dataflow. That way, when you add new nodes and something goes wrong, you have a backup of the JSON file that was working just before that point.
- 12.Add computeExpression node to check if
Opportunity Type contains “New Business,” then “Yes,” else “No”
Decision Maker is null, change to “Not Available”
Click the “+ Add Field” button; I will name it “New Business”; Type = Text, and SAQL Expression = case when starts_with('Type', "New Business") then "Yes" else "No" end
Click the “+ Add Field” button again; I will name it “Decision Maker”; Type = Text, and SAQL Expression = case when 'ContactRole.ContactId' is null then "Not Available" else 'Contact.Name' end
Both will add new fields called New Business and Decision Maker.
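If you preview the dataflow JSON at this point, the computeExpression node should look roughly like the following; the node name and field API names are mine, and the source value depends on what you named the last augment:

```json
{
  "compute_Flags": {
    "action": "computeExpression",
    "parameters": {
      "source": "augOpportunity_Contact",
      "mergeWithSource": true,
      "computedFields": [
        {
          "name": "New_Business",
          "label": "New Business",
          "type": "Text",
          "saqlExpression": "case when starts_with('Type', \"New Business\") then \"Yes\" else \"No\" end"
        },
        {
          "name": "Decision_Maker",
          "label": "Decision Maker",
          "type": "Text",
          "saqlExpression": "case when 'ContactRole.ContactId' is null then \"Not Available\" else 'Contact.Name' end"
        }
      ]
    }
  }
}
```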
- 13.
Add a register node to end this dataflow. I’ll name my Node Name, Alias, and Name as regOpty1.
- 14.Click Update Dataflow button to save the dataflow (see Figure 3-10).
Note
If you close and reopen the dataflow, the positions of the nodes will be reorganized by the system; no need to worry, because the logic is not changed. Unfortunately, you cannot save your preferred node layout, so you will have to rearrange it if you prefer a particular layout.
- 15.
Now let us run the dataflow by clicking Run Dataflow button and click Go to Data Monitor to monitor the progress.
- 16.
Back in Analytics Studio, click the Datasets tab and look for the regOpty1 dataset. Click it to open the dataset as a new lens.
- 17.
Change to table mode by clicking the Table Mode icon, then select Values Table.
- 18.You can select the fields to show by clicking the pencil icon under the Data panel, as shown in Figure 3-11. By default, it will show only 100 records. You can change this by clicking the Edit link under Query Limit. Note that the maximum number of rows you can see in a lens without any JSON editing is 10,000.
- 19.
You can save the lens by clicking the Save icon on the top right menu.
Backup and Restore Dataflow
Einstein Analytics does not provide a system backup for dataflows, but you can download the dataflow configuration, which is represented as a JSON file, from the dataflow editor and store the JSON file on your local computer or server.
Click (1) to download the JSON file and (2) to upload a JSON file (see Figure 3-12). To restore a backup file, click Upload JSON. This can be the original JSON file or a JSON file that has been modified. There are multiple reasons to download and edit the JSON file, from finding a field, to mass-adding nodes, to mass updates, to comparing the JSON with a previous version.
Create Dataset with Recipe
The purpose of a recipe is to enrich an existing dataset available in Einstein Analytics; enriching here includes adding transformations and adding new rows or columns based on keys. All data should already exist in Einstein Analytics as datasets or as connected data from sync.
Note
A recipe will always create a new dataset, not overwrite the source dataset. You can schedule a recipe, and each scheduled run will overwrite the target dataset.
In a recipe, the transformations to add columns and to add rows are presented as Add Data (1) and Append Data (2).
Add Data
This function adds columns to the source dataset from another dataset based on key pairs; there can be one or multiple key pairs. This function is similar to augment in a dataflow or VLOOKUP() in Microsoft Excel.
Append Data
When you have two datasets, you can combine them using a recipe. The field names do not have to be the same, but the mapped field types must match. If one of the datasets has extra columns, the value will be null for rows coming from the dataset without those extra fields.
Additional Transformations in Recipe
Filter: This keeps only the rows that match the criteria.
Bucket: This buckets values as defined, then creates a new column.
Replace: This replaces a specific value in a field with another value, then creates a new column.
Trim: To trim leading and trailing whitespace; this creates a new column.
Substring: To get specific values from a starting position and length; this creates a new column.
Split: To split a column into multiple columns based on specific characters, such as a comma or semicolon.
Uppercase and Lowercase: To convert the value to uppercase or lowercase; this creates a new column.
Dim2Mea: To convert a field from text to number; this creates a new column.
Summary
We have finally learned about one of the most powerful tools in Einstein Analytics: dataflow. We explored it through hands-on exercises, building a dataflow using the dataset builder and from scratch. To create a dataflow, you need to make sure sync is enabled.
We covered each transformation offered by Einstein Analytics: getting data using sfdcDigest, digest, or edgemart; using the computeExpression node to create new fields based on some logic over other fields; using the augment node to look up and get data from other nodes; and using the register node to write the data into a dataset.
We also learned how to back up and restore dataflow using JSON file. We ended this chapter by looking into recipes and the data transformations offered by recipes.
In the next chapter, we will look into the datasets created in Einstein Analytics. A dataset can be created from a CSV file, from a dataflow, from a trended Salesforce report, and so on. We will also review how to manipulate dataset metadata to shape a dataset to our business needs.