© Johan Yu 2019
J. Yu, Getting Started with Salesforce Einstein Analytics, https://doi.org/10.1007/978-1-4842-5200-0_3

3. Dataflow and Recipe

Johan Yu
(1) Singapore, Singapore

We discussed how to get data into Einstein Analytics in Chapter 2, including a hands-on CSV file data load exercise and a hands-on exercise trending a Salesforce report; in both exercises a new dataset is created. But we have not yet discussed dataflow, one of the most important tools for bringing Salesforce data into Einstein Analytics, so let's start now.

Trust me, this chapter will be fun. We will build a dataflow from scratch and also by using the dataset builder. You can architect the data transformation and data manipulation to produce the datasets required for your dashboards. At the end of this chapter, you will also learn how to use recipes to enrich datasets.

Getting Started with Dataflow

As mentioned in Chapter 1, Einstein Analytics datasets are stored on the Einstein Analytics platform. This makes sense, because if the data were stored outside Einstein Analytics, query performance would suffer. In Chapter 2, we discussed multiple methods to get data into Einstein Analytics, including CSV file data load and Salesforce report trending. But so far dataflow has not been discussed; because of its power and importance, I've chosen to treat it separately in this chapter.

Dataflow is typically used for data transformation from an existing dataset, from Salesforce data (whether or not sync is enabled), and from external data sources (sync must be enabled).

You will be able to create a new dataflow only when sync is enabled. Open Analytics Studio, select Data Manager from the left panel menu, select the Dataflows & Recipes tab, and click the Create Dataflow button (see Figure 3-1).
../images/480970_1_En_3_Chapter/480970_1_En_3_Fig1_HTML.jpg
Figure 3-1

Create dataflow from Data Manager

Tip

You can use a change set to deploy a blank dataflow from an org with sync enabled to an org without sync enabled. Change set deployment is only applicable between orgs related to the same production org.

Dataflow User Interface

When you open an existing dataflow or create a new dataflow, you will be presented with the dataflow user interface, as in Figure 3-2. Hover your mouse over each icon in the dataflow user interface to see the icon name. Each icon below the dataflow name represents a node for data transformation. We will discuss each of them in this chapter.
../images/480970_1_En_3_Chapter/480970_1_En_3_Fig2_HTML.jpg
Figure 3-2

Dataflow user interface

There are a few more icons and buttons on the top right of the screen, so let’s discuss each of them from right to left:
  • Run Dataflow: When the dataflow is ready, click this button to put the dataflow into the run queue; you can then navigate to the Monitor tab to see the progress.

  • Upload JSON: Use this button to upload a dataflow file in JSON format from your local computer. The file can be an original backup or one that has been modified.

  • Download JSON: Use this button to download the dataflow in JSON format to your local computer, as a backup or to modify the dataflow locally.

    As a backup file – if something goes wrong when you modify a running dataflow, you can restore the dataflow with this backup file.

    The other purpose of downloading the dataflow JSON file is to update the dataflow manually. As a best practice, always make a copy of the original file before making changes; open the downloaded JSON file with a text editor such as Notepad++, make the changes as necessary, then upload the changed JSON file back.

  • Preview JSON: A dataflow is stored in JSON format; click this icon to view the dataflow as JSON. You can also download the dataflow JSON from here.
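To give a sense of what you will see in the preview or in the downloaded file, the following is a minimal sketch of a dataflow in JSON form. The top-level keys are the node names, each node has an "action" and a "parameters" section, and the "source" parameter is how nodes are wired together. The node names and fields here are purely illustrative.

    {
      "Extract_Opportunity": {
        "action": "sfdcDigest",
        "parameters": {
          "object": "Opportunity",
          "fields": [ { "name": "Id" }, { "name": "Name" }, { "name": "Amount" } ]
        }
      },
      "Register_Opportunity": {
        "action": "sfdcRegister",
        "parameters": {
          "alias": "Opportunities",
          "name": "Opportunities",
          "source": "Extract_Opportunity"
        }
      }
    }

If you rename a node while editing the JSON manually, remember to update every node that references it in a source, left, or right parameter.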

The Search nodes text box is useful when you have a lot of nodes in a dataflow. You can type in the text box to filter the nodes you are looking for; this is one reason you should practice a good naming convention for each type of transformation node.

Dataset Builder

The dataset builder provides a guided way to extract data from one or many local Salesforce objects and store the result as a dataset in Einstein Analytics. It works in a wizard style, where you select something and click next to continue. Once the dataflow is built, you can open the dataflow and edit it manually. To make this more interesting, let's dive into a hands-on dataset builder exercise.

Use case: Extract all Account data, including the Account Owner Name:
  1. Open Analytics Studio.

  2. Navigate to Data Manager, click the Dataflows & Recipes tab, then click the Create Dataflow button.

  3. Name the dataflow All Account with Owner, and click the Create button.

  4. Click the datasetBuilder icon at the top left. (Steps 2 to 3 are similar to clicking Create ➤ Dataset ➤ Salesforce Data from Analytics Studio, selecting "Add to new dataflow," and entering the dataflow name All Account with Owner.)

  5. Type a name for the new dataset – I will name my dataset Account_Owner – and click the Continue button.

  6. Select Account as the object to start.

  7. Hover your mouse over the Account box created and click the + icon next to the box.

  8. Select the fields you want to extract from the Salesforce Account object into Einstein Analytics; I'll select Account Name, Account ID, and Account Type. You can type in the text box to search for a field.
  9. Next, click the Relationships tab, and click Join next to Owner ID. Your dataflow should look similar to what is shown in Figure 3-3.
    ../images/480970_1_En_3_Chapter/480970_1_En_3_Fig3_HTML.jpg
    Figure 3-3

    Dataset builder user interface

     
  10. Click the + icon next to the User box, and select First Name and Last Name.

  11. Now we have four fields selected from Account and two fields from the User object.

  12. Click the Next button to continue.

  13. When finished, we have successfully built a dataflow (see Figure 3-4).
    ../images/480970_1_En_3_Chapter/480970_1_En_3_Fig4_HTML.jpg
    Figure 3-4

    Dataflow built successfully

     
  14. Click the Update Dataflow button at the top right of the screen to save it as a new dataflow or update the existing dataflow.

  15. Click the Update Dataflow button again in the pop-up window to confirm.

  16. Now the Update Dataflow button will change to Run Dataflow.

  17. Click the Run Dataflow button to put the dataflow in the queue for a run; click Go to Data Monitor to check the progress and result.

  18. From my testing, it should take only 15 seconds or so to extract all Accounts from the replicated data and store the result as a dataset. You can see all the nodes and the number of rows processed from the Monitor in Figure 3-5.
    ../images/480970_1_En_3_Chapter/480970_1_En_3_Fig5_HTML.jpg
    Figure 3-5

    Monitor completed and in progress dataflow

     
  19. Go back to Analytics Studio, and click the Datasets tab.

  20. Look for the dataset name that you entered at step 5 in this exercise; in my case, the dataset name is Account_Owner.

  21. Notice that the dataset is stored in the Shared App – you can move the dataset to another app; we will discuss datasets in detail in Chapter 4.

  22. Click the dataset name to open it as a new lens, which will be presented as a bar chart without any grouping; the bar length represents the number of rows.

  23. Click Fields to see all available fields for this dataset. These should be the same as the fields selected in steps 8 and 10 (see Figure 3-6).
    ../images/480970_1_En_3_Chapter/480970_1_En_3_Fig6_HTML.jpg
    Figure 3-6

    Explore fields detail from a dataset with lens

     

From this exercise, with just a few clicks, we can easily create a dataflow to extract data from Salesforce to Einstein Analytics.

Let us look at the dataflow created by using the dataset builder:
  • Two sfdcDigest nodes: These nodes extract the Account and User data.

  • One augment node: This node is used to look up data between nodes.

  • One sfdcRegister node: This node is used to store the data into a dataset.
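
If you download the JSON for this generated dataflow, it should look roughly like the following sketch. The node names here are illustrative (the dataset builder generates its own names), the relationship name Owner is an assumption, and note that the OwnerId field must be part of the Account extract for the lookup to work.

    {
      "Extract_Account": {
        "action": "sfdcDigest",
        "parameters": {
          "object": "Account",
          "fields": [ { "name": "Id" }, { "name": "Name" }, { "name": "Type" }, { "name": "OwnerId" } ]
        }
      },
      "Extract_User": {
        "action": "sfdcDigest",
        "parameters": {
          "object": "User",
          "fields": [ { "name": "Id" }, { "name": "FirstName" }, { "name": "LastName" } ]
        }
      },
      "Augment_Account_User": {
        "action": "augment",
        "parameters": {
          "left": "Extract_Account",
          "left_key": [ "OwnerId" ],
          "relationship": "Owner",
          "right": "Extract_User",
          "right_key": [ "Id" ],
          "right_select": [ "FirstName", "LastName" ]
        }
      },
      "Register_Account_Owner": {
        "action": "sfdcRegister",
        "parameters": {
          "alias": "Account_Owner",
          "name": "Account_Owner",
          "source": "Augment_Account_User"
        }
      }
    }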

Tips

If your org is not enabled for sync, you can create a new dataflow from Analytics Studio: click Create ➤ Dataset ➤ Salesforce Data, and select "Add to new dataflow."

We will walk through each type of dataflow node in the next section.

Dataflow Transformation

The following are all the icons available for dataflow transformation; you will be the architect of how and when to use each component. Depending on the type of your Einstein Analytics license, you may see fewer or more types of nodes available in your dataflow editor (see Figure 3-7).
../images/480970_1_En_3_Chapter/480970_1_En_3_Fig7_HTML.jpg
Figure 3-7

Dataflow nodes

datasetBuilder

We explained the purpose and usage of the dataset builder in the previous hands-on exercise, so I won't repeat it here.

sfdcDigest

Use this node to extract data from an object in the local Salesforce org. You must specify the object name and select the fields needed, and then you can add a filter if you do not need all records. You can extract from all custom objects and all main standard objects. However, some system objects are not supported, so definitely check out this list: https://help.salesforce.com/articleView?id=bi_integrate_salesforce_extract_transformation_unsupported_objects_fields.htm&type=5

You can have many sfdcDigest nodes in a dataflow, but for better performance when running the dataflow, whenever possible, try not to have more than one node extracting from the same object.
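
As a rough illustration, an sfdcDigest node that extracts a few Opportunity fields and applies a filter might look like the following in the dataflow JSON. The node name is made up, and you should verify the exact filterConditions syntax against the Salesforce documentation before relying on it.

    "Extract_Opportunity": {
      "action": "sfdcDigest",
      "parameters": {
        "object": "Opportunity",
        "fields": [
          { "name": "Id" },
          { "name": "Name" },
          { "name": "StageName" },
          { "name": "Amount" }
        ],
        "filterConditions": [
          { "field": "StageName", "operator": "!=", "value": "Closed Lost" }
        ]
      }
    }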

digest

This node is similar to sfdcDigest, but it extracts data from a connected (synced) data source, for both local Salesforce data and external data sources.

edgemart

Use this node to extract existing datasets in Einstein Analytics.

append

Once the data has been extracted from the sources, you can use the append node to combine rows from multiple sources. All source data should have the same structure, both field names and types; otherwise, you must enable "Allow disjoint schema," an option that will create all available fields from all nodes.

augment

This is one of the most popular nodes in Einstein Analytics data transformation. Imagine it as similar to VLOOKUP() in Microsoft Excel. In the previous dataset builder exercise, we used the augment node to get the Account Owner Name: because the Account object only stores the owner's Id, we matched the OwnerId field from Account with the Id field from User.

computeExpression

This is another very popular node in the dataflow; it lets you create new fields based on the values of other fields. Imagine it as being like adding a formula field in Salesforce, but without the need to have the field physically created in Salesforce.
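
As a hedged sketch (the node, source, and field names are made up for illustration), a computeExpression node that adds a text field classifying opportunities by size could look like this in the dataflow JSON:

    "Compute_Deal_Size": {
      "action": "computeExpression",
      "parameters": {
        "source": "Extract_Opportunity",
        "mergeWithSource": true,
        "computedFields": [
          {
            "name": "DealSize",
            "label": "Deal Size",
            "type": "Text",
            "saqlExpression": "case when 'Amount' >= 100000 then \"Large\" else \"Small\" end"
          }
        ]
      }
    }

Note the SAQL convention used in the expression: field names are wrapped in single quotes and string literals in double quotes.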

computeRelative

computeExpression calculates new field values based on other field values in the same row; use the computeRelative transformation to create a new field by comparing values with other rows.

dim2mea

Use this transformation node to create a new measure (number) field by converting a dimension (string) field.

flatten

Use this node to flatten hierarchical data (e.g., role hierarchy), so you can use it for other purposes (e.g., to implement row-level security).

filter

Use this node to filter out unnecessary rows, so you will only have the data needed for the dashboard. You have the option to use a SAQL filter or a standard filter. You may notice that the sfdcDigest node also lets you filter out data when fetching it, but adding a filter in sfdcDigest can cause issues if you have data sync enabled, so it is better to use the filter node to filter data out.
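
A filter node using a SAQL filter might look roughly like this in the JSON (node and source names are illustrative); the standard filter option uses the simpler "filter" parameter instead of "saqlFilter":

    "Filter_Opportunity": {
      "action": "filter",
      "parameters": {
        "source": "Extract_Opportunity",
        "saqlFilter": "Amount > 0 && StageName != \"Closed Lost\""
      }
    }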

sliceDataset

In a scenario where you have too many fields in the dataflow but not all are needed for your dashboard, you can use this node to specify fields to be removed or to be kept.

update

Use this transformation to update the value of a field based on a lookup against another dataset. You need to provide the key field of each dataset for matching.

sfdcRegister

This is the node that registers (writes) all the transformed data into a dataset. Every time the dataflow is run, it will overwrite the existing dataset values. Once the dataset is registered, you can use it for a dashboard or lens. The sfdcRegister node will not overwrite existing XMD defined for the dataset.

export

This node makes data in Analytics available for Einstein Discovery. It creates a data file and a schema file from the data in a specified source node in the dataflow. This node is only available when you have an Einstein Discovery license.

Creating a Dataflow from Scratch

Okay, this will be fun. We are going to create a new dataflow from scratch to produce a dataset that will be used for a dashboard. Unlike with the dataset builder, we are free to architect the dataflow ourselves, including filters and so on.

Use Case

Build a dataflow to extract all Opportunities with the following fields:
  • Opportunity Id

  • Opportunity Name

  • Opportunity Owner

  • Account Name

  • Is New Business? If Type contains "New Business," then "Yes," else "No"

  • Opportunity Stage

  • Amount

With the following criteria:
  • Only Open and Closed Won opportunity

  • Amount must be > $0

The dataflow also needs to include Opportunity Contact Roles – only if the role is Decision Maker. If more than one contact is a Decision Maker, pick only one of them. If there is no Decision Maker, set the label to "Not Available."

Building Concept

Understand and analyze related objects and fields:
  1. Determine how many objects are involved. For this use case, there are five, which are
     • Opportunity
     • User
     • Account
     • OpportunityContactRole
     • Contact

  2. Determine the main object, which will determine the number of rows before the filter. For this use case, the main object is Opportunity.

  3. Determine filters for each object:
     • Opportunity Stage <> "Closed Lost"
     • Opportunity Amount > 0
     • Opportunity Contact Role = "Decision Maker"

  4. Determine fields to extract from each object:
     a) Opportunity
        • Id
        • Name
        • OwnerId
        • AccountId
        • Type
        • Amount
        • Stage
     b) User
        • Id
        • Name
     c) Account
        • Id
        • Name
     d) OpportunityContactRole
        • ContactId
        • OpportunityId
        • Role
     e) Contact
        • Id
        • Name

  5. Determine the relationship of each object:
     • Opportunity.AccountId with Account.Id
     • Opportunity.OwnerId with User.Id
     • OpportunityContactRole.OpportunityId with Opportunity.Id
     • OpportunityContactRole.ContactId with Contact.Id

  6. Once you have all the preceding answers, we are ready to rock!

Hands-on

Let’s get started with our next hands-on scenario:
  1. Open Analytics Studio and then Data Manager.

  2. Click the Dataflows & Recipes tab, then click the Create Dataflow button.

  3. Name the dataflow – I will name it Open Opportunity – then click the Create button.

  4. Click the sfdcDigest node once for each object (five in total); I will name them as follows:
     • sfdcDigest_Opportunity
     • sfdcDigest_User
     • sfdcDigest_Account
     • sfdcDigest_OCR (for Opportunity Contact Role)
     • sfdcDigest_Contact

  5. Select the relevant object and fields for each node as in Building Concept step number 4.

  6. Before we start the transformation, let us use the filter transformation to keep only the data that we want. Check out Building Concept step number 3; we have two objects to filter:
     • Click the filter node, and name it filter_Opportunity.
     • Source = sfdcDigest_Opportunity.
     • Select "SAQL Filter," and type in the following filter: (Amount > 0) && (StageName != "Closed Lost").

Tip

When building a dataflow, you can temporarily add register nodes to debug and check the data at a particular point.

  7. Do another filter transformation for Opportunity Contact Role:
     • Click the filter node, and name it filter_OCR.
     • Source = sfdcDigest_OCR.
     • Select "SAQL Filter," and type in the following filter: (Role == "Decision Maker").
  8. Because Opportunity only stores OwnerId, let us enrich Opportunity with the Owner Name using an augment transformation (see the JSON sketch after this list). Name the augment augOpportunity_User:
     • Left Source = filter_Opportunity
     • Left Key = OwnerId
     • Relationship = "Owner"
     • Right Source = sfdcDigest_User
     • Right Key = Id
     • Right Fields = Id, Name

     Click the Output Fields tab, and now you should see two additional fields starting with Owner (see Figure 3-8).
    ../images/480970_1_En_3_Chapter/480970_1_En_3_Fig8_HTML.jpg
    Figure 3-8

    Augment node with User object for Opportunity Owner

     
  9. The same as step 8, now we are going to get the Account Name. Name the augment augOpportunity_Account:
     • Left Source = augOpportunity_User
     • Left Key = AccountId
     • Relationship = "Account"
     • Right Source = sfdcDigest_Account
     • Right Key = Id
     • Right Fields = Id, Name

     Click the Output Fields tab; now you should see two additional fields starting with Account.
  10. Now we are going to get the Decision Maker name from Opportunity Contact Role. Again, add an augment node; name it augOpportunity_OCR:
      • Left Source = augOpportunity_Account
      • Left Key = Id
      • Relationship = "ContactRole"
      • Right Source = filter_OCR
      • Right Key = OpportunityId
      • Right Fields = ContactId
  11. The same as the previous steps, now we are going to get the Contact Name for the Opportunity Contact Role (see Figure 3-9). Name the augment augOpportunity_Contact:
      • Left Source = augOpportunity_OCR
      • Left Key = ContactRole.ContactId
      • Relationship = "Contact"
      • Right Source = sfdcDigest_Contact
      • Right Key = Id
      • Right Fields = Name

    ../images/480970_1_En_3_Chapter/480970_1_En_3_Fig9_HTML.jpg
    Figure 3-9

    Augment node with more objects augmented
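
If you download the dataflow JSON at this point, the augment node from step 8 should look roughly like the following sketch, based on the parameters entered earlier (the operation value shown is an assumption; LookupSingleValue simply picks a single matching row):

    "augOpportunity_User": {
      "action": "augment",
      "parameters": {
        "left": "filter_Opportunity",
        "left_key": [ "OwnerId" ],
        "relationship": "Owner",
        "right": "sfdcDigest_User",
        "right_key": [ "Id" ],
        "right_select": [ "Id", "Name" ],
        "operation": "LookupSingleValue"
      }
    }

Because the relationship is named Owner, the joined fields appear in the output as Owner.Id and Owner.Name, which is why step 8 shows two additional fields starting with Owner.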

     

Tip

When building a dataflow, remember to click "Update Dataflow" from time to time to save the updated dataflow; this makes sure your hard work is not lost if something goes wrong with the web browser or the Internet connection. Also, always back up the dataflow JSON file frequently when you are testing the dataflow. That way, when you are adding new nodes and something goes wrong, you have a backup of the JSON file that was working just before that point.

  12. Add a computeExpression node (a JSON sketch of this node appears after this list) to check if
      • Opportunity Type contains "New Business," then "Yes," else "No"
      • Decision Maker is null, then change it to "Not Available"

      Click the "+ Add Field" button; I will name it "New Business"; Type = Text, and SAQL Expression = case when starts_with('Type', "New Business") then "Yes" else "No" end

      Click the "+ Add Field" button again; I will name it "Decision Maker"; Type = Text, and SAQL Expression = case when 'ContactRole.ContactId' is null then "Not Available" else 'Contact.Name' end

      Both will add new fields called New Business and Decision Maker.

  13. Add a register node to end this dataflow. I'll name my Node Name, Alias, and Name regOpty1.

  14. Click the Update Dataflow button to save the dataflow (see Figure 3-10).
    ../images/480970_1_En_3_Chapter/480970_1_En_3_Fig10_HTML.jpg
    Figure 3-10

    The complete Open Opportunities dataflow
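
For reference, here is a hedged sketch of how the computeExpression node from step 12 might appear in the downloaded dataflow JSON. The node name and the field API names (New_Business, Decision_Maker) are assumptions about how the UI entries map to JSON; note how the double quotes inside each SAQL expression are escaped:

    "computeNewFields": {
      "action": "computeExpression",
      "parameters": {
        "source": "augOpportunity_Contact",
        "mergeWithSource": true,
        "computedFields": [
          {
            "name": "New_Business",
            "label": "New Business",
            "type": "Text",
            "saqlExpression": "case when starts_with('Type', \"New Business\") then \"Yes\" else \"No\" end"
          },
          {
            "name": "Decision_Maker",
            "label": "Decision Maker",
            "type": "Text",
            "saqlExpression": "case when 'ContactRole.ContactId' is null then \"Not Available\" else 'Contact.Name' end"
          }
        ]
      }
    }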

     

Note

If you close and reopen the dataflow, the positions of the nodes will be reorganized by the system. There is no need to worry, because the logic is not changed. Unfortunately, you cannot save your preferred node layout, so you will have to rearrange it again if you prefer a particular layout.

  15. Now let us run the dataflow by clicking the Run Dataflow button, and click Go to Data Monitor to monitor the progress.

  16. Back in Analytics Studio, click the Datasets tab, and look for the regOpty1 dataset. Clicking it will open the dataset as a new lens.

  17. Change to table mode by clicking the Table Mode icon, then select Values Table.

  18. You can select the fields to show by clicking the pencil icon under the Data panel, as shown in Figure 3-11. By default, it will show only 100 records. You can change this by clicking the Edit link under Query Limit. Note that the maximum number of rows you can see in a lens without any JSON editing is 10,000.
    ../images/480970_1_En_3_Chapter/480970_1_En_3_Fig11_HTML.jpg
    Figure 3-11

    Dataflow open in table mode

     
  19. You can save the lens by clicking the Save icon in the top right menu.

     

Backup and Restore Dataflow

There is no system backup provided by Einstein Analytics to back up a dataflow, but you can download the dataflow configuration as a JSON file from the dataflow editor and store the JSON file on your local computer or a server.

Using a text editor such as Notepad++, you can open and modify the JSON file as required. Because editing the file manually is prone to errors, the best practice is always to keep a backup of the original JSON file.
../images/480970_1_En_3_Chapter/480970_1_En_3_Fig12_HTML.jpg
Figure 3-12

Download and upload dataflow JSON file

Click (1) to download the JSON file and (2) to upload a JSON file (see Figure 3-12). To restore a backup file, click Upload JSON. This can be the original JSON file or a JSON file that has been modified. There are multiple reasons to download and edit the JSON file, from finding a field, mass-adding nodes, and mass updates to comparing the JSON file with a previous version.

Create Dataset with Recipe

The purpose of a recipe is to enrich an existing dataset available in Einstein Analytics; enriching here includes adding transformations and adding new rows or columns based on keys. All data should already exist in Einstein Analytics as datasets or as connected data from sync.

Note

A recipe will always create a new dataset; it does not overwrite the source dataset. You can schedule a recipe, and this will keep overwriting the target dataset.

The recipe user interface lets you preview the data in table mode, including field name and field type for each dataset (see Figure 3-13).
../images/480970_1_En_3_Chapter/480970_1_En_3_Fig13_HTML.jpg
Figure 3-13

Recipe user interface

The data transformations to add columns and to add rows in Einstein Analytics are presented as Add Data (1) and Append Data (2), respectively.

Add Data

This function adds columns to the source dataset from another dataset based on key pairs; there can be one or multiple key pairs. It is similar to augment in a dataflow or VLOOKUP() in Microsoft Excel.

Append Data

When you have two datasets, you can combine their rows using a recipe. The field names do not have to be the same, but the mapped fields' types must be the same. If one of the datasets has extra columns, the value will be null for rows coming from the dataset without that extra field.

Additional Transformations in Recipe

With recipes, you can apply transformations to a dataset, so you do not always need more than one dataset to use a recipe. The Column Profile provides a quick look at the values in a specific column, including missing values and field attributes:
  • Filter: This will keep only rows that match the criteria.

  • Bucket: This will bucket values as defined and create a new column.

  • Replace: This replaces a specific value in a field with another value and creates a new column.

  • Trim: This trims leading and trailing whitespace and creates a new column.

  • Substring: This extracts values given a starting position and length and creates a new column.

  • Split: This splits a column into multiple columns based on a specific character, such as a comma or semicolon.

  • Uppercase and Lowercase: These convert values to uppercase or lowercase and create a new column.

  • Dim2Mea: This converts text to a number and creates a new column.

The recipe user interface shows all the transformations defined in a recipe in the left panel, so you can trace the transformations easily (see Figure 3-14).
../images/480970_1_En_3_Chapter/480970_1_En_3_Fig14_HTML.jpg
Figure 3-14

Transformation in recipe

Summary

We have finally learned one of the most powerful features in Einstein Analytics: dataflow. We reviewed it with hands-on exercises, building a dataflow using the dataset builder and from scratch. To create a dataflow, you need to make sure sync is enabled.

We learned each transformation offered by Einstein Analytics: from getting data using sfdcDigest, digest, or edgemart, to using the computeExpression node to create new fields based on the logic of other fields, to using the augment node to look up and get data from other nodes, and using the register node to write the data into a dataset.

We also learned how to back up and restore dataflow using JSON file. We ended this chapter by looking into recipes and the data transformations offered by recipes.

In the next chapter, we will look into datasets created in Einstein Analytics. A dataset can be created from a CSV file, from a dataflow, from a trended Salesforce report, and so on. We will also review how to manipulate dataset metadata to produce a dataset that fits our business needs.
