Finding out what happened is only half as interesting as finding out why it happened. Although data can never really tell us causal reasons (that’s left for logic), data can certainly point us to those areas where we humans can put our brains to work. In this chapter you will learn how AI can help you to surface those interesting patterns in data automatically so you and your colleagues can concentrate on the interpretation and impacts of this data.
We continue with the case study from the previous chapter: In a fictional manufacturing company we support sales management in their decision making process. In the last chapter we found out that sales slowly recovered after a period of strong declines. Now, we will dig deeper and concentrate on understanding why certain trends evolved.
As the business analyst on the team we want to help sales management to find explanations for two of the observed revenue trends: Why did the sales figures melt so dramatically between 2006 and 2010 and which factors explain the slow recovery between 2010 and 2014?
This process would normally involve analysts who go through the data manually which takes time and is subject to the analyst’s availability. Since we don’t have the capacity to do one-on-one sessions with each of the business users, we are looking for ways to provide them with insights in a more automated and interactive way.
Our goal is to find patterns in our dataset automatically by scanning through all available information with at little human assistance as possible. For this purpose, we are leveraging AI-powered techniques to deliver these insights in a fast and interactive manner. This will allow business users to get insights with less or even without the help of professional data analysts. To deliver this experience we are again using two Power BI tools here: The Power BI Key influencers and the Power BI Decomposition Tree tool.
The Key influencers visual aids in comprehending the aspects that influence a metric of interest by showing the top contributors to the selected metric. It examines the information you provide and surfaces the most important aspects to the top by labelling them as key influencers.
Power BI’s decomposition tree visual allows you to visualize data in numerous dimensions. It aggregates data automatically and allows you to drill down into any dimension in any order. AI will suggest the next most relevant dimension to dive into depending on your particular area of interest. As a result, it’s a useful tool for ad hoc investigation and root cause analysis.
We are going to apply these techniques to the Sales and Marketing example Power BI report to find out: Which factors were driving both revenue trends and which patterns can we observe from our data?
In a corporate scenario, both tools could be provided to the business users in the form of Power BI report pages so they can interact with the data on their own.
Open the Sales & Marketing sample PBIX file which we used already in the previous chapter. To recap real quick, this dataset contains various sales and marketing data from a manufacturing company and comes pre-populated with some reports for market share, product volume, sales figures and sentiment scores.
In Chapter 2 we found out that the overall revenue in the last four years has slowly recovered after a hard bounce in the previous year. Here is a snapshot of the sales dashboard we created before:
We are especially interested in what happened during the phase of declining revenues between 2006 and 2010 (“Phase A”) and the phase of slowly recovering revenues between 2010 and 2014 (“Phase B”). For both of these phases we want to answer the simple but profound question: What happened?
We will start by examining Phase A. To recreate the revenue visual, create a new Power BI report page, drag and drop the Q&A visual and type:
“Revenue by year between 2006 and 2010”
If you didn’t do the use case from the previous chapter you might need to replace Revenue with “sum of revenues”.
Click on the convert icon to turn the Q&A visual into a standard Power BI visual. The chart should look similar to this:
Select this chart and open the visualizations pane on the right. Click the “Key influencers” icon to convert the line chart into the Key influencers visual.
Once you select the new visual type, the chart will automatically update to something which should look similar to figure 5-4. Let’s explore the output of this visual in a bit more detail.
Now, this output is probably not what you expected at all and definitely looks a bit weird. What went wrong? Open the visualisations pane and inspect the properties of the Key influencers visual. This should still look like shown in figure 5-5.
In the visual properties you will find two critical fields: “Analyze” and “Explain by”. As the names suggest, “Analyze” refers to the metric which you want to explore and “Explain by” refers to the various dimensions you might consider to have an effect on it. Per default, given our previous line chart, Power BI suggests to explain the dimension Year by the field Sum of Revenue which absolutely does not make any sense.
Let’s fix this by dragging “Sum of Revenue” from the “Explain by” field to the “Analyze” field instead of “Year”. Consequently, add additional dimensions into the “Explain by'' field by pulling them from the data fields repository on the right. Reasonable candidate fields from a business point of view might be: Segment, City, State, Region, Average of Score, Category and Manufacturer. We will leave the dimension Product out since too many data points are missing. For example, some products were replaced by others, some products were just launched, some products were deprecated. If we wanted to analyse Key influencers down to a product level we should rather look at a single year rather than a four year period.
The updated properties of the Key influencers tool should now look like in figure 5-6:
Within the Key influencers visual, switch the drop down menu on the top from Increase to Decrease, since we are interested in finding out what made the revenue drop. The updated
visual should by now look similar to the one in figure 5-7.
This one intuitively makes more sense than the version before. Note that this would be the report page you could share or publish for business users so they can interact with the data on their own.
Let’s dive into this visual to see how it works in a bit more detail.
On the left side of the visual, we can see which variables Power BI identified as key influencers for the given metric, in this case revenue. Can see that Power BI identified the attribute “Youth” across the dimensions Category and Segment as the main key drivers that cause the Revenue metric to decrease. On the right side, we are presented with the underlying data distribution for the selected influencer, showing the average of the revenue metric across all categories from the selected dimension, in the example in Figure 5-7 the Category dimension. You can read the chart on the right as follows: The average revenue for the Category Youth was just $5,181.98 while the average revenue for all other Categories except Youth was $17,349,89. That means, when the Category is Youth the average revenue is $12.92K lower compared to all other values of the Category.
It is important to point out that in this case we are not judging based on the absolute values of a variable but on their averages and number of observations. Depending on the measure you choose to analyse, whether it is categorical or continuous, the key influencer tool will adapt. Instead of averages it will indicate probabilities such as that the outcome is x-times more likely if a certain value is met.
The Key influencers tool does not always consider the lowest value of a category as an influencer. Take for example the influencer “Manufacturer is Victoria” in our analysis. If you click on this influencer you will see the chart in figure 5-8. We can see that Victoria is not the manufacturer with the smallest revenue on average, but Salvus. Now why has Salvus not been identified as a Key influencer?
The reasons becomes clear if we look at the head to head comparison of Salvus and Victoria in figure 5-9, just for the sake of fostering our understanding how the Key Influencer tool works:
There are three factors which disqualify Salvus to be a key influencer: First, this manufacturer was only introduced in 2009, so there are fewer data points in total compared to the other manufacturers. Second, the absolute revenue contribution of Salvus is so little that it probably lacks overall impact. And last but not least, the actual revenue trend of this manufacturer is not negative, but positive. While Victoria dropped from 6.2M revenue in 2006 to 4.3M revenue in 2010, Salvus actually improved their revenues between 2009 and 2010. With this background information it is legit to not consider Salvus a Key influencer for overall decreasing revenues. The key influencer tool makes it easier for us to find such interesting patterns without combing through all of the data manually.
Let’s explore another useful feature of the influencer tool: The segments. Segments can be considered as groups in your data which show a high impact on a metric of interest. Switch over to Top Segments and select “When is Sum of Revenue more likely to be Low” from the dropdown at the top. You should see the following screen in Figure 5-10 as a result:
We can see that there are four segments which Power BI identified for us. The biggest segment, indicated by the size of the bubble, holds 24,506 observations (i.e. sold products) in it and accounts for an average revenue of just 5,4K per sale. The low performer segment with the highest (yet below total) average revenue of 11,11K contains 9,282 observations. To find out more about what is happening here, click on the 5,.43K bubble.
You will see a detailed view of this segments which in this case looks like this in Figure 5-11:
From this chart we can see that this segment contains almost 16% of all product sales and therefore has a very high relevance for us. The segment contains all sales where the category was not Urban, the manufacturer was not VanArsdel, the region was not West and the Segment was not Productivity. This segment contributed the largest group of data points and had the lowest average revenue which makes it an important segment to look at when it comes to explaining the falling revenues trend in Phase A (2006 - 2010).
To see what this means in detail, let’s examine the revenue trend in this segment 1 vs. the total revenue trend between 2006 and 2010 as shown in Figure 5-12.
While the overall revenue was down 46.6% from 643.28M to 343.13M in the total dataset, segment 1 alone contributed a loss of 12.3M from 31.10M in 2006 to 18.8M in 2010. These more focused segments should be easier to tackle for business users and identify potential problem areas as opposed to larger trends such as the Category “Youth”.
We could explore the other segments in a similar way, but let’s conclude our analysis for the past revenue drop for now and head over to finding out what happened during the recovering period between 2010 and 2014, called Phase B.
If you are interested in further features and descriptions on how the Key influencers tool works you might check out this comprehensive resource from Microsoft.
For the analysis of Phase B we will switch over to the decomposition tree feature in Power BI.
The decomposition tree is a great tool to conduct a root cause analysis. You can use it as a smart drilldown where Power BI suggests the next drill down level automatically based on a specific metric you want to explore.
For this purpose, start with a blank report page in your Power BI file. Recreate the revenue by year chart, this time for the period between 2010 and 2014. Convert this chart into a static visual if you used the Q&A tool for this. The revenue chart should now look similar to Figure 5-13:
Select the line chart and change the visualisation type to “Decomposition Tree” as shown in Figure 5-14:
Similar to the influencer tool, we have to tell Power BI which variable we want to analyse and which dimensions we consider as explanatory factors. Modify the visual properties so that they are looking like the ones in Figure 5-15.
The initial output of the decomposition tree visual will just be a plain summary of our metric of interest on a blank canvas. You can see this output in figure 5-16.
The task at hand is now to decompose this aggregated metric into smaller chunks, ideally splitting it in such a way that we capture the most important metrics at the top of the tree.
When you click the small “plus” icon on the right next to the aggregated revenue, you will see a menu popping up where you can select the next split you want to make. You could either select the split manually, for example based on some business logic or your own preferences, or you can use a feature called “AI splits”. We will come back to this concept later to find out in which cases it might be reasonable to select drill down levels manually. AI splits are indicated by small light bulbs and will automatically choose the next best drill down level for you, depending if you want to influence your top-level metric to be higher or smaller, as you can see in Figure 5-17.
Since we are interested in explaining the revenue growth in Phase B, let’s choose an AI split for high values. You will see that Power BI creates the first split for you on the criteria “Category” as shown in Figure 5-18.
If we hover over the light bulb next to the Category split, you will see a Power BI tooltip explaining the current node of the tree. In this case, this fist split tells us that the revenue between 2010 and 2014 was highest when the Category was Urban (see Figure 5-19).
We can now drill down further as we like, for example by finding out which criteria led to the revenue growth within the Urban category. Again, we could select the next drill-down level manually or let Power BI figure it out for us, depending on whether we’re interested in High or Low Values (See Figure 5-20)
If we choose “High values” again, we can see that the next split will be made upon the field “Manufacturer” as shown in Figure 5-21.
At this point, let’s stop for a second and quickly recap why exactly we are seeing these split in this order. Why isn’t the data splitted first for the Manufacturer and only afterwards according to Category? At least we have seen that the manufacturer “VanArsdel” claims most of the revenues.
This happens because the underlying algorithm in Power BI which makes the splits is greedy. It will choose the next drill down category where the splits will bring the biggest advantage. To demonstrate this phenomenon let’s look at a side by side comparison of revenue by category and revenue by manufacturer as shown in figure 5-22.
We can clearly see that both Urban and VanArsdel stand out in their classes as main revenue drivers. However, looking more closely we will note that Urban claims around 85% of the revenue share and VanArsdel only gets 53% of the revenue share. This is because there are many more manufacturers than categories which take away plenty of small shares from the leading key driver in this field. And that is why Power BI splits the data first according to Category and only afterwards according to Manufacturers.
While this greedy approach works quite well most of the time, you have to be careful in some cases. When we examine data after a certain split we can only analyse data which actually exist in this split. For example, by splitting for Category first and then exploring the Urban category, we will only see manufacturers in that tree branch which actually produce products in the Urban category. If there was a manufacturer in the dataset which does not produce any goods for the Urban category, this manufacturer would not show up in the downstream splits of that branch of the tree.
On the contrary, this means that Power BI would most likely never (or only at the very end of a tree) suggest to split based on a field where all observations are equally distributed. We can observe this for example in the “State” dimension of our datasets where the single biggest revenue contributor (California) only gains around 10% of the total revenue and the smallest contributor counts around 4.8%. The states are pretty much evenly distributed with regards to revenue so they are not a good candidate for Power BI to create splits upon them. If states did play a role for your business (or different business users) you would either select those splits manually at the first level or alternatively add a page filter according to the state(s) you are interested in.
Coming back to our Decomposition Tree example, we want to add two more AI splits for high values. If you add these splits you should see a tree which looks like the one in figure 5-23:
When we look at this tree we can immediately see that sales in the Urban category, from the Manufacturer VanArsdel with the Segment Moderation and within the region East were the main contributors to the revenue growth in the period 2010 - 2014. If we create a line chart with exactly this filter criteria and compare it to the overall total revenue we can clearly see how this plays out in figure 5-24.
Sales from this segment alone contributed more than a 20M revenue plus to our total revenue gain and grew even stronger on a relative level (+46%) than the absolute revenue trend (+15%).
With the Decomposition Tree business users can easily interact with the data and find the relevant contributors on the granularity level which they need. After having combed the data automatically using the AI features of Power BI it is much easier to loop in a data analyst, if necessary, to verify their conclusions or raise questions which came up during the analysis process.
The Power BI report with the Decomposition Tree can be shared with business users so they can browse through the data on their own. To share reports with other users in your organization you will need a Power BI server or a Power BI pro license.
To find out more about the Decomposition Tool you might want to check the following Microsoft resource.
In this chapter we have discovered two powerful AI-backed features in Power BI that help business users and analysts alike to reduce their time to insights by automating parts of the data analysis process. The Key Influencer tool can be used to surface key drivers of a given metric of interest and identify relevant segments. The Decomposition Tree in combination with AI splits is very useful for directed drill downs in any data dimensions while keeping an overall business metric fixed. This makes it a great tool for ad hoc data exploration and self-service root cause analysis. In the next chapter we will leave the realms of the past and present data and look how AI can help us to anticipate future events or make better predictions about future outcomes. As such, we will start with an automated classification task which helps us to categorize future business events based on our past experience.