This example solves a challenge that confronted my wife, Sue, in her job in human resources. She had to reconcile large spreadsheets taken from different systems that had conflicting data. Since these had hundreds of rows and dozens of columns, trying to eyeball the differences was time-consuming and prone to errors. Since she has to listen to me drone on about how great Power BI
and the rest of the Power Platform is An image of the smiley emoticon., the least I could do is try to help.
A Microsoft Access Solution
The key to a solution is to have a common data element in both spreadsheets and then “join” on that element. In our example, here I use a Position Number column. Before tackling the problem in Power BI
, I first wanted to try a Microsoft Access solution. Figure 18-1 shows the first step of importing both spreadsheets
as tables.
We use the Import Spreadsheet Wizard
and choose to use the first row as headers, as shown in Figure 18-2.
Once we get our tables imported, we create a query, as shown in Figure 18-3.
The key is to do a join1 on our common column, Position Number. We then add our common columns (Position Number and Position Title) and then our potentially non-matching ones for each spreadsheet, naming them Office Symbol 1, Office Symbol 2, Program Code 1, and Program Code 2. We are now ready for the useful part, which are “Match” columns that are True if the columns match and False if they don’t. Listing 18-1 shows the entire Structured Query Language (SQL) syntax
with the key elements in bold.
SELECT
Spreadsheet1.[Position Number], Spreadsheet1.[Position Title], Spreadsheet1.[Office Symbol] AS [Office Symbol 1], Spreadsheet2.[Office Symbol] AS [Office Symbol 2], IIf([Spreadsheet1].[Office Symbol]=[Spreadsheet2].[Office Symbol],True,False) AS [Office Symbol Match], Spreadsheet1.[Program Code] AS [Program Code 1], Spreadsheet2.[Program Code] AS [Program Code 2], IIf([Spreadsheet1].[Program Code]=[Spreadsheet2].[Program Code],True,False) AS [Program Code Match]
FROM
Spreadsheet1 INNER JOIN Spreadsheet2 ON Spreadsheet1.[Position Number] = Spreadsheet2.[Position Number];
Listing 18-1
SQL syntax
for “Compare Spreadsheets” Access query
We use the IIf function to show True if the two Office Symbol or Program Code columns match and False otherwise. Figure 18-4 shows the results.
Notice that True shows up as -1 while False shows up as 0. We would then export this query back out to Excel and then filter on these values to find our problem rows.
While this solution works, it is not very scalable in that the user needs to import new spreadsheets each month (or create a linked table, which is itself fairly complicated) and then redo the query. Worse, there is no good way to share this data with other team members without them having to be able to open and understand an Access database.
Configuring Our Data in Power BI
In Power BI
, we link to both spreadsheets as data sources in the normal way and then Transform data. The only change we need to make in the original data is to Change Type for Position Number to Text so that we don’t lose the leading 0’s, as shown in Figure 18-5.
We now have to figure out the best way to join our data. Going to the model screen as in a previous chapter will not work in this case as we want to do more than just show both tables in one visualization. We actually need our new “Match” columns in the same data source. For that, we need a new Merge Queries as New, as shown in Figure 18-6.
We choose our two data sources and select that the Position Number column is the one that matches in each. We perform an inner join as in our Access implementation. This gives us the result shown in Figure 18-8 after we have renamed Office Symbol to Office Symbol 1 and Program Code to Program Code 1.
Notice how all of Spreadsheet2 is shown as a Table. We expand that to add its Office Symbol and Program Code columns, as shown in Figure 18-9.
Since we check the box to “Use original column name as prefix,” that gives us columns like Spreadsheet2.Office Symbol. We rename them and reorder to get to Figure 18-10.
We are now ready to implement our “Match” columns. These are a good use of Conditional Columns
. Rather than comparing the value as is the default, however, we want to compare Office Symbol 1 to another column (Office Symbol 2), as shown in Figure 18-11.
If these match, we put in True and otherwise False, as shown in Figure 18-12.
We do the same thing for a Program Code Matchcolumn
.
Thinking forward to our desire to color-code our entries (e.g., show True in green and False in red), it turns out that we need a numeric conditional column as well. Figure 18-13 shows how we can set that up.
We won’t display this column in our visualization, but rather just use it in our conditional formatting. We are now ready to Close and Apply and start on the report.
Visualizing Our Data in Power BI
We are now ready to display our data. We start with a Table control to show the values from our original spreadsheet. The users can then filter to show non-matches and then export that data if desired. We create slicers for Office Symbol Match (Figure 18-14) and Program Code Match as well as card to show the number of positions affected.
Our last objective is to color code our “Match” columns so that True values show as green and False ones show as red. To do that, we select our column in the table and apply Conditional formatting, as shown in Figure 18-15.
Power BI conditional formatting
is not yet as flexible as Excel’s and thus we can’t apply formatting with non-numeric data. That’s why we created the additional conditional column back in Figure 18-13. As we see in Figure 18-16, we change the Format style to Rules, leave the Apply to as Values only, and then choose our What field should we base this on? to be our Office Symbol Status Color column. We leave it as Sum, but Average, Minimum, or Maximum would all have worked the same.
We edit the first rule to be = 0 as a Number and then choose red as the color. We add a New rule and use = 1 as a Number. We choose a variation of green (#039703 hex value) for its color. Figure 18-17 shows the final result.
While this is a comparable amount of work to the Access solution, the big advantage is that updated feeds could be uploaded to Teams or another location and processed via Power BI’s
scheduled refresh. The report can easily be shared with other staff. They just use the two slicers and, if needed, export the problem data. This is MUCH easier and less prone to error than the Access solution.
Summary
In this chapter, we took a common example of needing to determine if data from multiple systems was consistent. We first looked at a “legacy” technique of using Microsoft Access queries to do the job. We then examined how to create a reusable solution in Power BI via Merged Queries. We added slicers and conditional formatting to easily see unmatched data. Powerful indeed!