Adding in multiple answers, multiple lines of text, and Yes/No responses leads to some additional challenges in visualizing data. This chapter looks at some of these challenges and how to address them.
JSON and HTML Data Formats
Our first example
comes from an IT Demographic Survey of all our personnel. Figure 6-1 shows a section of the survey that reveals problematic data.
I used the same technique
as the previous chapter to copy each response to a SharePoint list using Power Automate. We see the first inkling of an issue when we look at the data in SharePoint, as shown in Figure 6-2. The extra explanatory text after Level 4 gets copied (since it was part of the answer).
More significantly, the multiple answers get represented as:
["EDU (wired)","MIL"]
Those with some web programming experience might recognize this as JavaScript Object Notation (JSON). When we connect to the data in Power BI, we see an additional issue in that the data is displayed as HTML,1 as shown in Figure 6-3.
The value looks like this
:
<div class="ExternalClassE89BC5E188494C92998F948FB46F942E">Yes. The current visual projection capability in my classroom is sufficient.</div>
Working with the Data in Power BI
As in the previous chapter, we connect to the SharePoint list as our data source and then Transform data to begin manipulating the data. Our first task is to deal with the HTML format. While we could extract the values as XML (like we will with JSON as follows), that is subject to error if there are stray brackets in the data AND it is more problematic for null data. Instead, we want to use a similar technique as a previous chapter where we connect to the SharePoint list
in the 1.0 implementation, as shown in Figure 6-4.
This gives us additional options when transforming our data. In particular, it exposes the FieldValuesAsText column, which allows us to extract the plain text version of any of our column values. This is shown
in Figure 6-5.
Our first task is to deal with null data. We initially sent out the survey requiring answers, but we updated it later for the stragglers with a request to just fill out the first four questions with the option to skip the rest. For those columns that were JSON, this meant we needed put in [] for that data. This is shown
in Figure 6-6. Note that we used to enter null for the Value to Find but now we just leave it blank.
Our next task is to remove all the extraneous text from our IT Level and instead have it show Level 1 through Level 4. As shown in Figure 6-7, we use the Extract option
and then use First Characters, which turns out to be seven. This will make our visualizations much cleaner.
We have two tasks when working with the JSON data. The first is to eliminate the brackets and the quotes. We do this by going to Extract – JSON, as shown
in Figure 6-8.
The second task is more difficult. Notice how the first row contains EDU (wired), MIL while the second contains EDU (wired), EDU (wireless). If we do a visualization
“as is,” one “pie slice” will be the EDU (wired), MIL combination and the other will be the EDU (wired), EDU (wireless) combination. What we actually want is to show two EDU (wired) answers and one each for MIL and EDU (wireless). To make this happen, we need to split each response into its own row. As shown in Figure 6-9, we click on the “Expand” button at the top of the column.2 We then select Expand to New Rows.
3 Each entry gets its own row. Notice the circled Id values in Figure 6-9. These are duplicated and are how we will match up these extra rows to the original responses
.
We do this for each of the multiple response columns. In our example, we started with 270 rows of data. After expanding all the columns, we ended up with 1,227,676 rows! Luckily, Power BI can handle this extra data just fine.
Visualizing the Data in Power BI
After performing these steps on all our JSON columns,4 we are ready to visualize the data. As we have seen in previous examples, we put the desired column (Current Networks (Office) in this case) as the Details and then find a unique value like Id for the Count. Figure 6-10 shows the results
. When we look at the tooltip showing the # of Responses for EDU (wired), we see the problem. 308,753 responses reflect all the expanded rows described earlier in this chapter and does not reflect the true data.
We can now take advantage of the fact we noticed in Figure 6-9 that the Id values were duplicated when we expanded the rows. We use Count (Distinct) instead as shown in Figure 6-11. The # of Responses is now reasonable at 261. Notice how the percentage is much different as well.
Another implication of the expanded rows becomes apparent when we want to view the actual data. I normally use a Table for this purpose. If we do that here, however, we see all the duplicate data (name, department, IT Level, etc.). Instead, we can use a Matrix, as shown
in Figure 6-12. We add each desired element to Rows and then click the Expand all down one level in the hierarchy button (see arrow in Figure 6-12) until it is disabled. We then go to the Format your visual tab and turn off Stepped Layout under Row headers ➤ Options.
Working with Yes/No Data
Our final task for this chapter is to effectively display Yes/No (or Boolean) data. Figure 6-13 shows a portion of a daily
status check form. It is largely a series of Yes/No/Other questions on whether various systems and networks are operational.
While we could easily make a standard pie chart for each question showing the numbers of Yes and No responses, this doesn’t give us the desired “dashboard
” of both the most recent date (Figure 6-14) and the results over a given time frame (Figure 6-15). To do this, we need to make some adjustments to the data.
As with previous examples, we go to Transform data. Before we get into the Yes/No data, let’s look at how we come up with the “Most Recent Date.” We don’t want to limit this to “today” since the daily check may not have occurred yet and we want to use yesterday’s (or Friday’s if this a Monday) data. For this, I found the easiest way was to first create a MostRecentSubmissionDate column
using this formula:
=List.Max(#"Changed Type"[Created])
But where does the #"Changed Type" bit come from? That is actually referencing a prior step. To see that, we need to look at the Advanced Editor
, as shown in Figure 6-16. With some study, you will see that each step is named and then the next or other later rows refer to that name (#”Changed Type”, #”Added Custom”, etc.).
This gives us a column that has the same value in every row, for example 8/23/2022 1:34:33 PM. We then add a Conditional Column that we name
isMostRecentDate
. We put 1 in this column if the Created column equals the MostRecentSubmissionDate and 0 if not, as shown in Figure 6-17.
We change its type to True/False and then use it as the Page Level Filter
, as shown in Figure 6-18. This limits all data on the page/tab to only the most recent.
Getting back to our Yes/No data, we remove all the unneeded columns (Author, Editor, etc. that came over from SharePoint). We then select all the Yes/No columns and choose to Unpivot Only Selected Columns, as shown
in Figure 6-19. This gives us a new Attribute column (that was the previous column name) and a new Value column (which is Yes/No/Other). We rename these to System/Task and Operational/Completed, respectively.
We are now ready to make our visualization. We choose
Stacked Bar Chart
and use System/Task for the Axis, Operational/Completed as the Legend and the Count of Operational/Completed as the Value. This is shown in Figure 6-20.
Since Yes is good and No is bad in our context, we set the Data Colors
so that Yes is green and No is red. This is shown in Figure 6-21.
Summary
This chapter introduced us to several tools and techniques that help us work with some of the more complicated data types and still accurately and effectively display our data and get the insights that we need.