Chapter 5. Visual Elements

In the previous two chapters we focused on understanding charts themselves. Charts are ultimately the main focus of any communication with data. However, if you only concentrate on deciding the type of chart you want to use, you’ll miss the opportunity to communicate your point to your audience even more clearly through the use of visual elements. Visual elements such as color, size and shape make a massive difference to your audience’s ability to interpret your charts, so I will focus on them more deeply here.

As mentioned in previous chapters, color, size and shape are three pre-attentive attributes. Knowing how to use each of these aspects of your data visualisations will improve the overall aesthetic of your work as well as making your communication’s message clearer.

Aside from pre-attentive attributes, this chapter will also look at the use of multiple axes when deciding on efficient visual elements. When looking at visualizing multiple measures on the same chart, the only communication style we’ve shown thus far is the scatterplot. In this chapter, we’ll look at another option, Dual Axis charts, which make you think about the type of mark you are using as well as the range of values each axis covers.

Any element that can help focus the audience’s attention, or highlight a key set of data points, will help dramatically to communicate your message. Reference lines and reference bands are a key element that can amplify your message but their use goes beyond just a simple line or band. We’ll also take a look at box and whisker plots, a more advanced use of reference lines and bands that can quickly show complex trends in your data.

The final element we’ll look at is Totals. Adding a total to your chart may be an easy step to take when working with any data tool, but totals pose challenges when you use color or length to visualize their value compared to their constituent parts. In this section, you will gain a better understanding of the options you have.

By the end of this chapter, you will be comfortable with making great charts that communicate your message clearly and even allow your message to jump off the page.

Color

I’ve mentioned color a number of times in the previous chapters. Both hue and intensity are pre-attentive attributes that have shown up in our examples so far. As a reminder, hue refers to the type of color and intensity refers to the level of purity of the color.

Types of Color Palette

There are three types of color palette that you will frequently encounter as you view data communications or create your own: hue, sequential and diverging. The pre-attentive attribute of intensity is covered by sequential and diverging color. This section will allow you to pick the right colors for what your data is communicating to your audience. The number of colors you are going to use changes how you might use them:

  • Three or more colors - you will be selecting a palette of different hues

  • Two colors - you might still be creating a color palette of two different hues, or you might use a diverging color palette to show progression from one color to the other

  • One color - you can use a single hue on its own but you may want to show different levels of intensity of that color using a sequential color palette

Let’s look at each of these color options in turn to see how you might best utilize them when communicating with data.

Hue

Each different color you see is a different hue. Color is determined by the wavelength light possesses as it is reflected off an object. The primary use of hue in data visualization is to show different values in a categorical data field. Depending on the chart type and the medium the chart is used in, i.e. print or digital, hue is either an essential addition or a factor adding confusion.

For example, when using a scatterplot to show different categorical variables, hue is a clear way to show which plot refers to which variable. Using a soap retailer, Chin & Beard Suds Co, as our example, Figure 5-1 shows how a separate color can be used to easily identify each store on the chart. Figure 5-1 shows the stores from the Southern region.

Figure 5-1. Hue per Categorical Variable

However, as soon as you approach ten or more colors, it becomes much harder to recognise which plot is which. Adding the sales of the rest of the United Kingdom’s stores and of France’s stores makes assessing the scatterplot much harder (Figure 5-2). With so many different hues, you will still be able to see which store performs the best against its target whilst seeing its sales compared to other stores, but the cognitive effort is significant when comparing different store locations to one another.

Figure 5-2. Too many colors make analysis harder

If I asked whether France or the United Kingdom’s stores did better against their targets, would you know the answer? Possibly not. This is where the use of hue depends on the question you’re asking. In order to answer this question, let’s instead use one hue per country, with different levels of intensity for each color to represent the different stores (Figure 5-3). This makes the data much easier to interpret as to the balance between the French and UK stores meeting their targets.

Figure 5-3. Two hues with differing levels of saturation

Whilst it isn’t easy to differentiate the store locations, it is easy to pick apart the stores from the two countries. The United Kingdom has more stores in the top right corner of the scatterplot as we see more orange plots.

Using hue to show differences isn’t always required, though. Where charts segment variables by category, you don’t need different hues, as they add to cognitive load rather than reduce it. Figure 5-4 shows exactly that effect, where the colors distract from the consumer’s ability to compare the length of the bars, rather than improving it.

Figure 5-4. Bar chart with too many colors

Removing the colors actually makes the chart much easier to read and consume (Figure 5-5) as the colors are no longer distracting the audience’s attention.

Figure 5-5. Bar chart after the colors have been removed

Another option was shown in Figure 4-14: if you want to highlight a specific store, you can do so by colouring just that one bar and not the others.

Intensity - sequential color palettes

The other pre-attentive attribute that utilizes color is Intensity. Intensity is shown through two different methods: sequential and diverging color sets. The difference between these two techniques is the number of hues involved. A Sequential color scheme only involves one color and the data points are shown based on the level of lightness. The lower the value, the lighter the plot will be. Darker, more intense colors represent plots with higher values (Figure 5-6).

Figure 5-6. Example of a sequential color palette
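
To make the idea of a sequential palette concrete, here is a minimal Python sketch. It is an illustration only and not how any particular tool builds its palettes: the function name `sequential_color`, the base hue and the choice to blend linearly from white are all assumptions made for this example.

```python
def sequential_color(value, vmin, vmax, base_rgb=(0, 90, 160)):
    """Blend from white (lowest value) to base_rgb (highest value).

    base_rgb is a hypothetical single hue chosen for this sketch.
    """
    if vmax == vmin:
        # No range to spread across, so use the full-intensity color.
        return base_rgb
    # Position of the value within the data range, clamped to 0..1.
    t = max(0.0, min(1.0, (value - vmin) / (vmax - vmin)))
    # Linear blend per channel: t = 0 gives white, t = 1 gives the base hue.
    return tuple(round(255 + (c - 255) * t) for c in base_rgb)


# Low values come out light, high values come out dark and intense.
for v in (0, 50, 100):
    print(v, sequential_color(v, vmin=0, vmax=100))
```

Only one hue is involved; the value controls nothing but how light or dark that hue appears.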

Sequential colors allow the audience to quickly determine at a glance whether values are high or low without even needing to check the position of a mark against the axis. A sequential color palette allows an additional measure to be shown on a chart that wouldn’t be possible otherwise. Using an example from an airline, in Figure 5-7, not only is the total sales value shown on the x-axis but quantity is shown using sequential color. This helps you see that as the number of tickets sold increases so does the overall sales for each of the ticket classes per quarter. Without the use of the sequential color palette, an additional chart would need to be used to show this behaviour.

Figure 5-7. A bar chart using sequential colors

Intensity - diverging color palettes

A diverging color scheme involves two colors and runs from one color to the other. The lowest values in the data set are represented by the color on the far left and the highest values by the color on the far right. As the values near the crossover point to the other color, they normally fade to a white or light grey color (Figure 5-8).

Figure 5-8. Example of a diverging color palette

As you can see in Figure 5-8, diverging palettes pop off the page more than a sequential palette since you are using two bold colors in your visualization. However, it’s important that you choose which palette type to use based on the data being visualized. For example, a diverging color palette is likely the best option when a measure crosses either the zero-point or a target, as a different color represents values either above or below that level. This colouring effect allows the audience to clearly see when the threshold has been crossed as there is a color change. As the value diverges further from the threshold, the audience will be able to use the color as an indicator for the level of progression beyond that point. Being able to quickly determine if something is above or below a target allows you to focus on what actions you might take to ensure the target can be met. Using a sequential palette to show a range of values that cross zero isn’t effective because it does not have a clear visual indicator to demonstrate whether the value is above or below that key point (as Figure 5-9 demonstrates).

Figure 5-9. Poor use of sequential color palettes
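
The behaviour of a diverging palette around a threshold can be sketched in Python as follows. This is a hedged illustration rather than any tool’s implementation: the function name `diverging_color`, the two end colors and the light grey neutral color are hypothetical choices for this example.

```python
def diverging_color(value, midpoint, max_distance,
                    low_rgb=(30, 100, 170), high_rgb=(230, 120, 20),
                    neutral_rgb=(240, 240, 240)):
    """Return an RGB tuple: neutral at the midpoint, bolder further away.

    midpoint is the zero-point or target; max_distance is the largest
    distance from the midpoint that should map to a full-strength color.
    All default colors are assumptions made for this sketch.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    # The side of the threshold decides which hue is used.
    end_rgb = high_rgb if value >= midpoint else low_rgb
    # Distance from the threshold decides how intense that hue is.
    t = min(1.0, abs(value - midpoint) / max_distance)
    return tuple(round(n + (e - n) * t) for n, e in zip(neutral_rgb, end_rgb))


# Values either side of a target of 0 get different hues.
for v in (-100, -20, 0, 20, 100):
    print(v, diverging_color(v, midpoint=0, max_distance=100))
```

The hue changes the moment the threshold is crossed, which is the visual cue a single-hue sequential palette cannot give.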

Choosing the ‘Right’ Color

By picking the right type of color palette for the communication, your audience will be able to effectively decode your message. To help them decode it faster, you can choose colors that are related to the subject of the communication.

Theme

You can use the theme of the data to help highlight the key messages and focus the audience’s attention. Let’s go through some examples of colors that could be used for your data:

  • Black / Red - Financial terminology refers to “in the black” for being profitable and “in the red” for loss making. Red and black can also be used to represent deaths. The context of the communication provides a lot of information about whether the color scheme represents deaths or company profits.

  • Green - can highlight ecological benefits or, in the United States, represent money due to the color of the currency.

  • Red / Blue - can represent heat and cold respectively. This color range isn’t just about visualizing temperatures. ‘Hot’ can represent growth or intensity, whereas ‘cool’ can represent cooling off or falling values.

  • Yellow - can represent day time or hours of sunlight.

  • Green / Yellow / Red - can represent the colors of a traffic light meaning go, caution and stop. Many organizations have adopted this color scheme to represent good, ok and bad.

Your audience is likely to have a thematic color palette in mind when they access your work due to the nature of why they are accessing the work in the first place.

Let’s consider the color red in a cultural context. The difference between Eastern and Western cultures is significant when it comes to the color red. As mentioned above, red is used in many organizations to signal stop. This is not the case in Eastern cultures where red is used to signal luck, happiness and joy. The stark difference between the two interpretations means you need to consider who your audience is and their likely association with the color before making your choice.

Limitations to the effectiveness of color

Although society has common associations with certain colors, those colors are not always perceived by all members of that society in the same way. Color blindness is the common name for the condition in which the cones at the back of the eye don’t respond to certain colors. The condition is normally genetic, and it affects enough people that you should consider your design choices when communicating with data via color. 1 in 12 men and 1 in 200 women have one form or another of color blindness1.

There are different types of color blindness to be aware of so you can test that your communications are making the right impact:

  • Deuteranomaly - reduced sensitivity to green light

  • Protanomaly - reduced sensitivity to red light

  • Tritanomaly - reduced sensitivity to blue light

Deuteranomaly is the most common form, while Tritanomaly is the rarest. Protanomaly and Deuteranomaly are commonly grouped together as red-green color blindness, which presents as the inability to distinguish between colors that have red or green shades like oranges and browns as well as red and green.

There are many different websites that will let you upload an image to show how your visualization might appear to those with color blindness. Even on a visualization intentionally developed to use contrasting colors suitable for color blind users, there are some major differences between what you’d see with and without color blindness (Figure 5-10). Running tests against the most common forms of color blindness is a must if you are sharing your work with the public.

Figure 5-10. Using a Protanopia test on a visualization (left) to see the effects (right) for a reduction in perception of red light. Test conducted with the color blindness simulator at pilestone.co.uk.

Avoiding Unnecessary Use of Color

Though color can greatly benefit a visualization, you’ve already seen examples in this book where the overuse of color can actually be detrimental to your communication and make the audience think hard about what each color represents. Figures 5-2 and 5-4 are common examples you will likely come across as you help others start to make clearer communications with data. But beyond simple overuse of color, what other unnecessary use of color will you commonly encounter?

Double encoding

The term double encoding refers to showing the same metric, or category, on a chart in two different ways. Why would you want to do this, you may ask? Well, there are a few reasons but none of them justify the use of the technique. One reason you might consider double encoding is to make your communications more accessible to people with color blindness. Ideally, though, you should be selecting a color palette that removes the risk that someone who is color blind might not see the message in the data clearly.

Firstly, double encoding is used to make the chart look more interesting than just ‘another’ chart of that type. Using Chin & Beard Suds Co international sales data, a simple sales bar chart can be given extra flair by adding the sales metric to color too (Figure 5-11).

Figure 5-11. Double encoded bar chart

If this chart weren’t ordered, the message would not be as clear. When you first look at Figure 5-12, you notice it’s the same data and use of color as Figure 5-11, but it is much harder to read. When I first look at the image, I still take a moment to work out that the color of the bars is actually a second representation of sales.

Figure 5-12. Unsorted double encoded bar chart

I frequently see examples of both of the previous charts from less experienced and more experienced data workers alike. Forcing your audience to ask the question of what the color represents is wasted cognitive effort. Yes, you could use a color legend on the chart but just removing the use of color makes the chart much easier to read, as Figure 5-13 shows.

Figure 5-13. Figure 5-11 without double encoding

Another reason to avoid using double encoding is that it over-exaggerates the message within the data. Let’s use the same sales data as the previous bar charts but show the data on a symbol map instead (Figure 5-14). Not only does the darker color pop off the map, but the size of the circles also indicates the sales values.

Figure 5-14. Example of double encoding over-exaggerating the message

You have already seen how dark, bold colors attract the audience’s attention, so by coupling that factor with a larger circle you are over-exaggerating the message within the data. Our role as communicators of data is to display the message clearly but not be overtly biased in what we are sharing based on the techniques we use.

Creating unbiased data visualization is a widely-covered subject and it is difficult to do because every choice you are making when communicating with data can introduce bias into your work. The potential biasing decisions you will make include where to source data from, what data points to include, how to represent the data and how to title the work. By removing a clear bias like double-encoding, you are giving the audience a fairer representation of the data.

Size and Shape

Size and shape are inextricably linked because whenever you use one you need to think about the other, too. Both are pre-attentive attributes that can have a significant impact on how your audience will view the message you are communicating, especially when you use them in tandem.

Both attributes require careful use to avoid confusing the audience or adding heavy amounts of cognitive effort to interpret what you are presenting to them. As we’ve seen in Chapter 4’s section on maps, it isn’t easy to decode the difference in size of two marks into a value. In Chapter 4, I showed the impact of crossing a target or the zero point of an axis if a measure might return either positive or negative values. Let’s take this idea further and see if you can determine the values of each of the following circles representing the store sales (Figure 5-15). Can you tell me how much the sales were in Leeds compared to Lille? What about Paris compared to Plymouth?

Figure 5-15. Using size to represent values. Can you quantify the difference?

Don’t worry, it’s not your analytical skills that are weak, it’s the chart choice that is causing the issues here. I’ve used Tableau to build this chart, so the circles are sized in a standard way; there is no need to get out a ruler, measure the circles and work through some trigonometric calculations. If you want to check your answer, feel free to flick back to Figure 5-5, which is much easier to interpret.

Leeds store’s sales were 140,157 whilst Lille store’s sales were 417,544, making a difference of 277,387. The difference is nearly double Leeds store’s sales but did you get that from this chart? I didn’t. The difference between Plymouth and Paris is much smaller, but it is still difficult to determine exactly how much. Plymouth had sales of 320,850 and the Paris store’s sales were 284,686, making a difference of 36,164, or about 11% of Plymouth’s sales. Again, this is difficult to calculate just by looking at the chart.
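
The comparisons quoted above are simple to verify once you have the numbers rather than the circles. The short Python sketch below reproduces the arithmetic using the four sales figures from the text; the dictionary and the helper name `compare` are just illustrative choices.

```python
# Store sales as quoted in the text.
sales = {"Leeds": 140_157, "Lille": 417_544, "Plymouth": 320_850, "Paris": 284_686}


def compare(larger, smaller, relative_to):
    """Return the absolute difference and that difference as a share
    of the store named in relative_to."""
    diff = sales[larger] - sales[smaller]
    return diff, diff / sales[relative_to]


lille_vs_leeds = compare("Lille", "Leeds", relative_to="Leeds")
plymouth_vs_paris = compare("Plymouth", "Paris", relative_to="Plymouth")

print(f"Lille - Leeds: {lille_vs_leeds[0]:,} ({lille_vs_leeds[1]:.2f}x Leeds)")
print(f"Plymouth - Paris: {plymouth_vs_paris[0]:,} ({plymouth_vs_paris[1]:.1%} of Plymouth)")
```

The numbers take a moment to compute but are exact, which is precisely what the sized circles fail to offer.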

This isn’t to say size is not a suitable charting technique to show the story in your data. You can use the technique to draw the audience’s attention to either end of the size spectrum but not much else. It’s important to allow the audience to focus on what they want to understand most about the data being presented and this chart doesn’t allow for the range of investigation we’ve seen from a bar chart or scatterplot. Figure 5-15 can be adapted to fit many more differing questions if the values are added to the image too (Figure 5-16) but the chart doesn’t utilize strong pre-attentive attributes so is quite limited even with the labels.

Figure 5-16. Size with added labels to assist interpretation

Themed Charts

Using shapes in your data communications is another way to help set a theme for the audience. If you are sharing data about countries, then using a flag will set the theme. For sports teams, the teams’ logos allow the audience to know which data point represents their favourite team and how they compare to their competitors. When using shapes to visualize data, there are suddenly almost infinite options for the charts you can create and the themes you can use.

Scatterplots

In Chapter 4 we explored the challenge of referencing the individual plots back to the different categorical variables they represent. If you use a different color or arbitrary shape for each variable, it puts a significant burden on the audience to look up each in turn to understand what each plot represents. You can use shapes to simplify this lookup process by using icons that represent each variable.

In Figure 5-17, I used the bike store’s accessories to show the sales performance against target. Each item is easily identifiable but a legend is included in case the icons aren’t clear enough.

Figure 5-17. Shapes used to represent categorical variables

Like other scatterplots, there can be a challenge to relate each shape to the variable it represents if too many are present on just one chart or if the plots are densely clustered. Yet, reducing the cognitive effort is a useful step to take whilst also giving the work a different look and feel compared to a standard scatterplot. There are some challenges with using shapes in a scatterplot but I’ll get to those shortly.

Unit charts

Shapes can be used to show a measure in the form of a unit chart. Unit charts can use a single shape to represent a set value. In Figure 5-18, the bicycle shape is used to represent 100 bikes being sold. Unit charts work in a similar way to a bar chart, as we notice the length of the shapes end-to-end first to make comparison easier between the different categorical variables. To ensure the images are clear, either the values you are visualizing will need to be rounded to the unit size of the shape or partial shapes will need to be used. To infer the actual value, your audience will need to count the number of shapes they see.

Figure 5-18. Unit chart showing bike sales

By forcing the consumer to count, you are not creating a visual representation that is very quick to find values in but it does direct your audience to an answer. Tooltips, discussed further in Chapter 6, can provide your audience with exact answers when using an interactive format.
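
The choice between whole and partial shapes in a unit chart can be sketched as a small calculation. The function name `unit_icons` is hypothetical, and the default unit of 100 simply mirrors the bicycle icon described above.

```python
def unit_icons(value, unit_size=100):
    """Split a value into whole icons plus the fraction of one more icon.

    unit_size is the amount a single icon stands for (100 bikes in the
    example from the text).
    """
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    whole, remainder = divmod(value, unit_size)
    return whole, remainder / unit_size


# 450 bikes sold: four full bicycle icons and half of a fifth.
print(unit_icons(450))
```

If you prefer not to draw partial icons, the fraction tells you how much information is lost by rounding to whole shapes.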

Size and Shape Challenges

The use of shapes and their sizes is not always an easy option to take when communicating with data. Let’s go through some of the common challenges you will come across so you can avoid making mistakes.

Scaling

As we’ve seen in Figure 5-15, size is a difficult metric to use to interpret the actual amount represented in a visualization. It is also difficult to articulate to your audience what you are actually showing with the data point via height, width or area. Figure 5-19 shows the effect of scaling if the value 1 increases to 2. If you are not clear to your audience what the size represents, they won’t know whether the shape on the right of the figure represents 2 or 4.

Figure 5-19. The challenge of showing difference by size

It’s important to ensure that the audience clearly understands how you are scaling the shapes. A legend is a common way to guide them.
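
The ambiguity between scaling by width and scaling by area is easy to show with circles. The sketch below is an illustration under assumed names (`radius_for_area_scaling`, `radius_for_linear_scaling`) and an arbitrary base radius; it does not describe how any specific tool sizes its marks.

```python
import math


def radius_for_area_scaling(value, base_radius=10.0):
    """Radius chosen so that the circle's AREA is proportional to value."""
    return base_radius * math.sqrt(value)


def radius_for_linear_scaling(value, base_radius=10.0):
    """Radius proportional to value; the area then grows with value squared."""
    return base_radius * value


def circle_area(radius):
    return math.pi * radius ** 2


# Doubling the value from 1 to 2 under each scaling rule.
area_ratio = circle_area(radius_for_area_scaling(2)) / circle_area(radius_for_area_scaling(1))
linear_ratio = circle_area(radius_for_linear_scaling(2)) / circle_area(radius_for_linear_scaling(1))
print(f"area scaling: circle is {area_ratio:.0f}x as large")
print(f"linear scaling: circle is {linear_ratio:.0f}x as large")
```

When the radius doubles, the audience sees four times the ink, which is why they cannot tell whether the larger shape means 2 or 4 unless you say how the shapes are scaled.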

Different devices

With modern technology, audiences are consuming data-based communications on a range of devices. The varying size of screens and methods to interact with those devices can pose a challenge when communicating with shapes. Chapter 6 will cover how we interact with charts and how the device type dictates what types of interactions to offer the audience. When viewing shapes, the size of the screen is the most significant factor. Being able to differentiate each shape can be a lot harder on smaller screens. The range of sizes will also be much harder to determine when the overall scale of the image is much smaller on a mobile screen compared to a desktop screen.

Unsquare shapes

You can also change the size of the shapes to represent a third measure that isn’t shown by either axis. However, this choice can pose an additional challenge to interpret beyond what is covered in Figure 5-15. The size of bespoke icons can be complex to calculate as the shape isn’t always square. You must take care as to how the data visualization tool will determine what size to make the shape compared to how your audience might read it. Figure 5-20 shows how even if you made each icon consistently 30 pixels by 30 pixels, the blank space around the sides of the backpack logo would not be factored into the sizing as seen by the audience. If the square border weren’t present in this image, you would be left guessing whether the area of the shape being used to represent the measure was square or not. This is where a size legend becomes very important and I’ll cover those in Chapter 6.

Figure 5-20. Custom shape but difficult to use to represent a measure using size

The same sizing issue is true for a common shape that is used to demonstrate location too. The inverted droplet shape can be a challenge to use as many mapping tools will plot the middle of the icon over the top of the exact point, rather than the tip of the droplet. Figure 5-21 demonstrates this issue.

Figure 5-21. The incorrect use of the droplet icon (left) and the correct use (right)

Depending on the data visualization software you use, you may need to alter where the point of the droplet sits within the image. To fix this effect, you need to change the middle of the shape to be the bottom of the droplet. To do this, you can pad the same length of the shape on to the bottom of the image, as illustrated by Figure 5-22.

Figure 5-22. Padding the shape so it will be positioned correctly

Limitation of uses

You should only use shape and size to represent certain data types. For example, categorical variables should only be differentiated by shape but not size. There is no series of shapes that would clearly show different values of a measure. As seen in this section already, size can represent differing values of a measure but the shape would have to remain consistent throughout.

Likewise size would be a poor representation of different categorical variables. You could scale a shape to represent ordinal data with the earliest mark being the smallest and the latest mark being the largest. Whether the audience would find this method of communication intuitive is a significant question that would require some testing.

In summary, there are some intuitive use cases for using shapes to represent the variables themselves to save the audience having to look up what shape represents each variable. However, limitations do exist when comparing a measure based on size of differing shapes. Size can be useful to direct the user to find the data points of interest, but is difficult to compare accurately. Used carefully, size and shape can be an effective communication method but you must use caution and consideration to achieve the desired effect. You are likely to use color much more frequently than size or shape when communicating with data or receiving others’ communications.

Multiple Axes

When I think about multiple axis charts, I instantly think of scatterplots, closely followed by maps. Another common feature of charts that we’ll explore is the use of multiple axes for the same axis orientation, i.e. two y-axes. These charts are commonly called dual axis charts. You might question why I’m recommending using multiple axes on the same chart when I’ve been preaching simplification of charts wherever possible. The reason I find dual axis charts useful is the ability to overlay two layers of data on top of each other for direct comparison.

Let’s take a common example of a dual axis chart that compares one metric against the other using two different mark types (the mark type is the way the data is shown on the page). Figure 5-23 shows the direct comparison between sales and the profit generated by those sales in each month. In this example, sales is represented by an area chart to act as background information for the profit value that is shown as a line chart. As profit is the focus of the chart, I’ve made it a more intense color.

Figure 5-23. Example of synchronised axes

You’ve already seen a couple of other ways this data could be visualized, but let’s quickly assess why this method is a useful way to communicate the data. The first option is to use two separate charts, one to show sales, the other to show profit. Although you’d see the overall pattern between the two charts, spotting divergence of the trends takes considerable cognitive effort when they are displayed separately. By relying on your audience to spot this trend on their own, you risk them not seeing it clearly and missing the message entirely.

The second option is to use the two measures as a scatterplot. The challenge with this approach is how to show the trend between the months and not just the overall distribution of the plots. To show the trend we could link up the plots sequentially with a connected scatterplot as shown in Figure 5-24. The line linking up all the plots can be a tangled mess so this technique doesn’t work for every data set. I’d always recommend you try to follow the path created by the line to see how much cognitive effort you have to expend in interpreting the chart. If you struggle to follow the path yourself, it is unlikely your audience will be able to and therefore, a different technique should be used to visualize the data.

Figure 5-24. Connected scatterplot

One choice you need to make when using two measures in a dual axis chart is whether to synchronise the axes together or leave them to be independent of each other. Synchronising axes means making sure the scales on the axes are identical to each other. When deciding which approach to take, I look to the question I am trying to answer to decide which is the best approach.

Choose to synchronise: if you are answering a question about what proportion of one metric is driven by another, you will need to synchronise the axes (Figure 5-23). For example, student attendance at a lecture should always be a proportion of the number of people taking that course. This ensures the visual represents the direct comparison of the two on the same scale.

Leave the metrics independent: if you want to see if there are any common trends between the metrics, you could leave the axes unsynchronised. This means the metrics will overlap more but you won’t be able to tell the proportion that one metric makes up of the other (Figure 5-25). The best practice approach is to synchronise axes to ensure your audience is clear on what proportion of sales forms the profits made. However, a data-literate audience who will carefully check the axes and any titles shown can use unsynchronised axes to form different views of the data.

Figure 5-25. Multiple mark types on a dual axis chart
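
The difference between synchronised and independent axes comes down to how each axis range is chosen. The following Python sketch illustrates that choice under stated assumptions: the function name `axis_ranges`, the decision to always include zero and the sample figures are all hypothetical and are not taken from the charts in this chapter.

```python
def axis_ranges(left_values, right_values, synchronise=True):
    """Return (left_range, right_range) as (min, max) tuples.

    Each range always includes zero. When synchronise is True, both axes
    share one range wide enough to cover both measures.
    """
    left = (min(0, min(left_values)), max(left_values))
    right = (min(0, min(right_values)), max(right_values))
    if not synchronise:
        return left, right
    shared = (min(left[0], right[0]), max(left[1], right[1]))
    return shared, shared


# Hypothetical monthly figures, for illustration only.
monthly_sales = [100, 120, 90]
monthly_profit = [20, 30, -5]

print(axis_ranges(monthly_sales, monthly_profit, synchronise=True))
print(axis_ranges(monthly_sales, monthly_profit, synchronise=False))
```

With a shared range, profit sits visibly low against sales, so the proportion can be read directly. With independent ranges, profit is stretched to fill the chart, which makes its trend easier to follow but hides the proportion.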

The most common use of dual axis charts I make is to compare performance from one time period with the same time period last year. In a bar-in-bar chart like Figure 5-26, I use the same mark type but formatted differently to show the story in the data. The bars are sized and coloured differently to help the audience understand what the chart is showing.

The main metric in Figure 5-26 is the profits earned in 2021, which is represented by a thinner bar that sits in front of the comparison metric. The 2021 profits have been coloured based on whether they exceed the 2020 profits. I have coloured those that exceed last year’s total orange, while those that don’t are dark grey.

  Bar in bar chart showing profit for 2021 versus 2020
Figure 5-26. - Bar-in-bar chart showing profit for 2021 versus 2020
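The colouring rule described for the bar-in-bar chart can be sketched in a few lines of Python. The monthly profit figures and the `colour_for` helper are hypothetical; only the rule itself (orange when this year exceeds last year, dark grey otherwise) comes from the text.

```python
def colour_for(current, previous):
    """Orange when this year's value exceeds last year's, dark grey otherwise."""
    return "orange" if current > previous else "dark grey"


# Hypothetical monthly profits, for illustration only.
profit_2020 = {"Jan": 18000, "Feb": 21000, "Mar": 19500}
profit_2021 = {"Jan": 20000, "Feb": 19000, "Mar": 19500}

colours = {
    month: colour_for(profit_2021[month], profit_2020[month])
    for month in profit_2021
}
print(colours)
# {'Jan': 'orange', 'Feb': 'dark grey', 'Mar': 'dark grey'}
```

Note that a month that only matches last year (March here) stays dark grey, because the rule is about exceeding the comparison value.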

Bar-in-bar charts are very effective as a communication tool because they utilize length but have much more detail built within them than a simple bar chart. Including the comparison period adds useful context and lets the audience compare this year’s trend with last year’s.

Reference Lines/Bands

In this chapter so far I have shown you how to use different techniques to alter the marks showing the main data points of your chart. The next two sections will dig into additional chart features, starting with reference lines and bands.

Reference Lines

I’ve already shown how to highlight marks against another data point as used in the bar-in-bar chart. However, reference lines allow much more flexibility in many situations. A reference line can show a constant value, be calculated based on the data points, or be driven by a measure in your data set. Let’s have a look at each situation in turn.

Using the profit values of our bike store in 2021, let’s apply a target of $20,000 per month and show how that appears on the chart (Figure 5-27). You can still use the technique of colouring based on whether the target is met or not.

  Constant reference line
Figure 5-27. - Constant reference line

Reference lines don’t just have to be used for targets. You can use a reference line to help the audience interpret the chart too. If we took two years of profit data from our bike store, it might be challenging to piece together the stories within the data. Breaking the 24 months down into 8 quarterly periods, each with its own average reference line, can make the chart easier to interpret (Figure 5-28).

  Quarterly Average reference lines
Figure 5-28. - Quarterly Average reference lines

To make the chart easier to understand, I’ve increased the transparency of the marks to allow the reference line to be the dominant mark on the chart rather than the bars. The benefit of using averages is that when the data updates, the reference lines update too and continue to show the latest message in the data.
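As a rough illustration of how quarterly average reference lines are derived from monthly values, the Python sketch below averages each consecutive group of three months. The 24 values are made up (think of them as profit in thousands of dollars) and `quarterly_averages` is a hypothetical helper, not a feature of any particular tool.

```python
def quarterly_averages(monthly_values):
    """Average each consecutive group of three months."""
    if len(monthly_values) % 3 != 0:
        raise ValueError("expected a whole number of quarters")
    return [
        sum(monthly_values[i:i + 3]) / 3
        for i in range(0, len(monthly_values), 3)
    ]


# Hypothetical 24 months of profit, for illustration only.
monthly_profit = [
    10, 12, 14, 20, 22, 24, 30, 30, 30, 15, 18, 21,  # year one
    11, 13, 15, 21, 23, 25, 31, 32, 33, 16, 20, 24,  # year two
]

print(quarterly_averages(monthly_profit))
# [12.0, 22.0, 30.0, 18.0, 13.0, 23.0, 32.0, 20.0]
```

Because the averages are calculated from the data rather than typed in, rerunning the calculation after the data changes moves the reference lines with it.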

You can also use the reference line as the main feature of the chart by exchanging the bars for circles instead to stack the data points per quarter, as seen in Figure 5-29.

  Showing distributions and stories using reference lines
Figure 5-29. - Showing distributions and stories using reference lines

Reference lines can also come from their own data field rather than being calculated from the data points in the chart. In Figure 5-30, the targets are set by a separate data field, unlike the earlier reference lines, which were derived from the data points shown in the visualization. With real-world data sets, targets often come from a separate source to the main data. Targets are also often set at a less detailed level of granularity than the main data set, which can make data preparation challenging.

  Targets from a separate data field
Figure 5-30. - Targets from a separate data field
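The granularity challenge can be seen in a small Python sketch. Here the targets are quarterly while the profits are monthly, so the target has to be brought down to the monthly level before it can be drawn against each bar. All figures are invented, and splitting each quarterly target evenly across its three months is purely an assumption for this example; a real target may be apportioned differently.

```python
# Hypothetical data from two separate sources, for illustration only.
quarterly_targets = {"Q1": 60000, "Q2": 66000}
monthly_profit = {1: 19000, 2: 22000, 3: 20500, 4: 23000, 5: 21000, 6: 24000}


def quarter_of(month):
    """Map a month number (1-12) to its quarter label."""
    return f"Q{(month - 1) // 3 + 1}"


def monthly_target(month):
    """Assumption: a quarterly target is split evenly across its months."""
    return quarterly_targets[quarter_of(month)] / 3


met_months = [
    month
    for month, profit in monthly_profit.items()
    if profit >= monthly_target(month)
]
print(met_months)  # [2, 3, 4, 6]
```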

Reference Bands

Reference lines are not the only formatting option you have when adding visual elements to a chart to help guide the audience’s interpretation of your chart. Using reference bands, which highlight a range of points, is another option that you can make use of. Using reference bands is a good way to simplify reading distributions of data. A distribution is where a chart is used to summarize all the varying data points in a data set for a particular measure. Understanding how your data is distributed is an important part of analysing a data set.

Reference bands can be used to highlight the range between:

  • Minimum / maximum

  • Quartiles, i.e. between the 25th and 75th percentiles of the data

  • Standard deviation either side of the mean

The most common form of reference band you are likely to come across in a chart is the control chart (Figure 5-31). A control chart is often used in operational situations, from call centres to manufacturing, to understand the levels of demand placed on a system. By demand, I mean whatever the operational set-up is designed to handle. In a call centre, this would be the number of calls a team might receive each day. By understanding the normal levels of demand, the right number of call handlers can be on the lines ready to answer the callers’ needs. Too many call handlers would mean the team is likely to be bored and the cost of employing that many people too high. In Figure 5-31, the chart is split into two sections, with the mean and control limits recalculated because a change was made to the process being measured. This is a common requirement when measuring effectiveness.

  How to read a control chart
Figure 5-31. - How to read a control chart

A control chart can be complex to read at first glance, but it is an incredibly useful visualization that allows the audience to use the key data points to make decisions and to avoid being misled by more extreme data points. In a normal distribution, one standard deviation either side of the mean covers roughly two-thirds (about 68%) of the data points in the data set. Two standard deviations capture about 95% and three standard deviations capture about 99.7%. This means any points beyond those limits are the more extreme data points. In the real world when you are measuring varying levels of demand, you want to build your operational system to meet the most common needs and not design to just meet the extremes, unless you absolutely have to.

The control chart shows a number of key metrics in the same chart. First is the mean, which is calculated by adding up all the data points and dividing by the number of them. The upper control limit is determined by adding three standard deviations to the mean in the traditional six sigma view, but other numbers of standard deviations are used in alternative versions. The standard deviation is the square root of the variance, where the variance is the average of the squared differences between each value and the mean. The lower control limit is calculated by taking the same number of standard deviations away from the mean. The band between the upper and lower control limits shows which data points are within control and should be designed into the operational systems. Any point that falls above the upper control limit or below the lower control limit should be ignored when designing the system. This is because designing a system to fit every eventuality would lead to a poorly optimized system that is expensive to operate and increases the charges made to customers. Data points that fall outside the control bands are very rare instances and therefore shouldn’t be factored into the design of the system. Control charts might show changes in the reference bands at certain points where the demand is known to have changed.
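The calculation of the mean and control limits can be sketched in Python using the standard library. This is a simplified version that follows the description above: it uses the population standard deviation (`statistics.pstdev`), which is an assumption, as some tools use the sample standard deviation or other estimates. The daily call counts and the `control_limits` helper are invented for illustration.

```python
from statistics import mean, pstdev


def control_limits(values, sigmas=3):
    """Return (lower limit, mean, upper limit) for the given data points."""
    centre = mean(values)
    spread = pstdev(values)  # square root of the average squared difference
    return centre - sigmas * spread, centre, centre + sigmas * spread


# Hypothetical daily call counts used to set the limits.
baseline_calls = [48, 52, 48, 52, 48, 52, 48, 52]
lower, centre, upper = control_limits(baseline_calls)

# Hypothetical later days checked against those limits.
new_days = [51, 70, 45]
outside = [calls for calls in new_days if calls < lower or calls > upper]
print(outside)  # [70]
```

In this made-up data the mean is 50 calls and the standard deviation is 2, so the band runs from 44 to 56 and only the 70-call day falls outside it.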

Ideally, you want the reference band to be as thin as possible, as this means there is little variation in the measure, which in turn means the operational systems can be developed to meet this need. Where the reference bands are wide, the levels of demand vary more and it will therefore be harder to design a system to meet that demand. Figure 5-32 shows what the number of calls looks like for our Bike Store. If you had to determine how many people we’d need to answer the calls, it would be difficult to say even if you knew each person could handle 20 calls a day. By showing the data points on a control chart, you can begin to see how the volume of calls is becoming more consistent until the final quarter. Greater consistency is useful to ensure you are able to meet the needs of your customers.

  A control chart of number of calls received by the Bike Store
Figure 5-32. - A control chart of number of calls received by the Bike Store

Another chart type that uses reference bands is the box and whisker plot. A box and whisker plot uses a reference band to show distributions of data like the control chart, but in a different way (Figure 5-33). The chart gets its name from the box-shaped reference band that shows the range between the first quartile and the third quartile. Quartiles describe the distribution of data when the data points are ordered from smallest to largest and then split into quarters. The difference between the first and third quartiles is called the interquartile range. A line is drawn out from either side of the box, known as a whisker due to its appearance. The whiskers can extend to the full range of the data points or stop at one and a half times the interquartile range beyond the box. The middle line of the box is the median, the midpoint of the data points when they are ordered from smallest to largest.

  How to read a box and whisker plot
Figure 5-33. - How to read a box and whisker plot
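The pieces of a box and whisker plot can be calculated with Python's `statistics.quantiles`, as in the sketch below. The values are invented, `box_plot_summary` is a hypothetical helper, and the `"inclusive"` quartile method is an assumption: tools differ in how they calculate quartiles, so your software may give slightly different numbers. The whiskers here stop at the furthest data points within one and a half times the interquartile range of the box.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, interquartile range, whisker ends and outliers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical data points, for illustration only.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary["whisker_low"], summary["whisker_high"])  # 1 8
print(summary["outliers"])  # [30]
```

If you would rather draw the whiskers to the full range of the data, use the minimum and maximum of all the values instead of only those inside the fences.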

You can show multiple box and whisker plots on the same chart to show how the distributions change over time. When building your own box and whisker plots, you have the choice of shrinking the data points, or hiding them altogether, so that the box plot stands out and the message is simpler for the reader.

Measuring the change in the length of the box and the whiskers will demonstrate how the distribution of your data changes over time. As discussed with control charts, smaller ranges of data mean the elements you are measuring have greater consistency. In most businesses, having better consistency means processes are easier to plan for and optimize, so visually showing improvements is a key message to communicate.

Totals/Summaries

Another common addition to a chart is a total. A total often represents the sum of all the data points shown in a table or chart but can show different aggregations instead if required. Adding a total to a table is often very easy to complete in any tool that you are using to form your analysis. The same can’t be said when using totals in data visualisations but I will cover that after describing the basics.

Totals in Tables

It’s likely you will have read a table that contains a set of totals. Let’s explore the choices you have when using totals within a table:

Column Totals

Column totals are formed by adding up each of the values in a single column. Remember that the rows within the table are formed by categorical data fields, which set the granularity of each row. Therefore, each row is a breakdown of the total for the metric being totalled. Frequently, the column total is shown at the bottom of the table (Figure 5-34) but it can be moved to the top of the table if required.

  Column totals
Figure 5-34. - Column totals

Row Totals

Row Totals are created by adding up each value found within a single row of a table. Row totals are more frequently used within pivot tables as a measure is spread across multiple columns rather than down a single column. The columns are likely to be headed as different variables of the same category or part of a date (Figure 5-35). Row totals are likely to appear on the right side of the metrics they relate to, but can be moved to the left if required.

  Row totals
Figure 5-35. - Row totals

Subtotals

An additional feature can be added to tables to show an intermediate total when tables are broken up by multiple categorical fields. Choosing which category forms the subtotal is driven by what questions the table is answering. In Figure 5-36, the category is broken down into subcategories so a subtotal gives the value for each category within the table.

  Subtotals in a table
Figure 5-36. - Subtotals in a table
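To show how a subtotal relates to the overall column total, here is a small Python sketch. The categories, subcategories and sales values are made up for the example, and the two helper functions are hypothetical names rather than features of any tool.

```python
from collections import defaultdict

# Hypothetical table rows: (category, subcategory, sales).
rows = [
    ("Bikes", "Road", 500),
    ("Bikes", "Mountain", 300),
    ("Accessories", "Helmets", 120),
    ("Accessories", "Lights", 80),
]


def subtotals(table_rows):
    """Intermediate total for each category."""
    totals = defaultdict(int)
    for category, _subcategory, value in table_rows:
        totals[category] += value
    return dict(totals)


def column_total(table_rows):
    """Sum of every value in the column."""
    return sum(value for _category, _subcategory, value in table_rows)


print(subtotals(rows))     # {'Bikes': 800, 'Accessories': 200}
print(column_total(rows))  # 1000
```

The subtotals add up to the column total, which is a useful check when a table shows both.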

Whilst totals in tables are the sums of each column, summaries is a term that describes other aggregation methods. Any aggregation can be used as long as the user is clear on what is being shown. Averages, minimums or maximums are different aggregations that can be used where you’d normally expect to find the summed amount. This is frequently the case where highlight tables, first covered in Chapter 3 - Visualizing Data, are used because otherwise the total value will sway the color palette too much. Figure 5-37 shows the effect of including the totals on the same sequential color palette as the values in the table. The overarching effect is that the color differentiation is reduced, as the scale of the color covers a wider range than would be the case without the totals.

  Highlight table with summed totals
Figure 5-37. - Highlight table with summed totals

To keep the color scale useful within the chart, the summary can instead show the average across all the values in that column or row (Figure 5-38), although this shows a different piece of information to the total. If you need summed totals in the view, an average won’t do, and an alternative chart might be more effective.

  Highlight table with average totals
Figure 5-38. - Highlight table with average totals
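A quick calculation shows why a summed total squeezes the color scale while an average does not. The Python sketch below measures how much of a sequential palette the ordinary cells can use once a summary value shares the same scale. The cell values and the `palette_share_used` helper are invented, and the sketch assumes the palette is stretched linearly from the smallest to the largest value shown.

```python
def palette_share_used(values, summary):
    """Fraction of the color scale covered by the cells (1.0 = all of it)."""
    everything = values + [summary]
    return (max(values) - min(values)) / (max(everything) - min(everything))


# Hypothetical cell values for one column of a highlight table.
cells = [20, 40, 60, 80]
total = sum(cells)            # 200
average = total / len(cells)  # 50.0

print(round(palette_share_used(cells, total), 2))  # 0.33
print(palette_share_used(cells, average))          # 1.0
```

With the summed total on the scale, the four cells are crammed into the lightest third of the palette; with the average, they keep the full range.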

Totals in Charts

You may need to add totals to charts as well. A total in your chart can have a similar effect to what we saw with the highlight table, but it affects the length of the bars rather than the range of the color.

A poor use of totals in a chart would be to include a summed total at the bottom of a normal bar chart. The length of the total bar makes it much harder to differentiate between the other bars, as seen in Figure 5-39, which uses the same data as Figure 5-13 with a total added.

  Bar chart with summed total
Figure 5-39. - Bar chart with summed total

To share the total of the bars, it would be easier to not visualize the total but instead to share it elsewhere. Chapter 6 will look at some of the options to do this.

Totals are often required on charts where it is harder to form a clear view of the overall value. When bar or area charts use stacked sections, adding a total allows the reader to read the overall value accurately (Figure 5-40).

  Stacked bar chart with total
Figure 5-40. - Stacked bar chart with total

Totals can help your audience find some key values to support their analysis. However, you need to use totals carefully to ensure they don’t actually hinder your audience’s analysis.

Summary

A range of visual elements play an important role in enhancing the message of your data visualizations. Use of color, size and shape to alter the marks on the view can help reduce the cognitive effort the audience must use to decode the message you are sharing.

Dual axis charts can be used to provide additional context to normal charts. Deciding when to synchronise the axes or not will depend on the question your visualization is answering. Additional elements can be added to charts to add extra detail or to help the audience interpret them. Totals can be used in either tables or charts but take care not to let them detract from the original chart.

There are many visual choices you can make when forming charts. The next chapter will look at the elements that surround a chart to help you communicate with data clearly.

1 Source of color blindness statistics: https://www.colourblindawareness.org/color-blindness/

2 Test conducted at: https://pilestone.co.uk/pages/color-blindness-simulator
