Many SPSS users miss out on the advanced data visualization capabilities of SPSS because they do their charting in Excel or never go beyond the basic capabilities of the Chart Builder. However, data visualization is not just about sending a handful of data points to a charting menu. If it were, there would be little risk in doing your descriptive statistics in SPSS and your charting in Excel. The following are some reasons why splitting the work this way can be inefficient:
At a minimum, you should avoid the copy-and-paste maneuver by using the Output Management System, which is discussed in detail in Chapter 17. In this chapter we survey the landscape of SPSS graphics. We explore the options, but the broader goal is to use the menus to understand how SPSS graphics work, paving the way for the discussion of the Graphics Production Language (GPL) in Chapter 6. Candidly, SPSS graphics have grown confusing in recent versions because multiple, seemingly competing systems vie for your attention (even though underneath they all use the same charting engine).
There are three graphing options in SPSS Statistics: Chart Builder, Graphboard Template Chooser, and the Legacy Dialogs. Each is discussed in this chapter. First, we explore the history behind the design of current graphics options in SPSS. We investigate an influential book, The Grammar of Graphics by Leland Wilkinson (Springer, 2005), because the author of that book played a role in the design of SPSS graphics. Also, the popular ggplot2 package in the R statistical programming language is named after that book. Many who are impressed with R graphics, and who might be a bit befuddled by SPSS graphics, probably don’t realize that both are the intellectual heirs of the same author. We then discuss the graphing options in the menus, then the concepts behind them, and finally we walk through some examples.
The Graphs menu, shown in Figure 5.1, offers three fairly comprehensive submenus:
The other three menu items are extension commands that generate GPL.
The Legacy Dialogs are the original graphing options in SPSS, and they are the least interesting. For example, when you look in the Legacy Dialogs for bar charts, you are greeted with some fairly standard options, as shown in Figure 5.2.
We discuss this kind of design principle in the next section, but in essence it starts with a chart type and then populates a fairly rigid structure with variables. Extensive editing usually follows. A quick look at the pasted syntax shows that there are few options, which leaves much of the work to the editing window:
GRAPH /BAR(SIMPLE)=COUNT BY degree.
This is one of the biggest problems with the legacy graphs. Their syntax is very simple, which seems like a good thing until you realize that everything is standardized. If you want to customize a graph, you have to do it after the fact by editing the graph manually. If you use Excel, this may be the only approach to creating charts that you have tried. The heavy lifting is in the editing. If you have been disappointed with SPSS graphics, it is probably because you could not transform the resulting chart into what you wanted during editing. The frustration is understandable, but there is a completely different approach, revolutionary in its thinking, and it occupies the rest of this chapter.
The first menu option in Figure 5.1 is the SPSS Chart Builder. It offers more options before you create the chart (that is, before you reach the editing window) than the Legacy Dialogs do. Having these options up front matters because the chart can then be more easily automated, customized, and replicated. Most actions in the editing window, shown in Figure 5.3, are manual.
One clue that we have more options is the Basic Elements tab (see Figure 5.4), together with the element properties. The revolutionary approach is to build visualizations from a collection of elements, thoughtfully mixing and matching them. This produces combinations that might be impossible if you chose only from the traditional chart types in the gallery.
If we paste the SPSS syntax from this menu, the result provides further evidence that we are in new territory. The details don't matter yet, except to observe that the syntax is clearly more complex but also richer. Its options can be changed and saved for later, sparing us from spending all of our time in the editing window. This language, GPL, is the subject of Chapter 6. Learning GPL opens the door to hundreds of options that the menus do not offer. The Chart Builder menus let you do only about 5 to 8% of what is possible with GPL.
* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=degree COUNT()[name="COUNT"] MISSING=LISTWISE
    REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: degree=col(source(s), name("degree"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  GUIDE: axis(dim(1), label("RS HIGHEST DEGREE"))
  GUIDE: axis(dim(2), label("Count"))
  SCALE: cat(dim(1), include("0", "1", "2", "3", "4"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: interval(position(degree*COUNT), shape.interior(shape.square))
END GPL.
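Because the pasted syntax is plain text, you can save it and regenerate it for other variables instead of recreating the chart by hand. The sketch below illustrates that idea with a hypothetical Python helper. `bar_of_counts_gpl` is our own name and is not part of SPSS. It rebuilds the Chart Builder syntax shown above for any categorical variable, but it only assembles text. The output would still have to be run in SPSS, for example from a syntax window or, assuming the Python integration plug-in is installed, through `spss.Submit` inside a BEGIN PROGRAM block.

```python
def bar_of_counts_gpl(var: str, axis_label: str, categories: list[str]) -> str:
    """Build GGRAPH syntax for a simple bar chart of counts.

    Hypothetical helper mirroring the Chart Builder syntax pasted above;
    only the variable name, axis label, and category codes change.
    """
    include = ", ".join(f'"{c}"' for c in categories)
    lines = [
        "* Chart Builder.",
        "GGRAPH",
        f'  /GRAPHDATASET NAME="graphdataset" VARIABLES={var} COUNT()[name="COUNT"] MISSING=LISTWISE',
        "    REPORTMISSING=NO",
        "  /GRAPHSPEC SOURCE=INLINE.",
        "BEGIN GPL",
        '  SOURCE: s=userSource(id("graphdataset"))',
        f'  DATA: {var}=col(source(s), name("{var}"), unit.category())',
        '  DATA: COUNT=col(source(s), name("COUNT"))',
        f'  GUIDE: axis(dim(1), label("{axis_label}"))',
        '  GUIDE: axis(dim(2), label("Count"))',
        f"  SCALE: cat(dim(1), include({include}))",
        "  SCALE: linear(dim(2), include(0))",
        f"  ELEMENT: interval(position({var}*COUNT), shape.interior(shape.square))",
        "END GPL.",
    ]
    return "\n".join(lines)


# Reproduce the degree chart from the text; any other categorical variable works the same way.
print(bar_of_counts_gpl("degree", "RS HIGHEST DEGREE", ["0", "1", "2", "3", "4"]))
```

To produce a matching chart for another categorical variable, you only need to swap in its name, axis label, and category codes. This is the replicability argument made earlier, in concrete form.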
The final Graphs menu choice, the Graphboard Template Chooser (see Figure 5.5), is the focus of this chapter. It lets you select variables first and then suggests appropriate charts based on the type of data. We can use either a gallery approach or an elements approach.
The pasted syntax looks different from the Chart Builder's, although both the Chart Builder and the Graphboard Template Chooser generate code using the GGRAPH command.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=degree[LEVEL=nominal] MISSING=LISTWISE
    REPORTMISSING=NO
  /GRAPHSPEC SOURCE=VIZTEMPLATE(NAME="Bar of Counts"[LOCATION=LOCAL]
    MAPPING( "categories"="degree"[DATASET="graphdataset"] "Summary"="count"))
    VIZSTYLESHEET="Traditional"[LOCATION=LOCAL]
    LABEL='BAR OF COUNTS: degree'
    DEFAULTTEMPLATE=NO.
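The core idea behind the Graphboard Template Chooser is to suggest charts from the measurement levels of the variables you select, and a small lookup table can illustrate it. This is a hypothetical sketch, not the matching logic SPSS actually uses. The measurement levels (nominal, ordinal, scale) are SPSS's, and "Bar of Counts" appears in the syntax above. The other template names and all of the rules are placeholders chosen for illustration.

```python
# Hypothetical rules keyed by the sorted measurement levels of the selected variables.
# Illustrative only: SPSS's real template matching is richer than this.
SUGGESTIONS = {
    ("nominal",): ["Bar of Counts", "Pie of Counts"],
    ("scale",): ["Histogram", "Boxplot"],
    ("nominal", "scale"): ["Bar of Means", "Boxplot by Category"],
    ("scale", "scale"): ["Scatterplot"],
    ("nominal", "scale", "scale"): ["Colored Scatterplot"],
}


def suggest_charts(levels: list[str]) -> list[str]:
    """Return candidate chart names for the given measurement levels."""
    # Treat ordinal like nominal here: both are categorical for this toy lookup.
    normalized = ["nominal" if level == "ordinal" else level for level in levels]
    key = tuple(sorted(normalized))
    return SUGGESTIONS.get(key, [])  # no suggestion for unmatched combinations


print(suggest_charts(["nominal"]))           # one categorical variable
print(suggest_charts(["scale", "nominal"]))  # a scale variable split by a category
```

Sorting the levels makes the lookup independent of the order in which variables are selected. The real chooser also considers how many variables you pick, which this sketch reduces to the length of the key.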
There is no denying that having three options gets confusing. Here is the bottom line for deciding which is right for you:
In this section we discuss The Grammar of Graphics by Leland Wilkinson. Its ideas are interesting in their own right, and they explain why the Graphboard Template Chooser looks and works the way it does. As we show in the next chapter, GPL's structure comes directly from The Grammar of Graphics, and the Graphboard Template Chooser is essentially a menu system that removes the need to learn GPL. However, after reading both chapters, you may decide that GPL is not that difficult after all and choose to work with it directly.
Should you read The Grammar of Graphics? For most readers, probably not. To some it would be a tortuous read and a strange book indeed; to a few it would be fascinating. It is quite abstract, with lots of mathematical notation, and is written for computer scientists and theoreticians. We review here the practical nuts and bolts that matter to the SPSS practitioner.
For our purposes, what you need to know is that this approach is an alternative to a chart typology. You can use a standard chart type as a starting point, but the approach itself is different: it is all about elements and aesthetics.
Each element and aesthetic can make a different aspect of your data visible. For instance, consider the bubble chart made famous by Hans Rosling (https://www.gapminder.org/world/). Rosling used his own software in his well-received TED videos, but his graphics still make an easy example. A point element shows the location of each country on two axes, and this alone would be a standard scatter plot. By adding aesthetics, he enriches the information content manyfold. He uses color for region of the world, size for population, and even animation for calendar year. All of this is possible in the Graphboard Template Chooser, and later in the chapter we work through a case study like this one.
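In GPL terms, Rosling's chart is one point element plus several aesthetic mappings. The following minimal sketch uses a hypothetical Python helper, not anything shipped with SPSS, to assemble a GPL ELEMENT statement from a position and a dictionary of aesthetics. The variable names (pct_bachelors, unemployment, region, population, state) are placeholders. Animation is left out, and a complete GGRAPH specification would also need the SOURCE, DATA, and GUIDE statements that the menus write for you.

```python
def element_statement(element: str, x: str, y: str, aesthetics: dict[str, str]) -> str:
    """Compose a GPL ELEMENT statement from an element, its position, and aesthetics.

    `aesthetics` maps a GPL aesthetic function name (such as "color",
    "size", or "label") to the variable that should drive it.
    """
    parts = [f"position({x}*{y})"]
    # Each aesthetic adds one more "word" to the same element.
    parts += [f"{aes}({var})" for aes, var in aesthetics.items()]
    return f"ELEMENT: {element}({', '.join(parts)})"


# A plain scatter plot: position only.
print(element_statement("point", "pct_bachelors", "unemployment", {}))

# The bubble-chart version: the same point element with three aesthetics mapped.
print(element_statement(
    "point", "pct_bachelors", "unemployment",
    {"color": "region", "size": "population", "label": "state"},
))
```

The two statements differ only in the aesthetic mappings, which is the point of the grammar: the chart type does not change, but each added mapping makes another variable visible.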
Because the graphic elements and aesthetics are like words, and the grammar allows us to create “sentences,” we can make countless visualizations. Rather than restricting ourselves to a dozen (or even several dozen) chart types, we have boundless options, including combinations of elements that the developers of the grammar possibly never envisioned themselves. If you can draft an example on a whiteboard, and the data is capable of showing the relationships, then there is a very good chance that you can create it in SPSS.
Sometimes we think we are helping our audience if we make charts very simple and put only a statistic or two on each slide. This actually forces the audience to rely on memory to establish relationships. Rosling shows a great deal of information all at once, and that is precisely what makes the relationships easy to see.
For our first example, we create a bar chart showing the relationship between how old people were when they had their first child and the region of the country they live in. Then we add a couple of variations, such as using color as an aesthetic to differentiate between men and women.
Click OK.
We have now created our graph, as shown in Figure 5.9.
At a glance we can see that the survey respondents in the East South Central region were the youngest on average when they had their first child. The bars are range bars showing minimums and maximums. Not surprisingly, in every region the maximum age for men was higher than for women.
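Each range bar spans the minimum and maximum of the variable within one group. To make clear what is being drawn, the sketch below computes those endpoints in plain Python for a handful of invented records. These are not survey data, and the group labels are only placeholders. The mean is included alongside the endpoints because a statement about who was youngest "on average" refers to a different summary than the bar's ends.

```python
from collections import defaultdict
from statistics import mean

# Invented example records: (region, sex, age at birth of first child).
records = [
    ("East South Central", "Male", 19),
    ("East South Central", "Male", 34),
    ("East South Central", "Female", 17),
    ("East South Central", "Female", 29),
    ("Pacific", "Male", 22),
    ("Pacific", "Male", 41),
    ("Pacific", "Female", 20),
    ("Pacific", "Female", 36),
]


def range_bars(rows):
    """Group ages by (region, sex) and return (min, mean, max) for each group.

    The min and max are the two ends of a range bar; the mean is
    reported separately for comparison.
    """
    groups = defaultdict(list)
    for region, sex, age in rows:
        groups[(region, sex)].append(age)
    return {key: (min(ages), mean(ages), max(ages)) for key, ages in groups.items()}


for (region, sex), (low, avg, high) in sorted(range_bars(records).items()):
    print(f"{region:<20} {sex:<7} min={low:>2} mean={avg:5.1f} max={high:>2}")
```

Coloring by sex corresponds to splitting each region's group in two, which is why the sketch keys on the (region, sex) pair rather than on region alone.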
At this point we could make many other changes. We could exclude categories, add captions, change font styles and sizes, etc.
In this case study, we create a bubble chart not unlike the ones made famous by Hans Rosling's TED videos. The advantages of this case study are:
As Figure 5.14 shows, the result with default settings gives our graphic its basic shape. It already reveals the pattern, but it could use some editing to be more readable.
Some possible edits for you to try are:
For now we will just modify the point labeling.
In exploring the relationship between the percentage of adults (over 25 years old) who have earned a bachelor's degree and the unemployment rate at the state level, we discover that the relationship is rather weak. What becomes interesting in this graphic are the outliers. West Virginia has the lowest degree attainment but does not have the lowest unemployment. Michigan is average on degree attainment but very high on unemployment. The District of Columbia is striking in that it occupies a completely different position on the graphic from any of the states. Had we drawn a traditional scatter plot without labels and without color, none of this would have been visible. Even though the correlation here is not strong, we still learn a lot from these five variables: region, population, degree attainment, unemployment, and state.
The Graphboard Template Chooser was made available to the SPSS community as a way to avoid learning GPL, although you have probably learned more about GPL than you realize. Take a little time to familiarize yourself with it. You may opt to circle back and apply the approach you have learned in this chapter. Or you may decide to take it to the next level and build your graphics with a programming language approach.