Chapter 1

Introducing SPSS

In This Chapter

  • Considering the quality of your data
  • Communicating with SPSS
  • Seeing how SPSS works
  • Finding help when you’re stuck

A statistic is a number. A raw statistic is a measurement of some sort. It’s fundamentally a count of something — occurrences, speed, amount, or whatever. A statistic is calculated using a sample. In a sense, a sample is the keyhole you have to peer through to the population, which is what you’re trying to understand. The value at the population level — the average height of an American male, for instance — is called a parameter. Unless you’ve got all the data there is, and you’ve collected a census of the population, you have to make do with the data in your sample. The job of SPSS is to calculate. Your job is to provide a good sample.
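To make the difference between a statistic and a parameter concrete, here is a minimal Python sketch. The heights are made-up numbers generated only for illustration, and the variable names are hypothetical; nothing here comes from SPSS itself.

```python
import random
import statistics

# Hypothetical population: heights in inches for 1,000 people (made-up values).
random.seed(42)  # fixed seed so the run is repeatable
population = [random.gauss(69, 3) for _ in range(1000)]

# The parameter is computed from the entire population (a census).
parameter_mean = statistics.mean(population)

# The statistic is computed from a random sample drawn from that population.
sample = random.sample(population, 50)
statistic_mean = statistics.mean(sample)

print(f"Parameter (population mean): {parameter_mean:.2f}")
print(f"Statistic (sample mean):     {statistic_mean:.2f}")
```

The two numbers are usually close but not identical. In practice you see only the second one, which is why the quality of the sample matters so much.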

In this chapter, we discuss the importance of having accurate, reliable data, and some of the implications when this is not the case. We also talk about how best to organize your data in SPSS and the different kinds of files that SPSS creates. We take a trip down memory lane and discuss the origins of SPSS, as well as what can be done in the program and different ways of communicating with the software. Finally, we spend some time discussing different ways in which you can get help when navigating SPSS.

Garbage In, Garbage Out: Recognizing the Importance of Good Data

SPSS doesn’t warn you when there is something wrong with your sample. Its job is to work on the data you give it. If what you give SPSS is incomplete or biased, or if there is data that doesn’t belong in there, the resulting calculations won’t reflect the population very well. Not much in the SPSS output will signal to anyone that there is a problem. So, if you’re not careful, you can conclude just about anything from your data and your calculations.

Consider the data in Table 1-1. What if you calculated the survival rate of Titanic passengers based on this small sample? What if you calculated what fraction of the passengers were in each class of service? You can easily see that you’d be in real trouble.

Table 1-1 Sample of Titanic Passengers

| Survived or Died | Class | Name | Sex | Age | Fare Paid | Cabin | Embarkation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Died | 1 | Andrews, Mr. Thomas, Jr. | Male | 39 | 0.00 | A36 | Southampton |
| Died | 1 | Parr, Mr. William Henry Marsh | Male | | 0.00 | | Southampton |
| Died | 1 | Fry, Mr. Richard | Male | | 0.00 | B102 | Southampton |
| Died | 1 | Harrison, Mr. William | Male | 40 | 0.00 | B94 | Southampton |
| Died | 1 | Reuchlin, Mr. John George | Male | 38 | 0.00 | | Southampton |
| Died | 2 | Parkes, Mr. Francis “Frank” | Male | | 0.00 | | Southampton |
| Died | 2 | Cunningham, Mr. Alfred Fleming | Male | | 0.00 | | Southampton |
| Died | 2 | Campbell, Mr. William | Male | | 0.00 | | Southampton |
| Died | 2 | Frost, Mr. Anthony Wood “Archie” | Male | | 0.00 | | Southampton |
| Died | 2 | Knight, Mr. Robert J. | Male | | 0.00 | | Southampton |
| Died | 2 | Watson, Mr. Ennis Hastings | Male | | 0.00 | | Southampton |
| Died | 3 | Leonard, Mr. Lionel | Male | 36 | 0.00 | | Southampton |
| Died | 3 | Tornquist, Mr. William Henry | Male | 25 | 0.00 | | Southampton |
| Died | 3 | Johnson, Mr. William Cahoone, Jr. | Male | 19 | 0.00 | | Southampton |
| Died | 3 | Johnson, Mr. Alfred | Male | 49 | 0.00 | | Southampton |

However, consider this: Would you be tempted to drop these cases from your analysis because their fare information appears to be missing? What if fare information were provided for all the other passengers? You might drop the cases in Table 1-1 but use everyone else. You’d be dropping only a handful of passengers out of hundreds, so that would be okay, right? The answer is no, it would not be okay. As it turns out, there is a good reason that each of these passengers didn’t pay a fare (for example, Mr. Thomas Andrews, Jr., designed the ship), and if this were your data, your job would be to know that.
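The two calculations posed for Table 1-1, along with the tempting "drop the zero fares" rule, can be sketched in a few lines of Python. The rows below are typed in from Table 1-1 (outcome, class, and fare only); the variable names are hypothetical, and this is an illustration of the arithmetic rather than anything SPSS does internally.

```python
from collections import Counter

# (outcome, passenger class, fare paid) for the 15 passengers in Table 1-1.
sample = (
    [("Died", 1, 0.00)] * 5
    + [("Died", 2, 0.00)] * 6
    + [("Died", 3, 0.00)] * 4
)

# Survival rate computed from this sample alone.
survivors = sum(1 for outcome, _, _ in sample if outcome == "Survived")
survival_rate = survivors / len(sample)

# Fraction of the sample in each class of service.
class_counts = Counter(pclass for _, pclass, _ in sample)
class_share = {
    pclass: count / len(sample) for pclass, count in sorted(class_counts.items())
}

# A blanket rule that treats a zero fare as missing removes every one of these cases.
kept = [row for row in sample if row[2] != 0.00]

print(f"Survival rate in this sample: {survival_rate:.0%}")
for pclass, share in class_share.items():
    print(f"Class {pclass}: {share:.0%} of the sample")
print(f"Cases left after dropping zero fares: {len(kept)} of {len(sample)}")
```

Run as written, the sketch reports a 0% survival rate and class shares of 33%, 40%, and 27%. Those numbers describe only these 15 rows, not the passengers as a whole. The last line shows that the zero-fare filter silently discards all 15 cases.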

Sampling is a big topic, but here’s the quick version:

  • The data points in your sample should be drawn at random from the population.
  • There should be enough data points.
  • You should be able to justify the removal of any data points.
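As a rough sketch of the first and third points, the following Python draws cases at random and refuses to remove a case unless a reason is recorded. The population of case IDs, the sample size of 50, and the helper name remove_case are all hypothetical; how many data points count as "enough" depends on your study and is not something this sketch decides.

```python
import random

random.seed(7)  # fixed seed so the draw is repeatable

# Hypothetical population of 500 case IDs (stand-ins for real records).
population_ids = list(range(1, 501))

# Draw 50 cases at random, without replacement.
sample_size = 50
sample = random.sample(population_ids, sample_size)

# Every removal is logged together with its justification.
removal_log = []


def remove_case(cases, case_id, reason):
    """Return a copy of cases without case_id, recording why it was removed."""
    if not reason.strip():
        raise ValueError("Every removal needs a stated reason.")
    removal_log.append((case_id, reason))
    return [c for c in cases if c != case_id]


cleaned = remove_case(sample, sample[0], "duplicate record (hypothetical reason)")
print(f"{len(sample)} cases drawn, {len(cleaned)} kept, {len(removal_log)} removal logged")
```

Keeping the log next to the data makes it possible to answer, later on, why a particular case is not in the analysis.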

Remember: This book is not about the accuracy, correctness, or completeness of the input data. Your data is up to you. This book shows you how to take the numbers you already have, put them into SPSS, crunch them, and display the results in a way that makes sense. Gathering valid data and figuring out which crunch to use is up to you.

Warning: Your data is your most valuable possession, so be sure to back it up. Make sure you have multiple backups, with at least one stored offsite. The last thing you want is to lose your data.

Talking to SPSS: Can You Hear Me Now?

More than one way exists for you to command SPSS to do your bidding. You can use any of four approaches to perform any of the SPSS functions, and we cover them all in this section. The method you should choose depends not only on which interface you prefer, but also (to an extent) on the task you want performed.

The graphical user interface

SPSS has a window interface. You can issue commands by using the mouse to make menu selections that cause dialog boxes to appear. This is a fill-in-the-blanks approach to statistical analysis that guides you through the process of making choices and selecting values. The advantage of the graphical user interface (GUI) approach is that, at each step, SPSS makes sure you enter everything necessary before you can proceed to the next step. This interface is preferred for those just starting out — and if you don’t go into depth with SPSS, this may be the only interface you ever use.

Syntax

Syntax is the internal language used to command actions from SPSS. It’s the command syntax of SPSS (hence, its name). Syntax is often referred to as the “command language.” You can use the Syntax command language to enter instructions into SPSS and have it do anything it’s capable of doing. In fact, when you select from menus and dialog boxes to command SPSS, you’re actually generating Syntax commands internally that do your bidding. In other words, the GUI is nothing more than the front end of a Syntax command-writing utility.

Writing (and saving) command-language programs is a good way to create processes that you expect to repeat. You can even grab a copy of the Syntax commands generated from the menu and save them to be repeated later.

Python programs

Python is a general-purpose language that has a collection of SPSS modules written for it; you can use Python to write programs that work inside SPSS. You can also run Python with the Syntax language to command SPSS to perform statistical functions.

One advantage of Python is that it’s a modern language, complete with the power and convenience that come with such languages, including the capability of constructing a more readable program. In addition, because Python is a general-purpose language, you can read and write data in other applications and files. Think of Python programs as a way of making Syntax more powerful.
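Here is a minimal sketch of that idea: ordinary Python builds a Syntax command as a string and then hands it to SPSS. It assumes the spss module that ships with the SPSS Python integration and its Submit function. That module is normally available only when the code runs inside SPSS (for example, between BEGIN PROGRAM and END PROGRAM in a Syntax window), so the sketch falls back to printing the command when the import fails. The helper name descriptives_syntax and the variable names horsepower and displacement are hypothetical.

```python
def descriptives_syntax(variables):
    """Build a DESCRIPTIVES command for the given variable names."""
    return "DESCRIPTIVES VARIABLES=" + " ".join(variables) + "."


try:
    import spss  # available only inside the SPSS Python integration
except ImportError:
    spss = None

# Hypothetical variable names; they must exist in the active dataset.
command = descriptives_syntax(["horsepower", "displacement"])

if spss is not None:
    spss.Submit(command)  # ask SPSS to run the Syntax command
else:
    print(command)  # outside SPSS, just show what would be submitted
```

Because the command is an ordinary string, the same function can generate it for any list of variables, which is the kind of repetition that is tedious to do by hand in the menus.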

Python scripts

What SPSS calls scripts are also written in Python, but they help you manipulate the GUI. They’re a little more advanced and quite powerful. You use Python scripts to automatically highlight certain results in the SPSS output, for instance.

How SPSS Works

The developers of SPSS have made every effort to make the software easy to use. SPSS tries to keep you from making procedural mistakes or forgetting a required step. That’s not to say it’s impossible to do something wrong in SPSS, but the SPSS software works hard to keep you from running into the ditch. To foul things up, you almost have to work at figuring out a way of doing something wrong.

You always begin by defining a set of variables; then you enter data for the variables to create a number of cases. For example, if you’re doing an analysis of automobiles, each car in your study would be a case. The variables that define the cases could be things such as the year of manufacture, horsepower, and cubic inches of displacement. Each car in the study is defined as a single case, and each case is defined as a set of values assigned to the collection of variables. Every case has a value for each variable. (Well, you can have a missing value, but that’s a special situation described later.)

Each variable is a specific type. Types describe how the data is stored — for example, as letters (strings), as numbers, as dates, or as currency (see Chapter 4 for more information on data types). Each variable is defined as containing a certain kind of number, so you also have to define the variable’s level of measurement. For example, a scale variable is a numeric measurement, such as weight or miles per gallon. A categorical variable contains values that define a category; for example, a variable named gender could be a categorical variable defined to contain only values 1 for female and 2 for male. Things that make sense for one type of variable don’t necessarily make sense for another. For example, it makes sense to calculate the average miles per gallon, but not the average gender.
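The relationship between cases, variables, and levels of measurement can be sketched in plain Python. This is only an analogy for how the data is organized, not how SPSS stores it; the five cases, their values, and the summarize helper are made up for illustration.

```python
from collections import Counter
from statistics import mean

# Level of measurement for each variable (hypothetical definitions).
measurement = {"mpg": "scale", "gender": "categorical"}

# Labels for the codes stored in the categorical variable.
value_labels = {"gender": {1: "female", 2: "male"}}

# Each case holds one value per variable (made-up values).
cases = [
    {"mpg": 31.5, "gender": 1},
    {"mpg": 24.0, "gender": 2},
    {"mpg": 28.5, "gender": 1},
    {"mpg": 22.0, "gender": 2},
    {"mpg": 29.0, "gender": 1},
]


def summarize(variable):
    """Pick a summary that makes sense for the variable's level of measurement."""
    values = [case[variable] for case in cases]
    if measurement[variable] == "scale":
        return {"mean": mean(values)}  # averaging is meaningful for scale data
    counts = Counter(values)  # for categories, count the cases in each one
    labels = value_labels[variable]
    return {labels[code]: n for code, n in sorted(counts.items())}


print(summarize("mpg"))
print(summarize("gender"))
```

The average of the gender codes could be computed just as easily, but it would describe nothing; counting the cases in each category is the summary that fits a categorical variable.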

After your data is entered into SPSS (that is, your cases are all defined by values stored in the variables), you can easily run an analysis. You’ve already finished the hard part. Running an analysis on the data is simple compared to entering the data. To run an analysis, you select the analysis you want to run from the menu, select the appropriate variables, and click OK. SPSS reads through all your cases, performs the analysis, and presents you with the output as tables or graphs. Of course, you have to know which analysis to choose. For that, too, we have you covered (see Part V).

You can instruct SPSS to draw graphs and charts directly from your data the same way you instruct it to do an analysis. You select the desired graph from the menu, assign variables to it, and click OK.

Tip: When you’re preparing SPSS to run an analysis or draw a graph, the OK button is unavailable until you’ve made all the choices necessary to produce output. Not only does SPSS require that you select a sufficient number of variables to produce output, but it also requires you to choose the right kinds of variables. If a categorical variable is required for a certain slot, SPSS won’t allow you to choose any other kind of variable. Whether the output makes sense is up to you and your data, but SPSS makes sure that the choices you make can be used to produce some kind of result.

All output from SPSS goes to the same place: a window named SPSS Statistics Viewer. This window displays the results of whatever you’ve done. After you’ve produced output, if you perform some action that produces more output, the new output is displayed in the same window. And almost anything you do produces output. Of course, you need to know how to interpret the output; SPSS helps you, and so does this book.

Getting Help When You Need It

You’re not alone. Some immediate help comes directly from the SPSS software package. More help can be found online. If you find yourself stumped, you can look for help in several places:

  • Topics: Choosing Help ⇒ Topics from the main window of the SPSS application is your gateway to immediate help. The help is somewhat terse, but it usually provides exactly the information you need. The information is in one large Help document, presented one page at a time. Choose Contents to select a heading from an extensive table of contents, choose Index to search for a heading by entering its name, or choose Search to enter a search string inside the body of the Help text.

    Tip: In the Help directory, the titles in all uppercase are descriptions of Syntax language commands.

  • Tutorial: Choose Help ⇒ Tutorial to open a dialog box with the outline of a tutorial that guides you through many parts of SPSS. You can start at the beginning and view each lesson in turn, or you can select your subject and view just that.
  • Case Studies: Choose Help ⇒ Case Studies to open a dialog box containing examples in a format similar to that of the Tutorial. You can select titles from the outline and view descriptions and examples of specific instances of using SPSS. You can also find descriptions of the different types of calculations. If some particular analysis type is eluding your comprehension, this is a good place to look.
  • Statistics Coach: Choose Help ⇒ Statistics Coach if you have a good idea of what you want to do but you need some specific information on how to go about doing it.
  • Command Syntax Reference: Choose Help ⇒ Command Syntax Reference to display more than 2,000 pages of references to the Syntax language in your PDF viewer. The regular Help topics, mentioned earlier, provide a brief overview of each topic, but this document is much more detailed.
  • Algorithms: Choose Help ⇒ Algorithms to get detailed information on how processes work internally. This is where you can dive far down into the internals. If you want to take a look at the math and how it’s applied, this is where you should start.