Chapter 4
In This Chapter
Considering your choices when defining a variable
Defining variables
Entering numbers
Making sure that you’re using the right measurement type
To process your data, you have to get it into the computer. Entering data has been a problem with computers since the beginning. No matter how you decide to get your numbers into SPSS, at some point someone has to type them (unless they come from some form of automatic monitoring). These days, it feels like we spend half of our time entering data into online forms, which saves some analyst from typing on the other end. SPSS can read data from other places. You can also type directly into SPSS — and, if you want, copy the data to places other than SPSS later.
Entering data into SPSS is a two-step process: First, you define what sort of data you’ll be entering. Then you enter the actual numbers. This may sound difficult, but it isn’t so bad. When you see how data entry works in SPSS, you’ll discover you have some pretty nifty software to help you.
You organize your data into cases. Each case is made up of a collection of variables. First, you define the characteristics of the variables that make up a case, and then you enter the data into the variables to make up the contents of the cases. This chapter shows you how to work with this technique of getting data into your system.
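As a rough mental model only, the define-then-enter process can be pictured in Python. This is not SPSS syntax or any SPSS API. The names variables and cases, and all the sample values, are hypothetical.

```python
# Step 1: define the variables (illustrative only; not an SPSS API).
variables = [
    {"name": "age", "type": "numeric"},
    {"name": "income", "type": "numeric"},
    {"name": "sex", "type": "numeric"},
]

# Step 2: enter the data, one case (row) at a time.
cases = [
    {"age": 34, "income": 52000, "sex": 1},
    {"age": 51, "income": 61000, "sex": 2},
]

# Every case holds exactly one value for each defined variable, in order.
names = [v["name"] for v in variables]
for case in cases:
    assert list(case) == names
```

Each dictionary in cases plays the part of one row on the Data View tab, and each entry in variables plays the part of one row on the Variable View tab.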
You use the Variable View tab of the Data Editor window, shown in Figure 4-1, to define the names and characteristics of variables. This is where you always start if you plan on entering data into SPSS. As you can see in Figure 4-1, every characteristic you can define about your variables is named at the top of the window. All you have to do is enter something in each column for each variable.
Each variable characteristic has a default, so if you don’t specify a characteristic, SPSS fills one in for you. However, what it selects may not be what you want, so let’s look at all the possibilities.
The cell on the far left is where you enter the name of the variable. Just click the cell and type a short descriptor, such as age, income, sex, or odor. (A longer descriptor, called a label, comes later.) You can type longer names here, but you should keep them short because they appear in variable lists and as identifier tags on graphs and other output, where space can be tight. Names that are too long can cause the output from SPSS to be garbled or truncated.
If the name you assigned turns out to be too long or is misspelled, you can always change it on the Variable View tab. One of the nice things about SPSS is that you can correct mistakes quickly.
Most data you enter will be just regular numbers. Some, however, will be a special type, such as currency, and some will be displayed in a special format. Other data, such as dates, will require special procedures for calculation. You simply specify what type you have, and SPSS takes care of those other details for you. This section takes a comprehensive look at all the types. (We give you more advice about some special types in Chapter 7.)
Click the cell in the Type column you want to fill in, and a button with three dots appears on its right. Click that button, and the Variable Type dialog box, shown in Figure 4-2, appears.
You can choose from the following predefined types of variables:
String: A freeform non-numeric item (see Figure 4-6). The only good time to use string is when it truly is a string, like an address, a proper name, or a product code (SKU). Avoid using the String type when it really should be labeled Numeric. Something like favorite color, sex, or state should not be a string because it has a finite list of possibilities that are known in advance. (See the “Values” section later in this chapter.)
SPSS allows a very large number for the size of the string — so large that you could fit a paragraph, which is exactly what you would do if you were doing text mining. Open-ended response items in a survey would also be an example of a string.
The width setting in the definition of a variable determines the number of characters used to display the value. If the value to be displayed is not large enough to fill the space, the output will be padded with blanks. If it’s larger than you specify, it will either be reformatted to fit or asterisks will be displayed.
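The padding-or-asterisks behavior described above can be imitated with a small Python sketch. The function show_in_width is hypothetical and only mimics the general idea. SPSS may instead reformat an oversized number to fit, which this sketch does not attempt.

```python
def show_in_width(value, width):
    """Right-justify value in `width` characters; show asterisks if it cannot fit."""
    text = str(value)
    if len(text) > width:
        # Too wide for the space: signal overflow instead of cutting digits off.
        return "*" * width
    # Narrow enough: pad on the left with blanks.
    return text.rjust(width)

print(repr(show_in_width(42, 6)))       # '    42'
print(repr(show_in_width(1234567, 6)))  # '******'
```

Showing asterisks rather than a shortened number is the safer choice, because a chopped-off number would look like valid data.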
At this point, you can do one of three things:
The number of decimals is the number of digits that appear to the right of the decimal point when the value appears onscreen. This is the same number that you may have specified as the Decimal Places value when you defined the variable type. If you entered a number there, it appears here as the default. If you enter a number here, it changes the one you entered for the type. They’re the same.
Now you can do one of three things:
The name and the label serve the same basic purpose: They’re descriptors that identify the variable. The difference is that the name is the short identifier and the label is the long one. You need one of each because some output formats work fine with a long identifier and other formats need the short form.
You can use just about anything for the label. What you choose has to do with how you expect to use your data and what you want your output to look like. For example, the variable name may be “Sex” and the longer label may be “Boys and Girls,” “Men and Women,” or simply “Gender.”
You can also just skip defining a label. If you don’t have a label defined for a variable, SPSS will use the name you defined for everything.
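The fallback from label to name can be sketched in a couple of lines of Python. The function output_identifier is a hypothetical illustration of the rule, not part of SPSS.

```python
def output_identifier(name, label=""):
    """Return the label when one is defined; otherwise fall back to the name."""
    return label if label else name

print(output_identifier("sex", "Men and Women"))  # Men and Women
print(output_identifier("income"))                # income
```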
The Values column is where you assign labels to all the possible values of a variable. If you select a cell in the Values column, a button with three dots appears. Clicking that button displays the dialog box shown in Figure 4-7.
Normally, you make one entry for each possible value that a variable can assume. For example, for a variable named Sex you could have the value 1 assigned the label “Male” and 2 assigned the label “Female.” Or, for a variable named Committed you could have 0 for “No,” 1 for “Yes,” and 2 for “Undecided.” If you have labels defined, when SPSS displays output, it can show the labels instead of the values.
To define a label for a value:
Click the Add button.
The value and label appear in the large text block.
You can always come back and change the definitions using the same process you used to enter them. The dialog box will reappear, filled in with all the definitions; then you can update the list.
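Value labels amount to a lookup table from stored values to display text. The following Python sketch uses the Committed example from this section. The function labeled and the sample responses are hypothetical, and the sketch says nothing about how SPSS stores labels internally.

```python
# Value labels for the Committed variable described above.
committed_labels = {0: "No", 1: "Yes", 2: "Undecided"}

def labeled(value, labels):
    """Return the label for a value, or the raw value as text if none is defined."""
    return labels.get(value, str(value))

responses = [1, 0, 2, 1]  # hypothetical data as it would be typed in
print([labeled(v, committed_labels) for v in responses])
# ['Yes', 'No', 'Undecided', 'Yes']
```

The data itself stays numeric. Only the displayed text changes, which is why editing a label later does not require retyping any data.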
You can specify what is to be entered for a value that is missing for a variable in a case. In other words, when you have values for all variables in a case except one, you can specify a placeholder for the missing value. Select a cell in the Missing column. Click the button with three dots and the Missing Values dialog box, shown in Figure 4-8, appears.
For example, say you’re entering responses to questions, and one of the questions is, “How many cars do you own?” The normal answer to this question is a number, so you define the variable type as a number. If someone chooses to ignore this question, this variable won’t have a value. However, you can specify a placeholder value. Perhaps 0 seems like a good choice for a placeholder here, but it’s not really — lots of people don’t have cars. Instead, a less likely value — like, say, –1 — makes a better choice. A very popular choice among SPSS users is –9, but this will depend on the values of the original variable.
You can even specify unique values to represent different reasons for a value being missing. In the previous example, you could define –1 as the value entered when the answer is, “I don’t remember,” and –2 could be used when the answer is, “None of your business.” If you specify that a value represents a missing value, that value is not included in general calculations. During your analysis, however, you can determine how many values are missing for each of the different reasons. You can specify up to three specific values (called discrete values) to represent missing data, or you can specify a range of numbers along with one discrete value, all to be considered missing. The only reason you would need to specify a range of values is if you have lots of reasons why data is missing and want to track them all.
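To see why a placeholder has to be declared as missing, consider this Python sketch of the cars question. The codes –1 and –2 come from the example above. The list cars_owned and the variable names are hypothetical, and the sketch only imitates the idea of excluding missing codes from a calculation.

```python
from collections import Counter
from statistics import mean

# Codes declared as missing, each with its reason.
MISSING = {-1: "I don't remember", -2: "None of your business"}

cars_owned = [2, 0, -1, 1, -2, 3, -1]  # hypothetical responses

# Leave the missing codes out of the calculation.
valid = [v for v in cars_owned if v not in MISSING]
print(mean(valid))  # 1.5

# The codes are still in the data, so the reasons can be counted.
reasons = Counter(MISSING[v] for v in cars_owned if v in MISSING)
print(reasons["I don't remember"], reasons["None of your business"])  # 2 1
```

If the placeholder codes were treated as ordinary numbers, the negative values would drag the average down, which is exactly the distortion that declaring them as missing avoids.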
The Columns column is where you specify the width of the column you’ll use to enter the data. The folks at SPSS could have used the word Width to describe it, but they already used that term for the width of the data itself. A better name might have been the two words Column Width, but that would have been too long to display nicely in this window, so they just called it Columns. To specify the number of columns, select a cell and enter the number.
The Align column determines the position of the data in its allocated space, whenever the data is displayed for input or output. The data can be left-aligned, right-aligned, or centered. You’ve defined the width of the data and the size of the column in which the data will be displayed; the alignment determines what is done with any space left over.
When you select a cell in the Align column, a list appears and you can choose one of the three alignment possibilities, as shown in Figure 4-9. Aligning to the left means inserting all blanks on the right; aligning to the right inserts all the extra spaces on the left; centering the data splits the spaces evenly on each side — we don’t know what it does if an odd space is left over. (We also worry about things like the number of seeds in a tomato and where the clouds go at night.)
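The three alignments can be illustrated with Python's built-in string methods. This shows how Python pads text, not how SPSS lays out its own cells, and the value and width are arbitrary choices for the illustration.

```python
value = "42"
width = 6  # leaves four spare characters to distribute

print(repr(value.ljust(width)))   # '42    '  blanks on the right
print(repr(value.rjust(width)))   # '    42'  blanks on the left
print(repr(value.center(width)))  # '  42  '  blanks split evenly
```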
The Measure column specifies the level of measurement of the variable in one of three ways. When you click a cell in the Measure column, you can select one of these choices (see Figure 4-10):
Some of the SPSS dialog boxes select variables according to their role and include them as defaults. You don’t need to worry about this characteristic when you’re starting out. It becomes handy once you have some experience with SPSS and understand how defaults are chosen.
When you click a cell in the Role column, you can select one of six choices (see Figure 4-11):
After you’ve defined all the variables for each case, click the Data View tab of the Data Editor window so you can begin typing the data. At the top of the columns in Figure 4-12, you can see some names we chose for variables. Switching to the Data View tab makes the window ready to receive entered data — and to verify that what’s entered matches the specified format and type of the data.
Entering data into one of these cells is straightforward: You simply click the cell and start typing.
If something is already in a cell and you want to change it instead of just typing over it, look up toward the top of the window, just underneath the toolbar: You’ll see the name of the variable and the currently selected value. Click the value in the field at the top, and you can edit it right there. You can do all the normal mouse and keyboard stuff there, too — you can use the Backspace key to erase characters, or select the entire value and type right over it.
If your data is already in a file, you may be able to avoid typing it in again by reading that file directly into SPSS. For more information, see Chapter 5.
We all have to go back and refine our variable definitions from time to time. That’s normal. When you come across something that doesn’t do what you want it to, just switch back to the Variable View tab and correct it. Nobody but you and SPSS will ever know about it, and SPSS never talks.
Now that you’ve defined your variables and entered your data, you may want to check that you have names defined for all your actual ordinal and nominal values, and that you have defined the correct measures for them. SPSS can help by scanning your data, finding values for which you don’t have definitions, and pointing them out in a friendly way.
The following steps use an existing file to walk through a demonstration:
Choose File ⇒ Open ⇒ Data to load the file named car_sales.sav.
This file came with your installation of SPSS and is found, along with a number of other files, in the same directory in which you installed SPSS. You can load any of these data files, but car_sales.sav is the one used in this demonstration. If you load this file while you already have some other data showing in the window, SPSS will open a new Data Editor window to display the new information; your existing data will not be lost.
When you open this data file — or any data file, for that matter — SPSS opens the SPSS Statistics Viewer window to tell you that it has opened a file (or the information could be displayed in the SPSS Statistics Viewer window that’s already open). You won’t need this information for what you’re doing here, so you can just close the window.
Choose Data ⇒ Define Variable Properties.
The Define Variable Properties dialog box appears.
Select one of the variable names in the list on the left.
Its different values appear in the center of the dialog box, as shown in Figure 4-14. (In this example, every value has a name assigned to it.)
Ask SPSS to suggest a new type for this variable by clicking the Suggest button in the top center of the dialog box.
The dialog box in Figure 4-15 appears, telling you what SPSS concludes about this variable and its values. This same window, with different text, appears for each variable you test. Sometimes the text suggests changes in the variable definition, and sometimes it doesn’t.
To apply any changes, click Continue.
You return to the window shown in Figure 4-14, where you can select another variable.
You won’t want to make changes to all your variables, but SPSS helps you find the ones that you do need to change. Values defined as “missing” are not included in the computations. The text in the window always explains the criteria used to reach a conclusion, and SPSS allows you to make the final decision.
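The kind of check this procedure performs, finding values in the data that have no label defined, can be sketched in Python. This is not how SPSS implements Define Variable Properties. The function unlabeled_values, the labels, and the data are all hypothetical.

```python
def unlabeled_values(data, labels):
    """Return the sorted distinct values in data that have no label defined."""
    return sorted(set(data) - set(labels))

# Hypothetical labels and data for a two-category variable.
type_labels = {0: "Automobile", 1: "Truck"}
observed = [0, 1, 1, 0, 2, 1]

print(unlabeled_values(observed, type_labels))  # [2]
```

A value that shows up here is either a typing mistake or a legitimate category that still needs a label, and deciding which is left to you, just as SPSS leaves the final decision to you.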