Entering Data: An Illustrative Example

Before data can be analyzed by SAS, they must be entered in some systematic way. There are a number of different approaches to entering data; to keep things simple, this chapter presents only the fixed format approach. With the fixed format method, each variable is assigned to a specific column (or set of columns) in the dataset. The fixed format method has the advantage of being very general: you can use it for almost any type of research problem. An additional advantage is that researchers are probably less likely to make errors when entering data if they adhere to this format.

In the following example, you actually enter some fictitious data from a fictitious study. Assume that you have developed a survey to measure attitudes toward volunteerism. A copy of the survey appears here:

                         Volunteerism Survey

Please indicate the extent to which you agree or disagree with each of
the following statements.  You will do this by circling the appropriate
number to the left of that statement.  The following format shows what
each response alternative represents:

     5 = Agree Strongly
     4 = Agree Somewhat
     3 = Neither Agree nor Disagree
     2 = Disagree Somewhat
     1 = Disagree Strongly

For example, if you "Disagree Strongly" with the first question, circle
the "1" to the left of that statement.  If you "Agree Somewhat," circle
the "4," and so on.

-------------
 Circle Your
  Response
-------------
1  2  3  4  5     1.     I feel a personal responsibility to
                         help needy people in my community.

1  2  3  4  5     2.     I feel I am personally obligated to
                         help homeless families.

1  2  3  4  5     3.     I feel no personal responsibility to
                         work with poor people in my community.

1  2  3  4  5     4.     Most of the people in my community are
                         willing to help the needy.

1  2  3  4  5     5.     A lot of people around here are willing
                         to help homeless families.

1  2  3  4  5     6.     The people in my community feel no personal
                         responsibility to work with poor people.

1  2  3  4  5     7.     Everyone should feel the responsibility to
                         perform volunteer work in his/her community.

What is your age in years? _______________


Further assume that you administer this survey to 10 participants. For each of these individuals, you also obtain an intelligence quotient (IQ) score.

You then enter your data as a file in a computer. All of the survey responses and information about participant 1 appear on the first line of this file. All of the responses and information about participant 2 appear on the second line of this file, and so forth. You keep the data aligned so that responses to question 1 appear in column 1 for all participants, responses to question 2 appear in column 2 for all participants, and so forth. When you enter data in this fashion, your dataset should look similar to this:

2234243 22  98  1
3424325 20 105  2
3242424 32  90  3
3242323  9 119  4
3232143  8 101  5
3242242 24 104  6
4343525 16 110  7
3232324 12  95  8
1322424 41  85  9
5433224 19 107 10

You can think of the preceding dataset as a matrix consisting of 10 rows and 17 columns. The rows run horizontally (from left to right), and each row represents data for a different participant. The columns run vertically (up and down). For the most part, a given column represents a different variable that you measured or created. (In some cases, a given variable is more than one column wide; more on this later.)

For example, look at the last column in the matrix: the vertical column on the right side that goes from 1 (at the top) to 10 (at the bottom). This column codes the Participant Number variable. In other words, this variable simply tells us which participant’s data are included on that line. For the top line, the assigned value of Participant Number is 1, so you know that the top line includes data for participant 1. The second line down has the value 2 in the participant number column, so this second line down includes data for participant 2, and so forth.

The first column of data includes participant responses to survey question 1. You can see that participant 1 selected “2” in response to this item, while participant 2 selected “3.” The second column of data includes participants’ responses to survey question 2, the third column codes question 3, and so forth. After entering responses to question 7, you left column 8 blank. Then, in columns 9 and 10, you entered each participant’s age. You can see that participant 1 was 22 years old, while participant 2 was 20 years old. You left column 11 blank, and then entered the participants’ IQs in columns 12, 13, and 14. (IQ can be a three-digit number, so it required three columns to enter it.) You left column 15 blank, and entered participant numbers in columns 16 and 17.
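
The column positions just described can be sketched in a few lines of Python. This is only an illustration of the layout, not the way SAS itself reads the file; the helper name `columns` is hypothetical. The one detail worth noticing is that this chapter counts columns from 1, while Python slices count from 0.

```python
# First two lines of the dataset shown earlier (17 characters each).
lines = [
    "2234243 22  98  1",
    "3424325 20 105  2",
]


def columns(line, first, last):
    """Return the text in 1-based columns first..last (inclusive)."""
    return line[first - 1:last]


for line in lines:
    q1 = int(columns(line, 1, 1))        # response to survey question 1
    age = int(columns(line, 9, 10))      # age in years
    iq = int(columns(line, 12, 14))      # IQ score (int() ignores the leading blank)
    number = int(columns(line, 16, 17))  # participant number
    print(number, q1, age, iq)
```

For the two lines above, the sketch reports that participant 1 answered “2” to question 1 and is 22 years old, matching the description in the text.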

The following table presents a brief coding guide to summarize how you entered your data.

Column   Variable Name   Explanation
1        Q1              Responses to survey question 1
2        Q2              Responses to survey question 2
3        Q3              Responses to survey question 3
4        Q4              Responses to survey question 4
5        Q5              Responses to survey question 5
6        Q6              Responses to survey question 6
7        Q7              Responses to survey question 7
8        blank
9-10     AGE             Participant’s age in years
11       blank
12-14    IQ              Participant’s IQ score
15       blank
16-17    NUMBER          Participant’s number

Guides similar to this are used throughout this text to explain how datasets are arranged, so a few words of explanation are in order. This table identifies the specific columns in which variable values are assigned. For example, the first line of the preceding table indicates that in column 1 of the dataset, the values of a variable called Q1 are stored, and this variable includes responses to question 1. The next line shows that the values of variable Q2 are stored in column 2, and this variable includes responses to question 2. The remaining lines of the guide are interpreted in the same way. You can see, therefore, that it is necessary to read down the lines of this table to learn what is in each column of the dataset.
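
One way to see how a coding guide works is to write it down as a small lookup table and apply it to every line of the dataset. The following is a minimal Python sketch under that assumption; it is not SAS code, and the names `CODING_GUIDE`, `DATA`, and `read_record` are hypothetical, chosen only for this illustration.

```python
# Coding guide: variable name -> (first column, last column).
# Columns are 1-based and inclusive, exactly as in the guide above.
CODING_GUIDE = {
    "Q1": (1, 1), "Q2": (2, 2), "Q3": (3, 3), "Q4": (4, 4),
    "Q5": (5, 5), "Q6": (6, 6), "Q7": (7, 7),
    "AGE": (9, 10),
    "IQ": (12, 14),
    "NUMBER": (16, 17),
}

# The ten data lines from the example dataset.
DATA = """\
2234243 22  98  1
3424325 20 105  2
3242424 32  90  3
3242323  9 119  4
3232143  8 101  5
3242242 24 104  6
4343525 16 110  7
3232324 12  95  8
1322424 41  85  9
5433224 19 107 10
"""


def read_record(line, guide=CODING_GUIDE):
    """Split one fixed-format line into a dict of integer values."""
    return {
        name: int(line[first - 1:last])  # convert 1-based columns to a slice
        for name, (first, last) in guide.items()
    }


records = [read_record(line) for line in DATA.splitlines()]
print(len(records), "participants read")
print(records[0])
```

Because this dataset has no missing values, the sketch converts every field directly to an integer; a blank or a period in a field would need separate handling.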

A few important notes about how you should enter data to be analyzed by SAS:

  • Make sure that you enter variables in the correct column. For example, make sure that the data are lined up so that responses to question 6 always appear in column 6. If a participant happened to leave question 6 blank, then you should leave column 6 blank when you are entering your data. (Leave this column blank by pressing the space bar on your keyboard.) Then, go on to type the participant’s response to question 7 in column 7. Do not enter a zero if the participant didn’t answer a question; leave the space blank.

  • It is also acceptable to enter a period (.) instead of a blank space to represent missing data. When using this convention, if a participant has a missing value on a variable, enter a single period in place of that missing value. If this variable happens to be more than one column wide, you should still enter just one period. For example, if the variable occupies columns 12 to 14 (as does IQ in the table), enter just one period in column 14; do not enter three periods in columns 12, 13, and 14.

  • Right-justify numeric data. You should align numeric variables to the right side of columns in which they appear. For example, IQ is a three-digit variable (it could assume values such as 112 or 150). However, the IQ score for many individuals is a two-digit number (such as 99 or 87). Therefore, the two-digit IQ scores should appear to the right side of this three-digit column of values. A correct example of how to right-justify your data follows:

     99
    109
    100
     87
    118

    The following is not right-justified and is less preferable:

    99
    109
    100
    87
    118

    There are exceptions to this rule. For example, if numeric data contain decimal points, it is generally preferable to align the decimal points when entering the data so that the decimals appear in the same column. If there are no values to the right of the decimal point for a given participant, you can enter zeros to the right of the decimal point. Here is an example of this approach:

      3.450
     12.000
      0.133
    144.751
      0.000

    The preceding dataset includes scores for five participants for just one variable. Assume that possible scores for this variable range from 0.00 to 200.00. Participant 1 had a score of 3.45, participant 2 had a score of 12, and so forth. Notice that the scores have been entered so that the decimal points are aligned in the same vertical column.

    Notice also that if a given participant’s score does not include any digits to the right of the decimal point, zeros have been added. For example, participant 2 has a score of 12. However, this participant’s score is entered as 12.000 so that it is aligned with the other scores.

    Technically, it is not always necessary to align participant data in this way in order to include them in a SAS dataset; however, arranging data in an orderly fashion generally decreases the likelihood of making errors when entering data.

  • Left-justify character data. Character variables can include letters of the alphabet. In contrast to numeric variables, you typically should left-justify character variables. This means that you align entries to the left, rather than to the right.

    For example, imagine that you are going to enter two character variables for each participant. The first variable will be called FIRST, and this variable will include each participant’s first name. You will enter this variable in columns 1 to 15. The second variable will be called LAST and will include each participant’s surname. You will enter this variable in columns 16 to 25. Data for four participants are reproduced here:

    Francis        Smith
    Ishmael        Khmali
    Michel         Hébert
    Jose           Lopez

    The preceding shows that the first participant is named Francis Smith, the second is named Ishmael Khmali, and so forth. Notice that the value “Francis” is moved to the left side of the columns that include the FIRST variable (columns 1 to 15). The same is true for “Ishmael,” as well as the remaining first names. In the same way, “Smith” is moved over to the left side of the columns that include the LAST variable (columns 16 to 25). The same is true for the remaining surnames.

  • Use of blank columns can be helpful but is not necessary. Recall that when you entered your data, you left a blank column between Q7 and the AGE variable, and another blank column between AGE and IQ. Leaving blank columns between variables can be helpful because it makes it easier to look at your data and see if something has been entered out of place. However, leaving blank columns is not necessary for SAS to accurately read your data, so this approach is optional (though recommended).
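
The entry conventions in the preceding list (blank or period for missing values, right-justified numbers, left-justified character data) can be sketched in Python as follows. This is an illustration of the conventions only, not a description of how SAS processes a file; the function names `parse_numeric`, `format_numeric`, and `format_character` are hypothetical.

```python
def parse_numeric(field):
    """Convert one fixed-width field to an int, or None if it is missing.

    A field that is entirely blank, or that holds a single period,
    is treated as a missing value (never as zero).
    """
    text = field.strip()
    if text == "" or text == ".":
        return None
    return int(text)


def format_numeric(value, width):
    """Right-justify a number in `width` columns; a missing value becomes one period."""
    text = "." if value is None else str(value)
    return text.rjust(width)


def format_character(value, width):
    """Left-justify character data in `width` columns."""
    return value.ljust(width)


# Reading: a two-digit IQ, a period, and a blank field (all three columns wide).
print(parse_numeric(" 98"), parse_numeric("  ."), parse_numeric("   "))

# Writing: IQ in three columns, then FIRST (15 columns) and LAST (10 columns).
print(repr(format_numeric(87, 3)))
print(repr(format_numeric(None, 3)))
print(repr(format_character("Jose", 15) + format_character("Lopez", 10)))
```

Note that `format_numeric` places the single period in the rightmost column of the field, which matches the advice to enter one period in column 14 for a missing IQ.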
