Rank the observations in a data set based on the value of a numeric variable. Determine the quartile based on the value of a numeric variable. List the observations and their ranks and quartiles.
Official Results of Boston Marathon 1980-2004 1 With Winning Times Ranked from Fastest(1) to Slowest(N) Women's Division Year Winner Country Time Order Quartile 1980 Jacqueline Gareau Canada 2:34:28 25 Bottom 1981 Allison Roe New Zealand 2:26:46 20 Bottom 1982 Charlotte Teske West Germany 2:29:33 23 Bottom 1983 Joan Benoit United States 2:22:43 3 Top 1984 Lorraine Moller New Zealand 2:29:28 22 Bottom 1985 Lisa Larsen Weidenbach United States 2:34:06 24 Bottom 1986 Ingrid Kristiansen Norway 2:24:55 12 High Mid 1987 Rosa Mota Portugal 2:25:21 15 Low Mid 1988 Rosa Mota Portugal 2:24:30 10 High Mid 1989 Ingrid Kristiansen Norway 2:24:33 11 High Mid 1990 Rosa Mota Portugal 2:25:24 16 Low Mid 1991 Wanda Panfil Poland 2:24:18 8 High Mid 1992 Olga Markova Russia 2:23:43 6 Top 1993 Olga Markova Russia 2:25:27 17 Low Mid 1994 Uta Pippig Germany 2:21:45 2 Top 1995 Uta Pippig Germany 2:25:11 13 High Mid 1996 Uta Pippig Germany 2:27:12 21 Bottom 1997 Fatuma Roba Ethiopia 2:26:23 19 Low Mid 1998 Fatuma Roba Ethiopia 2:23:21 4 Top 1999 Fatuma Roba Ethiopia 2:23:25 5 Top 2000 Catherine Ndereba Kenya 2:26:11 18 Low Mid 2001 Catherine Ndereba Kenya 2:23:53 7 High Mid 2002 Margaret Okayo Kenya 2:20:43 1 Top 2003 Svetlana Zakharova Russia 2:25:20 14 Low Mid 2004 Cathering Ndereba Kenya 2:24:27 9 High Mid |
Official Results of Boston Marathon 1980-2004 2 With Winning Times Ranked from Fastest(1) to Slowest(N) Men's Division Year Winner Country Time Order Quartile 1980 Bill Rodgers United States 2:12:11 24 Bottom 1981 Toshihiko Seko Japan 2:09:26 13 High Mid 1982 Alberto Salazar United States 2:08:52 7 High Mid 1983 Greg Meyer United States 2:09:00 8 High Mid 1984 Geoff Smith Great Britain 2:10:34 19 Low Mid 1985 Geoff Smith Great Britain 2:14:05 25 Bottom 1986 Robert de Castella Australia 2:07:51 3 Top 1987 Toshihiko Seko Japan 2:11:50 23 Bottom 1988 Ibrahim Hussein Kenya 2:08:43 6 Top 1989 Abebe Mekonnen Ethiopia 2:09:06 10 High Mid 1990 Gelindo Bordin Italy 2:08:19 5 Top 1991 Ibrahim Hussein Kenya 2:11:06 22 Bottom 1992 Ibrahim Hussein Kenya 2:08:14 4 Top 1993 Cosmas Ndeti Kenya 2:09:33 14 Low Mid 1994 Cosmas Ndeti Kenya 2:07:15 1 Top 1995 Cosmas Ndeti Kenya 2:09:22 12 High Mid 1996 Moses Tanui Kenya 2:09:15 11 High Mid 1997 Lameck Aguta Kenya 2:10:34 19 Low Mid 1998 Moses Tanui Kenya 2:07:34 2 Top 1999 Joseph Chebet Kenya 2:09:52 17 Low Mid 2000 Elijah Lagat Kenya 2:09:47 16 Low Mid 2001 Lee Bong-Ju Korea 2:09:43 15 Low Mid 2002 Rodgers Rop Kenya 2:09:02 9 High Mid 2003 Robert Kipkoech Cherui Kenya 2:10:11 18 Low Mid 2004 Timothy Cherigat Kenya 2:10:37 21 Bottom |
Data Set | MARATHON |
Preparatory Step | PROC SORT |
Featured Step | PROC RANK |
Featured Step Statements and Options | PROC RANK statement: DESCENDING, GROUPS=, OUT=, and TIES= options
BY statement RANKS statement |
Report Step | PROC REPORT |
Report Step Options and Statements | BY statement DEFINE statement: ORDER option |
Formatting Features | Customizing titles with #BYVAL DEFINE statement: WIDTH option when sending output to the LISTING destination. |
A Closer Look | Creating the Rank and Percentile Variables
Choosing a Method for Ordering Ranks and Handling Tied Values Selecting the Number of Groups for the Percentile Categories See “Working with #BYVAL, #BYVAR, and #BYLINE when Customizing Titles” in Example 2.3 |
When comparing the values of a numeric variable, you may not be able to easily identify the high and low values or judge the position of values relative to certain percentile levels, such as the median or quartiles or deciles. This can be especially difficult when the variable values contain many digits or are formatted in a way other than standard decimal notation.
Creating variables based on ranks and percentiles provides a method for directly comparing observations. With this report, a rank variable and a quartile variable demonstrate the general trend of the winning times over many years of the Boston Marathon.
The ranks and quartiles are determined by gender. PROC RANK determines the ranks and quartiles, and the program executes PROC RANK twice, once to compute ranks and once to compute quartiles. PROC RANK saves the ranks and quartiles in output data sets that also contain the variables in the input data set. A DATA step merges the two PROC RANK output data sets. The report includes both the original data plus the new ranking statistics.
Each row in the report corresponds to one observation in the data set. The first four columns correspond to variables in the data set. The last two columns are the ranking statistics produced by PROC RANK.
A Closer Look |
PROC RANK can examine a variable’s values and determine individual rankings or group membership where you specify the number of groups that PROC RANK should create. PROC RANK can save these results in variables and output them along with the original data in the input data set for additional processing.
By default, PROC RANK assigns the lowest ranking to the observation with the lowest value of the variable listed in the VAR statement; it assigns the highest ranking to the observation with the highest value. The DESCENDING option reverses this by assigning high rankings to low values and low rankings to high values. With the TIES= option, you can specify whether tied values receive the highest or lowest possible ranking.
The first PROC RANK step in this example applies the default method for ordering rankings. It assigns the lowest ranking of 1 to the observation with the lowest value of TIME (the best time).
This default method of ranking, along with the specification of TIES=LOW, assures that tied values receive the best possible rank. For example, if three values are tied for second place, all three receive a rank of 2. PROC RANK always assigns the same rank to tied values, regardless of the value of the TIES= option.
proc rank data=marathon out=ordered ties=low;
The second PROC RANK step assigns group membership. With GROUPS=4, quartiles are defined and the quartile values assigned to each observation range from 0 to 3. The DESCENDING option gives the group value of 3 to the lowest set of values of TIME, the group value of 2 to the next lowest set of values, and so on. The TIES=HIGH option assures that tied values in this PROC RANK step receive the best possible rank.
proc rank data=marathon out=grouped groups=4 descending ties=high;
Table 2.9 shows the results of using all the combinations of the DESCENDING option with the HIGH and LOW values of the TIES= option. The variables are similar to the ones in the marathon example, but the values of QUARTILE are not formatted by the QUARTER format. Table 2.9 uses only four observations, of which two have tied values for TIME.
ORDER | QUARTILE | ||||||||
---|---|---|---|---|---|---|---|---|---|
Without DESCENDING | With DESCENDING | Without DESCENDING | With DESCENDING | ||||||
RUNNER | TIME | TIES=LOW | TIES=HIGH | TIES=LOW | TIES=HIGH | TIES=LOW | TIES=HIGH | TIES=LOW | TIES=HIGH |
Mike | 2:34:51 | 1 | 1 | 4 | 4 | 0 | 0 | 3 | 3 |
Patrick | 2:36:09 | 2 | 3 | 2 | 3 | 1 | 2 | 1 | 2 |
Glenn | 2:36:09 | 2 | 3 | 2 | 3 | 1 | 2 | 1 | 2 |
Carl | 2:40:34 | 4 | 4 | 1 | 1 | 3 | 3 | 0 | 0 |
The GROUPS= option specifies the number of categories in which to group the observations. The values of the group variable go from 0 to one less than the value specified in the GROUPS= option. Formatting the values of the group variable with a descriptive label makes it easy to identify each observation’s group in the report.
The formula for calculating group values is
FLOOR(rank*k/(n+1))
where FLOOR is the FLOOR function, rank is the value’s order rank, k is the value of GROUPS=, and n is the number of observations having nonmissing values of the ranking variable.
If the number of observations is evenly divisible by the number of groups, the result is an equal number of observations in each group, provided there are no tied values at the boundaries of the groups. However, if the number of observations is not evenly divisible by the value of GROUPS=, you can’t partition observations into groups of equal sizes. In the marathon example, dividing the women’s results with 25 observations into four groups places seven observations in the “High Mid” category and places six observations in each of the other three.
BY statement processing. See “BY Statement” in the “Statements” section of SAS 9.1 Language Reference: Dictionary, and “Statements with the Same Function in Multiple Procedures” in the “Concepts” section of Base SAS 9.1 Procedures Guide.
The BYLINE/NOBYLINE SAS System Options. See “SAS System Options” in SAS 9.1 Language Reference: Dictionary.
The FLOOR function syntax, usage information, and additional examples. See “Floor Function” in the “Functions and CALL Routines” section of SAS 9.1 Language Reference: Dictionary.
Inserting BY-group information in titles. See “Creating Titles that Contain BY-Group Information” in the “Fundamental Concepts for Using Base SAS Procedures” section of Base SAS 9.1 Procedures Guide.
MERGE statement syntax, usage information, and additional examples. See “MERGE Statement” in the “Statements” section of SAS 9.1 Language Reference: Dictionary.
One-to-one merging. See “Combining SAS Data Sets: Basic Concepts” in “Reading, Combining, and Modifying SAS Data Sets” in the “DATA Step Concepts” section of SAS 9.1 Language Reference: Concepts.
PROC RANK reference, usage information, and additional examples. See “The RANK Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.
PROC REPORT reference, usage information, and additional examples. See “The REPORT Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.
18.191.61.38