Example 2.9. Listing the Rank and Percentile for Each Observation

Goal

Rank the observations in a data set based on the value of a numeric variable. Determine the quartile based on the value of a numeric variable. List the observations and their ranks and quartiles.

Report

                     Official Results of Boston
 Marathon 1980-2004                                  1
                 With Winning Times Ranked from
 Fastest(1) to Slowest(N)

                                     Women's Division

          Year  Winner                  Country   
        Time   Order  Quartile
          1980  Jacqueline Gareau       Canada    
     2:34:28      25  Bottom
          1981  Allison Roe             New
 Zealand    2:26:46      20  Bottom
          1982  Charlotte Teske         West
 Germany   2:29:33      23  Bottom
          1983  Joan Benoit             United
 States  2:22:43       3  Top
          1984  Lorraine Moller         New
 Zealand    2:29:28      22  Bottom
          1985  Lisa Larsen Weidenbach  United
 States  2:34:06      24  Bottom
          1986  Ingrid Kristiansen      Norway    
     2:24:55      12  High Mid
          1987  Rosa Mota               Portugal  
     2:25:21      15  Low Mid
          1988  Rosa Mota               Portugal  
     2:24:30      10  High Mid
          1989  Ingrid Kristiansen      Norway    
     2:24:33      11  High Mid
          1990  Rosa Mota               Portugal  
     2:25:24      16  Low Mid
          1991  Wanda Panfil            Poland    
     2:24:18       8  High Mid
          1992  Olga Markova            Russia    
     2:23:43       6  Top
          1993  Olga Markova            Russia    
     2:25:27      17  Low Mid
          1994  Uta Pippig              Germany   
     2:21:45       2  Top
          1995  Uta Pippig              Germany   
     2:25:11      13  High Mid
          1996  Uta Pippig              Germany   
     2:27:12      21  Bottom
          1997  Fatuma Roba             Ethiopia  
     2:26:23      19  Low Mid
          1998  Fatuma Roba             Ethiopia  
     2:23:21       4  Top
          1999  Fatuma Roba             Ethiopia  
     2:23:25       5  Top
          2000  Catherine Ndereba       Kenya     
     2:26:11      18  Low Mid
          2001  Catherine Ndereba       Kenya     
     2:23:53       7  High Mid
          2002  Margaret Okayo          Kenya     
     2:20:43       1  Top
          2003  Svetlana Zakharova      Russia    
     2:25:20      14  Low Mid
          2004  Cathering Ndereba       Kenya     
     2:24:27       9  High Mid


                     Official Results of Boston
 Marathon 1980-2004                                  2
                 With Winning Times Ranked from
 Fastest(1) to Slowest(N)

                                      Men's Division

          Year  Winner                  Country   
        Time   Order  Quartile
          1980  Bill Rodgers            United
 States  2:12:11      24  Bottom
          1981  Toshihiko Seko          Japan     
     2:09:26      13  High Mid
          1982  Alberto Salazar         United
 States  2:08:52       7  High Mid
          1983  Greg Meyer              United
 States  2:09:00       8  High Mid
          1984  Geoff Smith             Great
 Britain  2:10:34      19  Low Mid
          1985  Geoff Smith             Great
 Britain  2:14:05      25  Bottom
          1986  Robert de Castella      Australia 
     2:07:51       3  Top
          1987  Toshihiko Seko          Japan     
     2:11:50      23  Bottom
          1988  Ibrahim Hussein         Kenya     
     2:08:43       6  Top
          1989  Abebe Mekonnen          Ethiopia  
     2:09:06      10  High Mid
          1990  Gelindo Bordin          Italy     
     2:08:19       5  Top
          1991  Ibrahim Hussein         Kenya     
     2:11:06      22  Bottom
          1992  Ibrahim Hussein         Kenya     
     2:08:14       4  Top
          1993  Cosmas Ndeti            Kenya     
     2:09:33      14  Low Mid
          1994  Cosmas Ndeti            Kenya     
     2:07:15       1  Top
          1995  Cosmas Ndeti            Kenya     
     2:09:22      12  High Mid
          1996  Moses Tanui             Kenya     
     2:09:15      11  High Mid
          1997  Lameck Aguta            Kenya     
     2:10:34      19  Low Mid
          1998  Moses Tanui             Kenya     
     2:07:34       2  Top
          1999  Joseph Chebet           Kenya     
     2:09:52      17  Low Mid
          2000  Elijah Lagat            Kenya     
     2:09:47      16  Low Mid
          2001  Lee Bong-Ju             Korea     
     2:09:43      15  Low Mid
          2002  Rodgers Rop             Kenya     
     2:09:02       9  High Mid
          2003  Robert Kipkoech Cherui  Kenya     
     2:10:11      18  Low Mid
          2004  Timothy Cherigat        Kenya     
     2:10:37      21  Bottom


Example Features

Data SetMARATHON
Preparatory StepPROC SORT
Featured StepPROC RANK
Featured Step Statements and OptionsPROC RANK statement: DESCENDING, GROUPS=, OUT=, and TIES= options

BY statement

RANKS statement
Report StepPROC REPORT
Report Step Options and StatementsBY statement DEFINE statement: ORDER option
Formatting FeaturesCustomizing titles with #BYVAL DEFINE statement: WIDTH option when sending output to the LISTING destination.
A Closer LookCreating the Rank and Percentile Variables

Choosing a Method for Ordering Ranks and Handling Tied Values

Selecting the Number of Groups for the Percentile Categories

See “Working with #BYVAL, #BYVAR, and #BYLINE when Customizing Titles” in Example 2.3

Example Overview

When comparing the values of a numeric variable, you may not be able to easily identify the high and low values or judge the position of values relative to certain percentile levels, such as the median or quartiles or deciles. This can be especially difficult when the variable values contain many digits or are formatted in a way other than standard decimal notation.

Creating variables based on ranks and percentiles provides a method for directly comparing observations. With this report, a rank variable and a quartile variable demonstrate the general trend of the winning times over many years of the Boston Marathon.

The ranks and quartiles are determined by gender. PROC RANK determines the ranks and quartiles, and the program executes PROC RANK twice, once to compute ranks and once to compute quartiles. PROC RANK saves the ranks and quartiles in output data sets that also contain the variables in the input data set. A DATA step merges the two PROC RANK output data sets. The report includes both the original data plus the new ranking statistics.

Each row in the report corresponds to one observation in the data set. The first four columns correspond to variables in the data set. The last two columns are the ranking statistics produced by PROC RANK.

Program

Sort the data set.
proc sort data=marathon;
  by gender;
run;

Create formats for GENDER and QUARTILE.
proc format;
  value $div    'M'="Men's Division"
                'F'="Women's Division";
  value quarter   3='Top'
                  2='High Mid'
                  1='Low Mid'
                  0='Bottom';
run;

Compute the rank.
proc rank data=marathon

Create an output data set containing the original observations and the ranking variable.
     out=ordered

Specify that ties should be assigned the smallest of the corresponding ranks.
     ties=low;

Compute the ranks separately for each gender.
  by gender;

Determine the rank based on the values of the variable TIME.
  var time;

Create a new variable that holds the rank.
  ranks order;
run;

Compute the quartile.
proc rank data=marathon

Create an output data set containing the original observations and the quartile variable.
     out=grouped

Partition the data within each BY group into quartiles.
     groups=4

Reverse the direction of the ranks so that the quartiles are defined from largest to smallest, to reflect that the highest quartile corresponds to the fastest times.
     descending

Specify that ties should be assigned the largest of the corresponding ranks.
     ties=high;

Compute the ranks separately for each gender.
  by gender;

Determine the quartile based on the values of the variable TIME.
  var time;

Create a new variable that holds the quartile.
  ranks quartile;
run;

Perform a one-to-one merge of the two data sets created by PROC RANK.
data combine;
  merge ordered grouped;
run;

Suppress the automatic printing of BY lines in subsequent procedure output.
options nobyline;

proc report data=combine nowindows;
  title1 'Official Results of Boston Marathon
          1980-2004';
  title2 'With Winning Times Ranked from Fastest(1) to
          Slowest(N)';

Insert the current value of the BY variable whose name is enclosed in parentheses.
  title4 '#byval(gender)';

Create a separate report for each value of the BY variable.
  by gender;

Specify the variables that should appear in the report and the order in which they should appear.
  column year winner country time order quartile;

Order the detail rows within each BY group by the ascending values of the order variable. Specify the column width when sending output to the LISTING destination.
  define year     / order format=4. 'Year';
  define winner   / width=22 'Winner';
  define country  / width=13 'Country';
  define time     / width=7 'Time' format=time7.;
  define order    / width=5 'Order';
  define quartile / width=8 'Quartile'
 format=quarter.;

Format the values of the variable GENDER, which in this example is inserted in TITLE4 in the position of #BYVAL(GENDER).
  format gender $div.;
run;

Reset the option to its default settings.
options byline;


A Closer Look

Creating the Rank and Percentile Variables

PROC RANK can examine a variable’s values and determine individual rankings or group membership where you specify the number of groups that PROC RANK should create. PROC RANK can save these results in variables and output them along with the original data in the input data set for additional processing.

By default, PROC RANK assigns the lowest ranking to the observation with the lowest value of the variable listed in the VAR statement; it assigns the highest ranking to the observation with the highest value. The DESCENDING option reverses this by assigning high rankings to low values and low rankings to high values. With the TIES= option, you can specify whether tied values receive the highest or lowest possible ranking.

Choosing a Method for Ordering Ranks and Handling Tied Values

The first PROC RANK step in this example applies the default method for ordering rankings. It assigns the lowest ranking of 1 to the observation with the lowest value of TIME (the best time).

This default method of ranking, along with the specification of TIES=LOW, assures that tied values receive the best possible rank. For example, if three values are tied for second place, all three receive a rank of 2. PROC RANK always assigns the same rank to tied values, regardless of the value of the TIES= option.

     proc rank data=marathon out=ordered ties=low;

The second PROC RANK step assigns group membership. With GROUPS=4, quartiles are defined and the quartile values assigned to each observation range from 0 to 3. The DESCENDING option gives the group value of 3 to the lowest set of values of TIME, the group value of 2 to the next lowest set of values, and so on. The TIES=HIGH option assures that tied values in this PROC RANK step receive the best possible rank.

     proc rank data=marathon out=grouped groups=4
          descending ties=high;

Table 2.9 shows the results of using all the combinations of the DESCENDING option with the HIGH and LOW values of the TIES= option. The variables are similar to the ones in the marathon example, but the values of QUARTILE are not formatted by the QUARTER format. Table 2.9 uses only four observations, of which two have tied values for TIME.

Table 2.9. Using the DESCENDING and TIES Options in PROC RANK
 ORDERQUARTILE
Without DESCENDINGWith DESCENDINGWithout DESCENDINGWith DESCENDING
RUNNERTIMETIES=LOWTIES=HIGHTIES=LOWTIES=HIGHTIES=LOWTIES=HIGHTIES=LOWTIES=HIGH
Mike2:34:5111440033
Patrick2:36:0923231212
Glenn2:36:0923231212
Carl2:40:3444113300

Selecting the Number of Groups for the Percentile Categories

The GROUPS= option specifies the number of categories in which to group the observations. The values of the group variable go from 0 to one less than the value specified in the GROUPS= option. Formatting the values of the group variable with a descriptive label makes it easy to identify each observation’s group in the report.

The formula for calculating group values is

FLOOR(rank*k/(n+1))

where FLOOR is the FLOOR function, rank is the value’s order rank, k is the value of GROUPS=, and n is the number of observations having nonmissing values of the ranking variable.

If the number of observations is evenly divisible by the number of groups, the result is an equal number of observations in each group, provided there are no tied values at the boundaries of the groups. However, if the number of observations is not evenly divisible by the value of GROUPS=, you can’t partition observations into groups of equal sizes. In the marathon example, dividing the women’s results with 25 observations into four groups places seven observations in the “High Mid” category and places six observations in each of the other three.

Where to Go from Here

BY statement processing. See “BY Statement” in the “Statements” section of SAS 9.1 Language Reference: Dictionary, and “Statements with the Same Function in Multiple Procedures” in the “Concepts” section of Base SAS 9.1 Procedures Guide.

The BYLINE/NOBYLINE SAS System Options. See “SAS System Options” in SAS 9.1 Language Reference: Dictionary.

The FLOOR function syntax, usage information, and additional examples. See “Floor Function” in the “Functions and CALL Routines” section of SAS 9.1 Language Reference: Dictionary.

Inserting BY-group information in titles. See “Creating Titles that Contain BY-Group Information” in the “Fundamental Concepts for Using Base SAS Procedures” section of Base SAS 9.1 Procedures Guide.

MERGE statement syntax, usage information, and additional examples. See “MERGE Statement” in the “Statements” section of SAS 9.1 Language Reference: Dictionary.

One-to-one merging. See “Combining SAS Data Sets: Basic Concepts” in “Reading, Combining, and Modifying SAS Data Sets” in the “DATA Step Concepts” section of SAS 9.1 Language Reference: Concepts.

PROC RANK reference, usage information, and additional examples. See “The RANK Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.

PROC REPORT reference, usage information, and additional examples. See “The REPORT Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
52.15.210.12