Chapter 12: Researching Alternative Pitching Metrics

12.1 Overview

12.2 The Sample Program

12.2.1 Adding More Metrics

12.2.2 One Output Data Set with All the Splits Results

12.3 Summary

12.1 Overview

Our business users and most baseball analysts agree that quantifying the performance of pitchers is much harder than for batters, and our users want to investigate a number of alternatives.

Earned Run Average (ERA) has long been a default metric to evaluate pitchers despite its limitations. For starting pitchers, if they are replaced and the relief pitcher gives up a hit that scores any of the runners who are already on base, the starting pitcher is charged for those runs.

Likewise, if a relief pitcher allows “inherited runners” (i.e., those runners already on base when they entered the game) their ERA does not reflect those runs.

We reminded the users that our mock data could be used to demonstrate alternative metrics but cautioned them to not make any inferences from the results since our mock data did not allow for such scenarios because pitchers can be replaced only at the beginning of an inning. Likewise, we did not distinguish between runs that are earned vs. unearned.

A popular alternative metric is Walks plus Hits per Inning Pitched (abbreviated as WHIP). Calculating WHIP was one of our examples in section 9.5.2 “Multiple Split Calculations.”

The challenge for this case study is to create a data set that has a number of other metrics that can be analyzed in a number of ways (for example, perhaps correlation analysis). The initial set of requested alternative metrics they asked for include:

   For the WHIP calculation, if a pitcher hits a batter with a pitch that does not count against their WHIP. Since hitting a batter with a pitch produces results similar to a walk, they would like to see what impact that has on the WHIP calculation.

   One other criticism of the WHIP metric is that, for example, a home run counts the same as a single. They want to investigate using the number of bases allowed (e.g., 4 for a Home Run and 1 for a single) instead of the number of hits. This would be similar to the difference between how a batter’s Batting Average (BA) is calculated vs. his Slugging Average (SLG).

We agreed to use one of the sample splits programs from Chapter 9 and modify it to create desired metrics.

12.2 The Sample Program

This use case example required just a few modifications to Program 9.8 Chapter 9 HoH Multiple Splits Batter and Pitcher.sas. The key lines of code we had to add to create the program below are noted in bold and annotated.

We also pointed out to the business and IT users that once these programs were modularized as part of the long-term project, it would be easier to use pieces and parts to address ad-hoc and research questions like this one.

Program 12.1 - Chapter 12 Pitcher Metrics What-If.sas

data chapter9splits;

 set template.chapter9splits;   

 by hashTable;

 output;

 if last.hashTable;

 Column = "IP";

 output;

 Column = "ERA";

 output;

 Column = "WHIP";

 output;

 Column = "_Runs";

 output;

 Column = "_Outs";

 output;

 Column = "_Walks";

 output;

 Column = "_HBP";   

 output;

 Column = "WHIP_HBP";

 output;

 Column = "BASES_IP";

 output;

run;

 

data _null_;

 dcl hash HoH(ordered:"A");

 HoH.defineKey("hashTable");

 HoH.defineData("hashTable","H","calcAndOutput");   

 HoH.defineDone();

 dcl hiter HoH_Iter("HoH");

 dcl hash h();

 dcl hiter iter;

 /* define the lookup hash object tables */

 do while(lr=0);

    set template.chapter9lookuptables

        chapter9splits(in=CalcAndOutput)

    end=lr;

    by hashTable;

    if first.hashTable then

    do;  /* create the hash object instance */

       if datasetTag ne ' ' then h = _new_ hash(dataset:datasetTag

                                               ,multidata:"Y");

       else h = _new_ hash(multidata:"Y");

    end; /* create the hash object instance */

    if Is_A_key then h.DefineKey(Column);

    h.DefineData(Column);

    if last.hashTable then

    do;  /* close the definition and add it to our HoH hash table */

       h.defineDone();

       HoH.add();

    end; /* close the definition and add it to our HoH hash table */

 end;

 /* create non-scalar fields for the lookup tables */

 HoH.find(key:"GAMES");

 dcl hash games;

 games = h;

 HoH.find(key:"PLAYERS");

 dcl hash players;

 players = h;

 HoH.find(key:"TEAMS");

 dcl hash teams;

 teams = h;

 dcl hash pitchers(dataset:"dw.pitches(rename=(pitcher_id = Player_ID))");   

 pitchers.defineKey("game_sk","top_bot","ab_number");

 pitchers.defineData("player_id");

 pitchers.defineDone();

 

 if 0 then set dw.players

               dw.teams

               dw.games

               dw.pitches;

 format PAs AtBats Hits comma6. BA OBP SLG OPS 5.3

        IP comma6. ERA WHIP WHIP_HBP BASES_IP 6.3;   

 

 lr = 0;

 do until(lr);

    set dw.AtBats end = lr;

    call missing(Team_SK,Last_Name,First_Name,Team_Name,Date,Month,DayOfWeek);

    games.find();

    pitchers.find();

    players_rc = players.find();   

    do while(players_rc = 0);

       if (Start_Date le Date le End_Date) then leave;

       players_rc = players.find_next();

    end;

    if players_rc ne 0 then call missing(Team_SK,First_Name,Last_Name);

    teams.find();

 

    do while (HoH_Iter.next() = 0);

       if not calcAndOutput then continue;

       call missing(PAs,AtBats,Hits,_Bases,_Reached_Base

                   ,_Outs,_Runs,_Bases,_HBP);   

       rc = h.find();

       PAs           + 1;

       AtBats        + Is_An_AB;

       Hits          + Is_A_Hit;

       _Bases        + Bases;

       _Reached_Base + Is_An_OnBase;

       _Outs         + Is_An_Out;

       _Runs         + Runs;

       _Walks        + (Result = "Walk");

       _HBP          + (Result = "Hit By Pitch");   

       BA = divide(Hits,AtBats);

       OBP = divide(_Reached_Base,PAs);

       SLG = divide(_Bases,AtBats);

       OPS = sum(OBP,SLG);

       if _Outs then

       do;  /* calculate pitcher metrics suppressing missing value note */

          IP = _Outs/3;

          ERA = divide(_Runs*9,IP);

          WHIP = divide(sum(_Walks,Hits),IP);

          WHIP_HBP = divide(sum(_Walks,Hits,_HBP),IP);   

          BASES_IP = divide(_Bases,IP);

       end; /* calculate pitcher metrics missing value note */

       h.replace();

    end;

 end;

 do while (HoH_Iter.next() = 0);

    if not calcAndOutput  then continue;

    h.output(dataset:hashTable||"(drop=_:)");

 end;

 stop;

run;

   We are calculating pitcher metrics only, so we don’t need to read in the data set twice to create pitcher and batter hash tables.

   Three additional variables are needed in the data portion of our splits hash tables to calculate these alternative metrics.

   The data item corresponding to the hash iterator is no longer needed.

   A hash object that can be used to find and retrieve the Pitcher_ID for each at bat. In the examples in Chapter 9 an SQL step was used to remove the duplicates (i.e., multiple pitches per at bat). Given the challenge to use only hash object methodology, we simply need to omit the MULTIDATA argument tag in order to force only a single hash table item for each pitcher. Note also that we renamed Pitcher_ID to Player_ID in order to facilitate the lookup to get the pitcher name.

   Add our output metrics to the format statement.

   Use the FIND method to look up the pitcher’s name and team. Since we need to look up only the pitcher name, we moved this lookup to before the loop that enumerates through the splits hash objects.

   The columns_Bases and _HBP were added to the CALL MISSING call routine.

   Count the number of at bats that resulted in an HBP event as well as the number of pitches that were thrown.

   Calculate our two additional metrics: WHIP_HBP which includes at bats for which the result was a Hit by Pitch: and BASES_IP which replaces the total number of hits with the total number of bases.

The program created the same 5 split tables plus the additional metrics WHIP including Hit by Pitch (WHIP_HBP) and Bases Allowed per Inning (BASES_IP) as was shown in Output 9.22.

Output 12.1 First 5 Rows of byPlayer Splits Output

Output 12.1 First 5 Rows of byPlayer Splits Output

12.2.1 Adding More Metrics

The users were surprised at how simple the changes were to make and wanted confirmation that more metrics could be added quickly. We pointed out that it depended on what the metrics were, adding that metrics that are calculated from values in the AtBats data could be that simple. We updated our program and saved it with a new name (Chapter 12 Pitcher Metrics What-If Updated.sas) to illustrate this.

The following lines of code were added to Program 12.1 that is highlighted in .

 Column = "BASES_PA";

 output;

 Column = "PAs_IP";

 output;

 Column = "_Pitches";

 output;

 Column = "PITCHES_IP";

 output;

The variables BASES_PA, PAs_IP, and PITCHES_IP were added to the FORMAT statement in .

The variable _Pitches was added to the CALL MISSING function in code described in and was calculated immediately after .

These lines were added to the code described in .

    PITCHES_IP = divide(_Pitches,IP);

    PAs_IP = divide(PAs,IP);

 end; /* calculate pitcher metrics suppressing missing value note */

 BASES_PA = divide(_Bases,PAs);

The modified program produced the same output as Program 12.1 Chapter 12 Pitcher Metrics What-If.sas plus the additional metrics Bases per Plate Appearance (BASES_PA), Plate Appearances per Inning Pitched (PAs_IP), and Pitches per Inning Pitched (Pitches_PA).

Output 12.2   First 5 Rows of byPlayer Splits Output

Output 12.2   First 5 Rows of byPlayer Splits Output

12.2.2 One Output Data Set with All the Splits Results

Our users then asked if it was possible to create a single output data set with the results from all of the splits. We agreed to do this but cautioned them to be careful when doing this since all the key variables and all the analysis variables would need to be included in a single output data set. The sample program below creates one output data set regardless of how many splits are being calculated and includes all the variables (both key variables and analysis variables) that are included in any of the splits. Variables that do not apply for a given split will have missing values.

The changes to the above program (Chapter 12 Pitcher Metrics What-If Updated.sas) are highlighted in bold.

Program 12.2 - Chapter 12 Pitcher Metrics What-If One Dataset.sas

data chapter9splits;   

   .

   .

   .

run;

 

proc sql noprint;   

 select distinct column into:keepList separated by ' '

 from chapter9splits

 /* where substr(column,1,1) ne '_' */;   

 select distinct column into:missingList separated by ','

 from chapter9splits;

quit;

 

data combined;   

 keep hashTable &keepList;

 dcl hash HoH(ordered:"A");

 HoH.defineKey ("hashTable");

 HoH.defineData ("hashTable","H","ITER","calcAndOutput");

 HoH.defineDone();

 dcl hiter HoH_Iter("HoH");

 dcl hash h();

 dcl hiter iter;

 /* define the lookup hash object tables */

 do while(lr=0);

    set template.chapter9lookuptables

        chapter9splits(in=CalcAndOutput)

    end=lr;

    by hashTable;

    if first.hashTable then

    do;  /* create the hash object instance */

       if datasetTag ne ' ' then h = _new_ hash(dataset:datasetTag

                                               ,multidata:"Y");

       else h = _new_ hash(multidata:"Y");

    end; /* create the hash object instance */

    if Is_A_key then h.DefineKey(Column);

    h.DefineData(Column);

    if last.hashTable then

    do;  /* close the definition and add it to our HoH hash table */

       h.defineDone();

       iter = _new_ hiter("h");   

       HoH.add();

    end; /* close the definition and add  it to our HoH hash table */

 end;

 /* create non-scalar fields for the lookup tables */

 HoH.find(key:"GAMES");

 dcl hash games;

 games = h;

 HoH.find(key:"PLAYERS");

 dcl hash players;

 players = h;

 HoH.find(key:"TEAMS");

 dcl hash teams;

 teams = h;

 dcl hash pitchers(dataset:"dw.pitches(rename=(pitcher_id = Player_ID))");

 pitchers.defineKey("game_sk","top_bot","ab_number");

 pitchers.defineData("player_id");

 pitchers.defineDone();

 

 if 0 then set dw.players

               dw.teams

               dw.games

               dw.pitches;

 format PAs AtBats Hits comma6. BA OBP SLG OPS 5.3

        IP comma6. ERA WHIP WHIP_HBP BASES_IP BASES_PA PAs_IP PITCHES_IP 6.3;

 

 lr = 0;

 do until(lr);

    set dw.AtBats end = lr;

    call missing(Team_SK,Last_Name,First_Name,Team_Name,Date,Month,DayOfWeek);

    games.find();

    pitchers.find();

    players_rc = players.find();

    do while(players_rc = 0);

       if (Start_Date le Date le End_Date) then leave;

       players_rc = players.find_next();

    end;

    if players_rc ne 0 then call missing(Team_SK,First_Name,Last_Name);

    teams.find();

 

    do while (HoH_Iter.next() = 0);

       if not calcAndOutput then continue;

       call missing(PAs,AtBats,Hits,_Bases,_Reached_Base,_Outs,_Runs

                   ,_Bases,_HBP,_Pitches);

       rc = h.find();

       PAs           + 1;

       AtBats        + Is_An_AB;

       Hits          + Is_A_Hit;

       _Bases        + Bases;

       _Reached_Base + Is_An_OnBase;

       _Outs         + Is_An_Out;

       _Runs         + Runs;

       _Walks        + (Result = "Walk");

       _HBP          + (Result = "Hit By Pitch");

       _Pitches      + Number_of_Pitches;

       BA = divide(Hits,AtBats);

       OBP = divide(_Reached_Base,PAs);

       SLG = divide(_Bases,AtBats);

       OPS = sum(OBP,SLG);

       if _Outs then

       do;  /* calculate pitcher metrics suppressing missing value note */

          IP = _Outs/3;

          ERA = divide(_Runs*9,IP);

          WHIP = divide(sum(_Walks,Hits),IP);

          WHIP_HBP = divide(sum(_Walks,Hits,_HBP),IP);

          BASES_IP = divide(_Bases,IP);

          PITCHES_IP = divide(_Pitches,IP);

          PAs_IP = divide(PAs,IP);

       end; /* calculate pitcher metrics suppressing missing value note */

       BASES_PA = divide(_Bases,PAs);

       h.replace();

    end;

 end;

 do while (HoH_Iter.next() = 0);

    if not calcAndOutput  then continue;

    call missing(&missingList);   

    do while (iter.next() = 0);   

       output;

    end;

 end;

 stop;

run;

   No changes to the creation of the parameter file.

   Create two macro variables to use: as the list of variables to include in the output data set (a blank-separated list); and the list of variables to be used in the CALL MISSING routine (a comma-separated list). Both lists include all the variables referenced in our splits definitions parameter files. We could have used hard-coded %LET macro statements for these two lists in order to be fully compliant with the user’s request to not use other SAS facilities. They agreed that this was an appropriate exception to the use only hash object facilities restriction.

   If the accumulator variables are not desired in the output this line can be uncommented.

   Define the output data set containing the combined results in the DATA statement and specify the list of variables to be kept using the macro variable created in the previous SQL step.

   Include a non-scalar field of type iterator in the Hash of Hash that can be used to loop through the splits hash tables. This will be used to create the combined output data set.

   For each splits hash table create an instance of the iterator object.

   Use the MISSING call routine to set all the fields to null before iterating through each hash table. This ensures that no values will be carried down from the previous hash table.

   Instead of using the OUTPUT method which creates a separate data set for each hash table, use the iterator to enumerate all the data items in each hash table and output each item to the combined SAS data set.

The output below shows selected rows in our combined output object. The output is ordered alphabetically by the hashTable name (i.e., the split name). The CALL MISSING routine is what ensures that, for example, the value 6 for the variable DayOfWeek is not carried down to the observations for the BYMONTH split and the value of 8 is not carried down to the observations for the BYPLAYER split.

Output 12.3   Multiple Splits As a Single Output Data Set

Output 12.3   Multiple Splits As a Single Output Data Set

image

12.3 Summary

The point of this case study was to illustrate how metrics could be added to our sample programs.

Upon presenting these results, we informed the business and IT users that we were moving on to the next case study and could not entertain any more modifications to this one. They responded that while they had additional requests that it was clear that using the hash object in a DATA step offered a lot of flexibility. They could see the advantages of this approach given their interest in researching additional metrics.

We impressed upon the business and IT users that stealing snippets of code from other programs is not an approach that we would suggest as a long-term solution. The SAS macro language, among the many available tools provided by SAS software, provides a framework to build a set of re-usable tools that can be parameterized and mixed and matched as needed in order to address ad-hoc questions. That approach can also be combined with custom coding as needed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.201.217