Chapter 13: What If the Count Is 0-2 After the First Two Pitches

13.1 Overview

13.2 The Sample Program

13.3 Summary

13.1 Overview

Pace of play (i.e., how long the games last) is becoming a major concern for Bizarro Ball. There is a perception that very long at bats make games last a lot longer. Some have suggested that the rule be changed that says a foul does not count as the third strike. Purists really object to such a change and have asked how frequent long at bats are - specifically for those where the first two pitches result in strikes. The debate among the business users focused on the breakdown of when such at bats resulted in a run or not as well as whether it was specific to teams/batters/runners that were able to do this (foul off lots of pitches). So they asked for a program that finds all the at bats where the first two pitches were strikes and had at least 10 pitches. They requested two reports:

   A listing of the relevant data fields for all the rows in the AtBats data set that resulted in a run.

   A listing of the relevant data fields for all the rows in the AtBats data set that did not result in a run being scored.

On the assumption that the subsetting was done using the SAS hash object and the listing could be produced by one of the SAS reporting procedures, we agreed to provide such an example.

We further decided to also produce a listing of the counts for how many times such at bats occurred. They did not ask for this output, but we expected that as soon as they saw the results they would ask for it. We had already demonstrated how to create frequency tables (i.e., a distribution) using hash objects and felt that anticipating such a request would buy us some brownie points.

13.2 The Sample Program

This program is made up of quite a few snippets of code that we “stole” from other programs – either a simple cut-and-paste or a simple modification of other code.

Program 13.1 - Chapter 13 Long At Bats with Initial 0-2 Count.sas

%let Number_of_Pitches = 10;   

 

data _null_;

 if 0 then set dw.pitches(keep = Pitcher_ID)   

               dw.players

               dw.games

               dw.teams

               dw.runs

 ;

 

 /* define the lookup hash object lookup tables */   

 dcl hash HoH(ordered:"A");

 HoH.defineKey ("hashTable");

 HoH.defineData ("hashTable","H");

 HoH.defineDone();

 dcl hash h();

 do while(lr=0);

    set template.chapter9lookuptables end=lr;

    by hashTable;

    if first.hashTable then h = _new_ hash(dataset:datasetTag,multidata:"Y");

    if Is_A_Key then h.DefineKey(Column);

    h.DefineData(Column);

    if last.hashTable then

    do;  /* close the definition and add it to our HoH hash table */

       h.defineDone();

       rc=HoH.add();

    end; /* close the definition and add it to our HoH hash table */

 end;

 

 HoH.find(key:"GAMES");   

 dcl hash games;

 games = h;

 HoH.find(key:"PLAYERS");

 dcl hash players;

 players = h;

 HoH.find(key:"TEAMS");

 dcl hash teams;

 teams = h;

 

 dcl hash Count_0_2(dataset:"DW.Pitches(where=(Strikes = 2 and Balls = 0))");   

 Count_0_2.defineKey("game_sk","top_bot","ab_number");

 Count_0_2.defineData("Pitcher_ID");

 Count_0_2.defineDone();

 

 dcl hash results(multidata:"Y",ordered:"A");   

 results.defineKey("Date");

 results.defineData("Date","Team_AtBat","Team_InField","AB_Number","Result"

                   ,"Number_of_Pitches","Runs","Batter","Pitcher","Runner");

 results.defineDone();

 

 dcl hash runners(dataset:"DW.Runs",multidata:"Y");   

 runners.defineKey("Game_SK","Top_Bot","AB_Number");

 runners.defineData("Runner_ID");

 runners.defineDone();

 

 dcl hash distribution(ORDERED:"A");   

 distribution.defineKey("Result");

 distribution.defineData("Result","Count");

 distribution.defineDone();

 

 lr = 0;

 do until(lr);

    set dw.atbats end = lr;

    where Number_of_Pitches >= &Number_of_Pitches;   

    if Count_0_2.find() > 0 then continue;   

    if distribution.find() gt 0 then Count = 0;   image

    Count = Count + 1;

    distribution.replace();

    games.find();   image

    players_rc = players.find(Key:Batter_ID);   image

    link players;

    Batter = catx(', ',Last_Name,First_Name);   image

    Team_AtBat = Team_Name;

    players_rc = players.find(Key:Pitcher_ID);   image

    link players;

    Pitcher = catx(', ',Last_Name,First_Name);

    Team_InField = Team_Name;

    if runs then   image

    do;  /* if runs scored - get runner data */

       rc = runners.find();   image

       do while(rc=0);

          players_rc = players.find(Key:Runner_ID);   image

          link players;

          Runner = catx(', ',Last_Name,First_Name);

          results.add();   image

          rc = runners.find_next();

       end;

    end; /* if runs scored - get runner data */

    else

    do;  /* no runs scored */

       Runner = ' ';   image

       results.add();

    end; /* no runs scored */

 end;

 results.output(dataset:"Runs_Scored(where=(Runs))");   image

 results.output(dataset:"No_Runs_Scored(where=(not Runs))");

 distribution.output(dataset:"Distribution");   image

 stop;

 return;

 players:   image

    do while(players_rc = 0);

       if (Start_Date le Date le End_Date) then leave;

       players_rc = players.find_next();

    end;

    if players_rc ne 0 then call missing(Team_SK,First_Name,Last_Name);

    teams.find();

 return;

 stop;

run;

 

   A macro variable is used to define the threshold for the number of pitches for our subset criteria. The rationale is simply to make it easy to update this value should there be a request to use a different value (which, of course, there will be).

   Use the IF 0 THEN SET construct to define the needed PDV host variables. We added a KEEP data set option to one of the data sets to illustrate that we need not define all of the variables. Such refinements would be good examples of work that can be done later.

   We copied and modified the code starting with this comment through the end of the DO WHILE – END block from the examples in Chapter 9 to define the same needed lookup tables using a Hash of Hash approach. The changes made were to: only keep the data set name in the SET statement that defines the lookup tables; and removing the logic used to determine what kind of hash tables were being creating (lookup tables vs. result tables).

   Just as for the splits example in Chapter 9, we need a non-scalar variable of type hash object for each lookup table. We use the FIND method to populate non-scalar variables whose values point to the instances of the lookup table hash objects for Games, Players, and Teams.

   Create a lookup table that has two functions. First, it identifies all the at bats that meet the criteria of the first two pitches being strikes (i.e., the count is 0 balls and 2 strikes after the second pitch of the at bat). Second, it provides the id value for the pitcher, which is one of the variables for the output report.

   Create a hash table for the results of all the at bats that meet our criteria. We do not need a unique key so we decided to just use the date of the game as the key so the data will be ordered by the date of the game (thanks to the ORDERED:"A"attribute value). And if multiple such events occur on the same date, they will be added to the hash table in order. The data portion contains all the variables for the output report.

   We define a hash table that contains all the rows for the runs scored. The variables that are the keys, Game_SK, Top_Bot, and AB_Number uniquely define each at bat in the entire season. The data portion is the Runner_ID, one of the variables requested in the output. The MULTIDATA argument tag is used because multiple runners can score in a single at bat (e.g., if the result of the at bat is a Home Run and there are runners on base).

   Create the hash table that will tabulate the distribution of how many rows in the AtBats data set meet our filter criteria. The result field is the key for the table. Since the MULTIDATA argument is not used, the hash table will contain one row for each value of the Result variable.

   Read our data in a DO WHILE loop and use a WHERE clause to include only those at bats that have a number of pitches that meet our threshold criteria.

   This statement performs two functions. First, if the FIND method returns a non-zero value, this at bat did not start with an 0-2 count. So we use the CONTINUE statement to exit the DO-END block and read the next observation. Second, if the FIND method finds a value it retrieves the id for the pitcher.

image   Update the distribution hash table with the cumulative count of the result for the AtBats row. If the FIND method returns a non-zero value, the value of Result is not found in the hash table so the PDV host variable Count is initialized to 0. If it is found, the PDV host variable is updated with the value of Count from the distribution hash table. The Count field is then incremented and the hash table is updated in order to update the cumulative distribution.

image   Use the FIND method to retrieve the data from the GAMES hash table and update the PDV host variables. Specifically, we want the Date column.

image   We now have host variables that have the Batter_ID and Pitcher_ID values. We need to look up their names and their teams. Since the PLAYERS hash table is the Type 2 Slowly Changing Dimension table created in Chapter 7, it is not just a simple FIND method call. In earlier examples we just assigned a value to the PDV host variable for the key value to look up. What we do here is use an explicit FIND method call to specify the key and point to the first item for that key. In this case it is the Batter_ID. We then use a LINK statement to execute the search logic (since we need to do it multiple times for different id values).

image   We now have the data for the batter and we create a field that is their name and another field for their team.

image   Do the same thing to get the pitcher name and team.

image   If any runs were scored in this at bat we need to get the runner data. Note that since the variable Runs was read from our input data set, we don’t need to look it up.

image   Use the FIND and FIND_NEXT methods in a loop to read all the runner data.

image   Retrieve the runner name information from the PLAYERS hash table just as we did for batters and pitchers.

image   Use the ADD method to add an item to our results table as we have all the data needed for those at bats that resulted in runs being scored.

image   If no runs were scored for this at bat, we still want an item in our hash table (i.e., a row in our output SAS data set) with a blank value for the runner name. The ADD method is used here as well.

image   Create our output data sets using the OUTPUT method. Using just Runs as the WHERE clause, any item with a non-zero, non-missing value results in the WHERE clause being true. Likewise, Not Runs is true only when no runs were scored.

image   Create the output data set that contains the count of the number of at bats that met our filter criteria for each Result value.

image   This is the same Type 2 Slowly Changing Dimension table lookup that was shown in Section 7.3.3.1 “Performing Table Lookups Using an SCD Type 2 Table.”

Output 13.1 Distribution of the Results of the 0-2 At Bats with 10 Pitches

Output 13.1 Distribution of the Results of the 0-2 At Bats with 10 Pitches

Output 13.2 - Runs Scored - At Bats with 10 or More Pitches, Starting with a Count of 0-2

Output 13.2 - Runs Scored - At Bats with 10 or More Pitches, Starting with a Count of 0-2

Output 13.3 – No Runs Scored - At Bats with 10 or More Pitches, Starting with a Count of 0-2

Output 13.3 – No Runs Scored - At Bats with 10 or More Pitches, Starting with a Count of 0-2

image

Both the business and IT users liked these reports and asked about the code that was used to produce them. We told them that Output 13.1 was created using PROC PRINT, and PROC REPORT was used to produce Output 13.2 and 13.3. They seemed quite happy when we told them that the sample programs included the

code to produce the reports. We recommended the business and IT users review Carpenter's Complete Guide to the SAS REPORT Procedure by Art Carpenter to learn more helpful tricks for using PROC REPORT.

13.3 Summary

For the purposes of this sample, we wrote and documented all the code (as opposed to highlighting differences from other programs), as that approach would provide more insights on how the SAS hash object can be used for such requests.

We reinforced the comment we made about the first case study: that stealing snippets of code from other programs is not an approach that we would suggest as a long-term solution. SAS software, including the macro language, provides a framework to build a set of re-usable tools that can be parameterized and mixed and matched as needed in order to address both their ongoing reporting requirements as well as ad-hoc questions.

Given that this Proof of Concept is now complete we suggested that the business and IT users consider prioritizing their requirements and proposed that an iterative approach to building their solution was an ideal way to proceed. We suggested that a good place to start would be to develop a list of all the reports they want- both on an ongoing basis as well as a number of ad-hoc questions.

The League office thanked us for our efforts and said that we would definitely be hearing from them regarding an implementation project.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.59.32.1