Calculate an Expected Hits Matrix

Calculate the probability of a hit, walk, or out based on the ball–strike count.

The ball–strike count is one of the most important ways pitchers and hitters determine their strategy. Most batters will be ready to swing on an 0,2 (no balls, two strikes) count, but many batters won’t swing on a 3,0 count (three balls, no strikes). Of course, if the pitcher has generally good control, he might be able to throw the ball into the strike zone at this point, so it might be a good idea for the batter to swing. But then again, if the batter is planning to swing and the pitcher knows the batter is expecting a ball in the strike zone, the pitcher might decide to throw something unhittable and outside the strike zone…you get the idea. The count is at the core of baseball strategy. It drives the mental battle between the hitter and the pitcher.

This hack examines what is likely to happen on different counts. Clearly, a walk is more likely if the pitcher has thrown more balls, and a strikeout is more likely if the pitcher has thrown more strikes. But are extra base hits more likely in some situations? This hack shows you how to count the number of situations in which each of these things happened.

The Code

The key to understanding this code is to understand the possible set of counts and how a batter can move between them. Figure 6-4 illustrates how this works (the first number is the number of balls and the second is the number of strikes).

Figure 6-4. How the count progresses in a plate appearance

A batter can put a ball into play at any time. That means that from an 0,0 count, a batter can go to a 1,0 count or an 0,1 count, can stay in the same place (if there is a balk, a pitcher throws to first base, a base is stolen, or something else), can get a hit, can get on base on an error, can get on base on a fielder’s choice, or can be put out.

As you can tell, there is more than one path to certain states. As a simple example, consider a 1,1 count. A batter can get there in two ways: 0,0 to 1,0 to 1,1, or 0,0 to 0,1 to 1,1. But notice something else: many destinations cross through the same intermediate states. Batters with 3,1 counts and 2,2 counts once had 2,1 counts.
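The number of routes to a given count is a small combinatorics exercise: a count of b balls and s strikes is reached by choosing which of the b + s count-changing pitches were balls. Here is a minimal Python sketch of that idea. The function name paths_to_count is ours, and the sketch deliberately ignores two-strike fouls and non-pitch events, which leave the count unchanged.

```python
from math import comb

def paths_to_count(balls, strikes):
    """Number of distinct ball/strike orderings that lead to a count."""
    # Choose which of the (balls + strikes) pitches were the balls.
    return comb(balls + strikes, balls)

for balls, strikes in [(1, 1), (2, 1), (3, 1), (2, 2), (3, 2)]:
    print(f"{balls},{strikes}: {paths_to_count(balls, strikes)} paths")
```

This gives 2 paths to a 1,1 count, matching the two routes listed above, and 10 paths to a 3,2 count.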

When we count the number of plays in which a batter had a certain count, we have to count all previous counts. For example, if a pitch sequence was “ball, ball, called strike, ball, swinging strike, and ball put in play as a single,” that means the counts would have been 0,0, 1,0, 2,0, 2,1, 3,1, and 3,2.
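To make this concrete, here is a small Python sketch that walks a pitch string and lists every count the batter passed through. It assumes Retrosheet-style pitch letters and uses the same ball and strike codes as the query in this hack; the function name counts_seen and the use of X for a ball put in play are our assumptions for illustration.

```python
BALLS = set("BIPV")          # codes the query in this hack treats as balls
STRIKES = set("CFKLMOQRST")  # codes the query in this hack treats as strikes

def counts_seen(pitch_sequence):
    """Return every (balls, strikes) count reached during a plate appearance."""
    balls, strikes = 0, 0
    seen = [(0, 0)]
    for code in pitch_sequence:
        if code in BALLS and balls < 3:
            balls += 1
        elif code in STRIKES and strikes < 2:
            strikes += 1
        else:
            # Two-strike fouls, pickoff throws, balls in play, ball four,
            # and strike three do not create a new count.
            continue
        seen.append((balls, strikes))
    return seen

# ball, ball, called strike, ball, swinging strike, ball put in play
print(counts_seen("BBCBSX"))
# [(0, 0), (1, 0), (2, 0), (2, 1), (3, 1), (3, 2)]
```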

So, here is how we will count the number of times a batter was in each count. We’ll use a play-by-play database in MySQL, and we’ll use the REGEXP operator to search for pitch strings that start with each pattern we want. We’ll code a flag (1 or 0) for each count.

To make the results a little easier to read, I summed the indicators (yielding a count of the number of times batters were in each situation). Here is the code:

	select sum(CNT_10) AS c10, sum(CNT_20) AS c20, sum(CNT_30) AS c30,
	    sum(CNT_11) AS c11, sum(CNT_12) AS c12, sum(CNT_21) AS c21,
	    sum(CNT_31) AS c31, sum(CNT_22) AS c22, sum(CNT_32) AS c32,
	    sum(CNT_01) AS c01, sum(CNT_02) AS c02, count(*) AS c00, event_type
	FROM (select
	 IF(pitch_sequence REGEXP
	  '^[.>123N]*[BIPV]',1,0) AS CNT_10,
	 IF(pitch_sequence REGEXP 
	  '^[.>123N]*[BIPV][.>123N]*[BIPV]',1,0) AS CNT_20,
	 IF(pitch_sequence REGEXP 
	  '^[.>123N]*[BIPV][.>123N]*[BIPV][.>123N]*[BIPV]',1,0) AS CNT_30,
	 IF(pitch_sequence REGEXP 
	  '^[.>123N]*[CFKLMOQRST]',1,0) AS CNT_01,
	 IF(pitch_sequence REGEXP 
	  '^[.>123N]*[CFKLMOQRST][.>123N]*[CFKLMOQRST]',1,0) AS CNT_02,
	 IF(pitch_sequence REGEXP 
	  '^[.>123N]*[CFKLMOQRST][.>123N]*[BIPV]'
	   or pitch_sequence REGEXP
	     '^[.>123N]*[BIPV][.>123N]*[CFKLMOQRST]',1,0) AS CNT_11,
	 IF(  pitch_sequence REGEXP
	    '^[.>123N]*[CFKLMOQRST][.>123N]*[BIPV][.>123N]*[BIPV]'
	   or pitch_sequence REGEXP
	     '^[.>123N]*[BIPV][.>123N]*[CFKLMOQRST][.>123N]*[BIPV]' 
	   or pitch_sequence REGEXP
	     '^[.>123N]*[BIPV][.>123N]*[BIPV][.>123N]*[CFKLMOQRST]',1,0) AS CNT_21,
	IF(  pitch_sequence REGEXP  -- SBBB
	   '^[.>123N]*[CFKLMOQRST][.>123N]*[BIPV][.>123N]*[BIPV][.>123N]*[BIPV]'
	  or pitch_sequence REGEXP -- BSBB
	    '^[.>123N]*[BIPV][.>123N]*[CFKLMOQRST][.>123N]*[BIPV][.>123N]*[BIPV]'
	  or pitch_sequence REGEXP  -- BBSB
	    '^[.>123N]*[BIPV][.>123N]*[BIPV][.>123N]*[CFKLMOQRST][.>123N]*[BIPV]'
	  or pitch_sequence REGEXP  -- BBBS
	    '^[.>123N]*[BIPV][.>123N]*[BIPV][.>123N]*[BIPV][.>123N]*[CFKLMOQRST]'
	  ,1,0) AS CNT_31,
	IF(  pitch_sequence REGEXP -- SSB
	   '^[.>123N]*[CFKLMOQRST][.>123N]*[CFKLMOQRST][.>123NF]*[BIPV]'
	  or pitch_sequence REGEXP -- BSS
	    '^[.>123N]*[BIPV][.>123N]*[CFKLMOQRST][.>123N]*[CFKLMOQRST]'
	  or pitch_sequence REGEXP -- SBS
	    '^[.>123N]*[CFKLMOQRST][.>123N]*[BIPV][.>123N]*[CFKLMOQRST]'
	  ,1,0) AS CNT_12,
	IF(  pitch_sequence REGEXP -- SSBB
	   '^[.>123N]*[CFKLMOQRST][.>123N]*[CFKLMOQRST][.>123NF]*[BIPV][.>123NF]*[BIPV]'
	  or pitch_sequence REGEXP -- BBSS
	    '^[.>123N]*[BIPV][.>123N]*[BIPV][.>123N]*[CFKLMOQRST][.>123N]*[CFKLMOQRST]'
	  or pitch_sequence REGEXP -- BSBS
	    '^[.>123N]*[BIPV][.>123N]*[CFKLMOQRST][.>123N]*[BIPV][.>123N]*[CFKLMOQRST]'
	  or pitch_sequence REGEXP -- BSSB
	    '^[.>123N]*[BIPV][.>123N]*[CFKLMOQRST][.>123N]*[CFKLMOQRST][.>123NF]*[BIPV]'
	  or pitch_sequence REGEXP -- SBSB
	    '^[.>123N]*[CFKLMOQRST][.>123N]*[BIPV][.>123N]*[CFKLMOQRST][.>123NF]*[BIPV]'
	  or pitch_sequence REGEXP -- SBBS
	    '^[.>123N]*[CFKLMOQRST][.>123N]*[BIPV][.>123N]*[BIPV][.>123N]*[CFKLMOQRST]'
	  ,1,0) AS CNT_22,
	-- 3,2: at least three balls and at least two strikes, in any order.
	-- These character classes omit '>' (and the second also omits 'N'),
	-- so a pitch string with those codes before the count reaches 3,2
	-- is not flagged here.
	IF(  pitch_sequence REGEXP
	   '^[.123NCFKLMOQRST]*[BIPV][.123NCFKLMOQRST]*[BIPV][.123NCFKLMOQRST]*[BIPV]'
	  and pitch_sequence REGEXP
	    '^[.123BIPV]*[CFKLMOQRST][.123BIPV]*[CFKLMOQRST]',1,0) AS CNT_32,
	event_type
	FROM pbp.pbp2k where substring(game_id,4,4)="2004") pc
	group by event_type;
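The alternatives in the query follow a regular recipe, so they can also be generated rather than typed by hand. The sketch below is in Python's re module, not MySQL, and the names count_regex, SKIP, and SKIP_TWO_STRIKES are ours. It builds one alternative per ball/strike ordering and, like the query, lets a foul (F) be skipped once two strikes are on the board. For the 3,2 count it enumerates all ten orderings, which is a different technique from the two-pattern test that the query uses for CNT_32.

```python
import re
from itertools import permutations

SKIP = "[.>123N]*"               # non-pitch events, as in the SQL patterns
SKIP_TWO_STRIKES = "[.>123NF]*"  # with two strikes, a foul changes nothing
BALL = "[BIPV]"
STRIKE = "[CFKLMOQRST]"

def count_regex(balls, strikes):
    """Compile a regex matching pitch strings that pass through a count."""
    alternatives = []
    # Every distinct ordering of the balls and strikes is one alternative.
    for ordering in sorted(set(permutations("B" * balls + "S" * strikes))):
        pattern, strikes_so_far = "", 0
        for pitch in ordering:
            pattern += SKIP_TWO_STRIKES if strikes_so_far == 2 else SKIP
            if pitch == "B":
                pattern += BALL
            else:
                pattern += STRIKE
                strikes_so_far += 1
        alternatives.append(pattern)
    return re.compile("^(?:" + "|".join(alternatives) + ")")

print(bool(count_regex(2, 2).match("CFFBB")))   # True: two fouls, then two balls
print(bool(count_regex(3, 2).match("BBCBX")))   # False: only one strike thrown
```

The patterns use only anchors, character classes, and the * quantifier, so the generated strings should carry over to a MySQL REGEXP test unchanged, but that is worth verifying against your own database.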

Running the Hack

Running this code produces the results shown in Table 6-4.

Table 6-4. Calculating an expected hits matrix

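| c10 | c20 | c30 | c11 | c12 | c21 | c31 | c22 | c32 | c01 | c02 | c00 | Event type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36761 | 10803 | 1864 | 33638 | 18549 | 16602 | 5116 | 14844 | 6811 | 41081 | 12465 | 93196 | 2 |
| 10639 | 2710 | 477 | 15956 | 18432 | 6704 | 1607 | 13034 | 5030 | 21664 | 13559 | 32326 | 3 |
| 1263 | 384 | 75 | 943 | 483 | 475 | 150 | 414 | 3 | 1016 | 301 | 2297 | 4 |
| 128 | 30 | 2 | 64 | 29 | 25 | 7 | 14 | 0 | 123 | 30 | 253 | 5 |
| 374 | 118 | 19 | 354 | 136 | 174 | 65 | 107 | 2 | 352 | 80 | 731 | 6 |
| 195 | 60 | 10 | 156 | 76 | 80 | 21 | 53 | 20 | 196 | 54 | 500 | 8 |
| 745 | 201 | 36 | 535 | 427 | 232 | 47 | 321 | 119 | 612 | 262 | 1361 | 9 |
| 193 | 59 | 13 | 128 | 61 | 65 | 20 | 54 | 21 | 130 | 42 | 323 | 10 |
| 51 | 11 | 2 | 40 | 23 | 14 | 3 | 16 | 7 | 62 | 20 | 159 | 11 |
| 31 | 11 | 0 | 19 | 19 | 11 | 2 | 15 | 6 | 26 | 13 | 59 | 12 |
| 27 | 10 | 2 | 30 | 15 | 17 | 3 | 16 | 9 | 38 | 9 | 66 | 13 |
| 10987 | 7674 | 5247 | 6457 | 2627 | 7158 | 7155 | 4846 | 6796 | 4094 | 934 | 15084 | 14 |
| 1361 | 1324 | 1317 | 76 | 4 | 79 | 79 | 4 | 2 | 42 | 2 | 1403 | 15 |
| 562 | 134 | 16 | 613 | 412 | 225 | 44 | 237 | 71 | 939 | 319 | 1886 | 16 |
| 5 | 1 | 0 | 7 | 9 | 2 | 1 | 7 | 4 | 15 | 7 | 20 | 17 |
| 712 | 202 | 45 | 667 | 385 | 308 | 102 | 295 | 133 | 781 | 249 | 1811 | 18 |
| 202 | 66 | 12 | 159 | 99 | 84 | 35 | 84 | 39 | 196 | 65 | 477 | 19 |
| 11566 | 3398 | 583 | 10894 | 5992 | 5381 | 1660 | 4779 | 2183 | 13195 | 3931 | 29674 | 20 |
| 3763 | 1178 | 195 | 3207 | 1648 | 1733 | 578 | 1430 | 731 | 3801 | 1137 | 9046 | 21 |
| 403 | 131 | 19 | 306 | 176 | 183 | 61 | 155 | 76 | 359 | 124 | 909 | 22 |
| 2475 | 840 | 182 | 2006 | 965 | 1165 | 436 | 916 | 505 | 2094 | 564 | 5554 | 23 |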

It’s a little hard to understand the data when it’s presented in this way, so I imported the data into Excel so that I could easily reformat it. The event_type field is generated by the Retrosheet BEVENT program, or by Chadwick. This is what these values mean:

Table 6-5. 

| Code | Meaning | Classification |
|---|---|---|
| 0 | Unknown event | Other |
| 1 | No event | Other |
| 2 | Generic out | Out |
| 3 | Strikeout | Out |
| 4 | Stolen base | Other |
| 5 | Defensive indifference | Other |
| 6 | Caught stealing | Other |
| 7 | Pickoff error | Other |
| 8 | Pickoff | Other |
| 9 | Wild pitch | Other |
| 10 | Passed ball | Other |
| 11 | Balk | Other |
| 12 | Other advance | Other |
| 13 | Foul error | Other |
| 14 | Walk | Walk |
| 15 | Intentional walk | Walk |
| 16 | Hit by pitch | Walk |
| 17 | Interference | Other |
| 18 | Error | Other |
| 19 | Fielder’s choice | Out |
| 20 | Single | Hit |
| 21 | Double | Hit |
| 22 | Triple | Hit |
| 23 | Home run | Hit |

In Excel, I assigned one of the classification values in the preceding table to each event type and used a pivot table to summarize the results. As you can see, I skipped interference calls and errors by classifying them as Other. Errors are subjective, and they are often caused by someone other than the pitcher or batter, so I think it’s best to ignore them. Here is what I found about getting on base, based on the 2004 data:

Table 6-6. 

| On base | No strikes | One strike | Two strikes |
|---|---|---|---|
| No balls | 33.5% | 28.0% | 21.2% |
| One ball | 39.5% | 32.1% | 24.2% |
| Two balls | 51.9% | 40.5% | 30.7% |
| Three balls | 76.3% | 59.7% | 46.6% |

And here is what I found about getting a hit:

Table 6-7. 

| Hits | No strikes | One strike | Two strikes |
|---|---|---|---|
| No balls | 23.8% | 22.2% | 17.4% |
| One ball | 23.1% | 22.4% | 18.0% |
| Two balls | 19.6% | 21.5% | 18.1% |
| Three balls | 9.9% | 16.3% | 15.7% |

In general, as the number of strikes increases, the odds of getting on base decrease. The odds of getting a hit usually decrease as well, except if there are three balls. (In particular, look at what happens on 3,0 counts: the odds of getting on base are enormous (76.3%), but nearly always from walking (only 9.9% hits).) Additionally, as the number of balls increases, the chances of getting on base increase, but the chances of getting a hit generally decrease. This is pretty intuitive: if a pitcher is having trouble throwing strikes, he’s likely to walk batters. Additionally, if the pitcher is throwing balls, his pitches are likely to be unhittable.
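The pivot-table step can be checked by hand for a couple of counts. The Python sketch below stands in for the Excel workflow, not for the author's actual spreadsheet: it copies the c00 and c30 values for the batting events from Table 6-4, applies the Out/Walk/Hit classification, and divides by the total of hits, walks, and outs. That denominator is our inference, chosen because it reproduces the published percentages; the names COUNTS, CLASSIFICATION, and rates are ours.

```python
# event_type -> (c00, c30), copied from Table 6-4
COUNTS = {
    2: (93196, 1864), 3: (32326, 477), 19: (477, 12),      # outs
    14: (15084, 5247), 15: (1403, 1317), 16: (1886, 16),   # walks
    20: (29674, 583), 21: (9046, 195),                     # hits
    22: (909, 19), 23: (5554, 182),
}
CLASSIFICATION = {
    2: "Out", 3: "Out", 19: "Out",
    14: "Walk", 15: "Walk", 16: "Walk",
    20: "Hit", 21: "Hit", 22: "Hit", 23: "Hit",
}

def rates(column):
    """Return (on-base %, hit %) for one column of COUNTS."""
    totals = {"Out": 0, "Walk": 0, "Hit": 0}
    for event_type, row in COUNTS.items():
        totals[CLASSIFICATION[event_type]] += row[column]
    total = sum(totals.values())
    on_base = 100 * (totals["Hit"] + totals["Walk"]) / total
    hit = 100 * totals["Hit"] / total
    return round(on_base, 1), round(hit, 1)

print(rates(0))  # 0,0 count: (33.5, 23.8)
print(rates(1))  # 3,0 count: (76.3, 9.9)
```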

Hacking the Hack

Without even running another query, we can use this data to answer a handful of other questions about what happens on different pitch counts.

Strikeouts and the count.

Let’s start with a simple question: what is the percentage of plate appearances ending in a strikeout? We can calculate this by dividing the number of strikeouts by the total number of hits, walks, and outs:

Table 6-8. 

| Strikeouts | No strikes | One strike | Two strikes |
|---|---|---|---|
| No balls | 17.1% | 24.8% | 41.0% |
| One ball | 13.5% | 21.8% | 37.7% |
| Two balls | 9.6% | 17.1% | 32.3% |
| Three balls | 4.8% | 9.6% | 22.6% |

Not surprisingly, a pitcher is most likely to get a strikeout on an 0,2 count and is least likely on a 3,0 count. The odds of a strikeout go way up with each strike, no matter what.
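As a quick check of the two extremes, this Python sketch divides the strikeout row by the total of hits, walks, and outs, using the c02 and c30 values from Table 6-4. The names ROWS and strikeout_pct are ours.

```python
# event_type -> (c02, c30), copied from Table 6-4; event type 3 is a strikeout
ROWS = {
    2: (12465, 1864), 3: (13559, 477), 19: (65, 12),    # outs
    14: (934, 5247), 15: (2, 1317), 16: (319, 16),      # walks
    20: (3931, 583), 21: (1137, 195),                   # hits
    22: (124, 19), 23: (564, 182),
}
STRIKEOUT = 3

def strikeout_pct(column):
    """Strikeouts as a percentage of all hits, walks, and outs."""
    total = sum(row[column] for row in ROWS.values())
    return round(100 * ROWS[STRIKEOUT][column] / total, 1)

print(strikeout_pct(0))  # 0,2 count: 41.0
print(strikeout_pct(1))  # 3,0 count: 4.8
```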

Extra base hits and the count.

Announcers often say things during games such as “It’s a 3,0 count, the batter knows a fastball is coming, he’s going to hit this ball hard!” If pitchers were more likely to throw hittable balls on certain counts, and batters could control where the ball went, we should be able to see it in the data. In particular, we should see a different percentage of extra base hits on different counts. Using exactly the same data as before, let’s look at what actually happened in 2004. Here is the percentage of hits that went for extra bases:

Table 6-9. 

| Extra base hits | No strikes | One strike | Two strikes |
|---|---|---|---|
| No balls | 34.3% | 32.2% | 31.7% |
| One ball | 36.5% | 33.6% | 31.8% |
| Two balls | 38.7% | 36.4% | 34.4% |
| Three balls | 40.4% | 39.3% | 37.5% |

And here is the percentage of hits that were home runs:

Table 6-10. 

| Home runs | No strikes | One strike | Two strikes |
|---|---|---|---|
| No balls | 12.3% | 10.8% | 9.8% |
| One ball | 13.6% | 12.2% | 11.0% |
| Two balls | 15.1% | 13.8% | 12.6% |
| Three balls | 18.6% | 15.9% | 14.4% |

There appears to be a subtle effect here. As the number of balls increases, it’s slightly more likely that a player is going to be able to hit the ball hard. Additionally, a player seems slightly more likely to hit the ball hard on no strikes than on one or two strikes. This probably means that pitchers are more likely to throw fastballs than off-speed pitches in these situations.
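Both tables use hits, not plate appearances, as the denominator. A short Python sketch with the c00 and c30 hit rows from Table 6-4 shows the arithmetic; the names HITS and hit_shares are ours.

```python
# hit type -> (c00, c30), copied from event types 20-23 in Table 6-4
HITS = {
    "single": (29674, 583),
    "double": (9046, 195),
    "triple": (909, 19),
    "home run": (5554, 182),
}

def hit_shares(column):
    """Return (extra base hit %, home run %) as shares of all hits."""
    total = sum(row[column] for row in HITS.values())
    extra_base = total - HITS["single"][column]
    home_runs = HITS["home run"][column]
    return round(100 * extra_base / total, 1), round(100 * home_runs / total, 1)

print(hit_shares(0))  # 0,0 count: (34.3, 12.3)
print(hit_shares(1))  # 3,0 count: (40.4, 18.6)
```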

Balls in play and the count.

Another interesting question: what happens on balls in play, based on the count? Is a ball in play less likely to end up in an out on certain counts? I looked at the number of hits on balls in play (which were defined as generic outs, fielder’s choices, singles, doubles, triples, and home runs). Here’s what I found:

Table 6-11. 

| Hits/balls in play | No strikes | One strike | Two strikes |
|---|---|---|---|
| No balls | 32.5% | 32.0% | 31.5% |
| One ball | 33.0% | 32.7% | 32.0% |
| Two balls | 33.8% | 33.6% | 32.8% |
| Three balls | 34.3% | 34.7% | 33.8% |

It looks as though there is a slight increase in the number of hits when there are more balls. On zero, one, or two balls, it appears that the play is more likely to result in a hit on fewer strikes (but not on three balls). Either way, this is a very subtle effect.
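The same arithmetic works for balls in play, using the definition given above (generic outs, fielder’s choices, and all four hit types). This Python sketch uses the c00 and c30 values from Table 6-4; the names BALLS_IN_PLAY, HIT_TYPES, and hits_per_ball_in_play are ours.

```python
# event_type -> (c00, c30), copied from Table 6-4
BALLS_IN_PLAY = {
    2: (93196, 1864),   # generic out
    19: (477, 12),      # fielder's choice
    20: (29674, 583),   # single
    21: (9046, 195),    # double
    22: (909, 19),      # triple
    23: (5554, 182),    # home run
}
HIT_TYPES = {20, 21, 22, 23}

def hits_per_ball_in_play(column):
    """Hits as a percentage of balls in play for one column."""
    total = sum(row[column] for row in BALLS_IN_PLAY.values())
    hits = sum(BALLS_IN_PLAY[event][column] for event in HIT_TYPES)
    return round(100 * hits / total, 1)

print(hits_per_ball_in_play(0))  # 0,0 count: 32.5
print(hits_per_ball_in_play(1))  # 3,0 count: 34.3
```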
