Measure Pitching with DIPS

Measure a pitcher’s performance independent of the fielders’ performance using DIPS.

In December 1999, baseball fan Voros McCracken came up with a new method of measuring pitching. McCracken had started to wonder whether a pitcher could really do anything about balls in play: were outs on balls in play a function of the pitcher’s skill, the defense’s skill, or dumb luck? He set out to answer that question and discovered (much to his surprise) that it wasn’t pitcher skill. He concluded that what happens after a ball is put in play depends on the defense. Only on walks, strikeouts, and home runs is the defense not involved.

To help measure a pitcher’s performance independent of the fielders’ performance, McCracken came up with a system he called defense independent pitching stats (DIPS). If you want to know more about the story behind this formula, Michael Lewis’s book Moneyball has a nice write-up on Voros McCracken. If you want to learn how the formula works, read on.

The Formula

The formula for DIPS is, unfortunately, a little complicated, so I’ll launch straight into the code to explain how to calculate it. (I used Voros McCracken’s own explanation for DIPS 2.0 numbers to make the following calculations.)

The process proceeds in stages. First, a set of defense-independent measurements is calculated: DIPS intentional walks allowed (dIBB), DIPS hit batsmen (dHP), DIPS total walks (dBB), DIPS strikeouts (dSO), DIPS home runs (dHR), DIPS hits (dH), and DIPS innings pitched (dIP). Along the way, a few rate variables are calculated from a mix of actual values and DIPS values: a strikeout rate (sSO), a home run rate (sHR), and an expected rate of hits per ball in play (sH). These feed the later calculations. Finally, DIPS earned runs (dER) and DIPS ERA (dERA) are calculated.

Tip

The normal abbreviation for defensive efficiency is DER. Don’t confuse that with DIPS earned runs, which is abbreviated here as dER.

Sample Code

Here is some sample code for calculating DIPS in R:

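	# Inputs are vectors of pitcher-season totals: BFP (batters faced), IBB,
	# HBP, BB, SO, HR, and throws ("L" or "R").
	# 4: DIPS intentional walks
	dIBB <- BFP * .0074
	# 5: hit-batsman rate, then DIPS hit batsmen
	sHP <- HBP / (BFP - IBB)
	dHP <- sHP * (BFP - dIBB)
	# 6: unintentional walk rate, then DIPS total walks
	sBB <- (BB - IBB) / (BFP - IBB - HBP)
	dBB <- sBB * (BFP - dIBB - dHP) + dIBB
	# 7: strikeout rate, then DIPS strikeouts
	sSO <- SO / (BFP - HBP - BB)
	dSO <- sSO * (BFP - dBB - dHP)
	# 8: home run rate, then DIPS home runs
	sHR <- HR / (BFP - HBP - BB - SO)
	dHR <- sHR * (BFP - dBB - dHP - dSO)
	# 9: expected hit rate on balls in play (slightly higher for left-handers)
	sH <- .304396 + .002321 * (throws == "L") - .04782 * sSO - .08095 * sHR
	# 10: DIPS hits
	dH <- sH * (BFP - dHR - dBB - dSO - dHP) + dHR
	# 11: DIPS innings pitched
	dIP <- (1.048 * (BFP - dBB - dHP - dSO - dH) + dSO) / 3
	# 12: DIPS earned runs
	dER <- (dH - dHR) * .49674 + dHR * 1.294375 + (dBB - dIBB) * .3325 +
	    dIBB * .0864336 + dSO * (-.084691) + dHP * .3077 +
	    (BFP - dHP - dBB - dSO - dH) * (-.082927)
	# 13: DIPS ERA
	dERA <- dER * 9 / dIP
	# Summary statistics: calculate summary stats for 2003
	summary(subset(dERA, yearID == 2003 & BFP > 249))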
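
To experiment with the formula on a single stat line, the same steps can be wrapped in a function. The following is only a sketch: the function name dips_era, its argument names, and the sample season totals are illustrative assumptions, not part of McCracken’s write-up, and the stat line is made up rather than taken from any real pitcher.

```r
# Sketch of the DIPS steps from the listing above as one function.
# The function name and arguments are illustrative assumptions.
dips_era <- function(BFP, IBB, HBP, BB, SO, HR, throws) {
  dIBB <- BFP * .0074                          # DIPS intentional walks
  sHP  <- HBP / (BFP - IBB)                    # hit-batsman rate
  dHP  <- sHP * (BFP - dIBB)                   # DIPS hit batsmen
  sBB  <- (BB - IBB) / (BFP - IBB - HBP)       # unintentional walk rate
  dBB  <- sBB * (BFP - dIBB - dHP) + dIBB      # DIPS total walks
  sSO  <- SO / (BFP - HBP - BB)                # strikeout rate
  dSO  <- sSO * (BFP - dBB - dHP)              # DIPS strikeouts
  sHR  <- HR / (BFP - HBP - BB - SO)           # home run rate
  dHR  <- sHR * (BFP - dBB - dHP - dSO)        # DIPS home runs
  sH   <- .304396 + .002321 * (throws == "L") - .04782 * sSO - .08095 * sHR
  dH   <- sH * (BFP - dHR - dBB - dSO - dHP) + dHR            # DIPS hits
  dIP  <- (1.048 * (BFP - dBB - dHP - dSO - dH) + dSO) / 3    # DIPS innings
  dER  <- (dH - dHR) * .49674 + dHR * 1.294375 +
    (dBB - dIBB) * .3325 + dIBB * .0864336 + dSO * (-.084691) +
    dHP * .3077 + (BFP - dHP - dBB - dSO - dH) * (-.082927)   # DIPS earned runs
  dER * 9 / dIP                                               # DIPS ERA
}

# Made-up season totals for a hypothetical right-hander (not real data)
example_dera <- dips_era(BFP = 800, IBB = 5, HBP = 6, BB = 50,
                         SO = 180, HR = 20, throws = "R")
print(round(example_dera, 3))
```

Because every step is plain vector arithmetic, the same function should also accept whole columns of a pitching table at once, returning one DIPS ERA per row.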

To compute this in SQL, you can use the following code. This code creates a temporary table called dipsERAs from which you can query DIPS ERA results. Notice the ten nested subqueries in this statement (p1 through p9, plus p_w_d); I calculated each formula in sequence, just as I did in the R code.

	CREATE TEMPORARY TABLE dipsERAs AS
	SELECT nameFirst, nameLast, idxTeams,
	       lgID, ERA, dERA, IPOuts
	FROM ( -- p_w_d: DIPS ERA
	  SELECT *, dER * 9 / dIP AS dERA
	  FROM ( -- p9: DIPS earned runs
	    SELECT *,
	      (dH - dHR) * .49674 + dHR * 1.294375
	      + (dBB - dIBB) * .3325 + dIBB * .0864336
	      + dSO * (-.084691) + dHP * .3077
	      + (BFP - dHP - dBB - dSO - dH) * (-.082927) AS dER
	    FROM ( -- p8: DIPS innings pitched
	      SELECT *,
	        (1.048 * (BFP - dBB - dHP - dSO - dH) + dSO) / 3 AS dIP
	      FROM ( -- p7: DIPS hits
	        SELECT *,
	          sH * (BFP - dHR - dBB - dSO - dHP) + dHR AS dH
	        FROM ( -- p6: expected hit rate on balls in play
	          SELECT *,
	            .304396
	            + .002321 * (CASE WHEN throws = 'L' THEN 1 ELSE 0 END)
	            - .04782 * sSO - .08095 * sHR AS sH
	          FROM ( -- p5: DIPS home runs
	            SELECT *,
	              sHR * (BFP - dBB - dHP - dSO) AS dHR
	            FROM ( -- p4: DIPS strikeouts, home run rate
	              SELECT *,
	                sSO * (BFP - dBB - dHP) AS dSO,
	                HR / (BFP - HBP - BB - SO) AS sHR
	              FROM ( -- p3: DIPS total walks, strikeout rate
	                SELECT *,
	                  sBB * (BFP - dIBB - dHP) + dIBB AS dBB,
	                  SO / (BFP - HBP - BB) AS sSO
	                FROM ( -- p2: DIPS hit batsmen, walk rate
	                  SELECT *,
	                    sHP * (BFP - dIBB) AS dHP,
	                    (BB - IBB) / (BFP - IBB - HBP) AS sBB
	                  FROM ( -- p1: DIPS intentional walks, hit-batsman rate
	                    SELECT p.*, m.nameFirst, m.nameLast, m.throws,
	                      BFP * .0074 AS dIBB,
	                      HBP / (BFP - IBB) AS sHP
	                    FROM pitching p INNER JOIN master m
	                      ON p.idxLahman = m.idxLahman
	                  ) p1
	                ) p2
	              ) p3
	            ) p4
	          ) p5
	        ) p6
	      ) p7
	    ) p8
	  ) p9
	) p_w_d
	;

To better understand DIPS ERA, let’s look at its distribution, as shown in Figure 5-12. Notice that the range and shape of the distribution are similar to those of ERA.

Figure 5-12. DIPS ERA distribution and box plot
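
A figure along these lines could be drawn with base R graphics. This is a minimal sketch and not necessarily how Figure 5-12 was produced: the function name plot_dera_distribution, the number of histogram breaks, and the stacked layout are my assumptions, and the sample values are placeholders rather than real DIPS ERAs.

```r
# Draw a histogram with a horizontal box plot underneath it.
# Hypothetical helper; layout and break count are assumptions.
plot_dera_distribution <- function(dERA) {
  dERA <- dERA[!is.na(dERA)]          # drop seasons with no DIPS ERA
  op <- par(mfrow = c(2, 1))          # two stacked panels
  on.exit(par(op))                    # restore graphics settings afterwards
  hist(dERA, breaks = 30, main = "DIPS ERA distribution", xlab = "DIPS ERA")
  boxplot(dERA, horizontal = TRUE, xlab = "DIPS ERA")
  invisible(summary(dERA))            # same numbers that summary() prints
}

# Placeholder values for illustration only (not real data)
placeholder_dera <- c(1.0, 3.8, 4.3, 4.4, 4.9, 6.7, NA)
stats <- plot_dera_distribution(placeholder_dera)
print(stats)
```

With a real vector of DIPS ERAs, the returned summary gives the same quartiles that the summary() calls in this hack report.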

Last year (2003).

What can we learn from looking at DIPS numbers? Well, the top pitchers by ERA come out as the top pitchers by dERA. However, a closer look reveals that a couple of pitchers (LaTroy Hawkins and Guillermo Mota) might have been overperformers: each posted an ERA under 2.00 but a DIPS ERA of 2.609. In 2004, this proved to be true. Mota’s ERA went up to 2.14 in the beginning of 2004, when he played with the Dodgers, and then climbed to 4.81 after he was traded to the Marlins, leaving the Dodgers’ super defense behind.

In 2004, LaTroy Hawkins played for the Cubs (as their closer), ending the year with a 2.63 ERA. This was higher than his 1.86 ERA with the Twins in 2003, probably because of the step down in defensive ability.

	  Min.  1st Qu.   Median    Mean  3rd Qu.    Max.
	 1.048    3.821    4.339   4.360    4.946   6.652

Table 5-2 shows the players with the lowest DIPS ERAs in 2003.

Table 5-2. Five lowest DIPS ERAs in 2003

	First name  Last name  Year  Team  League  ERA    DIPS ERA  Outs pitched
	Eric        Gagne      2003  LAN   NL      1.202  1.006     247
	John        Smoltz     2003  ATL   NL      1.119  1.691     193
	Pedro       Martinez   2003  BOS   AL      2.218  2.236     560
	LaTroy      Hawkins    2003  MIN   AL      1.862  2.609     232
	Guillermo   Mota       2003  LAN   NL      1.971  2.609     315
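
A ranking like Table 5-2 can be produced in R by sorting on DIPS ERA. The sketch below uses the five rows of Table 5-2, entered out of order; the data frame name and the choice to break ties by actual ERA are my assumptions.

```r
# Rows copied from Table 5-2, deliberately shuffled
pitchers <- data.frame(
  nameLast = c("Martinez", "Mota", "Gagne", "Hawkins", "Smoltz"),
  ERA      = c(2.218, 1.971, 1.202, 1.862, 1.119),
  dERA     = c(2.236, 2.609, 1.006, 2.609, 1.691),
  IPOuts   = c(560, 315, 247, 232, 193),
  stringsAsFactors = FALSE
)

# Sort ascending by DIPS ERA; ties are broken by actual ERA (an assumption)
ranked <- pitchers[order(pitchers$dERA, pitchers$ERA), ]
print(head(ranked, 5))
```

On a full season of data, head(ranked, 5) would keep only the five lowest DIPS ERAs, much as a LIMIT 5 clause does in SQL.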

Last 10 years (1994–2003).

	 Min.    1st Qu.    Median    Mean  3rd Qu.    Max.
	1.048      3.895     4.419   4.409    4.943   7.676

Table 5-3 shows the five best pitcher seasons (ranked by DIPS ERA) between 1994 and 2003.

Table 5-3. Five best DIPS ERAs, 1994–2003

	First name  Last name  Year  Team  League  ERA    DIPS ERA  Outs pitched
	Eric        Gagne      2003  LAN   NL      1.202  1.006     247
	Pedro       Martinez   1999  BOS   AL      2.067  1.519     640
	John        Smoltz     2003  ATL   NL      1.119  1.691     193
	Mariano     Rivera     1996  NYA   AL      2.09   1.768     323
	Eric        Gagne      2002  LAN   NL      1.968  1.829     247

Last 50 years (1955–2003).

DIPS ERA gives us some insight into Dennis Eckersley’s 1990 season. Eckersley always ranks at the top of pitching lists for this performance: a 0.61 ERA for the season. DIPS suggests that Eckersley got very lucky. He still had a super season, but an expected ERA of 1.68 based on his strikeouts, home runs, and walks is much less remarkable.
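
To see where a number like 0.61 comes from, ERA can be recomputed from earned runs and outs pitched, using the same ER / IPOuts * 27 expression that appears in the SQL later in this hack. In this sketch the helper name era_from_outs is hypothetical, and the figure of 5 earned runs is my assumption; it is consistent with the 0.614 ERA and 220 outs listed for Eckersley in Table 5-4.

```r
# ERA is earned runs per nine innings, and nine innings is 27 outs
era_from_outs <- function(ER, IPOuts) ER / IPOuts * 27

# Assumed inputs: 5 earned runs over 220 outs (73 1/3 innings)
eck_era <- era_from_outs(ER = 5, IPOuts = 220)
print(round(eck_era, 3))

# Gap between the DIPS ERA quoted in the text (1.68) and the actual ERA
eck_gap <- round(1.68 - eck_era, 3)
print(eck_gap)
```

The gap of roughly one run per nine innings is the part of Eckersley’s season that DIPS attributes to defense and luck rather than to the pitcher.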

	 Min.   1st Qu.   Median     Mean   3rd Qu.    Max.    NA's
	1.048     3.816    4.276    4.291    4.747   7.676  141.000

Table 5-4 shows the five best pitcher seasons over the past 50 years, by DIPS ERA.

Table 5-4. Five best DIPS ERAs, 1955–2003

	First name  Last name  Year  Team  League  ERA    DIPS ERA  Outs pitched
	Eric        Gagne      2003  LAN   NL      1.202  1.006     247
	Pedro       Martinez   1999  BOS   AL      2.067  1.519     640
	Dave        Smith      1987  HOU   NL      1.65   1.643     180
	Dennis      Eckersley  1990  OAK   AL      0.614  1.68      220
	John        Smoltz     2003  ATL   NL      1.119  1.691     193

Lucky and Unlucky Players

As I noted earlier, when a ball is put into play, it’s the defense’s responsibility to convert it into an out. If a pitcher is lucky, the defense will convert a lot of balls in play into outs, and a pitcher will have very few earned runs. If a pitcher is unlucky, the defense will miss a lot of balls, and a pitcher will have many earned runs. DIPS is designed to estimate a pitcher’s ERA independent of defense. So, we can use DIPS to determine which pitchers were lucky and which were unlucky.

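We’ll just calculate the difference between DIPS ERA and actual ERA (dERA minus ERA) to measure how lucky or unlucky each pitcher has been. A positive difference means the pitcher’s actual ERA was better than his DIPS ERA predicts; a negative difference means it was worse. Interestingly, the five luckiest and five unluckiest players had similar DIPS ERAs. The lucky players had great ERAs (all under 3) and the unlucky players had poor ERAs (all over 7).

It’s almost identical to the earlier code, except I added a luck variable to measure the difference between DIPS ERA and ERA, and I limited the results to seasons since 1955 with more than 161 outs pitched. As written, the query returns the five luckiest seasons; sorting in ascending order instead (ORDER BY luck ASC) returns the five unluckiest. Here’s the SQL code I used.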

	SELECT nameFirst, nameLast, yearID, teamID, lgID,
	    ROUND(ER / IPOuts * 27, 3) AS ERA, dERA,
	    ROUND(dERA - ERA, 3) AS luck, IPOuts
	FROM (SELECT *, ROUND(dER * 9 / dIP, 3) AS dERA
	FROM (SELECT *,
	    (dH-dHR)*.49674 + dHR*1.294375 + (dBB-dIBB)*.3325 + dIBB*.0864336
	    + dSO*(-.084691) + dHP*.3077 + (BFP-dHP-dBB-dSO-dH)*(-.082927) AS dER
	FROM (SELECT *,
	    (1.048 * (BFP - dBB - dHP - dSO - dH) + dSO) / 3 AS dIP
	FROM (SELECT *,
	    sH * (BFP - dHR - dBB - dSO - dHP) + dHR AS dH
	FROM (SELECT *,
	    .304396 + .002321 * (CASE WHEN throws='L' THEN 1 ELSE 0 END)
	    - .04782 * sSO - .08095 * sHR AS sH
	FROM (SELECT *,
	    sHR * (BFP - dBB - dHP - dSO) AS dHR
	FROM (SELECT *,
	    sSO * (BFP - dBB - dHP) AS dSO,
	    HR / (BFP - HBP - BB - SO) AS sHR
	FROM (SELECT *,
	    sBB * (BFP - dIBB - dHP) + dIBB AS dBB,
	    SO / (BFP - HBP - BB) AS sSO
	FROM (SELECT *,
	    sHP * (BFP - dIBB) AS dHP,
	    (BB - IBB) / (BFP - IBB - HBP) AS sBB
	FROM (SELECT p.*, m.nameFirst, m.nameLast, m.throws,
	    BFP * .0074 AS dIBB,
	    HBP / (BFP - IBB) AS sHP
	FROM pitching p INNER JOIN master m
	ON p.playerID = m.playerID
	WHERE yearID > 1954 AND IPOuts > 161
	) p1) p2) p3) p4) p5) p6) p7) p8) p9) p_w_d
	ORDER BY luck DESC
	LIMIT 5
	;
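
The same luck measure is easy to compute in R once ERA and DIPS ERA sit in a data frame. This is a sketch only: the data frame name is my own, and the four rows are copied from Tables 5-5 and 5-6 purely as sample input.

```r
# Sample rows taken from Tables 5-5 and 5-6
seasons <- data.frame(
  nameLast = c("Fox", "Knowles", "Halladay", "Ashby"),
  yearID   = c(1961, 1972, 2000, 1993),
  ERA      = c(1.413, 1.371, 10.64, 8.5),
  dERA     = c(4.334, 4.227, 5.972, 4.476),
  stringsAsFactors = FALSE
)

# luck = DIPS ERA minus actual ERA; positive means the pitcher was lucky
seasons$luck <- round(seasons$dERA - seasons$ERA, 3)

luckiest   <- seasons[order(-seasons$luck), ]  # like ORDER BY luck DESC
unluckiest <- seasons[order(seasons$luck), ]   # like ORDER BY luck ASC
print(head(luckiest, 2))
print(head(unluckiest, 2))
```

Values recomputed this way from the rounded table entries can differ in the last decimal place from the Diff column in the tables.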

Table 5-5 shows the luckiest pitcher seasons; Table 5-6 shows the unluckiest pitcher seasons. Interestingly, Roy Halladay tops the unlucky list with his 2000 season. He’s now Toronto’s ace pitcher, and he won the Cy Young Award in 2003.

Table 5-5. Lucky pitchers

	First name  Last name  Year  Team  League  ERA    DIPS ERA  Diff   Outs pitched
	Terry       Fox        1961  DET   AL      1.413  4.334     2.924  172
	Darold      Knowles    1972  OAK   AL      1.371  4.227     2.857  197
	Dave        Tobik      1981  DET   AL      2.685  5.44      2.75   181
	Jerry       Bell       1972  ML4   AL      1.656  4.386     2.726  212
	Terry       Forster    1985  ATL   NL      2.275  4.971     2.691  178

Table 5-6. Unlucky pitchers

	First name  Last name  Year  Team  League  ERA    DIPS ERA  Diff   Outs pitched
	Roy         Halladay   2000  TOR   AL      10.64  5.972     -4.66  203
	Andy        Ashby      1993  COL   NL      8.5    4.476     -4.02  162
	Andy        Larkin     1998  FLO   NL      9.643  5.887     -3.75  224
	Jesse       Jefferson  1976  CHA   AL      8.519  4.987     -3.53  187
	Bobby       Ayala      1998  SEA   AL      7.228  3.844     -3.44  226
