Measure a pitcher’s performance independent of the fielders’ performance using DIPS.
In December 1999, baseball fan Voros McCracken came up with a new method of measuring pitching. McCracken started to wonder whether a pitcher could really do anything about balls in play: were outs from balls in play a function of a pitcher’s skill, the defense’s skill, or dumb luck? He set out to answer this question and discovered (much to his surprise) that it wasn’t pitcher skill. He concluded that what happens after a ball is put in play depends on the defense. Only on walks, strikeouts, and home runs is the defense not involved.
To help measure a pitcher’s performance independent of the fielders’ performance, McCracken came up with a system he called defense-independent pitching stats (DIPS). If you want to know more about the story behind this formula, Michael Lewis’s book Moneyball has a nice write-up on Voros McCracken. If you want to learn how the formula works, read on.
The formula for DIPS is, unfortunately, a little complicated, so I’ll use sample code to explain how to calculate it. (I used Voros McCracken’s own explanation for DIPS 2.0 numbers to make the following calculations.)
The calculation proceeds in stages. First, a set of defense-independent measurements is calculated: DIPS intentional walks allowed (dIBB), DIPS hit batsmen (dHP), DIPS total walks (dBB), DIPS strikeouts (dSO), DIPS home runs (dHR), DIPS hits (dH), and DIPS innings pitched (dIP). Along the way, a few rates are calculated from a mix of actual values and DIPS values for use in later steps: two helper rates for hit batsmen and walks (sHP and sBB), strikeouts per batter faced who neither walked nor was hit by a pitch (sSO), home runs per ball put in play (sHR), and hits per ball in play (sH). Finally, DIPS earned runs (dER) and DIPS ERA (dERA) are calculated.
Here is some sample code for calculating DIPS in R:
    # Assumes the pitching columns (BFP, HBP, BB, IBB, SO, HR, yearID) and
    # throws are available as vectors, e.g., after attach().

    # 4: DIPS intentional walks
    dIBB <- BFP * .0074
    # 5: DIPS hit batsmen
    sHP <- HBP / (BFP - IBB)
    dHP <- sHP * (BFP - dIBB)
    # 6: DIPS total walks
    sBB <- (BB - IBB) / (BFP - IBB - HBP)
    dBB <- sBB * (BFP - dIBB - dHP) + dIBB
    # 7: DIPS strikeouts
    sSO <- SO / (BFP - HBP - BB)
    dSO <- sSO * (BFP - dBB - dHP)
    # 8: DIPS home runs
    sHR <- HR / (BFP - HBP - BB - SO)
    dHR <- sHR * (BFP - dBB - dHP - dSO)
    # 9: hits per ball in play, adjusted for handedness, strikeouts, and home runs
    sH <- .304396 + .002321 * (throws == "L") - .04782 * sSO - .08095 * sHR
    # 10: DIPS hits
    dH <- sH * (BFP - dHR - dBB - dSO - dHP) + dHR
    # 11: DIPS innings pitched
    dIP <- (1.048 * (BFP - dBB - dHP - dSO - dH) + dSO) / 3
    # 12: DIPS earned runs
    dER <- (dH-dHR)*.49674 + dHR*1.294375 + (dBB-dIBB)*.3325 +
           dIBB*.0864336 + dSO*(-.084691) + dHP*.3077 +
           (BFP-dHP-dBB-dSO-dH)*(-.082927)
    # 13: DIPS ERA
    dERA <- dER * 9 / dIP

Summary Statistics

    # calculate summary stats for 2003
    summary(subset(dERA, yearID == 2003 & BFP > 249))
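To try these formulas on a single season without loading the full database, the steps can be wrapped in a small function. The following is a minimal sketch, not part of McCracken's explanation: the function name `dips_era` and the stat line passed to it are hypothetical, invented for illustration, while the coefficients are copied from the R code above.

```r
# Hypothetical helper: applies the DIPS 2.0 steps to one or more pitcher
# seasons. Arguments use the same column names as the code above.
dips_era <- function(BFP, HBP, BB, IBB, SO, HR, throws = "R") {
  dIBB <- BFP * 0.0074                      # DIPS intentional walks
  sHP  <- HBP / (BFP - IBB)
  dHP  <- sHP * (BFP - dIBB)                # DIPS hit batsmen
  sBB  <- (BB - IBB) / (BFP - IBB - HBP)
  dBB  <- sBB * (BFP - dIBB - dHP) + dIBB   # DIPS total walks
  sSO  <- SO / (BFP - HBP - BB)
  dSO  <- sSO * (BFP - dBB - dHP)           # DIPS strikeouts
  sHR  <- HR / (BFP - HBP - BB - SO)
  dHR  <- sHR * (BFP - dBB - dHP - dSO)     # DIPS home runs
  # Hits per ball in play; left-handers get a small upward adjustment
  sH   <- 0.304396 + 0.002321 * (throws == "L") -
          0.04782 * sSO - 0.08095 * sHR
  dH   <- sH * (BFP - dHR - dBB - dSO - dHP) + dHR
  dIP  <- (1.048 * (BFP - dBB - dHP - dSO - dH) + dSO) / 3
  dER  <- (dH - dHR) * 0.49674 + dHR * 1.294375 +
          (dBB - dIBB) * 0.3325 + dIBB * 0.0864336 +
          dSO * (-0.084691) + dHP * 0.3077 +
          (BFP - dHP - dBB - dSO - dH) * (-0.082927)
  dER * 9 / dIP                             # DIPS ERA
}

# Made-up stat line (not a real pitcher season)
dips_era(BFP = 800, HBP = 5, BB = 50, IBB = 4, SO = 180, HR = 20, throws = "R")
```

Because every step is ordinary vector arithmetic, the same function should also accept whole columns of a pitching table at once.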
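To compute DIPS ERA in SQL, you can use the following code. This code creates a temporary table called dipsERAs from which you can query DIPS ERA results. Notice the chain of nested subqueries in this statement (p1 through p9, plus the outer p_w_d); I calculated each formula in sequence, just as I did in the R code. The nesting is needed because a SELECT list cannot refer to a column alias defined in that same list, so each value has to be computed one level below the query that uses it.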
    CREATE TEMPORARY TABLE dipsERAs AS
    SELECT nameFirst, nameLast, idxTeams, lgID, ERA, dERA, IPOuts
    FROM ( -- p_w_d
    SELECT *, dER * 9 / dIP AS dERA
    FROM ( -- p9
    SELECT *, (dH-dHR)*.49674 + dHR*1.294375 + (dBB-dIBB)*.3325
      + dIBB*.0864336 + dSO*(-.084691) + dHP*.3077
      + (BFP-dHP-dBB-dSO-dH)*(-.082927) AS dER
    FROM ( -- p8
    SELECT *, (1.048 * (BFP - dBB - dHP - dSO - dH) + dSO) / 3 AS dIP
    FROM ( -- p7
    SELECT *, sH * (BFP - dHR - dBB - dSO - dHP) + dHR AS dH
    FROM ( -- p6
    SELECT *, .304396 + .002321 * (CASE WHEN throws='L' THEN 1 ELSE 0 END)
      - .04782 * sSO - .08095 * sHR AS sH
    FROM ( -- p5
    SELECT *, sHR * (BFP - dBB - dHP - dSO) AS dHR
    FROM ( -- p4
    SELECT *, sSO * (BFP - dBB - dHP) AS dSO,
      HR / (BFP - HBP - BB - SO) AS sHR
    FROM ( -- p3
    SELECT *, sBB * (BFP - dIBB - dHP) + dIBB AS dBB,
      SO / (BFP - HBP - BB) AS sSO
    FROM ( -- p2
    SELECT *, sHP * (BFP - dIBB) AS dHP,
      (BB - IBB) / (BFP - IBB - HBP) AS sBB
    FROM ( -- p1
    SELECT p.*, m.nameFirst, m.nameLast, m.throws,
      BFP * .0074 AS dIBB,
      HBP / (BFP - IBB) AS sHP
    FROM pitching p INNER JOIN master m
      ON p.idxLahman=m.idxLahman
    ) p1 ) p2 ) p3 ) p4 ) p5 ) p6 ) p7 ) p8 ) p9 ) p_w_d
    ;
To better understand DIPS ERA, let’s look at its distribution, as shown in Figure 5-12. Notice that the range and shape of the distribution are similar to those of ERA.
What can we learn from looking at DIPS numbers? Well, the top pitchers by ERA come out as the top pitchers by dERA. However, a closer look reveals that a couple of pitchers (LaTroy Hawkins and Guillermo Mota) might have been overperformers. In 2004, this proved to be true. Mota’s ERA went up to 2.14 in the beginning of 2004, when he played with the Dodgers, and then climbed up to 4.81 when he was traded to the Marlins, leaving the Dodgers’ super defense.
In 2004, LaTroy Hawkins played for the Cubs (as their closer), ending the year with a 2.63 ERA. This was higher than his 1.86 ERA with the Twins, probably because of the step down in defensive ability.
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.048   3.821   4.339   4.360   4.946   6.652
Table 5-2 shows the players with the lowest DIPS ERAs in 2003.
Table 5-2. Five lowest DIPS ERAs in 2003
First name | Last name | Year | Team | League | ERA | DIPS ERA | Outs pitched |
---|---|---|---|---|---|---|---|
Eric | Gagne | 2003 | LAN | NL | 1.202 | 1.006 | 247 |
John | Smoltz | 2003 | ATL | NL | 1.119 | 1.691 | 193 |
Pedro | Martinez | 2003 | BOS | AL | 2.218 | 2.236 | 560 |
LaTroy | Hawkins | 2003 | MIN | AL | 1.862 | 2.609 | 232 |
Guillermo | Mota | 2003 | LAN | NL | 1.971 | 2.609 | 315 |
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.048   3.895   4.419   4.409   4.943   7.676
Table 5-3 shows the five best pitcher seasons (ranked by DIPS ERA) between 1994 and 2003.
Table 5-3. Five best DIPS ERAs, 1994–2003
First name | Last name | Year | Team | League | ERA | DIPS ERA | Outs pitched |
---|---|---|---|---|---|---|---|
Eric | Gagne | 2003 | LAN | NL | 1.202 | 1.006 | 247 |
Pedro | Martinez | 1999 | BOS | AL | 2.067 | 1.519 | 640 |
John | Smoltz | 2003 | ATL | NL | 1.119 | 1.691 | 193 |
Mariano | Rivera | 1996 | NYA | AL | 2.09 | 1.768 | 323 |
Eric | Gagne | 2002 | LAN | NL | 1.968 | 1.829 | 247 |
DIPS ERA gives us some interesting insight into Dennis Eckersley’s 1990 season. Eckersley always ranks at the top of pitching lists for this performance: a 0.61 ERA for the season. DIPS suggests that Eckersley got very lucky. He still had a super season, but an expected ERA of 1.68 based on his strikeouts, home runs, and walks is much less remarkable.
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1.048   3.816   4.276   4.291   4.747   7.676 141.000
Table 5-4 shows the five best pitcher seasons over the past 50 years, by DIPS ERA.
Table 5-4. Five best DIPS ERAs over the past 50 years
First name | Last name | Year | Team | League | ERA | DIPS ERA | Outs pitched |
---|---|---|---|---|---|---|---|
Eric | Gagne | 2003 | LAN | NL | 1.202 | 1.006 | 247 |
Pedro | Martinez | 1999 | BOS | AL | 2.067 | 1.519 | 640 |
Dave | Smith | 1987 | HOU | NL | 1.65 | 1.643 | 180 |
Dennis | Eckersley | 1990 | OAK | AL | 0.614 | 1.68 | 220 |
John | Smoltz | 2003 | ATL | NL | 1.119 | 1.691 | 193 |
As I noted earlier, when a ball is put into play, it’s the defense’s responsibility to convert it into an out. If a pitcher is lucky, the defense will convert a lot of balls in play into outs, and a pitcher will have very few earned runs. If a pitcher is unlucky, the defense will miss a lot of balls, and a pitcher will have many earned runs. DIPS is designed to estimate a pitcher’s ERA independent of defense. So, we can use DIPS to determine which pitchers were lucky and which were unlucky.
We’ll just calculate the difference between DIPS ERA and actual ERA (dERA minus ERA) to measure how lucky or unlucky each pitcher has been; a positive difference means the pitcher’s actual ERA was better than DIPS predicts. Interestingly, the five luckiest and five unluckiest players had similar DIPS ERAs. The lucky players had great ERAs (all under 3) and the unlucky players had poor ERAs (all over 7).
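Here’s the SQL code I used. It’s almost identical to the earlier code, except that I added a luck variable to measure the difference between DIPS ERA and ERA and limited the query to seasons after 1954 with more than 161 outs pitched. Sorting by luck in descending order returns the luckiest seasons; reverse the sort order to get the unluckiest.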
    SELECT nameFirst, nameLast, yearID, teamID, lgID,
      round(ER / IPOuts * 27, 3) ERA, dERA,
      round(dERA - ERA, 3) AS luck, IPOuts
    FROM ( -- p_w_d
    SELECT *, round(dER * 9 / dIP, 3) AS dERA
    FROM ( -- p9
    SELECT *, (dH-dHR)*.49674 + dHR*1.294375 + (dBB-dIBB)*.3325
      + dIBB*.0864336 + dSO*(-.084691) + dHP*.3077
      + (BFP-dHP-dBB-dSO-dH)*(-.082927) AS dER
    FROM ( -- p8
    SELECT *, (1.048 * (BFP - dBB - dHP - dSO - dH) + dSO) / 3 AS dIP
    FROM ( -- p7
    SELECT *, sH * (BFP - dHR - dBB - dSO - dHP) + dHR AS dH
    FROM ( -- p6
    SELECT *, .304396 + .002321 * (CASE WHEN throws='L' THEN 1 ELSE 0 END)
      - .04782 * sSO - .08095 * sHR AS sH
    FROM ( -- p5
    SELECT *, sHR * (BFP - dBB - dHP - dSO) AS dHR
    FROM ( -- p4
    SELECT *, sSO * (BFP - dBB - dHP) AS dSO,
      HR / (BFP - HBP - BB - SO) AS sHR
    FROM ( -- p3
    SELECT *, sBB * (BFP - dIBB - dHP) + dIBB AS dBB,
      SO / (BFP - HBP - BB) AS sSO
    FROM ( -- p2
    SELECT *, sHP * (BFP - dIBB) AS dHP,
      (BB - IBB) / (BFP - IBB - HBP) AS sBB
    FROM ( -- p1
    SELECT p.*, m.nameFirst, m.nameLast, m.throws,
      BFP * .0074 AS dIBB,
      HBP / (BFP - IBB) AS sHP
    FROM pitching p INNER JOIN master m
      ON p.playerID=m.playerID
    WHERE yearID>1954 AND IPOuts > 161
    ) p1 ) p2 ) p3 ) p4 ) p5 ) p6 ) p7 ) p8 ) p9 ) p_w_d
    ORDER BY luck DESC
    LIMIT 5
    ;
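The same comparison can be sketched in R. The example below is a minimal illustration rather than the code used to build the tables: it enters the ERA and DIPS ERA columns of Tables 5-5 and 5-6 by hand (the data frame `seasons` and its column names are assumptions made for this sketch), computes luck as DIPS ERA minus ERA, and sorts in both directions. Because the hand-entered values are already rounded, the results can differ slightly from the Diff column in the tables.

```r
# ERA and DIPS ERA values copied from Tables 5-5 and 5-6;
# the data frame layout itself is hypothetical.
seasons <- data.frame(
  pitcher = c("Fox 1961", "Knowles 1972", "Tobik 1981", "Bell 1972",
              "Forster 1985", "Halladay 2000", "Ashby 1993", "Larkin 1998",
              "Jefferson 1976", "Ayala 1998"),
  ERA  = c(1.413, 1.371, 2.685, 1.656, 2.275,
           10.64, 8.5, 9.643, 8.519, 7.228),
  dERA = c(4.334, 4.227, 5.44, 4.386, 4.971,
           5.972, 4.476, 5.887, 4.987, 3.844),
  stringsAsFactors = FALSE
)

# Positive luck: actual ERA was better than DIPS ERA predicts
seasons$luck <- round(seasons$dERA - seasons$ERA, 3)

# Descending sort for the luckiest seasons, ascending for the unluckiest
luckiest   <- head(seasons[order(-seasons$luck), ], 5)
unluckiest <- head(seasons[order(seasons$luck), ], 5)

print(luckiest)
print(unluckiest)
```

Sorting once and taking the head from each direction mirrors switching between ORDER BY luck DESC and ORDER BY luck ASC in the SQL version.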
Table 5-5 shows the luckiest pitcher seasons; Table 5-6 shows the unluckiest pitcher seasons. Interestingly, Roy Halladay’s 2000 season tops the unlucky list. He’s now Toronto’s ace pitcher, and he won the Cy Young Award in 2003.
Table 5-5. Lucky pitchers
First name | Last name | Year | Team | League | ERA | DIPS ERA | Diff | Outs pitched |
---|---|---|---|---|---|---|---|---|
Terry | Fox | 1961 | DET | AL | 1.413 | 4.334 | 2.924 | 172 |
Darold | Knowles | 1972 | OAK | AL | 1.371 | 4.227 | 2.857 | 197 |
Dave | Tobik | 1981 | DET | AL | 2.685 | 5.44 | 2.75 | 181 |
Jerry | Bell | 1972 | ML4 | AL | 1.656 | 4.386 | 2.726 | 212 |
Terry | Forster | 1985 | ATL | NL | 2.275 | 4.971 | 2.691 | 178 |
Table 5-6. Unlucky pitchers
First name | Last name | Year | Team | League | ERA | DIPS ERA | Diff | Outs pitched |
---|---|---|---|---|---|---|---|---|
Roy | Halladay | 2000 | TOR | AL | 10.64 | 5.972 | -4.66 | 203 |
Andy | Ashby | 1993 | COL | NL | 8.5 | 4.476 | -4.02 | 162 |
Andy | Larkin | 1998 | FLO | NL | 9.643 | 5.887 | -3.75 | 224 |
Jesse | Jefferson | 1976 | CHA | AL | 8.519 | 4.987 | -3.53 | 187 |
Bobby | Ayala | 1998 | SEA | AL | 7.228 | 3.844 | -3.44 | 226 |