Use fielding runs to measure a player’s defensive performance with the linear weights system.
Fielding runs (FR) is a formula for measuring a defense’s contribution compared to an average defense. FR is the fielding part of the linear weights system for measuring player contributions. It estimates the number of runs a fielder prevented from scoring, compared with a league-average fielder at the same position.
Computing FR is a little bit more complicated than computing some other formulas in this chapter. First, I will explain how the system works. Then, I’ll explain how to compute fielding runs in four steps.
For each measurement (A, PO, DP, E, PB), we compare each player’s total to the league average. We call the league averages the expected numbers.
We adjust the expected amounts by the time each player spent at each position. Ideally, we would use the number of batters or balls in play while each player was in the field, but we don’t know this information. We do know each team’s balls in play and the innings (in outs) each player played at each position, so we will use those measurements instead.
We weight the impact of each defensive statistic differently. Assists count more than putouts, double plays, and errors (assists count twice as much for infielders) because they are harder. All of those count more than passed balls (a catcher-only statistic that’s half the pitcher’s fault). Outfield assists count twice as much as infield assists (because they are harder). Finally, double plays and putouts for first basemen don’t count (because they’re so easy).
We adjust the catchers’ numbers to remove strikeouts, for which catchers receive putout credit.
That’s the idea behind fielding runs: it makes all of these adjustments to give a fair measurement of the runs saved by each player.
To see how this works, let’s look at the formula for each position:
The basic form of the linear weights formula, used for second basemen, third basemen, and shortstops, compares a player’s assists, putouts, double plays, and errors to league averages:
FR = .2 * (2 (A – expA) + (PO – expPO) + (DP – expDP) – (E – expE))
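The formula for catchers is similar, with two corrections. Catchers are credited with a putout for every strikeout, which is really due to the pitcher, so strikeouts are taken out of both actual and expected putouts. Catchers are also charged, at half weight, for passed balls above the league average: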
FR = .2 * (2 (A – expA) + ((PO – SO) – (expPO – expSO)) + (DP – expDP) – .5 * (PB – expPB) – (E – expE))
First base is considered such an easy position that first basemen don’t get credit for putouts or double plays.
FR = .2 * (2 (A – expA) – (E – expE))
Assists are harder for outfielders than for infielders, so outfield assists get twice the infield weight. Otherwise, this formula is the same as the one for infielders.
FR = .2 * (4 (A – expA) + (PO – expPO) + (DP – expDP) – (E – expE))
Pitchers don’t make as many defensive plays as infielders, so the overall weight is halved (.1 instead of .2).
FR = .1 * (2 (A – expA) + (PO – expPO) + (DP – expDP) – (E – expE))
Designated hitters don’t contribute anything to fielding.
FR = 0
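If you want to check the per-position formulas outside of MySQL, here is a minimal Python sketch that applies them to a single player-season. It assumes the actual and expected counts have already been computed and are passed in as dictionaries keyed by the abbreviations used above (A, PO, DP, E, PB, SO); the function name and the dictionary layout are hypothetical, chosen only for illustration.

```python
# Sketch of the per-position fielding runs formulas (hypothetical helper).
# actual/expected are dicts keyed by "A", "PO", "DP", "E", "PB", "SO".

OUTFIELD = {"LF", "CF", "RF"}


def fielding_runs(pos, actual, expected):
    """Return FR for one player-season at position `pos`."""

    def diff(stat):
        # Actual minus expected; a missing stat counts as zero.
        return actual.get(stat, 0) - expected.get(stat, 0)

    if pos == "DH":
        return 0.0  # designated hitters contribute nothing in the field
    if pos == "1B":
        # First basemen get no credit for putouts or double plays.
        return 0.2 * (2 * diff("A") - diff("E"))

    # Outfield assists weigh twice as much as infield assists.
    assist_weight = 4 if pos in OUTFIELD else 2
    total = assist_weight * diff("A") + diff("PO") + diff("DP") - diff("E")
    if pos == "C":
        # Strikeout putouts belong to the pitcher; passed balls count half.
        total -= diff("SO")
        total -= 0.5 * diff("PB")
    # Pitchers get half the overall weight of other fielders.
    scale = 0.1 if pos == "P" else 0.2
    return scale * total


# A shortstop 20 assists, 5 putouts, and 3 double plays above expectation,
# with 4 more errors than expected: FR = 0.2 * (2*20 + 5 + 3 - 4) = 0.2 * 44.
print(fielding_runs("SS", {"A": 120, "PO": 65, "DP": 23, "E": 14},
                    {"A": 100, "PO": 60, "DP": 20, "E": 10}))
```

The sketch follows the outfield formula as written above, with a double-play weight of 1; the SQL later in this hack doubles the outfield double-play weight, so the two give different results for outfielders who turn double plays.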
Now, let’s look at the procedure to calculate FR.
The first step in calculating FR is to calculate league totals of assists (A), putouts (PO), double plays (DP), errors (E), passed balls (PB), strikeouts (SO), and balls in play (BIP) for each year and league (and, for the fielding statistics, for each position). Later, we will use these numbers to estimate the league average at each position.
Next, we total each team’s outs played (InnOuts) at each position, along with each team’s balls in play and strikeouts (SO). Outs played let us split fielding plays among a team’s players according to how long each one played each position; balls in play let us split the league totals among teams; and strikeouts let us adjust the catcher numbers.
Then we calculate the expected number of assists, putouts, double plays, passed balls, strikeouts, and errors that each player should make (or see). To do this, we start with the league total at each position in each year, scale it by the team’s share of the league’s balls in play, and then scale it again by the player’s share of the team’s outs played at that position.
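Balls in play is not stored directly; the SQL in this hack derives it from each pitching line as BFP - BB - SO - HR - HBP. Here is a small Python sketch of that derivation and of the league and team totals. It assumes each pitching line is a dictionary with yearID, lgID, a team key (teamID here, a hypothetical stand-in for the idxTeams key the database uses), and the five counting stats.

```python
# Sketch: balls in play (BIP) and strikeouts summed by league and by team.
# Assumed record layout: dicts with yearID, lgID, teamID, BFP, BB, SO, HR, HBP.
from collections import defaultdict


def balls_in_play(line):
    """Batters faced minus walks, strikeouts, home runs, and hit batsmen."""
    return line["BFP"] - line["BB"] - line["SO"] - line["HR"] - line["HBP"]


def pitching_totals(lines):
    """Return (league, team) dicts of {"BIP": ..., "SO": ...} totals.

    League totals are keyed by (yearID, lgID); team totals by (yearID, teamID).
    """
    league = defaultdict(lambda: {"BIP": 0, "SO": 0})
    team = defaultdict(lambda: {"BIP": 0, "SO": 0})
    for line in lines:
        bip = balls_in_play(line)
        for bucket, key in ((league, (line["yearID"], line["lgID"])),
                            (team, (line["yearID"], line["teamID"]))):
            bucket[key]["BIP"] += bip
            bucket[key]["SO"] += line["SO"]
    return dict(league), dict(team)
```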
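As a minimal sketch of that proration, here it is in Python. The function names are hypothetical; the argument names mirror the SQL columns (lgA and the other league totals, tmBIP, lgBIP, InnOuts for the player’s outs at the position, and tmInnOuts for the team’s outs there).

```python
# Sketch of the proration behind the expected counts (hypothetical helpers).


def expected_count(lg_total, tm_bip, lg_bip, inn_outs, tm_inn_outs):
    """Scale a league positional total down to one player's share."""
    team_share = tm_bip / lg_bip            # team's share of league balls in play
    player_share = inn_outs / tm_inn_outs   # player's share of team outs at the position
    return lg_total * team_share * player_share


def expected_catcher_putouts(lg_po, lg_so, tm_bip, lg_bip, inn_outs, tm_inn_outs):
    """Expected catcher putouts with strikeout putouts taken out of the league total."""
    return expected_count(lg_po - lg_so, tm_bip, lg_bip, inn_outs, tm_inn_outs)


def catcher_strikeouts(tm_so, inn_outs, tm_inn_outs):
    """Team strikeouts credited to one catcher, in proportion to his outs caught."""
    return tm_so * inn_outs / tm_inn_outs


# A team with 1/14 of the league's balls in play, and a player on the field
# for 3/4 of the team's outs at his position: 1400 * (1/14) * (3/4) = 75
# expected assists out of a league positional total of 1400.
print(expected_count(1400, 3000, 42000, 3000, 4000))
```

Subtracting catcher_strikeouts(...) from a catcher’s actual putouts and comparing the result with expected_catcher_putouts(...) gives the (PO - SO) - (expPO - expSO) term of the catcher formula.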
This formula is a little trickier than pitching runs and batting runs because the fielding runs formula varies slightly by position. To calculate the statistics shown here with MySQL, I used the following code. Incidentally, I initially tried doing all the joins in one statement, but the code ran too slowly, so I split it into smaller statements.
-- calculate league totals for pitching
create temporary table pitching_lg_totals as
select sum(p.BFP - p.BB - p.SO - p.HR - p.HBP) as lgBIP, sum(p.SO) as lgSO, t.yearID, t.lgID
from pitching p inner join teams t on p.idxTeams=t.idxTeams
group by yearID, lgID;
-- calculate team totals for pitching
create temporary table pitching_tm_totals as
select sum(p.BFP - p.BB - p.SO - p.HR - p.HBP) as tmBIP, sum(p.SO) as tmSO,
       t.yearID, t.idxTeamsFranchises, t.idxTeams, t.lgID
from pitching p inner join teams t on p.idxTeams=t.idxTeams
group by yearID, idxTeamsFranchises, idxTeams;
create index plt_idx on pitching_lg_totals(yearID);
create index ptt_idx on pitching_tm_totals(yearID);
-- join the pitching totals together
create temporary table pitching_totals as
select plt.*, ptt.tmBIP, ptt.tmSO, ptt.idxTeams
from pitching_lg_totals plt inner join pitching_tm_totals ptt
where plt.yearID=ptt.yearID and plt.lgID=ptt.lgID;
-- calculate league totals for fielding
create temporary table fielding_lg_totals as
select sum(f.A) as lgA, sum(f.PO) as lgPO, sum(f.DP) as lgDP, sum(f.E) as lgE,
       sum(f.PB) as lgPB, t.yearID, Pos, t.lgID
from fielding f inner join teams t on f.idxTeams=t.idxTeams
group by yearID, pos, lgID;
-- calculate team totals for fielding
create temporary table fielding_tm_totals as
select sum(f.InnOuts) tmInnOuts, t.yearID, f.idxTeams, f.pos, t.lgID
from fielding f inner join teams t on f.idxTeams=t.idxTeams
group by yearID, idxTeams, pos;
create index flt_idx on fielding_lg_totals(yearID,pos);
create index ftt_idx on fielding_tm_totals(yearID,pos);
-- join the fielding totals together
create temporary table fielding_totals as
select flt.*, ftt.tmInnOuts, ftt.idxTeams
from fielding_tm_totals ftt inner join fielding_lg_totals flt
  ON flt.yearID=ftt.yearID AND flt.pos=ftt.pos AND flt.lgID=ftt.lgID;
create index ft_idx on fielding_totals(idxTeams);
create index pt_idx on pitching_totals(idxTeams);
-- put together the fielding and pitching totals
create temporary table fielding_aggregates AS
select f.idxTeams, f.yearID, f.Pos, p.lgBIP, p.lgSO, p.tmBIP, p.tmSO,
       f.lgA, f.lgPO, f.lgDP, f.lgE, f.lgPB, f.tmInnOuts
from fielding_totals f inner join pitching_totals p on f.idxTeams=p.idxTeams;
create index fa_idx on fielding_aggregates(idxTeams, pos);
-- calculate expected fielding totals
-- (league total * team share of league BIP * player share of team outs at the position)
create temporary table fielding_w_expected AS
select f.*, a.lgBIP, a.lgSO, a.tmBIP, a.tmSO, a.lgA, a.lgPO, a.lgDP, a.lgE, a.lgPB, a.tmInnOuts,
       lgA * tmBIP / lgBIP * InnOuts / tmInnOuts as expA,
       lgE * tmBIP / lgBIP * InnOuts / tmInnOuts as expE,
       (lgPO - if(f.pos='C',1,0) * lgSO) * tmBIP / lgBIP * InnOuts / tmInnOuts as expPO,
       lgPB * tmBIP / lgBIP * InnOuts / tmInnOuts as expPB,
       lgDP * tmBIP / lgBIP * InnOuts / tmInnOuts as expDP
from fielding f inner join fielding_aggregates a inner join teams t
where f.idxTeams=a.idxTeams AND f.idxTeams=t.idxTeams AND f.pos=a.pos;
-- finally, calculate fielding runs for each player
-- (passed balls: half weight on the difference between actual and expected)
create table fielding_w_fr AS
select f.*,
  ((0.2 - 0.1 * if(Pos="P",1,0)) *
    ((2 + 2 * if(Pos IN ("CF","LF","RF"),1,0)) * (A - expA)
     + ((PO - if(Pos="C",1,0) * tmSO * (InnOuts / tmInnOuts)) - expPO) * if(Pos="1B",0,1)
     + (1 + if(Pos IN ("CF","LF","RF"),1,0)) * (DP - expDP) * if(Pos="1B",0,1)
     - (E - expE)
     - 0.5 * (ifnull(PB,0) - ifnull(expPB,0)))) as FR
FROM fielding_w_expected f;
Here’s the code to load this into R:
> library(RMySQL)
Loading required package: DBI
> drv <- dbDriver("MySQL")
> con <- dbConnect(drv, username="jadler", dbname="bbdatabank", host="localhost")
> fr.query <- dbSendQuery(con, statement="select * from fielding_w_fr")
> fielding_w_fr <- fetch(fr.query, n=-1)
Before about 1910, the figures for batters faced by pitcher (BFP) were not accurately reported in the Baseball Archive data, so these formulas don’t work quite right.
Let’s take a quick look at the distribution of values for FR:
> attach(fielding_w_fr)
> summary(subset(FR, yearID > 1910))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        NA's
-58.06000  -0.55580  -0.05835  -0.01287   0.44710  38.26000 63855.00000
Most players are clustered around zero FR (meaning they performed at about the average level for a major league player).
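For readers working outside R, here is a minimal Python sketch of what summary() reports for a vector with missing values. It assumes missing FR values are represented as None; the function name is hypothetical. Python’s statistics.quantiles with method="inclusive" uses the same linear interpolation as R’s default quantile type.

```python
# Sketch: an R-style summary() of FR values that may contain missing values.
# Assumption: missing values are None; summarize() is a hypothetical helper.
import statistics


def summarize(values):
    """Return the same seven figures R's summary() prints for a numeric vector."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "Min.": min(present),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(present),
        "3rd Qu.": q3,
        "Max.": max(present),
        "NA's": len(values) - len(present),
    }
```

The large NA count in the R output comes from player-seasons where FR could not be computed, which the sketch counts the same way.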
Here is the SQL statement I used to select the 10 best fielding seasons by FR:
select t.name, concat(m.nameFirst, ' ', m.nameLast, ' (', t.yearID, ')') as name, f.pos, f.FR
from fielding_w_fr f inner join master m inner join teams t
where f.idxLahman=m.idxLahman and f.idxTeams=t.idxTeams and t.yearID>1910
order by FR DESC
limit 10;
Here’s the top 10 list:
+----------------------+-----------------------+-----+--------------+
| name                 | name                  | pos | FR           |
+----------------------+-----------------------+-----+--------------+
| Toronto Blue Jays    | Orlando Hudson (2003) | 2B  | 41.328716186 |
| Toronto Blue Jays    | Orlando Hudson (2004) | 2B  | 35.293815642 |
| Toronto Blue Jays    | Alex Gonzalez (2001)  | SS  | 34.661324816 |
| Colorado Rockies     | Neifi Perez (2000)    | SS  | 34.480802536 |
| Los Angeles Dodgers  | Alex Cora (2003)      | 2B  | 33.682829918 |
| Oakland Athletics    | Eric Chavez (2003)    | 3B  | 33.399151186 |
| Tampa Bay Devil Rays | Felix Martinez (2000) | SS  | 32.511943136 |
| Kansas City Royals   | Rey Sanchez (2001)    | SS  | 31.642047458 |
| Baltimore Orioles    | Miguel Tejada (2004)  | SS  | 30.849571158 |
| Oakland Athletics    | Bobby Crosby (2004)   | SS  | 28.824352106 |
+----------------------+-----------------------+-----+--------------+
There are a couple of interesting things to notice. First, nearly all of the highest-impact players were shortstops and second basemen. Second, the best defensive seasons all occurred very recently (between 2000 and 2004). Well, it turns out that we can see why this happened if we also look at the 10 worst seasons at those two positions:
select t.name, concat(m.nameFirst, ' ', m.nameLast, ' (', t.yearID, ')') as name, f.pos, f.FR
from fielding_w_fr f inner join master m inner join teams t
where f.idxLahman=m.idxLahman and f.idxTeams=t.idxTeams and t.yearID>1910
  and FR is not null and pos IN ('SS', '2B')
order by FR
limit 10;
As you may have noticed, this query includes only shortstops and second basemen.
+------------------+------------------------+-----+---------------+
| name             | name                   | pos | FR            |
+------------------+------------------------+-----+---------------+
| Minnesota Twins  | Luis Rivas (2001)      | 2B  | -56.353273254 |
| New York Yankees | Derek Jeter (2002)     | SS  | -41.432782878 |
| New York Yankees | Derek Jeter (2000)     | SS  | -37.447310276 |
| Minnesota Twins  | Luis Rivas (2003)      | 2B  | -33.715415308 |
| Seattle Mariners | Bret Boone (2004)      | 2B  | -33.558463106 |
| New York Yankees | Derek Jeter (2003)     | SS  | -33.157131056 |
| Minnesota Twins  | Jay Canizaro (2000)    | 2B  | -31.705916250 |
| Anaheim Angels   | David Eckstein (2004)  | SS  | -29.177353744 |
| Minnesota Twins  | Cristian Guzman (2003) | SS  | -29.033513958 |
| Minnesota Twins  | Luis Rivas (2002)      | 2B  | -29.029130766 |
+------------------+------------------------+-----+---------------+
Yup, that’s right: the worst seasons also occurred within the past few years. Right now, we’re seeing the largest range of fielding ability. Because fielding runs compares players to the league average, players are getting much higher and lower scores than usual. (I took a quick look at assists, putouts, errors, and double plays to see whether there was anything funny about the past few years. It turns out that there wasn’t. The figures were completely consistent with the past.)
As a Yankees fan (hey, I grew up in Northern Jersey, got a problem with that?), it saddens me a little bit to see Jeter rank as one of the worst defensive shortstops ever. In case you’re wondering, Derek Jeter had another below-average season in 2004. But, thanks to A-Rod’s presence at third base, his fielding runs total improved to –13.4097. I guess you could consider his Gold Glove Award in 2004 to be fair if it were awarded for the biggest improvement.
The vast majority of player seasons are clustered around zero, so we plot the distribution with a little more resolution than usual:
> hist(subset(FR,yearID>1994), breaks=100)
As you can see in Figure 5-15, this is a very tightly clustered distribution, with almost all fielders breaking even and only a few giving away or saving more than a couple of runs through their fielding ability. This is not surprising: the statistic is designed to have an average of zero, and part-time players can’t do much better or worse than zero.
Finally, let’s look at how fielding runs have changed over time:
> fielding_w_fr$decade <- as.factor(floor(fielding_w_fr$yearID/10)*10)
> boxplot(FR~decade, data=fielding_w_fr, subset=yearID>1910,
+   pars=list(xlab="decade", ylab="Fielding Runs"),
+   range=0)
As this plot shows, there is a huge difference in fielding runs between the best and worst players over the past decade.
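With range=0, each box’s whiskers extend to that decade’s most extreme values, so the full height of each box-and-whisker shows the gap between the best and worst season. A short Python sketch of the same grouping, assuming the data arrive as (yearID, FR) pairs with None for a missing FR (the function names are hypothetical):

```python
# Sketch: group seasons by decade the way floor(yearID/10)*10 does in R,
# then report the spread of FR (max minus min) within each decade.
from collections import defaultdict


def decade(year):
    """Map a season to its decade, e.g. 1955 -> 1950."""
    return (year // 10) * 10


def fr_range_by_decade(seasons, min_year=1910):
    """Return {decade: max FR - min FR} for seasons after min_year."""
    buckets = defaultdict(list)
    for year, fr in seasons:
        if year > min_year and fr is not None:
            buckets[decade(year)].append(fr)
    return {d: max(frs) - min(frs) for d, frs in sorted(buckets.items())}
```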
To calculate pitching runs, you need to know fielding runs; see “Measure Pitching with Linear Weights” [Hack #49] for another way to use this.