Measure Fielding with Linear Weights

Use fielding runs to measure a player’s defensive performance with the linear weights system.

Fielding runs (FR) is a formula for measuring a fielder’s defensive contribution compared to an average fielder at the same position. FR is the fielding part of the linear weights system for measuring player contributions. It estimates the number of runs that a player prevented from scoring compared to the league-average fielder.

The Formula

Computing FR is a little bit more complicated than computing some other formulas in this chapter. First, I will explain how the system works. Then, I’ll explain how to compute fielding runs in four steps.

Comparisons to league average for position: For each measurement (A, PO, DP, E, PB), we compare the total fielded by each player to the league average. We call the league averages the expected numbers.
Adjustment for time played: We adjust the expected amounts by the time each player fielded each position. Ideally, we would use the number of batters or balls in play. But we don’t know this information. We do know innings played in each position, so we will use that measurement instead.
Weight by impact and position: We weight the impact of each defensive statistic differently. For infielders, assists count twice as much as putouts, double plays, and errors because they are harder. All of those count more than passed balls (a catcher-only statistic that’s half the pitcher’s fault). Outfield assists count twice as much as infield assists (because they are harder). Finally, double plays and putouts for first basemen don’t count (because they’re so easy).
Adjustments for strikeouts: We adjust the catcher numbers to take away strikeouts (catchers are credited with a putout on every strikeout).

That’s the idea behind fielding runs: it makes all of these adjustments to give a fair measurement of the runs saved by each player.

To see how this works, let’s look at the formula for different players:

Shortstop, second baseman, and third baseman

The basic form of the linear weights formula compares a player’s assists, putouts, double plays, and errors to league averages:

	FR = .2 * (2 (A – expA) + (PO – expPO) + (DP – expDP) – (E – expE))

Catcher

The formula for catchers is similar. However, there is a correction to the number of putouts: catchers are credited with a putout for every strikeout, which is really due to the pitcher.

	FR = .2 * (2 (A – expA) + ((PO – SO) – (expPO – expSO)) + (DP – expDP) – .5 * (PB – expPB) – (E – expE))

First baseman

First base is considered such an easy position that first basemen don’t get credit for putouts or double plays.

	FR = .2 * (2 (A – expA) – (E – expE))

Outfielders

Assists are harder for outfielders than for infielders, so they carry twice the weight. Double plays are also weighted double for outfielders. Otherwise, this formula is similar.

	FR = .2 * (4 (A – expA) + (PO – expPO) + 2 (DP – expDP) – (E – expE))

Pitcher

Pitchers don’t make as many defensive plays as infielders, so the overall weight is halved (.1 instead of .2).

	FR = .1 * (2 (A – expA) + (PO – expPO) + (DP – expDP) – (E – expE))

Designated hitter

Designated hitters don’t contribute anything to fielding.

	FR = 0
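The position-by-position formulas can be collapsed into one function. The following is only a rough sketch in Python (not the chapter's SQL): the function name, argument names, and the sample shortstop numbers are made up for illustration, and outfield assists and double plays get the doubled weights that the sample SQL applies.

```python
OUTFIELD = {"LF", "CF", "RF"}  # same position codes the sample SQL treats as outfield


def fielding_runs(pos, a, po, dp, e, exp_a, exp_po, exp_dp, exp_e,
                  pb=0.0, exp_pb=0.0, so=0.0, exp_so=0.0):
    """Fielding runs for one player-season at one position (illustrative)."""
    if pos == "DH":
        return 0.0  # designated hitters do not field

    scale = 0.1 if pos == "P" else 0.2            # pitchers count half
    assist_weight = 4 if pos in OUTFIELD else 2   # outfield assists doubled
    dp_weight = 2 if pos in OUTFIELD else 1       # outfield double plays doubled

    total = assist_weight * (a - exp_a) - (e - exp_e)
    if pos != "1B":  # first basemen get no credit for putouts or double plays
        if pos == "C":
            # strikeout putouts belong to the pitcher; passed balls count half
            total += (po - so) - (exp_po - exp_so)
            total -= 0.5 * (pb - exp_pb)
        else:
            total += po - exp_po
        total += dp_weight * (dp - exp_dp)
    return scale * total


# Hypothetical shortstop: 30 extra assists, 10 extra putouts,
# 5 extra double plays, and 5 fewer errors than expected.
ss = fielding_runs("SS", a=450, po=250, dp=90, e=15,
                   exp_a=420, exp_po=240, exp_dp=85, exp_e=20)
print(round(ss, 2))  # 0.2 * (2*30 + 10 + 5 + 5) = 16.0
```

Keeping the position rules in one place makes it easy to see that only the weights change from position to position; the comparison to expected values is the same everywhere.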

Calculating Fielding Runs

Now, let’s look at the procedure to calculate FR.

Step 1: Calculate league totals.

The first step in calculating FR is to calculate, for each league in each year, the total number of assists (A), putouts (PO), double plays (DP), errors (E), passed balls (PB), strikeouts (SO), and balls in play (BIP). Later, we will use these numbers to estimate the league average at each position.
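As a rough illustration of this step, here is a Python sketch of the pitching side only (balls in play and strikeouts). It assumes pitcher seasons are available as dictionaries that use the same column names as the sample SQL (BFP, BB, SO, HR, HBP, yearID, lgID), and it estimates balls in play the same way the SQL does, as BFP - BB - SO - HR - HBP. The function names and the two sample rows are made up.

```python
from collections import defaultdict


def balls_in_play(row):
    """Batters faced minus walks, strikeouts, home runs, and hit batsmen."""
    return row["BFP"] - row["BB"] - row["SO"] - row["HR"] - row["HBP"]


def league_pitching_totals(pitching_rows):
    """Sum balls in play and strikeouts per (yearID, lgID)."""
    totals = defaultdict(lambda: {"lgBIP": 0, "lgSO": 0})
    for row in pitching_rows:
        key = (row["yearID"], row["lgID"])
        totals[key]["lgBIP"] += balls_in_play(row)
        totals[key]["lgSO"] += row["SO"]
    return dict(totals)


# Made-up pitcher seasons, for illustration only.
rows = [
    {"yearID": 2003, "lgID": "AL", "BFP": 900, "BB": 60, "SO": 180, "HR": 25, "HBP": 5},
    {"yearID": 2003, "lgID": "AL", "BFP": 400, "BB": 30, "SO": 70, "HR": 10, "HBP": 2},
]
totals = league_pitching_totals(rows)
print(totals[(2003, "AL")])  # {'lgBIP': 918, 'lgSO': 250}
```

The fielding totals (A, PO, DP, E, PB) can be summed the same way, with the position added to the grouping key.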

Step 2: Calculate team totals.

We need to split up fielding plays between different players on teams, according to the time that each player played each position, so we will use outs played. We also need the number of strikeouts per team (to adjust catcher numbers), so we will total outs played (InnOuts) and strikeouts (SO) for each team.

Step 3: Calculate expected values for each player.

We calculate the number of assists, putouts, double plays, passed balls, strikeouts, and errors that we expect each player to make (or see). To do this, we first calculate the number seen by the average player at each position in each league (in each year). We then scale this number by the share of innings played by each player (in each year).
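The scaling can be written as a single expression. This Python sketch follows the arithmetic used in the sample SQL (league total, times the team's share of league balls in play, times the player's share of the team's outs at the position). The function name, argument names, and numbers are hypothetical.

```python
def expected_plays(lg_total, tm_bip, lg_bip, inn_outs, tm_inn_outs):
    """League total scaled to the team's share of balls in play and the
    player's share of the team's outs at the position."""
    if not lg_bip or not tm_inn_outs:
        return None  # no basis for an expectation
    return lg_total * (tm_bip / lg_bip) * (inn_outs / tm_inn_outs)


# Made-up numbers: a league with 7,000 shortstop assists, a team that saw
# 1/14 of the league's balls in play, and a player who covered half of the
# team's outs at shortstop.
exp_a = expected_plays(lg_total=7000, tm_bip=4500, lg_bip=63000,
                       inn_outs=2187, tm_inn_outs=4374)
print(round(exp_a, 1))  # 250.0
```

Returning None when a denominator is missing mirrors what happens in SQL, where a division by zero or a missing value yields NULL rather than a misleading zero.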

Step 4: Calculate fielding runs for each player.

Finally, we calculate FR using the preceding formulas.

Sample Code

This formula is a little trickier than pitching runs and batting runs because the fielding runs formula varies slightly by position. To calculate the statistics shown here with MySQL, I used the following code. Incidentally, I initially tried doing all the joins in one statement, but the code ran too slowly, so I split it into smaller statements.

	-- calculate league totals for pitching
	create temporary table pitching_lg_totals as
	select sum(p.BFP - p.BB - p.SO - p.HR - p.HBP) as lgBIP,
	   sum(p.SO) as lgSO, t.yearID, t.lgID
	from pitching p inner join teams t
	on p.idxTeams=t.idxTeams
	group by yearID, lgID;

	-- calculate team totals for pitching
	create temporary table pitching_tm_totals
	as select sum(p.BFP - p.BB - p.SO - p.HR - p.HBP) as tmBIP,
	    sum(p.SO) as tmSO, t.yearID, t.idxTeamsFranchises,
	    t.idxTeams, t.lgID
	from pitching p inner join teams t
	on p.idxTeams=t.idxTeams
	group by yearID, idxTeamsFranchises, idxTeams;

	create index plt_idx on pitching_lg_totals(yearID);
	create index ptt_idx on pitching_tm_totals(yearID);

	-- join the pitching totals together
	create temporary table pitching_totals
	as select plt.*, ptt.tmBIP, ptt.tmSO, ptt.idxTeams
	from pitching_lg_totals plt inner join pitching_tm_totals ptt
	on plt.yearID=ptt.yearID and plt.lgID=ptt.lgID;

	-- calculate league totals for fielding
	create temporary table fielding_lg_totals as
	select sum(f.A) as lgA, sum(f.PO) as lgPO, sum(f.DP) as lgDP,
	    sum(f.E) as lgE, sum(f.PB) as lgPB,
	    t.yearID, Pos, t.lgID
	from fielding f inner join teams t
	on f.idxTeams=t.idxTeams
	group by yearID, pos, lgID;

	-- calculate team totals for fielding
	create temporary table fielding_tm_totals as
	select sum(f.InnOuts) tmInnOuts, t.yearID, f.idxTeams, f.pos, t.lgID
	from fielding f inner join teams t
	on f.idxTeams=t.idxTeams
	group by yearID, idxTeams, pos;

	create index flt_idx on fielding_lg_totals(yearID,pos);
	create index ftt_idx on fielding_tm_totals(yearID,pos);

	-- join the fielding totals together
	create temporary table fielding_totals as
	select flt.*, ftt.tmInnOuts, ftt.idxTeams
	from fielding_tm_totals ftt inner join fielding_lg_totals flt
	ON flt.yearID=ftt.yearID AND flt.pos=ftt.pos AND flt.lgID=ftt.lgID;

	create index ft_idx on fielding_totals(idxTeams);
	create index pt_idx on pitching_totals(idxTeams);

	-- put together the fielding and pitching totals
	create temporary table fielding_aggregates AS
	select f.idxTeams, f.yearID, f.Pos,
	    p.lgBIP, p.lgSO, p.tmBIP, p.tmSO,
	    f.lgA, f.lgPO, f.lgDP, f.lgE, f.lgPB,
	    f.tmInnOuts
	from fielding_totals f inner join pitching_totals p
	on f.idxTeams=p.idxTeams;

	create index fa_idx on fielding_aggregates(idxTeams, pos);

	-- calculate expected fielding totals
	create temporary table fielding_w_expected AS
	select f.*,
	   a.lgBIP, a.lgSO, a.tmBIP, a.tmSO,
	   a.lgA, a.lgPO, a.lgDP, a.lgE, a.lgPB,
	   a.tmInnOuts,
	   lgA * tmBIP / lgBIP * InnOuts / tmInnOuts as expA,
	   lgE * tmBIP / lgBIP * InnOuts / tmInnOuts as expE,
	   (lgPO - if(f.pos='C',1,0) * lgSO) *
	     tmBIP / lgBIP * InnOuts / tmInnOuts as expPO,
	   lgPB * tmBIP / lgBIP * InnOuts / tmInnOuts as expPB,
	   lgDP * tmBIP / lgBIP * InnOuts / tmInnOuts as expDP
	from fielding f inner join fielding_aggregates a
	on f.idxTeams=a.idxTeams AND f.pos=a.pos;

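	-- finally, calculate fielding runs for each player
	create table fielding_w_fr AS
	select f.*,
	 -- overall scale: .1 for pitchers, .2 for everyone else
	 ((0.2 - 0.1 * if(Pos="P",1,0)) *
	 -- assists: weight 4 for outfielders, 2 otherwise
	 ((2 + 2 * if(Pos IN ("CF","LF","RF"),1,0)) *
	  (A - expA) +
	  -- putouts: remove the catcher's share of team strikeouts; skip 1B
	  ((PO - if(Pos="C",1,0) * tmSO *
	   (InnOuts / tmInnOuts)) - expPO) *
	  if(Pos="1B",0,1) +
	  -- double plays: weight 2 for outfielders, 1 otherwise; skip 1B
	  (1 + if(Pos IN ("CF","LF","RF"),1,0)) *
	  (DP - expDP) * if(Pos="1B",0,1) - (E - expE) -
	  -- passed balls count half (catchers only; null elsewhere)
	  0.5 * (ifnull(PB,0) - ifnull(expPB,0)))) as FR
	FROM fielding_w_expected f;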

Here’s the code to load this into R:

	> library(RMySQL)
	Loading required package: DBI
	> drv <-dbDriver("MySQL")
	> con <- dbConnect(drv, username="jadler", dbname="bbdatabank", host="localhost")
	> fr.query <- dbSendQuery(con, statement="select * from fielding_w_fr")
	> fielding_w_fr <- fetch(fr.query, n=-1)

Before about 1910, the figures for batters faced by pitcher (BFP) were not accurately reported in the Baseball Archive data, so these formulas don’t work quite right.

Summary statistics.

Descriptive statistics.

Let’s take a quick look at the distribution of values for FR:

	> attach(fielding_w_fr)
	> summary(subset(FR,yearID > 1910))
	        Min.     1st Qu.      Median        Mean     3rd Qu.        Max.        NA's
	   -58.06000    -0.55580    -0.05835    -0.01287     0.44710    38.26000 63855.00000

Most players are clustered around zero FR (meaning they performed at about the average level for a major league player).

Top 10.

Here is the SQL statement I used to select the 10 best fielding seasons by FR:

	select t.name,
	  concat(m.nameFirst, ' ', m.nameLast, ' (',t.yearID, ')') as name,
	  f.pos, f.FR
	from fielding_w_fr f
	  inner join master m on f.idxLahman=m.idxLahman
	  inner join teams t on f.idxTeams=t.idxTeams
	where t.yearID>1910
	order by f.FR DESC
	limit 10;

Here’s the top 10 list:

	+----------------------+-----------------------+-----+--------------+
	| name                 | name                  | pos | FR           |
	+----------------------+-----------------------+-----+--------------+
	| Toronto Blue Jays    | Orlando Hudson (2003) | 2B  | 41.328716186 |
	| Toronto Blue Jays    | Orlando Hudson (2004) | 2B  | 35.293815642 |
	| Toronto Blue Jays    | Alex Gonzalez (2001)  | SS  | 34.661324816 |
	| Colorado Rockies     | Neifi Perez (2000)    | SS  | 34.480802536 |
	| Los Angeles Dodgers  | Alex Cora (2003)      | 2B  | 33.682829918 |
	| Oakland Athletics    | Eric Chavez (2003)    | 3B  | 33.399151186 |
	| Tampa Bay Devil Rays | Felix Martinez (2000) | SS  | 32.511943136 |
	| Kansas City Royals   | Rey Sanchez (2001)    | SS  | 31.642047458 |
	| Baltimore Orioles    | Miguel Tejada (2004)  | SS  | 30.849571158 |
	| Oakland Athletics    | Bobby Crosby (2004)   | SS  | 28.824352106 |
	+----------------------+-----------------------+-----+--------------+

There are a couple of interesting things to notice. First, the highest-impact players were shortstops and second basemen. Second, it seems as though the best defensive seasons occurred very recently (over the past few years). Well, it turns out that we can see why this happened if we also look at the 10 worst seasons of all time:

	select t.name,
	   concat(m.nameFirst, ' ', m.nameLast, ' (',t.yearID, ')') as name,
	   f.pos, f.FR
	from fielding_w_fr f
	   inner join master m on f.idxLahman=m.idxLahman
	   inner join teams t on f.idxTeams=t.idxTeams
	where t.yearID>1910 and f.FR is not null and f.pos IN ('SS', '2B')
	order by f.FR
	limit 10;

As you might notice, I am only including shortstops and second basemen.

	+------------------+------------------------+-----+---------------+
	| name             | name                   | pos | FR            |
	+------------------+------------------------+-----+---------------+
	| Minnesota Twins  | Luis Rivas (2001)      | 2B  | -56.353273254 |
	| New York Yankees | Derek Jeter (2002)     | SS  | -41.432782878 |
	| New York Yankees | Derek Jeter (2000)     | SS  | -37.447310276 |
	| Minnesota Twins  | Luis Rivas (2003)      | 2B  | -33.715415308 |
	| Seattle Mariners | Bret Boone (2004)      | 2B  | -33.558463106 |
	| New York Yankees | Derek Jeter (2003)     | SS  | -33.157131056 |
	| Minnesota Twins  | Jay Canizaro (2000)    | 2B  | -31.705916250 |
	| Anaheim Angels   | David Eckstein (2004)  | SS  | -29.177353744 |
	| Minnesota Twins  | Cristian Guzman (2003) | SS  | -29.033513958 |
	| Minnesota Twins  | Luis Rivas (2002)      | 2B  | -29.029130766 |
	+------------------+------------------------+-----+---------------+

Yup, that’s right: the worst seasons also occurred within the past few years. Right now, we’re seeing the largest range of fielding ability. Because fielding runs compares players to the league average, players are getting much higher and lower scores than usual. (I took a quick look at assists, putouts, errors, and double plays to see whether there was anything funny about the past few years. It turns out that there wasn’t. The figures were completely consistent with the past.)

As a Yankees fan (hey, I grew up in Northern Jersey, got a problem with that?), it saddens me a little bit to see Jeter rank as one of the worst defensive shortstops ever. In case you’re wondering, Derek Jeter had another below-average season in 2004. But, thanks to A-Rod’s presence at third base, his fielding runs total was –13.4097. I guess you could consider his Gold Glove Award in 2004 to be fair if it were awarded for the biggest improvement.

Distribution and box plot.

The vast majority of player seasons are clustered around zero, so we plot the distribution with a little more resolution than usual:

	> hist(subset(FR,yearID>1994), breaks=100)

As you can see in Figure 5-15, this is a very tightly clustered distribution, with almost all fielders breaking even and only a few giving away or saving more than a couple of runs through their fielding ability. This is not surprising; the statistic is designed to have an average of zero, and part-time players can’t do much better or worse than zero.

Figure 5-15. Fielding runs distribution and box plot

Finally, let’s look at how fielding runs have changed over time:

	> fielding_w_fr$decade <- as.factor(floor(fielding_w_fr$yearID/10)*10)
	> boxplot(FR~decade,data=fielding_w_fr,subset=yearID>1910,
	+         pars=c(xlab="decade", ylab="Fielding Runs"),
	+         range=0)

As this plot shows, there is a huge difference in fielding runs between the best and worst players over the past decade.
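The same comparison can be checked numerically. With range=0, the whiskers of an R box plot extend to the most extreme values, so the spread in each box is simply the minimum and maximum FR for that decade. Here is a rough Python sketch of the decade bucketing; the function names and sample values are made up and are not taken from the data set.

```python
from collections import defaultdict


def decade_of(year):
    """1987 -> 1980, matching floor(yearID/10)*10 in the R code."""
    return (year // 10) * 10


def fr_range_by_decade(seasons):
    """seasons: iterable of (yearID, FR); FR may be None where it is missing."""
    by_decade = defaultdict(list)
    for year, fr in seasons:
        if fr is not None:
            by_decade[decade_of(year)].append(fr)
    return {d: (min(v), max(v)) for d, v in sorted(by_decade.items())}


# Made-up player seasons, for illustration only.
sample = [(1925, 3.1), (1928, -4.0), (1987, 10.5),
          (2003, 30.0), (2001, -45.0), (2004, None)]
print(fr_range_by_decade(sample))
# {1920: (-4.0, 3.1), 1980: (10.5, 10.5), 2000: (-45.0, 30.0)}
```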
