Hacks 40–59: Introduction

This chapter is a guide to some popular formulas for understanding baseball: common statistics that you remember from childhood and see displayed on the scoreboard during every game, more complicated statistics that illustrate the complexity of the game more clearly, and some advanced statistics developed by sabermetricians over the past 30 years.

The hacks in this chapter are a little different from the others in the book: each hack presents a single formula (or a set of formulas) as an equation (or a set of equations). I explain what each formula is designed to measure, and I present summary statistics for each formula that show the distribution of values by team and player over the past decade. I also give samples of a few exceptional seasons (by teams and/or players).

How I Chose the Formulas

I selected the hacks in this chapter by picking the simplest, cleverest, and most useful formulas I could find. A few of the best systems for evaluating players, such as Baseball Prospectus’s PECOTA, Bill James’s Win Shares, and TangoTiger’s Base Runs, work well in practice but are too difficult to be called hacks. I was looking for simple, clever tricks that would be easy to understand.

Summary Statistics for the Formulas

In this chapter, each hack gives you summary statistics that provide you with some idea of what good, bad, and average values are for each statistic. This is a trick that I learned at work while analyzing data on network security, advertising effectiveness, and credit card fraud. When a statistician (or an econometrician) first gets a data set, he calculates the summary statistics. These statistics are designed to give a feel for the data: what are the highest and lowest values, how are the values distributed, and which values are most common.

I didn’t have a good feeling for what good or bad values meant for many different formulas, so I calculated summary statistics for each formula. Most readers probably know that a batting average of .300 is a good number, .260 is about average, and .200 is poor. But what is a good OPS, WHIP, or ISO?

I provide six distribution statistics for each formula:

Minimum

The smallest value seen.

25th percentile

The value at the 25th percentile (25% of values are smaller, 75% are larger).

Median

The value in the middle. Half of the values are smaller, half are larger.

Mean

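The average value (the sum of all values divided by the number of values). This is not exactly the same as the median. You can tell something about the shape of the distribution by comparing the median and the mean: if the mean is well above the median, for example, a few unusually high values are pulling the average up.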

75th percentile

The value at the 75th percentile (75% of values are smaller, 25% are larger).

Maximum

The highest value seen.
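The six statistics above can be computed in a few lines. The following is a minimal Python sketch, not the method used to produce the tables in this chapter. The function name `summarize` and the sample batting averages are made up for illustration, and the choice of linear interpolation between data points for the percentiles is an assumption (other percentile conventions give slightly different values on small data sets).

```python
import statistics

def summarize(values):
    """Return the six distribution statistics for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points. The
    # "inclusive" method interpolates linearly between data points,
    # treating the smallest and largest values as the 0th and 100th
    # percentiles (an assumption made for this sketch).
    p25, median, p75 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "p25": p25,
        "median": median,
        "mean": statistics.fmean(ordered),
        "p75": p75,
        "max": ordered[-1],
    }

# Hypothetical batting averages, invented for illustration only.
sample_avgs = [0.200, 0.245, 0.260, 0.262, 0.275, 0.300, 0.331]
for name, value in summarize(sample_avgs).items():
    print(f"{name:>6}: {value:.3f}")
```

Comparing the `median` and `mean` entries in the result gives the quick check on the shape of the distribution described above.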

On the team level, I don’t eliminate any players, regardless of their number of appearances. However, when analyzing single players, I often eliminate players who played in only a handful of games or had a handful of at bats. Many players play in only a few major league games (often during September, when the roster expands from 25 to 40 players). I exclude these players because including them would throw off the distribution of each measurement enormously and would not accurately reflect the level of talent in the game. Many measurements would have minima (and maybe even medians) at zero without this refinement. For more on how I thought about this problem, see “Significant Number of At Bats” [Hack #63].
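To see why the cutoff matters, here is a small Python sketch with invented players. The threshold of 100 at bats (`MIN_AT_BATS`) is a hypothetical value chosen only for this example, not the cutoff used in this book, and the player records and helper names are likewise made up.

```python
import statistics

# Hypothetical cutoff, chosen only for this example.
MIN_AT_BATS = 100

def batting_average(hits, at_bats):
    """Hits divided by at bats; 0.0 for a player with no at bats."""
    return hits / at_bats if at_bats else 0.0

def qualified(players, min_at_bats=MIN_AT_BATS):
    """Keep only players with at least min_at_bats at bats."""
    return [p for p in players if p["ab"] >= min_at_bats]

# Invented season lines: three regulars and two September call-ups.
players = [
    {"name": "Regular A", "h": 150, "ab": 500},
    {"name": "Regular B", "h": 130, "ab": 500},
    {"name": "Regular C", "h": 110, "ab": 440},
    {"name": "Call-up D", "h": 0, "ab": 4},
    {"name": "Call-up E", "h": 0, "ab": 2},
]

all_avgs = [batting_average(p["h"], p["ab"]) for p in players]
kept_avgs = [batting_average(p["h"], p["ab"]) for p in qualified(players)]

print("all players:", min(all_avgs), statistics.median(all_avgs))
print("qualified:  ", min(kept_avgs), statistics.median(kept_avgs))
```

With the two hitless call-ups included, the minimum falls to zero and the median is dragged down. With them filtered out, both numbers describe the regular players.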

Using Formulas for Fantasy Baseball

If you’re an owner in a Rotisserie league, you’ll probably read this chapter looking for ideas on how to outdraft and outmanage your opponents. Most of the formulas in this chapter try to do several things:

Properly value a player’s total contribution

Some measures capture only one part of a player’s contributions and ignore others. For example, batting average doesn’t include walks. Some players, such as Garret Anderson, have high batting averages (a .298 career AVG) but rarely walk (a .327 career OBP), so they create a lot of outs.

Isolate a single player’s contribution

Traditional measurements of performance often mix the contributions of several players. For example, runs batted in are as much a function of a player’s place in the batting order (and of how often the hitters ahead of him reach base) as of his hitting ability.

Separate luck from skill

Some measures capture things that a player can’t really control. For example, the winning pitcher in a game is (usually) the pitcher of record at the time his team takes the lead. Obviously, pitchers don’t directly control how their team’s batters fare against the opposing team’s pitchers. A pitcher who gives up a lot of runs is not likely to win a lot of games, but a pitcher who gives up only a few runs can still lose a lot of games. (For example, during the 2005 season, Cleveland Indians starting pitcher Kevin Millwood won 9 games and lost 11 but had an incredible 2.86 ERA.)
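To make the first point in the list above concrete, the following Python sketch compares two hypothetical hitters with identical batting averages, using the standard definitions of AVG (hits divided by at bats) and OBP (times on base divided by at bats plus walks, hit by pitches, and sacrifice flies). The season lines and the function names are invented for illustration.

```python
def batting_average(h, ab):
    """AVG = H / AB."""
    return h / ab

def on_base_percentage(h, ab, bb, hbp, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

# Two invented season lines: same hits and at bats, very different walks.
free_swinger = {"h": 180, "ab": 600, "bb": 20, "hbp": 0, "sf": 5}
patient_hitter = {"h": 180, "ab": 600, "bb": 90, "hbp": 0, "sf": 5}

for label, line in [("free swinger", free_swinger),
                    ("patient hitter", patient_hitter)]:
    avg = batting_average(line["h"], line["ab"])
    obp = on_base_percentage(**line)
    print(f"{label:>14}: AVG {avg:.3f}  OBP {obp:.3f}")
```

Both hitters have the same AVG, but the one who walks more reaches base far more often and so makes fewer outs, which batting average alone cannot show.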

Not all of these improvements will help fantasy owners. Let’s start by considering whether it helps to measure a player’s total contribution properly. Most fantasy leagues value players using simple measures such as batting average, home runs, runs batted in, earned run average, and strikeouts. So properly measuring a player’s total contribution might not matter: if the league doesn’t care about walks, it doesn’t make sense to consider walks when choosing players.

Isolating a single player’s contributions might be important if you know the players surrounding him have changed. For example, if you know a ground ball pitcher has moved to a team with a better defensive infield, you can use this information to better evaluate that pitcher.

Finally, let’s consider measurements that separate luck and skill. These are very important for fantasy use because they help show undervalued and overvalued players. (In particular, look at some pitching statistics, such as DIPS and CERA.)

Who Came Up with These Things?

As much as I’d like to take credit for these formulas, most of them come from the hard work of other people. I just explain what each formula does, show you how to calculate it easily, and give some distribution statistics. I’ve tried to give proper credit to the inventor of each measurement.
