How many wins can you attribute to skill, and how many to luck, in the course of a baseball season?
Baseball is a game of luck and skill, and, fortunately, we can measure both quite accurately. The amount of luck involved is probably higher than most people think, and it’s higher than the Lords of Baseball would like to admit.
First, let’s look at team performance over a season. Take 16 evenly balanced teams that have an equal likelihood of winning or losing any game. The actual results should show a normal distribution with a mean of .500, with some variation. A common measure of variation is the standard deviation, called sigma, which you can calculate by summing the squares of the differences between each team’s actual wins and the expected value (81 wins in a 162-game season), dividing by the number of teams, and taking the square root. For a simple example, take four teams with wins of 89, 84, 77, and 73. The mean is 81 and the sum of the squares is 64 + 16 + 16 + 64, or 160. Divide 160 by the number of teams (4) and take the square root, and you get about 6.3.
Calculating the expected sigma for team wins involves a formula derived for a binomial distribution. The formula is the square root of p x q x n, where p is the probability of success (.5), q is the probability of failure (also .5), and n is the number of samples (162). This gives a value of the square root of .25 x 162 (40.5), or 6.36.
Another characteristic of a normal distribution is that approximately two-thirds of the values fall within 1 sigma of the mean, about 5% fall more than 2 sigma away, and only about 1 in 400 strays more than 3 sigma from the mean. So over the course of a season of evenly matched teams, about 1 team in 20 (2 sigma, or roughly 12.7 wins from the mean of 81) might be expected to have a win total of either less than 69 or greater than 93.
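Those tail fractions follow directly from the normal distribution; a quick check in Python using the complementary error function:

```python
import math

def beyond(k):
    """Two-tailed probability of landing more than k sigma from the mean."""
    return math.erfc(k / math.sqrt(2))

print(round(beyond(1), 3))    # about one-third beyond 1 sigma
print(round(beyond(2), 3))    # about 1 in 20 beyond 2 sigma
print(round(1 / beyond(3)))   # roughly 1 in 370-400 beyond 3 sigma
```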
Now, we can measure exactly the actual standard deviation of team wins for any season or seasons, and we know exactly what the expected variation due to chance or luck should be. Because there is no reason to suspect that the two are correlated, we can say with confidence that the variation due to skill is equal to the square root of (the total variation squared minus the variation due to luck squared).
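In code, this decomposition is a one-liner; a Python sketch:

```python
import math

def skill_sigma(total, luck):
    """Variation due to skill, assuming skill and luck are uncorrelated."""
    return math.sqrt(total ** 2 - luck ** 2)

# e.g., a total sigma of 12 wins with a luck sigma of 6.36
print(round(skill_sigma(12, 6.36), 1))  # 10.2
```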
Let’s look at some numbers. Here is the standard deviation in wins per season for various periods:
period       total   luck   skill
1901-1920     15.0    6.1    13.8
1921-1940     14.3    6.2    12.8
1941-1960     13.6    6.2    12.0
1961-1980     12.0    6.4    10.0
1981-1997      9.8    6.3     7.5
1998-2004     12.7    6.4    10.9
For 1998–2004, 6.4² + 10.9² ≈ 12.7². As you can see, the spread in team skill shrank in the 1980s until it was almost equal to the spread due to luck over an entire season, but recently it has returned to historical levels. Of the top 13 variations by league since 1981, 10 have occurred in the 14 league-seasons from 1998 to 2004, and only 3 in the 34 league-seasons from 1981 to 1997. This variation might be due to a wider spread in team salaries. Excluding expansion years, when variation understandably shoots up, the count is still 8 to 2 in favor of the current period.
An interesting question concerns the probability of the best team coming in first over the course of a season. Unfortunately, that question cannot be answered exactly except for a specific set of preseason win probabilities for all teams. You can calculate exactly the chance of one team beating another if you know each team’s win probability, but the chance that some other club finishes ahead of the top two depends on the league as a whole. Using the normal distribution, here are the values for one team finishing ahead of another. Because two teams are involved, the standard deviation of the expected difference in wins between the two teams is the square root of 2 times the one-team deviation, or about 9.
difference in expected wins     probability team A will
between team A and team B       finish ahead of team B
 0                              50%
 1                              54%
 2                              59%
 4                              67%
 6                              75%
 8                              81%
10                              86%
15                              95%
21                              99%
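The table can be reproduced from the normal CDF using the two-team sigma of 9; a Python sketch (the values agree with the table to within a point of rounding):

```python
import math

def prob_a_ahead(diff, sigma=9.0):
    """Probability team A finishes ahead of team B, given that A's
    expected win total exceeds B's by `diff` and the two-team sigma is ~9."""
    return 0.5 * (1 + math.erf(diff / (sigma * math.sqrt(2))))

for d in (0, 1, 2, 4, 6, 8, 10, 15, 21):
    print(d, round(100 * prob_a_ahead(d)))
```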
The best way to determine the probability of the best team coming in first in a season is to set the preseason win probabilities and run a simulation. To do this, you need to know the probability of any one team beating any other team.
Because team win percentages are almost always between .400 and .600, a reasonable rule of thumb is that the probability of Team A beating Team B in a given game is win percent (a) minus win percent (b) plus .500. If you want to add in a home team advantage, a typical value is to add .040 for the home team, based on actual home winning percentage data for 1961 through today.
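As a sketch of that rule of thumb in Python (the team percentages in the example are made-up values, not data from any season):

```python
def p_beats(pct_a, pct_b, a_home=False):
    """Rule-of-thumb probability that team A beats team B in one game:
    win percent (a) - win percent (b) + .500, plus .040 if A is at home."""
    p = pct_a - pct_b + 0.500
    if a_home:
        p += 0.040
    return p

print(round(p_beats(0.560, 0.480), 3))               # 0.58
print(round(p_beats(0.560, 0.480, a_home=True), 3))  # 0.62
```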
However, if you use each team’s actual winning percentage for the season being studied, the simulation will show too much variation in wins, because actual winning percentages already include a luck component that spreads teams farther from the mean. For example, in 1939 the Yankees set an all-time record for runs scored minus runs allowed (even without Lou Gehrig) and coasted to their fourth pennant in a row. Using actual team winning percentages gives a sigma of about 24 wins for the distribution of all team wins in a simulation of 1,000 seasons, where the actual number was only 19. To get a sigma of 19 in the simulation, you have to use only 80% of each team’s difference in percentage from .500. So, a .600 team would be simulated at .580, and a .400 team would be simulated at .420. In general, the fraction of the actual difference from .500 to keep is sigma-skill divided by sigma-total from the first table. This is usually around 80%, except for 1981–1997, when it was closer to 70%.
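A minimal season simulation along these lines might look like the following Python sketch. The 80% shrinkage factor is the one described above; the every-team-plays-daily schedule and the example percentages are simplifications of my own, not a real schedule:

```python
import random

def shrink(pct, factor=0.80):
    """Regress a raw winning percentage toward .500 before simulating."""
    return 0.500 + factor * (pct - 0.500)

def simulate_season(raw_pcts, days=162, seed=1):
    """Play a season of random games between teams with shrunken percentages."""
    rng = random.Random(seed)
    true = [shrink(p) for p in raw_pcts]
    wins = [0] * len(true)
    for day in range(days):
        # crude schedule: pair teams at random each "day"
        order = list(range(len(true)))
        rng.shuffle(order)
        for i in range(0, len(order) - 1, 2):
            a, b = order[i], order[i + 1]
            p = true[a] - true[b] + 0.500   # rule-of-thumb game probability
            if rng.random() < p:
                wins[a] += 1
            else:
                wins[b] += 1
    return wins

print(simulate_season([0.600, 0.550, 0.500, 0.450, 0.400]))
```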
This raises an important point. Because a good amount of luck is involved even over a full season, teams tend to win 5 or 10 games more or fewer than their true talent would produce. Teams with high win totals therefore tend to have had good luck, and teams with low win totals bad luck. One way of looking at this is that a .600 team could be a .550 team with good luck or a .650 team with bad luck; because there are many more .550 teams than .650 teams, the odds are that it is the former.
You can show this even more dramatically when you look at team winning percentages from one season to the next. Here, the effect is intensified because the factors that brought a team to a successful season can deteriorate, and the expected change due to luck would tend to be neutral. Here are the results of season winning percentage based solely on the previous season:
            1981-2004                          1902-2004
  n    prev    curr    pct          n    prev    curr    pct
  5    .248    .374     50          9    .277    .384     52
  1    .265    .444     24         13    .301    .358     72
  3    .332    .473     16         36    .326    .402     56
 11    .347    .407     61         71    .349    .399     67
 17    .375    .459     33         74    .375    .419     64
 40    .399    .461     39        123    .400    .439     61
 52    .423    .462     48        144    .424    .445     73
 70    .449    .482     36        188    .449    .483     34
102    .473    .483     64        228    .474    .486     54
 84    .501    .499      -        225    .501    .505      -
 85    .527    .517     65        231    .526    .511     42
 77    .552    .524     46        212    .550    .534     67
 52    .576    .534     44        165    .574    .535     48
 38    .599    .535     35        152    .600    .566     66
 20    .626    .567     53         94    .625    .580     64
  6    .645    .574     51         36    .649    .591     61
  1    .667    .568     41         22    .674    .613     65
  2    .699    .610     55         10    .697    .621     61
  1    .716    .574     34          4    .719    .599     45
                                    2    .752    .677     70
The pct column shows the fraction of the previous season’s distance from .500 that was retained. For example, in the 1981–2004 period, teams from .545 to .555 averaged .552 the first year and .524 the second, and .024 divided by .052 is 46%. This shows that teams tend to fall back toward .500 in the following season. If you have only the previous year’s percentage to go on, your best estimate for the current season is about halfway between .500 and the previous year’s mark. In statistical terms, this is called regression to the mean. The effect in recent years has been a bit more drastic, though in a smaller sample.
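The pct column is just the ratio of the two distances from .500; in Python, for the example row:

```python
def retained(prev, curr):
    """Fraction of last season's distance from .500 retained this season."""
    return (curr - 0.500) / (prev - 0.500)

# teams averaging .552 one year and .524 the next
print(round(100 * retained(0.552, 0.524)))  # 46
```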
Of course, we do have more information: performance over past years. However, this does not change things very much. Multivariate linear regression is a tool that gives exact weights for each variable so as to minimize the prediction error over a given sample. The formula is current year = m1 x year1 + m2 x year2, and so on, plus a constant called the intercept. Here are the values for one-, two-, and three-year models when tested on the years 1981–2004:
          m1     m2     m3    intercept   correlation   sigma
1 year   .437                   .282         44%        .0624
2 year   .369   .147            .243         46%        .0618
3 year   .367   .142   .012     .240         46%        .0619
This shows that using the past two years results in a slight improvement over just one year, and that three years makes no difference. If you want to predict a team’s winning percentage for next year, take .369 times this year, plus .147 times last year, plus .243.
The 66 teams that played .550 or better for two years between 1981 and 2004 fared no better. Their average percentages for the three years were .598, .602, and .558, where the general formula would predict .369 x .602 + .147 x .598 + .243, or .553, less than one game different from the actual result and well within expected differences (1 sigma for a sample of 66 teams would be .0618 divided by the square root of 66, or about .008).
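Applying the two-year formula is a one-liner; a Python sketch using the coefficients from the table above (m1 applies to the most recent season):

```python
def predict_pct(this_year, last_year):
    """Two-year regression for next season's winning percentage."""
    return 0.369 * this_year + 0.147 * last_year + 0.243

# the .550-or-better group: .602 in the most recent year, .598 the year before
print(round(predict_pct(0.602, 0.598), 3))  # 0.553
```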
Some people say that runs scored and runs allowed is a better predictor of future team performance than wins and losses, which seems logical. However, at least for the years 1981–2004, the difference is negligible. Here we used runs less opponent runs per game for past performance, instead of team winning percentage:
          m1      m2      m3     intercept   correlation   sigma
1 year   .0512                     .500         47%        .0615
2 year   .0447   .0126             .501         48%        .0611
3 year   .0447   .0124   .0006     .500         48%        .0612
Because it takes about 10 runs to produce one win, the multipliers are about one-tenth of the values in the wins calculation. So, if a team outscored its opponents by 50, 100, and then 150 runs, that would be .309, .617, and .926 delta runs per game. The predicted winning percentage using the three-year model would be .549. However, the correlation increases from 46% to only 48%, and the standard deviation between predicted and actual percentage is reduced from .0619 to .0612, less than one percentage point, or about one-eighth of a win. Again, only two years of data are needed—even one year works pretty well. The one-year model would predict .547 (calculated as .0512 x .926 + .500).
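The runs-based models are just as easy to apply; a Python sketch using the coefficients from this table (again, m1 applies to the most recent season):

```python
def predict_from_runs_3yr(d1, d2, d3):
    """Three-year model on run differential per game (d1 = most recent)."""
    return 0.0447 * d1 + 0.0124 * d2 + 0.0006 * d3 + 0.500

def predict_from_runs_1yr(d1):
    """One-year model on run differential per game."""
    return 0.0512 * d1 + 0.500

# outscored opponents by 50, 100, then 150 runs over 162-game seasons
d3, d2, d1 = 50 / 162, 100 / 162, 150 / 162   # .309, .617, .926
print(round(predict_from_runs_3yr(d1, d2, d3), 3))  # 0.549
print(round(predict_from_runs_1yr(d1), 3))          # 0.547
```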
The actual sigma of team wins compared to the previous year (normalized to the 162-game schedule to account for strikes) was 11.7 for the years 1981–2004, where the difference due to chance was 9 (the square root of 2 times 6.36). Thus, the true difference between teams from year to year is the square root of 11.7² − 9², or 7.5. This says that the wins difference from year to year actually owes more to luck than to skill.
In the years 1981–2004, 2.5% of the 632 teams, or 16 teams, would be expected by chance to lose 18 or more games more than they did in the previous year. Thirty-five teams actually did, meaning that if a team’s loss total increases by 18 or more games from one season to the next, there is roughly a 50–50 chance it is just bad luck. Only 11 of the managers involved did not start the following year, which suggests, reasonably, that they are usually given another chance. However, in 2004, three of the four managers involved had already been fired, so maybe patience is wearing thin these days.
A good second-half performance is supposed to bode well for the following season. In reality, second-half performance does correlate slightly better with the following year than first-half performance does, but full-season performance is still better, mainly because it is a larger sample:
              m1     m2    intercept   correlation   sigma
1st half     .303              .348        34%       .0661
2nd half     .366              .317        43%       .0633
each half    .144   .283       .284        45%       .0622
full season  .437              .282        44%       .0624
Counting both halves separately gave an insignificant improvement over the season as a whole.
How closely should an expert be able to predict future performance? We know the error should be better than 11.7 wins, which is what you get by simply picking each team to win the same number of games as it did in the previous year. The simple two-season formula using wins or runs gets you down to 10.0 wins (.0618 x 162). Looking at Las Vegas over/under win totals for the seasons 1996 to 2004, we get an error of 9.4 wins. These predictions factor in trades, free agent gains and losses, potential rookies, players who retired or might be coming back from injury, and the aging of current players, which is quite a bit more information than just the team record for the past two years. Because overall predictability in that period was slightly better than average (11.5 wins of error from the previous-year method instead of 11.7), the comparable Vegas figure over the full 1981–2004 period would probably be a bit higher than 9.4. Anyway, a prediction off by 11 wins is pretty bad, 10 is fair, and anything close to 9 is super.
I write my programs in BASIC and I use my own datafiles, which I have accumulated over the years from official sources. I originally made my team file on punch cards, so my datafiles are text-file card images. I used these to create Total Baseball editions 1 through 7 (Total Sports); a newly created database from the same sources was used for The 2005 ESPN Baseball Encyclopedia by Pete Palmer and Gary Gillette (Sterling). (You can use the team statistics from the Baseball DataBank or Baseball Archive databases described in “Get a MySQL Database of Player and Team Statistics” [Hack #10] .)
For each year, set SSQ and N to zero, then read in each team’s wins and losses, setting SSQ = SSQ + ((W - L) / 2) ** 2 and N = N + 1. If a team wins 91 games and loses 71, W - L is 20, so the team is 10 games above the average of 81 wins. Then I found the total standard deviation for the year by taking SQRT(SSQ / N). Say SSQ equals 2,016 and N equals 14, for example. Then the total sigma is SQRT(2,016 / 14), or 12. The standard deviation expected by chance from the binomial distribution is SQRT(P x Q x N), where P is the probability of success (set to .5), Q is the probability of failure (set to 1 - P, or also .5), and N is the number of games (usually 162, not the number of teams this time). So, this would be SQRT(40.5), or 6.36. The standard deviation due to skill would be SQRT(sigma(total) ** 2 - sigma(chance) ** 2), or SQRT(144 - 40.5), or 10.2.
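Here is the same bookkeeping as a Python sketch (my own programs are in BASIC; the function takes the accumulated SSQ directly):

```python
import math

def season_sigmas(ssq, n_teams, games=162):
    """ssq is the accumulated sum of ((W - L) / 2) ** 2 over all teams."""
    total = math.sqrt(ssq / n_teams)           # observed sigma
    luck = math.sqrt(0.5 * 0.5 * games)        # binomial sigma, ~6.36
    skill = math.sqrt(max(total ** 2 - luck ** 2, 0.0))
    return total, luck, skill

# the worked example: SSQ = 2,016 over 14 teams
total, luck, skill = season_sigmas(2016, 14)
print(round(total, 1), round(luck, 2), round(skill, 1))  # 12.0 6.36 10.2
```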
I have a FORTRAN multivariate linear regression program I copied out of an old IBM manual in the 1970s. It works fine, but it is pretty complicated and runs about 300 lines, including a frontend that allows using variable-size arrays. Many sites on the Web have statistical utility programs that you can use.
I will list my program for reference:
      PROGRAM PALMR
C CHANGED TO READ IN DATA DIRECTLY WITHOUT MANIPULATION
      DIMENSION RX1(2500),R(2500),RX(2500)
      CHARACTER*3 LN(50)
      CHARACTER*12 OFILE
      INTEGER*2 Y [HUGE] (1500,50)
      REAL*4 X [HUGE] (75000)
      DATA Y /75000*0/
      OPEN(UNIT=5,FILE='REGRESS.III',STATUS='OLD')
C READ IN OUTPUT FILE
      READ(5,102) OFILE
  102 FORMAT(A12)
      OPEN(UNIT=6,FILE=OFILE)
      WRITE(6,*) OFILE
C READ IN NUMBER OF SAMPLES, NO OF VARIABLES
      READ(5,104) N,M
  104 FORMAT(50I5)
C READ IN VARIABLES NAMES
      READ(5,105) (LN(I),I=1,M)
  105 FORMAT(20(2X,A3))
C READ IN DATA (1500 SAMPLES MAX)
      DO 2000 N=1,1500
      READ(5,104,END=3000) (Y(N,K),K=1,M)
 2000 CONTINUE
 3000 CONTINUE
      N=N-1
      WRITE(0,104) M,N
      DO 9500 K=1,M
      DO 9500 J=1,N
      X(J+N*(K-1))=Y(J,K)
 9500 CONTINUE
      WRITE(0,104) 99,99
      CALL PALM1(M,N,X,RX1,R,RX,LN)
      CLOSE(UNIT=5)
      CLOSE(UNIT=6,STATUS='KEEP')
      STOP
      END
      SUBROUTINE PALM1(M,N,X[HUGE],RX1,R,RX,LN)
      DIMENSION XBAR(50),STD(50),D(50),B(50),T(50),ISAVE(50),
     X RY(50),L(50),MM(50),SB(50),ANS(10),
     *RX1(1),R(1),RX(1),X(1)
      CHARACTER*3 LN(1)
      CHARACTER*12 ANN(10)
      DATA ANN /
     X12HINTERCEPT   ,
     X12HCORRELATION ,
     X12H            ,
     X12H            ,
     X12HDEP VAR     ,
     X12H            ,
     X12H            ,
     X12HDEG/FREEDOM ,
     X12H            ,
     X12H            /
      IO=1
      CALL CORRE(N,M,IO,X,XBAR,STD,RX1,R,D,B,T)
      NDEP=M
      K=M-1
      DO 130 I=1,K
  130 ISAVE(I)=I
      CALL ORDER(M,R,NDEP,K,ISAVE,RX,RY)
      WRITE(6,*) 'CROSS CORRELATION'
      WRITE(6,12) (LN(I),I=1,M-1)
   12 FORMAT(1H ,10(5X,A3))
      I1=1
      DO 132 I=1,K
      I2=I*K
      WRITE(6,15) (RX(J),J=I1,I2)
      I1=I2+1
  132 CONTINUE
      WRITE(6,13) ' CORRELATION WITH ',LN(M)
   13 FORMAT(A18,A3)
      WRITE(6,12) (LN(I),I=1,M-1)
      WRITE(6,15) (RY(I),I=1,K)
   15 FORMAT(1H ,10F8.5)
      CALL MINV(RX,K,DD,L,MM)
      IF(DD)140,135,140
  135 WRITE(6,20)
   20 FORMAT(1H ,3HD=0)
      RETURN
  140 CALL MULTR(N,K,XBAR,STD,D,RX,RY,ISAVE,B,SB,T,ANS)
      WRITE(6,*)' N ITEM M SIGMA T'
      WRITE(6,30)(I,LN(I),B(I),SB(I),T(I),I=1,K)
   30 FORMAT(1H ,I3,3X,A5,F10.5,F10.5,F10.5)
      WRITE(6,*)' N ITEM MEAN SIGMA'
      WRITE(6,31)(I,LN(I),XBAR(I),STD(I),I=1,M)
   31 FORMAT(1H ,I3,3X,A5,F20.5,F20.5)
      WRITE(6,32)(ANN(K),ANS(K),K=1,10)
   32 FORMAT(1H ,A12,F20.10)
      RETURN
      END
      SUBROUTINE CORRE(N,M,IO,X[HUGE],XBAR,STD,RX,R,B,D,T)
      DIMENSION X(1),XBAR(1),STD(1),RX(1),R(1),B(1),D(1),T(1)
      DO 100 J=1,M
      B(J)=0.0
  100 T(J)=0.0
      K=(M*M+M)/2
      DO 102 I=1,K
  102 R(I)=0.0
      FN=N
      L=0
      IF(IO)105,127,105
  105 DO 108 J=1,M
      DO 107 I=1,N
      L=L+1
  107 T(J)=T(J)+X(L)
      XBAR(J)=T(J)
  108 T(J)=T(J)/FN
      DO 115 I=1,N
      JK=0
      L=I-N
      DO 110 J=1,M
      L=L+N
      D(J)=X(L)-T(J)
  110 B(J)=B(J)+D(J)
      DO 115 J=1,M
      DO 115 K=1,J
      JK=JK+1
  115 R(JK)=R(JK)+D(J)*D(K)
      GO TO 205
  127 GO TO 205
  205 JK=0
      DO 210 J=1,M
      XBAR(J)=XBAR(J)/FN
      DO 210 K=1,J
      JK=JK+1
  210 R(JK)=R(JK)-B(J)*B(K)/FN
      JK=0
      DO 220 J=1,M
      JK=JK+J
  220 STD(J)=SQRT(ABS(R(JK)))
      DO 230 J=1,M
      DO 230 K=J,M
      JK=J+(K*K-K)/2
      L=M*(J-1)+K
      RX(L)=R(JK)
      L=M*(K-1)+J
      RX(L)=R(JK)
      IF(STD(J)*STD(K))225,222,225
  222 R(JK)=0.0
      GO TO 230
  225 R(JK)=R(JK)/(STD(J)*STD(K))
  230 CONTINUE
      FN=SQRT(FN-1.0)
      DO 240 J=1,M
  240 STD(J)=STD(J)/FN
      L=-M
      DO 250 I=1,M
      L=L+M+1
  250 B(I)=RX(L)
      RETURN
      END
      SUBROUTINE ORDER (M,R,NDEP,K,ISAVE,RX,RY)
      DIMENSION R(1),ISAVE(1),RX(1),RY(1)
      MM=0
      DO 130 J=1,K
      L2=ISAVE(J)
      IF(NDEP-L2)122,123,123
  122 L=NDEP+(L2*L2-L2)/2
      GO TO 125
  123 L=L2+(NDEP*NDEP-NDEP)/2
  125 RY(J)=R(L)
      DO 130 I=1,K
      L1=ISAVE(I)
      IF(L1-L2)127,128,128
  127 L=L1+(L2*L2-L2)/2
      GO TO 129
  128 L=L2+(L1*L1-L1)/2
  129 MM=MM+1
  130 RX(MM)=R(L)
      ISAVE(K+1)=NDEP
      RETURN
      END
      SUBROUTINE MINV(A,N,D,L,M)
      DIMENSION A(1),L(1),M(1)
      D=1.0
      NK=-N
      DO 80 K=1,N
      NK=NK+N
      L(K)=K
      M(K)=K
      KK=NK+K
      BIGA=A(KK)
      DO 20 J=K,N
      IZ=N*(J-1)
      DO 20 I=K,N
      IJ=IZ+I
      IF(ABS(BIGA)-ABS(A(IJ)))15,20,20
   15 BIGA=A(IJ)
      L(K)=I
      M(K)=J
   20 CONTINUE
      J=L(K)
      IF(J-K)35,35,25
   25 KI=K-N
      DO 30 I=1,N
      KI=KI+N
      HOLD=-A(KI)
      JI=KI-K+J
      A(KI)=A(JI)
   30 A(JI)=HOLD
   35 I=M(K)
      IF(I-K)45,45,38
   38 JP=N*(I-1)
      DO 40 J=1,N
      JK=NK+J
      JI=JP+J
      HOLD=-A(JK)
      A(JK)=A(JI)
   40 A(JI)=HOLD
   45 IF(BIGA)48,46,48
   46 D=0.0
      RETURN
   48 DO 55 I=1,N
      IF(I-K)50,55,50
   50 IK=NK+I
      A(IK)=A(IK)/(-BIGA)
   55 CONTINUE
      DO 65 I=1,N
      IK=NK+I
      HOLD=A(IK)
      IJ=I-N
      DO 65 J=1,N
      IJ=IJ+N
      IF(I-K)60,65,60
   60 IF(J-K)62,65,62
   62 KJ=IJ-I+K
      A(IJ)=HOLD*A(KJ)+A(IJ)
   65 CONTINUE
      KJ=K-N
      DO 75 J=1,N
      KJ=KJ+N
      IF(J-K)70,75,70
   70 A(KJ)=A(KJ)/BIGA
   75 CONTINUE
      D=D*BIGA
      A(KK)=1.0/BIGA
   80 CONTINUE
      K=N
  100 K=(K-1)
      IF(K)150,150,105
  105 I=L(K)
      IF(I-K)120,120,108
  108 JQ=N*(K-1)
      JR=N*(I-1)
      DO 110 J=1,N
      JK=JQ+J
      HOLD=A(JK)
      JI=JR+J
      A(JK)=-A(JI)
  110 A(JI)=HOLD
  120 J=M(K)
      IF(J-K)100,100,125
  125 KI=K-N
      DO 130 I=1,N
      KI=KI+N
      HOLD=A(KI)
      JI=KI-K+J
      A(KI)=-A(JI)
  130 A(JI)=HOLD
      GO TO 100
  150 RETURN
      END
      SUBROUTINE MULTR(N,K,XBAR,STD,D,RX,RY,ISAVE,B,SB,T,ANS)
      DIMENSION XBAR(1),STD(1),D(1),RX(1),RY(1),ISAVE(1),
     *B(1),SB(1),T(1),ANS(1)
      MM=K+1
      DO 100 J=1,K
  100 B(J)=0.0
      DO 110 J=1,K
      L1=K*(J-1)
      DO 110 I=1,K
      L=L1+I
  110 B(J)=B(J)+RY(I)*RX(L)
      RM=0.0
      BO=0.0
      L1=ISAVE(MM)
      DO 120 I=1,K
      RM=RM+B(I)*RY(I)
      L=ISAVE(I)
      B(I)=B(I)*(STD(L1)/STD(L))
  120 BO=BO+B(I)*XBAR(L)
      BO=XBAR(L1)-BO
      SSAR=RM*D(L1)
      RM=SQRT(ABS(RM))
      SSDR=D(L1)-SSAR
      FN=N-K-1
      SY=SSDR/FN
      DO 130 J=1,K
      L1=K*(J-1)+J
      L=ISAVE(J)
      SB(J)=SQRT(ABS((RX(L1)/D(L))*SY))
  130 T(J)=B(J)/SB(J)
      SY=SQRT(ABS(SY))
      FK=K
      SSARM=SSAR/FK
      SSDRM=SSDR/FN
      F=SSARM/SSDRM
      ANS(1)=BO
      ANS(2)=RM
      ANS(3)=SY
      ANS(4)=SSAR
      ANS(5)=FK
      ANS(6)=SSARM
      ANS(7)=SSDR
      ANS(8)=FN
      ANS(9)=SSDRM
      ANS(10)=F
      RETURN
      END
—Pete Palmer