Measure Skill Versus Luck

How many wins can you attribute to skill, and how many to luck, in the course of a baseball season?

Baseball is a game of luck and skill, and, fortunately, we can measure both quite accurately. The amount of luck involved is probably higher than most people think, and it’s higher than the Lords of Baseball would like to admit.

First, let’s look at team performance over a season. Take 16 evenly balanced teams that have an equal likelihood of winning or losing any game. The actual results should show a normal distribution with a mean of .500, with some variation. A common measure of variation is the standard deviation, called sigma, which you can calculate by averaging the squared differences between each team’s actual wins and the expected value (81 wins in a 162-game season) and then taking the square root. For a simple example, take four teams with wins of 89, 85, 77, and 73. The mean is 81 and the sum of the squares is 64 + 16 + 16 + 64, or 160. Divide 160 by the number of teams (4) and take the square root, and you get about 6.3.

Calculating the expected sigma for team wins involves a formula derived from the binomial distribution. The formula is the square root of p x q x n, where p is the probability of success (.5), q is the probability of failure (also .5), and n is the number of games (162). This gives the square root of .25 x 162 (40.5), or 6.36.
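
Here is a minimal Python sketch of both calculations, using the hypothetical four-team example above (the variable names are mine, chosen for illustration):

	import math

	# Hypothetical four-team example: win totals in a 162-game season
	wins = [89, 85, 77, 73]
	mean = 81                      # expected wins for a .500 team

	# Observed sigma: square root of the average squared deviation from the mean
	total_sigma = math.sqrt(sum((w - mean) ** 2 for w in wins) / len(wins))
	print(f"{total_sigma:.2f}")    # 6.32

	# Sigma expected from luck alone, from the binomial distribution
	p, q, n = 0.5, 0.5, 162
	luck_sigma = math.sqrt(p * q * n)
	print(f"{luck_sigma:.2f}")     # 6.36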

Another characteristic of sigma in a normal distribution is that approximately two-thirds of the values differ from the mean by less than one sigma, about 5% differ by more than two sigma, and only about 1 in 400 strays more than three sigma from the mean. Over the course of a season, then, 1 team in 20 might be expected by chance to have a win total of either less than 69 or greater than 93.

Now, we can measure exactly the actual standard deviation of team wins for any season or seasons, and we know exactly what the expected variation due to chance or luck should be. Because there is no reason to suspect that the two are correlated, we can say with confidence that the variation due to skill equals the square root of (total variation squared minus luck variation squared).

Let’s look at some numbers. Here is the standard deviation in wins per season for various periods:

	period         total     luck      skill
	1901-1920       15.0      6.1       13.8
	1921-1940       14.3      6.2       12.8
	1941-1960       13.6      6.2       12.0
	1961-1980       12.0      6.4       10.0
	1981-1997        9.8      6.3        7.5
	1998-2004       12.7      6.4       10.9

For 1998–2004, 6.4² + 10.9² ≈ 12.7². As you can see, the spread in team skill over a full season shrank in the 1980s to nearly the same size as the spread due to luck, but recently it has gone back to historical levels. Of the top 13 league-season variations since 1981, 10 occurred in the 14 league seasons from 1998 to 2004, and only 3 in the 34 league seasons from 1981 to 1997. This variation might be due to a wider spread in team salaries. Excluding expansion years, when variation understandably shoots up, the count is still 8 to 2 in favor of the current period.
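
Because the luck and skill components are assumed to be uncorrelated, their variances simply add, so the skill sigma falls out directly. Here is a short Python sketch using the 1998–2004 row of the table (the function name is mine, for illustration):

	import math

	def skill_sigma(total_sigma, luck_sigma=math.sqrt(0.25 * 162)):
	    """Variation due to skill, assuming luck and skill are uncorrelated."""
	    return math.sqrt(total_sigma ** 2 - luck_sigma ** 2)

	# 1998-2004: total sigma of 12.7 wins, luck sigma of about 6.4
	print(f"{skill_sigma(12.7, 6.4):.1f}")   # 11.0 (the table shows 10.9, using unrounded inputs)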

An interesting question concerns the probability of the best team coming in first over the course of a season. Unfortunately, that question cannot be answered exactly except for a specific definition of preseason win probabilities for all teams. You can calculate exactly the chances of one team finishing ahead of another if you know the win probabilities of each, but even for the top two teams, the chance of some other team beating them both depends on the league as a whole. Using the normal distribution, here are the values for one team finishing ahead of another (a short sketch that reproduces the table follows it). Because two teams are involved, the standard deviation of the expected difference in wins between the two teams is the square root of 2 times the one-team deviation, or about 9.

	difference in expected wins       probability team a will
	between team a and team b          finish ahead of team b
	      0                              50%
	      1                              54%
	      2                              59%
	      4                              67%
	      6                              75%
	      8                              81%
	     10                              86%
	     15                              95%
	     21                              99%
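
You can reproduce the table, give or take a point from rounding, with the cumulative distribution function of the normal distribution. Here is a small Python sketch (the helper name is mine; it treats the actual win difference as normally distributed around the expected difference with sigma 9 and ignores ties):

	from statistics import NormalDist

	SIGMA_DIFF = 9.0   # sqrt(2) times the one-team sigma of 6.36

	def prob_a_ahead(expected_win_diff, sigma=SIGMA_DIFF):
	    """Probability that team A finishes ahead of team B, given how many
	    more wins A is expected to have."""
	    return NormalDist().cdf(expected_win_diff / sigma)

	for d in (0, 1, 2, 4, 6, 8, 10, 15, 21):
	    print(d, f"{prob_a_ahead(d):.0%}")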

The best way to determine the probability of the best team coming in first in a season is to set the preseason win probabilities and run a simulation. To do this, you need to know the probability of any one team beating any other team.

Because team win percentages are almost always between .400 and .600, a reasonable rule of thumb is that the probability of Team A beating Team B in a given game is win percent (a) minus win percent (b) plus .500. If you want to include a home-field advantage, a typical adjustment is to add .040 for the home team, based on actual home winning percentages from 1961 through today.
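
As a quick sketch, the rule of thumb looks like this in Python (the function name and the clamping to the 0–1 range are my own additions):

	def head_to_head_prob(pct_a, pct_b, a_is_home=False, home_edge=0.040):
	    """Rule-of-thumb chance that Team A beats Team B in one game:
	    win percent (a) - win percent (b) + .500, plus about .040 if A is at home."""
	    p = pct_a - pct_b + 0.500
	    if a_is_home:
	        p += home_edge
	    return min(max(p, 0.0), 1.0)

	print(f"{head_to_head_prob(0.580, 0.460, a_is_home=True):.3f}")   # 0.660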

However, if you use each team’s actual winning percentage for the season being studied, the simulation will show too much variation in wins, because the actual percentages already include a luck component that spreads the teams further from the mean. For example, in 1939 the Yankees set an all-time record for runs scored minus runs allowed (even without Lou Gehrig) and coasted to their fourth pennant in a row. Using actual team winning percentages gives a sigma of about 24 wins for the distribution of team wins in a simulation of 1,000 seasons, where the actual figure for that season was only 19. To get a sigma of 19 in the simulation, you have to use only 80% of each team’s difference in percentage from .500. So, a .600 team would be simulated at .580, and a .400 team would be simulated at .420. In general, the fraction of the deviation from .500 to keep is sigma skill divided by sigma total from the first table. This is usually around 80%, except for 1981–1997, when it was closer to 70%.
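
In other words, before simulating, shrink each team’s distance from .500 by the skill share of the total variation. A minimal sketch of that adjustment (the 0.80 default follows the discussion above; use about 0.70 for 1981–1997):

	def simulation_pct(actual_pct, skill_fraction=0.80):
	    """Winning percentage to feed into a season simulation: keep only the
	    skill share of the team's distance from .500, because the actual
	    record already includes luck."""
	    return 0.500 + skill_fraction * (actual_pct - 0.500)

	print(f"{simulation_pct(0.600):.3f}")   # 0.580
	print(f"{simulation_pct(0.400):.3f}")   # 0.420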

This raises an important point. Because a good amount of luck is involved even over the course of a full season, teams will tend to win 5 or 10 games more or fewer than their true talent would produce. Teams with high win totals will therefore tend to have had more good luck, and teams with low win totals more bad luck. One way of looking at this is that a .600 team could be a .550 team with good luck or a .650 team with bad luck. However, because there are many more .550 teams than .650 teams, the odds are that it is the former.

You can show this even more dramatically by looking at team winning percentages from one season to the next. Here the effect is intensified because the luck component does not carry over (its expected contribution next season is zero), and the factors that brought a team to a successful season can themselves deteriorate. Here are the results of season winning percentage based solely on the previous season:

	     1981-2004                1902-2004  
	   n prev curr pct       n  prev curr  pct
	                         5  .248 .374   50
	                         9  .277 .384   52
	   1 .265 .444 24       13  .301 .358   72
	   3 .332 .473 16       36  .326 .402   56
	  11 .347 .407 61       71  .349 .399   67
	  17 .375 .459 33       74  .375 .419   64
	  40 .399 .461 39      123  .400 .439   61
	  52 .423 .462 48      144  .424 .445   73
	  70 .449 .482 36      188  .449 .483   34
	 102 .473 .483 64      228  .474 .486   54
	  84 .501 .499 -       225  .501 .505   -
	  85 .527 .517 65      231  .526 .511   42
	  77 .552 .524 46      212  .550 .534   67
	  52 .576 .534 44      165  .574 .535   48
	  38 .599 .535 35      152  .600 .566   66
	  20 .626 .567 53       94  .625 .580   64
	   6 .645 .574 51       36  .649 .591   61
	   1 .667 .568 41       22  .674 .613   65
	   2 .699 .610 55       10  .697 .621   61
	   1 .716 .574 34        4  .719 .599   45
	                         2  .752 .677   70

The pct column shows the amount of retained winning percentage from .500 from the previous season. For example, in the 1981–2004 period, teams from .545 to .555 averaged .552 the first year and .524 the second, and .024 divided by .052 is 46%. This shows that teams tend to fall back to .500 in the following season. If you have only the previous year’s percentage to go on, your best estimate for the current season is about halfway between .500 and the previous year. The effect in recent years has been a bit more drastic in a smaller sample. In statistical terms, this is called regression to the mean.
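
As a one-line sketch of that rule of thumb (the 0.50 retention figure is the rough “about half” from the text; the exact fraction varies by period):

	def next_season_estimate(prev_pct, retained=0.50):
	    """Estimate this season's winning percentage from last season alone:
	    regress roughly halfway back toward .500."""
	    return 0.500 + retained * (prev_pct - 0.500)

	print(f"{next_season_estimate(0.552):.3f}")   # 0.526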

Of course, we do have more information: we have performance over past years. However, this does not change things very much. Multivariate linear regression is a tool that finds the weight for each variable that minimizes the prediction error over a given sample. The formula is current year = m1 x year1 + m2 x year2, and so on, plus a constant called the intercept, where year1 is the most recent past season. Here are the values for one-, two-, and three-year fits on the years 1981–2004:

	         m1   m2   m3   intercept correlation sigma
	1 year .437                  .282         44% .0624
	2 year .369 .147             .243         46% .0618
	3 year .367 .142 .012        .240         46% .0619

This shows that using the past two years results in a slight improvement over just one year, and that three years makes no difference. If you want to predict a team’s winning percentage for next year, take .369 times this year, plus .147 times last year, plus .243.
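
In Python, with the two-year coefficients from the table (the worked example uses the .550 group discussed next):

	def predict_pct(this_year, last_year):
	    """Two-year fit for 1981-2004: .369 x this year + .147 x last year + .243."""
	    return 0.369 * this_year + 0.147 * last_year + 0.243

	# Teams that averaged .602 this year after .598 the year before
	print(f"{predict_pct(0.602, 0.598):.3f}")   # 0.553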

The 66 teams that played .550 or better in two consecutive years between 1981 and 2004 fared no better. Their average percentages for the three years were .598, .602, and .558, where the general formula would predict .369 x .602 + .147 x .598 + .242, or .552, less than one game different from the actual figure and well within expected differences (1 sigma for a sample of 66 teams would be .061 divided by the square root of 66, or .008).

Some people say that runs scored and runs allowed are a better predictor of future team performance than wins and losses, which seems logical. However, at least for the years 1981–2004, the difference is negligible. Here we used runs scored minus runs allowed per game for past performance, instead of team winning percentage:

	        m1   m2    m3    intercept  correlation  sigma
	1 year .0512             .500       47%          .0615
	2 year .0447 .0126       .501       48%          .0611
	3 year .0447 .0124 .0006 .500       48%          .0612

Because it takes about 10 runs to produce one win, the multipliers are about one-tenth of the values in the wins calculation. So, if a team outscored its opponents by 50, 100, and then 150 runs, that would be .309, .617, and .926 delta runs per game. The predicted winning percentage using the three-year model would be .549. However, the correlation increases only from 46% to 48%, and the standard deviation between predicted and actual percentage drops from .0619 to .0612, a reduction of less than a tenth of a percentage point, or about one-eighth of a win. Again, only two years of data are needed; even one year works pretty well. The one-year model would predict .547 (calculated as .0512 x .926 + .500).
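
Here is a sketch of the two-year run-differential version, with coefficients from the table above (the function name is mine; the differentials are full-season runs scored minus runs allowed):

	def predict_pct_from_runs(diff_this_year, diff_last_year, games=162):
	    """Two-year fit for 1981-2004, with each run differential expressed
	    per game: .0447 x this year + .0126 x last year + .501."""
	    return (0.0447 * diff_this_year / games
	            + 0.0126 * diff_last_year / games
	            + 0.501)

	# Outscored opponents by 150 runs this year after 100 last year
	print(f"{predict_pct_from_runs(150, 100):.3f}")   # 0.550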

The actual sigma of the change in team wins from one year to the next (normalized to a 162-game schedule to account for strikes) was 11.7 for the years 1981–2004, where the difference due to chance alone was 9 (the square root of 2 times 6.36). Thus, the true year-to-year difference between teams is the square root of 11.7² - 9², or 7.5. This says that the change in wins from year to year actually owes more to luck than to skill.

In the years 1981–2004, 2.5% of the 632 teams, or 16 teams, would be expected by chance alone to lose 18 or more games more than they did the previous year (18 is twice the chance-only year-to-year sigma of 9). Thirty-five teams actually did, meaning that when a team’s loss total jumps by 18 or more games from one season to the next, there is roughly a 50–50 chance it is just bad luck. Only 11 of the managers involved did not start the following year, which suggests that managers are usually given another chance. However, in 2004, three of the four managers had already been fired, so maybe patience is wearing thin these days.

A good second-half performance is supposed to bode well for the following season. In reality, second-half performance does correlate slightly better with the following year than first-half performance does, but the full season was still a better predictor, mainly because it is a larger sample:

	             m1     m2     intercept  correlation   sigma
	1st half     .303          .348       34%           .0661
	2nd half     .366          .317       43%           .0633
	each half    .144   .283   .284       45%           .0622
	full season  .437          .282       44%           .0624

Counting both halves separately gave an insignificant improvement over the season as a whole.

How closely should an expert be able to predict future performance? We know the error should be better than 11.7 wins, which is what you get by simply picking each team to win the same number of games as it did the previous year. The simple two-season formula using wins or runs gets you down to 10.0 (.0618 x 162). Looking at the Las Vegas over/under win totals for the seasons from 1996 to 2004, we get an error of 9.4. Those predictions take into account trades, free agent gains and losses, potential rookies, players who retired or might be coming back from injury, and the aging of current players, which is quite a bit more information than just the team record for the past two years. Because overall predictability in that period has been slightly better than average (11.5 wins from the previous year instead of 11.7), the corresponding error over the full period is probably a bit higher. Anyway, a prediction off by 11 wins is pretty bad, 10 is fair, and anything close to 9 is super.

The Code

I write my programs in BASIC and I use my own datafiles, which I have accumulated over the years from official sources. I originally made my team file on punch cards, so my datafiles are text-file card images. I used these to create Total Baseball editions 1 through 7 (Total Sports); a newly created database from the same sources was used for The 2005 ESPN Baseball Encyclopedia by Pete Palmer and Gary Gillette (Sterling). (You can use the team statistics from the Baseball DataBank or Baseball Archive databases described in “Get a MySQL Database of Player and Team Statistics” [Hack #10].)

For each year, set SSQ and N to zero, then read in each team’s wins and losses, setting SSQ = SSQ + ((W - L) / 2) ** 2 and N = N + 1. If a team wins 91 games and loses 71, W - L is 20, so (W - L) / 2 is 10, which is how many games the team finished above the average of 81 wins. I then found the total standard deviation for the year by taking SQRT(SSQ / N). Say SSQ equals 2,016 and N equals 14, for example; then the total sigma is SQRT(2,016 / 14), or 12. The standard deviation expected by chance from the binomial distribution is SQRT(P x Q x N), where P is the probability of success (set to .5), Q is the probability of failure (set to 1 - P, or also .5), and N is the number of games (usually 162). So, this would be SQRT(40.5), or 6.36. The standard deviation due to skill would be SQRT(sigma(total) ** 2 - sigma(chance) ** 2), or SQRT(144 - 40.5), or 10.2.
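
Here is the same procedure as a Python sketch (the function name and the list-of-records input are my own; the original is in BASIC):

	import math

	def season_sigmas(team_records, games=162):
	    """Total, luck, and skill sigmas for one season, given (wins, losses)
	    pairs, following the SSQ procedure described above."""
	    ssq = sum(((w - l) / 2) ** 2 for w, l in team_records)
	    n = len(team_records)
	    total = math.sqrt(ssq / n)
	    luck = math.sqrt(0.5 * 0.5 * games)               # SQRT(P x Q x N)
	    skill = math.sqrt(max(total ** 2 - luck ** 2, 0.0))
	    return total, luck, skill

	# With SSQ = 2,016 over 14 teams: total 12.0, luck about 6.36, skill about 10.2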

I have a FORTRAN multivariate linear regression program I copied out of an old IBM manual in the 1970s. It works fine, but it is pretty complicated and runs about 300 lines, including a frontend that allows using variable-size arrays. Many sites on the Web have statistical utility programs that you can use.

I will list my program for reference:

	    PROGRAM PALMR
	C    CHANGED TO READ IN DATA DIRECTLY WITHOUT MANIPULATION
	    DIMENSION RX1(2500),R(2500),RX(2500)
	    CHARACTER*3 LN(50)
	    CHARACTER*12 OFILE
	    INTEGER*2 Y [HUGE] (1500,50)
	    REAL*4 X [HUGE] ( 75000)
	    DATA Y / 75000*0/
	    OPEN(UNIT=5,FILE='REGRESS.III',STATUS='OLD')
	C    READ IN OUTPUT FILE
	    READ(5,102) OFILE
	102  FORMAT(A12)
	    OPEN(UNIT=6,FILE=OFILE)
	    WRITE(6,*) OFILE
	C    READ IN NUMBER OF SAMPLES, NO OF VARIABLES
	    READ(5,104) N,M
	104  FORMAT(50I5)
	C   READ IN VARIABLES NAMES
	    READ(5,105) (LN(I),I=1,M)
	105  FORMAT(20(2X,A3))
	C   READ IN DATA (1500 SAMPLES MAX)
	    DO 2000 N=1,1500
	    READ(5,104,END=3000) (Y(N,K),K=1,M)
	2000  CONTINUE
	3000  CONTINUE
	    N=N-1
	    WRITE(0,104) M,N
	    DO 9500 K=1,M
	    DO 9500 J=1,N
	    X(J+N*(K-1))=Y(J,K)
	9500  CONTINUE
	    WRITE(0,104) 99,99
	    CALL PALM1(M,N,X,RX1,R,RX,LN)
	    CLOSE(UNIT=5)
	    CLOSE(UNIT=6,STATUS='KEEP')
	    STOP
	    END

	   SUBROUTINE PALM1(M,N,X[HUGE],RX1,R,RX,LN)
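	C    REGRESSION DRIVER: CORRELATE, ORDER, INVERT, SOLVE, AND PRINT RESULTS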
	   DIMENSION XBAR(50),STD(50),D(50),B(50),T(50),ISAVE(50),
	   X  RY(50),L(50),MM(50),SB(50),ANS(10),
	   *RX1(1),R(1),RX(1),X(1)
	   CHARACTER*3 LN(1)
	   CHARACTER*12 ANN(10)
	   DATA ANN /
	   X12HINTERCEPT ,
	   X12HCORRELATION ,
	   X12H    ,
	   X12H    ,
	   X12HDEP VAR ,
	   X12H    ,
	   X12H    ,
	   X12HDEG/FREEDOM ,
	   X12H    ,
	   X12H    /
	   IO=1
	   CALL CORRE(N,M,IO,X,XBAR,STD,RX1,R,D,B,T)
	   NDEP=M
	   K=M-1
	   DO 130 I=1,K
	130 ISAVE(I)=I
	   CALL ORDER(M,R,NDEP,K,ISAVE,RX,RY)
	   WRITE(6,*) 'CROSS CORRELATION'
	   WRITE(6,12) (LN(I),I=1,M-1)
	12  FORMAT(1H ,10(5X,A3))
	   I1=1
	   DO 132 I=1,K
	   I2=I*K
	   WRITE(6,15) (RX(J),J=I1,I2)
	   I1=I2+1
	132 CONTINUE
	   WRITE(6,13) ' CORRELATION WITH ',LN(M)
	13  FORMAT(A18,A3)
	   WRITE(6,12) (LN(I),I=1,M-1)
	   WRITE(6,15) (RY(I),I=1,K)
	15  FORMAT(1H ,10F8.5)
	   CALL MINV(RX,K,DD,L,MM)
	   IF(DD)140,135,140
	135  WRITE(6,20)
	20  FORMAT(1H ,3HD=0)
	   RETURN
	140 CALL MULTR(N,K,XBAR,STD,D,RX,RY,ISAVE,B,SB,T,ANS)
	   WRITE(6,*)' N   ITEM   M   SIGMA    T'
	   WRITE(6,30)(I,LN(I),B(I),SB(I),T(I),I=1,K)
	30  FORMAT(1H ,I3,3X,A5,F10.5,F10.5,F10.5)
	   WRITE(6,*)' N   ITEM        MEAN        SIGMA'
	   WRITE(6,31)(I,LN(I),XBAR(I),STD(I),I=1,M)
	31  FORMAT(1H ,I3,3X,A5,F20.5,F20.5)
	   WRITE(6,32)(ANN(K),ANS(K),K=1,10)
	32  FORMAT(1H ,A12,F20.10)
	   RETURN
	   END

	   SUBROUTINE CORRE(N,M,IO,X[HUGE],XBAR,STD,RX,R,B,D,T)
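	C    COMPUTE MEANS, STANDARD DEVIATIONS, AND THE CORRELATION MATRIX OF THE DATA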
	   DIMENSION X(1),XBAR(1),STD(1),RX(1),R(1),B(1),D(1),T(1)
	   DO 100 J=1,M
	   B(J)=0.0
	100  T(J)=0.0
	   K=(M*M+M)/2
	   DO 102 I=1,K
	102 R(I)=0.0
	   FN=N
	   L=0
	   IF(IO)105,127,105
	105 DO 108 J=1,M
	   DO 107 I=1,N
	   L=L+1
	107 T(J)=T(J)+X(L)
	   XBAR(J)=T(J)
	108 T(J)=T(J)/FN
	   DO 115 I=1,N
	   JK=0
	   L=I-N
	   DO 110 J=1,M
	   L=L+N
	   D(J)=X(L)-T(J)
	110 B(J)=B(J)+D(J)
	   DO 115 J=1,M
	   DO 115 K=1,J
	   JK=JK+1
	115 R(JK)=R(JK)+D(J)*D(K)
	   GO TO 205
	127 GO TO 205
	205 JK=0
	   DO 210 J=1,M
	   XBAR(J)=XBAR(J)/FN
	   DO 210 K=1,J
	   JK=JK+1
	210 R(JK)=R(JK)-B(J)*B(K)/FN
	   JK=0
	   DO 220 J=1,M
	   JK=JK+J
	220 STD(J)=SQRT(ABS(R(JK)))
	   DO 230 J=1,M
	   DO 230 K=J,M
	   JK=J+(K*K-K)/2
	   L=M*(J-1)+K
	   RX(L)=R(JK)
	   L=M*(K-1)+J
	   RX(L)=R(JK)
	   IF(STD(J)*STD(K))225,222,225
	222 R(JK)=0.0
	   GO TO 230
	225 R(JK)=R(JK)/(STD(J)*STD(K))
	230 CONTINUE
	   FN=SQRT(FN-1.0)
	   DO 240 J=1,M
	240 STD(J)=STD(J)/FN
	   L=-M
	   DO 250 I=1,M
	   L=L+M+1
	250 B(I)=RX(L)
	   RETURN
	   END

	   SUBROUTINE ORDER (M,R,NDEP,K,ISAVE,RX,RY)
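	C    PICK OUT THE INDEPENDENT-VARIABLE INTERCORRELATIONS (RX) AND THEIR
	C    CORRELATIONS WITH THE DEPENDENT VARIABLE (RY)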
	   DIMENSION R(1),ISAVE(1),RX(1),RY(1)
	   MM=0
	   DO 130 J=1,K
	   L2=ISAVE(J)
	   IF(NDEP-L2)122,123,123
	122 L=NDEP+(L2*L2-L2)/2
	   GO TO 125
	123 L=L2+(NDEP*NDEP-NDEP)/2
	125 RY(J)=R(L)
	   DO 130 I=1,K
	   L1=ISAVE(I)
	   IF(L1-L2)127,128,128
	127 L=L1+(L2*L2-L2)/2
	   GO TO 129
	128 L=L2+(L1*L1-L1)/2
	129 MM=MM+1
	130 RX(MM)=R(L)
	   ISAVE(K+1)=NDEP
	   RETURN
	   END

	   SUBROUTINE MINV(A,N,D,L,M)
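	C    INVERT MATRIX A IN PLACE BY GAUSS-JORDAN ELIMINATION WITH FULL PIVOTING;
	C    D RETURNS THE DETERMINANT (ZERO IF SINGULAR)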
	   DIMENSION A(1),L(1),M(1)
	   D=1.0
	   NK=-N
	   DO 80 K=1,N
	   NK=NK+N
	   L(K)=K
	   M(K)=K
	   KK=NK+K
	   BIGA=A(KK)
	   DO 20 J=K,N
	   IZ=N*(J-1)
	   DO 20 I=K,N
	   IJ=IZ+I
	   IF(ABS(BIGA)-ABS(A(IJ)))15,20,20
	15  BIGA=A(IJ)
	   L(K)=I
	   M(K)=J
	20  CONTINUE
	   J=L(K)
	   IF(J-K)35,35,25
	25  KI=K-N
	   DO 30 I=1,N
	   KI=KI+N
	   HOLD=-A(KI)
	   JI=KI-K+J
	   A(KI)=A(JI)
	30  A(JI)=HOLD
	35  I=M(K)
	   IF(I-K)45,45,38
	38  JP=N*(I-1)
	   DO 40 J=1,N
	   JK=NK+J
	   JI=JP+J
	   HOLD=-A(JK)
	   A(JK)=A(JI)
	40  A(JI)=HOLD
	45  IF(BIGA)48,46,48
	46  D=0.0
	   RETURN
	48  DO 55 I=1,N
	   IF(I-K)50,55,50
	50  IK=NK+I
	   A(IK)=A(IK)/(-BIGA)
	55  CONTINUE
	   DO 65 I=1,N
	   IK=NK+I
	   HOLD=A(IK)
	   IJ=I-N
	   DO 65 J=1,N
	   IJ=IJ+N
	   IF(I-K)60,65,60
	60  IF(J-K)62,65,62
	62  KJ=IJ-I+K
	   A(IJ)=HOLD*A(KJ)+A(IJ)
	65  CONTINUE
	   KJ=K-N
	   DO 75 J=1,N
	   KJ=KJ+N
	   IF(J-K)70,75,70
	70  A(KJ)=A(KJ)/BIGA
	75  CONTINUE
	   D=D*BIGA
	   A(KK)=1.0/BIGA
	80  CONTINUE
	   K=N
	100  K=(K-1)
	   IF(K)150,150,105
	105  I=L(K)
	   IF(I-K)120,120,108
	108  JQ=N*(K-1)
	   JR=N*(I-1)
	   DO 110 J=1,N
	   JK=JQ+J
	   HOLD=A(JK)
	   JI=JR+J
	   A(JK)=-A(JI)
	110  A(JI)=HOLD
	120  J=M(K)
	   IF(J-K)100,100,125
	125  KI=K-N
	    DO 130 I=1,N
	    KI=KI+N
	    HOLD=A(KI)
	    JI=KI-K+J
	    A(KI)=-A(JI)
	130  A(JI)=HOLD
	   GO TO 100
	150  RETURN
	   END

	   SUBROUTINE MULTR(N,K,XBAR,STD,D,RX,RY,ISAVE,B,SB,T,ANS)
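	C    COMPUTE REGRESSION COEFFICIENTS, STANDARD ERRORS, T VALUES, INTERCEPT,
	C    AND MULTIPLE CORRELATION FROM THE INVERTED CORRELATION MATRIX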
	   DIMENSION XBAR(1),STD(1),D(1),RX(1),RY(1),ISAVE(1),
	   *B(1),SB(1),T(1),ANS(1)
	   MM=K+1
	   DO 100 J=1,K
	100  B(J)=0.0
	   DO 110 J=1,K
	   L1=K*(J-1)
	   DO 110 I=1,K
	   L=L1+I
	110  B(J)=B(J)+RY(I)*RX(L)
	   RM=0.0
	   BO=0.0
	   L1=ISAVE(MM)
	   DO 120 I=1,K
	   RM=RM+B(I)*RY(I)
	   L=ISAVE(I)
	   B(I)=B(I)*(STD(L1)/STD(L))
	120  BO=BO+B(I)*XBAR(L)
	   BO=XBAR(L1)-BO
	   SSAR=RM*D(L1)
	   RM=SQRT(ABS(RM))
	   SSDR=D(L1)-SSAR
	   FN=N-K-1
	   SY=SSDR/FN
	   DO 130 J=1,K
	   L1=K*(J-1)+J
	   L=ISAVE(J)
	   SB(J)=SQRT(ABS((RX(L1)/D(L))*SY))
	130  T(J)=B(J)/SB(J)
	   SY=SQRT(ABS(SY))
	   FK=K
	   SSARM=SSAR/FK
	   SSDRM=SSDR/FN
	   F=SSARM/SSDRM
	   ANS(1)=BO
	   ANS(2)=RM
	   ANS(3)=SY
	   ANS(4)=SSAR
	   ANS(5)=FK
	   ANS(6)=SSARM
	   ANS(7)=SSDR
	   ANS(8)=FN
	   ANS(9)=SSDRM
	   ANS(10)=F
	   RETURN
	   END

Pete Palmer
