We usually refer to variance as sigma squared, and you'll find out why momentarily, but for now, just know that variance is the average of the squared differences from the mean.
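Written out as a formula, that definition looks like the following. The symbols are notation assumed here for illustration and are not taken from the text: x_i is each data point, μ is the mean, and N is the number of data points.

$$
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
$$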
- To compute the variance of a dataset, you first figure out its mean. Let's say I have some data that could represent anything, such as the maximum number of people standing in line during a given hour. In the first hour, I observed 1 person standing in line, then 4, then 5, then 4, then 8.
- The first step in computing the variance is just to find the mean, or the average, of that data. I add them all up and divide the sum by the number of data points: (1+4+5+4+8)/5 = 4.4, which is the average number of people standing in line.
- Now the next step is to find the difference from the mean for each data point. I know that the mean is 4.4. My first data point is 1, so 1 - 4.4 = -3.4. The next data point is 4, so 4 - 4.4 = -0.4, and so on. OK, so I end up with both positive and negative numbers that represent the difference from the mean for each data point (-3.4, -0.4, 0.6, -0.4, 3.6).
- Now what I need is a single number that represents the variance of this entire dataset. So, the next thing I'm going to do is find the square of these differences. I'm just going to go through each one of those raw differences from the mean and square them. This is for a couple of different reasons:
- First, I want to make sure that negative differences count just as much as positive ones. Otherwise, they will cancel each other out. That'd be bad.
- Second, I also want to give more weight to the outliers, so squaring amplifies the effect of values that are very different from the mean while still making sure that the negatives and positives are comparable (11.56, 0.16, 0.36, 0.16, 12.96).
Let's look at what happens there: (-3.4)² is a positive 11.56, and (-0.4)² ends up being a much smaller number, 0.16, because that data point is much closer to the mean of 4.4. Likewise, (0.6)² is only 0.36, because that point is also close to the mean. But as we get up to the positive outlier, (3.6)² ends up being 12.96. That gives us: (11.56, 0.16, 0.36, 0.16, 12.96).
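The steps so far (mean, differences, squared differences) can be sketched in a few lines of Python. This is just an illustration of the arithmetic above; the variable names are my own, and the rounding is only there because floating-point math leaves tiny errors in the raw results.

```python
# Maximum number of people in line for each observed hour (from the example).
data = [1, 4, 5, 4, 8]

# Step 1: the mean is the sum divided by the number of data points.
mean = sum(data) / len(data)

# Step 2: the signed difference of each data point from the mean.
differences = [x - mean for x in data]

# Step 3: squaring removes the sign and amplifies large differences.
squared_differences = [d ** 2 for d in differences]

# Round for display only.
print(round(mean, 2))                              # 4.4
print([round(d, 2) for d in differences])          # [-3.4, -0.4, 0.6, -0.4, 3.6]
print([round(s, 2) for s in squared_differences])  # [11.56, 0.16, 0.36, 0.16, 12.96]
```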
To find the actual variance value, we just take the average of all those squared differences. So we add up all the squared differences, which sum to 25.2, divide that by 5, the number of values that we have, and we end up with a variance of 5.04.
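As a quick check, here is a minimal Python sketch of the whole calculation. It assumes the same five data points and cross-checks the hand-rolled result against `statistics.pvariance` from the standard library, which also divides by the number of values, as we did here.

```python
import statistics

# Maximum number of people in line for each observed hour (from the example).
data = [1, 4, 5, 4, 8]

mean = sum(data) / len(data)

# Variance: the average of the squared differences from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

print(round(variance, 2))                    # 5.04
print(round(statistics.pvariance(data), 2))  # 5.04
```

Note that `statistics.variance` (without the `p`) divides by one less than the number of values, so it would give a different answer for this data.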
OK, that's all variance is.