Standard deviation

Standard deviation is a measure of how widely values are spread around the mean. A high standard deviation means that the values are widely spread, whereas a low standard deviation means that they are tightly grouped around the mean. This measurement can be misleading if the data does not cluster around a single central point or if there are numerous outliers.

We begin by showing a simple example using basic Java techniques. We are using our testData array from previous examples, duplicated here:

double[] testData = {12.5, 18.3, 11.2, 19.0, 22.1, 14.3, 16.2, 12.5,
   17.8, 16.5, 11.2}; 
 

Before we can calculate the standard deviation, we need to find the average. We could use any of our techniques listed in the Calculating the mean section, but we will add up our values and divide by the length of testData for simplicity's sake:

double sum = 0; 
for(double value : testData){ 
   sum += value; 
} 
double mean = sum/testData.length; 

Next, we create a variable, sdSum, to help us calculate the standard deviation. As we loop through our array, we subtract the mean from each data value, square that difference, and add it to sdSum. Finally, we divide sdSum by the length of the array and take the square root of that result. Note that sum and sdSum are both declared as double. If they were declared as int, the fractional part would be discarded on every addition and the divisions would be performed as integer division:

double sdSum = 0; 
for (double value : testData){ 
    sdSum += Math.pow((value - mean), 2); 
} 
out.println("The standard deviation is " +   
Math.sqrt( sdSum / ( testData.length ) )); 

Our output is the population standard deviation. Its value is approximately 3.3944; the exact trailing digits printed depend on floating-point rounding.
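The steps above can be collected into a small self-contained program. This is only a minimal sketch: the class name StdDevSketch and the method name populationStdDev are illustrative and do not come from the original text. It uses System.out directly and accumulates both running totals in double variables so that no fractional part is lost.

```java
class StdDevSketch {

    // Population standard deviation: the square root of the mean of the
    // squared differences from the mean. Returns NaN for an empty array.
    static double populationStdDev(double[] values) {
        double sum = 0.0;                 // double, so fractions are kept
        for (double value : values) {
            sum += value;
        }
        double mean = sum / values.length;

        double squaredDiffSum = 0.0;      // sum of (value - mean)^2
        for (double value : values) {
            double diff = value - mean;
            squaredDiffSum += diff * diff;
        }
        return Math.sqrt(squaredDiffSum / values.length);
    }

    public static void main(String[] args) {
        double[] testData = {12.5, 18.3, 11.2, 19.0, 22.1, 14.3, 16.2, 12.5,
            17.8, 16.5, 11.2};
        System.out.println("The standard deviation is "
            + populationStdDev(testData));
    }
}
```

Multiplying diff by itself is used here in place of Math.pow purely as a matter of taste; both give the squared difference.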

Our next technique uses Google Guava's Stats class to calculate the standard deviation. We start by creating a Stats object with our testData. We then call the populationStandardDeviation method:

Stats testStats = Stats.of(testData); 
double sd = testStats.populationStandardDeviation(); 
out.println("The standard deviation is " + sd); 
 

The output is as follows:

The standard deviation is 3.3943803826056653

This example calculates the standard deviation of an entire population. When the data is only a sample drawn from a larger population, it is usually preferable to calculate the sample standard deviation, which divides by one less than the number of values to correct for bias. To accomplish this, we use essentially the same code as before but replace the populationStandardDeviation method with sampleStandardDeviation:

Stats testStats = Stats.of(testData); 
double sd = testStats.sampleStandardDeviation(); 
out.println("The sample standard deviation is " + sd); 

In this case, our output is:

The sample standard deviation is 3.560056179332006
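For reference, the two quantities differ only in the divisor. The notation here is introduced for illustration and is not from the original text: x_1, ..., x_n are the data values and \bar{x} is their mean. Under that notation, the population and sample standard deviations can be written as:

```latex
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

For testData, n = 11, the mean is 15.6, and the sum of squared differences is 126.74. This gives a population value of the square root of 126.74/11, roughly 3.394, and a sample value of the square root of 126.74/10, roughly 3.560, in line with the two outputs above.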

Our next example uses the Apache Commons DescriptiveStatistics class, which we used to calculate the mean and median in previous examples. Remember, the SynchronizedDescriptiveStatistics subclass used here has the advantage of being thread safe. After we create a SynchronizedDescriptiveStatistics object, we add each value from the array. We then call the getStandardDeviation method:

DescriptiveStatistics statTest =  
    new SynchronizedDescriptiveStatistics(); 
for(double num : testData){ 
   statTest.addValue(num); 
} 
out.println("The standard deviation is " +  
statTest.getStandardDeviation()); 

Notice that the output matches the sample standard deviation from our previous example, apart from the final digit. The getStandardDeviation method by default returns the standard deviation adjusted for a sample:

The standard deviation is 3.5600561793320065

We can, however, continue using Apache Commons to calculate the standard deviation in either form. The StandardDeviation class allows you to calculate the population standard deviation or the sample standard deviation. To demonstrate the differences, replace the previous code example with the following:

StandardDeviation sdPopulation = new StandardDeviation(false); 
out.println("The population standard deviation is " +  
sdPopulation.evaluate(testData)); 
 
StandardDeviation sdSample = new StandardDeviation(true); 
out.println("The sample standard deviation is " +  
sdSample.evaluate(testData)); 

On the first line, we created a new StandardDeviation object and set our constructor's parameter to false, which will produce the standard deviation of a population. The second section uses a value of true, which produces the standard deviation of a sample. In our example, we used the same test dataset for both. This means we first treated it as though it were the entire population of data, and then treated it as though it were a sample drawn from a larger population. In reality, you might not use the same set of data with each of those methods. The output is as follows:

The population standard deviation is 3.3943803826056653
The sample standard deviation is 3.560056179332006

The preferred option will depend upon your sample and particular analysis needs.
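To see how a single boolean flag can switch between the two forms, here is a plain-Java sketch. The class BiasFlagSketch and its standardDeviation helper are hypothetical names used only for illustration. They are not part of Apache Commons, and the sketch makes no claim about how that library computes its result internally. It only follows the behavior described above, where true selects the sample form and false the population form.

```java
class BiasFlagSketch {

    // Hypothetical helper: biasCorrected = true divides by n - 1 (sample),
    // biasCorrected = false divides by n (population).
    // Returns NaN when the divisor is zero (empty array, or a single
    // value with biasCorrected = true).
    static double standardDeviation(double[] values, boolean biasCorrected) {
        int n = values.length;

        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        double mean = sum / n;

        double squaredDiffSum = 0.0;
        for (double value : values) {
            double diff = value - mean;
            squaredDiffSum += diff * diff;
        }

        double divisor = biasCorrected ? n - 1 : n;
        return Math.sqrt(squaredDiffSum / divisor);
    }

    public static void main(String[] args) {
        double[] testData = {12.5, 18.3, 11.2, 19.0, 22.1, 14.3, 16.2, 12.5,
            17.8, 16.5, 11.2};
        System.out.println("The population standard deviation is "
            + standardDeviation(testData, false));
        System.out.println("The sample standard deviation is "
            + standardDeviation(testData, true));
    }
}
```

Because the sample form divides by a smaller number, it is never smaller than the population form for the same data, and the gap narrows as the number of values grows.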
