Basic statistical measures

Let's review some basic statistics so that we can make full use of the statistical functionality provided by the API. The six basic statistical measures we need to understand clearly before proceeding are:

  • Minimum
  • Maximum
  • Sum
  • Average
  • Standard deviation
  • Standardization

Minimum

As the name suggests, this is the smallest value in a dataset. In our case of block-level household income, the minimum statistic is the lowest median household income among all the blocks considered.

Maximum

Similar to the minimum, the maximum statistic defines the maximum median household income value among all the blocks considered.

Sum

Sum is a simple yet effective statistic that gives us the total value of all the data being considered.

Average

The Average statistic is the arithmetic mean of all the values. It is derived by dividing the Sum statistic by the count of the data values used in the calculation.

Average = Sum / Count
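
The measures described so far can be sketched in a few lines of Python. This is only an illustration: the income figures are made-up sample values, not data from the API, and the variable names are hypothetical.

```python
# Hypothetical median household incomes (USD), one value per block.
# These figures are invented for illustration only.
block_incomes = [42000, 55000, 61000, 38000, 74000]

minimum = min(block_incomes)   # smallest value in the dataset
maximum = max(block_incomes)   # largest value in the dataset
total = sum(block_incomes)     # Sum statistic
count = len(block_incomes)     # number of values considered
average = total / count        # Average = Sum / Count

print("Minimum:", minimum)
print("Maximum:", maximum)
print("Sum:", total)
print("Count:", count)
print("Average:", average)
```

For these sample values the sum is 270000 over 5 blocks, so the average works out to 54000.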

Standard deviation

Standard deviation is perhaps the most important statistic that one can derive from any given data. It is a measure of how spread out the data are, that is, how much the values deviate from the mean or average. When the data are approximately normally distributed, we can expect that:

  • About 68% of values are within plus or minus one standard deviation of the mean
  • About 95% of values are within plus or minus two standard deviations of the mean
  • About 99.7% of values are within plus or minus three standard deviations of the mean

This rule of thumb assumes that the data follow the normal distribution curve, which many real-world datasets do approximately. When we plot a histogram of such data, it looks like a bell curve.
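
As a rough check of the percentages above, the following Python sketch computes a standard deviation by hand and counts how many values fall within one, two, and three standard deviations of the mean. It makes several assumptions that are not from the text: the data are simulated with random.gauss rather than taken from the API, the mean of 55000 and spread of 12000 are arbitrary, and the population form of the standard deviation (dividing by the count) is used. The helper names population_std and share_within are hypothetical.

```python
import math
import random


def population_std(values):
    """Population standard deviation: root of the mean squared deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)


def share_within(values, k):
    """Fraction of values within plus or minus k standard deviations of the mean."""
    mean = sum(values) / len(values)
    std = population_std(values)
    inside = [v for v in values if abs(v - mean) <= k * std]
    return len(inside) / len(values)


# Simulated, normally distributed incomes (assumed parameters, illustration only).
random.seed(42)
simulated_incomes = [random.gauss(55000, 12000) for _ in range(10000)]

for k in (1, 2, 3):
    share = share_within(simulated_incomes, k)
    print(f"Within {k} standard deviation(s): {share:.1%}")
```

With normally distributed input the printed shares should land close to 68%, 95%, and 99.7%. Data that are strongly skewed will not follow these percentages. Python's standard library also offers statistics.pstdev, which computes the same population standard deviation.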

Standardization

Knowing the concepts of mean and standard deviation, we can normalize our data. This process is known as standardization, and the statistical measure derived from it is known as the standard score (z-score). When we have datasets with large values, standardization is an effective way to put the values on a common scale: each z-score tells us how many standard deviations a value lies above or below the mean.

To convert any value to a standard score (z-score), we first subtract the mean from the value, then divide the result by the standard deviation.

z-score = (Value – Mean)/Standard_Deviation
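
The z-score formula can be sketched in Python as follows. The income values are invented for illustration, the function name z_scores is hypothetical, and the population standard deviation is assumed since the text does not specify which form the API uses.

```python
import math


def z_scores(values):
    """Return the standard score (z-score) of each value in the list."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # All values are identical, so the z-score is undefined.
        raise ValueError("standard deviation is zero; cannot standardize")
    # z-score = (Value - Mean) / Standard_Deviation
    return [(v - mean) / std for v in values]


# Hypothetical median household incomes (USD), one value per block.
block_incomes = [42000, 55000, 61000, 38000, 74000]
scores = z_scores(block_incomes)

for income, score in zip(block_incomes, scores):
    print(f"{income}: z-score {score:+.2f}")
```

Values below the mean get a negative z-score and values above it get a positive one. After standardization the scores themselves have a mean of 0 and a standard deviation of 1.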