Let's review some basic statistics so that we can make full use of the statistical functionality provided by the API. The five basic statistical parameters we need to understand clearly before proceeding are the minimum, maximum, sum, average, and standard deviation.
As the name suggests, the minimum is the least value in a dataset. In our case of block-level household income, the minimum statistic indicates the block with the lowest median household income.
Similar to the minimum, the maximum statistic defines the maximum median household income value among all the blocks considered.
Sum is a simple yet effective statistic that gives us the total value of all the data being considered.
The Average statistic is the arithmetic mean of all the values. It is derived by dividing the Sum statistic by the count of the data values used in the calculation.
Average = Sum / Count
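The four statistics described so far can be sketched in a few lines of Python. The income figures below are made-up illustrative values, not data returned by the API, and the variable names are assumptions chosen for this example.

```python
# Hypothetical median household incomes (USD) for five census blocks.
block_incomes = [42000, 55000, 61000, 38000, 74000]

minimum = min(block_incomes)   # block with the lowest median income
maximum = max(block_incomes)   # block with the highest median income
total = sum(block_incomes)     # Sum statistic
count = len(block_incomes)     # number of values considered
average = total / count        # Average = Sum / Count

print(minimum, maximum, total, average)
```

Each statistic is computed independently here to mirror the definitions; a statistics service would normally return them together in one response.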
Standard deviation is one of the most useful statistics that can be derived from a dataset. It measures how spread out the data are, that is, how much the values deviate from the mean (average). When we know the standard deviation, we can normally observe that about 68 percent of the values lie within one standard deviation of the mean, about 95 percent within two, and about 99.7 percent within three.
This is based on the fact that many real-world datasets approximately follow the normal distribution curve. When we plot a histogram of the values, it looks like a bell curve.
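The bell-curve behaviour can be checked on simulated data. The following is a minimal sketch, assuming normally distributed values generated with Python's standard library; the mean of 55000 and spread of 12000 are arbitrary illustrative choices, and `share_within` is a hypothetical helper written for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated, normally distributed incomes (illustrative only).
incomes = [random.gauss(55000, 12000) for _ in range(10_000)]

mean = statistics.fmean(incomes)
std_dev = statistics.pstdev(incomes)  # population standard deviation


def share_within(k):
    """Return the fraction of values within k standard deviations of the mean."""
    inside = sum(1 for x in incomes if abs(x - mean) <= k * std_dev)
    return inside / len(incomes)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

For normally distributed data, the printed shares should land close to 68, 95, and 99.7 percent. Real income data is often skewed, so the shares observed on an actual dataset may differ.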
Knowing the mean and the standard deviation, we can normalize our data. This process is known as standardization, and the statistical measure derived from it is known as the standard score (z-score). When we have datasets with large values, standardization is an effective way to summarize the data and compare values on a common scale.
So, to convert any value to a standard score (z-score), we first subtract the mean from the value and then divide the result by the standard deviation.
z-score = (Value – Mean)/Standard_Deviation
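The z-score formula can be applied to a whole dataset as in the sketch below. The income values are hypothetical, and the population standard deviation (`statistics.pstdev`) is used as an assumption; the source does not state whether the API reports a population or a sample standard deviation.

```python
import statistics

# Hypothetical median household incomes (USD) for five census blocks.
block_incomes = [42000, 55000, 61000, 38000, 74000]

mean = statistics.fmean(block_incomes)
std_dev = statistics.pstdev(block_incomes)  # assumption: population form

# z-score = (Value - Mean) / Standard_Deviation, applied to every value.
z_scores = [(value - mean) / std_dev for value in block_incomes]

for value, z in zip(block_incomes, z_scores):
    print(f"{value}: z = {z:+.2f}")
```

After standardization the values have a mean of 0 and a standard deviation of 1, so a positive z-score marks a block above the average and a negative one marks a block below it.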