In data science, you’ll often use statistics to describe and summarize your data. Here, we begin by introducing several such descriptive statistics, including:
minimum—the smallest value in a collection of values.
maximum—the largest value in a collection of values.
range—the range of values from the minimum to the maximum.
count—the number of values in a collection.
sum—the total of the values in a collection.
We’ll look at determining the count and sum in the next chapter. Measures of dispersion (also called measures of variability), such as range, help determine how spread out values are. Other measures of dispersion that we’ll present in later chapters include variance and standard deviation.
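As a quick preview, the five statistics listed above can be sketched with Python's built-in functions min, max, len and sum. The list name values and its contents are hypothetical, chosen only for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
values = [36, 27, 12]

minimum = min(values)   # smallest value in the collection
maximum = max(values)   # largest value in the collection
count = len(values)     # number of values in the collection
total = sum(values)     # total of the values in the collection

print('Minimum:', minimum)
print('Maximum:', maximum)
print('Range:', minimum, 'through', maximum)
print('Count:', count)
print('Sum:', total)
```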
First, let’s show how to determine the minimum of three values manually. The following script prompts for and inputs three values, uses if statements to determine the minimum value, then displays it.
After inputting the three values, we process one value at a time:
First, we assume that number1 contains the smallest value, so line 8 assigns it to the variable minimum. Of course, it’s possible that number2 or number3 contains the actual smallest value, so we still must compare each of these with minimum.
The first if statement (lines 10–11) then tests number2 < minimum and, if this condition is True, assigns number2 to minimum.
The second if statement (lines 13–14) then tests number3 < minimum and, if this condition is True, assigns number3 to minimum.
Now, minimum contains the smallest value, so we display it. We executed the script three times to show that it always finds the smallest value regardless of whether the user enters it first, second or third.
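The steps just described can be sketched as follows. This is a minimal reconstruction rather than the original script: the function name minimum_of_three is an assumption, and the three values are passed as arguments instead of being read with input, so the line numbers cited above do not apply to it.

```python
def minimum_of_three(number1, number2, number3):
    """Return the smallest of three values using if statements."""
    # Assume the first value is the smallest.
    minimum = number1

    # Replace that assumption if the second value is smaller.
    if number2 < minimum:
        minimum = number2

    # Replace it again if the third value is smaller still.
    if number3 < minimum:
        minimum = number3

    return minimum


# The smallest value appears first, second and third in turn.
print('Minimum value is', minimum_of_three(12, 27, 36))
print('Minimum value is', minimum_of_three(27, 12, 36))
print('Minimum value is', minimum_of_three(36, 27, 12))
```

Each call displays 12, mirroring the three executions in which the smallest value is entered first, second or third.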
min and max
Python has many built-in functions for performing common tasks. Built-in functions min and max calculate the minimum and maximum, respectively, of a collection of values:
In [1]: min(36, 27, 12)
Out[1]: 12
In [2]: max(36, 27, 12)
Out[2]: 36
The functions min and max can receive two or more values as separate arguments, as shown here, or a single iterable such as a list.
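The following sketch shows the two common ways of calling these functions: with several separate arguments, and with one iterable such as a list. The list name grades and the sample values are hypothetical.

```python
# Several separate arguments (two or more values).
smallest = min(36, 27, 12, 48, 5)
largest = max(36, 27, 12, 48, 5)
print(smallest, largest)

# A single iterable argument, here a list (the name grades is hypothetical).
grades = [36, 27, 12]
list_smallest = min(grades)
list_largest = max(grades)
print(list_smallest, list_largest)
```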
The range of values is simply the minimum through the maximum value. In this case, the range is 12 through 36. Much data science is devoted to getting to know your data. Descriptive statistics is a crucial part of that, but you also have to understand how to interpret the statistics. For example, if you have 100 numbers with a range of 12 through 36, those numbers could be distributed evenly over that range. At the opposite extreme, you could have clumping with 99 values of 12 and one 36, or one 12 and 99 values of 36. In later data science sections, we’ll look at common data distributions.
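To make the point concrete, here is a small sketch that builds three hypothetical collections of 100 values. All three have a range of 12 through 36, yet they are distributed very differently. The variable names and the way the evenly spread data is generated are assumptions made for illustration.

```python
# Three hypothetical collections of 100 values, each ranging from 12 through 36.
evenly_spread = [12 + (i % 25) for i in range(100)]  # each of 12..36 four times
mostly_low = [12] * 99 + [36]                        # 99 values of 12, one 36
mostly_high = [12] + [36] * 99                       # one 12, 99 values of 36

for name, data in [('evenly spread', evenly_spread),
                   ('mostly low', mostly_low),
                   ('mostly high', mostly_high)]:
    # The range is identical; the counts at the extremes are not.
    print(name, '| range:', min(data), 'through', max(data),
          '| 12s:', data.count(12), '| 36s:', data.count(36))
```

The minimum and maximum alone cannot distinguish these collections, which is why the range has to be interpreted alongside other statistics.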
Throughout this book, we introduce various functional-style programming capabilities. These enable you to write code that can be more concise, clearer and easier to debug (that is, to find and correct errors). The min and max functions are examples of a functional-style programming concept called reduction. They reduce a collection of values to a single value. Other reductions you’ll see include the sum, average, variance and standard deviation of a collection of values. You’ll also learn how to define custom reductions.
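One way to see reduction at work is the standard library's functools.reduce, which repeatedly combines two values until a single value remains. The sketch below is only an illustration of the concept, not a description of how min is implemented; the helper name smaller and the sample list are hypothetical.

```python
from functools import reduce

# Hypothetical sample data.
values = [36, 27, 12]


def smaller(a, b):
    """Return the smaller of two values."""
    return a if a < b else b


# Combine the values pairwise until only the smallest remains.
minimum = reduce(smaller, values)

# The same pattern with addition reduces the collection to its sum.
total = reduce(lambda x, y: x + y, values)

print('Minimum:', minimum)
print('Sum:', total)
```

Swapping in a different two-argument function yields a different reduction over the same collection.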
In the next two chapters, we’ll continue our discussion of basic descriptive statistics with measures of central tendency, including mean, median and mode, and measures of dispersion, including variance and standard deviation.
(Fill-In) The range of a collection of values is a measure of ________.
Answer: dispersion.
(IPython Session) For the values 47, 95, 88, 73, 88 and 84, calculate the minimum, maximum and range.
Answer:
In [1]: min(47, 95, 88, 73, 88, 84)
Out[1]: 47
In [2]: max(47, 95, 88, 73, 88, 84)
Out[2]: 95
In [3]: print('Range:', min(47, 95, 88, 73, 88, 84), '-',
   ...:       max(47, 95, 88, 73, 88, 84))
   ...:
Range: 47 - 95