2.9 Intro to Data Science: Basic Descriptive Statistics

In data science, you’ll often use statistics to describe and summarize your data. Here, we begin by introducing several such descriptive statistics, including:

  • minimum—the smallest value in a collection of values.

  • maximum—the largest value in a collection of values.

  • range—the range of values from the minimum to the maximum.

  • count—the number of values in a collection.

  • sum—the total of the values in a collection.

We’ll look at determining the count and sum in the next chapter. Measures of dispersion (also called measures of variability), such as range, help determine how spread out values are. Other measures of dispersion that we’ll present in later chapters include variance and standard deviation.

Determining the Minimum of Three Values

First, let’s show how to determine the minimum of three values manually. The following script prompts for and inputs three values, uses if statements to determine the minimum value, then displays it.

Fig. 2.2 | Find the minimum of three values.

 1 # fig02_02.py
 2 """Find the minimum of three values."""
 3
 4 number1 = int(input('Enter first integer: '))
 5 number2 = int(input('Enter second integer: '))
 6 number3 = int(input('Enter third integer: '))
 7
 8 minimum = number1
 9
10 if number2 < minimum:
11     minimum = number2
12
13 if number3 < minimum:
14     minimum = number3
15
16 print('Minimum value is', minimum)
Enter first integer: 12
Enter second integer: 27
Enter third integer: 36
Minimum value is 12

Enter first integer: 27
Enter second integer: 12
Enter third integer: 36
Minimum value is 12

Enter first integer: 36
Enter second integer: 27
Enter third integer: 12
Minimum value is 12

After inputting the three values, we process one value at a time:

  • First, we assume that number1 contains the smallest value, so line 8 assigns it to the variable minimum. Of course, it’s possible that number2 or number3 contains the actual smallest value, so we still must compare each of these with minimum.

  • The first if statement (lines 10–11) then tests number2 < minimum, and if this condition is True, assigns number2 to minimum.

  • The second if statement (lines 13–14) then tests number3 < minimum, and if this condition is True, assigns number3 to minimum.

Now, minimum contains the smallest value, so we display it. We executed the script three times to show that it always finds the smallest value regardless of whether the user enters it first, second or third.
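The compare-and-replace steps described above are not limited to three values. As a minimal sketch, assuming the values are already stored in a non-empty list rather than typed in by the user, the same logic can be written as a loop. The function name find_minimum is hypothetical and is not part of fig02_02.py:

```python
def find_minimum(values):
    """Return the smallest item, using the same steps as fig02_02.py."""
    minimum = values[0]  # assume the first value is the smallest

    for value in values[1:]:  # compare every remaining value with minimum
        if value < minimum:
            minimum = value  # found a smaller value, so remember it

    return minimum


# The three orderings used in the sample executions
print('Minimum value is', find_minimum([12, 27, 36]))
print('Minimum value is', find_minimum([27, 12, 36]))
print('Minimum value is', find_minimum([36, 27, 12]))
```

Each call displays a minimum of 12, matching the three sample executions of the script.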

Determining the Minimum and Maximum with Built-In Functions min and max

Python has many built-in functions for performing common tasks. Built-in functions min and max calculate the minimum and maximum, respectively, of a collection of values:

In [1]: min(36, 27, 12)
Out[1]: 12

In [2]: max(36, 27, 12)
Out[2]: 36

The functions min and max can receive any number of arguments.
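Besides separate arguments, min and max also accept a single collection, such as a list. The short sketch below shows both calling styles. The variable names and values are chosen only for illustration:

```python
# Any number of separate arguments
smallest_of_five = min(36, 27, 12, 48, 19)
largest_of_five = max(36, 27, 12, 48, 19)

# A single collection (here, a list) containing the values
grades = [47, 95, 88, 73, 88, 84]
lowest_grade = min(grades)
highest_grade = max(grades)

print(smallest_of_five, largest_of_five)
print(lowest_grade, highest_grade)
```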

Determining the Range of a Collection of Values

The range of values is simply the minimum through the maximum value. For the values 36, 27 and 12 used above, the range is 12 through 36. Much of data science is devoted to getting to know your data. Descriptive statistics are a crucial part of that, but you also have to understand how to interpret them. For example, if you have 100 numbers with a range of 12 through 36, those numbers could be distributed evenly over that range. At the opposite extreme, the values could clump: 99 values of 12 and one 36, or one 12 and 99 values of 36. In later data science sections, we’ll look at common data distributions.
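To make the interpretation point concrete, the following sketch builds three made-up collections of 100 numbers each. All three have the same range, 12 through 36, yet they are distributed very differently. The variable names and the way the evenly spread data is generated are assumptions made for this illustration:

```python
# 100 values spread evenly: each of the 25 integers 12..36 appears 4 times
evenly_spread = [12 + i % 25 for i in range(100)]

# 100 values clumped at one end of the range
clumped_low = [12] * 99 + [36]
clumped_high = [12] + [36] * 99

for name, data in [('evenly spread', evenly_spread),
                   ('clumped low', clumped_low),
                   ('clumped high', clumped_high)]:
    # The range is identical, but the counts of 12s and 36s differ
    print(name, '| range:', min(data), 'through', max(data),
          '| 12s:', data.count(12), '| 36s:', data.count(36))
```

The range alone cannot distinguish these three collections, which is why other descriptive statistics are needed alongside it.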

Functional-Style Programming: Reduction

Throughout this book, we introduce various functional-style programming capabilities. These enable you to write code that can be more concise, clearer and easier to debug—that is, find and correct errors. The min and max functions are examples of a functional-style programming concept called reduction. They reduce a collection of values to a single value. Other reductions you’ll see include the sum, average, variance and standard deviation of a collection of values. You’ll also learn how to define custom reductions.
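One way to see what "reducing a collection to a single value" means is the standard library's functools.reduce function, which repeatedly combines two values into one until a single result remains. The sketch below is only an illustration of the concept, not necessarily how this book defines custom reductions later. The helper name smaller is hypothetical:

```python
from functools import reduce

values = [36, 27, 12]


def smaller(a, b):
    """Return the smaller of two values."""
    return a if a < b else b


# Combine the values pairwise: smaller(smaller(36, 27), 12)
minimum = reduce(smaller, values)

# A sum is also a reduction: (36 + 27) + 12
total = reduce(lambda a, b: a + b, values)

print('Minimum:', minimum)
print('Sum:', total)
```

In practice, the built-in functions min and sum perform these reductions directly and are the clearer choice.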

Upcoming Intro to Data Science Sections

In the next two chapters, we’ll continue our discussion of basic descriptive statistics with measures of central tendency, including mean, median and mode, and measures of dispersion, including variance and standard deviation.

Self Check

  1. (Fill-In) The range of a collection of values is a measure of ________.
    Answer: dispersion.

  2. (IPython Session) For the values 47, 95, 88, 73, 88 and 84 calculate the minimum, maximum and range.
    Answer:

    In [1]: min(47, 95, 88, 73, 88, 84)
    Out[1]: 47
    
    In [2]: max(47, 95, 88, 73, 88, 84)
    Out[2]: 95
    
    In [3]: print('Range:', min(47, 95, 88, 73, 88, 84), '-',
       ...:     max(47, 95, 88, 73, 88, 84))
       ...:
    Range: 47 - 95
    