Statistics with pandas DataFrames

A pandas DataFrame provides about a dozen statistical methods. The following table lists them, each with a short description:

Method     Description
describe   This method returns a small table with descriptive statistics.
count      This method returns the number of non-NaN items.
mad        This method calculates the mean absolute deviation, which is a robust measure similar to the standard deviation.
median     This method returns the median. This is equivalent to the value at the 50th percentile.
min        This method returns the lowest value.
max        This method returns the highest value.
mode       This method returns the mode, which is the most frequently occurring value.
std        This method returns the standard deviation, which measures dispersion. It is the square root of the variance.
var        This method returns the variance.
skew       This method returns skewness. Skewness is indicative of the distribution symmetry.
kurt       This method returns kurtosis. Kurtosis is indicative of the distribution shape.
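
A few of the relationships stated in the table can be checked directly. The following is a minimal sketch that assumes a small made-up DataFrame (the values and the reuse of the column name Number are hypothetical, not the sunspot data). It confirms that count skips NaN values, that the median matches the 50th percentile, and that the standard deviation is the square root of the variance:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the sunspot series); one value is missing on purpose.
df = pd.DataFrame({"Number": [1.0, 2.0, 2.0, 3.0, np.nan, 10.0]})

n_valid = df.count()["Number"]        # NaN entries are not counted
median = df.median()["Number"]
q50 = df.quantile(0.5)["Number"]      # value at the 50th percentile
std = df.std()["Number"]
var = df.var()["Number"]
mode = df.mode()["Number"].tolist()   # a list, because ties are possible
skew = df.skew()["Number"]

print("Non-NaN observations:", n_valid)
print("Median equals 50th percentile:", median == q50)
print("std equals sqrt(var):", np.isclose(std, np.sqrt(var)))
print("Mode:", mode)
print("Skewness is positive (long right tail):", skew > 0)
```

The single large value (10.0) pulls the tail to the right, which is why the skewness comes out positive for this toy data.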

Using the same data as in the previous example, we will demonstrate these statistical methods. The full script is in the stats_demo.py file of this book's code bundle:

from __future__ import print_function  # print() then behaves the same on Python 2 and 3

import Quandl

# Data from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual
# PyPI URL: https://pypi.python.org/pypi/Quandl
# Download the annual sunspot numbers as a pandas DataFrame
sunspots = Quandl.get("SIDC/SUNSPOTS_A")
print("Describe", sunspots.describe())
print("Non NaN observations", sunspots.count())
print("MAD", sunspots.mad())
print("Median", sunspots.median())
print("Min", sunspots.min())
print("Max", sunspots.max())
print("Mode", sunspots.mode())
print("Standard Deviation", sunspots.std())
print("Variance", sunspots.var())
print("Skewness", sunspots.skew())
print("Kurtosis", sunspots.kurt())

The following is the output of the script:

Describe            Number
count  314.000000
mean    49.528662
std     40.277766
min      0.000000
25%     16.000000
50%     40.000000
75%     69.275000
max    190.200000

[8 rows x 1 columns]
Non NaN observations Number    314
dtype: int64
MAD Number    32.483184
dtype: float64
Median Number    40
dtype: float64
Min Number    0
dtype: float64
Max Number    190.2
dtype: float64
Mode    Number
0      47

[1 rows x 1 columns]
Standard Deviation Number    40.277766
dtype: float64
Variance Number    1622.298473
dtype: float64
Skewness Number    0.994262
dtype: float64
Kurtosis Number    0.469034
dtype: float64