Statistics with pandas DataFrames

The pandas DataFrame has a number of statistical methods. The following table lists several of them, each with a short description:

Method	Description
`describe`	This method returns a small table with descriptive statistics.
`count`	This method returns the number of non-NaN items.
`mad`	This method calculates the mean absolute deviation, a measure of dispersion that is less sensitive to outliers than the standard deviation. It was deprecated in pandas 1.5 and removed in pandas 2.0.
`median`	This method returns the median. This is equivalent to the value at the 50th percentile.
`min`	This method returns the lowest value.
`max`	This method returns the highest value.
`mode`	This method returns the mode, which is the most frequently occurring value.
`std`	This method returns the standard deviation, which measures dispersion. It is the square root of the variance.
`var`	This method returns the variance.
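`skew`	This method returns skewness, which measures the asymmetry of the distribution. Positive values indicate a longer right tail.
`kurt`	This method returns kurtosis, which describes how heavy the tails of the distribution are. pandas uses Fisher's definition, so a normal distribution has a kurtosis of 0.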
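
Several of the descriptions in the table can be checked directly. The following is a minimal sketch on a small made-up DataFrame (the values and the `Number` column name are assumptions chosen for illustration, not the sunspot data). It shows that `count` skips NaN, that the median matches the 50th percentile, and that the standard deviation is the square root of the variance:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the sunspot series); one value is missing.
df = pd.DataFrame({"Number": [1.0, 2.0, 2.0, 3.0, 10.0, np.nan]})
col = df["Number"]

non_nan = col.count()       # NaN is not counted
median = col.median()
q50 = col.quantile(0.5)     # the 50th percentile
std = col.std()             # sample standard deviation (ddof=1 by default)
var = col.var()             # sample variance (ddof=1 by default)
mode = col.mode()           # a Series, because several values can tie
skew = col.skew()           # the single large value stretches the right tail

print("Non-NaN observations:", non_nan)
print("Median equals 50th percentile:", median == q50)
print("std squared equals var:", np.isclose(std ** 2, var))
print("Mode:", mode.tolist())
print("Skewness is positive:", skew > 0)
```

Note that `mode` returns a Series (or a DataFrame when called on a DataFrame) rather than a scalar, since more than one value can share the highest frequency.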

Using the same data as in the previous example, we will demonstrate these statistical methods. The full script is in the `stats_demo.py` file in this book's code bundle:

# Make print() behave the same way on Python 2 and Python 3
from __future__ import print_function

import Quandl

# Data from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual
# PyPI url https://pypi.python.org/pypi/Quandl
# Download the annual sunspot numbers as a pandas DataFrame
sunspots = Quandl.get("SIDC/SUNSPOTS_A")

# Call each statistical method from the table in turn
print("Describe", sunspots.describe())
print("Non NaN observations", sunspots.count())
print("MAD", sunspots.mad())
print("Median", sunspots.median())
print("Min", sunspots.min())
print("Max", sunspots.max())
print("Mode", sunspots.mode())
print("Standard Deviation", sunspots.std())
print("Variance", sunspots.var())
print("Skewness", sunspots.skew())
print("Kurtosis", sunspots.kurt())

The following is the output of the script:

Describe            Number
count  314.000000
mean    49.528662
std     40.277766
min      0.000000
25%     16.000000
50%     40.000000
75%     69.275000
max    190.200000

[8 rows x 1 columns]
Non NaN observations Number    314
dtype: int64
MAD Number    32.483184
dtype: float64
Median Number    40
dtype: float64
Min Number    0
dtype: float64
Max Number    190.2
dtype: float64
Mode    Number
0      47

[1 rows x 1 columns]
Standard Deviation Number    40.277766
dtype: float64
Variance Number    1622.298473
dtype: float64
Skewness Number    0.994262
dtype: float64
Kurtosis Number    0.469034
dtype: float64
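
On pandas versions where `mad` is not available, the mean absolute deviation can be computed by hand. The following is a minimal sketch, not the book's code: the helper name `mean_abs_deviation` and the toy values are assumptions for illustration. It subtracts each column's mean, takes absolute values, and averages them:

```python
import pandas as pd


def mean_abs_deviation(frame):
    """Mean absolute deviation around the mean, computed per column."""
    # frame.mean() is a Series indexed by column name, so the subtraction
    # aligns on columns; NaN values are skipped by the final mean().
    return (frame - frame.mean()).abs().mean()


# Hypothetical toy data standing in for the sunspot DataFrame.
toy = pd.DataFrame({"Number": [2.0, 4.0, 6.0, 8.0]})
result = mean_abs_deviation(toy)
print("MAD", result)
```

Here the mean is 5, the absolute deviations are 3, 1, 1, and 3, and their average is 2. Passing the `sunspots` DataFrame to the same helper should give a Series in the same shape as the `MAD` line of the output above.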
