Chapter 3. Statistics and Linear Algebra

Statistics and linear algebra are branches of mathematics that are especially useful for data analysis, so they are the focus of this chapter. Statistics is needed to make inferences from raw data. For instance, we can compute the arithmetic mean and standard deviation of the data for a variable. From these numbers, we can then infer a range and the expected value for this variable. Then, we can run statistical tests to check how likely it is that we drew the right conclusion.
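As a minimal sketch of the idea, the following snippet computes the mean and standard deviation of a small sample and derives a rough range from them. The data values are invented for illustration, and the choice of two standard deviations on either side of the mean is only an assumption; it is written so that it runs on both Python 2 and Python 3.

```python
import numpy as np

# Hypothetical sample values, invented for illustration only.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = data.mean()   # arithmetic mean
std = data.std()     # population standard deviation (ddof=0)

# A rough "typical range": two standard deviations either side of the mean.
low, high = mean - 2 * std, mean + 2 * std

print("mean=%.2f std=%.2f range=(%.2f, %.2f)" % (mean, std, low, high))
```

Whether such a range is meaningful depends on how the data is distributed, which is what the statistical tests mentioned above help to assess.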

Linear algebra concerns itself with systems of linear equations. These are easy to solve with the linalg packages of NumPy and SciPy. Linear algebra is useful, for instance, to fit data to a model. In this chapter, we shall also introduce other NumPy and SciPy packages for random number generation and masked arrays.
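To give a first impression, here is a small sketch that solves a system of two linear equations with numpy.linalg.solve(). The equations (2x + y = 5 and x + 3y = 10) are made up for this example.

```python
import numpy as np

# Coefficient matrix and right-hand side for the made-up system
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

solution = np.linalg.solve(A, b)   # array holding x and y

# Substituting the solution back in should reproduce the right-hand side.
residual = np.dot(A, solution) - b

print("solution: %s" % solution)
```

Checking the residual is a cheap way to confirm that the solver returned a usable answer.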

In this chapter, we will cover the following topics:

  • Descriptive statistics
  • The linalg package
  • Polynomials
  • Matrices as specialized ndarray subclasses
  • Random numbers
  • Continuous and discrete distributions
  • Masked arrays

NumPy and SciPy modules

First, let's take a look at the NumPy and SciPy module documentation. The technique described here is not specific to data analysis; it is a general Python facility.

The following code prints the descriptions of subpackages for NumPy and SciPy:

import pkgutil as pu
import numpy as np
import matplotlib as mpl
import scipy as sp
import pydoc


print "NumPy version", np.__version__
print "SciPy version", sp.__version__
print "Matplotlib version", mpl.__version__

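def clean(astr):
   s = astr
   # collapse runs of whitespace into single spaces
   s = ' '.join(s.split())
   # drop the '=' characters that underline headings in the docstrings
   s = s.replace('=','')

   return s

def print_desc(prefix, pkg_path):
   for pkg in pu.iter_modules(path=pkg_path):
      # pkg[1] is the module name, pkg[2] tells us whether it is a package
      name = prefix + "." + pkg[1]

      # only describe subpackages, not plain modules
      if pkg[2]:
         try:
            docstr = pydoc.plain(pydoc.render_doc(name))
            docstr = clean(docstr)
            start = docstr.find("DESCRIPTION")
            docstr = docstr[start: start + 140]
            print name, docstr
         except Exception:
            # skip subpackages whose documentation cannot be rendered
            continue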

print_desc("numpy", np.__path__)
print
print
print
print_desc("scipy", sp.__path__)

Using the standard Python modules pkgutil and pydoc, we can iterate through subpackages in NumPy and SciPy and extract short descriptions of these subpackages. We will also print the SciPy, matplotlib, and NumPy versions.

The versions of the software used in this chapter can be obtained from the __version__ attribute of the corresponding modules as follows:

print "NumPy version", np.__version__
print "SciPy version", sp.__version__
print "Matplotlib version", mpl.__version__

I have tested the code with the following versions (of course, you don't need to have the exact same versions):

  • NumPy Version 1.9.0.dev-e886943
  • SciPy Version 0.13.2
  • matplotlib Version 1.4.x

We can iterate through the subpackages found on a given path with the iter_modules() function of pkgutil. The function yields tuples containing three elements each. For us, only the second and third elements are interesting right now. The second element contains the name of the module, and the third element is a Boolean indicating whether that module is a package (and therefore a subpackage of the package we are inspecting).

for pkg in pu.iter_modules(path=pkg_path):
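The same loop can be tried on any package. The following sketch uses the standard library's xml package purely as a stand-in for NumPy, so it runs without third-party modules, and it unpacks each tuple into named variables instead of indexing it. The variable names are chosen for this example only.

```python
import pkgutil
import xml  # standard-library package, used here only as a stand-in

# Each yielded item unpacks into three values. The first one (the object
# that found the module) is ignored, as in the text above.
subpackages = []
for _finder, name, is_pkg in pkgutil.iter_modules(path=xml.__path__):
    if is_pkg:
        subpackages.append("xml." + name)

print(subpackages)
```

Unpacking into names makes it harder to mix up the second and third elements than writing pkg[1] and pkg[2].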

The pydoc.render_doc() function returns the documentation text for a given subpackage or function. The returned string can contain non-printable characters (backspaces used to mark up bold text), so we use the pydoc.plain() function to get rid of them. To save space, we do not print the whole text; we extract only the first part of it, starting at the DESCRIPTION heading.

docstr = pydoc.plain(pydoc.render_doc(name))
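The extraction step can be sketched in isolation as follows. The helper name short_description and its width parameter are hypothetical, and the standard library's json module is used only because it is always available. Unlike the listing above, this sketch returns an empty string explicitly when no DESCRIPTION heading is found. In the listing, find() returns -1 in that case, and the resulting slice is empty for any text longer than 140 characters, which is one plausible reason why some subpackages are printed without a description.

```python
import pydoc


def short_description(name, width=140):
    """Return up to `width` characters starting at the DESCRIPTION heading."""
    raw = pydoc.render_doc(name)    # may contain backspace-based bold markup
    text = pydoc.plain(raw)         # strip that markup
    text = ' '.join(text.split())   # collapse runs of whitespace
    start = text.find("DESCRIPTION")
    if start == -1:                 # no DESCRIPTION section in this text
        return ""
    return text[start:start + width]


snippet = short_description("json")
print(snippet)
```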

The preceding code should make it easy to find information for locally installed Python modules. For NumPy, we get the following subpackage descriptions:

numpy.compat DESCRIPTION This module contains duplicated code from Python itself or 3rd party extensions, which may be included for the following reasons
numpy.core DESCRIPTION Functions - array - NumPy Array construction - zeros - Return an array of all zeros - empty - Return an unitialized array - shap
numpy.distutils 
numpy.doc DESCRIPTION Topical documentation  The following topics are available: - basics - broadcasting - byteswapping - constants - creation - gloss
numpy.f2py 
numpy.fft DESCRIPTION Discrete Fourier Transform (:mod:`numpy.fft`)  .. currentmodule:: numpy.fft Standard FFTs ------------- .. autosummary:: :toctre
numpy.lib DESCRIPTION Basic functions used by several sub-packages and useful to have in the main name-space. Type Handling -------------   iscomplexo
numpy.linalg DESCRIPTION Core Linear Algebra Tools ------------------------- Linear algebra basics: - norm Vector or matrix norm - inv Inverse of a squar
numpy.ma DESCRIPTION  Masked Arrays  Arrays sometimes contain invalid or missing data. When doing operations on such arrays, we wish to suppress inva
numpy.matrixlib 
numpy.polynomial DESCRIPTION Within the documentation for this sub-package, a "finite power series," i.e., a polynomial (also referred to simply as a "series
numpy.random DESCRIPTION  Random Number Generation    Utility functions  random_sample Uniformly distributed floats over ``[0, 1)``. random Alias for `ra
numpy.testing DESCRIPTION This single module should provide all the common functionality for numpy tests in a single location, so that test scripts can ju

For SciPy, we get the following subpackage descriptions:

scipy._build_utils 
scipy.cluster DESCRIPTION  Clustering package (:mod:`scipy.cluster`)  .. currentmodule:: scipy.cluster :mod:`scipy.cluster.vq` Clustering algorithms are u
scipy.constants DESCRIPTION  Constants (:mod:`scipy.constants`)  .. currentmodule:: scipy.constants Physical and mathematical constants and units. Mathemati
scipy.fftpack DESCRIPTION  Discrete Fourier transforms (:mod:`scipy.fftpack`)  Fast Fourier Transforms (FFTs)  .. autosummary:: :toctree: generated/ fft -
scipy.integrate DESCRIPTION  Integration and ODEs (:mod:`scipy.integrate`)  .. currentmodule:: scipy.integrate Integrating functions, given function object 
scipy.interpolate DESCRIPTION  Interpolation (:mod:`scipy.interpolate`)  .. currentmodule:: scipy.interpolate Sub-package for objects used in interpolation. A
scipy.io DESCRIPTION  Input and output (:mod:`scipy.io`)  .. currentmodule:: scipy.io SciPy has many modules, classes, and functions available to rea
scipy.lib DESCRIPTION Python wrappers to external libraries  - lapack -- wrappers for `LAPACK/ATLAS <http://netlib.org/lapack/>`_ libraries - blas -- 
scipy.linalg DESCRIPTION  Linear algebra (:mod:`scipy.linalg`)  .. currentmodule:: scipy.linalg Linear algebra functions. .. seealso:: `numpy.linalg` for
scipy.misc DESCRIPTION  Miscellaneous routines (:mod:`scipy.misc`)  .. currentmodule:: scipy.misc Various utilities that don't have another home. Note 
scipy.ndimage DESCRIPTION  Multi-dimensional image processing (:mod:`scipy.ndimage`)  .. currentmodule:: scipy.ndimage This package contains various funct
scipy.odr DESCRIPTION  Orthogonal distance regression (:mod:`scipy.odr`)  .. currentmodule:: scipy.odr Package Content  .. autosummary:: :toctree: gen
scipy.optimize DESCRIPTION  Optimization and root finding (:mod:`scipy.optimize`)  .. currentmodule:: scipy.optimize Optimization  General-purpose --------
scipy.signal DESCRIPTION  Signal processing (:mod:`scipy.signal`)  .. module:: scipy.signal Convolution  .. autosummary:: :toctree: generated/ convolve -
scipy.sparse DESCRIPTION  Sparse matrices (:mod:`scipy.sparse`)  .. currentmodule:: scipy.sparse SciPy 2-D sparse matrix package for numeric data. Conten
scipy.spatial DESCRIPTION  Spatial algorithms and data structures (:mod:`scipy.spatial`)  .. currentmodule:: scipy.spatial Nearest-neighbor Queries  .. au
scipy.special DESCRIPTION  Special functions (:mod:`scipy.special`)  .. module:: scipy.special Nearly all of the functions below are universal functions a
scipy.stats DESCRIPTION  Statistical functions (:mod:`scipy.stats`)  .. module:: scipy.stats This module contains a large number of probability distribu
scipy.weave DESCRIPTION C/C++ integration  inline -- a function for including C/C++ code within Python blitz -- a function for compiling Numeric express