Downloading and installing pandas

With Python installed, we can now proceed to install pandas, which is a separate third-party library and not part of the Python language itself. At the time of writing this book, the latest stable version of pandas available is version 0.12. The various dependencies, along with their associated download locations, are as follows:

Package                                 | Required              | Description                                           | Download location
NumPy 1.6.1 or higher                   | Required              | NumPy library for numerical operations                | http://www.numpy.org/
python-dateutil 1.5                     | Required              | Date manipulation and utility library                 | http://labix.org/
pytz                                    | Required              | Time zone support                                     | http://sourceforge.net/
numexpr                                 | Optional, recommended | Speeding up of numerical operations                   | https://code.google.com/
bottleneck                              | Optional, recommended | Performance-related                                   | http://berkeleyanalytics.com/
Cython                                  | Optional, recommended | C-extensions for Python used for optimization         | http://cython.org/
SciPy                                   | Optional, recommended | Scientific toolset for Python                         | http://scipy.org/
PyTables                                | Optional              | Library for HDF5-based storage                        | http://pytables.github.io/
matplotlib                              | Optional, recommended | Matlab-like Python plotting library                   | http://sourceforge.net/
statsmodels                             | Optional              | Statistics module for Python                          | http://sourceforge.net/
openpyxl                                | Optional              | Library to read/write Excel files                     | https://www.python.org/
xlrd/xlwt                               | Optional              | Libraries to read/write Excel files                   | http://python-excel.org/
boto                                    | Optional              | Library to access Amazon S3                           | https://www.python.org/
BeautifulSoup and one of html5lib, lxml | Optional              | Libraries needed for the read_html() function to work | http://www.crummy.com/
html5lib                                | Optional              | Library for parsing HTML                              | https://pypi.python.org/pypi/html5lib
lxml                                    | Optional              | Python library for processing XML and HTML            | http://lxml.de/
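Before installing, it can help to see which packages from the preceding table are already present. The following is a minimal sketch, assuming Python 3.4 or later (for importlib.util.find_spec). The helper names check_dependencies, version_tuple, and numpy_meets_minimum are hypothetical, and the import names (dateutil for python-dateutil, tables for PyTables, bs4 for BeautifulSoup 4) are the usual module names rather than something stated in the table.

```python
import importlib
import importlib.util
import re

# Import (module) names for the packages in the table. Several differ from
# the package name: python-dateutil -> dateutil, PyTables -> tables,
# BeautifulSoup 4 -> bs4.
DEPENDENCIES = {
    "NumPy": "numpy",
    "python-dateutil": "dateutil",
    "pytz": "pytz",
    "numexpr": "numexpr",
    "bottleneck": "bottleneck",
    "Cython": "Cython",
    "SciPy": "scipy",
    "PyTables": "tables",
    "matplotlib": "matplotlib",
    "statsmodels": "statsmodels",
    "openpyxl": "openpyxl",
    "xlrd": "xlrd",
    "xlwt": "xlwt",
    "boto": "boto",
    "BeautifulSoup": "bs4",
    "html5lib": "html5lib",
    "lxml": "lxml",
}


def version_tuple(version):
    """Convert '1.6.1' to (1, 6, 1) so that versions compare numerically.

    Only the leading run of dot-separated numbers is used, so '1.7.0rc1'
    becomes (1, 7, 0). A string that does not start with a digit gives ().
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def check_dependencies(deps=DEPENDENCIES):
    """Map each package name to True if its module can be found.

    find_spec locates a module without importing it, so this is cheap.
    """
    return {name: importlib.util.find_spec(module) is not None
            for name, module in deps.items()}


def numpy_meets_minimum(minimum="1.6.1"):
    """True if NumPy is installed and its version is at least `minimum`."""
    if importlib.util.find_spec("numpy") is None:
        return False
    numpy = importlib.import_module("numpy")
    return version_tuple(numpy.__version__) >= version_tuple(minimum)


if __name__ == "__main__":
    for package, installed in check_dependencies().items():
        print("{:<16} {}".format(package, "installed" if installed else "missing"))
    print("NumPy 1.6.1 or higher:", numpy_meets_minimum())
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.0" would sort before "1.6.1".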

Linux

Installing pandas is fairly straightforward for popular flavors of Linux. First, make sure that the Python development (header) files are installed. If they are not, install them as explained in the following sections.

Ubuntu/Debian

For the Ubuntu/Debian environment, run the following command:

sudo apt-get install python-dev

Red Hat

For the Red Hat environment, run the following command:

sudo yum install python-devel
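To confirm that the development files are actually in place, you can look for the Python.h header in the include directory that the running interpreter reports. This is a sketch assuming Python 3. python_headers_installed is a hypothetical helper, and it only checks the interpreter you run it with, so use the same python that will build pandas.

```python
import os
import sysconfig


def python_headers_installed():
    """Return True if Python.h exists in this interpreter's include directory.

    The python-dev / python-devel package is what normally provides Python.h,
    which is needed to compile C extensions such as those in pandas.
    """
    include_dir = sysconfig.get_path("include")
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    print("Include directory:", sysconfig.get_path("include"))
    print("Python.h found:", python_headers_installed())
```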

Now, I will show you how to install pandas.

Ubuntu/Debian

For installing pandas in the Ubuntu/Debian environment, run the following command:

sudo apt-get install python-pandas

Fedora

For Fedora, run the following command:

sudo yum install python-pandas

openSUSE

Install python-pandas via YaST Software Management, or use the following command:

sudo zypper install python-pandas

Sometimes, additional dependencies may be needed for the preceding installation, particularly in the case of Fedora. In this case, you can try installing the following additional dependencies:

sudo yum install gcc-gfortran gcc44-gfortran libgfortran lapack blas python-devel
sudo python-pip install numpy

Mac

There are a variety of ways to install pandas on Mac OS X. They are explained in the following sections.

Source installation

pandas has a few dependencies: some are required, and others are optional but needed for certain desirable features to work properly. The following steps install all the required dependencies:

  1. Install the easy_install program:
    wget http://python-distribute.org/distribute_setup.py
    sudo python distribute_setup.py
    
  2. Install Cython:
    sudo easy_install -U Cython
    
  3. You can then install from the source code as follows:
    git clone https://github.com/pydata/pandas.git
    cd pandas
    sudo python setup.py install

Binary installation

If you have installed pip as described in the Python installation section, installing pandas is as simple as the following:

pip install pandas
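When more than one Python is installed, a bare pip command may belong to a different interpreter than the one you intend to use. One common way to avoid this, sketched here under the assumption of Python 3 and a pip recent enough to run as `python -m pip`, is to invoke pip through sys.executable. The helper names pip_install_command and pip_install are hypothetical.

```python
import subprocess
import sys


def pip_install_command(*packages):
    """Build a pip install command bound to the running interpreter.

    Running `<this python> -m pip` installs into the same Python that runs
    this script, instead of whichever pip happens to be first on PATH.
    """
    return [sys.executable, "-m", "pip", "install"] + list(packages)


def pip_install(*packages):
    """Run the command; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(*packages))
```

For example, `pip_install("pandas")` performs the same installation as the command above, but targets the interpreter that runs the script.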

Windows

The following methods describe the installation in the Windows environment.

Binary Installation

Make sure that NumPy, python-dateutil, and pytz are installed first. python-dateutil and pytz can be installed with pip by running the following commands:

  • For python-dateutil:
    C:\Python27\Scripts\pip install python-dateutil
    
  • For pytz:
    C:\Python27\Scripts\pip install pytz
    

Download the binary for your version of Windows from https://pypi.python.org/pypi/pandas and install it. For example, if your processor is an AMD64, you can download and install pandas as follows:

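  1. Download the following file (this applies to pandas 0.16):
    pandas-0.16.1-cp26-none-win_amd64.whl
    The cp26 tag means this wheel is built for Python 2.6; pick the file whose cpXY tag matches your Python version (for example, cp27 for Python 2.7).
  2. Install the downloaded file via pip:
    pip install pandas-0.16.1-cp26-none-win_amd64.whl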
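The wheel filename encodes which Python and platform it was built for. The following sketch assumes Python 3 and the filename layout defined in PEP 427. It splits a bare filename into its tags and checks the Python tag against the running interpreter. parse_wheel_filename and matches_running_python are hypothetical helpers, not part of pip.

```python
import sys
from collections import namedtuple

# PEP 427 layout:
# {name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
WheelInfo = namedtuple("WheelInfo", "name version build python abi platform")


def parse_wheel_filename(filename):
    """Split a bare wheel filename (no directory) into its tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return WheelInfo(name, version, build, python, abi, platform)


def matches_running_python(info):
    """True if the wheel's Python tag includes 'cpXY' for this interpreter.

    Dotted tag sets such as 'cp26.cp27' are split and checked individually;
    generic tags such as 'py3' are not recognised and give False.
    """
    expected = "cp%d%d" % sys.version_info[:2]
    return expected in info.python.split(".")
```

For the file above, `matches_running_python(parse_wheel_filename("pandas-0.16.1-cp26-none-win_amd64.whl"))` is True only under Python 2.6, which is a quick way to catch a mismatched download.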

To test the installation, start Python and type the following at the Python prompt:

import pandas

If it returns with no errors then the installation was successful.
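The same check can be scripted so that it also reports which version was installed. This is a minimal sketch assuming Python 3. installed_pandas_version is a hypothetical helper that relies on the pandas.__version__ attribute.

```python
import importlib


def installed_pandas_version():
    """Return pandas.__version__, or None if pandas cannot be imported."""
    try:
        pandas = importlib.import_module("pandas")
    except ImportError:
        return None
    return pandas.__version__


if __name__ == "__main__":
    version = installed_pandas_version()
    if version is None:
        print("pandas is not installed")
    else:
        print("pandas", version, "imported successfully")
```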

Source installation

The following steps describe the complete source installation:

  1. Install the MinGW compiler by following the instructions in the documentation titled Appendix: Installing MinGW on Windows at http://docs.cython.org/src/tutorial/appendix.html.
  2. Make sure that the MinGW binary location is added to the PATH variable, that is, that PATH has C:\MinGW\bin appended to it.
  3. Install Cython and NumPy.

    NumPy can be downloaded and installed from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy.

    Cython can be downloaded and installed from http://www.lfd.uci.edu/~gohlke/pythonlibs/#cython.

The steps to install Cython are as follows:

  • Installation via pip:
    C:\Python27\Scripts\pip install Cython
  • Direct download: download and run the installer from http://www.lfd.uci.edu/~gohlke/pythonlibs/#cython.

Next, install pandas from the source code:

  1. Download the pandas source from GitHub: http://github.com/pydata/pandas.
  2. You can simply download the zip file and extract it to a suitable folder.
  3. Change to the folder containing the pandas download and run:
      C:\Python27\python setup.py install
  4. Sometimes, you may obtain the following error when running setup.py:
      distutils.errors.DistutilsError: Setup script exited with error:
      Unable to find vcvarsall.bat
      

This error may mean that MinGW has not been properly specified as the compiler. Check again that you have followed all of the preceding steps.
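Since this error may mean that the MinGW compiler is not being picked up, a quick look at the PATH from Python can help narrow it down. The following sketch assumes Python 3.3 or later (for shutil.which) and the default C:\MinGW\bin location. find_mingw_gcc, path_entries, and mingw_on_path are hypothetical helpers.

```python
import os
import shutil


def find_mingw_gcc():
    """Return the path of the first gcc found on PATH, or None."""
    return shutil.which("gcc")


def path_entries():
    """List the non-empty directories on PATH, in search order."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]


def mingw_on_path(mingw_bin=r"C:\MinGW\bin"):
    """True if `mingw_bin` appears on PATH.

    Entries are compared after os.path.normpath and os.path.normcase, so on
    Windows differences in case and trailing separators are ignored.
    """
    target = os.path.normcase(os.path.normpath(mingw_bin))
    return any(os.path.normcase(os.path.normpath(p)) == target
               for p in path_entries())


if __name__ == "__main__":
    print("gcc found at:", find_mingw_gcc())
    print(r"C:\MinGW\bin on PATH:", mingw_on_path())
```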

Note

Installing pandas on Windows from source is error-prone and is not really recommended.

IPython

Interactive Python (IPython) is a tool that is very useful for using Python for data analysis, and a brief description of the installation steps is provided here. IPython provides an interactive environment that is much more useful than the standard Python prompt. Its features include the following:

  • Tab completion to help the user do data exploration.
  • Comprehensive Help functionality using object_name? to print details about objects.
  • Magic functions that enable the user to run operating system commands within IPython, and run a Python script and load its data into the IPython environment by using the %run magic command.
  • History functionality via the _, __, and ___ variables, the %history and other magic functions, and the up and down arrow keys.

Note

For more information, see the documentation at http://bit.ly/1Is4zIW.

IPython Notebook

IPython Notebook is the web-enabled version of IPython. It enables the user to combine code, numerical computation, graphics, and rich media in a single document, the notebook. Notebooks can be shared with colleagues and converted to HTML or PDF. For more information, refer to the documentation titled The IPython Notebook at http://ipython.org/notebook.html. Here is an illustration:

IPython Notebook

The preceding image of PYMC Pandas Example is taken from http://healthyalgorithms.files.wordpress.com/2012/01/pymc-pandas-example.png.
