The name pandas derives from panel data (an econometric term) and Python data analysis; pandas is a popular open source Python project. This chapter is a tutorial on basic pandas functionality, covering the pandas data structures and the operations they support.
In this chapter, we will install and explore pandas. Then, we will acquaint ourselves with the two central pandas data structures: DataFrame and Series. After this, you will learn how to perform SQL-like operations on the data contained in these data structures. pandas has statistical utilities including time-series routines, some of which will be demonstrated. The topics we will pursue are as follows:
DataFrame and Series data structures
The minimal set of dependencies for pandas is as follows:
This list is the bare minimum; a longer list of optional dependencies can be found at http://pandas.pydata.org/pandas-docs/stable/install.html. We can install pandas via PyPI with pip or easy_install, using a binary installer, with the aid of our operating system package manager, or from the source by checking out the code. The binary installers can be downloaded from http://pandas.pydata.org/getpandas.html.
The command to install pandas with pip is as follows:
$ pip install pandas
You may have to prepend the preceding command with sudo if your user account doesn't have sufficient rights. For most, if not all, Linux distributions, the pandas package name is python-pandas. Please refer to the manual pages of your package manager for the correct command to install. These commands should be the same as the ones summarized in Chapter 1, Getting Started with Python Libraries. To install from the source, we need to execute the following commands from the command line:
$ git clone git://github.com/pydata/pandas.git
$ cd pandas
$ python setup.py install
This procedure requires the correct setup of the compiler and other dependencies; therefore, it is recommended only if you really need the most up-to-date version of pandas. Once we have installed pandas, we can explore it further by adding pandas-related lines to our documentation-scanning script pkg_check.py of the previous chapter. The program prints the following output:
pandas version 0.13.1
pandas.compat
    DESCRIPTION
    compat
    Cross-compatible functions for Python 2 and 3.
    Key items to import for 2/3 compatible code:
    * iterators: range(), map(),
pandas.computation
pandas.core
pandas.io
pandas.rpy
pandas.sandbox
pandas.sparse
pandas.stats
pandas.tests
pandas.tools
pandas.tseries
pandas.util
Unfortunately, the documentation of the pandas subpackages lacks informative descriptions; however, the subpackage names are descriptive enough to get an idea of what they are about.
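The following is a rough sketch of how such a scan might be written; it is not the pkg_check.py listing from the previous chapter. It relies only on the standard library modules importlib and pkgutil, and the function names installed_version and list_subpackages are hypothetical names chosen for this illustration. The subpackages reported will depend on the pandas version installed, so they may differ from the output shown earlier.

```python
import importlib
import pkgutil


def installed_version(package_name):
    """Return the package's __version__, or None if it cannot be imported.

    Hypothetical helper for illustration; packages that do not define
    __version__ are reported as "unknown".
    """
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def list_subpackages(package_name):
    """Return the sorted dotted names of the direct subpackages.

    Hypothetical helper for illustration; returns an empty list when the
    package is not installed or is a plain module without subpackages.
    """
    try:
        package = importlib.import_module(package_name)
    except ImportError:
        return []
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(
        "{}.{}".format(package_name, name)
        for _, name, is_pkg in pkgutil.iter_modules(search_path)
        if is_pkg
    )


version = installed_version("pandas")
if version is None:
    print("pandas is not installed")
else:
    print("pandas version", version)
    for subpackage in list_subpackages("pandas"):
        print(subpackage)
```

Both helpers return a neutral value instead of raising an exception when the package is missing, so the same script can be run before and after installation to confirm that pandas is available.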