Let's get started. We can find a mind map describing software that can be used for data analysis at http://www.xmind.net/m/WvfC/. Obviously, we can't install all of this software in this chapter. We will install NumPy, SciPy, matplotlib, and IPython on different operating systems and have a look at some simple code that uses NumPy.
- NumPy is a fundamental Python library that provides numerical arrays and functions.
- SciPy is a scientific Python library, which supplements and slightly overlaps NumPy. NumPy and SciPy historically shared their code base but were later separated.
- matplotlib is a plotting library based on NumPy. You can read more about matplotlib in Chapter 6, Data Visualization.
- IPython provides an architecture for interactive computing. The most notable part of this project is the IPython shell, which we will cover later in this chapter.
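As a first taste of what NumPy provides, the following minimal sketch (assuming NumPy is already installed) creates an array, applies element-wise arithmetic to it, and calls two of NumPy's numerical functions. The variable names are illustrative only.

```python
import numpy as np

# Create a one-dimensional NumPy array from a Python list.
a = np.array([1.0, 2.0, 3.0, 4.0])

# Arithmetic on arrays is element-wise; no explicit Python loop is needed.
squares = a ** 2

# NumPy also provides numerical functions that operate on whole arrays.
total = np.sum(squares)
mean = np.mean(a)

print(squares)
print(total)  # 30.0
print(mean)   # 2.5
```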
Installation instructions for the other software we need will be given throughout the book at the appropriate time. At the end of this chapter, you will find pointers on how to find additional information online if you get stuck or are uncertain about the best way to solve problems.
In this chapter, we will cover:
The software used in this book is based on Python, so you need to have Python installed. On some operating systems, Python is already installed. However, you need to check whether that Python version is compatible with the version of the software you want to install. There are many implementations of Python, including commercial implementations and distributions. In this book, we will focus on the standard CPython implementation, which is guaranteed to be compatible with NumPy.
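To find out which implementation and version of Python you are running, you can ask the interpreter itself. This is a small sketch using the standard `platform` and `sys` modules; the warning message is our own wording, not part of any tool.

```python
import platform
import sys

# platform.python_implementation() returns a string such as
# 'CPython', 'PyPy', 'IronPython', or 'Jython'.
implementation = platform.python_implementation()

# sys.version_info holds the interpreter version as (major, minor, micro, ...).
version = "%d.%d.%d" % sys.version_info[:3]

print("Implementation: %s, version: %s" % (implementation, version))

if implementation != "CPython":
    print("Warning: this book assumes the standard CPython implementation.")
```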
You can download Python from https://www.python.org/download/. On this website, we can find installers for Windows and Mac OS X as well as source archives for Linux, Unix, and Mac OS X.
The software we will install in this chapter has binary installers for Windows, various Linux distributions, and Mac OS X. There are also source distributions if you prefer to build from source. You need to have Python 2.4.x or above installed on your system. Python 2.7.x is currently the best Python version to have because most scientific Python libraries support it. Python 2.7 will be supported and maintained until 2020; after that, we will have to switch to Python 3.
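Because the minimum and recommended versions are stated as version numbers, one quick way to check your interpreter is to compare `sys.version_info` with a tuple. The helper `meets_minimum` below is a hypothetical name used only for illustration; it checks the interpreter version only, not the requirements of any particular library release.

```python
import sys


def meets_minimum(version_info, minimum):
    """Return True if version_info (major, minor, ...) is at least minimum."""
    # Tuples compare element by element, so (2, 7) >= (2, 4) is True
    # and (2, 3) >= (2, 4) is False.
    return tuple(version_info[:len(minimum)]) >= tuple(minimum)


# (2, 4) is the minimum stated in the text; (2, 7) is the recommended version.
print(meets_minimum(sys.version_info, (2, 4)))
print(meets_minimum(sys.version_info, (2, 7)))
```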
We will learn how to install and set up NumPy, SciPy, matplotlib, and IPython on Windows, Linux and Mac OS X. Let's look at the process in detail.
Installing on Windows is, fortunately, a straightforward task that we will cover in detail. You only need to download an installer, and a wizard will guide you through the installation steps. We will give you the steps to install NumPy here; the steps to install the other libraries are similar. The following table lists the libraries, where to download them, and their latest versions at the time of writing:

| Library | URL | Latest version |
|---|---|---|
| NumPy | | 1.8.1 |
| SciPy | | 0.14.0 |
| matplotlib | | 1.3.1 |
| IPython | | 2.0.0 |
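Download the installer that matches your Python version; for NumPy and Python 2.7, for example, this is numpy-1.8.1-win32-superpack-python2.7.exe. Run the installer. If you have Python installed, it should be detected automatically. If it is not detected, your path settings may be wrong.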
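If you suspect a path problem, a quick check is to see which interpreter is running and whether a `python` command can be found on your PATH. This sketch assumes Python 3.3 or later, because `shutil.which` does not exist in Python 2.7; it only reports what it finds and does not change any settings.

```python
import shutil
import sys

# The full path of the interpreter that is running this script.
print("Running interpreter: %s" % sys.executable)

# shutil.which searches the directories listed in the PATH environment
# variable and returns None if the command cannot be found.
on_path = shutil.which("python")
if on_path is None:
    print("No 'python' command found on PATH; check your path settings.")
else:
    print("'python' on PATH resolves to: %s" % on_path)
```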
The situation around installers is rapidly evolving. Other alternatives exist in various stages of maturity (see http://www.scipy.org/install.html). It might be necessary to put the msvcp71.dll file in your system32 directory, located at C:\Windows. You can get it from http://www.dll-files.com/dllindex/dll-files.shtml?msvcp71.
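Once the installers have finished, you can check which of the four libraries Python can import and compare their versions with the Latest version column above (newer releases than those in the table are to be expected). This is a sketch; `installed_versions` and `LIBRARIES` are hypothetical names, and it relies on each library exposing a `__version__` attribute, falling back to None when it does not.

```python
import importlib

# Import names of the libraries discussed in this chapter.
LIBRARIES = ["numpy", "scipy", "matplotlib", "IPython"]


def installed_versions(names):
    """Map each import name to its __version__ string, or None if missing."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


for name, version in sorted(installed_versions(LIBRARIES).items()):
    print("%-10s %s" % (name, version or "not installed"))
```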
Installing the recommended software on Linux depends on the distribution you have. We will discuss how to install NumPy from the command line; depending on your distribution (distro), you could probably also use a graphical installer. The commands to install matplotlib, SciPy, and IPython are the same; only the package names are different. Installing matplotlib, SciPy, and IPython is recommended but optional.
Most Linux distributions have NumPy packages. We will go through the necessary commands for some of the popular Linux distributions as follows:
On Red Hat and Fedora:
$ yum install python-numpy
On Mandriva:
$ urpmi python-numpy
On Gentoo:
$ sudo emerge numpy
On Debian and Ubuntu:
$ sudo apt-get install python-numpy
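If you are not sure which package manager your distro uses, the following sketch looks for each of the package managers used in the commands above and prints the matching NumPy command. The `NUMPY_COMMANDS` table and `suggest_numpy_commands` helper are hypothetical; package names can differ on newer releases (recent Debian and Ubuntu releases, for example, use python3-numpy for Python 3). It assumes Python 3.3 or later for `shutil.which`.

```python
import shutil

# Package manager executable -> NumPy install command, as listed above.
NUMPY_COMMANDS = {
    "yum": "yum install python-numpy",
    "urpmi": "urpmi python-numpy",
    "emerge": "sudo emerge numpy",
    "apt-get": "sudo apt-get install python-numpy",
}


def suggest_numpy_commands(which=shutil.which):
    """Return the commands whose package manager is found on PATH."""
    # The which parameter lets you substitute a fake lookup for testing.
    return [cmd for tool, cmd in sorted(NUMPY_COMMANDS.items()) if which(tool)]


for command in suggest_numpy_commands():
    print("$ " + command)
```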
The following table gives an overview of the Linux distributions and corresponding package names for NumPy, SciPy, matplotlib, and IPython:
| Linux distribution | NumPy | SciPy | matplotlib | IPython |
|---|---|---|---|---|
| Arch Linux | | | | |
| Debian | | | | |
| Fedora | | | | |
| Gentoo | | | | |
| openSUSE | | | | |
| Slackware | | | | |
You can install NumPy, matplotlib, and SciPy on Mac OS X with a graphical installer or from the command line with a port manager, such as MacPorts or Fink, depending on your preference. The prerequisite is to install Xcode, as it is not part of OS X releases. We will install NumPy with a GUI installer using the following steps:
Download the NumPy installer. Replace numpy in the NumPy download URL with scipy or matplotlib to get the installers for those libraries. IPython didn't have a GUI installer at the time of writing. Another alternative is the SciPy Superpack (https://github.com/fonnesbeck/ScipySuperpack).
Whichever option you choose, do not build against the Python library provided by Apple. That way, updates that affect the system Python library will not break software you have already installed. Install NumPy, matplotlib, and SciPy using the following steps:
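To see whether the interpreter you are using is Apple's bundled Python, you can inspect `sys.prefix`. This is a heuristic sketch that assumes Apple's Python lives under /System/Library/Frameworks/Python.framework, as it does on OS X; `is_apple_system_python` is a hypothetical helper name.

```python
import sys

# Apple's bundled Python lives under this framework directory on OS X.
APPLE_PYTHON_PREFIX = "/System/Library/Frameworks/Python.framework"


def is_apple_system_python(prefix=sys.prefix):
    """Return True if prefix points into Apple's system Python framework."""
    return prefix.startswith(APPLE_PYTHON_PREFIX)


if is_apple_system_python():
    print("Warning: this is Apple's system Python (%s)." % sys.prefix)
else:
    print("Interpreter prefix: %s" % sys.prefix)
```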
Download the DMG file for your Python version (for example, numpy-1.8.1-py2.7-python.org-macosx10.6.dmg), open it, and double-click the installer package (the file ending in .mpkg). We will be presented with the welcome screen of the installer.

Alternatively, we can install the libraries with MacPorts, Fink, or Homebrew. The following MacPorts command installs all of these packages. We only need NumPy for the tutorials in this book, so please omit the packages you are not interested in.
$ sudo port install py-numpy py-scipy py-matplotlib py-ipython
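Since the command above installs all four packages, you may want to build a shorter command that only contains the ones you need. The sketch below assembles the argument list from the package names in the MacPorts command; `port_install_command` and `ALL_PORTS` are hypothetical names. It only builds and prints the command; it does not run it.

```python
# Packages from the MacPorts command above.
ALL_PORTS = ["py-numpy", "py-scipy", "py-matplotlib", "py-ipython"]


def port_install_command(wanted):
    """Build a MacPorts command for the requested subset of ALL_PORTS."""
    unknown = set(wanted) - set(ALL_PORTS)
    if unknown:
        raise ValueError("unknown packages: %s" % ", ".join(sorted(unknown)))
    # Keep the order of ALL_PORTS so the command matches the one above.
    selected = [name for name in ALL_PORTS if name in wanted]
    return ["sudo", "port", "install"] + selected


# Only NumPy is needed for the tutorials in this book.
print(" ".join(port_install_command(["py-numpy"])))
```

The result is a list of arguments rather than a single string, so it could be passed unchanged to `subprocess.call` if you later decide to run it from Python.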
Fink provides NumPy packages named scipy-core-py24, scipy-core-py25, and scipy-core-py26. The SciPy packages are scipy-py24, scipy-py25, and scipy-py26. We can install NumPy and the other recommended packages that we will be using in this book for Python 2.6 with the following command:
$ fink install scipy-core-py26 scipy-py26 matplotlib-py26
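The Fink package names follow a pattern: the Python version without the dot is appended as a pyXY suffix, and NumPy is packaged as scipy-core. The hypothetical helper below reproduces that pattern; it is inferred from the package names listed above, and only the Python 2.6 matplotlib package (matplotlib-py26) is confirmed by the text, so check that a package exists before relying on a generated name.

```python
def fink_packages(python_version):
    """Return Fink NumPy, SciPy, and matplotlib package names for 'X.Y'."""
    # Fink appends the Python version without the dot, e.g. 2.6 -> py26.
    suffix = "py" + python_version.replace(".", "")
    # In Fink, NumPy is packaged as scipy-core-pyXY.
    return ["scipy-core-" + suffix, "scipy-" + suffix, "matplotlib-" + suffix]


# Prints the same command as the one above for Python 2.6.
print("$ fink install " + " ".join(fink_packages("2.6")))
```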