© Valentina Porcu 2018
Valentina Porcu, Python for Data Mining Quick Syntax Reference, https://doi.org/10.1007/978-1-4842-4113-4_1

1. Getting Started

Valentina Porcu (1)
(1) Nuoro, Italy

Python is one of the most important programming languages used in data science. In this chapter, you’ll learn how to install Python and review some of the integrated development environments (IDEs) used for data analysis. You’ll also learn how to set up a working directory on your computer.

Installing Python

Python2 and Python3 can be downloaded from https://www.python.org/downloads/ (Figure 1-1) and then installed. Note that if you are working on a Unix-like system such as macOS or Linux, Python is usually preinstalled. Simply type “python” in a terminal to start the interpreter.
Figure 1-1. Python home page

On the python.org ( http://python.org/ ) website, click Downloads, then select the appropriate version for your operating system. Then follow the on-screen instructions to install Python.
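Once the installation finishes, it can help to confirm which interpreter is actually running. The following is a minimal sketch using only the standard library; the helper name describe_python is chosen here for illustration and is not part of Python.

```python
import platform
import sys


def describe_python():
    """Return a short description of the running interpreter."""
    major, minor = sys.version_info[:2]
    return "Python {}.{} ({})".format(
        major, minor, platform.python_implementation()
    )


print(describe_python())
print(sys.executable)  # path of the interpreter that is executing this code
```

If several Python versions are installed side by side, the path printed by sys.executable shows which one answered.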

Editor and IDEs

There are many ways to use a programming language such as Python. To start from a terminal, type the word “python” followed immediately by its version number, with no space before the number. For example, in Figure 1-2, I’ve typed “python2.”
Figure 1-2. Terminal with Python open

Writing code this way may prove to be somewhat cumbersome, so we use text editors or IDEs to facilitate the process.

There are many editors, some free and some commercial, that differ in completeness, scalability, and ease of use. Some are simple and some are more advanced. The most used editors include Sublime Text, TextWrangler ( http://www.barebones.com/ ), Notepad++ ( http://notepad-plus-plus.org/download/v7.3.1.html ) (for Windows), and TextMate ( http://macromates.com/ ) (for Mac).

As for Python-specific IDEs, Wingware ( http://wingware.com/ ), Komodo ( http://www.activestate.com/komodo-ide ), PyCharm, and Emacs ( http://www.gnu.org/software/emacs/ ) are popular, but there are plenty of others. They provide tools that simplify work, such as autocompletion, auto-editing and auto-indentation, integrated documentation, syntax highlighting, and code folding (the ability to hide some pieces of code while you work on others), as well as debugging support.

Spyder (which is included in Anaconda ( http://www.continuum.io/downloads )) and Jupyter ( http://jupyter.readthedocs.io/en/latest/ ), both of which you can download from the website www.anaconda.com , are the IDEs used most in data science, along with Canopy. A useful companion to Jupyter is nbviewer ( http://nbviewer.jupyter.org ), which renders Jupyter’s .ipynb files so they can be shared. nbviewer can also be linked to GitHub.

As for Anaconda, which is a very useful tool because it also includes Jupyter, it can be downloaded from http://www.continuum/ . A partial list of the resources installed with Anaconda (which contains more than 100 packages for data mining, math, data analysis, and algebra) is presented in Figure 1-3. You can view the complete list by opening a terminal window and typing:
conda list
Figure 1-3. Part of the resources installed with Anaconda
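A similar listing can be produced from inside Python itself. The sketch below assumes Python 3.8 or later, where importlib.metadata is part of the standard library. It reports the packages visible to the running interpreter, which may not match the output of conda list exactly. The function name installed_packages is illustrative.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_packages():
    """Return a sorted list of (name, version) pairs for installed packages."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is incomplete
            pairs.add((name, dist.version))
    return sorted(pairs, key=lambda pair: pair[0].lower())


# Show the first ten entries only, to keep the output short.
for name, version in installed_packages()[:10]:
    print(name, version)
```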

We can program with Python using one or more of these tools, depending on our habits and what we want to do. Spyder (Figure 1-4) and Jupyter (Figure 1-5) are very common for data mining. Both can be used and installed individually. For example, Jupyter can be tested using http://try.jupyter.org/. However, both Spyder and Jupyter are available after Anaconda is installed.
Figure 1-4. Spyder home screen

Figure 1-5. Example of an open script in the Jupyter IDE

Python code can be run directly from a computer terminal or saved as a .py file and then run from one of these editors. The “>>>” prompt (displayed in Figure 1-6) tells us we are in the Python shell and can type Python code.
Figure 1-6. The command prompt in Python

To follow the examples presented in this book, I recommend you install Anaconda (Figure 1-7) from the Anaconda.com web site and use Jupyter. Because Anaconda automatically includes (and installs) a set of packages and modules that we will use later, we won’t have to install them separately; they will already be available and ready to use.
Figure 1-7. Anaconda’s main screen

Differences between Python2 and Python3

Python is available in two major versions: Python2 and Python3. Python2 was released in 2000 (currently, the latest release is 2.7) and its support is expected to continue until 2020. It is the historical and most complete version.

Python3 was released in 2008 (current version is 3.6). Many libraries are available for Python3, but not all Python2 libraries have been ported to it.

The two versions are very similar but differ in some respects. One example is the division of two integers:
>>> 5/2
2
# Python2 performs integer division here, discarding the fractional part.
Listing 1-1. Mathematical Operations in Python 2.7

>>> 5/2
2.5
Listing 1-2. Mathematical Operations in Python 3.5.2

To get the decimal result in Python2, we have to make at least one of the operands a float:
>>> 5.0/2
2.5
# or like this
>>> 5/2.0
2.5
# or specify we are talking about a decimal (float)
>>> float(5)/2
2.5
To write code that behaves the same way in both versions, you can also import from a module called __future__, which makes Python3 behavior available in Python2:
>>> from __future__ import division
>>> 5/2
2.5
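For comparison, Python3 keeps the truncating behavior available through a separate operator. The following sketch assumes it is run under Python3; the variable names are chosen only for illustration.

```python
# Assumes Python3: "/" is true division, "//" is floor division.
true_quotient = 5 / 2        # 2.5
floor_quotient = 5 // 2      # 2 (rounds down to the nearest whole number)
negative_floor = -5 // 2     # -3 (floor division rounds toward minus infinity)
quotient, remainder = divmod(5, 2)   # (2, 1)

print(true_quotient, floor_quotient, negative_floor, quotient, remainder)
```

Using // whenever a whole-number quotient is intended makes the code give the same answer in both versions.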

For a closer look at the differences between the two versions of Python, access this online resource ( http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html ).

Why choose one version of Python over the other? Python2 is the best-defined and most stable version, whereas Python3 represents the future of the language, although the two versions may not always coincide. In the first part of this book, I highlight the differences between the two versions. However, beginning with Chapter 7 and moving to the end of the book, we will use Python3.

Let’s start by setting up a work directory. This directory will house our files.

Work Directory

A work directory stores our scripts and our files. It is where Python automatically looks when we ask it to import a file or run a script. To check and set the work directory, type the following in the Python shell:
>>> import os
>>> os.getcwd()
'~/mypc'
# to change the work directory, we call os.chdir() with the new directory in parentheses
>>> os.chdir("/~/Python_script")
# then we determine whether it is correct
>>> os.getcwd()
'~/Python_script'
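The paths in the listing above are placeholders. Note that os.chdir() does not expand a leading “~” by itself, so a small wrapper can help. The following is a minimal sketch under that assumption: the function name change_work_directory is hypothetical, and a temporary folder stands in for a real project folder.

```python
import os
import tempfile


def change_work_directory(path):
    """Switch to `path` (expanding a leading ~) and return the new work directory."""
    target = os.path.abspath(os.path.expanduser(path))
    os.chdir(target)
    return os.getcwd()


original = os.getcwd()
with tempfile.TemporaryDirectory() as scratch:  # stand-in for a real folder
    print(change_work_directory(scratch))
    os.chdir(original)  # go back before the temporary folder is deleted
```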
Now, when we want to import a file located in the work directory, we simply type the name of the file followed by its extension, all surrounded by double quotation marks:
"file_name.extension"
For instance,
"dataframe_data_collection1.csv"

Python checks whether there is a file with that name inside that folder and imports it. The same thing happens when we save a file from Python using only its name: Python automatically puts it in that folder. When we run a Python script from the terminal, as we will see, we first have to move to the folder where the script is located (the work directory or another one).
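To make the lookup concrete, here is a small sketch that writes a file into a work directory and reads it back using only its name. It uses the standard csv module. The file name example.csv and the function read_rows are invented for the illustration, and a temporary folder plays the role of the work directory.

```python
import csv
import os
import tempfile


def read_rows(file_name):
    """Read a CSV file and return its rows as lists of strings."""
    with open(file_name, newline="") as handle:
        return list(csv.reader(handle))


original = os.getcwd()
with tempfile.TemporaryDirectory() as work_dir:
    os.chdir(work_dir)
    try:
        # Saved with a bare file name, so it lands in the work directory.
        with open("example.csv", "w", newline="") as handle:
            csv.writer(handle).writerows([["id", "value"], ["1", "2.5"]])
        # Read with the same bare name: resolved against the work directory.
        rows = read_rows("example.csv")
    finally:
        os.chdir(original)  # restore the previous work directory

print(rows)
```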

If we want to import a file that is not in the work directory but is elsewhere on our computer or on the Web, we do this by entering the full file address:
"complete_address/file_name.extension"
For instance,
"/~/dataframe_data1.csv"

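A full address can also be built in code rather than typed by hand. The sketch below joins a folder and a file name into an absolute path with os.path. The helper full_path is hypothetical, and the file is not expected to exist on your machine.

```python
import os


def full_path(folder, file_name):
    """Join a folder and a file name into an absolute path, expanding a leading ~."""
    return os.path.abspath(os.path.join(os.path.expanduser(folder), file_name))


path = full_path("~", "dataframe_data1.csv")
print(path)
print(os.path.exists(path))  # False unless such a file happens to exist
```

Building the address this way avoids hard-coding the separator, which differs between Windows and Unix-like systems.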
Now let’s make sure that you understand the difference between using the terminal and starting a session in our favorite programming language.

Using a Terminal

To run Python scripts, we first open a terminal window, as shown in Figure 1-8.
Figure 1-8. My terminal

As you can see, the dollar symbol ($) is displayed, not the Python shell symbol (>>>). To view a list of our folders and files, use the “ls” command (Figure 1-9).
Figure 1-9. List of resources on my computer
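The same information is available from inside a Python session. As a rough counterpart to “ls”, the sketch below uses os.listdir and separates folders from files; the function name list_entries is chosen here for illustration.

```python
import os


def list_entries(folder="."):
    """Return (folders, files) found directly inside `folder`, sorted by name."""
    folders, files = [], []
    for entry in sorted(os.listdir(folder)):
        if os.path.isdir(os.path.join(folder, entry)):
            folders.append(entry)
        else:
            files.append(entry)
    return folders, files


folders, files = list_entries()  # defaults to the current work directory
print("Folders:", folders)
print("Files:", files)
```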

At this point, we can move to the Python_test folder by typing
cd Python_test
In that folder, I find my Python scripts—that is, the .py files I can run by typing
python test.py

test.py is the name of the script I am going to run.
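The book does not show the contents of test.py, so the following sketch invents a one-line script purely for illustration. It then runs the script the same way the terminal command does, using the standard subprocess module (the capture_output and text arguments assume Python 3.7 or later). The names SCRIPT and run_script are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

SCRIPT = 'print("Hello from test.py")\n'  # hypothetical script contents


def run_script(folder, script_name="test.py"):
    """Run `python script_name` inside `folder` and return what it printed."""
    result = subprocess.run(
        [sys.executable, script_name],  # same interpreter that runs this code
        cwd=folder,                     # equivalent to "cd folder" beforehand
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "test.py"), "w") as handle:
        handle.write(SCRIPT)
    output = run_script(folder)

print(output)
```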

Summary

In this chapter, we learned how to install Python and reviewed some of the IDEs we can use for data analysis. We also examined the differences between Python2 and Python3, and learned how to set up a work directory and run a script from a terminal.
