© Randy Betancourt, Sarah Chen 2019
R. Betancourt, S. Chen, Python for SAS Users, https://doi.org/10.1007/978-1-4842-5001-3_1

1. Why Python?

Randy Betancourt (1) and Sarah Chen (2)
(1) Chadds Ford, PA, USA
(2) Livingston, NJ, USA
 

There are plenty of substantive open source software projects out there for data scientists, so why Python?1 After all, there is the R language. R is a robust and well-supported language written initially by statisticians for statisticians. Our aim is not to promote one language over the other. The goal is to illustrate how adding Python to the SAS user’s toolkit is a valuable way to augment existing skills. Besides, Bob Muenchen has already written R for SAS and SPSS Users.2

Python is used in a wide range of computing applications from web and internet development to scientific and numerical analysis. Its pedigree from the realm of scientific and technical computing domains gives the language a natural affinity for data analysis. This is one of the reasons why Google uses it so extensively and has developed an outstanding tutorial for programmers.3

Perhaps the best answer as to why Python is best expressed in the Zen of Python, written by Tim Peters.4 While these are design principles used to influence the development of a language like Python, they apply (mostly) to our own efforts. These aphorisms are worth bookmarking and re-reading periodically.
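The Zen of Python ships with the interpreter as a small module named this, so you can read it without leaving a Python session. The following is a minimal sketch; the names zen_text and first_line are our own, and the ROT13-encoded attribute this.s is a detail of the module's source rather than a documented interface, so treat its use here as an assumption.

```python
import codecs
import this  # importing the module prints the Zen of Python once

# this.s holds the text in ROT13 form; decode it to work with it as a string.
zen_text = codecs.decode(this.s, "rot13")
first_line = zen_text.strip().splitlines()[0]
print(first_line)
```

In an interactive session, entering import this on its own is enough to display the aphorisms.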

Setting Up a Python Environment

One of the first questions a new Python user is confronted with is which version to use, Python 2 or Python 3. For this writing we used Python 3.6.4 (Version 3.6, Maintenance 4). The current release of Python is 3.7.2, released on December 24, 2018. Python release 3.8.0 is expected in November 2019. As with any language, minor changes in syntax occur as the developers make feature improvements, and Python is no exception. We have chosen Python 3.6 since this was the latest release at the time of writing, and the release of 3.7 has not impacted any of the chapters. You can read more about the differences between Python 2 and Python 3 online.5
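If you are unsure which version a given interpreter provides, you can ask it directly. The sketch below uses sys.version_info from the Standard Library; the (3, 6) threshold and the variable names are assumptions chosen to match the release used in this book.

```python
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial)
major, minor = sys.version_info.major, sys.version_info.minor
print("Running Python {}.{}".format(major, minor))

# The (3, 6) threshold is an assumption matching the release used in this book.
meets_minimum = sys.version_info >= (3, 6)
print("Meets the 3.6 minimum:", meets_minimum)
```

Comparing against a tuple works because version_info compares element by element, so a later release such as 3.7 also passes the check.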

An attractive feature of Python is the availability of community-contributed modules. Python comes with a base library or core set of modules, referred to as the Standard Library. Due to Python’s design, individuals and organizations have contributed thousands of additional modules, most of which are written in Python. Interested in astronomical calculations used to predict any planet’s location in space? Then the kplr package is what you need.6 Closer to home, we will utilize the python-dateutil 2.7.3 package to extend Python’s base capabilities for handling datetime arithmetic.7
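To give a sense of the base capabilities that such a package builds on, here is a short sketch of date arithmetic using only the Standard Library's datetime module. The dates are arbitrary examples of our own. Note that timedelta works in fixed units such as days and seconds and has no "months" unit, which is one of the gaps a third-party package can fill.

```python
from datetime import date, timedelta

start = date(2019, 1, 31)            # an arbitrary example date
later = start + timedelta(days=30)   # add a fixed number of days
elapsed = date(2019, 12, 31) - start # subtracting dates yields a timedelta

print(later)         # 2019-03-02
print(elapsed.days)  # 334
```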

Just as you can configure your SAS development environment in numerous ways, the same is true for Python. There are various implementations of Python, such as Jython, IronPython, and PyPy. To make life simpler, organizations package distributions for you so you can avoid having to understand dependencies or use build scripts to assemble a custom environment. At the time of this writing, we are using the Anaconda distribution 5.2.0 for Windows 10, available from Anaconda’s distribution page at www.anaconda.com/download/.

The Anaconda distribution of Python also supports macOS and Linux. Anaconda conveniently takes care of the details for you by providing familiar tools for installing, uninstalling, and upgrading packages, determining package dependencies, and so on. But Anaconda does much more than just make a convenient distribution. It provides detailed documentation, supports a community of enthusiastic users, and offers a supported enterprise product around the free distribution.

Anaconda3 Install Process for Windows

The following text describes the steps for installing a new version of Python 3.6. If you have an existing version already installed, you can either uninstall the older version or follow the instructions for managing multiple Python installs at https://conda.io/docs/user-guide/tasks/manage-python.html.
  1. From www.anaconda.com/download/ download the Anaconda3-5.2.0 for Windows Installer for Python 3.6. Select the 32-bit or 64-bit installer (depending on your Windows machine architecture).
  2. In the download location on your machine, you should see the file Anaconda3-5.2.0-Windows-x86_64.exe (assuming your Windows machine is 64-bit) for launching the Windows Installer.
  3. Launch the Windows Installer (see Figure 1-1).
     Figure 1-1. Windows Installer
  4. Click Next to review the license agreement and click the “I Agree” button (see Figure 1-2).
     Figure 1-2. License Agreement
  5. Select the installation type, stand-alone or multi-user (see Figure 1-3).
     Figure 1-3. Select Installation Type
  6. Select the installation location (see Figure 1-4).
     Figure 1-4. Select Installation Location
  7. Register Anaconda as the default Python 3.6 installation by ensuring the “Register Anaconda as my default Python 3.6” box is checked. Press the “Install” button (see Figure 1-5).
     Figure 1-5. Advanced Installation Options
  8. Start the installation process (see Figure 1-6). You may be asked if you would like to install Microsoft’s Visual Studio Code. Visual Studio Code provides a visual interface for constructing and debugging Python scripts. It is an optional component and is not used in this book.
     Figure 1-6. Start Installation
  9. Validate the install by opening a Windows Command Prompt window and entering (after the > symbol prompt):
> python
Assuming the installation worked correctly, the output should look similar to Listing 1-1.
C:\Users\randy>python
Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
Listing 1-1

Python Command for Windows

This indicates that the installation is complete, including the modifications made to the Windows PATH environment variable.

Troubleshooting Python Installation for Windows

If you receive the error message “‘Python’ is not recognized as an internal or external command,” then ensure the Windows PATH environment variable has been updated to include the location of the Python installation directory.
  1. On Windows 10, open File Explorer and select “Properties” for “This PC” (see Figure 1-7).
     Figure 1-7. PC Properties
  2. Right-click the “Properties” dialog to open the Control Panel for the System (see Figure 1-8).
     Figure 1-8. PC Control Panel
  3. Select the “Advanced” tab for System Properties and press the “Environment Variables…” button (see Figure 1-9).
     Figure 1-9. Advanced System Properties
  4. Highlight the “Path” environment variable (see Figure 1-10).
     Figure 1-10. Environment Variables
  5. Edit the Path environment variable by clicking the “New” button (see Figure 1-11).
     Figure 1-11. Edit Environment Variables
  6. Add the Anaconda Python installation path that you specified in step 6 of “Anaconda3 Install Process for Windows” (see Figure 1-12).
     Figure 1-12. Add Anaconda3 to Path
  7. Ensure the path you entered is correct and click “OK”.
  8. To validate, start a new Windows Command Prompt and enter the command python. The output should look similar to the one in Listing 1-2.
C:\Users\randy>python
Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
Listing 1-2

Validate Python for Windows

The three angle brackets (>>>) are the default prompt for an interactive Python 3 session.
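Once you have an interpreter running, you can also inspect the PATH variable from inside Python itself. The sketch below is one way to do that using the Standard Library; the variable names are our own, and the result depends entirely on how your machine is configured.

```python
import os
import shutil

# Split PATH into its individual directories using the platform's separator
# (";" on Windows, ":" on Linux).
path_entries = os.environ.get("PATH", "").split(os.pathsep)

# shutil.which returns the location of the first match found on PATH, or None.
python_location = shutil.which("python")

print("Directories on PATH:", len(path_entries))
print("python resolves to:", python_location)
```

If python_location is None, or points somewhere other than the Anaconda3 installation directory, the PATH entry added in the steps above is the first thing to re-check.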

Anaconda3 Install Process for Linux

The following are the steps to install Python 3.6 in a Linux environment.
  1. From www.anaconda.com/download/ download the Anaconda3-5.2.0 for Linux Installer for Python 3.6. This is actually a script file. Select the 32-bit or 64-bit installer (depending on your machine architecture). Select “Save File” and click “OK” (see Figure 1-13).
     Figure 1-13. Execute Linux Install
  2. Open a Linux terminal window and navigate to the default download directory, /<userhome>/Downloads.
$ cd /home/randy/Downloads
  3. Change the permissions to allow the script to execute with the chmod command.
$ chmod +x Anaconda3-5.2.0-Linux-x86_64.sh
  4. If you are using a Bash shell, you can execute the shell script with ./ preceding the script filename (see Figure 1-14).
$ ./Anaconda3-5.2.0-Linux-x86_64.sh
Alternatively, you can execute the script with the following:
$ sh Anaconda3-5.2.0-Linux-x86_64.sh
     Figure 1-14. Execute Script
  5. Press <enter> to continue and display the License Agreement (Figure 1-15).
     Figure 1-15. License Agreement
  6. Accept the license terms by entering “yes” and pressing <enter> (Figure 1-16).
     Figure 1-16. Accept License Terms
  7. Confirm the Anaconda3 installation directory and press <enter> (Figure 1-17).
     Figure 1-17. Confirm Install Location
  8. Append the Anaconda3 installation directory to the $PATH environment variable by entering “yes” and pressing <enter> (Figure 1-18).
     Figure 1-18. Append to $PATH Variable
You may be asked if you would like to install Microsoft’s Visual Studio Code. Visual Studio Code provides a visual interface for constructing and debugging Python scripts. It is an optional component and is not used in this book.
  9. Confirm the installation by closing the terminal window used to execute the installation script and opening a new terminal window. This action executes the .bashrc file in your home directory and “picks up” the updated $PATH environment variable that includes the Anaconda3 installation directory (Figure 1-19).
     Figure 1-19. Confirm Installation

Executing a Python Script on Windows

Now that you have a working version of Python 3, we can begin. Consider the Python program in Listing 1-3. It is a simple script illustrating a Python for loop. The numbers inside the square brackets [ ] make up the elements of a Python list. In Python, a list is a data structure that holds an arbitrary collection of items. On each pass through the loop, the variable i takes the value of the next element in the list. The variable product holds the running integer result of the assignment product * i. And finally, the print function displays the output.
# First Python program using a for loop
numbers = [2, 4, 6, 8, 11]
product = 1
for i in numbers:
    product = product * i
print('The product is:', product)
Listing 1-3

loop.py Program

Notice there appear to be no symbols used to end a program statement. The end-of-line character is used to end a Python statement. This also helps to enforce legibility by keeping each statement on a separate physical line.

Coincidentally, like SAS, Python honors a semicolon as an end-of-statement terminator. However, you rarely see this. That’s because multiple statements on the same physical line are considered an affront to program legibility.

The pound sign (#) on the first line indicates the statement is a comment. The same program logic is written in SAS and shown in Listing 1-4. It uses a DATA _NULL_ step with a DO/END loop and the PUT statement to write its output to the SAS log. All of the SAS code examples in this book are executed using SAS release 9.4M5.
data _null_;
   array numbers {5} _TEMPORARY_ (2,4,6,8,11);
   product=1;
   do i=1 to dim(numbers);
      product=product*numbers{i};
   end;
put 'The product is: ' product;
run;
Listing 1-4

Equivalent of loop.py Written in SAS
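The explicit loop in loop.py is the clearest way to show the accumulation, but it is not the only way. As a point of comparison, the sketch below computes the same product with reduce and mul from the Standard Library. This is an alternative we are introducing for illustration, not a replacement for Listing 1-3.

```python
from functools import reduce
from operator import mul

numbers = [2, 4, 6, 8, 11]

# reduce applies mul cumulatively, starting from 1:
# ((((1 * 2) * 4) * 6) * 8) * 11
product = reduce(mul, numbers, 1)
print('The product is:', product)
```

Both versions produce 4224. The for loop remains the form used in the rest of this chapter because it maps most directly onto the SAS DO/END loop.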

Follow these steps to execute the loop.py example on Windows:
  1. Using your favorite text editor, copy the loop.py script from Listing 1-3. It is recommended you not use Windows Notepad since it is unlikely to preserve the indentations the Python script requires. An alternative editor for Windows is Notepad++ shown in Figure 1-20.
     Figure 1-20. Notepad++ Editor Version of loop.py
  2. Open a Windows Command window and navigate to the directory where you saved the loop.py Python script.
  3. Execute the loop.py script from the Windows Command window by entering
> python loop.py
This style of executing a Python script is the equivalent of executing a SAS program in non-interactive mode. Similar to the behavior of SAS, any output or errors generated from the Python script’s execution are displayed in the Windows Command window as shown in Listing 1-5.
C:\Users\randy\source\python> python loop.py
The product is: 4224
Listing 1-5

Output from loop.py

If you received an error executing this script, it is likely that you misspelled the path or filename for the loop.py script, resulting in the “No such file or directory” error message shown in Listing 1-6.
C:\Users\randy\source\python> python for_loop.py
python: can't open file 'for_loop.py': [Errno 2] No such file or directory
Listing 1-6

No Such File or Directory
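When a script lives in a directory with many files, it can help to confirm the filename before trying to run it. The sketch below is one way to do so with pathlib from the Standard Library; script_exists is a hypothetical helper of our own, not part of Python.

```python
from pathlib import Path


def script_exists(filename, folder="."):
    """Return True if filename is a regular file inside folder."""
    return (Path(folder) / filename).is_file()


# 'for_loop.py' is the misspelled name from Listing 1-6; the result depends
# on the contents of the current directory.
print(script_exists("for_loop.py"))
```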

As a means to ensure Python code is legible, there are strict rules on indentation. The Python program in Listing 1-7 is the same as the loop.py script from Listing 1-3 except all the statements are left aligned.
# First Python program using a for loop
numbers = [2, 4, 6, 8, 11]
product = 1
for i in numbers:
product = product * i
print('The product is:', product)
Listing 1-7

Modified loop.py with No Indentation

When the modified script (saved here as loop1.py) is executed, the lack of indentation raises the IndentationError shown in Listing 1-8.
C:\Users\randy\source\python> python loop1.py
  File "loop1.py", line 6
    product = product * i
          ^
IndentationError: expected an indented block
Listing 1-8

Expected an Indented Block Error

Once you get over the shock of how Python imposes the indentation requirements, you will come to see this as an important feature for creating and maintaining legible, easy-to-understand code. The standard coding practice is to indent with four spaces per level rather than with tabs. In the section “Integrated Development Environment (IDE) for Python,” you will see how this and other formatting details are handled for you automatically.
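Indentation problems are detected when Python compiles a script, before any statement runs. The sketch below illustrates this by passing an unindented copy of the loop to the built-in compile function as a string; the names bad_source and message are our own, and the exact wording of the message can vary between Python releases.

```python
bad_source = (
    "numbers = [2, 4, 6, 8, 11]\n"
    "product = 1\n"
    "for i in numbers:\n"
    "product = product * i\n"  # not indented, so the loop has no body
)

try:
    compile(bad_source, "<example>", "exec")
    message = "no error"
except IndentationError as err:
    # err.msg carries the text of the error without the file and line details
    message = err.msg

print(message)
```

Because the error is raised at compile time, none of the statements in the script execute, including those that appear before the unindented line.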

Case Sensitivity

Naturally, the incorrect spelling of language keywords, variables, and object names is a source of errors. Unlike the SAS language, Python names are case-sensitive. Consider the simple two-line Python script in Listing 1-9.
C:\Users\randy\python> python
Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> Y=201
>>> print(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'y' is not defined
Listing 1-9

Case Sensitivity

Python scripts can be executed interactively. In this example, we invoke the Python command. This causes the command line prompts to change to the default Python prompt, >>>. To end an interactive Python session, submit the statement exit().

The variable Y (uppercase) is assigned the integer value of 201. The Python print() function is called for the variable y (lowercase). Since the variable y is not presently defined in the Python namespace, a NameError is raised.
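In a script, as opposed to an interactive session, an unhandled NameError stops execution. As a minimal sketch, the same mistake can be trapped with try/except so the message can be examined; error_text is a name of our own choosing.

```python
Y = 201

try:
    print(y)             # lowercase y was never assigned
    error_text = ""
except NameError as err:
    error_text = str(err)

print(error_text)
```

The uppercase Y remains defined and holds 201 throughout; only the lowercase name is unknown to Python.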

Line Continuation Symbol

Should you find you have a line of code needing to extend past the physical line (i.e., wrap), then use the backslash (\). This causes the Python interpreter to ignore the physical end-of-line terminator for the current line and continue scanning for the next end-of-line terminator. The Python line continuation symbol is shown in Listing 1-10.
>>> y = 1 + \
...     2
>>> print(y)
3
Listing 1-10

Line Continuation
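The backslash is not the only way to wrap a statement. Python also continues a line implicitly while a parenthesis, square bracket, or curly brace is still open. The sketch below shows both forms side by side; the variable names are our own.

```python
# Explicit continuation with a trailing backslash
y = 1 + \
    2

# Implicit continuation inside parentheses; no backslash is needed
z = (1 +
     2)

# The same applies inside the square brackets of a list
numbers = [
    2, 4, 6,
    8, 11,
]

print(y, z, len(numbers))
```

Many style guides prefer the parenthesized form, since a stray space after a backslash is easy to miss and causes a syntax error.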

Finally, a word about name choices. Names should be descriptive because more than likely you will be the one who has to re-read and understand tomorrow the code you write today. As with any language, it is a good practice to avoid language keywords for object names.
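If you are unsure whether a name is reserved, the Standard Library's keyword module can tell you. A short sketch follows; the candidate names tested are arbitrary examples of our own.

```python
import keyword

# iskeyword returns True for reserved words that cannot be used as names
for_is_reserved = keyword.iskeyword("for")
product_is_reserved = keyword.iskeyword("product")

print(for_is_reserved)       # True
print(product_is_reserved)   # False

# kwlist holds every keyword for the running version of Python
print(len(keyword.kwlist))
```

Note that built-in function names such as print and list are not keywords, so Python will let you reuse them as variable names. Doing so hides the built-in for the rest of the session, which is another reason to choose descriptive names of your own.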

A language that makes it hard to write elegant code makes it hard to write good code.

—Eric Raymond, Why Python?8

Executing a Python Script on Linux

The steps for executing a Python script on Linux are similar to the ones for executing on Windows described previously.
  1. Use an editor such as nano or vi and copy the loop.py Python program from Figure 1-21. Save this file as loop.py. Notice that the .py extension is used to indicate a Python script.
     Figure 1-21. vi Editor Displaying loop.py
  2. Open a terminal window and navigate to the directory where you saved the loop.py Python script.
  3. Execute the loop.py Python script with the command
$ python loop.py
The execution of the Python script writes its output and error messages to the terminal window. You should see the output from Figure 1-22.
Figure 1-22. Output from loop.py on Linux

Now that you understand how to execute Python scripts in “non-interactive mode,” you are probably wondering about Python’s equivalent for SAS Display Manager or the SAS Studio client. This leads us to the next topic, “Integrated Development Environment (IDE) for Python.”

Integrated Development Environment (IDE) for Python

In order to improve our Python coding productivity, we need a tool for interactive script development, as opposed to the non-interactive methods we have discussed thus far. We need the equivalent of the SAS Display Manager or SAS Studio.

SAS Display Manager, SAS Enterprise Guide, and SAS Studio are examples of an integrated development environment, or IDE for short. Beyond just editing your SAS programs, these IDEs provide a set of services, such as submitting programs for execution, logging execution, rendering output, and managing resources. For example, in the SAS Display Manager, opening the LIBREF window to view assigned SAS Data Libraries shows the IDE’s ability to provide a non-programming method for visually inspecting the properties and members of the SAS LIBNAME statements assigned to the current session.

As you might expect, not all IDEs are created equal. The more sophisticated IDEs permit setting breakpoints to enable a “walk through” of code execution, displaying variable values and resource states one line at a time. They also provide methods to store a collection of programs into a coherent set of packages. These packages can then be re-distributed to others for execution. Perhaps the most compelling feature is how an IDE encourages team collaboration by allowing multiple users to work together creating, testing, and documenting a project composed of a collection of these artifacts.

If you are familiar with R and like using RStudio, then you will appreciate the similarities between RStudio and Spyder. Spyder is a component bundled with the Anaconda distribution. Figure 1-23 shows the Spyder IDE executing loop.py.
Figure 1-23. Spyder IDE Executing loop.py

One of the more interesting IDEs developed specifically for the data scientist community is the Jupyter notebook. It uses a web-based interface to write, execute, test, and document your code. Jupyter notebooks support over 40 languages, including Python, R, Scala, and Julia. It also has an open architecture, so vendors and users can write plug-ins for their own execution engines, or what Jupyter refers to as kernels. SAS Institute supports a bare-bones SAS kernel executing on Linux for Jupyter notebooks.9

A compelling feature for Jupyter notebooks is the ability to develop and share them across the Web. All of the Python examples used in this book were developed using the Jupyter notebook. Best of all, the Anaconda distribution of Python comes bundled with the Jupyter notebook IDE.

Jupyter Notebook

Figure 1-24 displays the start page for a Jupyter notebook. Using the Windows Start Menu, you launch the Jupyter notebook by using the following path:
Start -> Anaconda3 -> Jupyter Notebook
This action launches the Jupyter notebook in the default browser. Alternatively, you can launch the notebook on Windows from the command line with
> python -m notebook
Another commonly used command is
> jupyter notebook
The command-line method is convenient because you can first change directories to the Windows folder used to store and retrieve Python scripts and then launch the notebook from there.
Figure 1-24. Jupyter Notebook Home Page

To start a new project
  1. On the dashboard (labeled Home page), click the New button on the upper right and then select Python 3 from the drop-down. This launches a new untitled notebook page.
  2. Enter the loop.py script created earlier into a cell. You can also copy the script you created earlier and paste directly into the notebook cell.
  3. Click the “Play” button to execute the code you copied into the cell (Figure 1-25).
     Figure 1-25. Jupyter Notebook with loop.py Script
The documentation on how to use the Jupyter notebook is concise, and it is worth the effort to read. See https://jupyter.org/.

You may have multiple notebooks, each represented by a browser tab, opened at the same time. You may also have multiple instances of Jupyter notebooks opened at the same time (with multiple notebooks open, pay attention to names to avoid accidental overwriting).

Jupyter Notebook for Linux

Open a terminal window and enter the command
$ jupyter notebook &

On Linux the terminal window remains open while the Jupyter notebook is active.

In some instances, the Linux default browser may not open automatically after the jupyter notebook command is issued. If that is the case:
  1. Start a new browser instance.
  2. Look in the Linux terminal window for the following message (see Figure 1-26), and copy the URL it contains into the browser address window:
Copy/paste this URL into your browser when you connect for the first time, to login with a token: http://localhost:8888/?token=<token string>
Figure 1-26. Launching Jupyter notebook on Linux

Summary

In this chapter we illustrated how to install and configure the Python environment for Windows and Linux. We also introduced basic formatting and syntax rules needed to execute simple Python scripts. And we introduced different methods for executing Python scripts including the use of Jupyter notebooks. With a working Python environment established, we can begin exploring Python as a language to augment SAS for data exploration and analysis.
