Chapter 2. Python Infrastructure

In building a house, there is the problem of the selection of wood.

It is essential that the carpenter’s aim be to carry equipment that will cut well and, when he has time, to sharpen that equipment.

Miyamoto Musashi (The Book of Five Rings)

For someone new to Python, Python deployment might seem anything but straightforward. The same holds true for the wealth of libraries and packages that can be installed optionally. First of all, there is not just one Python. Python comes in many different flavors, like CPython, Jython, IronPython, or PyPy. Then there is still the divide between the Python 2.7 and 3.x worlds. In what follows, this chapter focuses on CPython, the most popular version of the Python programming language, and specifically on version 3.8.

Even when focusing on CPython 3.8 (henceforth just “Python”), deployment is made difficult for a number of reasons:

  • The interpreter (a standard CPython installation) only comes with the so-called standard library (e.g. covering typical mathematical functions).

  • Optional Python packages need to be installed separately — and there are hundreds of them.

  • Compiling (“building”) such non-standard packages on your own can be tricky due to dependencies and operating system-specific requirements.

  • Taking care of such dependencies and of version consistency over time (maintenance) is often tedious and time consuming.

  • Updates and upgrades for certain packages might cause the need for re-compiling a multitude of other packages.

  • Changing or replacing one package might cause trouble in (many) other places.

  • Migrating from one Python version to another at some later point might amplify all of the issues above.

Fortunately, there are tools and strategies available that help with the Python deployment issue. This chapter covers the following types of technologies that help with Python deployment:

  • Package manager: Package managers like pip or conda help with the installing, updating, and removing of Python packages. They also help with maintaining version consistency across different packages.

  • Virtual environment manager: A virtual environment manager like virtualenv or conda allows you to manage multiple Python installations in parallel (for example, to have both a Python 2.7 and a Python 3.8 installation on a single machine, or to test the most recent development version of a fancy Python package without risk).1

  • Container: Docker containers represent complete file systems containing all the pieces of a system needed to run certain software, such as code, a runtime, or system tools. For example, you can run an Ubuntu 20.04 operating system with a Python 3.8 installation and the respective Python code in a Docker container hosted on a machine running Mac OS or Windows 10. Such a containerized environment can then also be deployed in the cloud later, generally without any major changes.

  • Cloud instance: Deploying Python code for financial applications generally requires high availability, security, and performance. These requirements can typically only be met by professional compute and storage infrastructure, which is nowadays available at attractive rates in the form of fairly small to really large and powerful cloud instances. One benefit of a cloud instance (virtual server) compared to a dedicated server rented longer term is that users generally get charged only for the hours of actual usage. Another advantage is that such cloud instances are available literally within a minute or two if needed, which helps with agile development and scalability.

The structure of this chapter is as follows. “Conda as a Package Manager” introduces conda as a package manager for Python. “Conda as a Virtual Environment Manager” focuses on conda's capabilities for virtual environment management. “Using Docker Containers” gives a brief overview of Docker as a containerization technology and focuses on the building of an Ubuntu-based container with a Python 3.8 installation. “Using Cloud Instances” shows how to deploy Python and Jupyter Lab — a powerful, browser-based tool suite — for Python development and deployment in the cloud.

The goal of this chapter is to have a proper Python installation with the most important tools as well as numerical, data analysis, and visualization packages available on a professional infrastructure. This combination then serves as the backbone for implementing and deploying the Python codes in later chapters, be it interactive financial analytics code or code in the form of scripts and modules.

Conda as a Package Manager

Although conda can be installed on its own, an efficient way of doing so is via Miniconda — a minimal Python distribution that includes conda as a package and virtual environment manager.

Installing Miniconda

You can download the different versions of Miniconda on the Miniconda page. In what follows, the Python 3.8 64-bit version is assumed, which is available for Linux, Windows, and Mac OS. The main example in this sub-section is a session in an Ubuntu-based Docker container, which downloads the Linux 64-bit installer via wget and then installs Miniconda. The code as shown should work — perhaps with minor modifications — on any other Linux-based or Mac OS-based machine as well.2

$ docker run -ti -h pyalgo -p 11111:11111 ubuntu:latest /bin/bash

root@pyalgo:/# apt-get update; apt-get upgrade -y
...
root@pyalgo:/# apt-get install -y gcc wget
...
root@pyalgo:/# cd root
root@pyalgo:~# wget \
> https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
> -O miniconda.sh
...
HTTP request sent, awaiting response... 200 OK
Length: 93052469 (89M) [application/x-sh]
Saving to: 'miniconda.sh'

miniconda.sh              100%[============>]  88.74M  1.60MB/s    in 2m 15s

2020-08-25 11:01:54 (3.08 MB/s) - 'miniconda.sh' saved [93052469/93052469]

root@pyalgo:~# bash miniconda.sh

Welcome to Miniconda3 py38_4.8.3

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>>

Simply pressing the ENTER key starts the installation process. After reviewing the license agreement, approve the terms by answering yes.

...
Last updated February 25, 2020

Do you accept the license terms? [yes|no]
[no] >>> yes

Miniconda3 will now be installed into this location:
/root/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/miniconda3] >>>
PREFIX=/root/miniconda3
Unpacking payload ...
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /root/miniconda3
...
  python             pkgs/main/linux-64::python-3.8.3-hcff3b4d_0
...
Preparing transaction: done
Executing transaction: done
installation finished.

After you have agreed to the licensing terms and have confirmed the install location, you should allow Miniconda to prepend the new Miniconda install location to the PATH environment variable by answering yes once again.

Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[no] >>> yes
...
no change     /root/miniconda3/etc/profile.d/conda.csh
modified      /root/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

If you'd prefer that conda's base environment not be activated on startup,
   set the auto_activate_base parameter to false:

conda config --set auto_activate_base false

Thank you for installing Miniconda3!
root@pyalgo:~#

After that, you might want to update conda, since the Miniconda installer is generally not updated as regularly as conda itself.

root@pyalgo:~# export PATH="/root/miniconda3/bin/:$PATH"
root@pyalgo:~# conda update -y conda
...
root@pyalgo:~# echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
root@pyalgo:~# bash
(base) root@pyalgo:~#

After this rather simple installation procedure, both a basic Python installation and conda are now available. The basic Python installation already comes with some nice batteries included, such as the SQLite3 database engine. You might try out whether you can start Python in a new shell instance, or after appending the relevant path to the respective environment variable (as done above).

(base) root@pyalgo:~# python
Python 3.8.3 (default, May 19 2020, 18:47:26)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print('Hello Python for Algorithmic Trading World.')
Hello Python for Algorithmic Trading World.
>>> exit()
(base) root@pyalgo:~#
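The “batteries included” claim is easy to verify: the standard library's sqlite3 module works without installing anything extra. A minimal sketch (the table and values are made up for illustration):

```python
import sqlite3

# in-memory database: part of the standard library, no extra install needed
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE quotes (symbol TEXT, price REAL)')
con.execute("INSERT INTO quotes VALUES ('AAPL', 105.0)")
rows = con.execute('SELECT * FROM quotes').fetchall()
print(rows)  # [('AAPL', 105.0)]
con.close()
```

For anything beyond such toy examples, though, the optional packages installed via conda in the next section do the heavy lifting.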

Basic Operations with Conda

conda can be used to efficiently handle, among other things, the installing, updating, and removing of Python packages. The following list provides an overview of the major functions.

installing Python x.x

conda install python=x.x

updating Python

conda update python

installing a package

conda install $PACKAGE_NAME

updating a package

conda update $PACKAGE_NAME

removing a package

conda remove $PACKAGE_NAME

updating conda itself

conda update conda

searching for packages

conda search $SEARCH_TERM

listing installed packages

conda list

Given these capabilities, installing, for example, NumPy — one of the most important packages of the so-called scientific stack — requires only a single command. When the installation takes place on a machine with an Intel processor, the procedure automatically installs the Intel Math Kernel Library mkl, which speeds up numerical operations not only for NumPy on Intel machines but also for a few other scientific Python packages.3

(base) root@pyalgo:~# conda install numpy
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /root/miniconda3

  added / updated specs:
    - numpy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    blas-1.0                   |              mkl           6 KB
    intel-openmp-2020.1        |              217         780 KB
    mkl-2020.1                 |              217       129.0 MB
    mkl-service-2.3.0          |   py38he904b0f_0          62 KB
    mkl_fft-1.1.0              |   py38h23d657b_0         150 KB
    mkl_random-1.1.1           |   py38h0573a6f_0         341 KB
    numpy-1.19.1               |   py38hbc911f0_0          21 KB
    numpy-base-1.19.1          |   py38hfa32c7d_0         4.2 MB
    ------------------------------------------------------------
                                           Total:       134.5 MB

The following NEW packages will be INSTALLED:

  blas               pkgs/main/linux-64::blas-1.0-mkl
  intel-openmp       pkgs/main/linux-64::intel-openmp-2020.1-217
  mkl                pkgs/main/linux-64::mkl-2020.1-217
  mkl-service        pkgs/main/linux-64::mkl-service-2.3.0-py38he904b0f_0
  mkl_fft            pkgs/main/linux-64::mkl_fft-1.1.0-py38h23d657b_0
  mkl_random         pkgs/main/linux-64::mkl_random-1.1.1-py38h0573a6f_0
  numpy              pkgs/main/linux-64::numpy-1.19.1-py38hbc911f0_0
  numpy-base         pkgs/main/linux-64::numpy-base-1.19.1-py38hfa32c7d_0


Proceed ([y]/n)? y


Downloading and Extracting Packages
numpy-base-1.19.1    | 4.2 MB    | ########################################################## | 100%
blas-1.0             | 6 KB      | ########################################################## | 100%
mkl_fft-1.1.0        | 150 KB    | ########################################################## | 100%
mkl-service-2.3.0    | 62 KB     | ########################################################## | 100%
numpy-1.19.1         | 21 KB     | ########################################################## | 100%
mkl-2020.1           | 129.0 MB  | ########################################################## | 100%
mkl_random-1.1.1     | 341 KB    | ########################################################## | 100%
intel-openmp-2020.1  | 780 KB    | ########################################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) root@pyalgo:~#
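Once the installation has finished, a quick sanity check confirms that NumPy imports and vectorizes as expected; the exact version and BLAS backend shown will depend on your platform and on what conda resolved:

```python
import numpy as np

print(np.__version__)  # exact version depends on what conda resolved

# a small vectorized computation as a smoke test
a = np.arange(10, dtype=float)
total = (a * 2).sum()
print(total)  # 90.0

# prints build details, e.g. whether the mkl BLAS backend is linked
np.show_config()
```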

Multiple packages can also be installed at once. The -y flag indicates that all (potential) questions should be answered with yes.

(base) root@pyalgo:~# conda install -y ipython matplotlib pandas \
> pytables scikit-learn scipy
...
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /root/miniconda3

  added / updated specs:
    - ipython
    - matplotlib
    - pandas
    - pytables
    - scikit-learn
    - scipy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    backcall-0.2.0             |             py_0          15 KB
    ...
    zstd-1.4.5                 |       h9ceee32_0         619 KB
    ------------------------------------------------------------
                                           Total:       144.9 MB

The following NEW packages will be INSTALLED:

  backcall           pkgs/main/noarch::backcall-0.2.0-py_0
  blosc              pkgs/main/linux-64::blosc-1.20.0-hd408876_0
  ...
  zstd               pkgs/main/linux-64::zstd-1.4.5-h9ceee32_0



Downloading and Extracting Packages
glib-2.65.0          | 2.9 MB    | ############################## | 100%
...
snappy-1.1.8         | 40 KB     | ############################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) root@pyalgo:~#

After this installation procedure, some of the most important libraries for financial analytics are available in addition to the standard ones.

IPython

An improved interactive Python shell.

matplotlib

The standard plotting library for Python.

NumPy

Efficient handling of numerical arrays.

pandas

Management of tabular data, like financial time series data.

PyTables

A Python wrapper for the HDF5 library.

scikit-learn

A package for machine learning and related tasks.

SciPy

A collection of scientific classes and functions.

This provides a basic tool set for data analysis in general and financial analytics in particular. The next example uses IPython and draws a set of pseudo-random numbers with NumPy.

(base) root@pyalgo:~# ipython
Python 3.8.3 (default, May 19 2020, 18:47:26)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: np.random.seed(100)

In [3]: np.random.standard_normal((5, 4))
Out[3]:
array([[-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604],
       [ 0.98132079,  0.51421884,  0.22117967, -1.07004333],
       [-0.18949583,  0.25500144, -0.45802699,  0.43516349],
       [-0.58359505,  0.81684707,  0.67272081, -0.10441114],
       [-0.53128038,  1.02973269, -0.43813562, -1.11831825]])

In [4]: exit
(base) root@pyalgo:~#
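Similarly, pandas, installed above, handles labeled financial time series. A minimal sketch, with made-up dates and prices for illustration:

```python
import pandas as pd

# hypothetical daily closing prices over one trading week (made-up numbers)
index = pd.date_range('2020-08-24', periods=5, freq='B')  # business days
close = pd.Series([100.0, 101.5, 99.8, 102.2, 103.0], index=index)

# percentage returns, a typical first step in financial analytics
returns = close.pct_change()
print(returns.round(4))
```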

Executing conda list shows which packages are installed.

(base) root@pyalgo:~# conda list
# packages in environment at /root/miniconda3:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
backcall                  0.2.0                      py_0
blas                      1.0                         mkl
blosc                     1.20.0               hd408876_0
...
zlib                      1.2.11               h7b6447c_3
zstd                      1.4.5                h9ceee32_0
(base) root@pyalgo:~#

In case a package is not needed anymore, it is efficiently removed with conda remove.

(base) root@pyalgo:~# conda remove matplotlib
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /root/miniconda3

  removed specs:
    - matplotlib


The following packages will be REMOVED:

  cycler-0.10.0-py38_0
  ...
  tornado-6.0.4-py38h7b6447c_1


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) root@pyalgo:~#

conda as a package manager is already quite useful. However, its full power only becomes evident when adding virtual environment management to the mix.

Tip

conda as a package manager makes the installing, updating, and removing of Python packages a pleasant experience. There is no need to take care of building and compiling packages on your own — which can be tricky at times, given the list of dependencies a package specifies and the specifics to be considered on different operating systems.

Conda as a Virtual Environment Manager

Having installed Miniconda with conda included provides a default Python installation, depending on what version of Miniconda has been chosen. The virtual environment management capabilities of conda allow you, for example, to add a completely separate Python 2.7.x installation to a Python 3.8 default installation. To this end, conda offers the following functionality.

creating a virtual environment

conda create --name $ENVIRONMENT_NAME

activating an environment

conda activate $ENVIRONMENT_NAME

deactivating an environment

conda deactivate

removing an environment

conda env remove --name $ENVIRONMENT_NAME

export to an environment file

conda env export > $FILE_NAME

creating an environment from file

conda env create -f $FILE_NAME

listing all environments

conda info --envs
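Instead of exporting an existing environment, an environment file can also be written by hand and passed to conda env create -f. A minimal sketch might look as follows; the environment name and package list here are illustrative, not prescriptive:

```yaml
name: py38fin
channels:
  - defaults
dependencies:
  - python=3.8
  - numpy
  - pandas
  - ipython
```

Saved as, say, py38fin.yml, the environment would be created via conda env create -f py38fin.yml and activated via conda activate py38fin.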

As a simple illustration, the example code that follows creates an environment called py27, installs IPython, and executes a line of Python 2.7.x code. Although support for Python 2.7 has ended, the example illustrates how legacy Python 2.7 code can easily be executed and tested.

(base) root@pyalgo:~# conda create --name py27 python=2.7
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json,
will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /root/miniconda3/envs/py27

  added / updated specs:
    - python=2.7


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.11.28         |           py27_0         153 KB
    pip-19.3.1                 |           py27_0         1.7 MB
    python-2.7.18              |       h15b4118_1         9.9 MB
    setuptools-44.0.0          |           py27_0         512 KB
    wheel-0.33.6               |           py27_0          42 KB
    ------------------------------------------------------------
                                           Total:        12.2 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  ca-certificates    pkgs/main/linux-64::ca-certificates-2020.6.24-0
  ...
  zlib               pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3


Proceed ([y]/n)? y


Downloading and Extracting Packages
certifi-2019.11.28   | 153 KB    | ############################### | 100%
python-2.7.18        | 9.9 MB    | ############################### | 100%
pip-19.3.1           | 1.7 MB    | ############################### | 100%
setuptools-44.0.0    | 512 KB    | ############################### | 100%
wheel-0.33.6         | 42 KB     | ############################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate py27
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) root@pyalgo:~#

Notice how the prompt changes to include (py27) after the activation of the environment.

(base) root@pyalgo:~# conda activate py27
(py27) root@pyalgo:~# pip install ipython
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020.
...
Executing transaction: done
(py27) root@pyalgo:~#

Finally, this makes it possible to use IPython with Python 2.7 syntax.

(py27) root@pyalgo:~# ipython
Python 2.7.18 |Anaconda, Inc.| (default, Apr 23 2020, 22:42:48)
Type "copyright", "credits" or "license" for more information.

IPython 5.10.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: print "Hello Python for Algorithmic Trading World."
Hello Python for Algorithmic Trading World.

In [2]: exit
(py27) root@pyalgo:~#

As this example demonstrates, conda as a virtual environment manager allows you to install different Python versions alongside each other. It also allows you to install different versions of certain packages. The default Python installation is not influenced by such a procedure, nor are other environments which might exist on the same machine. All available environments can be listed via conda env list (or, equivalently, conda info --envs).

(py27) root@pyalgo:~# conda env list
# conda environments:
#
base                     /root/miniconda3
py27                  *  /root/miniconda3/envs/py27

(py27) root@pyalgo:~#

Sometimes it is necessary to share environment information with others or to use environment information on multiple machines. To this end, one can export the installed packages list to a file with conda env export. By default, however, this only works properly for the same operating system, since the build versions are specified in the resulting yaml file. The --no-builds flag omits them, so that only the package versions are specified.

(py27) root@pyalgo:~# conda deactivate
(base) root@pyalgo:~# conda env export --no-builds > base.yml
(base) root@pyalgo:~# cat base.yml
name: base
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - backcall=0.2.0
  - blas=1.0
  - blosc=1.20.0
  ...
  - zlib=1.2.11
  - zstd=1.4.5
prefix: /root/miniconda3
(base) root@pyalgo:~#

Often, virtual environments, which are technically not much more than a certain (sub-)folder structure, are created to do some quick tests.4 In such a case, an environment is easily removed (after deactivation) via conda env remove.

(base) root@pyalgo:~# conda env remove -n py27

Remove all packages in environment /root/miniconda3/envs/py27:

(base) root@pyalgo:~#

This concludes the overview of conda as a virtual environment manager.

Tip

conda not only helps with managing packages, it is also a virtual environment manager for Python. It simplifies the creation of different Python environments, allowing you to have multiple versions of Python and optional packages available on the same machine without them influencing each other in any way. conda also allows you to export environment information so that you can easily replicate it on multiple machines or share it with others.

Using Docker Containers

Docker containers have taken the IT world by storm (see Docker). Although the technology is still relatively young, it has established itself as one of the benchmarks for the efficient development and deployment of almost any kind of software application.

For our purposes, it suffices to think of a Docker container as a separated (“containerized”) file system that includes an operating system (for example, Ubuntu 20.04 LTS for servers), a (Python) runtime, additional system and development tools, as well as further (Python) libraries and packages as needed. Such a Docker container might run on a local machine with Windows 10 Professional 64-bit or on a cloud instance with a Linux operating system.

This section does not go into the exciting details of Docker containers. It is rather a concise illustration of what the Docker technology can do in the context of Python deployment.5

Docker Images and Containers

Before moving on to the illustration, two fundamental terms need to be distinguished when talking about Docker. The first is a Docker image, which can be compared to a Python class. The second is a Docker container, which can be compared to an instance of that Python class.

On a more technical level, you find the following definition for a Docker image in the Docker glossary:

Docker images are the basis of containers. An Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime. An image typically contains a union of layered filesystems stacked on top of each other. An image does not have state and it never changes.

Similarly, you find the following definition for a Docker container in the Docker glossary which makes the analogy to Python classes and instances of such classes transparent:

A container is a runtime instance of a docker image.

A Docker container consists of

  • A Docker image

  • An execution environment

  • A standard set of instructions

The concept is borrowed from Shipping Containers, which define a standard to ship goods globally. Docker defines a standard to ship software.
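The class/instance analogy can be made concrete in a few lines of Python. This is a loose sketch of the analogy, not a model of how Docker actually works internally:

```python
class DockerImage:
    """Analogous to a Docker image: a static blueprint that never changes."""
    base = 'ubuntu:latest'   # the layered filesystem the image builds on
    cmd = 'ipython'          # the default execution parameter

class DockerContainer:
    """Analogous to a container: a runtime instance created from an image."""
    def __init__(self, image):
        self.image = image   # the image the container is based on
        self.running = True  # runtime state, which an image itself lacks

# one image can serve as the basis for many containers
img = DockerImage()
c1, c2 = DockerContainer(img), DockerContainer(img)
print(c1.image is c2.image)  # True: both instances share the same blueprint
```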

Depending on the operating system, the installation of Docker is somewhat different. That is why this section does not go into the respective details. More information and further links are found on the Get Docker page.

Building an Ubuntu & Python Docker Image

This sub-section illustrates the building of a Docker image, based on the latest version of Ubuntu, that includes Miniconda as well as a few important Python packages. In addition, it does some Linux housekeeping by updating the Linux package index, upgrading packages if required, and installing certain additional system tools. To this end, two scripts are needed. One is a Bash script doing all the work on the Linux level.6 The other is a so-called Dockerfile, which controls the building procedure for the image itself.

The Bash script in Example 2-1, which does the installing, consists of three major parts. The first part handles the Linux housekeeping, the second part installs Miniconda, and the third part installs optional Python packages. There are also more detailed comments inline.

Example 2-1. Script installing Python and optional packages
#!/bin/bash
#
# Script to Install
# Linux System Tools and
# Basic Python Components
#
# Python for Algorithmic Trading
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#
# GENERAL LINUX
apt-get update  # updates the package index cache
apt-get upgrade -y  # updates packages
# installs system tools
apt-get install -y bzip2 gcc git  # system tools
apt-get install -y htop screen vim wget  # system tools
apt-get upgrade -y bash  # upgrades bash if necessary
apt-get clean  # cleans up the package index cache

# INSTALL MINICONDA
# downloads Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O Miniconda.sh
bash Miniconda.sh -b  # installs it
rm -rf Miniconda.sh  # removes the installer
export PATH="/root/miniconda3/bin:$PATH"  # prepends the new path

# INSTALL PYTHON LIBRARIES
conda install -y pandas  # installs pandas
conda install -y ipython  # installs IPython shell

# CUSTOMIZATION
cd /root/
wget http://hilpisch.com/.vimrc  # Vim configuration

The Dockerfile in Example 2-2 uses the Bash script in Example 2-1 to build a new Docker image. It also has its major parts commented inline.

Example 2-2. Dockerfile to build the image
#
# Building a Docker Image with
# the Latest Ubuntu Version and
# Basic Python Install
#
# Python for Algorithmic Trading
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#

# latest Ubuntu version
FROM ubuntu:latest

# information about maintainer
MAINTAINER yves

# add the bash script
ADD install.sh /
# change rights for the script
RUN chmod u+x /install.sh
# run the bash script
RUN /install.sh
# prepend the new path
ENV PATH /root/miniconda3/bin:$PATH

# execute IPython when container is run
CMD ["ipython"]

If these two files are in a single folder and Docker is installed, then the building of the new Docker image is straightforward. Here, the tag pyalgo:basic is used for the image. This tag is needed to reference the image, for example, when running a container based on it.

(base) pro:Docker yves$ docker build -t pyalgo:basic .
Sending build context to Docker daemon  4.096kB
Step 1/7 : FROM ubuntu:latest
 ---> 4e2eef94cd6b
Step 2/7 : MAINTAINER yves
 ---> Running in 859db5550d82
Removing intermediate container 859db5550d82
 ---> 40adf11b689f
Step 3/7 : ADD install.sh /
 ---> 34cd9dc267e0
Step 4/7 : RUN chmod u+x /install.sh
 ---> Running in 08ce2f46541b
Removing intermediate container 08ce2f46541b
 ---> 88c0adc82cb0
Step 5/7 : RUN /install.sh
 ---> Running in 112e70510c5b
...
Removing intermediate container 112e70510c5b
 ---> 314dc8ec5b48
Step 6/7 : ENV PATH /root/miniconda3/bin:$PATH
 ---> Running in 82497aea20bd
Removing intermediate container 82497aea20bd
 ---> 5364f494f4b4
Step 7/7 : CMD ["ipython"]
 ---> Running in ff434d5a3c1b
Removing intermediate container ff434d5a3c1b
 ---> a0bb86daf9ad
Successfully built a0bb86daf9ad
Successfully tagged pyalgo:basic
(base) pro:Docker yves$

Existing Docker images can be listed via docker images. The new image should be on top of the list.

(base) pro:Docker yves$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
pyalgo              basic               a0bb86daf9ad        2 minutes ago       1.79GB
ubuntu              latest              4e2eef94cd6b        5 days ago          73.9MB
(base) pro:Docker yves$

Having built the pyalgo:basic image successfully allows you to run a respective Docker container with docker run. The parameter combination -ti is needed for interactive processes running within a Docker container, like a shell process of IPython (see the Docker Run Reference page).

(base) pro:Docker yves$ docker run -ti pyalgo:basic
Python 3.8.3 (default, May 19 2020, 18:47:26)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: np.random.seed(100)

In [3]: a = np.random.standard_normal((5, 3))

In [4]: import pandas as pd

In [5]: df = pd.DataFrame(a, columns=['a', 'b', 'c'])

In [6]: df
Out[6]:
          a         b         c
0 -1.749765  0.342680  1.153036
1 -0.252436  0.981321  0.514219
2  0.221180 -1.070043 -0.189496
3  0.255001 -0.458027  0.435163
4 -0.583595  0.816847  0.672721

In [7]:

Exiting IPython will exit the container as well, since it is the only application running within the container. However, you can detach from a container via

Ctrl+p --> Ctrl+q

After having detached from the container, the docker ps command shows the running container (and maybe other currently running containers):

(base) pro:Docker yves$ docker ps
CONTAINER ID  IMAGE         COMMAND     CREATED       ...    NAMES
e93c4cbd8ea8  pyalgo:basic  "ipython"   About a minute ago   jolly_rubin
(base) pro:Docker yves$

Attaching to the Docker container is accomplished by docker attach $CONTAINER_ID (notice that a few letters of the CONTAINER ID are enough):

(base) pro:Docker yves$ docker attach e93c
In [7]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       5 non-null      float64
 1   b       5 non-null      float64
 2   c       5 non-null      float64
dtypes: float64(3)
memory usage: 248.0 bytes

In [8]:

The exit command terminates IPython and thereby stops the Docker container as well. The container can then be removed by docker rm.

In [8]: exit
(base) pro:Docker yves$ docker rm e93c
e93c
(base) pro:Docker yves$

Similarly, the Docker image pyalgo:basic can be removed via docker rmi if it is not needed any longer. While containers are relatively lightweight, individual images might consume quite a bit of storage. In the case of the pyalgo:basic image, the size is close to 2 GB. That is why you might want to regularly clean up the list of Docker images.

(base) pro:Docker yves$ docker rmi a0bb86
Untagged: pyalgo:basic
Deleted: sha256:a0bb86daf9adfd0ddf65312ce6c1b068100448152f2ced5d0b9b5adef5788d88
...
Deleted: sha256:40adf11b689fc778297c36d4b232c59fedda8c631b4271672cc86f505710502d
(base) pro:Docker yves$

Of course, there is much more to say about Docker containers and their benefits in certain application scenarios. For the purposes of this book, they provide a modern approach to deploying Python, to doing Python development in a completely separated (containerized) environment, and to shipping code for algorithmic trading.

Tip

If you are not yet using Docker containers, you should consider starting to use them. They provide a number of benefits when it comes to Python deployment and development efforts, not only when working locally but in particular when working with remote cloud instances and servers deploying code for algorithmic trading.

Using Cloud Instances

This section shows how to set up a full-fledged Python infrastructure on a DigitalOcean cloud instance. There are many other cloud providers out there, among them Amazon Web Services (AWS) as the leading provider. However, DigitalOcean is well known for its simplicity and also its relatively low rates for smaller cloud instances, which it calls Droplets. The smallest Droplet, which is generally sufficient for exploration and development purposes, only costs 5 USD per month or 0.007 USD per hour. Usage is charged by the hour so that one can easily spin up a Droplet for 2 hours, say, destroy it afterwards and get charged just 0.014 USD.7

The goal of this section is to set up a Droplet on DigitalOcean that has a Python 3.8 installation plus typically needed packages (such as NumPy and pandas) in combination with a password-protected and Secure Sockets Layer (SSL)-encrypted Jupyter Lab server installation.8 As a web-based tool suite, Jupyter Lab provides several tools that can be used via a regular browser:

  • Jupyter Notebook: This is one of the most popular — if not the most popular — browser-based, interactive development environment that features a selection of different language kernels like Python, R and Julia.

  • Python console: This is an IPython-based console whose graphical look and feel differs from that of the standard, terminal-based implementation.

  • Terminal: A system shell implementation accessible via the browser that allows for all typical system administration tasks, as well as for the use of helpful tools such as Vim for code editing or git for version control.

  • Editor: Another major tool is a browser-based text file editor with syntax highlighting for many different programming languages and file types as well as typical text/code editing capabilities.

  • File manager: Jupyter Lab also provides a full-fledged file manager that allows for typical file operations, such as uploading, downloading, renaming, and so on.

Having Jupyter Lab installed on a Droplet makes it possible to do Python development and deployment via the browser, circumventing the need to log in to the cloud instance via Secure Shell (SSH).

To accomplish the goal of this section, several scripts are needed.

  • Server set-up script: This script orchestrates all steps necessary, like, for instance, copying other files to the Droplet and running them on the Droplet.

  • Python and Jupyter installation script: This script installs Python, additional packages and Jupyter Lab, and starts the Jupyter Lab server.

  • Jupyter Notebook configuration file: This file is for the configuration of the Jupyter Lab server, for example, with regard to password protection.

  • RSA public and private key files: These two files are needed for the SSL encryption of the communication with the Jupyter Lab server.

In what follows, the section works backwards through this list of files — since the set-up script is executed first but the other files need to have been created beforehand.

RSA Public and Private Keys

In order to accomplish a secure connection to the Jupyter Lab server via an arbitrary browser, an SSL certificate consisting of RSA public and private keys (see the RSA Wikipedia page) is needed. In general, one would expect that such a certificate comes from a so-called Certificate Authority (CA). For the purposes of this book, however, a self-generated certificate is “good enough”.9 A popular tool to generate RSA key pairs is OpenSSL. The brief interactive session to follow generates a certificate appropriate for use with a Jupyter Lab server (see Running a notebook server).

(base) pro:cloud yves$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
> -keyout mykey.key -out mycert.pem
Generating a RSA private key
.........++++
.............++++
writing new private key to 'mykey.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:DE
State or Province Name (full name) [Some-State]:Saarland
Locality Name (eg, city) []:Voelklingen
Organization Name (eg, company) [Internet Widgits Pty Ltd]:TPQ GmbH
Organizational Unit Name (eg, section) []:Algorithmic Trading
Common Name (e.g. server FQDN or YOUR name) []:Jupyter Lab
Email Address []:[email protected]
(base) pro:cloud yves$
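
The interactive prompts can also be skipped altogether by passing the Distinguished Name directly via the -subj option, which is useful when generating the certificate from a script. The following is a sketch; the field values are illustrative, not prescribed by Jupyter:

```shell
# generate key pair and self-signed certificate non-interactively
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout mykey.key -out mycert.pem \
    -subj "/C=DE/O=TPQ GmbH/CN=Jupyter Lab"
# inspect subject and validity period of the resulting certificate
openssl x509 -in mycert.pem -noout -subject -dates
```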

The two files mykey.key and mycert.pem need to be copied to the Droplet and referenced in the Jupyter Notebook configuration file. This file is presented next.

Jupyter Notebook Configuration File

A public Jupyter Lab server can be deployed securely as explained in Running a notebook server. Among other things, the Jupyter Lab server shall be password-protected. To this end, there is a password hash code-generating function called passwd() available in the notebook.auth sub-package. The code below generates a password hash code with jupyter being the password itself.

In [1]: from notebook.auth import passwd

In [2]: passwd('jupyter')
Out[2]: 'sha1:da3a3dfc0445:052235bb76e56450b38d27e41a85a136c3bf9cd7'

In [3]: exit
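
Under the hood, passwd() produces a salted hash string of the form algorithm:salt:hexdigest, as visible in the output above. The general scheme can be sketched with the standard library alone; the helper names below are illustrative and not part of notebook.auth, and the sketch assumes the digest is computed over the passphrase followed by the salt:

```python
import hashlib
import secrets

def make_passwd_hash(passphrase, algorithm='sha1'):
    """Create a salted hash string of the form 'algorithm:salt:hexdigest'."""
    salt = secrets.token_hex(6)  # 12 hex characters, as in the output above
    h = hashlib.new(algorithm)
    h.update(passphrase.encode('utf-8') + salt.encode('ascii'))
    return ':'.join((algorithm, salt, h.hexdigest()))

def check_passwd_hash(passphrase, hashed):
    """Verify a passphrase against a stored 'algorithm:salt:hexdigest' string."""
    algorithm, salt, digest = hashed.split(':')
    h = hashlib.new(algorithm)
    h.update(passphrase.encode('utf-8') + salt.encode('ascii'))
    return h.hexdigest() == digest
```

Because the salt is drawn at random, every call to make_passwd_hash() yields a different string for the same passphrase, while check_passwd_hash() still verifies each of them.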

This hash code needs to be placed in the Jupyter Notebook configuration file as presented in Example 2-3. The configuration file assumes that the RSA key files have been copied to the /root/.jupyter/ folder on the Droplet.

Example 2-3. Jupyter Notebook configuration file
#
# Jupyter Notebook Configuration File
#
# Python for Algorithmic Trading
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#
# SSL ENCRYPTION
# replace the following file names (and files used) by your choice/files
c.NotebookApp.certfile = u'/root/.jupyter/mycert.pem'
c.NotebookApp.keyfile = u'/root/.jupyter/mykey.key'

# IP ADDRESS AND PORT
# set ip to '*' to bind on all IP addresses of the cloud instance
c.NotebookApp.ip = '0.0.0.0'
# it is a good idea to set a known, fixed default port for server access
c.NotebookApp.port = 8888

# PASSWORD PROTECTION
# here: 'jupyter' as password
# replace the hash code with the one for your password
c.NotebookApp.password = \
	'sha1:da3a3dfc0445:052235bb76e56450b38d27e41a85a136c3bf9cd7'

# NO BROWSER OPTION
# prevent Jupyter from trying to open a browser
c.NotebookApp.open_browser = False

# ROOT ACCESS
# allow Jupyter to run from root user
c.NotebookApp.allow_root = True
Caution

Deploying Jupyter Lab in the cloud leads to a number of security issues since it is a full-fledged development environment accessible via a web browser. It is therefore of paramount importance to use the security measures that a Jupyter Lab server provides by default, like password protection and SSL encryption. But this is just the beginning and further security measures might be advised depending on what exactly is done on the cloud instance.
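
One simple additional measure is to avoid dictionary words like jupyter as the password altogether. The secrets module of the standard library generates cryptographically strong random passwords, which can then be hashed via passwd() as before (a minimal sketch):

```python
import secrets
import string

# a URL-safe random token; 24 random bytes yield a 32-character string
token = secrets.token_urlsafe(24)

# alternatively, draw a password from an explicit alphabet
alphabet = string.ascii_letters + string.digits
password = ''.join(secrets.choice(alphabet) for _ in range(20))
```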

The next step is to make sure that Python and Jupyter Lab get installed on the Droplet.

Installation Script for Python and Jupyter Lab

The bash script to install Python and Jupyter Lab is similar to the one presented in section “Using Docker Containers” to install Python via Miniconda in a Docker container. However, the script here needs to start the Jupyter Lab server as well. All major parts and lines of code are commented inline.

Example 2-4. Bash script to install Python and to run the Jupyter Notebook server
#!/bin/bash
#
# Script to Install
# Linux System Tools and Basic Python Components
# as well as to
# Start Jupyter Lab Server
#
# Python for Algorithmic Trading
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#
# GENERAL LINUX
apt-get update  # updates the package index cache
apt-get upgrade -y  # updates packages
# install system tools
apt-get install -y build-essential git  # system tools
apt-get install -y screen htop vim wget  # system tools
apt-get upgrade -y bash  # upgrades bash if necessary
apt-get clean  # cleans up the package index cache

# INSTALLING MINICONDA
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
		-O Miniconda.sh
bash Miniconda.sh -b  # installs Miniconda
rm -rf Miniconda.sh  # removes the installer
# prepends the new path for current session
export PATH="/root/miniconda3/bin:$PATH"
# prepends the new path in the shell configuration
cat >> ~/.profile <<EOF
export PATH="/root/miniconda3/bin:$PATH"
EOF

# INSTALLING PYTHON LIBRARIES
conda install -y jupyter  # interactive data analytics in the browser
conda install -y jupyterlab  # Jupyter Lab environment
conda install -y numpy  #  numerical computing package
conda install -y pytables  # wrapper for HDF5 binary storage
conda install -y pandas  #  data analysis package
conda install -y scipy  #  scientific computations package
conda install -y matplotlib  # standard plotting library
conda install -y seaborn  # statistical plotting library
conda install -y quandl  # wrapper for Quandl data API
conda install -y scikit-learn  # machine learning library
conda install -y openpyxl  # package for Excel interaction
conda install -y xlrd xlwt  # packages for Excel interaction
conda install -y pyyaml  # package to manage yaml files

pip install --upgrade pip  # upgrading the package manager
pip install q  # logging and debugging
pip install plotly  # interactive D3.js plots
pip install cufflinks  # combining plotly with pandas
pip install tensorflow  # deep learning library
pip install keras  # deep learning library
pip install eikon  # Python wrapper for the Refinitiv Eikon Data API
# Python wrapper for Oanda API
pip install git+git://github.com/yhilpisch/tpqoa

# COPYING FILES AND CREATING DIRECTORIES
mkdir -p /root/.jupyter/custom
wget http://hilpisch.com/custom.css
mv custom.css /root/.jupyter/custom
mv /root/jupyter_notebook_config.py /root/.jupyter/
mv /root/mycert.pem /root/.jupyter
mv /root/mykey.key /root/.jupyter
mkdir /root/notebook
cd /root/notebook

# STARTING JUPYTER LAB
jupyter lab &

This script needs to be copied to the Droplet and started by the orchestration script, as described in the next sub-section.
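
Once the installation script has run on the Droplet, one can quickly verify from within Python which of the installed packages are available and in which version. A small helper sketch (the function name is illustrative):

```python
from importlib import metadata

def installed_versions(packages):
    """Map package names to their installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# example: check a few of the packages installed by the script
print(installed_versions(['numpy', 'pandas', 'plotly']))
```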

Script to Orchestrate the Droplet Set-up

The second bash script, which sets up the Droplet, is the shortest one. It mainly copies all the other files to the Droplet, whose IP address is expected as a parameter. In its final line, it starts the install.sh bash script, which in turn does the installation itself and starts the Jupyter Lab server.

Example 2-5. Bash script to set up the Droplet
#!/bin/bash
#
# Setting up a DigitalOcean Droplet
# with Basic Python Stack
# and Jupyter Notebook
#
# Python for Algorithmic Trading
# (c) Dr Yves J Hilpisch
# The Python Quants GmbH
#

# IP ADDRESS FROM PARAMETER
MASTER_IP=$1

# COPYING THE FILES
scp install.sh root@${MASTER_IP}:
scp mycert.pem mykey.key jupyter_notebook_config.py root@${MASTER_IP}:

# EXECUTING THE INSTALLATION SCRIPT
ssh root@${MASTER_IP} bash /root/install.sh

Everything is now in place to give the set-up code a try. On DigitalOcean, create a new Droplet with options similar to these:

  • Operating system: Ubuntu 20.04 LTS x64 (the newest version available at the time of this writing)

  • Size: 2 core, 2GB, 60GB SSD (standard Droplet)

  • Data center region: Frankfurt (since your author lives in Germany)

  • SSH key: Add a (new) SSH key for password-less login.10

  • Droplet name: You can go with the pre-specified name or you can choose something like pyalgo.

Finally, clicking on the Create button initiates the Droplet creation process, which generally takes about one minute. The major outcome for proceeding with the set-up procedure is the IP address, which might be, for instance, 134.122.74.144 when you have chosen Frankfurt as your data center location. Setting up the Droplet now is as easy as follows:

(base) pro:cloud yves$ bash setup.sh 134.122.74.144

The resulting process, however, might take a couple of minutes. It is finished when there is a message from the Jupyter Lab server saying something like:

[I 12:02:50.190 LabApp] Serving notebooks from local directory: /root/notebook
[I 12:02:50.190 LabApp] Jupyter Notebook 6.1.1 is running at:
[I 12:02:50.190 LabApp] https://pyalgo:8888/

In any current browser, visiting the following address accesses the running Jupyter Notebook server (note the https protocol):

https://134.122.74.144:8888

After maybe adding a security exception, the Jupyter Notebook login screen prompting for a password (in our case jupyter) should appear. Everything is now ready to start Python development in the browser via Jupyter Lab, via the IPython-based console, via a terminal window or the text file editor. Other file management capabilities like file upload, deletion of files or creation of folders are also available.
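
Since the certificate is self-signed, programmatic access to the server (for example, via Python's ssl and urllib modules) requires a client context that skips certificate verification. A sketch, to be used only with certificates you generated and control yourself:

```python
import ssl

def insecure_client_context():
    """SSL context that accepts a self-signed server certificate.
    Only use this for certificates you generated yourself."""
    context = ssl.create_default_context()
    context.check_hostname = False  # the CN will not match the IP address
    context.verify_mode = ssl.CERT_NONE  # do not validate the chain
    return context

# usage (not executed here):
# urllib.request.urlopen('https://134.122.74.144:8888',
#                        context=insecure_client_context())
```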

Tip

Cloud instances, like those from DigitalOcean, and Jupyter Lab (powered by the Jupyter Notebook server) are a powerful combination for the Python developer and algorithmic trading practitioner to work on and make use of professional compute and storage infrastructure. Professional cloud and data center providers make sure that your (virtual) machines are physically secure and highly available. Using cloud instances also keeps the exploration and development phase at rather low costs since usage generally gets charged by the hour without the need to enter long term agreements.

Conclusions

Python is the programming language and technology platform of choice, not only for this book but for almost every leading financial institution. However, Python deployment can be tricky at best and sometimes even tedious and nerve-racking. Fortunately, technologies are available today — all of them generally younger than ten years — that help with the deployment issue. The open source software conda helps with both Python package and virtual environment management. Docker containers go even further in that complete file systems and runtime environments can be easily created in a technically shielded “sandbox”, that is, the container. Going even one step further, cloud providers like DigitalOcean offer compute and storage capacity in professionally managed and secured data centers, available within minutes and billed by the hour. This in combination with a Python 3.8 installation and a secure Jupyter Notebook/Lab server installation provides a professional environment for Python development and deployment in the context of Python for algorithmic trading projects.

Further Resources

For Python package management, consult the following resources:

For virtual environment management, consult these resources:

Information about Docker containers is found, among others, here:

  • Docker home page

  • Matthias, Karl and Sean Kane (2018): Docker: Up and Running. 2nd ed., O’Reilly, Beijing et al.

Robbins (2016) provides a concise introduction to and overview of the Bash scripting language.

  • Robbins, Arnold (2016): Bash Pocket Reference. 2nd ed., O’Reilly, Beijing et al.

How to run a public Jupyter Notebook/Lab server securely is explained under Running a notebook server. There is also JupyterHub, which allows for the management of multiple users for a Jupyter Notebook server — see JupyterHub.

To sign up on DigitalOcean with a 10 USD starting balance in your new account, visit the page http://bit.ly/do_sign_up. This pays for two months of usage of the smallest Droplet.

1 A recent project called pipenv combines the capabilities of the package manager pip with those of the virtual environment manager virtualenv. See https://github.com/pypa/pipenv.

2 On Windows, you can also run the exact same commands in a Docker container (see https://docs.docker.com/docker-for-windows/install/). Working on Windows directly requires some adjustments. See, for example, the book Matthias and Kane (2018) for further details on Docker usage.

3 Installing the meta package nomkl, such as in conda install numpy nomkl, avoids the automatic installation and usage of mkl and related other packages.

4 In the official documentation you find the following explanation: “Python Virtual Environments allow Python packages to be installed in an isolated location for a particular application, rather than being installed globally.” See the Creating Virtual Environments page.

5 See the book Matthias and Kane (2018) for a comprehensive introduction to the Docker technology.

6 Consult the book by Robbins (2016) for a concise introduction to and a quick overview of Bash scripting. Also see GNU Bash.

7 For those who do not have an account with a cloud provider yet, on this page http://bit.ly/do_sign_up new users get a starting credit of 10 USD for DigitalOcean.

8 Technically, Jupyter Lab is an extension of Jupyter Notebook. Both expressions are, however, sometimes used interchangeably.

9 With such a self-generated certificate, you might need to add a security exception when prompted by the browser. On macOS, you might even have to explicitly register the certificate as trustworthy.

10 If you need assistance, visit either How To Use SSH Keys with DigitalOcean Droplets or How To Use SSH Keys with PuTTY on DigitalOcean Droplets (Windows users).
