
6. The Module: Organization of Program Parts into a Unit

Gabor Guta¹

(1)

Budapest, Hungary

... reuse components that are already available, compose applications from big chunks of premade libraries, glue them together, and make sure it works, even without fully understanding how. Although many would reject this point of view, it is the de facto style, mostly unconsciously, behind today’s biggest software projects.

Jaroslav Tulach

When developing computer programs, perhaps the most important question is how to organize your program into logical units. Two of the three most important constructions supporting this goal, namely, functions and classes, have already been discussed. What has not been discussed yet is the next organizational unit above the class, the module. The related variable names, functions, and classes are usually organized into a module. In this chapter, we will discuss the concepts of modules and packages, how they can be imported, the built-in and third-party packages, how packages can be created, and what kind of tools can help to make packages high quality.

Built-in Modules

Python comes with more than 200 built-in modules, including everything from specialized data structures to relational database management functionality. This is also the reason behind one of the slogans of Python, namely, “batteries included.”

You reference a module by using the import keyword and specifying the name of the module. The module containing the date type is imported on line 1 of Listing 6-1 and used on line 2. Modules are also objects, so, for example, classes defined in them can be accessed by putting a dot between the module name and the class name.

import datetime

datetime.date(2020,2,2).strftime('%Y.%m.%d.')

Listing 6-1

Importing a Module

When a module is frequently used and its name is lengthy, a shorter name can be defined after the as keyword. As shown in Listing 6-2, the example module name is shortened to dt.

import datetime as dt

dt.date(2020,2,2).strftime('%Y.%m.%d.')

Listing 6-2

Importing a Module with a New Name

Variable names, functions, and classes can be imported selectively from a module. It is also possible in this case to assign another name to the imported object, as shown in Listing 6-3 and Listing 6-4.

from datetime import date

date(2020,2,2).strftime('%Y.%m.%d.')

Listing 6-3

Importing an Object from a Module

from datetime import date as Date

Date(2020,2,2).strftime('%Y.%m.%d.')

Listing 6-4

Importing an Object from a Module with a New Name

During the import, even multiple names of classes, functions, or variable names can be specified after the from keyword in a list. The difference between dates is a time difference object. Listing 6-5 shows how to test whether the difference of two dates is more than 30 days.

from datetime import date, timedelta

date(2020,2,2)-date(2020,1,1) > timedelta(days=30)

Listing 6-5

Operations with Date Type
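The date arithmetic above can be sketched step by step as follows. The variable names are illustrative only and do not come from the book's code.

```python
from datetime import date, timedelta

start = date(2020, 1, 1)
end = date(2020, 2, 2)

# Subtracting two dates gives a timedelta object.
elapsed = end - start
print(elapsed.days)

# timedelta objects can be compared with each other ...
is_over_30_days = elapsed > timedelta(days=30)

# ... and added to a date to get a new date.
due_date = start + timedelta(days=30)
print(is_over_30_days, due_date)
```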

The float type used so far is not suitable for storing financial data: it stores numbers as binary fractions, so most decimal fractions (such as 1.1) can only be approximated, and the results of calculations are rounded accordingly. This is the reason why only the integer type has been used for this purpose so far. To store financial data, the decimal module is recommended. This module defines the Decimal type, which can be used as shown in Listing 6-6. Decimal numbers are generally specified as strings, and the value of the Decimal object will match the number described by the string exactly. Listing 6-7 compares the float type and the Decimal type. The first value will be a number that approximates 1.100000000000000088, while the second one will be exactly 1.1.

from decimal import Decimal

VALUE_DEC = Decimal('9.45')

Listing 6-6

Importing Decimal Types

FLOAT_NUM = 1.1

FINANCIAL_NUM = Decimal('1.1')

print(f'{FLOAT_NUM:.50f}, {FINANCIAL_NUM:.50f},')

Listing 6-7

Comparing the Precision of Number Types
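The practical consequence of this difference can be seen in a short sketch. The values below are chosen only for illustration.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1, 0.2, or 0.3 exactly,
# so the sum differs slightly from the literal 0.3.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)

# Decimals built from strings keep the exact decimal value.
decimal_sum = Decimal('0.1') + Decimal('0.2')
print(decimal_sum == Decimal('0.3'))

# Building a Decimal from a float copies the float's inexact value,
# which is why strings are the usual way to specify decimals.
from_float = Decimal(0.1)
print(from_float == Decimal('0.1'))
```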

The result of an operation between two decimals (i.e., how the result is rounded and how many digits are stored) depends on the environment of the calculation, called the context. In Listing 6-8, the context is accessed by the getcontext() function. The listing shows how to use two instance variables from among the numerous settings: the prec variable specifies the number of stored digits (before and after the decimal point together), and the rounding variable controls the rounding rules (to apply the rules used in standard rounding or in banker's rounding, the ROUND_HALF_UP or ROUND_HALF_EVEN value has to be set, respectively). These settings affect operations only on decimal numbers. During the operations it can happen that the result has more decimal places than the original numbers. The result can be rounded back to the number of decimal places given in the parameter of the quantize() method.

from decimal import Decimal, getcontext, ROUND_HALF_UP

getcontext().rounding = ROUND_HALF_UP

getcontext().prec=28

PRICE=Decimal('33')

VAT=Decimal('1.1')

total=(PRICE*VAT).quantize(Decimal('0.01'))

print(f'{total:.20f}')

Listing 6-8

Operations with Decimal Type
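To make the difference between the two rounding rules visible, the following sketch rounds the same value both ways. It passes the rule directly to quantize() through its rounding parameter instead of changing the context; the amount is a hypothetical value that lies exactly halfway between two cents.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal('0.01')
amount = Decimal('2.665')  # hypothetical value, halfway between 2.66 and 2.67

# Standard rounding: a trailing 5 is always rounded up.
half_up = amount.quantize(CENT, rounding=ROUND_HALF_UP)

# Banker's rounding: a trailing 5 is rounded to the nearest even digit.
half_even = amount.quantize(CENT, rounding=ROUND_HALF_EVEN)

print(half_up, half_even)
```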

The other important module, which contains various extra data structures, is collections. The deque type (a double-ended queue) is imported in Listing 6-9. This is a list-like type optimized for adding and removing elements at both ends (beginning and end). In line 1 of Listing 6-10, a deque with four elements is assigned to the variable name index_numbers. A new element is appended to it in line 2 in the usual way. Then the first element, whose value is 1 in this example, is removed from the beginning of index_numbers in the last line.

from collections import deque

Listing 6-9

Importing the Deque Type

index_numbers = deque((1, 2, 3, 4))

index_numbers.append(5)

index_numbers.popleft()

Listing 6-10

Operations with the Deque Type
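The remaining operations on the two ends can be sketched in the same way. The maxlen parameter shown at the end is an optional extra of the deque type; the variable names are illustrative.

```python
from collections import deque

queue = deque((1, 2, 3, 4))
queue.appendleft(0)       # add an element at the beginning
queue.append(5)           # add an element at the end
first = queue.popleft()   # remove the element at the beginning
last = queue.pop()        # remove the element at the end
print(first, last, list(queue))

# With maxlen set, the oldest elements are dropped automatically.
recent = deque(maxlen=3)
for number in range(1, 6):
    recent.append(number)
print(list(recent))
```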

Python searches for the module to be imported first among the built-in modules. If it cannot be found there, Python tries to load it from the locations listed in the path variable of the sys module. (Its value can be checked by printing the value of the path variable name after executing the from sys import path statement.) These locations are as follows: the directory containing the program being run (or the current directory in interactive mode); the directories in the PYTHONPATH environment variable, if it is set in the operating system; and the directories specified during the installation.
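The search locations can be inspected with a few lines. The exact output depends on your installation, so none is shown here.

```python
import sys

# sys.path is an ordinary list of directory names (strings),
# searched in order when a module is imported.
for location in sys.path:
    print(location)

# Names of the modules compiled directly into the interpreter.
print('sys' in sys.builtin_module_names)
```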

Defining Modules

It is simple to prepare your own module in Python as the file containing the source code is considered a module by default. Importing a module in Python means that the source code of that module is executed. To turn a stand-alone Python script into a reusable module, you must make its functionalities accessible through functions or classes. Additionally, the statements that are not intended to be executed when the file is used as a module must be guarded by an if statement. The conditional expression of this if statement is typically __name__=='__main__'. The exact meaning of this if statement is as follows: if the file is imported as a module, the name of the module is assigned to the __name__ variable name, while if the file is executed directly as a script, its value is the __main__ string.
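A minimal sketch of such a file is shown next. The file name pricing.py and the function net_to_gross are hypothetical and serve only to illustrate the guard.

```python
# Content of a hypothetical pricing.py file.

def net_to_gross(net, vat_percent):
    """Return the gross price for a net price and a VAT percentage."""
    return net * (100 + vat_percent) / 100


def main():
    # Demonstration code that should not run when the file is imported.
    print(net_to_gross(100, 27))


# True only when the file is started directly (python pricing.py);
# after "import pricing", __name__ is 'pricing' and main() is skipped.
if __name__ == '__main__':
    main()
```

Keeping the demonstration in a separate main() function also makes it callable from other code when needed.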

The upcoming listings contain only the most relevant fragments of the files from this point. Listing 6-11 references the classes associated with the Order class (in Listings 3-7, 3-13, 3-17, and 3-20) organized into a file named model.py. You can download the complete file of this module. The first line of the fragment is the condition, which is needed to ensure the rest of the code runs only when launched as an independent program.

if __name__=='__main__':
    customer = Customer('X Y',
        '[email protected]',
        '1/1234567',
        Address('1011', 'Budapest',
            'Wombat street 1st', 'HUNGARY'))
    products = [
        Order.Item(Product('A', 'cubE', 1), 2),
        Order.Item(Product('B', 'cubF', 3), 5)
    ]
    order = Order(products, customer)
    print(order)

Listing 6-11

Fragment of the model.py File

Modules can be run from a command line by specifying the filename after the Python command. If your newly generated file is run with the python model.py command, the defined Order type object will appear on the screen.

Note

In this chapter, some of the examples do not consist of source code written in the Python language but commands writable to the operating system command prompt or shell. We covered how to access the command line on a particular operating system at the end of the Introduction chapter.

Commands that need to run Python may be different depending on the operating system and the environment. After installation, under a Windows OS, the py -3.10 command can be used instead of the python command, while under macOS and Linux the python3.10 command has to be issued.

Packages

Packages are modules containing other modules. They can be used to organize modules into further units. A package is usually a directory that contains modules and, additionally, a file named __init__.py. A package can also be placed into another package, and this can be repeated arbitrarily. If the package has to be executable directly, a __main__.py file can be placed in the directory; it contains the code to be executed only in that case.

A model.py file can be created from the class definitions in Listings 3-7, 3-13, 3-17, 3-20, and 6-11. As an example, a package can be built by creating a registry directory and copying the model.py file into this directory. An empty __init__.py file must be created in this directory too, which can be used in the future to add documentation and statements to be executed when loading the package. The model module from this newly constructed package can be imported with the import registry.model statement.
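The steps above can be tried out without touching your project by building the package in a temporary directory. In this sketch the content of model.py is a one-line stand-in (the GREETING name is hypothetical), not the real file from the book.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir:
    # Create the registry directory with the two files of the package.
    package_dir = Path(project_dir) / 'registry'
    package_dir.mkdir()
    (package_dir / '__init__.py').write_text('"""Order registry package."""\n')
    # Stand-in for the real model.py file.
    (package_dir / 'model.py').write_text('GREETING = "model loaded"\n')

    # Make the parent directory of the package visible to the import system.
    sys.path.insert(0, project_dir)
    try:
        importlib.invalidate_caches()
        model = importlib.import_module('registry.model')
        print(model.__name__, model.GREETING)
    finally:
        sys.path.remove(project_dir)
```

The call to import_module('registry.model') has the same effect as the import registry.model statement; it is used here only because the package is created while the program is running.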

Future Package

The Python language has a statement that can switch on and off new language additions or change their behavior. This statement is the from __future__ import followed by the name of the feature and can be present only at the beginning of the source file. This statement is a different statement than the earlier reviewed import statement. For compatibility reasons, the __future__ package exists and can be used with other forms of import statements, but this is not to be confused with the previous statement.

Since version 3.7, the only feature that still has to be turned on explicitly is the delayed evaluation of type annotations; the name of this feature is annotations (see PEP 563; type annotations will be discussed in Appendix C). This behavior was originally planned to become the default in a later version, but that change was postponed, so the statement is still required to enable it.
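The effect of the annotations feature can be observed in a small sketch. Because the statement must stand at the beginning of a source file, the example keeps the source in a string and executes it separately; the name Undefined is deliberately not defined anywhere.

```python
import textwrap

source = textwrap.dedent('''
    from __future__ import annotations

    def scale(value: Undefined) -> Undefined:
        return value
''')

# With the feature switched on, annotations are stored as strings
# and are not evaluated, so the undefined name causes no error.
namespace = {}
exec(source, namespace)
annotations = namespace['scale'].__annotations__
print(annotations)
```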

Package Management

The Python environment supports managing third-party packages with a package management tool named pip. This package manager is able to download specific versions of a package together with its dependencies from the Python Package Index and make them available on your machine.

Listing 6-12 shows the most important package management commands.

python -m pip list

python -m pip list --outdated

python -m pip search requests

python -m pip install requests

python -m pip install requests==2.20

python -m pip install requests --upgrade

python -m pip show requests

python -m pip freeze > requirements.txt

python -m pip install -r requirements.txt

Listing 6-12

Package Management Commands

The first two commands list all the installed packages and all packages having a more up-to-date version than the one installed. The command in line 3 lists packages from the Python Package Index that match the requested word (note that the Python Package Index no longer serves this command, so searching on the pypi.org website is the practical alternative). Lines 4 and 5 show how simple it is to install a package (in the second case, a version number is also specified; a relation sign can also be used here to express the required package version more loosely). The command in line 6 upgrades an already installed package to the latest version. The command in line 7 shows information about the installed package, such as the list of packages this one depends on. The last two lines show how to save the list of the installed packages into a file and how to install packages based on a dependency file.
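A program can also ask which version of a package is installed, similarly to the show command, by using the importlib.metadata module of the standard library (available since Python 3.8). The function name installed_version and the second package name below are hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# The result depends on what is installed in your environment.
print(installed_version('requests'))
print(installed_version('no-such-package-hopefully'))
```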

Useful Third-Party Packages

Two scenarios of using third-party packages will be presented in this section. In the first scenario, a web page is downloaded, and information is extracted from the downloaded page. In another scenario, an Excel table is processed.

The package requests will download the web page, while the HTML processing will be carried out with the bs4 package. In the other scenario, the pandas package will load an Excel table and answer queries about it. This package can also be connected to databases and import data from other data formats. Listing 6-13 shows how to install the corresponding packages for the two scenarios.

python -m pip install requests

python -m pip install beautifulsoup4

python -m pip install pandas

python -m pip install openpyxl

Listing 6-13

Installation of Third-Party Packages

Note

The commands needed to install the package may depend on the operating system and the environment. If you have installed the default environment described in the introduction, these commands are as follows: in the case of Windows 10, replace the python -m pip part at the beginning of the commands with py -3.10 -m pip; in the case of macOS and Linux, replace the python -m pip part at the beginning of the commands with sudo python3.10 -m pip or python3.10 -m pip --user.

The first step of the first scenario is to import the packages as shown in Listing 6-14. The next step is to download the web page, as shown in Listing 6-15, followed by printing the response code of the download request, type of the content, and format of the text coding. Then the downloaded web page is processed. Listing 6-16 and Listing 6-17 show how the header element of the processed web page and the text of the header element can be accessed, respectively.

import requests

from bs4 import BeautifulSoup

Listing 6-14

Importing Requests and bs4 Packages

APRESS = 'https://apress.github.io'
Q = APRESS + '/pragmatic-python-programming/quotes.html'
r = requests.get(Q, timeout=1)
print(r.status_code, r.headers['content-type'],
      r.encoding)
site = BeautifulSoup(r.text, 'html.parser')

Listing 6-15

Downloading a Website

site.head.title

Listing 6-16

Header Element of the Web Page

site.head.title.text

Listing 6-17

Header Text of the Web Page

Listing 6-18 shows a fragment of the data obtained from the website. Listing 6-19 shows the data being processed. This is implemented by iterating through all the tr elements whose class is book; during the iteration, the text of the two td elements in each row is printed.

<tr class="book">
  <td class="auth">Donald E. Knuth</td>
  <td class="title">TAOCP</td>
</tr>

Listing 6-18

Fragment of the Web Page

for row in site.find_all('tr',
                         class_='book'):
    cells = row.find_all('td')
    print(cells[0].text, ': ',
          cells[1].text, sep='')

Listing 6-19

Extracting Data from the Body of the Web Page
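To follow the row-by-row logic without network access or third-party packages, the same extraction can be approximated with the html.parser module of the standard library. This is a swapped-in technique for illustration, not the bs4 approach used above; the BookRowParser class is hypothetical, and the PAGE string is modeled on the fragment in Listing 6-18.

```python
from html.parser import HTMLParser

PAGE = '''
<table>
<tr class="book">
<td class="auth">Donald E. Knuth</td>
<td class="title">TAOCP</td>
</tr>
</table>
'''


class BookRowParser(HTMLParser):
    """Collect the td texts of every tr element whose class is book."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_book_row = False
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr' and dict(attrs).get('class') == 'book':
            self._in_book_row = True
            self.rows.append([])
        elif tag == 'td' and self._in_book_row:
            self._in_cell = True
            self.rows[-1].append('')

    def handle_endtag(self, tag):
        if tag == 'tr':
            self._in_book_row = False
        elif tag == 'td':
            self._in_cell = False

    def handle_data(self, data):
        # Text is collected only while inside a cell of a book row.
        if self._in_cell:
            self.rows[-1][-1] += data


parser = BookRowParser()
parser.feed(PAGE)
for author, title in parser.rows:
    print(author, ': ', title, sep='')
```

The amount of bookkeeping needed here also shows why a dedicated package such as bs4 is the more convenient choice for this task.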

In the second scenario, the pandas package is imported, and the Excel table is loaded according to Listing 6-20. To display the loaded table, the orders variable is printed.

import pandas as pd

orders = pd.read_excel('orders.xlsx',
                       index_col=0)

Listing 6-20

Importing the pandas Package

On the loaded table, different kinds of queries can be executed. The table is sorted by the Order value column in Listing 6-21. In turn, the values of the orders are grouped and summed by customer ID in Listing 6-22.

orders.sort_values(by='Order value')

Listing 6-21

Sorting the Table by Order Value

orders.groupby('Customer id').sum()

Listing 6-22

Grouping the Value of the Orders by Customer ID

Modules in Practice

Modules are the highest-level organizational unit in the Python language. For the design of the modules, it is important that the related definitions are located in one module following some organizational principle. It is important for the organizational intent (frequently called the responsibility of the module) to be documented as well. In Listing 6-23, the beginning and end fragments of the model.py file are shown. At the beginning of the module, the documentation includes short and long descriptions of the module and then presents the use of the module. At the end of the module, there is the idiom-like structure, which runs only when the module is directly launched. This section is usually applied when you want your module to work also as an independent command-line program. This example also contains the version number of the module, assigned to the __version__ variable name.

"""Model of the order management

The domain model of order management system is
modeled by these classes. They can be used
to represent an actual order.

Example usage:

    product = Product('T1', 'A product', 100)
    product.reduce_price(10)
"""

__version__ = '1.0.0'

...

if __name__=='__main__':
    ...

Listing 6-23

Fragment of the Model Module

Modules are frequently written to be reusable, and it's helpful when the functionality of the module can be accessed via a class providing a simplified interaction. This is called the facade design pattern, and it has two benefits: the user of the module does not have to know the exact internal structure of the module, and using the module takes place on a well-specified "narrow" surface. Therefore, in the case of an internal change, other modules using this one would not need to be changed. Developing an easily reusable module can take as much as three times the effort of developing a module for a single use.
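A minimal sketch of the facade idea follows. All class names, product codes, and prices are hypothetical; the two classes with a leading underscore stand for the internal structure of a module, and OrderFacade is the narrow surface offered to other modules.

```python
class _Stock:
    """Internal detail: tracks the available quantities."""

    def __init__(self):
        self._quantities = {'A': 5}

    def reserve(self, code, quantity):
        if self._quantities.get(code, 0) < quantity:
            raise ValueError(f'not enough stock for {code}')
        self._quantities[code] -= quantity


class _Invoicing:
    """Internal detail: computes the amount to pay."""

    PRICES = {'A': 100}

    def total(self, code, quantity):
        return self.PRICES[code] * quantity


class OrderFacade:
    """The single entry point that other modules are expected to use."""

    def __init__(self):
        self._stock = _Stock()
        self._invoicing = _Invoicing()

    def place_order(self, code, quantity):
        # The caller does not need to know that two objects cooperate here.
        self._stock.reserve(code, quantity)
        return self._invoicing.total(code, quantity)


shop = OrderFacade()
print(shop.place_order('A', 2))
```

If the stock handling or the invoicing is later reorganized, code that calls only place_order does not have to change.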

Advanced Concepts

This section describes some technical details in reference manual style and some advanced concepts that may need more technical background.

Structure of Python Projects

Several recommendations exist for the structure of Python projects, which differ in details such as the format used to store dependencies or the format of the package description (often named README) file. The recommended location of the source files is the src directory. Depending on whether the program consists of one or more files, the src directory contains either a single Python source file or a directory, in both cases named identically to the package. In addition, the project usually includes a tests directory for the tests and a docs directory for the documentation. The project usually also includes a LICENSE file containing a description of the license and/or a package description file. This file is named README.md or README.rst depending on whether Markdown or reStructuredText is chosen as a format, respectively. In the simplest case, the dependencies of your module on third-party packages are stored in a requirements.txt file. If you want to share or publish your module, you will also need a setup.py file, a pyproject.toml file, or other files, which can also take over the role of requirements.txt.

If you want the Python package to be available for others, you can prepare a compressed file from it suitable for binary distribution. This file can be shared manually or can be uploaded to a package index server. This server can be the default pypi.org or any other repository server. Packages can be configured classically with a setup.py file, which stores the package information (such as version, author, dependencies, license, and the like) programmatically. Newer versions of the tools allow the setup.py file to be replaced with a configuration file, which is named pyproject.toml and contains the necessary information to describe the package.

Listing 6-24 shows the content of the setup.py file. If this file is executed with the python setup.py bdist command, it will generate a compressed package file in the dist directory. On 64-bit Windows, for example, this file would be named registry-1.0.0.win-amd64.zip.

from setuptools import setup

setup(name='registry',
      version='1.0.0',
      description='Order Management System',
      author='Gabor Guta, PhD',
      author_email='[email protected]',
      license='GPL',
      packages=['registry'],
      package_dir={'':'src'},
      python_requires='>=3.10')

Listing 6-24

The setup.py File

To transition to the new approach, you need to install the build package and update the setuptools package. This can be achieved with the pip install --upgrade setuptools build command. The contents of pyproject.toml and setup.cfg are shown in Listing 6-25 and Listing 6-26, respectively. The command python -m build can be used to generate the registry-1.0.0-py3-none-any.whl and registry-1.0.0.tar.gz compressed package files.

[build-system]

requires = ["setuptools"]

build-backend = "setuptools.build_meta"

Listing 6-25

The pyproject.toml File

[metadata]

name = registry

version = 1.0.0

description = Order Management System

author = Gabor Guta, PhD

author_email = [email protected]

license = GPL

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.10

[options.packages.find]
where=src

Listing 6-26

The setup.cfg File

Virtual Environments

A virtual environment can be useful when the exact reproducibility of the environment is important or you have to use various Python versions and package versions in parallel. A virtual environment can be created by the python -m venv ENVIRONMENT_NAME command, where ENVIRONMENT_NAME is the name of the environment to be created. The environment will be created in a directory named the same as the name specified in the command. The directory will contain a pyvenv.cfg configuration file; an include directory for the C header files; a lib directory, which contains the site-packages directory for the third-party packages; and finally, a bin or Scripts directory (depending on whether the installation is under Linux or Windows) containing the program files of the Python environment. The environment can be activated with the source ENVIRONMENT_NAME/bin/activate command on Linux, while the same can be accomplished by the ENVIRONMENT_NAME\Scripts\activate.bat command on Windows 10. The environment can be switched off by the deactivate command. (The macOS commands are identical to the commands used for Linux.)
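The directory layout described above can be checked with a short sketch that creates a throwaway environment through the venv module of the standard library, which is the same module the python -m venv command runs. The name demo_env is hypothetical, and pip is left out of the environment to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as base_dir:
    env_dir = Path(base_dir) / 'demo_env'
    # with_pip=False: pip is not installed into this environment.
    venv.create(env_dir, with_pip=False)

    # List what was generated; the exact names differ between
    # Linux/macOS (bin, lib) and Windows (Scripts, Lib).
    created = sorted(entry.name for entry in env_dir.iterdir())
    has_config = (env_dir / 'pyvenv.cfg').is_file()
    print(created, has_config)
```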

Other tools are also available to manage the virtual environments. The most popular alternatives for the built-in tools are the pipenv and poetry tools.

Tools for Testing

Python provides built-in packages to make testing easier. The most important package is the unittest package, which supports the automatic testing of functions and classes. A test case class typically contains test methods, which exercise functionalities and verify that the functionality worked as expected. Often special setUp and tearDown methods are used to prepare the test method and clean up the environment after the execution of the method, respectively. The verification of the functionality is typically realized by calling assert methods to compare actual results to the expected results. Test cases can be organized into test suites.

Listing 6-27 shows a test class to test the Product class. The first method instantiates a Product object. This method is called each time before the execution of other test methods. The second method is the first actual test: it verifies that the attributes of the tested object are initialized correctly. The third method calls a method of the object and verifies that the price and old_price data attributes have changed as expected. Finally, the last method verifies that the method terminates with an exception in the case of an invalid parameter value.

from unittest import TestCase

from registry.model import Product

class TestProduct(TestCase):
    def setUp(self):
        self.prod = Product('K01', 'Standard Cube', 1000)

    def test_product_creation(self):
        self.assertEqual(self.prod.code, 'K01')
        self.assertEqual(self.prod.name, 'Standard Cube')
        self.assertEqual(self.prod.price, 1000)
        self.assertEqual(self.prod.old_price, 1000)

    def test_price_reduction(self):
        self.prod.reduce_price(10)
        self.assertEqual(self.prod.price, 900)
        self.assertEqual(self.prod.old_price, 1000)

    def test_invalid_input(self):
        with self.assertRaises(TypeError):
            self.prod.reduce_price("A")

Listing 6-27

Unit Test of the Product Class
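Tests like these are usually started from the command line with python -m unittest, but they can also be run from code. The following self-contained sketch does so; its Product class is a simplified stand-in written only for this example (an assumption, not the class from the earlier chapters), so that the fragment can run on its own.

```python
import unittest


class Product:
    """Simplified stand-in for the Product class (assumption)."""

    def __init__(self, code, name, price):
        self.code = code
        self.name = name
        self.price = price
        self.old_price = price

    def reduce_price(self, percentage):
        if not isinstance(percentage, (int, float)):
            raise TypeError('percentage must be a number')
        self.old_price = self.price
        self.price = self.price * (100 - percentage) // 100


class TestProduct(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh object.
        self.prod = Product('K01', 'Standard Cube', 1000)

    def test_price_reduction(self):
        self.prod.reduce_price(10)
        self.assertEqual(self.prod.price, 900)
        self.assertEqual(self.prod.old_price, 1000)

    def test_invalid_input(self):
        with self.assertRaises(TypeError):
            self.prod.reduce_price('A')


# Collect the test methods of the class into a suite and run it.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestProduct)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())
```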

If you are interested in testing, two topics may be worth further study:

How can functionalities that require complex environments be tested in isolation with the help of packages like unittest.mock?
What are the popular tools that ease testing, such as the pytest framework?
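As a first taste of the former topic, the sketch below replaces an external service with a Mock object from the unittest.mock package. The send_confirmation function and the mailer object with its send method are hypothetical names invented for this example.

```python
from unittest.mock import Mock


def send_confirmation(order_id, mailer):
    """Hypothetical function that depends on an external mail service."""
    mailer.send(to='customer', subject=f'Order {order_id} confirmed')
    return True


# The Mock object accepts any method call and records it,
# so no real mail server is needed to exercise the function.
fake_mailer = Mock()
sent = send_confirmation(42, fake_mailer)

# Verify that the function used the service exactly as expected.
fake_mailer.send.assert_called_once_with(
    to='customer', subject='Order 42 confirmed')
print(sent, fake_mailer.send.call_count)
```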

Tools for Static Analysis

Several tools are available to automatically identify suspected errors in the code. Only two popular tools that can help to ensure the quality of the Python code will be shown in this section. One is pylint, which basically checks syntactic patterns and the formatting rules described in PEP8. The other tool is mypy, which can perform static type checking based on the type annotations. Listing 6-28 shows how to use them.

python -m pip install pylint

python -m pip install mypy

pylint src/registry

mypy --strict src/registry

Listing 6-28

Static Analysis Commands

There is an option for both tools to include comments in the source file, which disable some of the checks of the tool. This is useful when you intentionally do not want to comply with the default checking rules for some reason.
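The following sketch shows what such comments can look like. The parse_quantity function is hypothetical; invalid-name and arg-type are the check names that the two tools are expected to report for these particular lines.

```python
def parse_quantity(text: str) -> int:
    """Convert user input to an integer quantity."""
    return int(text)


# pylint expects module-level names like this one in upper case;
# the comment switches off that single check for this line.
x = parse_quantity('3')  # pylint: disable=invalid-name

# mypy would report the int argument below, because a str is declared;
# the comment silences that report. At run time int(7) works anyway.
y = parse_quantity(7)  # type: ignore[arg-type]
print(x, y)
```

Such comments are best kept rare and specific, so that a disabled check does not hide a real problem elsewhere.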

Tools for Formatting

Python code formatters are useful to automatically make your source code easier to read and conform to the PEP8 standard. There are several such tools like autopep8 or yapf, but the most popular tool is probably black. While the first two tools can be widely customized, black is famous for providing good results out of the box and hardly allowing any customization in its formatting style.

In Listing 6-29 you can see the installation of the black package and the reformatting of the model.py file.

pip install black

black src/registry/model.py

Listing 6-29

Installing and Using the black Formatting Tool

Preparation of Documentation

The program Sphinx can be used to generate documentation for your package. Preparing the documentation consists of two steps: the tool extracts the documentation comments from the Python source files, and then it combines them with the documentation files and creates the documentation of your package. Documentation files provide the frame of the documentation of the package, and they can also contain further documentation about the package. As an example, a file containing the user guide will be created. A format called reStructuredText can be used to add formatting to the text in the documentation files. Two examples of its notation: emphasized text is denoted by stars put around the text, and a chapter title is marked by underlining it (a series of equal signs in the next line, at least as long as the title text).

Listing 6-30 shows the commands used to generate the documentation. The command in line 1 installs the tool. The documentation files have to be created next, which can be done with the commands in lines 2 and 3. After executing the sphinx-quickstart command, answer no to the question about separating the source and build directories, name the project registry, set any name for the author's name, and confirm the default option for the last two questions. The default documentation and configuration files are automatically generated by these two commands.

python -m pip install sphinx

sphinx-quickstart

sphinx-apidoc -o docs src/registry

make html

Listing 6-30

Commands to Execute Sphinx

Thereafter, you can add the user guide shown in Listing 6-31 to your project by copying the content of the listing into the userguide.rst file in the docs directory. The files that are generated by default require the following modifications: references to the documentation files have to be added in the index.rst file, as shown in Listing 6-32; in the conf.py file, the hash marks before the first three lines have to be removed, the source directory path has to be fixed, and lines 7–10 (the section beginning with the extensions word) should be rewritten as shown in Listing 6-33. These modifications are necessary for Sphinx to be able to automatically load the comments (docstrings) from the Python source files. The command in line 4 will generate the description of the module in HTML format. If the files are changed, the documentation can be regenerated with the last two commands.

User Guide

=====================

This is the description of the *module*.

Listing 6-31

The File which Contains the Module Description

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   docs/userguide
   docs/modules

Listing 6-32

The Sphinx index.rst File

import os
import sys
sys.path.insert(0, os.path.abspath('./src/'))

# Please, leave the original content between these two sections

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
]

Listing 6-33

The Sphinx conf.py File
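With the napoleon extension enabled, docstrings can be written in the more readable Google style, which Sphinx then converts when it loads them. The following function is a hypothetical example of such a docstring, not part of the book's model module.

```python
def reduce_price(price, percentage):
    """Return the price reduced by the given percentage.

    Args:
        price: The original price.
        percentage: The size of the reduction in percent.

    Returns:
        The reduced price.
    """
    return price * (100 - percentage) / 100


# The docstring is available at run time as well; this is the text
# that the autodoc extension extracts from the source file.
print(reduce_price.__doc__)
```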

Key Takeaways

Functions, classes, and other definitions can be organized into modules to make it easier to navigate between them. A Python source file itself is a module already. A package is a module that contains further modules in it. Modules must be imported before they can be used.
Packages are usually stored as files. If a package is not included in the built-in Python library, the package file must be copied into your work environment somehow. This problem can be solved with the package manager called pip, which can download and copy the packages into your work environment. The third-party packages used by your program are called the dependencies of your program, and it is recommended to list them in a file (e.g., requirements.txt).
When you’re developing a Python program, you have a huge collection of built-in packages that can help you. If this isn’t enough, a huge collection of third-party packages is provided by the Python ecosystem.
