... reuse components that are already available, compose applications from big chunks of premade libraries, glue them together, and make sure it works, even without fully understanding how. Although many would reject this point of view, it is the de facto style, mostly unconsciously, behind today’s biggest software projects.
Jaroslav Tulach
When developing computer programs, perhaps the most important question is how to organize your program into logical units. Two of the three most important constructions supporting this goal, namely, functions and classes, have already been discussed. What has not been discussed yet is the next organizational unit above the class, the module. The related variable names, functions, and classes are usually organized into a module. In this chapter, we will discuss the concepts of modules and packages, how they can be imported, the built-in and third-party packages, how packages can be created, and what kind of tools can help to make packages high quality.
Built-in Modules
Python comes with more than 200 built-in modules, including everything from specialized data structures to relational database management functionality. This is also the reason behind one of the slogans of Python, namely, “batteries included.”
Importing a Module
Importing a Module with a New Name
Importing an Object from a Module
Importing an Object from a Module with a New Name
Operations with Date Type
Importing Decimal Types
Comparing the Precision of Number Types
Operations with Decimal Type
Importing the Deque Type
Operations with the Deque Type
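The import forms and built-in types named in the captions above can be illustrated with a minimal sketch. The variable names and sample values below are assumptions chosen for illustration only; all modules used are part of the standard library.

```python
# Four forms of the import statement, shown on standard-library modules.
import datetime                      # import a module
import decimal as dec                # import a module with a new name
from collections import deque        # import an object from a module
from datetime import date as Date    # import an object with a new name

# Date arithmetic: adding a timedelta to a date yields another date.
release = Date(2021, 10, 4)
deadline = release + datetime.timedelta(days=30)
days_between = (deadline - release).days

# Binary floats cannot represent 0.1 exactly; Decimal built from strings can.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (dec.Decimal("0.1") + dec.Decimal("0.2") == dec.Decimal("0.3"))

# A deque supports efficient appends and pops at both ends.
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
first = queue.popleft()
last = queue.pop()

print(deadline, days_between)
print(float_sum_is_exact, decimal_sum_is_exact)
print(first, last, list(queue))
```

Importing with a new name is mostly useful to shorten long module names or to avoid a clash with a name already used in your program.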
Python searches for the module to be imported first among the built-in modules. If the module is not found there, Python tries to load it from the locations listed in the path variable of the sys module. (Its value can be checked by printing path after executing the from sys import path statement.) These locations are as follows: the current directory (i.e., the directory from which your Python program is started); the directories in the environment variable PYTHONPATH, which is set in the operating system; and the directories specified during the installation.
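The search locations can be inspected directly, as in the following short sketch. The printed directories depend on your installation, so no particular output is assumed.

```python
import sys

# sys.path is an ordinary list of directory names, searched in order.
for position, directory in enumerate(sys.path):
    print(position, repr(directory))

# Built-in modules are compiled into the interpreter and need no path lookup.
print("sys" in sys.builtin_module_names)
```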
Defining Modules
It is simple to prepare your own module in Python as the file containing the source code is considered a module by default. Importing a module in Python means that the source code of that module is executed. To turn a stand-alone Python script into a reusable module, you must make its functionalities accessible through functions or classes. Additionally, the statements that are not intended to be executed when the file is used as a module must be guarded by an if statement. The conditional expression of this if statement is typically __name__=='__main__'. The exact meaning of this if statement is as follows: if the file is imported as a module, the name of the module is assigned to the __name__ variable name, while if the file is executed directly as a script, its value is the __main__ string.
Fragment of the model.py File
Modules can be run from the command line by specifying the filename after the Python command. If your newly created file is run with the python model.py command, the defined Order type object will appear on the screen.
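A minimal sketch of a file that works both as a script and as a module is shown next. The Order class here is a hypothetical stand-in with assumed attributes (product and quantity), not the definition built up in the earlier chapters.

```python
# Sketch of a reusable module file, e.g. saved as model.py.
# The Order class is a hypothetical stand-in used only for illustration.

class Order:
    """A hypothetical order holding a product name and a quantity."""

    def __init__(self, product, quantity):
        self.product = product
        self.quantity = quantity

    def __repr__(self):
        return f"Order({self.product!r}, {self.quantity!r})"


def main():
    """Code that should run only when the file is executed as a script."""
    order = Order("pen", 3)
    print(order)
    return order


# __name__ is '__main__' only when the file is run directly, so importing
# this file defines Order and main without printing anything.
if __name__ == "__main__":
    main()
```

Placing the script logic in a function such as main keeps the guarded block to a single line and makes the same logic callable from other modules.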
In this chapter, some of the examples do not consist of source code written in the Python language but commands writable to the operating system command prompt or shell. We covered how to access the command line on a particular operating system at the end of the Introduction chapter.
Commands that need to run Python may differ depending on the operating system and the environment. After installation, under Windows, the py -3.10 command can be used instead of the python command, while under macOS and Linux the python3.10 command has to be issued.
Packages
Packages are modules containing other modules. They can be used to organize modules into larger units. A package is usually a directory that contains modules and, additionally, a file named __init__.py. A package can also be placed into another package, and this nesting can be repeated arbitrarily. If the package has to be directly executable, a __main__.py file can be placed in the directory; it contains the code that is executed only when the package is run directly.
A model.py file can be created from the class definitions in Listings 3-7, 3-13, 3-17, 3-20, and 6-11. As an example, a package can be built by creating a registry directory and copying the model.py file into this directory. An empty __init__.py file must be created in this directory too, which can be used in the future to add documentation and statements to be executed when loading the package. The model module from this newly constructed package can be imported with the import registry.model statement.
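The steps above can be sketched in Python itself. The sketch builds the registry directory in a temporary location and writes a placeholder model.py with a single assumed GREETING variable, since the real file would contain the class definitions from the listings.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package layout in a temporary directory:
#   <tmp>/registry/__init__.py   (empty)
#   <tmp>/registry/model.py      (placeholder content)
with tempfile.TemporaryDirectory() as root:
    package_dir = Path(root) / "registry"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "model.py").write_text("GREETING = 'model loaded'\n")

    # Make the parent directory of the package visible to the import system.
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        import registry.model
        print(registry.model.__name__)
        print(registry.model.GREETING)
    finally:
        sys.path.remove(root)
```

In a real project the parent directory of registry is usually the current directory, so no manipulation of sys.path is needed.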
Future Package
The Python language has a statement that can switch on and off new language additions or change their behavior. This statement is the from __future__ import followed by the name of the feature and can be present only at the beginning of the source file. This statement is a different statement than the earlier reviewed import statement. For compatibility reasons, the __future__ package exists and can be used with other forms of import statements, but this is not to be confused with the previous statement.
Since version 3.7, the only active feature that can be turned on is the delayed evaluation of type annotations; the name of this feature is annotations (see PEP 563; type annotations will be discussed in Appendix C). This functionality was planned to become the default in version 3.11, but that change has been postponed, so the statement is still required to enable it.
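The effect of the annotations feature can be observed with a small sketch. The module source is compiled from a string here only to keep the example self-contained; in a real file the from __future__ import line would simply be the first statement. The names build and UndefinedType are assumptions made for the illustration.

```python
# Source text of a small module that enables the "annotations" feature.
# UndefinedType is deliberately not defined anywhere.
source = '''
from __future__ import annotations

def build(item: UndefinedType) -> UndefinedType:
    return item
'''

namespace = {}
exec(compile(source, "<example>", "exec"), namespace)

# With delayed evaluation the annotations are kept as plain strings,
# so the undefined name does not raise a NameError at definition time.
print(namespace["build"].__annotations__)
```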
Package Management
The Python environment supports managing third-party packages with a package management tool named pip. This package manager is able to download a given version of a package together with its dependencies from the Python Package Index and make them available on your machine.
Package Management Commands
The first two commands list all the installed packages and all packages having a more up-to-date version than the one installed. The command in line 3 lists packages from the Python Package Index that match the requested word. Lines 4 and 5 show how simple it is to install a package (in the second case, a version number is also specified; a relation sign can also be used here to express the required package version more loosely). The command in line 6 shows information about the installed package, such as the list of packages this one depends on. The last two lines show how to save the list of the installed packages into a file and how to install packages based on a dependency file.
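Information about installed packages is also available from within Python. The following sketch uses the standard importlib.metadata module to print names and versions in the name==version form used by dependency files. It is an illustration of inspecting the environment, not a replacement for the pip commands described above, and its output depends entirely on what is installed.

```python
from importlib import metadata

# Collect the name and version of every installed distribution.
# Missing metadata fields are replaced by the placeholder "unknown".
installed = sorted(
    (
        dist.metadata.get("Name") or "unknown",
        dist.metadata.get("Version") or "unknown",
    )
    for dist in metadata.distributions()
)

# Print the first few entries in the "name==version" form.
for name, version in installed[:10]:
    print(f"{name}=={version}")
```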
Useful Third-Party Packages
Two scenarios of using third-party packages will be presented in this section. In the first scenario, a web page is downloaded, and information is extracted from the downloaded page. In another scenario, an Excel table is processed.
Installation of Third-Party Packages
The commands needed to install the package may depend on the operating system and the environment. If you have installed the default environment described in the introduction, these commands are as follows: in the case of Windows 10, replace the python -m pip part at the beginning of the commands with py -3.10 -m pip; in the case of macOS and Linux, replace the python -m pip part at the beginning of the commands with sudo python3.10 -m pip or python3.10 -m pip --user.
Importing Requests and bs4 Packages
Downloading a Website
Header Element of the Web Page
Header Test of the Web Page
Fragment of the Web Page
Extracting Data from the Body of the Web Page
Importing the pandas Package
Sorting the Table by Order Value
Grouping the Value of the Orders by Customer ID
Modules in Practice
Fragment of the Model Module
Modules are frequently written to be reusable, and it is helpful when the functionality of the module can be accessed via a class providing a simplified interaction. This is called the facade design pattern, and it has two benefits: the user of the module does not have to know the exact internal structure of the module, and using the module takes place on a well-specified "narrow" surface. Therefore, in the case of an internal change, other modules using this one would not need to be changed. Developing an easily reusable module can take as much as three times the effort of developing a module for only a single use.
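A minimal sketch of the facade idea follows. All class names, the stock and price data, and the place_order method are hypothetical and exist only to show how a single class can hide cooperating internal parts.

```python
# Internal parts of a hypothetical ordering module.
class _Inventory:
    def __init__(self):
        self._stock = {"pen": 10}

    def reserve(self, product, quantity):
        if self._stock.get(product, 0) < quantity:
            raise ValueError(f"not enough {product} in stock")
        self._stock[product] -= quantity


class _Pricing:
    _prices = {"pen": 2}

    def total(self, product, quantity):
        return self._prices[product] * quantity


class OrderFacade:
    """The single, narrow entry point that other modules are meant to use."""

    def __init__(self):
        self._inventory = _Inventory()
        self._pricing = _Pricing()

    def place_order(self, product, quantity):
        # Callers need not know that two internal classes cooperate here.
        self._inventory.reserve(product, quantity)
        return self._pricing.total(product, quantity)


shop = OrderFacade()
print(shop.place_order("pen", 3))
```

If the internal classes are later merged or replaced, code that calls only place_order keeps working unchanged.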
Advanced Concepts
This section describes some technical details in reference manual style and some advanced concepts that may need more technical background.
Structure of Python Projects
Several recommendations exist for the structure of Python projects, which differ in details like the format used to store dependencies or the package description (often named README) file. The recommended location of the source files is the src directory. Depending on whether the program consists of one or more files, the src directory contains either a single Python source file named identically to the package or a directory named identically to the package. In addition, the project usually includes a tests directory for the tests, a docs directory for the documentation, a LICENSE file containing a description of the license, and/or a package description file. This file is named README.md or README.rst depending on whether Markdown or reStructuredText is chosen as its format, respectively. In the simplest case, the dependencies of your module on third-party packages are stored in a requirements.txt file. If you want to share or publish your module, you will also need a setup.py file, a pyproject.toml file, or other files, which can also take over the function of requirements.txt.
If you want the Python package to be available for others, you can prepare a compressed file from it suitable for binary distribution. This file can be shared manually or can be uploaded to a package index server. This server can be the default pypi.org or any other repository server. Packages can be configured classically with a setup.py file, which stores the package information (such as version, author, dependencies, license, and the like) programmatically. Newer versions of the tools support replacing the setup.py file with a configuration file, which is named pyproject.toml and contains the information necessary to describe the package.
The setup.py File
The pyproject.toml File
The setup.cfg File
Virtual Environments
A virtual environment can be useful when the exact reproducibility of the environment is important or when you have to use various Python versions and package versions in parallel. A virtual environment can be created by the python -m venv ENVIRONMENT_NAME command, where ENVIRONMENT_NAME is the name of the environment to be created. The environment will be created in a directory with the same name as specified in the command. The directory will contain a pyvenv.cfg configuration file; an include directory for the C header files; a lib directory, which contains the site-packages directory for the third-party packages; and finally, a bin or Scripts directory (depending on whether the installation is under Linux or Windows) containing the program files of the Python environment. The environment can be activated with the source ENVIRONMENT_NAME/bin/activate command on Linux, while the same can be accomplished by the ENVIRONMENT_NAME\Scripts\activate.bat command on Windows 10. The environment can be switched off by the deactivate command. (The macOS commands are identical to the commands used for Linux.)
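The same venv module can also be called from Python code, which makes it easy to inspect the directory layout described above. The sketch below creates an environment in a temporary directory; the name demo_env is an assumption, and pip is deliberately left out to keep the example fast and offline.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    env_dir = Path(root) / "demo_env"   # hypothetical environment name
    # with_pip=False skips installing pip; the command-line
    # "python -m venv" installs pip into the environment by default.
    venv.create(env_dir, with_pip=False)

    has_config = (env_dir / "pyvenv.cfg").is_file()
    # The program directory is "Scripts" on Windows and "bin" elsewhere.
    has_programs = (env_dir / "bin").is_dir() or (env_dir / "Scripts").is_dir()
    print(has_config, has_programs)
```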
Other tools are also available to manage the virtual environments. The most popular alternatives for the built-in tools are the pipenv and poetry tools.
Tools for Testing
Python provides built-in packages to make testing easier. The most important of these is the unittest package, which supports the automatic testing of functions and classes. A test case class typically contains test methods, which exercise a functionality and verify that it worked as expected. Often special setUp and tearDown methods are used to prepare the environment before the execution of a test method and to clean it up afterward, respectively. The verification of the functionality is typically realized by calling assert methods to compare actual results to the expected results. Test cases can be organized into test suites.
Unit Test of the Product Class
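A minimal sketch of such a test case follows. The Product class defined here is a hypothetical stand-in with an assumed reduce_price method, so the sketch is self-contained; it is not necessarily identical to the Product class developed earlier in the book.

```python
import unittest


class Product:
    """Hypothetical class under test: a product with a code and a price."""

    def __init__(self, code, price):
        self.code = code
        self.price = price

    def reduce_price(self, percent):
        self.price = self.price * (100 - percent) // 100


class TestProduct(unittest.TestCase):
    def setUp(self):
        # Runs before every test method and provides a fresh object.
        self.product = Product("K01", 200)

    def test_initial_price(self):
        self.assertEqual(self.product.price, 200)

    def test_reduce_price(self):
        self.product.reduce_price(10)
        self.assertEqual(self.product.price, 180)


# Collect the test methods into a suite and run it programmatically.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestProduct)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

When the tests live in their own file, they are more commonly started from the command line with python -m unittest, which discovers and runs the test case classes automatically.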
How can functionalities that require complex environments be tested in isolation with the help of packages like the unittest.mock package?
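One possible answer is sketched below: a Mock object replaces a dependency that would otherwise need a network connection or a database. The order_total function and the get_price method are hypothetical names introduced only for this illustration.

```python
from unittest.mock import Mock


def order_total(price_service, product_code, quantity):
    """Hypothetical function under test; it depends on an external service."""
    return price_service.get_price(product_code) * quantity


# The Mock stands in for a service that would need a network or a database.
fake_service = Mock()
fake_service.get_price.return_value = 5

total = order_total(fake_service, "K01", 3)
print(total)

# The mock records how it was used, so the interaction can be verified too.
fake_service.get_price.assert_called_once_with("K01")
```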
Which popular tools, like the pytest framework, make testing easier?
Tools for Static Analysis
Static Analysis Commands
There is an option for both tools to include comments in the source file, which disable some of the checks of the tool. This is useful when you intentionally do not want to comply with the default checking rules for some reason.
Tools for Formatting
Python code formatters are useful to automatically make your source code easier to read and conform to the PEP8 standard. There are several such tools, like autopep8 or yapf, but the most popular tool is probably black. While the first two tools can be widely customized, black is famous for providing good results out of the box and allowing hardly any customization of its formatting style.
Installing and Using the black Formatting Tool
Preparation of Documentation
The program Sphinx can be used to generate documentation for your package. Preparing the documentation consists of two steps: the tool extracts the documentation comments from the Python source files, and then it combines them with the documentation files and creates the documentation of your package. Documentation files provide the frame of the documentation of the package, and they can also contain further documentation about the package. As an example, a file containing the user guide will be created. A format called reStructuredText can be used to add formatting to the text in the documentation files. Some examples of its formatting notation: highlighted text is denoted by stars put around the text, and a chapter title is marked by underlining (a series of equal signs in the next line, as long as the title text).
Commands to Execute Sphinx
The File which Contains the Module Description
The Sphinx index.rst File
The Sphinx conf.py File
Key Takeaways
Functions, classes, and other definitions can be organized into modules to make it easier to navigate between them. A Python source file itself is a module already. A package is a module that contains further modules in it. Modules must be imported before they can be used.
Packages are usually stored as files. If a package is not included in the built-in Python library, the package file must be copied into your work environment somehow. This problem can be solved with the package manager called pip, which can download and copy the packages into your work environment. The third-party packages used by your program are called the dependencies of your program, and it is recommended to list them in a file (e.g., requirements.txt).
When you’re developing a Python program, you have a huge collection of built-in packages that can help you. If this isn’t enough, a huge collection of third-party packages is provided by the Python ecosystem.