Chapter 15. Modules: The Big Picture

This chapter begins our look at the Python module, the highest-level program organization unit, which packages program code and data for reuse. In concrete terms, modules usually correspond to Python program files (or C extensions). Each file is a module, and modules import other modules to use the names they define. Modules are processed with two new statements and one important built-in function:

import

Lets a client (importer) fetch a module as a whole

from

Allows clients to fetch particular names from a module

reload

Provides a way to reload a module’s code without stopping Python
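As a quick preview of all three tools, here is a minimal sketch. The module name demo_mod, its version attribute, and the temporary directory are hypothetical, invented only so that the example is self-contained. reload is a built-in function in the Python described here; later releases moved it to the importlib module, so the sketch tries that location first.

```python
import os, sys, tempfile

try:
    from importlib import reload   # later Pythons keep reload here
except ImportError:
    pass                           # otherwise reload is already a built-in

sys.dont_write_bytecode = True     # assumption: skip saved byte code so edits are always seen

workdir = tempfile.mkdtemp()       # hypothetical directory for the demo module
path = os.path.join(workdir, 'demo_mod.py')

def write(text):
    f = open(path, 'w')
    f.write(text)
    f.close()

write("version = 1\n")
sys.path.insert(0, workdir)        # make the directory importable

import demo_mod                    # import: fetch the module as a whole
first = demo_mod.version           # 1

from demo_mod import version       # from: fetch one particular name
copied = version                   # 1

write("version = 2   # edited\n")
reload(demo_mod)                   # reload: rerun the file's code without stopping Python
second = demo_mod.version          # 2
```

The details of each statement are covered in the coming chapters; the point here is only that all three operate on a module object that lives on after the import.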

We introduced module fundamentals in Chapter 3, and have been using them ever since. Part V begins by expanding on core module concepts, and then moves on to explore more advanced module usage. This first chapter begins with a general look at the role of modules in overall program structure. In the next and following chapters, we’ll dig into the coding details behind the theory.

Along the way, we’ll flesh out module details we’ve omitted so far: reloads, the __name__ and __all__ attributes, package imports, and so on. Because modules and classes are really just glorified namespaces, we formalize namespace concepts here as well.

Why Use Modules?

Modules provide an easy way to organize components into a system, by serving as packages of names. From an abstract perspective, modules have at least three roles:

Code reuse

As we saw in Chapter 3, modules let us save code in files permanently. Unlike code you type at the Python interactive prompt, which goes away when you exit Python, code in module files is persistent—it can be reloaded and rerun as many times as needed. More to the point, modules are a place to define names, or attributes, that may be referenced by external clients.

System namespace partitioning

Modules are also the highest-level program organization unit in Python. Fundamentally, they are just packages of names. Modules seal up names into self-contained packages that avoid name clashes—you can never see a name in another file, unless you explicitly import it. In fact, everything “lives” in a module: code you execute and objects you create are always implicitly enclosed by a module. Because of that, modules are a natural tool for grouping system components.

Implementing shared services or data

From a functional perspective, modules also come in handy for implementing components that are shared across a system, and hence only require a single copy. For instance, if you need to provide a global object that’s used by more than one function or file, you can code it in a module that’s imported by many clients.
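To illustrate the shared-data role, the following sketch generates three small files in a temporary directory and imports them. The names shared_cfg, writer_mod, and reader_mod are hypothetical, chosen only for this illustration. Because both client modules import the same shared_cfg module, a change made through one is visible through the other.

```python
import os, sys, tempfile

workdir = tempfile.mkdtemp()           # hypothetical program directory

def write(name, text):
    f = open(os.path.join(workdir, name), 'w')
    f.write(text)
    f.close()

# shared_cfg.py holds the single shared object
write('shared_cfg.py', "options = {'verbose': False}\n")

# Two clients import the same module
write('writer_mod.py',
      "import shared_cfg\n"
      "def enable():\n"
      "    shared_cfg.options['verbose'] = True\n")
write('reader_mod.py',
      "import shared_cfg\n"
      "def is_verbose():\n"
      "    return shared_cfg.options['verbose']\n")

sys.path.insert(0, workdir)
import writer_mod, reader_mod

before = reader_mod.is_verbose()       # False
writer_mod.enable()                    # change made through one client...
after = reader_mod.is_verbose()        # ...is seen by the other: True
```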

To truly understand the role of modules in a Python system, though, we need to digress for a moment and explore the general structure of a Python program.

Python Program Architecture

So far in this book, we’ve sugar-coated some of the complexity in our descriptions of Python programs. In practice, programs usually are more than just one file; for all but the simplest scripts, your programs will take the form of multifile systems. And even if you can get by with coding a single file yourself, you will almost certainly wind up using external files that someone else has already written.

This section introduces the general architecture of Python programs—the way you divide a program into a collection of source files (a.k.a. modules), and link the parts into a whole. Along the way, we also define the central concepts of Python modules, imports, and object attributes.

How to Structure a Program

Generally, a Python program consists of multiple text files containing Python statements. The program is structured as one main, top-level file, along with zero or more supplemental files known as modules in Python.

In a Python program, the top-level file contains the main flow of control of your program—the file you run to launch your application. The module files are libraries of tools, used to collect components used by the top-level file, and possibly elsewhere. Top-level files use tools defined in module files, and modules use tools defined in other modules. In Python, a file imports a module to gain access to the tools it defines. And the tools defined by a module are known as its attributes—variable names attached to objects such as functions. Ultimately, we import modules, and access their attributes to use their tools.

Imports and Attributes

Let’s make this a bit more concrete. Figure 15-1 sketches the structure of a Python program composed of three files: a.py, b.py, and c.py. The file a.py is chosen to be the top-level file; it will be a simple text file of statements, which is executed from top to bottom when launched. Files b.py and c.py are modules; they are simple text files of statements as well, but are usually not launched directly. Rather, modules are normally imported by other files that wish to use the tools they define.

Figure 15-1. Program architecture

For instance, suppose file b.py in Figure 15-1 defines a function called spam, for external use. As we learned in Part IV, b.py would contain a Python def statement to generate the function, which is later run by passing zero or more values in parentheses after the function’s name:

def spam(text):
    print text, 'spam'

Now, if a.py wants to use spam, it might contain Python statements such as the following:

import b
b.spam('gumby')

The first of these two, a Python import statement, gives file a.py access to everything defined in file b.py. It roughly means: “load file b.py (unless it’s already loaded), and give me access to all its attributes through name b.” import (and as you’ll see later, from) statements execute and load another file at runtime. In Python, cross-file module linking is not resolved until such import statements are executed.

The second of these statements calls the function spam defined in module b using object attribute notation. The code b.spam means: “fetch the value of name spam that lives within object b.” This happens to be a callable function in our example, so we pass a string in parentheses ('gumby'). If you actually type these files and run a.py, the words “gumby spam” are printed.

More generally, you’ll see the notation object.attribute throughout Python scripts—most objects have useful attributes that are fetched with the “.” operator. Some are callable things like functions, and others are simple data values that give object properties (e.g., a person’s name).
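As a small illustration of both kinds of attributes, the sketch below uses the standard library math module (our choice for the example; any module would do). One attribute is a simple data value, and the other is a callable function.

```python
import math

value = math.pi            # a data attribute: fetching it yields a number
root = math.sqrt(16.0)     # a callable attribute: fetch it, then call it

print(value)
print(root)
```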

The notion of importing is also general throughout Python. Any file can import tools from any other file. For instance, file a.py may import b.py to call its function, but b.py might also import c.py in order to leverage different tools defined there. Import chains can go as deep as you like: in this example, module a can import b, which can import c, which can import b again, and so on.

Besides serving as the highest organizational structure, modules (and module packages, described in Chapter 17) are also the highest level of code reuse in Python. When you code components in module files, they become useful both in the original program and in any other program you may write. For instance, if after coding the program in Figure 15-1 we discover that function b.spam is a general-purpose tool, we can reuse it in a completely different program: simply import file b.py again from the other program’s files.

Standard Library Modules

Notice the rightmost portion of Figure 15-1. Some of the modules that your programs will import are provided by Python itself, not files you will code. Python automatically comes with a large collection of utility modules known as the Standard Library.

This collection, roughly 200 modules large at last count, contains platform-independent support for common programming tasks: operating system interfaces, object persistence, text pattern matching, network and Internet scripting, GUI construction, and much more. None of these tools is part of the Python language itself, but you can use them by importing the appropriate modules on any standard Python installation.

In this book, you will meet a few of the standard library modules in action in the examples, but for a complete look, you should browse the standard Python Library Reference Manual, available either with your Python installation (it appears in IDLE and in your Python Start button entry on Windows), or online at http://www.python.org.

Because there are so many modules, this is really the only way to get a feel for what tools are available. You can also find Python library materials in commercial books, but the manuals are free, viewable in any web browser (they ship in HTML format), and updated each time Python is re-released.

How Imports Work

The prior section talked about importing modules, without really explaining what happens when you do so. Since imports are at the heart of program structure in Python, this section goes into more detail on the import operation to make this process less abstract.

Some C programmers like to compare the Python module import operation to a C #include, but they really shouldn’t—in Python, imports are not just textual insertions of one file into another. They are really runtime operations that perform three distinct steps the first time a file is imported by a program:

  1. Find the module’s file.

  2. Compile it to byte-code (if needed).

  3. Run the module’s code to build the objects it defines.

All three of these steps are only run the first time a module is imported during a program’s execution; later imports of the same module bypass all of these and simply fetch the already-loaded module object in memory. To better understand module imports, let’s explore each of these steps in turn.

1. Find It

First off, Python must locate the module file referenced by your import statement. Notice the import statement in the prior section’s example names the file without a .py suffix and without its directory path. It says just import b, instead of something like import c:\dir1\b.py. Import statements omit path and suffix details like this on purpose; you can only list a simple name.[1] Instead, Python uses a standard module search path to locate the module file corresponding to an import statement.

The module search path

In many cases, you can rely on the automatic nature of the module import search path and need not configure this path at all. If you want to be able to import files across user-defined directory boundaries, though, you will need to know how the search path works, in order to customize it. Roughly, Python’s module search path is automatically composed as the concatenation of these major components:

  1. The home directory of the top-level file.

  2. PYTHONPATH directories (if set).

  3. Standard library directories.

  4. The contents of any .pth files (if present).

The first and third of these are defined automatically. Because Python searches the concatenation of these from first to last, the second and fourth can be used to extend the module search path to include your own directories. Here is how Python uses each of these path components:

Home directory

Python first looks for the imported file in the home directory. Depending on how you are launching code, this is either the directory containing your program’s top-level file, or the directory in which you are working interactively. Because this is always searched first, if a program is located entirely in a single directory, all its imports will work automatically, with no path configuration required.

PYTHONPATH directories

Next, Python searches all directories listed in your PYTHONPATH environment variable setting, from left to right (assuming you have set this at all). In brief, PYTHONPATH is simply set to a list of user-defined and platform-specific names of directories that contain Python code files. Add all the directories that you wish to be able to import from; Python uses your setting to extend the module search path.

Because Python searches the home directory first, you only need to make this setting to import files across directory boundaries—that is, to import a file that is stored in a different directory than the file that imports it. In practice, you probably will make this setting once you start writing substantial programs. When you are first starting out, though, if you save all your module files in the directory that you are working in (i.e., the home directory), your imports will work without making this setting at all.
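To see the effect of a PYTHONPATH setting without touching your own shell configuration, the following sketch launches a second interpreter with the variable set and asks it whether the directory made it onto the search path. The temporary directory is a hypothetical stand-in for a directory of your own module files, and the subprocess approach is just one convenient way to try this.

```python
import os, subprocess, sys, tempfile

extra_dir = tempfile.mkdtemp()        # hypothetical directory of module files

env = dict(os.environ)
env['PYTHONPATH'] = extra_dir         # what a shell setting would normally do

# Run a fresh interpreter and check its search path for our directory
code = "import sys; print(sys.argv[1] in sys.path)"
output = subprocess.check_output([sys.executable, '-c', code, extra_dir], env=env)
found = output.decode().strip()       # 'True' if the setting was picked up

print(found)
```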

Standard library directories

Next, Python will automatically search the directories where the standard library modules are installed on your machine. Because these are always searched, they normally do not need to be added to your PYTHONPATH.

.pth file directories

Finally, a relatively new feature of Python allows users to add valid directories to the module search path by simply listing them, one per line, in a text file whose name ends in a .pth suffix (for “path”). These path configuration files are a somewhat advanced installation-related feature, which we will not discuss fully here.

In short, a text file of directory names, dropped in an appropriate directory, can serve roughly the same role as the PYTHONPATH environment variable setting. For instance, a file named myconfig.pth may be placed at the top level of the Python install directory on Windows (e.g., in C:\Python22) to extend the module search path. Python will add the directories listed on each line of the file, from first to last, near the end of the module search path list. Because they are based on files instead of shell settings, path files can also apply to all users of an installation, instead of just one user or shell.

This feature is more sophisticated than we will describe here. We recommend that beginners use either PYTHONPATH or a single .pth file, and then only if you must import across directories. See the Python library manual for more details on this feature, especially its documentation for standard library module site.
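As a rough sketch of what a path file does, the code below writes a one-line myconfig.pth file into a temporary directory and asks the standard library site module to process it. Python normally reads path files by itself at startup; calling site.addsitedir by hand is our way of imitating that step for a directory of our choosing. The directory names are hypothetical.

```python
import os, site, sys, tempfile

config_dir = tempfile.mkdtemp()                    # stands in for the install directory
extra_dir = os.path.join(config_dir, 'mytools')    # hypothetical directory of code
os.mkdir(extra_dir)                                # only existing directories are added

# A path file is plain text: one directory name per line
f = open(os.path.join(config_dir, 'myconfig.pth'), 'w')
f.write(extra_dir + '\n')
f.close()

# Adds config_dir itself, then processes any .pth files found inside it
site.addsitedir(config_dir)

listed = extra_dir in sys.path                     # True: the listed directory was appended
print(listed)
```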

See also Appendix A for examples of common ways to extend your module search path with PYTHONPATH or .pth files on various platforms. Depending on your platform, additional directories may be automatically added to the module search path as well. In fact, this description of the module search path is accurate, but generic; the exact configuration of the search path is prone to change over both platforms and Python releases.

For instance, Python may add an entry for the current working directory—the directory from which you launched your program—in the search path, after the PYTHONPATH directories, and before standard library entries. When launching from a command line, the current working directory may not be the same as the home directory of your top-level file—the directory where your program file resides. (See Chapter 3 for more on command lines.) Since the current working directory can vary each time your program runs, you normally shouldn’t depend on its value for import purposes.
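The difference between the two directories is easy to check for yourself. This short sketch assumes that sys.argv[0] names the top-level script, which is the usual case when a file is launched from a command line; run it from a directory other than the one holding the script to see the two values differ.

```python
import os, sys

cwd = os.getcwd()                                       # where you launched the program
home = os.path.dirname(os.path.abspath(sys.argv[0]))    # where the top-level file resides

print('working directory: %s' % cwd)
print('home directory:    %s' % home)
print('same place?        %s' % (cwd == home))
```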

The sys.path list

If you want to see how the path is truly configured on your machine, you can always inspect the module search path as it is known to Python, by printing the built-in sys.path list (that is, attribute path, of built-in module sys). This Python list of directory name strings is the actual search path; on imports, Python searches each directory on this list, from left to right.

Really, sys.path is the module search path. It is configured by Python at program startup, using the four path components just described. Python automatically merges any PYTHONPATH and .pth file path settings you’ve made into this list, and always sets the first entry to identify the home directory of the top-level file, possibly as an empty string.
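A minimal way to inspect the path on your own machine follows. The output differs from one installation to the next, so none is shown here.

```python
import sys

for directory in sys.path:       # directories are searched in this order, left to right
    print(repr(directory))       # repr makes an empty-string entry visible

first_entry = sys.path[0]        # normally the home directory of the top-level file
```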

Python exposes this list for two good reasons. First of all, it provides a way to verify the search path settings you’ve made—if you don’t see your settings somewhere on this list, you need to recheck your work. Secondly, if you know what you’re doing, this list also provides a way for scripts to tailor their search paths manually. As you’ll see later in this part, by modifying the sys.path list, you can modify the search path for all future imports. Such changes only last for the duration of the script, however; PYTHONPATH and .pth files are more permanent ways to modify the path.[2]
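Here is a small sketch of a script tailoring its own search path. The module name pathdemo_tools and the temporary directory are hypothetical; the directory plays the part of a source directory that is not on the search path when the script starts.

```python
import os, sys, tempfile

tools_dir = tempfile.mkdtemp()                 # hypothetical directory outside the search path
f = open(os.path.join(tools_dir, 'pathdemo_tools.py'), 'w')
f.write("greeting = 'hello from pathdemo_tools'\n")
f.close()

try:
    import pathdemo_tools                      # fails: its directory is not on sys.path yet
    found_before = True
except ImportError:
    found_before = False

sys.path.append(tools_dir)                     # extends the path for this run only
import pathdemo_tools                          # now the search succeeds

print(pathdemo_tools.greeting)
```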

Module file selection

Keep in mind that filename suffixes (e.g., .py) are omitted in import statements, intentionally. Python chooses the first file it can find on the search path that matches the imported name. For example, an import statement of the form import b might load:

  • Source file b.py

  • Byte-code file b.pyc

  • A directory named b, for package imports

  • A C extension module (e.g., b.so on Linux)

  • An in-memory image, for frozen executables

  • A Java class, in the Jython system

  • A zip file component, using the zipimport module

Some standard library modules are actually coded in C. C extensions, Jython, and package imports all extend imports beyond simple files. To importers, though, the difference in loaded file type is completely transparent, both when importing and fetching module attributes. Saying import b gets whatever module b is, according to your module search path, and b.attr fetches an item in the module, be that a Python variable or a linked-in C function. Some standard modules we will use in this book, for example, are coded in C, not Python; their clients don’t have to care.
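If you are curious about where a module came from, many module objects record their origin in a __file__ attribute. The sketch below looks at two standard modules; what it prints depends on your installation, and modules built into the interpreter itself, such as sys, typically have no __file__ at all, which is why the code supplies a default.

```python
import os, sys

for module in (os, sys):
    # Fall back to a label when the module has no file of its own
    origin = getattr(module, '__file__', '(built in)')
    print('%s -> %s' % (module.__name__, origin))

# Either way, attributes are fetched with exactly the same notation
separator = os.sep
platform = sys.platform
```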

If you have both a b.py and a b.so in different directories, Python will always load the one found in the first (leftmost) directory of your module search path, during the left-to-right search of sys.path. But what happens if there is both a b.py and a b.so in the same directory? Python follows a standard picking order, but it is not guaranteed to stay the same over time. In general, you should not depend on which type of file Python will choose within a given directory; make your module names distinct, or use module search path configuration to make module selection more obvious.

It is also possible to redefine much of what an import operation does in Python, with what are known as import hooks. These hooks can be used to make imports do useful things such as load files from zip archives, perform decryption, and so on (in fact, Python 2.3 includes a zipimport standard module, which allows files to be directly imported from zip archives). Normally, though, imports work as described in this section.

Python also supports the notion of .pyo optimized byte-code files, created and run with the -O Python command-line flag; because these run only slightly faster than normal .pyc files (typically 5% faster), they are infrequently used. The Psyco system (see Chapter 2) provides more substantial speedups.

2. Compile It (Maybe)

After finding a source code file that matches an import statement according to the module search path, Python next compiles it to byte code, if necessary. (We discussed byte code in Chapter 2.)

Python checks file timestamps and skips the source-to-byte-code compile step if it finds a .pyc byte code file that is not older than the corresponding .py source file. In addition, if Python finds only a byte code file on the search path and no source, it simply loads the byte code directly. In other words, the compile step is bypassed whenever possible, to speed program startup. If you change the source code, Python will automatically regenerate the byte code the next time your program is run. Moreover, you can ship a program as just byte code files, and avoid sending source.
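Imports run the compiler for you, but you can also trigger the same step by hand with the standard library py_compile module, which is a handy way to watch a .pyc file appear. In this sketch the file name bytedemo.py is hypothetical, and we ask explicitly for the byte code to be saved beside the source, because the default location can vary between Python versions.

```python
import os, py_compile, tempfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'bytedemo.py')    # hypothetical module file
f = open(source, 'w')
f.write("x = 1\n")
f.close()

compiled = source + 'c'                          # bytedemo.pyc, next to the source
py_compile.compile(source, cfile=compiled, doraise=True)

saved = os.path.exists(compiled)                 # True: the byte code file now exists
print(saved)
```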

Notice that compilation happens when a file is being imported. Because of this, you will not usually see a .pyc byte code file for the top-level file of your program, unless it is also imported elsewhere—only imported files leave behind a .pyc on your machine. The byte code of top-level files is used internally and discarded; byte-code of imported files is saved in files to speed future imports.

Top-level files are often designed to be executed directly and not imported at all. Later, we’ll see that it is possible to design a file that serves both as the top-level code of a program, and a module of tools to be imported. Such files may be both executed and imported, and thus generate a .pyc. To learn how, watch for the discussion of the special __name__ attribute and "__main__" in Chapter 18.

3. Run It

The final step of an import operation executes the byte code of the module. All statements in the file execute in turn, from top to bottom, and any assignments made to names during this step generate attributes of the resulting module object. This execution step generates all the tools that the module’s code defines. For instance, def statements in a file are run at import time to create functions, and assign attributes within the module to those functions. The functions are called later in the program by importers.

Because this last import step actually runs the file’s code, if any top-level code in a module file does real work, you’ll see its results at import time. For example, top-level print statements in a module show output when the file is imported. Function def statements simply define objects for later use.
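The compile and run steps can be imitated by hand, which makes it easier to see why top-level assignments become module attributes. The following is only a sketch of the idea, not a description of how Python implements imports internally: it builds an empty module object, compiles a string of source code, and runs that code in the module’s namespace. The module name rundemo is hypothetical.

```python
import types

source = (
    "print('running module code')\n"    # top-level statement: runs immediately
    "count = 3\n"                       # assignment: becomes attribute count
    "def double(x):\n"                  # def: creates a function, assigns name double
    "    return x * 2\n"
)

module = types.ModuleType('rundemo')            # an empty module object
code = compile(source, 'rundemo.py', 'exec')    # step 2: source text to byte code
exec(code, module.__dict__)                     # step 3: run it in the module's namespace

result = module.double(module.count)            # 6: the function is called later, by us
print(result)
```

Notice that the message prints when the code runs, while the body of double does nothing until it is called.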

As you can see, import operations involve quite a bit of work—they search for files, possibly run a compiler, and run Python code. A given module is only imported once per process by default. Future imports skip all three import steps, and reuse the already-loaded module in memory.[3]
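To watch the run-once behavior in action, this sketch creates a tiny module whose top-level code prints a message and builds a fresh object each time it runs. The module name oncedemo and the temporary directory are hypothetical. The message shows up on the first import only, and the object is unchanged after the second import, so the file’s code did not run again.

```python
import os, sys, tempfile

workdir = tempfile.mkdtemp()
f = open(os.path.join(workdir, 'oncedemo.py'), 'w')
f.write("print('oncedemo is running')\n"
        "token = object()\n")                   # a new object every time this code runs
f.close()
sys.path.insert(0, workdir)

import oncedemo                                 # first import: find, compile, run
first_token = oncedemo.token

import oncedemo                                 # later import: reuses the loaded module
same_token = oncedemo.token is first_token      # True: the code was not rerun

loaded = 'oncedemo' in sys.modules              # True: listed in the loaded-modules table
same_module = sys.modules['oncedemo'] is oncedemo
```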

As you can also see, the import operation is at the heart of program architecture in Python. Larger programs are divided into multiple files, which are linked together at runtime by imports. Imports in turn use module search paths to locate your files, and modules define attributes for external use.

Of course, the whole point of imports and modules is to provide a structure to your program, which divides its logic into self-contained software components. Code in one module is isolated from code in another; in fact, no file can ever see the names defined in another, unless explicit import statements are run. To see what this all means in terms of actual code, let’s move on to Chapter 16.



[1] In fact, it’s syntactically illegal to include path and suffix detail in an import. In Chapter 17, we’ll meet package imports, which allow import statements to include part of the directory path leading to a file, as a set of period-separated names. However, package imports still rely on the normal module search path, to locate the leftmost directory in a package path. They also cannot make use of any platform-specific directory syntax in the import statement; such syntax only works on the search path. Also note that module file search path issues are not as relevant when you run frozen executables (discussed in Chapter 2); they typically embed byte code in the binary image.

[2] Some programs really need to change sys.path, though. Scripts that run on web servers, for example, usually run as user “nobody” to limit machine access. Because such scripts cannot usually depend on “nobody” to have set PYTHONPATH in any particular way, they often set sys.path manually to include required source directories, prior to running any import statements.

[3] Technically, Python keeps already-loaded modules in the built-in sys.modules dictionary, and checks that at the start of an import operation to know if the module is already loaded. If you want to see which modules are loaded, import sys, and print sys.modules.keys( ). More on this internal table in Chapter 18.
