Chapter 18. Modules: The Big Picture

This chapter begins our in-depth look at the Python module, the highest-level program organization unit, which packages program code and data for reuse. In concrete terms, modules usually correspond to Python program files (or extensions coded in external languages such as C, Java, or C#). Each file is a module, and modules import other modules to use the names they define. Modules are processed with two statements, and one important built-in function:

import

Lets a client (importer) fetch a module as a whole

from

Allows clients to fetch particular names from a module

reload

Provides a way to reload a module’s code without stopping Python
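As a rough sketch, here is what the first two forms look like in code, using the standard math module. It is written for Python 3, unlike this chapter's Python 2 examples. In Python 3, reload is no longer a built-in and is fetched from the importlib module instead.

```python
# Python 3 sketch: the import and from statements.
import importlib
import math                   # fetch the module as a whole
from math import sqrt         # copy one name out of the module

print(math.sqrt(16))          # 4.0: qualified module.attribute access
print(sqrt(16))               # 4.0: unqualified; from assigned the name here
print(sqrt is math.sqrt)      # True: both names refer to one function object

# reload is not a built-in in Python 3; it lives in importlib.
print(callable(importlib.reload))   # True
```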

Chapter 3 introduced module fundamentals, and we’ve been using them ever since. Part V begins by expanding on core module concepts, then moves on to explore more advanced module usage. This first chapter offers a general look at the role of modules in overall program structure. In the next and following chapters, we’ll dig into the coding details behind the theory.

Along the way, we’ll flesh out module details omitted so far: you’ll learn about reloads, the __name__ and __all__ attributes, package imports, and so on. Because modules and classes are really just glorified namespaces, we’ll formalize namespace concepts here as well.

Why Use Modules?

In short, modules provide an easy way to organize components into a system by serving as self-contained packages of variables known as namespaces. All the names defined at the top level of a module file become attributes of the imported module object. As we saw in the last part of this book, imports give access to names in a module’s global scope. That is, the module file’s global scope morphs into the module object’s attribute namespace when it is imported. Ultimately, Python’s modules allow us to link individual files into a larger program system.
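To make the idea of a global scope becoming an attribute namespace concrete, here is a minimal Python 3 sketch. It builds a module object by hand with types.ModuleType and runs some source code in that module's namespace with exec. A real import does more than this, and the module name demo is purely hypothetical.

```python
# Python 3 sketch: top-level names of module code become module attributes.
import types

demo = types.ModuleType("demo")      # hypothetical, empty module object

# Source code that would normally live in a file such as demo.py.
source = """
x = 42
def double(n):
    return n * 2
y = double(x)
"""

# Run the code with the module's __dict__ serving as its global scope.
exec(source, demo.__dict__)

print(demo.x, demo.y)                # 42 84
print(demo.double(5))                # 10
print(sorted(k for k in vars(demo) if not k.startswith("__")))  # ['double', 'x', 'y']
```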

More specifically, from an abstract perspective, modules have at least three roles:

Code reuse

As discussed in Chapter 3, modules let you save code in files permanently. Unlike code you type at the Python interactive prompt, which goes away when you exit Python, code in module files is persistent—it can be reloaded and rerun as many times as needed. More to the point, modules are a place to define names, known as attributes, that may be referenced by multiple external clients.

System namespace partitioning

Modules are also the highest-level program organization unit in Python. Fundamentally, they are just packages of names. Modules seal up names into self-contained packages, which helps avoid name clashes: you can never see a name in another file unless you explicitly import that file. In fact, everything “lives” in a module: the code you execute and the objects you create are always implicitly enclosed in modules. Because of that, modules are natural tools for grouping system components.

Implementing shared services or data

From an operational perspective, modules also come in handy for implementing components that are shared across a system, and hence require only a single copy. For instance, if you need to provide a global object that’s used by more than one function or file, you can code it in a module that can then be imported by many clients.
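A minimal Python 3 sketch of the shared-data role follows. It assumes a hypothetical module named settings_demo written to a temporary directory. Two separate functions import it; because both receive the same module object, a change made through one is visible through the other.

```python
# Python 3 sketch: one module object shared by several importers.
import importlib
import os
import sys
import tempfile

# Create a throwaway module file (left in place for simplicity).
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "settings_demo.py"), "w") as f:
    f.write("counter = 0\n")

sys.path.insert(0, tmpdir)          # make the directory importable
importlib.invalidate_caches()

def client_one():
    import settings_demo             # first import: loads and runs the file
    settings_demo.counter += 1

def client_two():
    import settings_demo             # later import: same module object
    settings_demo.counter += 1
    return settings_demo.counter

client_one()
print(client_two())                 # 2: both clients changed one shared object
```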

For you to truly understand the role of modules in a Python system, though, we need to digress for a moment, and explore the general structure of a Python program.

Python Program Architecture

So far in this book, I’ve sugarcoated some of the complexity in my descriptions of Python programs. In practice, programs usually involve more than just one file; for all but the simplest scripts, your programs will take the form of multifile systems. And even if you can get by with coding a single file yourself, you will almost certainly wind up using external files that someone else has already written.

This section introduces the general architecture of Python programs—the way you divide a program into a collection of source files (a.k.a. modules) and link the parts into a whole. Along the way, we’ll also explore the central concepts of Python modules, imports, and object attributes.

How to Structure a Program

Generally, a Python program consists of multiple text files containing Python statements. The program is structured as one main, top-level file, along with zero or more supplemental files known as modules in Python.

In Python, the top-level file contains the main flow of control of your program; this is the file you run to launch your application. The module files are libraries of tools that collect components used by the top-level file (and possibly elsewhere). Top-level files use tools defined in module files, and modules use tools defined in other modules.

Module files generally don’t do anything when run directly; rather, they define tools intended for use in other files. In Python, a file imports a module to gain access to the tools it defines, which are known as its attributes (i.e., variable names attached to objects such as functions). Ultimately, we import modules and access their attributes to use their tools.

Imports and Attributes

Let’s make this a bit more concrete. Figure 18-1 sketches the structure of a Python program composed of three files: a.py, b.py, and c.py. The file a.py is chosen to be the top-level file; it will be a simple text file of statements, which is executed from top to bottom when launched. The files b.py and c.py are modules; they are simple text files of statements as well, but they are not usually launched directly. Instead, as explained previously, modules are normally imported by other files that wish to use the tools they define.

Figure 18-1. Program architecture in Python. A program is a system of modules. It has one top-level script file (launched to run the program), and multiple module files (imported libraries of tools). Scripts and modules are both text files containing Python statements, though the statements in modules usually just create objects to be used later. Python’s standard library provides a collection of precoded modules.

For instance, suppose the file b.py in Figure 18-1 defines a function called spam, for external use. As we learned in Part IV, b.py will contain a Python def statement to generate the function, which can later be run by passing zero or more values in parentheses after the function’s name:


def spam(text):
    print text, 'spam'

Now, suppose a.py wants to use spam. To this end, it might contain Python statements such as the following:


import b
b.spam('gumby')

The first of these, a Python import statement, gives the file a.py access to everything defined by top-level code in the file b.py. It roughly means “load the file b.py (unless it’s already loaded), and give me access to all its attributes through the name b.” import (and, as you’ll see later, from) statements execute and load other files at runtime.

In Python, cross-file module linking is not resolved until such import statements are executed at runtime; their net effect is to assign module names—simple variables—to loaded module objects. In fact, the module name used in an import statement serves two purposes: it identifies the external file to be loaded, but it also becomes a variable assigned to the loaded module. Objects defined by a module are also created at runtime, as the import is executing: import literally runs statements in the target file one at a time to create its contents.

The second of the statements in a.py calls the function spam defined in the module b, using object attribute notation. The code b.spam means “fetch the value of the name spam that lives within the object b.” This happens to be a callable function in our example, so we pass a string in parentheses ('gumby'). If you actually type these files, save them, and run a.py, the words “gumby spam” will be printed.
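The following Python 3 sketch recreates a.py and b.py in a temporary directory and launches a.py as a separate process, so the result can be checked without saving files by hand. Note that this version of b.py uses the Python 3 print function rather than the print statement shown above.

```python
# Python 3 sketch: the two-file program from Figure 18-1.
import os
import subprocess
import sys
import tempfile

files = {
    "b.py": "def spam(text):\n    print(text, 'spam')\n",
    "a.py": "import b\nb.spam('gumby')\n",
}

with tempfile.TemporaryDirectory() as home:
    for name, code in files.items():
        with open(os.path.join(home, name), "w") as f:
            f.write(code)

    # Launch a.py as the top-level file; its home directory is searched
    # first, so "import b" finds b.py with no path configuration.
    result = subprocess.run(
        [sys.executable, os.path.join(home, "a.py")],
        capture_output=True, text=True, check=True,
    )

print(result.stdout.strip())        # gumby spam
```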

You’ll see the object.attribute notation used throughout Python scripts—most objects have useful attributes that are fetched with the “.” operator. Some are callable things like functions, and others are simple data values that give object properties (e.g., a person’s name).

The notion of importing is also completely general throughout Python. Any file can import tools from any other file. For instance, the file a.py may import b.py to call its function, but b.py might also import c.py to leverage different tools defined there. Import chains can go as deep as you like: in this example, the module a can import b, which can import c, which can import b again, and so on.

Besides serving as the highest organizational structure, modules (and module packages, described in Chapter 20) are also the highest level of code reuse in Python. Coding components in module files makes them useful in your original program, and in any other programs you may write. For instance, if after coding the program in Figure 18-1 we discover that the function b.spam is a general-purpose tool, we can reuse it in a completely different program; all we have to do is import the file b.py again from the other program’s files.

Standard Library Modules

Notice the rightmost portion of Figure 18-1. Some of the modules that your programs will import are provided by Python itself, and are not files you will code.

Python automatically comes with a large collection of utility modules known as the standard library. This collection, roughly 200 modules large at last count, contains platform-independent support for common programming tasks: operating system interfaces, object persistence, text pattern matching, network and Internet scripting, GUI construction, and much more. None of these tools are part of the Python language itself, but you can use them by importing the appropriate modules on any standard Python installation. Because they are standard library modules, you can also be reasonably sure that they will be available, and will work portably on most platforms on which you will run Python.

You will see a few of the standard library modules in action in this book’s examples, but for a complete look, you should browse the standard Python library reference manual, available either with your Python installation (via IDLE or the Python Start button menu on Windows), or online at http://www.python.org.

Because there are so many modules, this is really the only way to get a feel for what tools are available. You can also find tutorials on Python library tools in commercial books that cover application-level programming, such as Programming Python, but the manuals are free, viewable in any web browser (they ship in HTML format), and updated each time Python is rereleased.

How Imports Work

The prior section talked about importing modules without really explaining what happens when you do so. Because imports are at the heart of program structure in Python, this section goes into more detail on the import operation to make this process less abstract.

Some C programmers like to compare the Python module import operation to a C #include, but they really shouldn’t—in Python, imports are not just textual insertions of one file into another. They are really runtime operations that perform three distinct steps the first time a program imports a given file:

  1. Find the module’s file.

  2. Compile it to byte code (if needed).

  3. Run the module’s code to build the objects it defines.

To better understand module imports, we’ll explore these steps in turn. Bear in mind that all three of these steps are carried out only the first time a module is imported during a program’s execution; later imports of the same module bypass all of these steps, and simply fetch the already loaded module object in memory.
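To see the steps separately, here is a Python 3 sketch using the importlib.util module. It assumes a hypothetical module file named steps_demo.py and, unlike a normal import, hands Python the file's location directly instead of searching for it. The final import statement then shows the already loaded module being reused.

```python
# Python 3 sketch: the import steps driven by hand with importlib.
import importlib.util
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "steps_demo.py")          # hypothetical module file
    with open(path, "w") as f:
        f.write("message = 'built at import time'\n")

    # 1. Find it: here the location is given explicitly (no search path).
    spec = importlib.util.spec_from_file_location("steps_demo", path)
    module = importlib.util.module_from_spec(spec)  # empty module object
    sys.modules["steps_demo"] = module              # register, as import does
    # 2 and 3. Compile (if needed) and run the code in the module's namespace.
    spec.loader.exec_module(module)

import steps_demo                   # found in sys.modules: no search, no rerun
print(steps_demo is module, steps_demo.message)    # True built at import time
```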

1. Find It

First off, Python must locate the module file referenced by an import statement. Notice that the import statement in the prior section’s example names the file without a .py suffix and without its directory path: it just says import b, instead of something like import c:\dir1\b.py. In fact, you can only list a simple name; path and suffix details are omitted on purpose, as Python uses a standard module search path to locate the module file corresponding to an import statement.[45] Because this is the main part of the import operation that programmers must know about, let’s study this step in more detail.
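A quick Python 3 sketch using the built-in compile function shows the difference. A path with a drive letter and suffix fails to compile at all, while a simple name is accepted and left to the search path. The dotted form b.py also compiles, but only because it is read as a package import (module py inside package b), not as a file suffix.

```python
# Python 3 sketch: only module names, not file paths, are legal in imports.
def is_valid_import(statement):
    """Return True if the statement compiles, False on SyntaxError."""
    try:
        compile(statement, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(is_valid_import("import b"))                 # True
print(is_valid_import(r"import c:\dir1\b.py"))     # False
print(is_valid_import("import b.py"))              # True, but as a package path
```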

The module search path

In many cases, you can rely on the automatic nature of the module import search path, and need not configure this path at all. If you want to be able to import files across user-defined directory boundaries, though, you will need to know how the search path works in order to customize it. Roughly, Python’s module search path is composed of the concatenation of these major components, some of which are preset for you, and some of which you can tailor to tell Python where to look:

  1. The home directory of the program.

  2. PYTHONPATH directories (if set).

  3. Standard library directories.

  4. The contents of any .pth files (if present).

Ultimately, the concatenation of these four components becomes sys.path, a list of directory name strings that I’ll expand upon in the next section. The first and third elements of the search path are defined automatically, but because Python searches the concatenation of these components from first to last, the second and fourth elements can be used to extend the path to include your own source code directories. Here is how Python uses each of these path components:

Home directory

Python first looks for the imported file in the home directory. Depending on how you are launching code, this is either the directory containing your program’s top-level file, or the directory in which you are working interactively. Because this directory is always searched first, if a program is located entirely in a single directory, all of its imports will work automatically with no path configuration required.

PYTHONPATH directories

Next, Python searches all directories listed in your PYTHONPATH environment variable setting, from left to right (assuming you have set this at all). In brief, PYTHONPATH is simply set to a list of user-defined and platform-specific names of directories that contain Python code files. You can add all the directories from which you wish to be able to import, and Python will use your setting to extend the module search path.

Because Python searches the home directory first, this setting is only important when importing files across directory boundaries—that is, if you need to import a file that is stored in a different directory from the file that imports it. You’ll probably want to set your PYTHONPATH variable once you start writing substantial programs, but when you’re first starting out, as long as you save all your module files in the directory in which you’re working (i.e., the home directory), your imports will work without you needing to worry about this setting at all.
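One way to watch PYTHONPATH take effect is to start a child Python 3 process with the variable set and have it report its sys.path. In this sketch, a temporary directory stands in for one of your own code directories; any existing PYTHONPATH setting is kept after it.

```python
# Python 3 sketch: PYTHONPATH entries are merged into a new process's sys.path.
import json
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as extra:      # stands in for your code dir
    env = dict(os.environ)
    env["PYTHONPATH"] = os.pathsep.join(
        p for p in (extra, env.get("PYTHONPATH")) if p
    )
    # Ask a fresh interpreter to report its module search path.
    out = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    child_path = json.loads(out)

print(extra in child_path)          # True: the setting was merged into sys.path
```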

Standard library directories

Next, Python automatically searches the directories where the standard library modules are installed on your machine. Because these are always searched, they normally do not need to be added to your PYTHONPATH.

.pth file directories

Finally, a relatively new feature of Python allows users to add valid directories to the module search path by simply listing them, one per line, in a text file whose name ends with a .pth suffix (for “path”). These path configuration files are a somewhat advanced installation-related feature, and we will not discuss them fully here.

In short, a text file of directory names dropped in an appropriate directory can serve roughly the same role as the PYTHONPATH environment variable setting. For instance, a file named myconfig.pth may be placed at the top level of the Python install directory on Windows (e.g., in C:\Python25 or C:\Python25\Lib\site-packages) to extend the module search path. Python will add the directories listed on each line of the file, from first to last, near the end of the module search path list. Because they are files rather than shell settings, path files can apply to all users of an installation, instead of just one user or shell.
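The standard library's site module processes .pth files at startup, and its addsitedir function applies the same processing to any directory you name. This Python 3 sketch assumes a hypothetical site directory holding a myconfig.pth file that lists one subdirectory, mylib.

```python
# Python 3 sketch: a .pth file extending sys.path via site.addsitedir().
import os
import site
import sys
import tempfile

sitedir = tempfile.mkdtemp()                 # stands in for site-packages
libdir = os.path.join(sitedir, "mylib")      # hypothetical code directory
os.mkdir(libdir)

# One directory per line; relative entries are resolved against sitedir.
with open(os.path.join(sitedir, "myconfig.pth"), "w") as f:
    f.write("mylib\n")

# Adds sitedir to sys.path and processes any .pth files found in it.
site.addsitedir(sitedir)
print(libdir in sys.path)                    # True
```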

This feature is more sophisticated than I’ve described here. For more details, see the Python library manual (especially its documentation for the standard library module site). I recommend that beginners use PYTHONPATH or a single .pth file, and only when they must import across directories. See also Appendix A for examples of common ways to extend your module search path with PYTHONPATH or .pth files on various platforms.

This description of the module search path is accurate, but generic; the exact configuration of the search path is prone to changing across platforms and Python releases. Depending on your platform, additional directories may automatically be added to the module search path as well.

For instance, Python may add an entry for the current working directory—the directory from which you launched your program—in the search path after the PYTHONPATH directories, and before the standard library entries. When launching from a command line, the current working directory may not be the same as the home directory of your top-level file (i.e., the directory where your program file resides).[46] Because the current working directory can vary each time your program runs, you normally shouldn’t depend on its value for import purposes.[47]

The sys.path list

If you want to see how the module search path is truly configured on your machine, you can always inspect the path as Python knows it by printing the built-in sys.path list (that is, the path attribute of the standard library module sys). This list of directory name strings is the actual search path within Python; on imports, Python searches each directory in this list from left to right.

Really, sys.path is the module search path. Python configures it at program startup, automatically merging any PYTHONPATH and .pth file path settings you’ve made into the list, and setting the first entry to identify the home directory of the top-level file (possibly as an empty string).
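This Python 3 sketch writes a hypothetical script named where.py into one temporary directory, launches it from a different one, and compares sys.path[0] with the working directory. On a typical CPython setup, the first entry names the script's home directory, not the directory it was launched from.

```python
# Python 3 sketch: sys.path[0] is the script's home, not the working directory.
import json
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as home, tempfile.TemporaryDirectory() as launch:
    script = os.path.join(home, "where.py")          # hypothetical top-level file
    with open(script, "w") as f:
        f.write("import json, os, sys\n"
                "print(json.dumps([sys.path[0], os.getcwd()]))\n")
    # Run the script while the current working directory is elsewhere.
    out = subprocess.run([sys.executable, script], cwd=launch,
                         capture_output=True, text=True, check=True).stdout
    first_entry, working_dir = json.loads(out)
    # samefile() sidesteps symlink differences in temporary paths.
    first_is_home = os.path.samefile(first_entry, home)
    cwd_is_launch = os.path.samefile(working_dir, launch)

print(first_is_home, cwd_is_launch)  # True True
```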

Python exposes this list for two good reasons. First, it provides a way to verify the search path settings you’ve made—if you don’t see your settings somewhere in this list, you need to recheck your work. Second, if you know what you’re doing, this list also provides a way for scripts to tailor their search paths manually. As you’ll see later in this part of the book, by modifying the sys.path list, you can modify the search path for all future imports. Such changes only last for the duration of the script, however; PYTHONPATH and .pth files offer more permanent ways to modify the path.[48]
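As a sketch of manual tailoring in Python 3, the following appends a directory to sys.path at runtime so that a module stored outside the home directory becomes importable. The module name faraway_tools is hypothetical, and the change disappears when the process exits.

```python
# Python 3 sketch: extending the search path for the rest of this process.
import importlib
import os
import sys
import tempfile

tooldir = tempfile.mkdtemp()          # a directory outside the home directory
with open(os.path.join(tooldir, "faraway_tools.py"), "w") as f:
    f.write("def hello():\n    return 'hello from ' + __name__\n")

try:
    import faraway_tools
except ImportError:
    print("not on the search path yet")

sys.path.append(tooldir)              # lasts only for this process
importlib.invalidate_caches()         # let finders notice new directories/files
import faraway_tools
print(faraway_tools.hello())          # hello from faraway_tools
```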

Module file selection

Keep in mind that filename suffixes (e.g., .py) are intentionally omitted from import statements. Python chooses the first file it can find on the search path that matches the imported name. For example, an import statement of the form import b might load:

  • A source code file named b.py.

  • A byte code file named b.pyc.

  • A directory named b, for package imports (described in Chapter 20).

  • A compiled extension module, usually coded in C or C++, and dynamically linked when imported (e.g., b.so on Linux, or b.dll or b.pyd on Cygwin and Windows).

  • A compiled built-in module coded in C and statically linked into Python.

  • A ZIP file component that is automatically extracted when imported.

  • An in-memory image, for frozen executables.

  • A Java class, in the Jython version of Python.

  • A .NET component, in the IronPython version of Python.
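In Python 3, the importlib module can report which suffixes the standard file-based finder tries, and where a given import would come from, without running the module. This sketch inspects json (a package directory) and sys (compiled into the interpreter); the printed suffixes and paths vary by platform and Python version.

```python
# Python 3 sketch: inspecting import file types and module origins.
import importlib.machinery
import importlib.util

print(importlib.machinery.SOURCE_SUFFIXES)      # typically ['.py'] ('.pyw' too on Windows)
print(importlib.machinery.BYTECODE_SUFFIXES)    # ['.pyc']
print(importlib.machinery.EXTENSION_SUFFIXES)   # platform-specific, e.g. '.so' or '.pyd'

for name in ("json", "sys"):
    spec = importlib.util.find_spec(name)
    print(name, "->", spec.origin)
# json is a package directory (origin is its __init__.py file);
# sys is built into CPython (origin 'built-in').
```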

C extensions, Jython, and package imports all extend imports beyond simple files. To importers, though, differences in the loaded file type are completely transparent, both when importing and when fetching module attributes. Saying import b gets whatever module b is, according to your module search path, and b.attr fetches an item in the module, be it a Python variable or a linked-in C function. Some standard modules we will use in this book are actually coded in C, not Python; because of this transparency, their clients don’t have to care.

If you have both a b.py and a b.so in different directories, Python will always load the one found in the first (leftmost) directory of your module search path during the left-to-right search of sys.path. But what happens if it finds both a b.py and a b.so in the same directory? In this case, Python follows a standard picking order, though this order is not guaranteed to stay the same over time. In general, you should not depend on which type of file Python will choose within a given directory—make your module names distinct, or configure your module search path to make your module selection preferences more obvious.

Advanced module selection concepts

Normally, imports work as described in this section: they find and load files on your machine. However, it is possible to redefine much of what an import operation does in Python, using what are known as import hooks. These hooks can be used to make imports do various useful things, such as loading files from archives, performing decryption, and so on. In fact, Python itself uses these hooks to enable files to be directly imported from ZIP archives; the archived files are automatically extracted at import time when a .zip file is selected in the import search path. For more details, see the Python standard library manual’s description of the built-in __import__ function, the customizable tool that import statements actually run.
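As a sketch of the ZIP case in Python 3, the following writes a hypothetical module into a .zip archive, places the archive file itself on sys.path, and imports from it. It also calls __import__ directly, which hands back the same module object the import statement bound.

```python
# Python 3 sketch: importing a module straight out of a ZIP archive.
import importlib
import os
import sys
import tempfile
import zipfile

archive = os.path.join(tempfile.mkdtemp(), "tools.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_tool.py", "answer = 6 * 7\n")   # hypothetical module

# A .zip file on the search path is handled by the standard zipimport hook.
sys.path.insert(0, archive)
importlib.invalidate_caches()
import zipped_tool

print(zipped_tool.answer)                        # 42
print(zipped_tool.__file__)                      # a path inside tools.zip
print(__import__("zipped_tool") is zipped_tool)  # True
```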

Python also supports the notion of .pyo optimized byte code files, created and run with the -O Python command-line flag; because these run only slightly faster than normal .pyc files (typically 5 percent faster), however, they are infrequently used. The Psyco system (see Chapter 2) provides more substantial speedups.

2. Compile It (Maybe)

After finding a source code file that matches an import statement by traversing the module search path, Python next compiles it to byte code, if necessary. (We discussed byte code in Chapter 2.)

Python checks the file timestamps and skips the source-to-byte-code compile step if it finds a .pyc byte code file that is not older than the corresponding .py source file. In addition, if Python finds only a byte code file on the search path and no source, it simply loads the byte code directly. In other words, the compile step is bypassed if possible to speed program startup. If you change the source code, Python will automatically regenerate the byte code the next time your program is run. Moreover, you can ship a program as just byte code files, and avoid sending source.
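Python 3 differs slightly from the scheme described here: since Python 3.2, .pyc files are saved in a __pycache__ subdirectory with an interpreter tag in their names, rather than next to the source. This sketch uses the standard py_compile module to perform just the compile step for a hypothetical compiled_demo.py and checks that the byte code lands where importlib expects it.

```python
# Python 3 sketch: the compile step on its own, and where the .pyc goes.
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "compiled_demo.py")        # hypothetical module
    with open(src, "w") as f:
        f.write("x = 1\n")

    # Compile source to byte code, as an import would on first load.
    pyc = py_compile.compile(src, doraise=True)

    print(os.path.relpath(pyc, d))   # e.g. __pycache__/compiled_demo.cpython-312.pyc
    same_location = pyc == importlib.util.cache_from_source(src)
    written = os.path.exists(pyc)

print(same_location, written)        # True True
```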

Notice that compilation happens when a file is being imported. Because of this, you will not usually see a .pyc byte code file for the top-level file of your program, unless it is also imported elsewhere—only imported files leave behind a .pyc on your machine. The byte code of top-level files is used internally and discarded; byte code of imported files is saved in files to speed future imports.
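The following Python 3 sketch checks this by running a hypothetical top-level script that imports one hypothetical module, then listing the __pycache__ directory. It removes two environment variables that could otherwise suppress or relocate .pyc files.

```python
# Python 3 sketch: only imported files leave byte code behind.
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "main_script.py"), "w") as f:
        f.write("import helper_mod\n")           # top-level file
    with open(os.path.join(d, "helper_mod.py"), "w") as f:
        f.write("value = 1\n")                   # imported module

    env = dict(os.environ)
    env.pop("PYTHONDONTWRITEBYTECODE", None)     # allow .pyc files to be written
    env.pop("PYTHONPYCACHEPREFIX", None)         # keep them in ./__pycache__
    subprocess.run([sys.executable, os.path.join(d, "main_script.py")],
                   env=env, check=True)

    cache = os.path.join(d, "__pycache__")
    cached = sorted(os.listdir(cache)) if os.path.isdir(cache) else []

print(cached)    # helper_mod's .pyc only; nothing for main_script
```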

Top-level files are often designed to be executed directly and not imported at all. Later, we’ll see that it is possible to design a file that serves both as the top-level code of a program and as a module of tools to be imported. Such a file may be both executed and imported, and thus does generate a .pyc. To learn how this works, watch for the discussion of the special __name__ attribute and __main__ in Chapter 21.
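A minimal Python 3 sketch of such a dual-purpose file, assuming a hypothetical dual_use.py: importing it skips the self-test, while running it through runpy.run_path with run_name set to '__main__' (roughly what launching it as a program does) triggers the test.

```python
# Python 3 sketch: one file usable as a module and as a program.
import contextlib
import io
import os
import runpy
import sys
import tempfile

code = (
    "def tester():\n"
    "    return 'self-test ran'\n"
    "if __name__ == '__main__':   # true only when run as the top-level file\n"
    "    print(tester())\n"
)
d = tempfile.mkdtemp()
path = os.path.join(d, "dual_use.py")        # hypothetical file
with open(path, "w") as f:
    f.write(code)

# Imported: __name__ is 'dual_use', so the self-test is skipped.
sys.path.insert(0, d)
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    import dual_use
imported_output = buf.getvalue()

# Run as a program: __name__ is '__main__', so the self-test runs.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    runpy.run_path(path, run_name="__main__")
run_output = buf.getvalue()

print(repr(imported_output), repr(run_output))   # '' 'self-test ran\n'
```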

3. Run It

The final step of an import operation executes the byte code of the module. All statements in the file are executed in turn, from top to bottom, and any assignments made to names during this step generate attributes of the resulting module object. This execution step therefore generates all the tools that the module’s code defines. For instance, def statements in a file are run at import time to create functions and assign attributes within the module to those functions. The functions can then be called later in the program by the file’s importers.

Because this last import step actually runs the file’s code, if any top-level code in a module file does real work, you’ll see its results at import time. For example, top-level print statements in a module show output when the file is imported. Function def statements simply define objects for later use.
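Here is a small Python 3 sketch of top-level code running at import time, using a hypothetical module named noisy_mod. The top-level print calls produce output during the import, while the body of the def does not run until the function is called.

```python
# Python 3 sketch: importing runs a module's top-level statements in order.
import contextlib
import io
import os
import sys
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, "noisy_mod.py"), "w") as f:   # hypothetical module
    f.write(
        "print('running noisy_mod top-level code')\n"
        "def work():\n"
        "    print('work() called')\n"        # runs only when work() is called
        "print('work is now defined:', callable(work))\n"
    )

sys.path.insert(0, d)
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    import noisy_mod                  # executes the file top to bottom
print(buf.getvalue(), end="")
# running noisy_mod top-level code
# work is now defined: True
```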

As you can see, import operations involve quite a bit of work—they search for files, possibly run a compiler, and run Python code. Because of this, any given module is imported only once per process by default. Future imports skip all three import steps and reuse the already loaded module in memory.[49] If you need to import a file again after it has already been loaded (for example, to support end-user customization), you have to force the issue with a reload call—a tool we’ll meet in the next chapter.
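A brief Python 3 sketch of the once-per-process rule, with a hypothetical module named once_mod: the second import statement prints nothing because the module is fetched from sys.modules, while importlib.reload (where reload lives in Python 3) reruns the module's code.

```python
# Python 3 sketch: imports are cached; reload forces a rerun.
import contextlib
import importlib
import io
import os
import sys
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, "once_mod.py"), "w") as f:     # hypothetical module
    f.write("print('loading once_mod')\n")

sys.path.insert(0, d)
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    import once_mod                  # first import: find, compile, run
    import once_mod                  # second import: fetched from sys.modules
    importlib.reload(once_mod)       # forced rerun of the module's code

print("once_mod" in sys.modules)             # True
print(buf.getvalue().count("loading"))        # 2: initial import plus reload
```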

Chapter Summary

In this chapter, we covered the basics of modules, attributes, and imports, and explored the operation of import statements. We learned that imports find the designated file on the module search path, compile it to byte code, and execute all of its statements to generate its contents. We also learned how to configure the search path to be able to import from directories other than the home directory and the standard library directories, primarily with PYTHONPATH settings.

As this chapter demonstrated, the import operation and modules are at the heart of program architecture in Python. Larger programs are divided into multiple files, which are linked together at runtime by imports. Imports in turn use the module search path to locate files, and modules define attributes for external use.

Of course, the whole point of imports and modules is to provide a structure to your program, which divides its logic into self-contained software components. Code in one module is isolated from code in another; in fact, no file can ever see the names defined in another, unless explicit import statements are run. Because of this, modules minimize name collisions between different parts of your program.

You’ll see what this all means in terms of actual code in the next chapter. Before we move on, though, let’s run through the chapter quiz.

BRAIN BUILDER

1. Chapter Quiz

Q:

How does a module source code file become a module object?

Q:

Why might you have to set your PYTHONPATH environment variable?

Q:

Name the four major components of the module import search path.

Q:

Name four file types that Python might load in response to an import operation.

Q:

What is a namespace, and what does a module’s namespace contain?

2. Quiz Answers

Q:

How does a module source code file become a module object?

A:

A module’s source code file automatically becomes a module object when that module is imported. Technically, the module’s source code is run during the import, one statement at a time, and all the names assigned in the process become attributes of the module object.

Q:

Why might you have to set your PYTHONPATH environment variable?

A:

You only need to set PYTHONPATH to import from directories other than the one in which you are working (i.e., the current directory when working interactively, or the directory containing your top-level file).

Q:

Name the four major components of the module import search path.

A:

The four major components of the module import search path are the top-level script’s home directory (the directory containing it), all directories listed in the PYTHONPATH environment variable, standard library directories, and all directories in .pth path files located in standard places. Of these, programmers can customize PYTHONPATH and .pth files.

Q:

Name four file types that Python might load in response to an import operation.

A:

Python might load a source code (.py) file, a byte code (.pyc) file, a C extension module (e.g., a .so file on Linux or a .dll or .pyd file on Windows), or a directory of the same name for package imports. Imports may also load more exotic things such as ZIP file components, Java classes under the Jython version of Python, .NET components under IronPython, and statically linked C extensions that have no files present at all. With import hooks, imports can load anything.

Q:

What is a namespace, and what does a module’s namespace contain?

A:

A namespace is a self-contained package of variables, which are known as the attributes of the namespace object. A module’s namespace contains all the names assigned by code at the top level of the module file (i.e., not nested in def or class statements). Technically, a module’s global scope morphs into the module object’s attribute namespace. A module’s namespace may also be altered by assignments from other files that import it, though this is frowned upon (see Chapter 16 for more on this).



[45] It’s actually syntactically illegal to include path and suffix details in a standard import. Package imports, which we’ll discuss in Chapter 20, allow import statements to include part of the directory path leading to a file as a set of period-separated names; however, package imports still rely on the normal module search path to locate the leftmost directory in a package path (i.e., they are relative to a directory in the search path). They also cannot make use of any platform-specific directory syntax in the import statements; such syntax only works on the search path. Also, note that module file search path issues are not as relevant when you run frozen executables (discussed in Chapter 2); they typically embed byte code in the binary image.

[46] See Chapter 3 for more on launching programs from command lines.

[47] See also Chapter 21’s discussion of the new relative import syntax in Python 2.5; this modifies the search path for from statements when “.” characters are used (e.g., from . import string).

[48] Some programs really need to change sys.path, though. Scripts that run on web servers, for example, usually run as the user “nobody” to limit machine access. Because such scripts cannot usually depend on “nobody” to have set PYTHONPATH in any particular way, they often set sys.path manually to include required source directories, prior to running any import statements. A sys.path.append(dirname) will often suffice.

[49] Technically, Python keeps already loaded modules in the built-in sys.modules dictionary, which it checks at the start of an import operation to determine whether the referenced module is already loaded. If you want to see which modules are loaded, import sys and print sys.modules.keys(). More on this internal table in Chapter 21.
