Modules and packages

Now that we know how to create classes and instantiate objects, it is time to think about organizing them. For small programs, we can just put all our classes into one file and put some code at the end of the file to start them interacting. However, as our projects grow, it can become difficult to find one class that needs to be edited among the many classes we've defined. This is where modules come in. Modules are simply Python files, nothing more. The single file in our small program is a module. Two Python files are two modules. If we have two files in the same folder, we can load a class from one module for use in the other module.

For example, if we are building an e-commerce system, we will likely be storing a lot of data in a database. We can put all the classes and functions related to database access into a separate file (we'll call it something sensible: database.py). Then our other modules (for example: customer models, product information, and inventory) can import classes from that module in order to access the database.

The import statement is used for importing modules or specific classes or functions from modules. We've already seen an example of this in our Point class in the previous section. We used the import statement to get Python's built-in math module so we could use its sqrt function in our distance calculation.

Here's a concrete example. Assume we have a module called database.py that contains a class called Database, and a second module called products.py that is responsible for product-related queries. At this point, we don't need to think too much about the contents of these files. What we know is that products.py needs to instantiate the Database class from database.py so it can execute queries on the product table in the database.

There are several variations on the import statement syntax that can be used to access the class.

	import database
	db = database.Database()
	# Do queries on db

This version imports the database module into the products namespace (the list of names currently accessible in a module or function), so any class or function in the database module can be accessed using database.<something> notation. Alternatively, we can import just the one class we need using the from...import syntax:

	from database import Database
	db = Database()
	# Do queries on db

If, for some reason, products already has a class called Database, and we don't want the two names to be confused, we can rename the class when used inside the products module:

	from database import Database as DB
	db = DB()
	# Do queries on db

We can also import multiple items in one statement. If our database module also contains a Query class, we can import both classes using:

	from database import Database, Query
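These variations can be tried right away without writing database.py. The following sketch uses the standard library's math module as a stand-in for our database module (that substitution is an assumption made only so the example runs on its own):

```python
# Stand-in example: math plays the role of the database module from the text.
import math                             # import the whole module
from math import sqrt                   # import a single name
from math import sqrt as square_root    # import a single name under an alias
from math import floor, ceil            # import several names in one statement

print(math.sqrt(16))            # accessed through the module name
print(sqrt(16))                 # accessed directly
print(square_root(16))          # accessed through the alias
print(floor(2.5), ceil(2.5))    # both names came from one import statement
```

All three spellings of sqrt refer to the same function object; only the name we use to reach it differs.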

Some sources say that we can even import all classes and functions from the database module using this syntax:

	from database import *

Don't do this. Every experienced Python programmer will tell you that you should never use this syntax. They'll use obscure justifications like, "it clutters up the namespace", which doesn't make much sense to beginners. One way to learn why to avoid this syntax is to use it and try to understand your code two years later. But we can save some time and two years of poorly written code with a quick explanation now!

When we explicitly import the Database class at the top of our file using from database import Database, we can easily see where the Database class comes from. We might use db = Database() 400 lines later in the file, and we can quickly look at the imports to see where that Database class came from. Then if we need clarification as to how to use the Database class, we can visit the original file (or import the module in the interactive interpreter and use the help(database.Database) command). However, if we use from database import * syntax, it takes a lot longer to find where that class is located. Code maintenance becomes a nightmare.

In addition, many editors are able to provide extra functionality, such as reliable code completion or the ability to jump to the definition of a class if normal imports are used. The import * syntax usually completely destroys their ability to do this reliably.

Finally, using the import * syntax can bring unexpected objects into our local namespace. Sure, it will import all the classes and functions defined in the module being imported from, but it will also import any classes or modules that were themselves imported into that file!
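To see this effect for ourselves, here is a small sketch. It writes a hypothetical module (named star_demo_database purely for this illustration) into a temporary folder, runs from star_demo_database import * against an empty namespace, and lists the names that arrived:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents, written to a temporary folder for illustration.
module_source = '''
import os
from math import sqrt

class Database:
    pass
'''

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "star_demo_database.py").write_text(module_source)
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        namespace = {}
        # Run the star import inside an empty namespace so we can inspect it.
        exec("from star_demo_database import *", namespace)
    finally:
        sys.path.remove(folder)

# Ignore the double-underscore entries that exec adds on its own.
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)    # ['Database', 'os', 'sqrt']
```

We only wanted Database, but os and sqrt came along because the module had imported them for its own use.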

In spite of all these warnings, you may think, "if I only use from X import * syntax for one module, I can assume any unknown imports come from that module". This is technically true, but it breaks down in practice. I promise that if you use this syntax, you (or someone else trying to understand your code) will have extremely frustrating moments of, "Where on earth can this class be coming from?" Every name used in a module should come from a well-specified place, whether it is defined in that module, or explicitly imported from another module. There should be no magic variables that seem to come out of thin air. We should always be able to immediately identify where the names in our current namespace originated.

Organizing the modules

As a project grows into a collection of more and more modules, we may find that we want to add another level of abstraction, some kind of nested hierarchy for our modules. But we can't put modules inside modules; one file can hold only one file, after all, and modules are nothing more than Python files.

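Files, however, can go in folders and so can modules. A package is a collection of modules in a folder. The name of the package is the name of the folder. All we need to do to tell Python that a folder is a package is place a (normally empty) file in the folder named __init__.py. If we forget this file, older versions of Python won't be able to import modules from that folder at all; Python 3.3 and later will treat it as a namespace package, which behaves differently from a regular package, so it is safest to always include the file.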

Let's put our modules inside an ecommerce package in our working folder, which will also contain a main.py to start the program. Let's additionally add another package in the ecommerce package for various payment options. The folder hierarchy will look like this:

	parent_directory/
		main.py
		ecommerce/
			__init__.py	
			database.py
			products.py
			payments/
				__init__.py
				paypal.py
				authorizenet.py

When importing modules or classes between packages, we have to be cautious about the syntax. In Python 3, there are two ways of importing modules: absolute imports and relative imports.

Absolute imports

Absolute imports specify the complete path to the module, function, or class we want to import. If we need access to the Product class inside the products module, we could use any of these syntaxes to do an absolute import:

	import ecommerce.products
	product = ecommerce.products.Product()

or

	from ecommerce.products import Product
	product = Product()

or

	from ecommerce import products
	product = products.Product()

The import statements separate packages or modules using the period as a separator.

These statements will work from any module. We could instantiate a Product using this syntax in main.py, in the database module, or in either of the two payment modules. Indeed, so long as the packages are available to Python, it will be able to import them. For example, the packages can also be installed to the Python site-packages folder, or the PYTHONPATH environment variable could be customized to dynamically tell Python what folders to search for packages and modules it is going to import.
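The three absolute imports can be checked with a minimal sketch. It assumes a throwaway ecommerce package written to a temporary folder, with a placeholder Product class (the real contents of products.py are not shown in the text). Adding the folder to sys.path is what makes the package "available to Python" here:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as parent_directory:
    package = Path(parent_directory, "ecommerce")
    package.mkdir()
    (package / "__init__.py").write_text("")
    # Placeholder class; the real products module is not shown in the text.
    (package / "products.py").write_text("class Product:\n    pass\n")

    sys.path.insert(0, parent_directory)
    importlib.invalidate_caches()
    try:
        import ecommerce.products
        from ecommerce.products import Product
        from ecommerce import products
        # All three routes lead to the very same class object.
        same_class = ecommerce.products.Product is Product is products.Product
    finally:
        # Undo our changes so the temporary package does not linger.
        sys.path.remove(parent_directory)
        for name in [n for n in sys.modules if n.split(".")[0] == "ecommerce"]:
            del sys.modules[name]

print(same_class)    # True
```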

So with these choices, which syntax do we choose? It depends on your personal taste and the application at hand. If there are dozens of classes and functions inside the products module that I want to use, I generally import the module name using the from ecommerce import products syntax and then access the individual classes using products.Product. If I only need one or two classes from the products module, I import them directly using the from ecommerce.products import Product syntax. I don't personally use the first syntax very often unless I have some kind of name conflict (for example, I need to access two completely different modules called products and I need to separate them). Do whatever you think makes your code look more elegant.

Relative imports

When working with related modules in a package, it seems kind of silly to specify the full path; we know what our parent module is named. This is where relative imports come in. Relative imports are basically a way of saying "find a class, function, or module as it is positioned relative to the current module". For example, if we are working in the products module and we want to import the Database class from the database module "next" to it, we could use a relative import:

	from .database import Database

The period in front of database says, "Use the database module inside the current package". In this case, the current package is the package containing the products.py file we are currently editing, that is, the ecommerce package.

If we were editing the paypal module inside the ecommerce.payments package, we would want to say, "Use the database module inside the parent package", instead. That is easily done with two periods:

	from ..database import Database

We can use more periods to go further up the hierarchy. Of course, we can also go up one side and back down the other. We don't have a deep enough example hierarchy to illustrate this properly, but the following would be a valid import if we had an ecommerce.contact package containing an email module and wanted to import the send_mail function into our paypal module:

	from ..contact.email import send_mail

This import uses two periods to say, "the parent of the payments package", then uses normal package.module syntax to go back "down" into the contact package.
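Here is a sketch that exercises both relative imports. The file contents are placeholders invented for the demonstration (including the ecommerce.contact package, which our example hierarchy does not actually have); only the two import lines inside paypal.py come from the text:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Placeholder sources; only the import lines in paypal.py come from the text.
files = {
    "ecommerce/__init__.py": "",
    "ecommerce/database.py": "class Database:\n    pass\n",
    "ecommerce/contact/__init__.py": "",
    "ecommerce/contact/email.py": "def send_mail():\n    return 'sent'\n",
    "ecommerce/payments/__init__.py": "",
    "ecommerce/payments/paypal.py": (
        "from ..database import Database\n"
        "from ..contact.email import send_mail\n"
    ),
}

with tempfile.TemporaryDirectory() as parent_directory:
    for relative_path, source in files.items():
        target = Path(parent_directory, relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)

    sys.path.insert(0, parent_directory)
    importlib.invalidate_caches()
    try:
        from ecommerce.payments import paypal
        from ecommerce.database import Database
        # The relative import in paypal.py found the same Database class.
        resolved = paypal.Database is Database
        mail_result = paypal.send_mail()
    finally:
        # Undo our changes so the temporary package does not linger.
        sys.path.remove(parent_directory)
        for name in [n for n in sys.modules if n.split(".")[0] == "ecommerce"]:
            del sys.modules[name]

print(resolved, mail_result)    # True sent
```

Note that relative imports only work when the module is loaded as part of its package, as it is here. Running paypal.py directly as a script would fail, because Python would not know what the leading periods are relative to.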

Inside any one module, we can specify variables, classes, or functions. They can be a handy way of storing global state without namespace conflicts. For example, we have been importing the Database class into various modules and then instantiating it, but it might make more sense to have only one database object globally available from the database module. The database module might look like this:

	class Database:
		# the database implementation
		pass
		
	database = Database()

Then we can use any of the import methods we've discussed to access the database object, for example:

	from ecommerce.database import database

A problem with the above module is that the database object is created immediately when the module is first imported, which is usually when the program starts up. This isn't always ideal, since connecting to a database can take a while, slowing down startup, or the database connection information may not yet be available. We could delay creating the database until it is actually needed by calling an initialize_database function to create the module-level variable:

	class Database:
		# the database implementation
		pass
		
	database = None
	
	def initialize_database():
		global database
		database = Database()

The global keyword tells Python that the database variable inside initialize_database is the module-level one we just defined. If we had not specified the variable as global, Python would have created a new local variable that would be discarded when the function returns, leaving the module-level value unchanged.
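The difference is easy to observe. In this sketch, initialize_without_global and after_local_attempt are names added only for the comparison; the rest mirrors the module shown above:

```python
class Database:
    # placeholder for the database implementation
    pass

database = None

def initialize_without_global():
    # No global statement: this assignment creates a local variable
    # that is discarded when the function returns.
    database = Database()

def initialize_database():
    global database
    database = Database()

initialize_without_global()
after_local_attempt = database
print(after_local_attempt)               # None: the module-level name is untouched

initialize_database()
print(isinstance(database, Database))    # True: the module-level name was rebound
```

One consequence worth knowing: a module that ran from ecommerce.database import database before initialize_database was called keeps its own reference to None. Importing the module itself and reading ecommerce.database.database at the point of use avoids that surprise.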

As these two examples illustrate, all code in a module is executed immediately at the time it is imported. However, if it is inside a method or function, the function will be created, but its internal code will not be executed until the function is called. This can be a tricky thing for scripts (like the main script in our e-commerce example) that perform execution. Often, we will write a program that does something useful, and then later find that we want to import a function or class from that module in a different program. But as soon as we import it, any code at the module level is immediately executed. If we are not careful, we can end up running the first program when we really only meant to access a couple of functions inside that module.

To solve this, we should always put our startup code in a function (conventionally called main) and only execute that function when we know we are executing as a script, but not when our code is being imported from a different script. But how do we know that? By checking the module's __name__ variable:

	class UsefulClass:
		'''This class might be useful to other modules.'''
		pass

	def main():
		'''Creates a useful class and does something with it for our module.'''
		useful = UsefulClass()
		print(useful)

	if __name__ == "__main__":
		main()

Every module has a __name__ special variable (remember, Python uses double underscores for special variables, like a class's __init__ method) that specifies the name of the module when it was imported. But when the module is executed directly with python module.py, it is never imported, so the __name__ is set to the string "__main__". Make it a policy to wrap all your scripts in an if __name__ == "__main__": test, just in case you write a function that other code will find useful to import someday.
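We can watch __name__ change with a short sketch. It writes a hypothetical script (name_demo_script.py, invented for this demonstration) to a temporary folder, imports it, and then executes the same file with the standard library's runpy.run_path, which runs it much as python name_demo_script.py would:

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical script used only for this demonstration.
module_source = '''
ran_main = False

def main():
    global ran_main
    ran_main = True

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as folder:
    script = Path(folder, "name_demo_script.py")
    script.write_text(module_source)

    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        import name_demo_script    # imported: main() is not called
    finally:
        sys.path.remove(folder)

    # Executed as a script: __name__ is "__main__", so main() is called.
    as_script = runpy.run_path(str(script), run_name="__main__")

print(name_demo_script.__name__, name_demo_script.ran_main)    # name_demo_script False
print(as_script["__name__"], as_script["ran_main"])            # __main__ True
```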

So methods go in classes, which go in modules, which go in packages. Is that all there is to it?

Actually, no. That is the typical order of things in a Python program, but it's not the only possible layout. Classes can be defined anywhere. They are typically defined at the module level, but they can also be defined inside a function or method, like so:

	def format_string(string, formatter=None):
		'''Format a string using the formatter object, which
		is expected to have a format() method that accepts
		a string.'''
		class DefaultFormatter:
			'''Format a string in title case.'''
			def format(self, string):
				return str(string).title()
				
		if not formatter:
			formatter = DefaultFormatter()
			
		return formatter.format(string)	
		
	hello_string = "hello world, how are you today?"
	print(" input: " + hello_string)
	print("output: " + format_string(hello_string))

Output:

	 input: hello world, how are you today?
	output: Hello World, How Are You Today?

The format_string function accepts a string and optional formatter object, and then applies the formatter to that string. If no formatter is supplied, it creates a formatter of its own as a local class and instantiates it. Since it is created inside the scope of the function, this class cannot be accessed from anywhere outside of that function. Similarly, functions can be defined inside other functions as well; in general, any Python statement can be executed at any time. These "inner" classes and functions are useful for "one-off" items that don't require or deserve their own scope at the module level, or only make sense inside a single method.
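As a sketch of the other two cases mentioned here, the following passes a custom formatter to format_string and defines a function inside another function. UpperFormatter, shout, and add_emphasis are hypothetical names added for illustration; format_string is repeated from above so the example runs on its own:

```python
def format_string(string, formatter=None):
    '''Same function as above, repeated so this sketch is self-contained.'''
    class DefaultFormatter:
        '''Format a string in title case.'''
        def format(self, string):
            return str(string).title()

    if not formatter:
        formatter = DefaultFormatter()

    return formatter.format(string)


class UpperFormatter:
    '''Hypothetical formatter: anything with a format() method will do.'''
    def format(self, string):
        return str(string).upper()


def shout(string):
    # Inner function: it exists only while shout is running
    # and cannot be reached from outside.
    def add_emphasis(text):
        return text + "!"

    return add_emphasis(format_string(string, UpperFormatter()))


print(format_string("hello world"))    # Hello World
print(shout("hello world"))            # HELLO WORLD!
```

Because format_string only calls formatter.format, it never needs to know which class the formatter came from.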
