7. Classes and Object-Oriented Programming

Classes are the mechanism used to create new kinds of objects. This chapter covers the details of classes, but is not intended to be an in-depth reference on object-oriented programming and design. It’s assumed that the reader has some prior experience with data structures and object-oriented programming in other languages such as C or Java. (Chapter 3, “Types and Objects,” contains additional information about the terminology and internal implementation of objects.)

The class Statement

A class defines a set of attributes that are associated with, and shared by, a collection of objects known as instances. A class is most commonly a collection of functions (known as methods), variables (which are known as class variables), and computed attributes (which are known as properties).

A class is defined using the class statement. The body of a class contains a series of statements that execute during class definition. Here’s an example:

image

The values created during the execution of the class body are placed into a class object that serves as a namespace much like a module. For example, the members of the Account class are accessed as follows:

image

It’s important to note that a class statement by itself doesn’t create any instances of the class (for example, no accounts are actually created in the preceding example). Rather, a class merely sets up the attributes that will be common to all the instances that will be created later. In this sense, you might think of it as a blueprint.

The functions defined inside a class are known as instance methods. An instance method is a function that operates on an instance of the class, which is passed as the first argument. By convention, this argument is called self, although any legal identifier name can be used. In the preceding example, deposit(), withdraw(), and inquiry() are examples of instance methods.

Class variables such as num_accounts are values that are shared among all instances of a class (that is, they’re not individually assigned to each instance). In this case, it’s a variable that’s keeping track of how many Account instances are in existence.

Class Instances

Instances of a class are created by calling a class object as a function. This creates a new instance that is then passed to the _ _init_ _() method of the class. The arguments to _ _init_ _() consist of the newly created instance self along with the arguments supplied when calling the class object. For example:

image

Inside _ _init_ _(), attributes are saved in the instance by assigning to self. For example, self.name = name is saving a name attribute in the instance. Once the newly created instance has been returned to the user, these attributes as well as attributes of the class are accessed using the dot (.) operator as follows:

image

The dot (.) operator is responsible for attribute binding. When you access an attribute, the resulting value may come from several different places. For example, a.name in the previous example returns the name attribute of the instance a. However, a.deposit returns the deposit attribute (a method) of the Account class. When you access an attribute, the instance is checked first and if nothing is known, the search moves to the instance’s class instead. This is the underlying mechanism by which a class shares its attributes with all of its instances.

Scoping Rules

Although classes define a namespace, classes do not create a scope for names used inside the bodies of methods. Therefore, when you’re implementing a class, references to attributes and methods must be fully qualified. For example, in methods you always reference attributes of the instance through self. Thus, in the example you use self.balance, not balance. This also applies if you want to call a method from another method, as shown in the following example:

image

The lack of scoping in classes is one area where Python differs from C++ or Java. If you have used those languages, the self parameter in Python is the same as the this pointer. The explicit use of self is required because Python does not provide a means to explicitly declare variables (that is, a declaration such as int x or float y in C). Without this, there is no way to know whether an assignment to a variable in a method is supposed to be a local variable or if it’s supposed to be saved as an instance attribute. The explicit use of self fixes this—all values stored on self are part of the instance and all other assignments are just local variables.

Inheritance

Inheritance is a mechanism for creating a new class that specializes or modifies the behavior of an existing class. The original class is called a base class or a superclass. The new class is called a derived class or a subclass. When a class is created via inheritance, it “inherits” the attributes defined by its base classes. However, a derived class may redefine any of these attributes and add new attributes of its own.

Inheritance is specified with a comma-separated list of base-class names in the class statement. If there is no logical base class, a class inherits from object, as has been shown in prior examples. object is a class which is the root of all Python objects and which provides the default implementation of some common methods such as _ _str_ _(), which creates a string for use in printing.

Inheritance is often used to redefine the behavior of existing methods. As an example, here’s a specialized version of Account that redefines the inquiry() method to periodically overstate the current balance with the hope that someone not paying close attention will overdraw his account and incur a big penalty when making a payment on their subprime mortgage:

image

In this example, instances of EvilAccount are identical to instances of Account except for the redefined inquiry() method.

Inheritance is implemented with only a slight enhancement of the dot (.) operator. Specifically, if the search for an attribute doesn’t find a match in the instance or the instance’s class, the search moves on to the base class. This process continues until there are no more base classes to search. In the previous example, this explains why c.deposit() calls the implementation of deposit() defined in the Account class.

A subclass can add new attributes to the instances by defining its own version of _ _init_ _(). For example, this version of EvilAccount adds a new attribute evilfactor:

image

When a derived class defines _ _init_ _(), the _ _init_ _() methods of base classes are not automatically invoked. Therefore, it’s up to a derived class to perform the proper initialization of the base classes by calling their _ _init_ _() methods. In the previous example, this is shown in the statement that calls Account._ _init_ _(). If a base class does not define _ _init_ _(), this step can be omitted. If you don’t know whether the base class defines _ _init_ _(), it is always safe to call it without any arguments because there is always a default implementation that simply does nothing.

Occasionally, a derived class will reimplement a method but also want to call the original implementation. To do this, a method can explicitly call the original method in the base class, passing the instance self as the first parameter as shown here:

image

A subtlety in this example is that the class EvilAccount doesn’t actually implement the deposit() method. Instead, it is implemented in the Account class. Although this code works, it might be confusing to someone reading the code (e.g., was EvilAccount supposed to implement deposit()?). Therefore, an alternative solution is to use the super() function as follows:

image

super(cls, instance) returns a special object that lets you perform attribute lookups on the base classes. If you use this, Python will search for an attribute using the normal search rules that would have been used on the base classes. This frees you from hard-coding the exact location of a method and more clearly states your intentions (that is, you want to call the previous implementation without regard for which base class defines it). Unfortunately, the syntax of super() leaves much to be desired. If you are using Python 3, you can use the simplified statement super().deposit(amount) to carry out the calculation shown in the example. In Python 2, however, you have to use the more verbose version.

Python supports multiple inheritance. This is specified by having a class list multiple base classes. For example, here are a collection of classes:

image

When multiple inheritance is used, attribute resolution becomes considerably more complicated because there are many possible search paths that could be used to bind attributes. To illustrate the possible complexity, consider the following statements:

image

In this example, methods such as deposit_fee() and withdraw_fee() are uniquely named and found in their respective base classes. However, the withdraw_fee() function doesn’t seem to work right because it doesn’t actually use the value of fee that was initialized in its own class. What has happened is that the attribute fee is a class variable defined in two different base classes. One of those values is used, but which one? (Hint: it’s DepositCharge.fee.)

To find attributes with multiple inheritance, all base classes are ordered in a list from the “most specialized” class to the “least specialized” class. Then, when searching for an attribute, this list is searched in order until the first definition of the attribute is found. In the example, the class EvilAccount is more specialized than Account because it inherits from Account. Similarly, within MostEvilAccount, DepositCharge is considered to be more specialized than WithdrawCharge because it is listed first in the list of base classes. For any given class, the ordering of base classes can be viewed by printing its _ _mro_ _ attribute. Here’s an example:

image

In most cases, this list is based on rules that “make sense.” That is, a derived class is always checked before its base classes and if a class has more than one parent, the parents are always checked in the same order as listed in the class definition. However, the precise ordering of base classes is actually quite complex and not based on any sort of “simple” algorithm such as depth-first or breadth-first search. Instead, the ordering is determined according to the C3 linearization algorithm, which is described in the paper “A Monotonic Superclass Linearization for Dylan” (K. Barrett, et al, presented at OOPSLA’96). A subtle aspect of this algorithm is that certain class hierarchies will be rejected by Python with a TypeError. Here’s an example:

image

In this case, the method resolution algorithm rejects class Z because it can’t determine an ordering of the base classes that makes sense. For example, the class X appears before class Y in the inheritance list, so it must be checked first. However, class Y is more specialized because it inherits from X. Therefore, if X is checked first, it would not be possible to resolve specialized methods in Y. In practice, these issues should rarely arise—and if they do, it usually indicates a more serious design problem with a program.

As a general rule, multiple inheritance is something best avoided in most programs. However, it is sometimes used to define what are known as mixin classes. A mixin class typically defines a set of methods that are meant to be “mixed in” to other classes in order to add extra functionality (almost like a macro). Typically, the methods in a mixin will assume that other methods are present and will build upon them. The DepositCharge and WithdrawCharge classes in the earlier example illustrate this. These classes add new methods such as deposit_fee() to classes that include them as one of the base classes. However, you would never instantiate DepositCharge by itself. In fact, if you did, it wouldn’t create an instance that could be used for anything useful (that is, the one defined method wouldn’t even execute correctly).

Just as a final note, if you wanted to fix the problematic references to fee in this example, the implementation of deposit_fee() and withdraw_fee() should be changed to refer to the attribute directly using the class name instead of self (for example, DepositChange.fee).

Polymorphism Dynamic Binding and Duck Typing

Dynamic binding (also sometimes referred to as polymorphism when used in the context of inheritance) is the capability to use an instance without regard for its type. It is handled entirely through the attribute lookup process described for inheritance in the preceding section. Whenever an attribute is accessed as obj.attr, attr is located by searching within the instance itself, the instance’s class definition, and then base classes, in that order. The first match found is returned.

A critical aspect of this binding process is that it is independent of what kind of object obj is. Thus, if you make a lookup such as obj.name, it will work on any obj that happens to have a name attribute. This behavior is sometimes referred to as duck typing in reference to the adage “if it looks like, quacks like, and walks like a duck, then it’s a duck.”

Python programmers often write programs that rely on this behavior. For example, if you want to make a customized version of an existing object, you can either inherit from it or you can simply create a completely new object that looks and acts like it but is otherwise unrelated. This latter approach is often used to maintain a loose coupling of program components. For example, code may be written to work with any kind of object whatsoever as long as it has a certain set of methods. One of the most common examples is with various “file-like” objects defined in the standard library. Although these objects work like files, they don’t inherit from the built-in file object.

Static Methods and Class Methods

In a class definition, all functions are assumed to operate on an instance, which is always passed as the first parameter self. However, there are two other common kinds of methods that can be defined.

A static method is an ordinary function that just happens to live in the namespace defined by a class. It does not operate on any kind of instance. To define a static method, use the @staticmethod decorator as shown here:

image

To call a static method, you just prefix it by the class name. You do not pass it any additional information. For example:

x = Foo.add(3,4)              # x = 7

A common use of static methods is in writing classes where you might have many different ways to create new instances. Because there can only be one _ _init_ _() function, alternative creation functions are often defined as shown here:

image

Class methods are methods that operate on the class itself as an object. Defined using the @classmethod decorator, a class method is different than an instance method in that the class is passed as the first argument which is named cls by convention. For example:

image

In this example, notice how the class TwoTimes is passed to mul() as an object. Although this example is esoteric, there are practical, but subtle, uses of class methods. As an example, suppose that you defined a class that inherited from the Date class shown previously and customized it slightly:

image

Because the class inherits from Date, it has all of the same features. However, the now() and tomorrow() methods are slightly broken. For example, if someone calls EuroDate.now(), a Date object is returned instead of a EuroDate object. A class method can fix this:

image

One caution about static and class methods is that Python does not manage these methods in a separate namespace than the instance methods. As a result, they can be invoked on an instance. For example:

image

This is potentially quite confusing because a call to d.now() doesn’t really have anything to do with the instance d. This behavior is one area where the Python object system differs from that found in other OO languages such as Smalltalk and Ruby. In those languages, class methods are strictly separate from instance methods.

Properties

Normally, when you access an attribute of an instance or a class, the associated value that is stored is returned. A property is a special kind of attribute that computes its value when accessed. Here is a simple example:

image

The resulting Circle object behaves as follows:

image

In this example, Circle instances have an instance variable c.radius that is stored. c.area and c.perimeter are simply computed from that value. The @property decorator makes it possible for the method that follows to be accessed as a simple attribute, without the extra () that you would normally have to add to call the method. To the user of the object, there is no obvious indication that an attribute is being computed other than the fact that an error message is generated if an attempt is made to redefine the attribute (as shown in the AttributeError exception above).

Using properties in this way is related to something known as the Uniform Access Principle. Essentially, if you’re defining a class, it is always a good idea to make the programming interface to it as uniform as possible. Without properties, certain attributes of an object would be accessed as a simple attribute such as c.radius whereas other attributes would be accessed as methods such as c.area(). Keeping track of when to add the extra () adds unnecessary confusion. A property can fix this.

Python programmers don’t often realize that methods themselves are implicitly handled as a kind of property. Consider this class:

image

When a user creates an instance such as f = Foo("Guido") and then accesses f.spam, the original function object spam is not returned. Instead, you get something known as a bound method, which is an object that represents the method call that will execute when the () operator is invoked on it. A bound method is like a partially evaluated function where the self parameter has already been filled in, but the additional arguments still need to be supplied by you when you call it using (). The creation of this bound method object is silently handled through a property function that executes behind the scenes. When you define static and class methods using @staticmethod and @classmethod, you are actually specifying the use of a different property function that will handle the access to those methods in a different way. For example, @staticmethod simply returns the method function back “as is” without any special wrapping or processing.

Properties can also intercept operations to set and delete an attribute. This is done by attaching additional setter and deleter methods to a property. Here is an example:

image

In this example, the attribute name is first defined as a read-only property using the @property decorator and associated method. The @name.setter and @name.deleter decorators that follow are associating additional methods with the set and deletion operations on the name attribute. The names of these methods must exactly match the name of the original property. In these methods, notice that the actual value of the name is stored in an attribute _ _name. The name of the stored attribute does not have to follow any convention, but it has to be different than the property in order to distinguish it from the name of the property itself.

In older code, you will often see properties defined using the property(getf=None, setf=None, delf=None, doc=None) function with a set of uniquely named methods for carrying out each operation. For example:

image

This older approach is still supported, but the decorator version tends to lead to classes that are a little more polished. For example, if you use decorators, the get, set, and delete functions aren’t also visible as methods.

Descriptors

With properties, access to an attribute is controlled by a series of user-defined get, set, and delete functions. This sort of attribute control can be further generalized through the use of a descriptor object. A descriptor is simply an object that represents the value of an attribute. By implementing one or more of the special methods _ _get_ _(), _ _set_ _(), and _ _delete_ _(), it can hook into the attribute access mechanism and can customize those operations. Here is an example:

image

In this example, the class TypedProperty defines a descriptor where type checking is performed when the attribute is assigned and an error is produced if an attempt is made to delete the attribute. For example:

image

Descriptors can only be instantiated at the class level. It is not legal to create descriptors on a per-instance basis by creating descriptor objects inside _ _init_ _() and other methods. Also, the attribute name used by the class to hold a descriptor takes precedence over attributes stored on instances. In the previous example, this is why the descriptor object takes a name parameter and why the name is changed slightly by inserting a leading underscore. In order for the descriptor to store a value on the instance, it has to pick a name that is different than that being used by the descriptor itself.

Data Encapsulation and Private Attributes

By default, all attributes and methods of a class are “public.” This means that they are all accessible without any restrictions. It also implies that everything defined in a base class is inherited and accessible within a derived class. This behavior is often undesirable in object-oriented applications because it exposes the internal implementation of an object and can lead to namespace conflicts between objects defined in a derived class and those defined in a base class.

To fix this problem, all names in a class that start with a double underscore, such as _ _Foo, are automatically mangled to form a new name of the form _Classname_ _Foo. This effectively provides a way for a class to have private attributes and methods because private names used in a derived class won’t collide with the same private names used in a base class. Here’s an example:

image

Although this scheme provides the illusion of data hiding, there’s no strict mechanism in place to actually prevent access to the “private” attributes of a class. In particular, if the name of the class and corresponding private attribute are known, they can be accessed using the mangled name. A class can make these attributes less visible by redefining the _ _dir_ _() method, which supplies the list of names returned by the dir() function that’s used to inspect objects.

Although this name mangling might look like an extra processing step, the mangling process actually only occurs once at the time a class is defined. It does not occur during execution of the methods, nor does it add extra overhead to program execution. Also, be aware that name mangling does not occur in functions such as getattr(), hasattr(), setattr(), or delattr() where the attribute name is specified as a string. For these functions, you need to explicitly use the mangled name such as _Classname_ _name to access the attribute.

It is recommended that private attributes be used when defining mutable attributes via properties. By doing so, you will encourage users to use the property name rather than accessing the underlying instance data directly (which is probably not what you intended if you wrapped it with a property to begin with). An example of this appeared in the previous section.

Giving a method a private name is a technique that a superclass can use to prevent a derived class from redefining and changing the implementation of a method. For example, the A.bar() method in the example only calls A._ _spam(), regardless of the type of self or the presence of a different _ _spam() method in a derived class.

Finally, don’t confuse the naming of private class attributes with the naming of “private” definitions in a module. A common mistake is to define a class where a single leading underscore is used on attribute names in an effort to hide their values (e.g., _name). In modules, this naming convention prevents names from being exported by the from module import * statement. However, in classes, this naming convention does not hide the attribute nor does it prevent name clashes that arise if someone inherits from the class and defines a new attribute or method with the same name.

Object Memory Management

When a class is defined, the resulting class is a factory for creating new instances. For example:

image

The creation of an instance is carried out in two steps using the special method _ _new_ _(), which creates a new instance, and _ _init_ _(), which initializes it. For example, the operation c = Circle(4.0) performs these steps:

image

The _ _new_ _() method of a class is something that is rarely defined by user code. If it is defined, it is typically written with the prototype _ _new_ _(cls, *args, **kwargs) where args and kwargs are the same arguments that will be passed to _ _init_ _(). _ _new_ _() is always a class method that receives the class object as the first parameter. Although _ _new_ _() creates an instance, it does not automatically call _ _init_ _().

If you see _ _new_ _() defined in a class, it usually means the class is doing one of two things. First, the class might be inheriting from a base class whose instances are immutable. This is common if defining objects that inherit from an immutable built-in type such as an integer, string, or tuple because _ _new_ _() is the only method that executes prior to the instance being created and is the only place where the value could be modified (in _ _init_ _(), it would be too late). For example:

image

The other major use of _ _new_ _() is when defining metaclasses. This is described at the end of this chapter.

Once created, instances are managed by reference counting. If the reference count reaches zero, the instance is immediately destroyed. When the instance is about to be destroyed, the interpreter first looks for a _ _del_ _() method associated with the object and calls it. In practice, it’s rarely necessary for a class to define a _ _del_ _() method. The only exception is when the destruction of an object requires a cleanup action such as closing a file, shutting down a network connection, or releasing other system resources. Even in these cases, it’s dangerous to rely on _ _del_ _() for a clean shutdown because there’s no guarantee that this method will be called when the interpreter exits. A better approach may be to define a method such as close() that a program can use to explicitly perform a shutdown.

Occasionally, a program will use the del statement to delete a reference to an object. If this causes the reference count of the object to reach zero, the _ _del_ _() method is called. However, in general, the del statement doesn’t directly call _ _del_ _().

A subtle danger involving object destruction is that instances for which _ _del_ _() is defined cannot be collected by Python’s cyclic garbage collector (which is a strong reason not to define _ _del_ _ unless you need to). Programmers coming from languages without automatic garbage collection (e.g., C++) should take care not to adopt a programming style where _ _del_ _() is unnecessarily defined. Although it is rare to break the garbage collector by defining _ _del_ _(), there are certain types of programming patterns, especially those involving parent-child relationships or graphs, where this can be a problem. For example, suppose you had an object that was implementing a variant of the “Observer Pattern.”

image

In this code, the Account class allows a set of AccountObserver objects to monitor an Account instance by receiving an update whenever the balance changes. To do this, each Account keeps a set of the observers and each AccountObserver keeps a reference back to the account. Each class has defined _ _del_ _() in an attempt to provide some sort of cleanup (such as unregistering and so on). However, it just doesn’t work. Instead, the classes have created a reference cycle in which the reference count never drops to 0 and there is no cleanup. Not only that, the garbage collector (the gc module) won’t even clean it up, resulting in a permanent memory leak.

One way to fix the problem shown in this example is for one of the classes to create a weak reference to the other using the weakref module. A weak reference is a way of creating a reference to an object without increasing its reference count. To work with a weak reference, you have to add an extra bit of functionality to check whether the object being referred to still exists. Here is an example of a modified observer class:

image

In this example, a weak reference accountref is created. To access the underlying Account, you call it like a function. This either returns the Account or None if it’s no longer around. With this modification, there is no longer a reference cycle. If the Account object is destroyed, its _ _del_ _ method runs and observers receive notification. The gc module also works properly. More information about the weakref module can be found in Chapter 13, “Python Runtime Services.”

Object Representation and Attribute Binding

Internally, instances are implemented using a dictionary that’s accessible as the instance’s _ _dict_ _ attribute. This dictionary contains the data that’s unique to each instance. Here’s an example:

image

New attributes can be added to an instance at any time, like this:

a.number = 123456        # Add attribute 'number' to a._ _dict_ _

Modifications to an instance are always reflected in the local _ _dict_ _ attribute. Likewise, if you make modifications to _ _dict_ _ directly, those modifications are reflected in the attributes.

Instances are linked back to their class by a special attribute _ _class_ _. The class itself is also just a thin layer over a dictionary which can be found in its own _ _dict_ _ attribute. The class dictionary is where you find the methods. For example:

image

Finally, classes are linked to their base classes in a special attribute _ _bases_ _, which is a tuple of the base classes. This underlying structure is the basis for all of the operations that get, set, and delete the attributes of objects.

Whenever an attribute is set using obj.name = value, the special method obj._ _setattr_ _("name", value) is invoked. If an attribute is deleted using del obj.name, the special method obj._ _delattr_ _("name") is invoked. The default behavior of these methods is to modify or remove values from the local _ _dict_ _ of obj unless the requested attribute happens to correspond to a property or descriptor. In that case, the set and delete operation will be carried out by the set and delete functions associated with the property.

For attribute lookup such as obj.name, the special method obj._ _getattrribute_ _("name") is invoked. This method carries out the search process for finding the attribute, which normally includes checking for properties, looking in the local _ _dict_ _ attribute, checking the class dictionary, and searching the base classes. If this search process fails, a final attempt to find the attribute is made by trying to invoke the _ _getattr_ _() method of the class (if defined). If this fails, an AttributeError exception is raised.

User-defined classes can implement their own versions of the attribute access functions, if desired. For example:

image

A class that reimplements these methods should probably rely upon the default implementation in object to carry out the actual work. This is because the default implementation takes care of the more advanced features of classes such as descriptors and properties.

As a general rule, it is relatively uncommon for classes to redefine the attribute access operators. However, one application where they are often used is in writing general-purpose wrappers and proxies to existing objects. By redefining _ _getattr_ _(), _ _setattr_ _(), and _ _delattr_ _(), a proxy can capture attribute access and transparently forward those operations on to another object.

_ _slots_ _

A class can restrict the set of legal instance attribute names by defining a special variable called _ _slots_ _. Here’s an example:

image

When _ _slots_ _ is defined, the attribute names that can be assigned on instances are restricted to the names specified. Otherwise, an AttributeError exception is raised. This restriction prevents someone from adding new attributes to existing instances and solves the problem that arises if someone assigns a value to an attribute that they can’t spell correctly.

In reality, _ _slots_ _ was never implemented to be a safety feature. Instead, it is actually a performance optimization for both memory and execution speed. Instances of a class that uses _ _slots_ _ no longer use a dictionary for storing instance data. Instead, a much more compact data structure based on an array is used. In programs that create a large number of objects, using _ _slots_ _ can result in a substantial reduction in memory use and execution time.

Be aware that the use of _ _slots_ _ has a tricky interaction with inheritance. If a class inherits from a base class that uses _ _slots_ _, it also needs to define _ _slots_ _ for storing its own attributes (even if it doesn’t add any) to take advantage of the benefits _ _slots_ _ provides. If you forget this, the derived class will run slower and use even more memory than what would have been used if _ _slots_ _ had not been used on any of the classes!

The use of _ _slots_ _ can also break code that expects instances to have an underlying _ _dict_ _ attribute. Although this often does not apply to user code, utility libraries and other tools for supporting objects may be programmed to look at _ _dict_ _ for debugging, serializing objects, and other operations.

Finally, the presence of _ _slots_ _ has no effect on the invocation of methods such as _ _getattribute_ _(), _ _getattr_ _(), and _ _setattr_ _() should they be redefined in a class. However, the default behavior of these methods will take _ _slots_ _ into account. In addition, it should be stressed that it is not necessary to add method or property names to _ _slots_ _, as they are stored in the class, not on a per-instance basis.

Operator Overloading

User-defined objects can be made to work with all of Python’s built-in operators by adding implementations of the special methods described in Chapter 3 to a class. For example, if you wanted to add a new kind of number to Python, you could define a class in which special methods such as _ _ add_ _() were defined to make instances work with the standard mathematical operators.

The following example shows how this works by defining a class that implements the complex numbers with some of the standard mathematical operators.

Note

Because Python already provides a complex number type, this class is only provided for the purpose of illustration.

image

In the example, the _ _repr_ _() method creates a string that can be evaluated to re-create the object (that is, "Complex(real,imag)"). This convention should be followed for all user-defined objects as applicable. On the other hand, the _ _str_ _() method creates a string that’s intended for nice output formatting (this is the string that would be produced by the print statement).

The other operators, such as _ _add_ _() and _ _sub_ _(), implement mathematical operations. A delicate matter with these operators concerns the order of operands and type coercion. As implemented in the previous example, the _ _add_ _() and _ _sub_ _() operators are applied only if a complex number appears on the left side of the operator. They do not work if they appear on the right side of the operator and the left-most operand is not a Complex. For example:

image

The operation c + 4.0 works partly by accident. All of Python’s built-in numbers already have .real and .imag attributes, so they were used in the calculation. If the other object did not have these attributes, the implementation would break. If you want your implementation of Complex to work with objects missing these attributes, you have to add extra conversion code to extract the needed information (which might depend on the type of the other object).

The operation 4.0 + c does not work at all because the built-in floating point type doesn’t know anything about the Complex class. To fix this, you can add reversed-operand methods to Complex:

image

These methods serve as a fallback. If the operation 4.0 + c fails, Python tries to execute c._ _radd_ _(4.0) first before issuing a TypeError.

Older versions of Python have tried various approaches to coerce types in mixed-type operations. For example, you might encounter legacy Python classes that implement a _ _coerce_ _() method. This is no longer used by Python 2.6 or Python 3. Also, don’t be fooled by special methods such as _ _int_ _(), _ _float_ _(), or _ _complex_ _(). Although these methods are called by explicit conversions such as int(x) or float(x), they are never called implicitly to perform type conversion in mixed-type arithmetic. So, if you are writing classes where operators must work with mixed types, you have to explicitly handle the type conversion in the implementation of each operator.

Types and Class Membership Tests

When you create an instance of a class, the type of that instance is the class itself. To test for membership in a class, use the built-in function isinstance(obj,cname). This function returns True if an object, obj, belongs to the class cname or any class derived from cname. Here’s an example:

image

Similarly, the built-in function issubclass(A,B) returns True if the class A is a subclass of class B. Here’s an example:

image

A subtle problem with type-checking of objects is that programmers often bypass inheritance and simply create objects that mimic the behavior of another object. As an example, consider these two classes:

image

In this example, FooProxy is functionally identical to Foo. It implements the same methods, and it even uses Foo underneath the covers. Yet, in the type system, FooProxy is different than Foo. For example:

image

If a program has been written to explicitly check for a Foo using isinstance(), then it certainly won’t work with a FooProxy object. However, this degree of strictness is often not exactly what you want. Instead, it might make more sense to assert that an object can simply be used as Foo because it has the same interface. To do this, it is possible to define an object that redefines the behavior of isinstance() and issubclass() for the purpose of grouping objects together and type-checking. Here is an example:

image

In this example, the class IClass creates an object that merely groups a collection of other classes together in a set. The register() method adds a new class to the set. The special method _ _instancecheck_ _() is called if anyone performs the operation isinstance(x, IClass). The special method _ _subclasscheck_ _() is called if the operation issubclass(C,IClass) is called.

By using the IFoo object and registered implementers, one can now perform type checks such as the following:

image

In this example, it’s important to emphasize that no strong type-checking is occurring. The IFoo object has overloaded the instance checking operations in a way that allows a you to assert that a class belongs to a group. It doesn’t assert any information on the actual programming interface, and no other verification actually occurs. In fact, you can simply register any collection of objects you want to group together without regard to how those classes are related to each other. Typically, the grouping of classes is based on some criteria such as all classes implementing the same programming interface. However, no such meaning should be inferred when overloading _ _instancecheck_ _() or _ _subclasscheck_ _(). The actual interpretation is left up to the application.

Python provides a more formal mechanism for grouping objects, defining interfaces, and type-checking. This is done by defining an abstract base class, which is defined in the next section.

Abstract Base Classes

In the last section, it was shown that the isinstance() and issubclass() operations can be overloaded. This can be used to create objects that group similar classes together and to perform various forms of type-checking. Abstract base classes build upon this concept and provide a means for organizing objects into a hierarchy, making assertions about required methods, and so forth.

To define an abstract base class, you use the abc module. This module defines a metaclass (ABCMeta) and a set of decorators (@abstractmethod and @abstractproperty) that are used as follows:

image

The definition of an abstract class needs to set its metaclass to ABCMeta as shown (also, be aware that the syntax differs between Python 2 and 3). This is required because the implementation of abstract classes relies on a metaclass (described in the next section). Within the abstract class, the @abstractmethod and @abstractproperty decorators specify that a method or property must be implemented by subclasses of Foo.

An abstract class is not meant to be instantiated directly. If you try to create a Foo for the previous class, you will get the following error:

image

This restriction carries over to derived classes as well. For instance, if you have a class Bar that inherits from Foo but it doesn’t implement one or more of the abstract methods, attempts to create a Bar will fail with a similar error. Because of this added checking, abstract classes are useful to programmers who want to make assertions on the methods and properties that must be implemented on subclasses.

Although an abstract class enforces rules about methods and properties that must be implemented, it does not perform conformance checking on arguments or return values. Thus, an abstract class will not check a subclass to see whether a method has used the same arguments as an abstract method. Likewise, an abstract class that requires the definition of a property does not check to see whether the property in a subclass supports the same set of operations (get, set, and delete) of the property specified in a base.

Although an abstract class can not be instantiated, it can define methods and properties for use in subclasses. Moreover, an abstract method in the base can still be called from a subclass. For example, calling Foo.spam(a,b) from the subclass is allowed.

Abstract base classes allow preexisting classes to be registered as belonging to that base. This is done using the register() method as follows:

image

When a class is registered with an abstract base, type-checking operations involving the abstract base (such as isinstance() and issubclass()) will return True for instances of the registered class. When a class is registered with an abstract class, no checks are made to see whether the class actually implements any of the abstract methods or properties. This registration process only affects type-checking. It does not add extra error checking to the class that is registered.

Unlike many other object-oriented languages, Python’s built-in types are organized into a relatively flat hierarchy. For example, if you look at the built-in types such as int or float, they directly inherit from object, the root of all objects, instead of an intermediate base class representing numbers. This makes it clumsy to write programs that want to inspect and manipulate objects based on a generic category such as simply being an instance of a number.

The abstract class mechanism addresses this issue by allowing preexisting objects to be organized into user-definable type hierarchies. Moreover, some library modules aim to organize the built-in types according to different capabilities that they possess. The collections module contains abstract base classes for various kinds of operations involving sequences, sets, and dictionaries. The numbers module contains abstract base classes related to organizing a hierarchy of numbers. Further details can be found in Chapter 14, “Mathematics,” and Chapter 15, “Data Structures, Algorithms, and Utilities.”

Metaclasses

When you define a class in Python, the class definition itself becomes an object. Here’s an example:

image

If you think about this long enough, you will realize that something had to create the Foo object. This creation of the class object is controlled by a special kind of object called a metaclass. Simply stated, a metaclass is an object that knows how to create and manage classes.

In the preceding example, the metaclass that is controlling the creation of Foo is a class called type. In fact, if you display the type of Foo, you will find out that it is a type:

image

When a new class is defined with the class statement, a number of things happen. First, the body of the class is executed as a series of statements within its own private dictionary. The execution of statements is exactly the same as in normal code with the addition of the name mangling that occurs on private members (names that start with _ _). Finally, the name of the class, the list of base classes, and the dictionary are passed to the constructor of a metaclass to create the corresponding class object. Here is an example of how it works:

image

The final step of class creation where the metaclass type() is invoked can be customized. The choice of what happens in the final step of class definition is controlled in a number of ways. First, the class can explicitly specify its metaclass by either setting a _ _metaclass_ _ class variable (Python 2), or supplying the metaclass keyword argument in the tuple of base classes (Python 3).

image

If no metaclass is explicitly specified, the class statement examines the first entry in the tuple of base classes (if any). In this case, the metaclass is the same as the type of the first base class. Therefore, when you write

class Foo(object): pass

Foo will be the same type of class as object.

If no base classes are specified, the class statement checks for the existence of a global variable called _ _metaclass_ _. If this variable is found, it will be used to create classes. If you set this variable, it will control how classes are created when a simple class statement is used. Here’s an example:

image

Finally, if no _ _metaclass_ _ value can be found anywhere, Python uses the default metaclass. In Python 2, this defaults to types.ClassType, which is known as an old-style class. This kind of class, deprecated since Python 2.2, corresponds to the original implementation of classes in Python. Although these classes are still supported, they should be avoided in new code and are not covered further here. In Python 3, the default metaclass is simply type().

The primary use of metaclasses is in frameworks that want to assert more control over the definition of user-defined objects. When a custom metaclass is defined, it typically inherits from type() and reimplements methods such as _ _init_ _() or _ _new_ _(). Here is an example of a metaclass that forces all methods to have a documentation string:

image

In this metaclass, the _ _init_ _() method has been written to inspect the contents of the class dictionary. It scans the dictionary looking for methods and checking to see whether they all have documentation strings. If not, a TypeError exception is generated. Otherwise, the default implementation of type._ _init_ _() is called to initialize the class.

To use this metaclass, a class needs to explicitly select it. The most common technique for doing this is to first define a base class such as the following:

image

This base class is then used as the parent for all objects that are to be documented. For example:

image

This example illustrates one of the major uses of metaclasses, which is that of inspecting and gathering information about class definitions. The metaclass isn’t changing anything about the class that actually gets created but is merely adding some additional checks.

In more advanced metaclass applications, a metaclass can both inspect and alter the contents of a class definition prior to the creation of the class. If alterations are going to be made, you should redefine the _ _new_ _() method that runs prior to the creation of the class itself. This technique is commonly combined with techniques that wrap attributes with descriptors or properties because it is one way to capture the names being used in the class. As an example, here is a modified version of the TypedProperty descriptor that was used in the “Descriptors” section:

image

In this example, the name attribute of the descriptor is simply set to None. To fill this in, we’ll rely on a meta class. For example:

image

In this example, the metaclass scans the class dictionary and looks for instances of TypedProperty. If found, it sets the name attribute and builds a list of names in slots. After this is done, a _ _slots_ _ attribute is added to the class dictionary, and the class is constructed by calling the _ _new_ _() method of the type() metaclass. Here is an example of using this new metaclass:

image

Although metaclasses make it possible to drastically alter the behavior and semantics of user-defined classes, you should probably resist the urge to use metaclasses in a way that makes classes work wildly different from what is described in the standard Python documentation. Users will be confused if the classes they must write don’t adhere to any of the normal coding rules expected for classes.

Class Decorators

In the previous section, it was shown how the process of creating a class can be customized by defining a metaclass. However, sometimes all you want to do is perform some kind of extra processing after a class is defined, such as adding a class to a registry or database. An alternative approach for such problems is to use a class decorator. A class decorator is a function that takes a class as input and returns a class as output. For example:

image

In this example, the register function looks inside a class for a _ _clsid_ _ attribute. If found, it’s used to add the class to a dictionary mapping class identifiers to class objects. To use this function, you can use it as a decorator right before the class definition. For example:

image

Here, the use of the decorator syntax is mainly one of convenience. An alternative way to accomplish the same thing would have been this:

image

Although it’s possible to think of endless diabolical things one might do to a class in a class decorator function, it’s probably best to avoid excessive magic such as putting a wrapper around the class or rewriting the class contents.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.141.31.125