Chapter 8. Metaprogramming, protocols, and more

This chapter covers

  • Python protocols

  • Dynamic attribute access

  • Metaprogramming with metaclasses

  • Advanced .NET interoperation

In this chapter, we’re going to look under the hood of the Python programming language. We’ve covered all the basic syntax and how to use classes from the .NET framework with IronPython. To make full use of Python, you need to know how to hook Python classes into the infrastructure that the language provides. As you write classes and libraries in Python, you’ll need to define how your objects take part in normal operations and interact with other objects. Much of this interaction is done through protocols, the magic methods that we’ve already briefly discussed.

We focus here on the use of protocols, including the metaprogramming capabilities of Python in the form of metaclasses. Metaclasses have a reputation for being something of a black art, but no book on Python would be complete without them. They can be used to achieve tasks that are much harder or even impossible with other approaches. In this part of the chapter, you’ll deepen your understanding of Python as you learn about the most important protocols.

The next part of the chapter looks at how IronPython integrates with .NET. Most of the time IronPython sits easily with the way .NET does things, but sometimes they don’t make such comfortable bedfellows. In these situations, you need to know how .NET functionality is made available in IronPython. This information will prove invaluable in any non-trivial project; along the way, we take a closer look at some of the .NET types that you use through IronPython.

Let’s start by examining how Python protocols relate to interfaces, a topic that .NET programmers will already be familiar with.

Protocols instead of interfaces

Interfaces are used in C# to specify behavior of objects. For example, if a class implements the IDisposable interface, then you can provide a Dispose method to release resources used by your objects. .NET has a whole host of interfaces, and you can create new ones. If a class implements an interface, it provides a static target for the compiler to call whenever an operation provided by that interface is used in your code.

Note

C# does have one example of duck typing: enumeration with the IEnumerable interface. To support iteration over an object,[1] classes can either implement IEnumerable or provide a GetEnumerator method. GetEnumerator can either return a type that implements IEnumerator or one that declares all the methods defined in IEnumerator.

In Python, you don’t need to provide static targets for the compiler, and you can use the principle of duck typing. Many operations are provided through a kind-of-soft interface mechanism called protocols. This isn’t to say that formal interface specification is decried in Python—how could you use an API if you didn’t know what interface it exposed?—but, again, Python chooses not to enforce this in the language design.

Note

Various third-party interface packages are available for Python, the most common one being part of the Zope project.[2]

In this section, we look at the common Python protocols and how to implement them.

A myriad of magic methods

Methods that implement protocols in Python usually have names that start and end with double underscores. They’re known as the magic methods because, instead of you calling them directly, they’re usually called for you by the interpreter.

When you compare objects, the comparison operation will cause the method corresponding to the operation to be called magically (as long as the object provides the appropriate protocol method).

We’ve already encountered several of the protocols in our journey so far: the __init__ constructor and the __getitem__ and __setitem__ methods that implement the mapping and sequence protocols. As you’d expect, there are many more that we haven’t yet encountered. A full list of all the common magic methods is in one of the appendixes, but some of them are so central to Python that we can’t do justice to the language without looking at them.

As the heading implies, Python has a lot of magic methods. You implement these to customize the behavior of your objects. Some of the protocols central to Python are also simple to implement. We start with one of these, the __len__ protocol method.

The Length of Containers

In .NET land, you find the number of members in a container object through the Length property or the Count property, depending on what kind of container it is. In Python, you determine the length of a container by calling len on it. Under the hood, this calls the __len__ method (listing 8.1).

Example 8.1. Specifying length of custom containers

>>> class Container(object):
...      def __len__(self):
...         return 3
...
>>> container = Container()
>>> len(container)
3
>>>

This listing illustrates a silly implementation of __len__ that always returns 3. The body of this method is likely to contain slightly more logic in a real container class.

That was easy. For our next trick, we look at the Python equivalents of ToString.

Representing objects as strings

You didn’t read that last sentence wrong—it’s meant to be plural. You can explicitly get a string version of an object by calling str on it. str is for producing a pretty version of an object. You can also get a representation of an object with another built-in function called repr.

You can see this distinction by working at the interactive interpreter with strings containing escaped quotes.

>>> x = 'string with 'escaped' quotes'
>>> str(x)
"string with 'escaped' quotes"
>>> repr(x)
'"string with 'escaped' quotes"'
>>>

The stringified version doesn’t have the escape backslashes. The repr’d version does have them; and, because the repr’d version includes quotes, the console adds an extra set.

When you call str on an object, Python calls __str__ for you. When you use repr, Python calls __repr__. Naturally these methods must return strings, as shown in listing 8.2.

Example 8.2. A class with custom string representations

>>> class Something(object):
...     def __str__(self):
...         return 'Something stringified'
...     def __repr__(self):
...         return "Something repr'd"
...
>>> something = Something()
>>> str(something)
'Something stringified'
>>> repr(something)
"Something repr'd"
>>>

Rather than having these methods return completely unrelated strings, the normal difference is that repr should (if possible) return a string that can be eval’d to create the equivalent object. This is why the quotes and backslashes turned up in the repr of the string. It’s common for classes to have a repr of the form ClassName(arguments) and a different str form.

In this interactive session, we’ve been making the difference between str and repr explicit. This difference also crops up implicitly—which, incidentally, is why it’s nicer to have these methods as protocol methods rather than forcing you to call them explicitly.

When you print an object, Python will use __str__ to get a string representation. When you display an object at the interactive interpreter, __repr__ is used.

>>> print something
Something stringified
>>> something
"Something repr'd"

When you use string interpolation, you can choose which method you want. %s is the formatting character for stringification, and %r will use repr.

>>> '%s and %r' % (something, something)
"Something stringified and Something repr'd"

If __str__ is unavailable but __repr__ is available, then __repr__ will be used in its place. If you want both stringification and repr’ing to be the same, and want to implement only one of these methods, then you should choose __repr__.

Just for completeness, we should mention __unicode__. This protocol method is called when you ask for a Unicode representation of an object by calling unicode on it. Because strings are already Unicode in IronPython, it’s not likely that you’ll need it, but you may encounter it in libraries and existing Python code.

Note

In fact, in IronPython 2, even if you explicitly call unicode(something), the __unicode__ method won’t be called. This is really a bug, but arises because the unicode and str types are both aliases to System.String.

A common reason for providing these methods is for debugging; being able to get useful information about your objects in tracebacks can be invaluable. A method you’re likely to implement to influence the way your objects behave in code is the __nonzero__ method.

Truth testing of objects

If you recall from the tutorial, None, empty strings, zero, and empty containers evaluate to False in logical expressions (and when used in if and while statements). Arbitrary objects evaluate to True. If you’re implementing custom containers, then for consistency they should evaluate to False when they have no members; but, by default, they will always evaluate as True.

You can control whether an object evaluates to True or False by implementing the __nonzero__ method. If this returns True, then the object will evaluate to True.

Note

The full semantics of truth value testing are slightly more complicated than we’ve described so far. If an object doesn’t implement __nonzero__, then Python will also check for __len__. If __len__ is available and returns 0, then the object will evaluate to False. The __nonzero__ method is useful for directly providing truth value testing without being dependent on supporting length.

>>> class Something(object):
...     def __nonzero__(self):
...         return False
...
>>> something = Something()
>>> bool(something)
False

These examples of working with magic methods have been nice and straightforward, almost trivially easy. This is good because even the more complex ones work on the same principles; you just need to know the rules (or know where to look them up). We now look at some more advanced protocols in the guise of operator overloading.

Operator overloading

Operator overloading allows you to control how objects take part in operations such as addition, subtraction, multiplication, and so on. You can create custom datatypes that represent values and can be used in calculations. You might want to define types that represent different currencies, and use a converter when different currencies are added together.

Each of the built-in operators (+, -, /, *, and friends) has an equivalent magic method that you can implement to support that operation. The method for addition is __add__. This method takes one argument: the object being added to. Traditionally, this argument is called other. Here’s a simple class that behaves like the number three in addition:

>>> class Three(object):
...     def __add__(self, other):
...         return other + 3
...
>>> t = Three()
>>> t + 2
5

That’s nice and easy. What happens if you try a slightly different addition with this class?

>>> 2 + t
Traceback (most recent call last):
  File , line 0, in ##29
TypeError: unsupported operand type(s) for +: 'int' and 'Three'

Because the Three instance is on the right-hand side of the addition, the integer add method is called—and integers don’t know how to add themselves to the custom class. You can solve this problem with the __radd__ method, which is the right-handed companion of __add__ and the method called when a custom type is on the right-hand side of an addition operation.

>>> class Three(object):
...     def __add__(self, other):
...         return other + 3
...     def __radd__(self, other):
...         return 3 + other
...
>>> t = Three()
>>> t + 2
5
>>> 2 + t
5

The good news is that fixing the problem is this simple; the bad news is that you need to implement two methods for each operation. The next listing shows a simple way around this two-methods problem.

Table 8.1 shows the common[3] operator magic methods and the operations they provide. All the methods listed in this table also have right-hand and in-place equivalents as well.

Table 8.1. The methods used to emulate numeric types

Method

Operator

Operation

__add__

+

Addition

__sub__

-

Subtraction

__mul__

*

Multiplication

__floordiv

//

Floor division, truncating the result to the next integer down

__div__

/

Division

__truediv

/

Used for division when true division is on

__mod__

%

Modulo

__divmod__

 

Supports the built-in function divmod

__pow__

**

Raises to the power of

__lshift__

<<

Shifts to the left

__rshift__

>>

Shifts to the right

__and__

&

Bitwise AND

__xor__

^

Bitwise exclusive OR

__or__

|

Bitwise OR

A practical application of operator overloading is creating custom datatypes. The next listing is a datatype that allows you to store values with a known amount of uncertainty. When you add two uncertain values, the uncertainty is added. When you multiply them, the uncertainty increases dramatically! Listing 8.3 shows the code for an Uncertainty class that supports addition and multiplication.

Example 8.3. Specifying length of custom containers

Specifying length of custom containers

There are a few things of note in listing 8.3. Like integers and floating point numbers, Uncertainty instances are immutable, so addition and multiplication return new instances rather than modifying themselves. It would be an odd side effect if taking part in an addition modified a number that you held a reference to elsewhere! Inside the numeric protocol methods, Uncertainty has to handle operations involving another Uncertainty differently. The first thing it does is a type check. Because __radd__ needs to do exactly the same as __add__, you make them the same method with an assignment.

There are also different implementations for __repr__ and __str__. The repr of an Uncertainty looks like Uncertainty(6, 3); the str (which is more of a pretty print) looks like 6±3. __str__ returns a Unicode string because it uses a Unicode character. This would cause problems in CPython, unless your default encoding can handle this character, but is fine under IronPython because strings are always Unicode. You’ll still need a terminal (or GUI control) capable of displaying Unicode characters to see the fancy string representation.

Operator overloading isn’t limited strictly to emulating numeric types. You can also implement these methods for objects that don’t represent values, using ordinary Python syntax to perform operations on them. For example, the popular path module by Jason Orendorff[4] overloads division so that you can concatenate paths to strings by using the division operator, which is the normal path separator on Unix systems.

newPath = path('some path') / 'subdirectory' / 'filename'

This kind of overloading operators with new meanings is a dubious practice in our opinion, but some programmers are fond of this kind of trick. Something that isn’t dubious, and in fact is central to Python, is working with iterators—the subject of the next section.

Iteration

Iteration means repeatedly performing an action on members of an object. It’s one of the fundamental concepts of programming, so being able to make your own objects iterable (or enumerable) is important.

You saw in the Document class for MultiDoc that providing a sequence-like API (a zero-indexed __getitem__ method) gets you iteration for free. Life is rarely that simple, so you need to know how to support the iteration protocol.

When Python encounters iteration over an object—as a result of a for loop, a list comprehension, or an explicit call to the iter function—it will attempt to call __iter__ on the object. If the object supports iteration, then this method should return an iterator. An iterator is an object that obeys the following three rules:

  • It has a __iter__ method that returns itself.[5]

  • It has a next method that returns the next value from the iterator.

  • When the iterator is consumed (all the values have been returned), calling next should raise the StopIteration exception.

That’s all nice and easy, but you probably immediately noticed that, unlike the other protocol methods that we’ve used, next isn’t held in the double embrace of underscores. Most magic methods are rarely called directly. The double underscores are a warning sign that, if you’re calling them directly, then you should be sure you know what you’re doing. Calling next on an iterator is perfectly normal. One place where we use this is for incrementing a counter, often on an iterator, like the following, returned by the built-in function xrange:

>>> counter = xrange(100)
>>> counter.next()
0

So without further ado, listing 8.4 demonstrates iteration with a simple class that returns every number between a start and a stop value.

Example 8.4. An example Iterator class

An example Iterator class

This example is a simple one, but your classes can support iteration by returning an object like this from their __iter__ methods. Your iterator is free to do as much dynamic magic as it wants in next, such as reading from a file or a socket. A simpler, and more powerful, way of implementing an iterator is through generators.

Generators

In the iterator created in listing 8.4, you needed to store state, in the form of the count instance variable, so that successive calls to next could calculate the next return value. A simpler pattern is available using the Python yield keyword, which is similar to the C# Yield Return statement.

When a function (or method) uses yield, it becomes a generator. Generators are a lightweight form of coroutines.

Where a function uses yield, the value is returned and execution of the generator is suspended. On the next iteration, execution continues at the point it was suspended. Complex iteration can be implemented with generators, without having to explicitly store state.

Listing 8.5 is an example with a Directory object that stores all the files in a specified path from the filesystem. It recurses into subdirectories, storing them as Directory objects. The __iter__ method is implemented as a generator and also recurses through subdirectories.

Example 8.5. Directory class that supports iteration with a generator

Directory class that supports iteration with a generator

Because the __iter__ method yields, it returns a generator, which is a specific kind of iterator. As an iterator, it has the next method available.

>>> generator = iter(Directory(some_path))
>>> generator.next()
'C:\some_path\some_file.txt'

You can easily extract all members from objects that support iteration, by calling list (or tuple) on them.

>>> paths = list(Directory(some_path))

We can’t leave Python protocols without looking at one of the most common programming concepts implemented with magic methods: equality and inequality.

Equality and inequality

Comparing objects is a basic part of programming and one of the first things a new programmer will learn; you need to know how to support equality and inequality operations on your own classes. You do this with the __eq__ (equals) and __ne__ (not equals) protocol methods. For equality to work correctly, you need to implement both these methods. If you don’t provide these methods, then Python will use object identity as the test for equality. An object will only compare equal to itself and not equal to everything else.

Listing 8.6 is a class with custom equality and inequality methods defined.

Example 8.6. Object equality and inequality methods

Object equality and inequality methods

The __ne__ method returns the not of the equality method. This is the simplest possible implementation, and it would be nice if Python did this for you! The __eq__ method always returns False unless it’s compared against another Value instance. If it’s compared to a Value object, then it compares value attributes. Without the isinstance check, equality testing would fail with an attribute error for objects that don’t have a value attribute.

As well as protocols to control equality and inequality comparisons, there are protocol methods for the other comparison operators.[8] Table 8.2 shows the rich comparison operators and corresponding protocol methods.

Table 8.2. The methods used for rich comparison

Method

Operator

Operation

__eq__

==

Equality

__ne__

!=

Inequality

__lt__

<

Less than

__le__

<=

Less than or equal to

__gt__

>

Greater than

__ge__

>=

Greater than or equal to

To support the full range of rich comparison operators, you need to implement all these methods.[9]

Tip

Another, easier way of supporting all comparison operations on an object is the __cmp__ method, which is used to overload the cmp built-in function. This method should return a negative integer if the object is less than the one it’s being compared with, zero if it’s equal to the other object, and a positive integer if it’s greater than the other object. With __cmp__ you don’t know which comparison operation is being performed, so it’s less flexible than overloading individual rich comparison operators.

We’ve now looked at quite a range of Python’s protocol methods. We haven’t used them all (by any stretch of the imagination), but we’ve covered all the most common ones. The important thing is to know the ones that you’ll use most often and where to look up the rest!

The next section will look at two protocol methods that are a little different. Instead of enabling a specific operation, __getattr__ and friends allow you to customize attribute access.

Dynamic attribute access

Python attribute access uses straightforward syntax, shared with other imperative languages like Java and C#. Through properties, you can control what happens when individual attributes are fetched or set; but, with Python, you can provide access to arbitrary attributes through the __getattr__ method.

The flip side of this is being able to dynamically access attributes when you have their names stored as strings. Python supports this through a set of built-in functions.

Attribute access with built-in functions

The following are four built-in functions available for working with attributes:

  • hasattr(object, name)Checks whether an object has a specified attribute, returning True or False.

  • getattr(object, name)Fetches a named attribute from an object. If the attribute doesn’t exist, then an AttributeError will be raised. getattr also takes an optional third argument. If this is supplied, it’s returned as a default value when the attribute is missing (instead of raising an exception).

  • setattr(object, name, value)Sets the named attribute on an object with the specified value.

  • delattr(object, name)Deletes the named attribute from an object. If the attribute doesn’t exist, then an AttributeError will be raised.

Note

The attribute-access functions invoke the whole Python attribute lookup machinery. Underneath is a dictionary per object[10] that stores attributes. This dictionary is available as __dict__. The predominance of dictionaries in the implementation of Python demonstrates how important the concept of namespaces is to the language. A dictionary is a namespace, mapping names to objects. Classes, instances, and modules are all examples of namespaces and are all implemented using dictionaries (which are exposed through __dict__).

So what are these functions useful for?

hasattr in particular is useful as a duck typing mechanism. If you want to support a particular interface, one possible pattern is it’s better to ask forgiveness than ask permission.

try:
    instance.someMethod()
except AttributeError:
    # different kind of object

Sometimes you want to ask permission, though—perhaps particular objects need different treatment. You could do type checking with isinstance, but this defeats duck typing and requires users of your code to use specific types instead of passing in objects with a compatible API. Instead, you can use hasattr, as follows:

if hasattr(instance, 'someMethod'):
    instance.someMethod()
    # and so on

getattr, setattr, and delattr are particularly useful when you have a list of attributes as strings (potentially as the result of a call to dir) and need to loop over the list performing operations. In a brief while, we’ll use these functions in a proxy class that illustrates the attribute-access protocol methods. Before we can do that, you need to learn about the attribute-access protocol methods themselves.

Attribute access through magic methods

As with built-in functions like len and str, the attribute-access functions have corresponding protocol methods that you can implement. With these functions, the relationship between the protocol method and the corresponding function is a little more complex.

The attribute-access functions invoke the full attribute lookup mechanism for Python objects. Included in this mechanism are three protocol methods that you can implement to govern how some attributes are fetched, set, and deleted.

The three protocol methods (hasattr has no direct equivalent as a protocol method) are as follows:

  • __getattr__(self, name)Provides the named attribute. This method is called only for attributes that aren’t found by the normal lookup machinery.

  • __setattr__(self, name, value)Sets the named attribute to the specified value. This method is invoked for all instance attribute setting.

  • __delattr__(self, name)Deletes the named attribute. This method is invoked for all instance attribute deletion.

As you can see, these methods are asymmetrical. __setattr__ and __delattr__ are invoked for all set and delete operations on an instance, whereas __getattr__ is only invoked if the attribute doesn’t exist. According to the Python documentation,[11] this is for efficiency and to allow __setattr__ to access instance variables.

At Resolver Systems, we’ve created a spreadsheet application with IronPython.[12] This is a spreadsheet for creating complex models or business applications using a spreadsheet interface. IronPython objects that represent the spreadsheet are exposed to the user, and can be manipulated from user code sections. One of our core classes is the Worksheet, representing different sheets in the spreadsheet. The user accesses values in a worksheet by indexing it with the column and the row number. Most spreadsheet users are more used to referring to locations by the A1 style name, so we use __getattr__ to provide a convenient way of accessing values using a syntax like this: worksheet.A1 = 3. We implemented this with code that looks like the following:

def __getattr__(self, name):
    location = CoordinatesFromCellName(name)
    if location is None:
        raise AttributeError(name)
    col, row = location
    return self[col, row]

We also need to implemented __setattr__ to allow users to set values in the worksheet. __setattr__ is called for all attribute-setting operations, so we needed to delegate to object.__setattr__ for all attributes except cells.

def __setattr__(self, name, value):
    location = CoordinatesFromCellName(name)
    if location is not None:
        col, row = location
        self[col, row] = value
    else:
        object.__setattr__(self, name, value)

You can put these methods to work in a proxy class that proxies attribute access from one object to another.

Proxying attribute access

You’ve seen how Python has no true concept of private or protected members. If you come to Python from a language like C#, you’ll probably be surprised at how infrequently you need these features. You can achieve the same effect as private members through a factory function and a proxy class like the one in listing 8.7.

Example 8.7. Attribute protection with a factory function and proxy class

Attribute protection with a factory function and proxy class

You pass in the object you want proxied as the thing argument to GetProxy. You also pass in three lists of attribute names (all optional). These are attributes that you do want to allow access to—for read access, write access, and delete access.

When you call GetProxy, it returns an instance of the Proxy class. This instance has access to thing, and the attribute lists, through the closure; but, from the instance, there’s no way to get back to the original object. Accessing attributes on the Proxy instance triggers __getattr__ (or __setattr__ or __delattr__). If the attribute name is in the corresponding attribute list, then the access is allowed. If the attribute isn’t in the list, then the access is disallowed, and an AttributeError is raised.

This pattern is useful for more than protecting attribute access; it’s a general example of the delegation pattern,[13] but it does have a couple of restrictions. Although it works fine for fetching normal methods, magic methods are looked up on the class rather than the instance. Indexing the object, or any other operation that uses a protocol method, will look for the method on the Proxy class instead of the original instance. Two possible approaches to solving this are as follows:

  • You can provide the magic methods you want on the Proxy class and proxy those as well.

  • Because magic methods are looked up on the class, you can provide a metaclass that has __getattr__ and friends implemented and does the proxying.

Another restriction is that, although attribute access is proxied, the proxy instance has a different type to the object it’s proxying and so can’t necessarily be used in all the same circumstances as the original.

Using the __getattr__ and __setattr__ protocol methods to customize attribute access isn’t something you do every day; but, in the right places, they can be used to create elegant and intuitive APIs.

Something that takes us even deeper into the dynamic aspects of Python is metaprogramming with Python metaclasses. Understanding metaclasses will further deepen your understanding of Python and open up some fun possibilities.

Metaprogramming

Metaprogramming is the programming language or runtime assisting with the writing or modification of your program. The classic example is runtime code generation. In Python and IronPython, this is supported through the exec statement and built-in compile/eval functions. Python source code is text, so generating code using string manipulation and then executing it is relatively easy.

But, code generation has to be relatively deterministic (your code that generates the strings is going to be following a set of rules that you determine), meaning that it’s usually easier to provide objects rather than go through the intermediate step of generating and executing code. An exception is when you generate code from user input, perhaps by translating a domain-specific language into Python. This is the approach taken by the Resolver One spreadsheet, which translates a Python-like formula language into Python expressions.

Python has further support for metaprogramming through something called metaclasses. These allow you to customize class creation—that is, modify classes or perform actions at the point at which they’re defined. Metaclasses have a reputation for being deep black magic.

 

Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don’t.

 
 --Python Guru Tim Peters

They’re seen as magic because they can modify code away from the point at which it appears in your source. In fact, the basic principles are simple to grasp, and no good book on Python would be complete without an introduction to them.

Introduction to metaclasses

In Python, everything is an object. Functions and classes are all first-class objects that can be created at runtime and passed around your code. Every object has a type. For most objects, their type is their class, so what’s the type of a class? The answer is that classes are instances of their metaclass.

Just as objects are created by calling their class with the correct parameters, classes are created by calling their metaclass with certain parameters. The default metaclass (the type of classes) is type. This leads to the following wonderful expression:

type(type) is type

type is itself a class, so its type is type![14]

What does this have to do with metaprogramming? Python is an interpreted language—classes are created at runtime; and, by providing a custom metaclass, you can control what happens when a class is created, including modifying the class.

As usual, the easiest way to explain this is to show it in action. Listing 8.8 shows the simplest possible metaclass.

Example 8.8. The simplest example of a metaclass

The simplest example of a metaclass

This metaclass doesn’t do anything, but illustrates the basics of the metaclass. You set the metaclass on a class by assigning the __metaclass__ attribute inside the class definition. Subclasses automatically inherit the metaclass of their superclass.[15] When the class is created, the metaclass is called with a set of arguments. These are the class name, a tuple of base classes, and a dictionary of all the attributes (including methods) defined in the class. To customize class creation with a metaclass, you need to inherit from type and override the __new__ method.

You can experiment here by defining methods and attributes on SomeClass, and putting print statements in the body of PointlessMetaclass. You’ll see that methods appear as functions in the classDict, keyed by their name. Inside the metaclass, you can modify this dictionary (plus the name and the bases tuple if you want) to change how the class is created. But what can you use metaclasses for?

Uses of metaclasses

Despite their reputation for being deep magic, sometimes only a metaclass can achieve something that’s difficult or impossible to do via other means. They’re invoked at class-creation time, so they can be used to perform operations that would otherwise require a manual step. These operations include[16] the following:

  • Registering classes as they’re created—Often done by database Object-Relational Mapping (ORM) because the classes you create relate to the shape of the database table you’ll interact with. Registering classes as they’re created can also be useful for autoregistering plugin classes.

  • Enabling new coding techniques—Such as enabling a declarative way of declaring database schema, as is the case for the Elixir[17] ORM framework.

  • Providing interface registration—Includes autodiscovery of features and adaptation.

  • Class verification—Prevents subclassing or verifies code quality, such as checking that all methods have docstrings or that classes meet a particular standard.

  • Decorating all the methods in a class—Can be useful for logging, tracing, and profiling purposes.

  • Mixing in appropriate methods without having to use inheritance—Can be one way of avoiding multiple inheritance. You can also load in methods from non-code definitions—for example, by loading XML to create classes.

We can show a practical use of metaclasses with a profiling metaclass. This is an example of the fifth use of metaclasses listed—wrapping every method in a class with a decorator function.

A profiling metaclass

One of the use cases mentioned in the bulleted list is wrapping all methods with a decorator. You can use this for profiling by recording method calls and how long they take. This approach is useful if you’re looking to optimize the performance of your application. With IronPython you always have the option of moving parts of your code into C# to improve performance; but, before you consider this, it’s important to profile so that you know exactly which parts of your code are the bottlenecks—the results are often not what you’d expect. At Resolver, we’ve been through this process many times, and have always managed to improve performance by optimizing our Python code; we haven’t had to drop down into C# to improve speed so far.

For profiling IronPython code you can use the .NET 'DateTime'[18] class.

from System import DateTime
start = DateTime.Now

someFunction()

timeTaken = (DateTime.Now - start).TotalMilliseconds

There’s a drawback to this code. DateTime has a granularity of about 15 milliseconds. For timing individual calls to fast-running code, this can be way too coarse. An alternative is to use a high-performance timer class from System.Diagnostics: the Stopwatch[19] class. Listing 8.9 is the code to time a function call using a Stopwatch.[20]

Example 8.9. Timing a function call with the Stopwatch class

from System.Diagnostics import Stopwatch
s = Stopwatch()
s.Start()

someFunction()

s.Stop()
timeTaken = s.ElapsedMilliseconds

You can wrap this code in a decorator that tracks the number of calls to functions, and how long each call takes (listing 8.10).

Example 8.10. A function decorator that times calls

A function decorator that times calls

The profiler decorator takes a function and returns a wrapped function that times how long the call takes. The times are stored in a cache (the times dictionary), keyed by the function name.

Now you need a metaclass that can apply this decorator to all the methods in a class. You’ll be able to recognize methods by using FunctionType from the Python standard library types module. Listing 8.11 shows a metaclass that does this.

Tip

Don’t forget that, to import from the Python standard library, it needs to be on your path. This happens automatically if you installed IronPython from the msi installer. Otherwise, you need the Python standard library (the easiest way to obtain it is to install the appropriate version of Python) and to set the path to the library in the IRONPYTHONPATH environment variable.

Example 8.11. A profiling metaclass that wraps methods with profiler

A profiling metaclass that wraps methods with profiler

Of course, having created a profiling metaclass, we need to use it. Listing 8.12 shows how to use the ProfilingMetaclass.

Example 8.12. Timing method calls on objects with ProfilingMetaclass

from System.Threading import Thread

class Test(object):

    __metaclass__ = ProfilingMetaclass

    def __init__(self):
        counter = 0
        while counter < 100:
            counter += 1
            self.method()

    def method(self):
        Thread.CurrentThread.Join(20)

t = Test()

for name, calls in times.items():
    print 'Function: %s' % name
    print 'Called: %s times' % len(calls)
    print ('Total time taken: %s seconds' %
               (sum(calls) / 1000.0))
    avg = (sum(calls) / float(len(calls)))
    print 'Max: %sms, Min: %sms, Avg: %sms' % (max(calls), min(calls), avg)

When the Test class is created, all the methods are wrapped with the profiler. When it’s constructed, it calls method 100 hundred times. When the code has run, you print out the entries in the times cache to analyze the results. It should print something like this:

Function: method
Called: 100 times
Total time taken: 2.07 seconds
Max: 39ms, Min: 16ms, Avg: 20.7ms

Function: __init__
Called: 1 times
Total time taken: 2.093 seconds
Max: 2093ms, Min: 2093ms, Avg: 2093.0ms

If this were real code, you’d know how many calls were being made to each method and how long they were taking. With this simple technique, the times cache includes calls to other methods, as well as overhead for the profiling code itself. It’s possible to write more sophisticated profiling code that tracks trees of calls so that you can see how long different branches of your code take.

Metaclasses are magic, but perhaps not as black as they’re painted. This is true of all the Python magic that you’ve been learning about. Python’s magic methods are magic because they’re invoked by the interpreter on your behalf, not because they’re difficult to understand. Using these protocol methods is a normal part of programming in Python. You should now be confident in navigating which methods correspond to language features and how to implement them.

We’ve plunged into the depths of Python, so now it’s time to take a journey around the edges—in particular, the edges where IronPython meets the Common Language Runtime (CLR). Not all the key concepts in the CLR match directly to Python semantics, particularly because it’s a dynamic language. To make full use of framework classes, you need to know a few details.

IronPython and the CLR

So far, we’ve focused on navigating the .NET framework and the Python language, emphasizing the skills you need to make effective use of IronPython. But not all .NET concepts map easily to Python concepts—certain corners of .NET can’t be brought directly into Python. This section looks at some of these dusty corners and how they work in IronPython. Most of the things we explore are simple; but, even if they’re easy, you need to know the trick in order to use them.

The first line of IronPython/CLR integration is the clr module; in several of the corners we look in, you’ll find new clr module functionality. As well as covering some interesting topics, this section will be a useful reference as you encounter these situations in the wild.

The first dusty corner that we peer into is .NET arrays.

.NET arrays

Arrays are one of the standard .NET container types, roughly equivalent to the Python tuple.[23] They’re also statically typed, of course. When you create an array, you specify type and the size. The array can then be populated only with members of the specified type.

The tuple is a built-in Python type; and, being dynamically typed, it’s generally more convenient to use than arrays when you have a choice. When you’re working with .NET APIs, they will sometimes require an array, and you need to know how to create them.

In C#, you can create an array of integers using the syntax in figure 8.1.

Creating an integer array in C#

Figure 8.1. Creating an integer array in C#

In IronPython, you use the following syntax to achieve the same thing:

>>> from System import Array
>>> intArray = Array[int]((1, 2, 3, 4, 5))

This syntax looks like you’re indexing the Array type, so it’s valid Python syntax. This tells IronPython that you want an array of integers. It’s populated from a tuple of the initial members.

This technique isn’t the only way to create arrays. Another way is by using the CreateInstance class method, which can be used to create multidimensional arrays.

CreateInstance and multidimensional arrays

The Array type also has a CreateInstance class method for constructing arrays. Using this method doesn’t require you to prepopulate the array; it will be initialized with a default value for every member. For integer arrays, this will be 0; for reference types, it will be null (None). There are several different ways of calling CreateInstance.[24] The most common are the following:

>>> multiArray = Array.CreateInstance(str, 5)

This code creates an array of strings with five members. The next way creates a two-dimensional array from a type and two integers that specify the size of each dimension.

>>> multiArray = Array.CreateInstance(str, 5, 5)

This creates a two-dimensional array with 5 x 5 members. You can access individual members in this array using another extension to Python syntax.

>>> aMember = multiArray[2, 2]

In normal Python syntax, this would be indexing multiArray with the tuple (2, 2). Instead, you’re accessing the member at location 2, 2. (If you try indexing with a tuple, it will fail.)[25] Because this is valid Python syntax, no changes to the language were necessary to support indexing multidimensional arrays.

You can create arrays with any number of dimensions by passing a type and an integer array specifying the size of each dimension. The number of members of the array determines the number of dimensions. The following code creates a five-dimensional array of objects, with five members in each dimension:

>>> dimensions = Array[int]((5, 5, 5, 5, 5))
>>> multiArray = Array.CreateInstance(object, dimensions)
>>> multiArray[0, 0, 0, 0, 0] = SomeObject()
>>> print multiArray[0, 0, 0, 0, 0]
<SomeObject object at 0x....>

Another way of creating multidimensional arrays is by creating an array of arrays—the so-called jagged array. The following code creates an array with five members; each member is an array of five members:

>>> arrays = tuple(Array[int]((0, 0, 0, 0, 0)) for i in range(5))
>>> multiArray = Array[Array[int]](arrays)
>>> multiArray[0][0]
0

As you can see, this uses slightly different indexing syntax. Indexing the array returns the member at that index, which itself is an array. You then index that to retrieve a specific member.

Arrays have useful methods and properties such as Copy, to copy an array, or Rank, which tells you how many dimensions an array has. If you’re going to be using arrays a lot, it’s worthwhile to familiarize yourself with the array API. The next corner of .NET we shine a light on is overloaded methods.

Overloaded methods

C# allows overloaded methods—the same method being defined several times, but with different types in the parameter list. The compiler can tell at compile time which of the overloads you’re calling by the types of the arguments.

This largely works seamlessly in IronPython. The IronPython runtime uses reflection (the .NET equivalent of introspection using the System.Reflection namespace) to deduce which overload you’re calling. If IronPython gets it wrong, it’s a bug and you should report it on the IronPython mailing list or the issue tracker on the IronPython CodePlex site!

On occasion, you may want direct access to a specific overload. IronPython provides an Overloads collection that you can index by type (or a tuple of types). The IronPython documentation provides the following illustration, choosing the WriteLine method that takes an object and then passing None to it:

from System import Console
Console.WriteLine.Overloads[object](None)

Three more .NET concepts that don’t map easily to Python are the out, ref, and pointer parameters.

out, ref, params, and pointer parameters

Occasionally you’ll find a .NET method that requires an out or ref type parameter, an argument that takes a pointer, or one that uses the params keywords. These concepts don’t exist directly in IronPython, but you can handle them all straightforwardly.

In C#, the body of the method modifies out (output) parameters. In Python, if you passed in an immutable value for this parameter, such as a string or an integer, then in Python the original reference wouldn’t be modified—which would kind of defeat the purpose of having the parameter. This difficulty is easily solved in IronPython. Because the original value of an out parameter isn’t used, the IronPython runtime does some magic, and the updated value becomes an extra return value. Let’s observe this in action with the Dictionary.TryGetValue method.[26] This method takes two parameters: the key you’re searching for and an out parameter modified to contain the corresponding value if it’s found. Normally, it returns a Boolean that indicates whether or not the key was found. The out parameter is modified to contain the value. In IronPython, you get two return values.

>>> from System.Collections.Generic import Dictionary
>>> d = Dictionary[str, str]()
>>> d['key1'] = 'Value 1'
>>> d['key2'] = 'Value 2'
>>> d.TryGetValue('key1')
(True, 'Value 1')
>>> d.TryGetValue('key3')
(False, None)

Passing by reference is similar; in fact, you can often pass by reference instead of using an out parameter. To pass by reference, you need to create a reference using a type provided by the clr module. One example is the class method Array.Resize for resizing arrays. It’s a generic method, so you need to provide type information—making it a great example of how dynamic typing makes life easier! This method takes a reference to an array and the new size as an integer.

Note

clr.Reference isn’t to be confused with clr.References. clr.References is a list of all the assemblies (actual assembly objects) that you’ve added a reference to.

First you construct an array of integers to resize, and then create a reference to that array. clr.Reference is a generic, so you need to tell it the type of the reference you require.

>>> from System import Array
>>> import clr
>>> array = Array[int]((0, 1, 2, 3, 4))
>>> array
Array[int](0, 1, 2, 3, 4)
>>> ref = clr.Reference[Array[int]](array)

Next you call Array.Resize. You need to specify the type of array (again); and, because Resize is a generic method, you need to specify the type of array here as well (otherwise the IronPython runtime won’t be able to find the right target and will report TypeError: no callable targets). The resized array is stored as the Value attribute of the reference.

>>> Array[int].Resize[int](ref, 3)
>>> ref.Value
Array[int]((0, 1, 2))

A pointer is a good old-fashioned kind of reference. You usually come across them when dealing with unmanaged resources, and often they’ll be pointers to integers. Integer pointers are represented with the System.IntPtr structure.

>>> from System import IntPtr
>>> ptr = IntPtr(3478)
>>> ptr.ToInt32()
3478
>>> IntPtr.Zero
<System.IntPtr object at 0x... [0]>

We’ve not encountered this in the wild; but, for completeness, we need to mention another way of passing arguments: the C# params keyword. This is used to pass a variable number of arguments (like the Python *args syntax), and the method receives a collection of the arguments you pass. In IronPython, .NET APIs that use the params keyword are handled seamlessly—you pass arguments as normal.

>>> something.Method(1, 2, 3, 4)

Another area of .NET that can potentially cause confusion is the peculiarity of value types.

Value types

The .NET framework makes a distinction between value types and reference types. The technical distinction is that value types are allocated on the stack or inline within an object, whereas reference types are allocated on the heap. Value types are passed by value (rather than by reference—hence, the names) making them efficient to use.[27] Table 8.3 lists the value and reference types.

Table 8.3. .NET value and reference types

Value types

Reference types

All numeric datatypes (including Byte)

Strings

Boolean, Char, and Date

Arrays (even arrays of value types)

All structures, even if members are reference types

Class types (like Form and Object)

Enumerations (because the underlying type is numeric)

Delegates

In .NET land, you can pass a value type to methods expecting a reference type. Take, for example, this overload of Console.WriteLine:

Console.WriteLine(String, Object)

This takes a format string and an object, and it writes the string representation of the object to the standard output stream. The declaration of Object as the second argument means that this method can take any arbitrary object, but also means that it’s expecting a reference type (because Object is a reference type). If you call this method with a value type, like an integer, then the CLR will automatically box and unbox the value so that it can be used in place of a reference type.

The boxing and unboxing can lead to certain difficulties, particularly because Python doesn’t have value types and the programmer will expect everything to behave as a reference.

Take this code that uses Point, which, as a structure, is a value type:

>>> from System import Array
>>> from System.Drawing import Point
>>> point = Point(0, 0)
>>> array = Array[Point]((point,))
>>> array[0].X = 30
>>> array[0].X
0

This code creates a new Point object (point) and stores it in an array (array). When you fetch the point by indexing the array, you end up setting the X member on an unboxed value—probably not on the object you expected. This problem isn’t restricted to Python; if you re-create the code in C#, the same thing happens.

The situation is worse if you attempt to modify a value type exposed as a field (attribute, in Python speak) on an object. Referencing the value type will return a copy, and the update would be lost. In these cases, the IronPython designers have decided[28] that the best thing to do is to raise a ValueError rather than letting the unexpected behavior slip through. As a result, value types are mostly immutable,[29] although updates will still be possible by methods exposed on the value types themselves.

Interfaces

Interfaces are a way of specifying that certain types have known behavior. An interface defines functionality, and the methods and properties for that functionality.

Where a class implements an interface, that interface will be added to the method resolution order[30] when you use it from Python (effectively making the interface behave like a base class—coincidentally you can also implement an interface in a Python class by inheriting from it). Even if an explicitly implemented interface method is marked as private on a class, you’ll still be able to call it.

As well as calling the method directly, you can call the method on the interface—passing in the instance as the first argument in the same way you pass the instance (self) as the first argument when calling an unbound method on a class.[31]

The following snippet shows an example of calling BeginInit on a DataGridView, through the ISupportInitialize interface that it implements:

>>> import clr
>>> clr.AddReference('System.Windows.Forms')
>>> from System.Windows.Forms import DataGridView
>>> from System.ComponentModel import ISupportInitialize
>>> grid = DataGridView()
>>> ISupportInitialize.BeginInit(grid)

How is this useful? Well, it’s possible for a class to implement two interfaces with conflicting member names. This technique allows you to call a specific interface method.

This workaround is fine for methods, but you don’t normally pass in arguments to invoke properties. Instead, you can use GetValue and SetValue on the interface property descriptor.

ISomeInterface.PropertyName.GetValue(instance)
ISomeInterface.PropertyName.SetValue(instance, value)

Note

GetValue and SetValue aren’t available only for property descriptors; they’re also available for instance fields if you access them directly on the class in IronPython.

Working with interfaces is mostly straightforward. Unfortunately, when it comes to .NET attributes, it’s a much sorrier tale.

Attributes

Attributes in the .NET framework are a bit like decorators in Python and annotations in Java. They’re metadata applied to objects that provide information to the CLR. They can be applied to classes, methods, properties, and even method parameters or return values (that is, pretty much everywhere).

Attributes are used to provide documentation, specify runtime information, and even specify behavior. The DllImport attribute provides access to unmanaged code; and, in Silverlight, you use the Scriptable attribute to expose objects to JavaScript. The following code segment uses the DllImport attribute to expose functions user32.dll as static methods on a class in C#:

[DllImport("user32.dll")]
public static extern IntPtr GetDesktopWindow();

[DllImport("user32.dll")]
public static extern IntPtr GetTopWindow(IntPtr hWnd);

Because attributes are for the compiler, they’re used at compile time—a problem for IronPython. First, there’s no syntax in Python that maps to attributes. The IronPython team is keen to avoid introducing syntax to IronPython that isn’t backwards compatible with Python.

A worse problem is that IronPython classes aren’t true .NET classes. There are various reasons for this, among them that .NET classes aren’t garbage collected and that you can swap out the type of Python classes at runtime! The upshot is that you can’t apply attributes to Python classes.

The most common solution is to write stub C# classes and then subclass from IronPython. You can even generate and compile these classes at runtime. We use both of these techniques later in the book. This isn’t the end of the story, though—the IronPython team is still keen to find a solution to this problem that will allow you to use attributes from IronPython.

Static compilation of IronPython code

A feature of IronPython 1 that nearly didn’t make it into IronPython 2 is the ability to compile Python code into binary assemblies. Fortunately, this feature made a comeback in IronPython 2 beta 4.

IronPython works by compiling Python code to assemblies in memory. If you execute a Python script using the IronPython ipy.exe executable, then you can dump these assemblies to disk by launching it with the -X:SaveAssemblies command-line arguments. They’re not executable on their own though; they’re useful for debugging, but they make calls into dynamic call sites[32] generated at runtime when running IronPython scripts.

The clr module includes a function that can save executable assemblies: CompileModules (IronPython 2 only). Its call signature is as follows:

clr.CompileModules(assemblyName, *filenames, **kwArgs)

This function compiles Python files into assemblies that you can add references to and import from in the normal way with IronPython. This feature allows you to package several Python modules as a single assembly. The following snippet of code takes two modules and outputs a single assembly:

import clr
clr.CompileModules("modules.dll", "module.py", "module2.py")

Having created this assembly, you can add a reference to it and then import the module and module2 namespaces from it.

import clr
clr.AddReference('modules.dll')
import module, module2

CompileModules also takes a keyword argument, mainModule, that allows you to specify the Python file that acts as the entry point for your application. This still outputs to a dll rather than an exe file, but it can be combined with a stub executable to create binary distributions of Python applications. Creating a stub executable can be automated with some fiddly use of the .NET Reflection.Emit API, but it’s far simpler to use the IronPython Pyc[33] sample. Pyc is a command-line tool that creates console or Windows executables from Python files. It comes with good documentation, plus a command-line help switch. The basic usage pattern is as follows:

ipy.exe pyc.py /out:program.exe /main:main.py /target:winexe module.py module2.py

In this chapter, we’ve climbed under the hood of both the Python language and how IronPython interacts with the .NET framework. It’s time to summarize and move on to the next chapter.

Summary

Python uses protocols to provide features, whereas a language like C# uses interfaces. In addition, a lot of the more dynamic language features of Python, such as customizing attribute access, can be done by implementing protocol methods. You’ll find that you use some of these magic methods a great deal. Providing equality and comparison methods on classes is something you’ll do frequently when writing Python code. Other protocols, including metaclasses, you’ll use only rarely. Even if you don’t use them directly, understanding the mechanisms at work deepens your understanding of Python.

We’ve also worked with some of the areas of the .NET framework that don’t easily map to Python. IronPython exposes objects as Python objects; but, in certain places, this is a leaky abstraction. You should now have a clearer understanding of what’s going on under the hood with IronPython, especially in the distinction between value and reference types. This understanding is vital when the abstraction breaks down and you need to know exactly what’s happening beneath the surface. The specific information we’ve covered is vital for any non-trivial interaction with .NET classes and objects.

In the next chapter, we use some new features added to the .NET framework in .NET 3.0, including the new user interface library, the Windows Presentation Foundation (WPF). Although younger than Windows Forms, WPF is capable of creating attractive and flexible user interfaces in ways not possible with its older cousin.



[3] For a complete list of these methods, including some of the less common ones, see http://docs.python.org/ref/numeric-types.html.

[5] This allows an object to be its own iterator.

[6] A PEP is a Python Enhancement Proposal, and they’re the way new features are proposed for Python. PEP 342 (now part of Python 2.5) lives at http://www.python.org/dev/peps/pep-0342/.

[8] This page in the Python documentation lists (among other things) the protocol methods for rich comparison: http://docs.python.org/ref/customization.html.

[9] See the following article on rich comparison in CPython and IronPython, which also shows a simple Mixin class to easily provide rich comparison just by implementing __eq__ and __lt__: http://www.voidspace.org.uk/python/articles/comparison.shtml.

[10] For creating massive numbers of objects, you can avoid the overhead of a dictionary per object, by using a memory optimisation mechanism called __slots__ for storing named members only.

[14] This isn’t true in IronPython 1—which is a bug that has been fixed in IronPython 2.

[15] Meaning that you can’t have incompatible metaclasses where you have multiple inheritance.

[16] This borrows some of the use cases for metaclasses from an excellent presentation by Mike C. Fletcher. See http://www.vrplumber.com/programming/metaclasses-pycon.pdf.

[17] Elixir is a declarative layer built on top of the popular Python ORM SQLAlchemy. See http://elixir.ematia.de/trac/wiki.

[19] See http://msdn2.microsoft.com/en-us/library/system.diagnostics.stopwatch(vs.80).aspx. Note that, on multicore AMD processors, the StopWatch can sometimes return negative numbers (appearing to travel backwards in time!). The solution is to apply this fix: http://support.microsoft.com/?id=896256.

[20] The Python standard library time.clock() is implemented using StopWatch under the hood.

[21] Which is hardly surprising—the Python built-in types are written in C and have been fine-tuned over the last fifteen years or more. Their implementation in IronPython is barely a handful of years old.

[22] If you’re using .NET 3.0, you can use the Action and Func delegates instead.

[23] Although arrays allow you to change their contents (they’re mutable). Like tuples, they have a fixed size.

[24] For a list of the different ways to call CreateInstance, see http://msdn2.microsoft.com/en-us/library/system.array.createinstance.aspx.

[25] In his technical review, Dino noted: “I think this is an IronPython bug—or at least a small deficiency in the way we map array to Python.”

[26] This is the example used by Haibo Luo in a blog entry. See http://blogs.msdn.com/haibo_luo/archive/2007/10/04/5284947.aspx.

[27] Well, large value types, such as big structs, can be expensive to copy—another reason to be aware of what’s happening under the hood.

[29] In fact, the CLR API design guide says Do not create mutable value types.

[30] Accessible through the __mro__ attribute on classes. In pure-Python code, it specifies the lookup order for finding methods where there’s multiple inheritance. For a comprehensive reference, see http://www.python.org/download/releases/2.3/mro/.

[31] You can also call methods on the base classes of .NET classes, by passing in the instance as the first argument. This can also be useful when a method obscures a method on a base class making it impossible to call directly from the instance.

[32] An implementation detail of the Dynamic Language Runtime.

[33] Available for download from the CodePlex IronPython site.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.194.230