© Jacob Zimmerman 2018
Jacob Zimmerman, Python Descriptors, https://doi.org/10.1007/978-1-4842-3727-4_6

6. Which Methods Are Needed?

Jacob Zimmerman (New York, USA)

When designing a descriptor, you must decide which methods it will include. Sometimes it helps to decide right away whether the descriptor should be a data or non-data descriptor. Other times it works better to “discover” which kind of descriptor it is as you go.

__delete__() is rarely needed, even for a data descriptor. That doesn’t mean it should never be included, however. If the descriptor is going to be released publicly, it doesn’t hurt to give a data descriptor a __delete__() method simply for completeness, in case a user decides to call del on the attribute. If you don’t, an AttributeError will be raised when someone tries to delete it.
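
A minimal sketch of the difference, using a hypothetical Positive descriptor that validates values and stores them in the instance __dict__. It assumes Python 3.6+ so that the __set_name__() hook can supply the attribute name. Only the subclass defines __delete__(), so only its attribute can be removed with del.

```python
class Positive:
    """Hypothetical data descriptor with __get__() and __set__() only."""

    def __set_name__(self, owner, name):
        # Assumption: Python 3.6+ calls this hook with the attribute name.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        instance.__dict__[self.name] = value


class DeletablePositive(Positive):
    """The same descriptor plus __delete__(), added for completeness."""

    def __delete__(self, instance):
        # Remove the stored value so later lookups raise AttributeError.
        try:
            del instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None


class Order:
    quantity = Positive()
    price = DeletablePositive()


order = Order()
order.quantity = 2
order.price = 10

quantity_delete_failed = False
try:
    del order.quantity      # no __delete__(): Python raises AttributeError
except AttributeError:
    quantity_delete_failed = True

del order.price             # __delete__() removes the stored value
```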

__get__() is almost always needed for both data and non-data descriptors. It is required for non-data descriptors. The typical case where a data descriptor doesn’t need __get__() is when __set__() assigns the data into the instance dictionary under the same name as the descriptor (what I call set-it-and-forget-it descriptors); since a descriptor without __get__() doesn’t intercept lookups, the value in the instance dictionary is found directly. Otherwise, __get__() is almost always wanted for retrieving the data stored by a data descriptor, so unless the data is assigned to the instance to be retrieved automatically or the data is write-only, a __get__() method is necessary. Keep in mind that if a descriptor doesn’t have a __get__() method and the instance doesn’t have anything in __dict__ under the same name as the descriptor, the descriptor object itself will be returned.
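
Here is a small sketch of a set-it-and-forget-it descriptor. SetItAndForgetIt and Point are hypothetical names, and the __set_name__() hook used to learn the attribute name assumes Python 3.6+. It shows both behaviors described above: before anything is stored, the lookup returns the descriptor object itself; afterward, the value in the instance __dict__ is returned without any __get__().

```python
class SetItAndForgetIt:
    """Hypothetical data descriptor with __set__() but no __get__()."""

    def __set_name__(self, owner, name):
        # Assumption: Python 3.6+ supplies the attribute name here.
        self.name = name

    def __set__(self, instance, value):
        # Validate once, then store under the descriptor's own name.
        if not isinstance(value, int):
            raise TypeError(f"{self.name} must be an int")
        instance.__dict__[self.name] = value


class Point:
    x = SetItAndForgetIt()


p = Point()
before = p.x    # nothing in p.__dict__ yet: the descriptor object comes back
p.x = 3         # assignment is routed through __set__()
after = p.x     # no __get__(), so the value in p.__dict__ is returned
```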

Just like __delete__(), __set__() is only used by data descriptors. Unlike __delete__(), however, __set__() is rarely optional. Since __delete__() goes unused in the most common cases, __set__() is nearly a requirement for creating data descriptors (which need either __set__() or __delete__()). If the descriptor’s status as data or non-data is being “discovered,” __set__() is usually the deciding factor. Even if the data is meant to be read-only, __set__() should be included to raise an AttributeError, enforcing the read-only nature. Otherwise, the descriptor is a non-data descriptor, and assigning to the attribute simply stores a value in the instance’s __dict__ that shadows the descriptor.
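
To illustrate, here is a sketch comparing two hypothetical descriptors that both hand out a fixed value. ReadOnly defines a __set__() that only raises AttributeError; NonDataConstant leaves __set__() out, so an assignment quietly lands in the instance __dict__ and hides the descriptor for that instance.

```python
class ReadOnly:
    """Hypothetical data descriptor: __set__() exists only to refuse writes."""

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner):
        return self.value

    def __set__(self, instance, value):
        raise AttributeError("can't set read-only attribute")


class NonDataConstant:
    """The same idea without __set__(): a non-data descriptor."""

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner):
        return self.value


class Config:
    version = ReadOnly(1)
    label = NonDataConstant("default")


c = Config()
c.label = "changed"     # no __set__(): stored in c.__dict__, shadowing the descriptor

version_write_blocked = False
try:
    c.version = 2       # __set__() enforces the read-only nature
except AttributeError:
    version_write_blocked = True
```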

When __get__() Is Called Without an instance Argument

A descriptor’s __get__() method is often its most complicated method, because it can be called in two different ways: with or without an instance argument (where “without” means that None is passed in place of an instance).
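
A quick way to see both calling conventions is a hypothetical Trace descriptor that records the arguments each __get__() call receives. Class-level access passes None as the instance; instance-level access passes the instance. The owner is the class in both cases.

```python
class Trace:
    """Hypothetical descriptor that records how __get__() was called."""

    def __init__(self):
        self.calls = []

    def __get__(self, instance, owner):
        self.calls.append((instance, owner))
        return "value"


class Demo:
    attr = Trace()


obj = Demo()
Demo.attr       # class access: __get__(None, Demo)
obj.attr        # instance access: __get__(obj, Demo)
calls = Demo.__dict__["attr"].calls
```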

When the descriptor is a class-level descriptor (usually non-data), implementing __get__() without using instance is trivial, since that’s the intended use. But when a descriptor is meant for instance-level use and is accessed from the class instead, it can be difficult to figure out what to do.

Here, I present a few options.

Raise Exception or Return self

The first thing that may come to mind is to raise an exception, since class-level access is not intended, but this should be avoided. A common programming style in Python is called EAFP, which stands for “it’s easier to ask for forgiveness than permission.” In this context, it means that just because something isn’t used as intended doesn’t mean that usage should be disallowed. If the use would break invariants and cause problems, it’s fine to disallow it by raising an exception; otherwise, there are better options to consider. The conventional solution is simply to return self. If the descriptor is being accessed from the class level, the user likely realizes that it’s a descriptor and wants to work with it directly. Doing so can be a sign of inappropriate use, but Python gives its users freedom, and descriptor authors should too, up to a point. For example, the built-in property returns self (the property object) when accessed from the class. From what I’ve seen, this is by far the most common approach.
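
A minimal sketch of the return-self convention, using a hypothetical Stored descriptor (its __set_name__() hook assumes Python 3.6+). Widget.size gives back the descriptor object, just as Widget.area gives back the property object.

```python
class Stored:
    """Hypothetical data descriptor that returns itself on class access."""

    def __set_name__(self, owner, name):
        # Assumption: Python 3.6+ supplies the attribute name here.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self          # class-level access: hand back the descriptor
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Widget:
    size = Stored()

    @property
    def area(self):
        return self.size ** 2


w = Widget()
w.size = 4
```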

“Unbound” Attributes

Another solution, used by methods, is to return an “unbound” version of the attribute. When a function is accessed from the class level, the function’s __get__() detects that it has no instance and simply returns the function itself. In Python 2, it returned an “unbound method” object, which is where the name I use comes from. In Python 3, this was changed to return the plain function, since that’s exactly what it is anyway.
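
The following sketch, with a hypothetical Greeter class, shows this Python 3 behavior: class-level access yields the plain function stored in the class __dict__, which can then be called with an instance explicitly, while instance-level access yields a bound method.

```python
import types


class Greeter:
    def greet(self):
        return "hello from " + type(self).__name__


g = Greeter()
plain = Greeter.greet      # class access: the function object itself
bound = g.greet            # instance access: a bound method
```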

This can work for non-callable attributes as well. It’s a little strange, since it turns the attribute into a callable that must receive an instance to return the value. In effect, it becomes a dedicated attribute-lookup function, akin to len() and iter(), where you just pass in the instance to receive the wanted value.

Here is a stripped-down __get__() implementation that works this way.
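def __get__(self, instance, owner):
    if instance is None:
        # Accessed from the class: return a function that takes an instance
        def unboundattr(inst):
            return self.__get__(inst, owner)
        return unboundattr
    else:
        ...  # the normal instance-level lookup goes here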
When called, the inner unboundattr() function will end up using the else branch of the __get__() method (assuming the caller doesn’t pass in None). Inner functions can sometimes be confusing, and typing that whole thing every time is a little annoying, so here’s a reusable class that can be used by any descriptor:
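class UnboundAttribute:
    """Callable returned by a descriptor's __get__() on class-level access."""
    def __init__(self, descriptor, owner):
        self.descriptor = descriptor
        self.owner = owner
    def __call__(self, instance):
        # Look up the attribute on the given instance through the descriptor
        return self.descriptor.__get__(instance, self.owner)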
Using this class, a __get__() method that uses unbound attributes can be implemented like this:
def __get__(self, instance, owner):
    if instance is None:
        return UnboundAttribute(self, owner)
    else:
        ...  # the normal instance-level lookup goes here

The original version relies on closures over self and owner, which means the only way to reuse it is by copying and pasting. The class, on the other hand, takes those two values in its constructor and stores them on a new instance. It’s also kind of nice that if you print the unbound attribute object, it says that it’s an UnboundAttribute. (The same goes for your own version of the class, which can be even more informative if it takes in some handy metadata, like the name of the attribute being accessed. More on how to get that name in the next chapter.)
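
As one possible way to make the printed form more informative, the sketch below gives UnboundAttribute a __repr__() that names the owner class and the attribute. The Field and Account classes are hypothetical, and Field learns its name through __set_name__() (Python 3.6+); that hook is an assumption here, not necessarily the technique the next chapter uses.

```python
class UnboundAttribute:
    def __init__(self, descriptor, owner):
        self.descriptor = descriptor
        self.owner = owner

    def __call__(self, instance):
        return self.descriptor.__get__(instance, self.owner)

    def __repr__(self):
        # Hypothetical addition: name the owner class and the attribute.
        name = getattr(self.descriptor, "name", "?")
        return f"<unbound attribute {self.owner.__name__}.{name}>"


class Field:
    """Hypothetical data descriptor storing values in the instance __dict__."""

    def __set_name__(self, owner, name):
        # Assumption: Python 3.6+ supplies the attribute name here.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return UnboundAttribute(self, owner)
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Account:
    balance = Field()


acct = Account()
acct.balance = 100
getter = Account.balance
```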

The really interesting (and useful) thing about this technique is that the unbound attribute can be passed into a higher-order function that takes a function, such as map(). This avoids having to write a getter method or an ugly lambda. For example, given a class like this:
class Class:
    attr = UnbindableDescriptor()
A map() call over a list of Class objects like this:
result = map(lambda c: c.attr, aList)
could be replaced with this:
result = map(Class.attr, aList)

Instead of passing in a lambda to access the attribute on each Class instance, Class.attr is passed in. It returns the “unbound” version of the attribute: a function that receives an instance and looks up the attribute through the descriptor. In essence, the descriptor provides an implicit getter function for the attribute.
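
Here is a runnable version of that comparison. UnbindableDescriptor is only named in the text, so the implementation below is a hypothetical sketch: it stores values in the instance __dict__, returns an UnboundAttribute on class access, and uses __set_name__() (Python 3.6+) to learn its name.

```python
class UnboundAttribute:
    def __init__(self, descriptor, owner):
        self.descriptor = descriptor
        self.owner = owner

    def __call__(self, instance):
        # Delegate to the descriptor's instance-level branch.
        return self.descriptor.__get__(instance, self.owner)


class UnbindableDescriptor:
    """Hypothetical data descriptor that is 'unbound' on class access."""

    def __set_name__(self, owner, name):
        # Assumption: Python 3.6+ supplies the attribute name here.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return UnboundAttribute(self, owner)
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Class:
    attr = UnbindableDescriptor()

    def __init__(self, attr):
        self.attr = attr


aList = [Class(1), Class(2), Class(3)]
with_lambda = list(map(lambda c: c.attr, aList))
with_unbound = list(map(Class.attr, aList))
```

Both calls produce the same list; the second simply lets the descriptor supply the getter.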

This is a very useful technique for implementing a descriptor’s __get__() method, but it has one major drawback: returning self is so prevalent that doing anything else is highly unexpected. Hopefully, this idea gains some traction in the community and becomes the new standard. Also, as we’ll see in the upcoming chapter on read-only descriptors, there may need to be a way to access the descriptor object itself. Luckily, all you need to do is get the descriptor attribute from the returned UnboundAttribute.

Even though it’s not the expected behavior, the built-in function descriptor already does this, so it shouldn’t be too difficult for users to get used to it. People already expect “unbound method” functions when accessing methods from the class level, so applying the convention to other attributes shouldn’t be a huge stretch.

Since writing the first edition of this book, I have discovered that the standard library has a function for creating unbound attributes, and it’s more useful than UnboundAttribute in some important ways. The operator module has a function called attrgetter() that takes the string name of an attribute and returns a callable that takes an instance and looks up that attribute on it; the documentation describes it as equivalent to calling getattr() on the instance with that name. It also supports being passed multiple attribute names, in which case the result is a tuple of all those attributes on the instance.
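
For example, with a simple hypothetical Point class, a single name gives back one value per instance, while several names give back a tuple per instance:

```python
from operator import attrgetter


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


points = [Point(1, 2), Point(3, 4)]
get_x = attrgetter("x")          # one name: returns that attribute
get_xy = attrgetter("x", "y")    # several names: returns a tuple
xs = list(map(get_x, points))     # [1, 3]
pairs = list(map(get_xy, points))  # [(1, 2), (3, 4)]
```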

There are several significant benefits to this over descriptor-based unbound attributes (not even counting the support for multiple attributes). The first is better support for inheritance. If a subclass overrides the descriptor with a different one, but the superclass’s unbound attribute is the one passed around, calling it will use the superclass descriptor even on subclass instances, which removes the awesome dynamic nature of inheritance. attrgetter() looks the name up on each instance, so it always finds the most-derived version. For this very same reason, unless you’re absolutely sure that the class you’re using doesn’t have any subclasses, you should look methods up by name as well: attrgetter() gives you the bound method, and its sibling in the operator module, methodcaller(), calls it for you.
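
The sketch below makes the inheritance difference concrete. Constant, Base, and Derived are hypothetical; Constant uses the UnboundAttribute class from earlier in the chapter. Mapping Base.kind over a mixed list always consults Base’s descriptor, while attrgetter("kind") respects the override. Methods behave the same way, and operator.methodcaller() is the name-based tool that actually calls them.

```python
from operator import attrgetter, methodcaller


class UnboundAttribute:
    def __init__(self, descriptor, owner):
        self.descriptor = descriptor
        self.owner = owner

    def __call__(self, instance):
        return self.descriptor.__get__(instance, self.owner)


class Constant:
    """Hypothetical non-data descriptor returning a fixed value."""

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner):
        if instance is None:
            return UnboundAttribute(self, owner)
        return self.value


class Base:
    kind = Constant("base")

    def describe(self):
        return "Base.describe"


class Derived(Base):
    kind = Constant("derived")   # overrides the superclass descriptor

    def describe(self):
        return "Derived.describe"


objs = [Base(), Derived()]
via_descriptor = list(map(Base.kind, objs))               # ['base', 'base']
via_name = list(map(attrgetter("kind"), objs))            # ['base', 'derived']
methods_by_class = list(map(Base.describe, objs))         # ['Base.describe', 'Base.describe']
methods_by_name = list(map(methodcaller("describe"), objs))  # ['Base.describe', 'Derived.describe']
```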

Descriptor-based unbound attributes can offer the same level of inheritance support, but it takes more work. First, you need the name of the attribute, which isn’t always easy to get; again, methods for doing so are in the next chapter. After that, the changes are pretty simple: change __call__() to use getattr() instead of descriptor.__get__(). This eliminates the need for the descriptor and owner attributes, though you should keep descriptor so that someone can still look up the descriptor, as mentioned earlier. Sadly, I don’t see any practical way of supporting multiple attributes this way.
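
A minimal sketch of that upgrade, assuming the descriptor learns its name through the __set_name__() hook (Python 3.6+); that is one way to get the name, not necessarily the one the next chapter recommends. The owner is dropped, the descriptor is kept for anyone who needs it, and __call__() looks the name up with getattr(), so the subclass override is honored. Constant, Base, and Derived are hypothetical.

```python
class UnboundAttribute:
    """Name-based variant: looks the attribute up on each instance."""

    def __init__(self, descriptor, name):
        self.descriptor = descriptor   # kept so callers can still reach it
        self.name = name

    def __call__(self, instance):
        # getattr() finds the most-derived version of the attribute.
        return getattr(instance, self.name)


class Constant:
    """Hypothetical non-data descriptor returning a fixed value."""

    def __init__(self, value):
        self.value = value

    def __set_name__(self, owner, name):
        # Assumption: Python 3.6+ supplies the attribute name here.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return UnboundAttribute(self, self.name)
        return self.value


class Base:
    kind = Constant("base")


class Derived(Base):
    kind = Constant("derived")


objs = [Base(), Derived()]
kinds = list(map(Base.kind, objs))   # ['base', 'derived']
```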

The second major benefit is that it works for all kinds of attributes, not just methods or descriptor-based ones.

There are a few downsides to attrgetter(), though. First, and maybe most obvious, is the lack of code completion help. You’re passing in the string name of an attribute, which means whatever editor you’re using is not going to help you avoid misspelling it. Second, it loses a little bit of context. When you write the class name, you include the context that the attribute name applies to, whereas attrgetter() includes only the name of the attribute.
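
A related consequence of passing a string: a misspelled name is accepted when the getter is created and only fails when it is called, whereas a misspelled Class.attr fails as soon as the expression is evaluated. A small sketch with a hypothetical Point class:

```python
from operator import attrgetter


class Point:
    def __init__(self, x):
        self.x = x


get_x = attrgetter("xx")      # typo: accepted without complaint

failed_at_call_time = False
try:
    get_x(Point(1))           # the mistake only surfaces here
except AttributeError:
    failed_at_call_time = True
```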

If you make these upgrades to UnboundAttribute, I still fully support using it. But it is certainly good to know when to use attrgetter() instead.

Summary

We’ve looked at how to decide which methods a descriptor needs, and at the options for __get__() when it is called without an instance, including unbound attributes. In the next chapter, we’ll dig into even more design decisions that have to be made, at least when it comes to storing values with descriptors.
