Underscores in Python

Python has several conventions and implementation details that involve underscores, and they are worth analyzing.

As we mentioned previously, by default all attributes of an object are public. Consider the following example:

>>> class Connector:
...     def __init__(self, source):
...         self.source = source
...         self._timeout = 60
...
>>> conn = Connector("postgresql://localhost")
>>> conn.source
'postgresql://localhost'
>>> conn._timeout
60
>>> conn.__dict__
{'source': 'postgresql://localhost', '_timeout': 60}

Here, a Connector object is created with source, and it starts with two attributes: the aforementioned source and _timeout. The former is public, and the latter is private by convention. However, as the session above shows, once the object is created we can actually access both of them.

The interpretation of this code is that _timeout should be accessed only within Connector itself and never from a caller. This means that you should organize the code so that you can safely refactor the timeout whenever needed, relying on the fact that it is never accessed from outside the object (only internally), so the interface stays the same as before. Complying with these rules makes the code easier to maintain and more robust, because as long as we maintain the interface of the object we don't have to worry about ripple effects when refactoring. The same principle applies to methods as well.
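
As a rough illustration of how the same convention applies to methods, consider the sketch below. The helper `_build_message` is a hypothetical name introduced only for this example, and `connect` returns a string instead of printing so that the result is easy to inspect; neither detail comes from the original Connector.

```python
class Connector:
    """Illustrative connector: only `source` and `connect()` are public."""

    def __init__(self, source):
        self.source = source   # public: part of the interface
        self._timeout = 60     # internal: callers should not rely on it

    def _build_message(self):
        # Internal helper (hypothetical); free to be renamed or removed
        # during a refactor, since no caller is supposed to use it.
        return "connecting to {0} with {1}s".format(self.source, self._timeout)

    def connect(self):
        # Public method: the only behavior that callers depend on.
        return self._build_message()


conn = Connector("postgresql://localhost")
message = conn.connect()

# Names without a leading underscore make up the public attributes.
public_names = [name for name in vars(conn) if not name.startswith("_")]
```

If `_build_message` or `_timeout` were later renamed, code that only uses `source` and `connect()` would keep working unchanged.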

Objects should expose only those attributes and methods that are relevant to an external caller, namely, those that make up their interface. Everything that is not strictly part of an object's interface should be prefixed with a single underscore.

This is the Pythonic way of clearly delimiting the interface of an object. There is, however, a common misconception that some attributes and methods can actually be made private. Let's imagine that the timeout attribute is now defined with a double underscore instead:

>>> class Connector:
...     def __init__(self, source):
...         self.source = source
...         self.__timeout = 60
...
...     def connect(self):
...         print("connecting with {0}s".format(self.__timeout))
...         # ...
...
>>> conn = Connector("postgresql://localhost")
>>> conn.connect()
connecting with 60s
>>> conn.__timeout
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Connector' object has no attribute '__timeout'

Some developers use this method to hide some attributes, thinking, as in this example, that __timeout is now private and that no other object can modify it. Now, take a look at the exception that is raised when trying to access __timeout. It's AttributeError, saying that the attribute doesn't exist. It doesn't say something like "this is private" or "this can't be accessed". It says it does not exist. This should give us a clue that something different is happening, and that the apparent hiding is just a side effect rather than the effect we actually want.

What's actually happening is that with the double underscores, Python creates a different name for the attribute (this is called name mangling). Inside the class body, it creates the attribute with the following name instead: "_<class-name>__<attribute-name>". In this case, an attribute named '_Connector__timeout' is created, and such an attribute can be accessed (and modified) as follows:

>>> vars(conn)
{'source': 'postgresql://localhost', '_Connector__timeout': 60}
>>> conn._Connector__timeout
60
>>> conn._Connector__timeout = 30
>>> conn.connect()
connecting with 30s

Notice the side effect that we mentioned earlier: the attribute only exists under a different name, and for that reason the AttributeError was raised on our first attempt to access it.

The idea of the double underscore in Python is completely different. It was created as a means to override different methods of a class that is going to be extended several times, without the risk of collisions between the method names. Even that use case is too far-fetched to justify the use of this mechanism.
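
To make the collision-avoidance idea concrete, here is a minimal sketch. The classes `Base` and `Child` and the attribute `__token` are hypothetical names chosen only for illustration. It shows that each class gets its own mangled name, so the subclass does not overwrite the parent's attribute.

```python
class Base:
    def __init__(self):
        self.__token = "base"    # stored on the instance as _Base__token

    def base_token(self):
        return self.__token      # compiled as self._Base__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"   # stored as _Child__token, so no clash

    def child_token(self):
        return self.__token      # compiled as self._Child__token


obj = Child()
names = sorted(vars(obj))        # both mangled names live on the instance
```

Both values survive side by side on the same instance, which is the only effect the mechanism provides; nothing here is actually hidden from callers.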

Double underscores are a non-Pythonic approach. If you need to define attributes as private, use a single underscore, and respect the Pythonic convention that it is a private attribute.

Do not use double underscores.