Appendix D. Python 3 Migration with 2.6+

We keep the language evolving... [we] need to move forward or die.

—Yukihiro “Matz” Matsumoto
(Image), September 2008
(verbally at Lone Star Ruby conference;
the actual quote Guido was referring to)

D.1. Python 3: The Next Generation

Python is currently undergoing its most significant transformation since it was first released back in the winter of 1991. Python 3 is backward incompatible with all older versions, so porting will be a more significant issue than in the past.

Unlike other end-of-life efforts, however, Python 2.x will not disappear anytime soon. In fact, the remainder of the version 2.x series will be developed in parallel with 3.x, thereby ensuring a smooth transition from the current to next generation. Python 2.6 is the first of these final version 2.x releases.

This document reinforces material covered Appendix C, “Python 3: The Evolution of a Programming Language,” but goes into more detail when appropriate.

D.1.1. Hybrid 2.6+ as Transition Tool

Python 2.6 and remaining 2.x releases are hybrid interpreters. This means that they can run a considerable amount of version 1.x code, all version 2.x software, and can even run a limited amount of 3.x (code that is native version 3.x but made available in 2.6+). Some will argue that Python releases dating back to version 2.2 have already been mixed interpreters because they support creation of both classic classes as well as new-style classes, but that is as far as they go.

The version 2.6 release is the first version with specific version 3.x features backported to it. The most significant of these features are summarized here:

• Integers

– Single integer type

– New binary and modified octal literals

– Classic or true division

– The -Q division switch

• Built-in functions

print or print()

reduce()

– Other updates

• Object-oriented programming

– Two different class objects

• Strings

bytes literals

bytes type

• Exceptions

– Handling exceptions

– Raising exceptions

Other transition tools and tips

– Warnings: the -3 switch

2to3 tool

This appendix does not discuss other new version 2.x features that are stand-alone, meaning they do not have any consequences for porting applications to version 3.x. Thus, without further ado, let’s jump right in.

D.2. Integers

Python integers face several changes in version 3.x and beyond, relating to their types, literals, and the integer division operation. We describe each of these changes next, highlighting the role that version 2.6 and newer versions play in terms of migration.

D.2.1. Single Integer Type

Previous versions of Python featured two integer types, int and long. The original ints were limited in size to the architecture of the platform on which the code ran (i.e., 32-bit, 64-bit), whereas longs were unlimited in size except in terms of how much virtual memory the operating system provided. The process of unifying these two types into a single int type began in Python 2.2 and will be complete in version 3.0.1 The new single int type will be unlimited in size, and the previous L or l designation for longs is removed. You can read more about this change in PEP 237.

As of version 2.6, there is little trace of long integers, save for the support of the trailing L. It is included for backward-compatibility purposes, to support all code that uses longs. Nevertheless, users should be actively purging long integers from their existing code and should no longer use longs in any new code written against Python 2.6+.

D.2.2. New Binary and Modified Octal Literals

Python 3 features a minor revision to the alternative base format for integers. It has basically streamlined the syntax to make it consistent with the existing hexadecimal format, prefixed with a leading 0x (or 0X for capital letters)—for example, 0x80, 0xffff, 0xDEADBEEF.

A new binary literal lets you provide the bits to an integer number, prefixed with a leading 0b (e.g., 0b0110). The original octal representation began with a single 0, but this format proved confusing to some users, so it has been changed to 0o to bring it in line with hexadecimal and binary literals, as just described. In other words, 0177 is no longer allowed; you must use 0o177, instead. Here are some examples:

Python 2.x

>>> 0177
127

Python 3 (including 2.6+)

>>> 0o177
127
>>> 0b0110
6

Both the new binary and modified octal literal formats have been backported to version 2.6 to help with migration. In fact, version 2.6 and newer, in their role as transition tools, accept both octal formats, whereas no version 3.x release accepts the old 0177 format. You can find more information on the updates to integer literals in PEP 3127.

D.2.3. Classic or True Division

A change that has been a long time coming, yet remains controversial to many, is the change to the division operator (/). The traditional division operation works in the following way: given two integer operands, / performs integer floor division. If there is at least one float involved, true division occurs:

Python 2.x: Classic Division

>>> 1 / 2          # floor
0
>>> 1.0 / 2.0      # true
0.5
>>> 1.0 / 2        # true (2 is internally coerced to float)
0.5

In Python 3, the / operator will always return a float, regardless of operand type.

Python 3.x: True Division

>>> 1 / 2          # true
0.5
>>> 1.0 / 2        # true
0.5

The double-slash division operator (//) was added as a proxy in Python 2.2 to always perform floor division, regardless of the operand type and to begin the transition process.

Python 2.2+ and 3.x: Floor Division

>>> 1 // 2         # floor
0
>>> 1.0 // 2       # floor
0.0

Using // will be the only way to obtain floor division functionality in version 3.x. To try true division in Python 2.2+, you can add the line from __future__ import division to your code, or use the -Q command-line option (discussed next).

Python 2.2+: Division Command-Line Option

If you do not wish to import division from __future__ module in your code, but you want true division to always prevail, you can use the -Qnew switch. There are also other options for using -Q, as summarized in Table D-1.

Table D-1. Division Operation -Q Command-Line Options

Image

For example, the -Qwarnall option is used in the Tools/scripts/fixdiv.py script found in the Python source distribution.

As you might have guessed by now, all of the transition efforts have already been implemented in Python 2.2, and no specific additional functionality as far as this command-line has been added to versions 2.6 or 2.7 with respect to Python 3 migration. Table D-2 summarizes the division operators and their functionality in the various Python releases.

Table D-2. Python Default Division Operator Functionality by Release

Image

You can read more about the change to the division operator in PEP 238 as well as in an article titled “Keeping Up with Python: The 2.2 Release” that I wrote for Linux Journal in July 2002.

D.3. Built-In Functions

D.3.1. The print Statement or print() Function

It’s no secret that one of the most common causes of breakage between Python 2.x and 3.x applications is the change in the print statement, which becomes a built-in function (BIF) in version 3.x. This change allows print() to be more flexible, upgradeable, and swappable, if desired.

Python 2.6 and newer support either the print statement or the print() BIF. The default is the former usage, as it should be in a version 2.x language. To discard the print statement and go with only the function in a “Python 3 mode” application, you would simply import print_function from __future__:

>>> print 'foo', 'bar'
foo bar
>>>
>>> from __future__ import print_function
>>> print
<built-in function print>
>>> print('foo', 'bar')
foo bar
>>> print('foo', 'bar', sep='-')
foo-bar

The preceding example demonstrates the power of print() as a function. Using the print statement, we display the strings foo and bar to the user, but we cannot change the default delimiter or separator between strings, which is a space. In contrast, print() makes this functionality available in its call as the argument sep, which replaces the default—and allows print to evolve and progress.

Note that this is a one-way import, meaning that there is no way to revert print() to a statement. Even issuing a "del print_function" will not have any effect. This major change is detailed in PEP 3105.

D.3.2. reduce() Moved to functools Module

In Python 3.x, the reduce() function has been demoted (much to the chagrin of many Python functional programmers) from being a BIF to functools module function, beginning in version 2.6.

>>> from operator import add
>>> reduce(add, range(5))
10
>>>
>>> import functools
>>> functools.reduce(add, range(5))
10

D.3.3. Other Updates

One key theme in Python 3.x is the migration to greater use of iterators, especially for BIF and methods that have historically returned lists. Still other iterators are changing because of the updates to integers. The following are the most high-profile BIFs, changed in Python 3.x:

range()

zip()

map()

filter()

hex()

oct()

Starting in Python 2.6, programmers can access the new and updated functions by importing the future_builtins module. Here is an example that demonstrates both the old and new oct() and zip() functions:

>>> oct(87)
'0127'
>>>
>>> zip(range(4), 'abcd')
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
>>> dict(zip(range(4), 'abcd'))
{0: 'a', 1: 'b', 2: 'c', 3: 'd'}
>>>
>>> import future_builtins
>>> future_builtins.oct(87)
'0o127'
>>>
>>> future_builtins.zip(range(4), 'abcd')
<itertools.izip object at 0x374080>
>>> dict(future_builtins.zip(range(4), 'abcd'))
{0: 'a', 1: 'b', 2: 'c', 3: 'd'}

If you want to use only the Python 3.x versions of these functions in your current Python 2.x environment, you can override the old ones by importing all the new functions into your namespace. The following example demonstrates this process with oct():

>>> from future_builtins import *
>>> oct(87)
'0o127'

D.4. Object-Oriented Programming: Two Different Class Objects

Python’s original classes are now called classic classes. They had many flaws and were eventually replaced by new-style classes. The transition began in Python 2.2 and continues today.

Classic classes use the following syntax:

class ClassicClass:
    pass

New-style classes use this syntax:

class NewStyleClass(object):
    pass

New-style classes feature so many more advantages than classic classes that the latter have been preserved only for backward-compatibility purposes and are eliminated entirely in Python 3. With new-style classes, types and classes are finally unified (see Guido’s “Unifying Types and Classes in Python 2.2” essay as well as PEP 252 and PEP 253).

There are no other changes added in Python 2.6 or newer for migration purposes, unless you count class decorators as a version 3.x feature. Just be aware that all 2.2+ versions serve as hybrid interpreters, allowing for both class objects and instances of those classes. In Python 3, both syntaxes shown in the preceding examples result only in new-style classes being created. This behavior does not pose a serious porting issue, but you do need to be aware that classic classes don’t exist in Python 3.

D.5. Strings

One especially notable change in Python 3.x is that the default string type is changing. Python 2.x supports both ASCII and Unicode strings, with ASCII being the default. This support is swapped in Python 3: Unicode becomes the default, and ASCII strings are now called bytes. The bytes data structure contains byte values and really shouldn’t be considered a string (anymore) as much as it is an immutable byte array that contains data.

Current string literals will now require a leading b or B in Python 3.x, and current Unicode string literals will drop their leading u or U. The type and BIF names will change from str to bytes and from unicode to str. In addition, there is a new mutable “string” type called bytearray that, like bytes, is also a byte array, only mutable.

You can find out more about using Unicode strings in the HOWTO and learn about the changes coming to string types in PEP 3137. Refer to Table C-1 for a chart on the various string types in both Python 2 and Python 3.

D.5.1. bytes Literals

To smooth the way for using bytes objects in Python 3.x, you can optionally prepend a regular ASCII/binary string in Python 2.6 with a leading b or B, thereby creating bytes literals (b'' or B'') as synonyms for str literals (''). The leading indicator has no bearing on any str object itself or any of the object’s operations (it is purely decorative), but it does prepare you for situations in Python 3 for which you need to create such a literal. You can find out more about bytes literals in PEP 3112.

bytes is str

It should not require much of a stretch of the imagination to recognize that if bytes literals are supported, then bytes objects themselves need to exist in Python 2.6+. Indeed, the bytes type is synonymous with str, as demonstrated in the following:

>>> bytes is str
True

Thus, you can use bytes or bytes() in Python 2.6+ wherever you use str or str(). Further information on bytes objects can be found in PEP 358.

D.6. Exceptions

Python 2.6 and newer version 2.x releases have several features that you can use to port exception handling and raise exceptions in Python 3.x.

D.6.1. Handling Exceptions (Using as)

The syntax in Python 3 for catching and handling a single exception looks like this:

except ValueError as e:

The e variable contains the instance of the exception that provides the reason why the error was thrown. It is optional, as is the entire as e phrase. Thus, this change really applies only to those users who save this value.

The equivalent Python 2 syntax uses a comma instead of the as keyword:

except ValueError, e:

This change was made in Python 3.x because of the confusion that occurs when programmers attempt to handle more than one exception with the same handler.

To catch multiple exceptions with the same handler, beginners often write this (invalid) code:

except ValueError, TypeError, e:

In fact, if you are trying to catch more than one exception, you need to use a tuple that contains the exceptions:

except (ValueError, TypeError), e:

The as keyword in Python 3.x (and version 2.6+) is intended to ensure that the comma in the original syntax is no longer a source of confusion. However, the parentheses are still required when you are trying to catch more than one type of exception using the same handler:

except (ValueError, TypeError) as e:

For porting efforts, Python 2.6 and newer accept either the comma or as when defining exception handlers that save the instance. In contrast; only the idiom with as is permitted in Python 3. You can find more information about this change in PEP 3110.

D.6.2. Raising Exceptions

The change in raising exceptions found in Python 3.x really isn’t a change at all; in fact, it doesn’t even have anything to do with the transition efforts associated with Python 2.6. The Python 3 syntax for raising exceptions (providing the optional reason for the exception) looks like this:

raise ValueError('Invalid value')

Long-time Python users have probably been using the following idiom (although both approaches are supported in all version 2.x releases):

raise ValueError, 'Invalid value'

To emphasize that raising exceptions is equivalent to instantiating an exception class and to provide some additional flexibility, Python 3 supports only the first idiom. The good news is that you don’t have to wait until you adopt version 2.6 to start using this technique—as we mentioned in Appendix C, this syntax has actually been valid since the Python 1.x days.

D.7. Other Transition Tools and Tips

In addition to Python 2.6, developers have access to an array of tools that can make the transition to Python 3.x go more smoothly—in particular, the -3 switch (which provides obsolescence warnings) and the 2to3 tool (you can read more about it at http://docs.python.org/3.0/library/2to3.html). However, the most important tool that you can “write” is a good transition plan. In fact, there’s no substitute for planning.

Clearly, the Python 3.x changes do not represent some wild mutation of the familiar Python syntax. Instead, the variations are just enough to break the old code base. Of course, the changes will affect users, so a good transition plan is essential. Most good plans come with tools or aids to help you out in this regard. The porting recommendations in the “What’s New in Python 3.0” document specifically state that good testing of code is critical, in addition to the use of key tools. Without mincing words, here is exactly what is suggested at http://docs.python.org/3.0/whatsnew/3.0.html#porting-to-python-3-0:

1. (Prerequisite) Start with excellent test coverage.

2. Port to Python 2.6. This should involve no more work than the average port from Python 2.x to Python 2.(x+1). Ensure that all your tests pass.

3. (Still using 2.6) Turn on the -3 command-line switch. It enables warnings about features that have been removed (or changed) in Python 3.0. Run your test suite again, and fix any code that generates warnings. Ensure that all your tests still pass.

4. Run the 2to3 source-to-source translator over your source code tree. Run the result of the translation under Python 3.0. Manually fix any remaining issues, and continue fixing problems until all tests pass again.

Another alternative to consider is the 3to2 tool. As you can guess from its name, it does the opposite of 2to3: it takes Python 3 code and attempts to deliver a working Python 2 equivalent. This library is maintained by an external developer and isn’t part of the standard library; however, it’s an interesting alternative because it encourages people to code in Python 3 as their main development tool, and that can’t be a bad thing. You can learn more about 3to2 at http://pypi.python.org/pypi/3to2.

The third alternative is to not port at all; instead, write code that runs on both 2.x and 3.x (with no changes to the source) to begin with. Is this possible?

D.8. Writing Code That is Compatible in Both Versions 2.x and 3.x

While we’re in the crossroads transitioning from Python 2 to 3, you might wonder whether it is possible to write code that runs without modification in both Python 2 and 3. It seems like a reasonable request, but how would you get started? What breaks the most Python 2 code when executed by a version 3.x interpreter?

D.8.1. print vs. print()

If you think like me, you’d say that the answer to the preceding question is the print statement. That’s as good a place to start as any, so let’s give it a shot. The tricky part is that in version 2.x, it’s a statement, thus a keyword or reserved word, whereas in version 3.x, it’s just a BIF. In other words, because language syntax is involved, you cannot use if statements, and no, Python still doesn’t have #ifdef macros!

Let’s try just putting parentheses around the arguments to print:

>>> print('Hello World!')
Hello World!

Cool! That works in both Python 2 and Python 3! Are we done? Sorry, not quite.

>>> print(10, 20) # Python 2
(10, 20)

You’re not going to be as lucky this time because the former is a tuple, whereas in Python 3, you’re passing in multiple arguments to print():

>>> print(10, 20) # Python 3
10 20

If you think a bit more, perhaps we can check if print is a keyword. You might recall that there is a keyword module that contains a list of keywords. Because print won’t be a keyword in version 3.x, you might think that it can be as simple as this:

>>> import keyword
>>> 'print' in keyword.kwlist
False

As a smart programmer, you’d probably try it in version 2.x, expecting a True for a response. Although you would be correct, you’d still fail for a different reason:

>>> import keyword
>>> if 'print' in keyword.kwlist:
...   from __future__ import print_function
...
  File "<stdin>", line 2
SyntaxError: from __future__ imports must occur at the beginning of
the file

One workable solution requires that you use a function that has similar capabilities as print. One of them is sys.stdout.write(); another solution is distutils.log.warn(). For whatever reason, we decided to use the latter in many of this book’s chapters. I suppose sys.stderr.write() will also work, if unbuffered output is your thing.

The “Hello World!” example would then look like this:

# Python 2.x
print 'Hello World!'
# Python 3.x
print('Hello World!')

The following line would work in both versions:

# Python 2.x & 3.x compatible
from distutils.log import warn as printf
printf('Hello World!')

That reminds me of why we didn’t use sys.stdout.write(); we would need to add a NEWLINE character at the end of the string to match the behavior:

# Python 2.x & 3.x compatible
import sys
sys.stdout.write('Hello World! ')

The one real problem isn’t this little minor annoyance, but that these functions are no true proxy for print or print() for that matter; they only work when you’ve come up with a single string representing your output. Anything more complex requires more effort on your part.

D.8.2. Import Your Way to a Solution

In other situations, life is a bit easier, and you can just import the correct solution. In the code that follows, we want to import the urlopen() function. In Python 2, it resides in urllib and urllib2 (we’ll use the latter), and in Python 3, it’s been integrated into urllib.request. Your solution, which works for both versions 2.x and 3.x, is neat and simple in this case:

try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

For memory conservation, perhaps you’re interested in the iterator (Python 3) version of a well-known built-in such as zip(). In Python 2, the iterator version is itertools.izip(). This function is renamed to and replaces zip() in Python 3. In other words, itertools.izip() replaces zip() and takes on its name. If you insist on this iterator version, your import statement is also fairly straightforward:

try:
    from itertools import izip as zip
except ImportError:
    pass

One example, which isn’t as elegant looking, is the StringIO class. In Python 2, the pure Python version is in the StringIO module, meaning you access it via StringIO.StringIO. There is also a C version for speed, and that’s located at cStringIO.StringIO. Depending on your Python installation, you might prefer cStringIO first and fallback to StringIO if cStringIO is not available.

In Python 3, Unicode is the default string type, but if you’re doing any kind of networking, it’s likely that you’ll have to manipulate ASCII/bytes strings instead, so instead of StringIO, you’d want io.BytesIO. To get what you want, the import is slightly uglier:

try:
    from io import BytesIO as StringIO
except ImportError:
    try:
        from cStringIO import StringIO
    except ImportError:
        from StringIO import StringIO

D.8.3. Putting It All Together

If you’re lucky, these are all the changes you need to make, and the rest of your code is simpler than the setup at the beginning. If you install the imports of distutils.log.warn() [as printf()], url*.urlopen(), *.StringIO, and a normal import of xml.etree.ElementTree (2.5 and newer), you can write a very short parser to display the top headline stories from the Google News service with just these roughly eight lines of code:

g = urlopen('http://news.google.com/news?topic=h&output=rss')
f = StringIO(g.read())
g.close()
tree = xml.etree.ElementTree.parse(f)
f.close()
for elmt in tree.getiterator():
    if elmt.tag == 'title' and not
            elmt.text.startswith('Top Stories'):
        printf('- %s' % elmt.text)

This script runs exactly the same under version 2.x and 3.x with no changes to the code whatsoever. Of course, if you’re using version 2.4 and older, you’ll need to download ElementTree separately.

The code snippets in this subsection come from Chapter 14, “Text Processing,” so take a look at the goognewsrss.py file to see the full version in action.

Some will feel that these changes really start to mess up the elegance of your Python source. After all, readability counts! If you prefer to keep your code cleaner yet still write code that runs in both versions 2.x and 3.x without changes, take a look at the six package.

six is a compatibility library who’s primary role is to provide an interface to keep your application code the same while hiding the complexities described in this appendix subsection from the developer. To find out more about six, go to http://packages.python.org/six.

Whether you use a library like six or choose to roll your own, we hoped to show in this short narrative that it is possible to write code that runs in both versions 2.x and 3.x. The bottom line is that you might need to sacrifice some of the elegance and simplicity of Python, trading it off for true 2-to-3 portability. I’m sure we’ll be revisiting this issue for the next few years until the whole world has completed the transition to the next generation.

D.9. Conclusion

We know big changes are happening in the next generation of Python, simply because version 3.x code is backward incompatible with older releases. The changes, although significant, won’t require entirely new ways of thinking for programmers—though there is obvious code breakage. To ease the transition period, current and future releases of the remainder of the version 2.x interpreters will contain version 3.x-backported features.

Python 2.6 is the first of the “dual-mode” interpreters with which you can start programming against the version 3.x code base. Python 2.6 and newer run all version 2.x software as well as understand some version 3.x code. (The current goal is for version 2.7 to be the final 2.x release. To find more information on the fictional Python 2.8 release, go to PEP 404 at http://www.python.org/dev/peps/pep-0404.) In this way, these final version 2.x releases help simplify the porting and migration process and will ease you gently into the next generation of Python programming.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.234.225