1
STARTING YOUR PROJECT

In this first chapter, we’ll look at a few aspects of starting a project and what you should think about before you begin, such as which Python version to use, how to structure your modules, how to effectively number software versions, and how to ensure best coding practices with automatic error checking.

Versions of Python

Before beginning a project, you’ll need to decide what version(s) of Python it will support. This is not as simple a decision as it may seem.

It’s no secret that Python supports several versions at the same time. Each minor version of the interpreter gets bug-fix support for 18 months and security support for 5 years. For example, Python 3.7, released on June 27, 2018, will be supported until Python 3.8 is released, which should be around October 2019. Around December 2019, a last bug-fix release of Python 3.7 will occur, and everyone will be expected to switch to Python 3.8. Each new version of Python introduces new features and deprecates old ones. Figure 1-1 illustrates this timeline.

Figure 1-1: Python release timeline

On top of that, we should take into consideration the Python 2 versus Python 3 problem. People working with (very) old platforms may still require Python 2 support because Python 3 has not been made available on those platforms, but the rule of thumb is to forget Python 2 if you can.

Here is a quick way to figure out which version you need:

  • Versions 2.6 and older are now obsolete, so I do not recommend you worry about supporting them at all. If you do intend to support these older versions for whatever reason, be warned that you’ll have a hard time ensuring that your program supports Python 3.x as well. Having said that, you might still run into Python 2.6 on some older systems—if that’s the case, sorry!

  • Version 2.7 is and will remain the last version of Python 2.x. Every system is basically running or able to run Python 3 one way or the other nowadays, so unless you’re doing archeology, you shouldn’t need to worry about supporting Python 2.7 in new programs. Python 2.7 will cease to be supported after the year 2020, so the last thing you want to do is build new software on top of it.

  • Version 3.7 is the most recent version of the Python 3 branch as of this writing, and that’s the one that you should target. However, if your operating system ships version 3.6 (most operating systems, except Windows, ship with 3.6 or later), make sure your application will also work with 3.6.

Techniques for writing programs that support both Python 2.7 and 3.x will be discussed in Chapter 13.

Finally, note that this book has been written with Python 3 in mind.
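
As a rough illustration of the advice above, a program can refuse to start on an interpreter older than the one it targets. The sketch below assumes a minimum of Python 3.6, taken from the last bullet in the list; the MINIMUM_VERSION constant and the check_python_version() helper are names made up for this example.

```python
import sys

# Assumption for this sketch: the oldest interpreter we agree to support.
MINIMUM_VERSION = (3, 6)


def check_python_version(current=None, minimum=MINIMUM_VERSION):
    """Return True if `current` (default: the running interpreter) is recent enough."""
    if current is None:
        current = sys.version_info[:2]
    # Tuples compare element by element, so (3, 10) >= (3, 6) holds.
    return tuple(current) >= tuple(minimum)


if __name__ == "__main__":
    if not check_python_version():
        sys.exit("This program requires Python %d.%d or later." % MINIMUM_VERSION)
    print("Running on a supported interpreter.")
```

Comparing tuples rather than strings matters here: as strings, "3.10" would sort before "3.6".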

Laying Out Your Project

Starting a new project is always a bit of a puzzle. You can’t be sure how your project will be structured, so you might not know how to organize your files. However, once you have a proper understanding of best practices, you’ll understand which basic structure to start with. Here I’ll give some tips on dos and don’ts for laying out your project.

What to Do

First, consider your project structure, which should be fairly simple. Use packages and hierarchy wisely: a deep hierarchy can be a nightmare to navigate, while a flat hierarchy tends to become bloated.

Then, avoid making the common mistake of storing unit tests outside the package directory. These tests should definitely be included in a subpackage of your software so that they aren’t automatically installed as a tests top-level module by setuptools (or some other packaging library) by accident. By placing them in a subpackage, you ensure they can be installed and eventually used by other packages so users can build their own unit tests.

Figure 1-2 illustrates what a standard file hierarchy should look like.

Figure 1-2: Standard package directory

The standard name for a Python installation script is setup.py. It comes with its companion setup.cfg, which should contain the installation script configuration details. When run, setup.py will install your package using the Python distribution utilities.

You can also provide important information to users in README.rst (or README.txt, or whatever filename suits your fancy). Finally, the docs directory should contain the package’s documentation in reStructuredText format, which will be consumed by Sphinx (see Chapter 3).
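
To make this layout concrete, here is a small sketch that creates an empty skeleton on disk. The package name mypackage, the module storage.py, and the create_skeleton() helper are placeholders invented for this example; only setup.py, setup.cfg, README.rst, the docs directory, and the tests subpackage come from the text.

```python
import tempfile
from pathlib import Path

# Hypothetical file list; "mypackage" and "storage" are placeholder names.
LAYOUT = [
    "setup.py",
    "setup.cfg",
    "README.rst",
    "docs/index.rst",
    "mypackage/__init__.py",
    "mypackage/storage.py",
    # Tests live in a subpackage, not in a top-level tests directory.
    "mypackage/tests/__init__.py",
    "mypackage/tests/test_storage.py",
]


def create_skeleton(root, layout=LAYOUT):
    """Create empty files for each entry in `layout` and return what exists."""
    root = Path(root)
    for relative in layout:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        for name in create_skeleton(directory):
            print(name)
```

The files are created empty on purpose: the point is the shape of the tree, not its content.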

Packages will often have to provide extra data for the software to use, such as images, shell scripts, and so forth. Unfortunately, there’s no universally accepted standard for where these files should be stored, so you should just put them wherever makes the most sense for your project depending on their functions. For example, web application templates could go in a templates directory in your package root directory.

The following top-level directories also frequently appear:

  • etc for sample configuration files

  • tools for shell scripts or related tools

  • bin for binary scripts you’ve written that will be installed by setup.py

What Not to Do

There is a particular design issue that I often encounter in project structures that have not been fully thought out: some developers will create files or modules based on the type of code they will store. For example, they might create functions.py or exceptions.py files. This is a terrible approach and doesn’t help any developer when navigating the code. When reading a codebase, the developer expects a functional area of a program to be confined in a particular file. The code organization doesn’t benefit from this approach, which forces readers to jump between files for no good reason.

Organize your code based on features, not on types.

It is also a bad idea to create a module directory that contains only an __init__.py file, because it’s unnecessary nesting. For example, you shouldn’t create a directory named hooks with a single file named hooks/__init__.py in it, where hooks.py would have been enough. If you create a directory, it should contain several other Python files that belong to the category the directory represents. Building a deep hierarchy unnecessarily is confusing.

You should also be very careful about the code that you put in the __init__.py file. This file will be called and executed the first time that a module contained in the directory is loaded. Placing the wrong things in your __init__.py can have unwanted side effects. In fact, __init__.py files should be empty most of the time, unless you know what you’re doing. Don’t try to remove __init__.py files altogether though: without one, the directory is no longer a regular package. Python 3.3 and later treats it as a namespace package instead, and packaging helpers such as the find_packages() function in setuptools will skip it, so your module may not be installed or importable where you expect it.
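
The following sketch shows why the content of __init__.py deserves care: importing a single module from a directory also runs that directory’s __init__.py. The package name demo_hooks_pkg and the load_demo_package() helper are invented for this example, and the package is written to a temporary directory so that nothing is left behind.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_demo_package():
    """Build a throwaway package, import one of its modules, and report what ran."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "demo_hooks_pkg"
        package_dir.mkdir()
        # Code in __init__.py runs as a side effect of importing any submodule.
        (package_dir / "__init__.py").write_text("INIT_EXECUTED = True\n")
        (package_dir / "hooks.py").write_text("VALUE = 42\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            hooks = importlib.import_module("demo_hooks_pkg.hooks")
            package = sys.modules["demo_hooks_pkg"]
            return package.INIT_EXECUTED, hooks.VALUE
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_hooks_pkg.hooks", None)
            sys.modules.pop("demo_hooks_pkg", None)


if __name__ == "__main__":
    print(load_demo_package())
```

Only demo_hooks_pkg.hooks is requested, yet the flag set in __init__.py is there afterward. Anything slow or fragile placed in that file would run on every such import.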

Version Numbering

Software versions need to be stamped so users know which is the more recent version. For every project, users must be able to organize the timeline of the evolving code.

There is an infinite number of ways to organize your version numbers. However, PEP 440 introduces a version format that every Python package, and ideally every application, should follow so that other programs and packages can easily and reliably identify which versions of your package they require.

PEP 440 defines the following regular expression format for version numbering:

N[.N]+[{a|b|c|rc}N][.postN][.devN]

This allows for standard numbering such as 1.2 or 1.2.3. There are a few further details to note:

  • Version 1.2 is equivalent to 1.2.0, 1.3.4 is equivalent to 1.3.4.0, and so forth.

  • Versions matching N[.N]+ are considered final releases.

  • Date-based versions such as 2013.06.22 are considered invalid. Automated tools designed to detect PEP 440–format version numbers will (or should) raise an error if they detect a version number greater than or equal to 1980.

  • Final components can also use the following format:

    • N[.N]+aN (for example, 1.2a1) denotes an alpha release; this is a version that might be unstable and missing features.

    • N[.N]+bN (for example, 2.3.1b2) denotes a beta release, a version that might be feature complete but still buggy.

    • N[.N]+cN or N[.N]+rcN (for example, 0.4rc1) denotes a (release) candidate. This is a version that might be released as the final product unless significant bugs emerge. The rc and c suffixes have the same meaning, but if both are used, rc releases are considered newer than c releases.

  • The following suffixes can also be used:

    • The suffix .postN (for example, 1.4.post2) indicates a post release. Post releases are typically used to address minor errors in the publication process, such as mistakes in release notes. You shouldn’t use the .postN suffix when releasing a bug-fix version; instead, increment the minor version number.

    • The suffix .devN (for example, 2.3.4.dev3) indicates a developmental release. It indicates a prerelease of the version that it qualifies: for example, 2.3.4.dev3 indicates the third developmental version of the 2.3.4 release, prior to any alpha, beta, candidate, or final release. This suffix is discouraged because it is harder for humans to parse.

This scheme should be sufficient for most common use cases.
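
The format above can be turned into a small parser. What follows is a minimal sketch that covers only the simplified pattern shown in this section, not the complete PEP 440 grammar; VERSION_PATTERN and parse_version() are names chosen for this example.

```python
import re

# Simplified pattern for N[.N]+[{a|b|c|rc}N][.postN][.devN].
# This is an illustration, not the full PEP 440 grammar.
VERSION_PATTERN = re.compile(
    r"""
    ^(?P<release>\d+(?:\.\d+)+)                      # N[.N]+
    (?:(?P<pre_kind>a|b|c|rc)(?P<pre_number>\d+))?   # optional aN, bN, cN, rcN
    (?:\.post(?P<post>\d+))?                         # optional .postN
    (?:\.dev(?P<dev>\d+))?                           # optional .devN
    $
    """,
    re.VERBOSE,
)


def parse_version(text):
    """Split a version string into its components, or return None if it does not match."""
    match = VERSION_PATTERN.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    pre = None
    if parts["pre_kind"] is not None:
        pre = (parts["pre_kind"], int(parts["pre_number"]))
    return {
        "release": tuple(int(number) for number in parts["release"].split(".")),
        "pre": pre,
        "post": int(parts["post"]) if parts["post"] is not None else None,
        "dev": int(parts["dev"]) if parts["dev"] is not None else None,
    }


if __name__ == "__main__":
    for candidate in ("1.2", "2.3.1b2", "0.4rc1", "1.4.post2", "2.3.4.dev3"):
        print(candidate, parse_version(candidate))
```

The sketch neither rejects date-like versions nor orders releases. For real projects, the third-party packaging library provides a Version class that implements the specification in full and is likely the safer choice.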

NOTE

You might have heard of Semantic Versioning, which provides its own guidelines for version numbering. This specification partially overlaps with PEP 440, but unfortunately, they’re not entirely compatible. For example, Semantic Versioning’s recommendation for prerelease versioning uses a scheme such as 1.0.0-alpha+001 that is not compliant with PEP 440.

Many distributed version control system (DVCS) platforms, such as Git and Mercurial, are able to generate version numbers using an identifying hash (for Git, refer to git describe). Unfortunately, this system isn’t compatible with the scheme defined by PEP 440: for one thing, identifying hashes aren’t orderable.

Coding Style and Automated Checks

Coding style is a touchy subject, but one we should talk about before we dive further into Python. Unlike many programming languages, Python uses indentation to define blocks. While this offers a simple solution to the age-old question “Where should I put my braces?” it introduces a new question: “How should I indent?”

That was one of the first questions raised in the community, so the Python folks, in their vast wisdom, came up with the PEP 8: Style Guide for Python Code (https://www.python.org/dev/peps/pep-0008/).

This document defines the standard style for writing Python code. The list of guidelines boils down to:

  • Use four spaces per indentation level.

  • Limit all lines to a maximum of 79 characters.

  • Separate top-level function and class definitions with two blank lines.

  • Encode files using ASCII or UTF-8.

  • Use one module import per import statement and per line. Place import statements at the top of the file, after comments and docstrings, grouped first by standard, then by third party, and finally by local library imports.

  • Do not use extraneous whitespaces between parentheses, square brackets, or braces or before commas.

  • Write class names in camel case (e.g., CamelCase), suffix exceptions with Error (if applicable), and name functions in lowercase with words and underscores (e.g., separated_by_underscores). Use a leading underscore for _private attributes or methods.

These guidelines really aren’t hard to follow, and they make a lot of sense. Most Python programmers have no trouble sticking to them as they write code.
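
Here is a short module that tries to follow the guidelines listed above. It is only a sketch: the names ConfigurationError, ConfigLoader, and load_config(), as well as the rule about .ini files, are invented for the example.

```python
"""A hypothetical module laid out along the PEP 8 guidelines."""
import os.path


class ConfigurationError(Exception):
    """Exception class names are CamelCase and end in Error."""


class ConfigLoader:
    """Class names use CamelCase."""

    def __init__(self, path):
        # A leading underscore marks the attribute as private.
        self._path = path

    def file_extension(self):
        """Method names are lowercase, with words separated by underscores."""
        return os.path.splitext(self._path)[1]


def load_config(path):
    """Reject anything that is not an .ini file (an arbitrary rule for this sketch)."""
    loader = ConfigLoader(path)
    if loader.file_extension() != ".ini":
        raise ConfigurationError("unsupported file: %s" % path)
    return loader
```

Note the four-space indentation, the two blank lines around top-level definitions, and the absence of spaces just inside parentheses and brackets.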

However, errare humanum est, and it’s still a pain to look through your code to make sure it fits the PEP 8 guidelines. Luckily, there’s a pep8 tool (found at https://pypi.org/project/pep8/) that can automatically check any Python file you send its way. Install pep8 with pip, and then you can use it on a file like so:

$ pep8 hello.py
hello.py:4:1: E302 expected 2 blank lines, found 1
$ echo $?
1

Here I use pep8 on my file hello.py, and the output indicates which lines and columns do not conform to PEP 8 and reports each issue with a code—here it’s line 4 and column 1. Violations of MUST statements in the specification are reported as errors, and their error codes start with an E. Minor issues are reported as warnings, and their error codes start with a W. The three-digit code following that first letter indicates the exact kind of error or warning.

The hundreds digit tells you the general category of an error code: for example, errors starting with E2 indicate issues with whitespace, errors starting with E3 indicate issues with blank lines, and warnings starting with W6 indicate deprecated features being used. These codes are all listed in the pep8 readthedocs documentation (https://pep8.readthedocs.io/).
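
If you want to process such a report in a script, each line can be split into its parts. The sketch below relies only on the output format shown in the example above (path, line, column, code, message); REPORT_LINE and parse_report_line() are names made up for this illustration and are not part of pep8.

```python
import re

# Matches lines such as: hello.py:4:1: E302 expected 2 blank lines, found 1
REPORT_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<code>[EW]\d{3}) (?P<message>.*)$"
)


def parse_report_line(text):
    """Return the fields of one report line as a dict, or None if it does not match."""
    match = REPORT_LINE.match(text.strip())
    if match is None:
        return None
    code = match.group("code")
    return {
        "path": match.group("path"),
        "line": int(match.group("line")),
        "column": int(match.group("column")),
        "code": code,
        # E codes are errors, W codes are warnings.
        "kind": "error" if code.startswith("E") else "warning",
        # The letter plus the hundreds digit gives the general category.
        "category": code[:2],
        "message": match.group("message"),
    }


if __name__ == "__main__":
    print(parse_report_line("hello.py:4:1: E302 expected 2 blank lines, found 1"))
```

For the line shown earlier, the category comes out as E3, the group of blank-line issues.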

Tools to Catch Style Errors

The community still debates whether it is good practice to validate code that is not part of the Standard Library against PEP 8. My advice is to consider running a PEP 8 validation tool against your source code on a regular basis. You can do this easily by integrating it into your continuous integration system. While this approach may seem a bit extreme, it’s a good way to ensure that you continue to respect the PEP 8 guidelines in the long term. We’ll discuss in “Using virtualenv with tox” on page 92 how you can integrate pep8 with tox to automate these checks.

Most open source projects enforce PEP 8 conformance through automatic checks. Using these automatic checks from the very beginning of the project might frustrate newcomers, but it also ensures that the codebase always looks the same in every part of the project. This is very important for a project of any size where there are multiple developers with differing opinions on, for example, whitespace ordering. You know what I mean.

It’s also possible to tell pep8 to ignore certain kinds of errors and warnings by using the --ignore option, like so:

$ pep8 --ignore=E3 hello.py
$ echo $?
0

This will ignore any E3 errors in my hello.py file. The --ignore option allows you to effectively ignore parts of the PEP 8 specification that you don’t want to follow. If you’re running pep8 on an existing codebase, it also allows you to ignore certain kinds of problems so you can focus on fixing issues one category at a time.
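
The idea of ignoring a whole category by its prefix is easy to picture in code. This is a toy model of the behavior described above, not the way pep8 itself is implemented; filter_ignored() is a hypothetical helper.

```python
def filter_ignored(codes, ignored_prefixes):
    """Drop every code that starts with one of the ignored prefixes."""
    prefixes = tuple(ignored_prefixes)
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return [code for code in codes if not code.startswith(prefixes)]


if __name__ == "__main__":
    reported = ["E302", "W601", "E225", "E303"]
    print(filter_ignored(reported, ["E3"]))
```

Ignoring E3 removes both E302 and E303 but leaves E225 and W601, which is what makes the category-by-category cleanup practical.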

NOTE

If you write C code for Python (e.g., modules), the PEP 7 standard describes the coding style that you should follow.

Tools to Catch Coding Errors

Python also has tools that check for actual coding errors rather than style errors. Notable examples include Pyflakes and Pylint.

These tools all make use of static analysis—that is, they parse the code and analyze it rather than running it outright.
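
To give a feel for what static analysis means, here is a toy checker built on the ast module from the Standard Library. It parses source code without running it and reports imports that are never referenced. It is a deliberately naive sketch and makes no claim to match how the tools above work internally; find_unused_imports() is a name invented for this example.

```python
import ast


def find_unused_imports(source):
    """Return (line, name) pairs for imported names that are never referenced."""
    tree = ast.parse(source)  # Parses the code; nothing is executed.
    imported = {}
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the name "os".
                name = (alias.asname or alias.name).split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported[alias.asname or alias.name] = node.lineno
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(
        (lineno, name) for name, lineno in imported.items() if name not in used
    )


if __name__ == "__main__":
    sample = "import os\nimport sys\n\nprint(sys.argv)\n"
    print(find_unused_imports(sample))
```

Because the sample is only parsed, it could contain code that would crash or exit if executed and the check would still complete. A real checker handles many cases this sketch ignores, such as names listed in __all__ or used only inside strings.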

If you choose to use Pyflakes, note that it doesn’t check PEP 8 conformance on its own, so you’d need pep8 as a second tool to cover both.

To simplify things, Python has a project named flake8 (https://pypi.org/project/flake8/) that combines pyflakes and pep8 into a single command. It also adds some new fancy features: for example, it can skip checks on lines containing # noqa and is extensible via plugins.

There are a large number of plugins available for flake8 that you can use out of the box. For example, installing flake8-import-order (with pip install flake8-import-order) will extend flake8 so that it also checks whether your import statements are sorted alphabetically in your source code. Yes, some projects want that.

In most open source projects, flake8 is heavily used for code style verification. Some large open source projects have even written their own plugins for flake8, adding checks for errors such as odd usage of except, Python 2/3 portability issues, import style, dangerous string formatting, possible localization issues, and more.

If you’re starting a new project, I strongly recommend that you use one of these tools for automatic checking of your code quality and style. If you already have a codebase that didn’t implement automatic code checking, a good approach is to run your tool of choice with most of the warnings disabled and fix issues one category at a time.

Though none of these tools may be a perfect fit for your project or your preferences, flake8 is a good way to improve the quality of your code and make it more durable.

NOTE

Many text editors, including the famous GNU Emacs and vim, have plugins available (such as Flycheck) that can run tools such as pep8 or flake8 directly in your code buffer, interactively highlighting any part of your code that isn’t PEP 8 compliant. This is a handy way to fix most style errors as you write your code.

We’ll talk about extending this toolset in Chapter 9 with our own plugin to verify correct method declaration.

Joshua Harlow on Python

Joshua Harlow is a Python developer. He was one of the technical leads on the OpenStack team at Yahoo! between 2012 and 2016 and now works at GoDaddy. Josh is the author of several Python libraries such as Taskflow, automaton, and Zake.

What got you into using Python?

I started programming in Python 2.3 or 2.4 back in about 2004 during an internship at IBM near Poughkeepsie, New York (most of my relatives and family are from upstate NY, shout out to them!). I forget exactly what I was doing there, but it involved wxPython and some Python code that they were working on to automate some system.

After that internship I returned to school, went on to graduate school at the Rochester Institute of Technology, and ended up working at Yahoo!.

I eventually ended up in the CTO team, where I and a few others were tasked with figuring out which open source cloud platform to use. We landed on OpenStack, which is written almost entirely in Python.

What do you love and hate about the Python language?

Some of the things I love (not a comprehensive listing):

  • Its simplicity—Python is really easy for beginners to engage with and for experienced developers to stay engaged with.

  • Style checking—reading code you wrote later on is a big part of developing software and having consistency that can be enforced by tools such as flake8, pep8, and Pylint really helps.

  • The ability to pick and choose programming styles and mix them up as you see fit.

Some of the things I dislike (not a comprehensive listing):

  • The somewhat painful Python 2 to 3 transition (version 3.6 has paved over most of the issues here).

  • Lambdas are too simplistic and should be made more powerful.

  • The lack of a decent package installer—I feel pip needs some work, like developing a real dependency resolver.

  • The global interpreter lock (GIL) and the need for it. It makes me sad . . . [more on the GIL in Chapter 11].

  • The lack of native support for multithreading—currently you need the addition of an explicit asyncio model.

  • The fracturing of the Python community; this is mainly around the split between CPython and PyPy (and other variants).

You work on debtcollector, a Python module for managing deprecation warnings. How is the process of starting a new library?

The simplicity mentioned above makes it really easy to get a new library going and to publish it so others can use it. Since that code came out of one of the other libraries that I work on (taskflow), it was relatively easy to transplant and extend that code without having to worry about the API being badly designed. I am very glad others (inside the OpenStack community or outside of it) have found a need/use for it, and I hope that library grows to accommodate more styles of deprecation patterns that other libraries (and applications?) find useful.

What is Python missing, in your opinion?

Python could perform better under just-in-time (JIT) compilation. Most newer languages being created (such as Rust, Node.js using the Chrome V8 JavaScript engine, and others) have many of Python’s capabilities but are also JIT compiled. It would really be great if the default CPython could also be JIT compiled so that Python could compete with these newer languages on performance.

Python also really needs a strong set of concurrency patterns; not just the low-level asyncio and threading styles of patterns, but higher-level concepts that help make applications that work performantly at larger scale. The Python library goless does port over some of the concepts from Go, which does provide a built-in concurrency model. I believe these higher-level patterns need to be available as first-class patterns that are built in to the Standard Library and maintained so that developers can use them where they see fit. Without these, I don’t see how Python can compete with other languages that do provide them.

Until next time, keep coding and be happy!