4
Choosing Understandable Names

“The two hardest problems in computer science are naming things, cache invalidation, and off-by-one errors.” This classic joke, attributed to Leon Bambrick and based on a quote by Phil Karlton, contains a kernel of truth: it’s hard to come up with good names, formally called identifiers, for variables, functions, classes, and anything else in programming. Concise, descriptive names are important for your program’s readability. But creating names is easier said than done. If you were moving to a new house, labeling all your moving boxes as “Stuff” would be concise but not descriptive. A descriptive name for a programming book might be Invent Your Own Computer Games with Python, but it’s not concise.

Unless you’re writing “throwaway” code that you don’t intend to maintain after you run the program once, you should put some thought into selecting good names in your program. If you simply use a, b, and c for variable names, your future self will expend unnecessary effort to remember what these variables were initially used for.

Names are a subjective choice that you must make. An automated formatting tool, such as Black, described in Chapter 3, can’t decide what you should name your variables. This chapter provides you with some guidelines to help you choose suitable names and avoid poor names. As always, these guidelines aren’t written in stone: use your judgment to decide when to apply them to your code.

Casing Styles

Because Python identifiers are case sensitive and cannot contain whitespace, programmers use several styles for identifiers that include multiple words:

  1. snake_case separates words with an underscore, which looks like a flat snake in between each word. This case often implies that all letters are lowercase, although constants are often written in UPPER_SNAKE_CASE.
  2. camelCase separates words by capitalizing the start of each word after the first. This case often implies the first word begins with a lowercase letter. The uppercase letters look like a camel’s humps.
  3. PascalCase, named for its use in the Pascal programming language, is similar to camelCase but capitalizes the first word as well.
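To make the three styles concrete, here is a minimal sketch that rewrites one snake_case identifier in the other styles. The helper names snake_to_camel() and snake_to_pascal() are hypothetical. The sketch assumes the input is lowercase words joined by single underscores. Note that str.capitalize() also lowercases every letter after the first, so an acronym such as url comes out as Url.

```python
def snake_to_camel(name):
    """Convert a snake_case name to camelCase (hypothetical helper)."""
    words = name.split('_')
    # Keep the first word as is; capitalize the start of every later word.
    return words[0] + ''.join(word.capitalize() for word in words[1:])


def snake_to_pascal(name):
    """Convert a snake_case name to PascalCase (hypothetical helper)."""
    # Capitalize the start of every word, including the first.
    return ''.join(word.capitalize() for word in name.split('_'))


print(snake_to_camel('monthly_payment_amount'))   # monthlyPaymentAmount
print(snake_to_pascal('monthly_payment_amount'))  # MonthlyPaymentAmount
print('monthly_payment_amount'.upper())           # MONTHLY_PAYMENT_AMOUNT
```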

Casing is a code formatting issue, and we cover it in Chapter 3. The most common styles are snake_case and camelCase. Either is fine to use as long as your project consistently uses one or the other, not both.

PEP 8’s Naming Conventions

The PEP 8 document introduced in Chapter 3 has some recommendations for Python naming conventions:

  • All letters should be ASCII letters—that is, uppercase and lowercase English letters that don’t have accent marks.
  • Modules should have short, all lowercase names.
  • Class names should be written in PascalCase.
  • Constant variables should be written in uppercase SNAKE_CASE.
  • Function, method, and variable names should be written in lowercase snake_case.
  • The first argument for methods should always be named self in lowercase.
  • The first argument for class methods should always be named cls in lowercase.
  • Private attributes in classes should always begin with an underscore ( _ ).
  • Public attributes in classes should never begin with an underscore ( _ ).
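The following sketch gathers these conventions in one place. The shipping module, the ShippingBox class, and its attributes are invented for illustration. Only the casing and the self, cls, and leading-underscore conventions come from PEP 8.

```python
"""shipping.py: a hypothetical module with a short, all-lowercase name."""

MAX_WEIGHT_KG = 30  # Constant: UPPER_SNAKE_CASE.


class ShippingBox:  # Class: PascalCase.
    def __init__(self, weight_kg):  # Method: first argument named self.
        self.weight_kg = weight_kg  # Public attribute: no leading underscore.
        self._inspected = False     # Private attribute: leading underscore.

    @classmethod
    def empty(cls):  # Class method: first argument named cls.
        return cls(weight_kg=0)

    def inspect(self):  # Method name: lowercase snake_case.
        self._inspected = True
        return self.weight_kg <= MAX_WEIGHT_KG


def total_weight_kg(boxes):  # Function name: lowercase snake_case.
    return sum(box.weight_kg for box in boxes)


boxes = [ShippingBox(12), ShippingBox.empty(), ShippingBox(45)]
print(total_weight_kg(boxes))            # 57
print([box.inspect() for box in boxes])  # [True, True, False]
```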

You can bend or break these rules as required. For example, although English is the dominant language in programming, you can use letter characters in any language as identifiers: コンピューター = 'laptop' is syntactically valid Python code. As you can see in this book, my preference for variable names goes against PEP 8, because I use camelCase rather than snake_case. PEP 8 contains a reminder that a programmer doesn’t need to strictly follow PEP 8. The important readability factor isn’t which style you choose but consistency in using that style.
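If you’re unsure whether a particular name is legal, Python can tell you. This sketch uses the built-in str.isidentifier() method and the standard library’s keyword module. The wrapper is_valid_name() is a hypothetical helper. A legal name can still collide with a built-in such as list, which this check doesn’t catch.

```python
import keyword


def is_valid_name(name):
    """Return True if name can be used as a Python identifier."""
    # isidentifier() accepts Unicode letters, so non-English names pass,
    # but it doesn't reject reserved words such as class or for.
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_name('コンピューター'))  # True
print(is_valid_name('class'))           # False: reserved keyword
print(is_valid_name('2nd_place'))       # False: starts with a digit
```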

You can read PEP 8’s “Naming Conventions” section online at https://www.python.org/dev/peps/pep-0008/#naming-conventions.

Appropriate Name Length

Obviously, names shouldn’t be too long or too short. Long variable names are tedious to type, whereas short variable names can be confusing or mysterious. Because code is read more often than it’s written, it’s safer to err on the side of names that are too long. Let’s look at some examples of names that are too short and too long.

Too Short Names

The most common naming mistake is choosing names that are too short. Short names often make sense to you when you first write them, but their precise meaning can be lost a few days or weeks later. Let’s consider a few types of short names.

  • A one- or two-letter name like g probably refers to some other word that begins with g, but there are many such words. Acronyms and names that are only one or two letters long are easy for you to write but difficult for someone else to read. This also applies to . . .
  • . . . an abbreviated name like mon, which could stand for monitor, month, monster, or any number of words.
  • A single-word name like start can be vague: the start of what? Such names could be missing context that isn’t readily apparent when read by someone else.

One- or two-letter, abbreviated, or single-word names might be understandable to you, but you always need to keep in mind that other programmers (or even you a few weeks into the future) will have difficulty understanding their meaning.

There are some exceptions where short variable names are fine. For example, it’s common to use i (for index) as a variable name in for loops that loop over a range of numbers or indexes of a list, and j and k (because they come after i in the alphabet) if you have nested loops:

>>> for i in range(10): 
...     for j in range(3): 
...         print(i, j)
...
0 0
0 1
0 2
1 0
--snip--

Another exception is using x and y for Cartesian coordinates. In most other cases, I caution against using single-letter variable names. Although it might be tempting to use, say, w and h as shorthand for width and height, or n as shorthand for number, these meanings might not be apparent to others.
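As a small illustration of both points, this hypothetical bounding_box() function keeps the conventional x and y for coordinates but spells out width and height instead of shortening them to w and h.

```python
def bounding_box(points):
    """Return (min_x, min_y, width, height) for a list of (x, y) points."""
    # x and y are the conventional names for Cartesian coordinates.
    x_values = [x for x, y in points]
    y_values = [y for x, y in points]
    min_x, min_y = min(x_values), min(y_values)
    width = max(x_values) - min_x   # Spelled out rather than w.
    height = max(y_values) - min_y  # Spelled out rather than h.
    return min_x, min_y, width, height


print(bounding_box([(1, 2), (4, 6), (3, 0)]))  # (1, 0, 3, 6)
```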

Too Long Names

In general, the larger the name’s scope, the more descriptive it should be. A short name like payment is fine for a local variable inside a single, short function. But payment might not be descriptive enough if you use it for a global variable across a 10,000-line program, because such a large program might process multiple kinds of payment data. A more descriptive name, such as salesClientMonthlyPayment or annual_electric_bill_payment, could be more suitable. The additional words in the name provide more context and resolve ambiguity.
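The sketch below shows the idea at a small scale, assuming a program that stores the yearly electric bill in a global variable. The function name monthly_installment() and the dollar amount are made up for the example.

```python
# A global name that a large program shares needs extra context.
annual_electric_bill_payment = 1440.00


def monthly_installment(annual_total):
    # Inside a short function, the surrounding code supplies the context,
    # so a brief local name like payment is clear enough.
    payment = annual_total / 12
    return payment


print(monthly_installment(annual_electric_bill_payment))  # 120.0
```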

It’s better to be overly descriptive than not descriptive enough. But there are guidelines for determining when longer names are unnecessary.

Prefixes in Names

The use of common prefixes in names could indicate unnecessary detail in the name. If a variable is an attribute of a class, the prefix might provide information that doesn’t need to be in the variable name. For example, if you have a Cat class with a weight attribute, it’s obvious that weight refers to the cat’s weight. So the name catWeight would be overly descriptive and unnecessarily long.
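Here’s a minimal version of that Cat class. The constructor parameters are assumptions made for the example; the point is that the attribute is named weight, because the class already supplies the “cat” context whenever you write something like whiskers.weight.

```python
class Cat:
    def __init__(self, name, weight):
        self.name = name
        # Not self.catWeight: the Cat class already says whose weight it is.
        self.weight = weight


whiskers = Cat('Whiskers', 4.5)
print(whiskers.weight)  # 4.5
```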

Similarly, an old and now obsolete practice is the use of Hungarian notation, the practice of including an abbreviation of the data type in names. For example, the name strName indicates that the variable contains a string value, and iVacationDays indicates that the variable contains an integer. Modern languages and IDEs can relay this data type information to the programmer without the need for these prefixes, making Hungarian notation an unnecessary practice today. If you find you’re including the name of a data type in your names, consider removing it.
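In Python, type hints are one way to carry that information without prefixes: editors and type checkers such as mypy read the annotations, so the names themselves can stay plain. The describe_vacation() function below is hypothetical. It shows annotated equivalents of strName and iVacationDays.

```python
def describe_vacation(name: str, vacation_days: int) -> str:
    # The annotations record the types that Hungarian-notation prefixes
    # such as strName and iVacationDays would otherwise bake into the names.
    return f'{name} is taking {vacation_days} days off.'


print(describe_vacation('Alice', 5))  # Alice is taking 5 days off.
print(describe_vacation.__annotations__)
```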

On the other hand, the is and has prefixes for variables that contain Boolean values, or functions and methods that return Boolean values, make those names more readable. Consider the following use of a variable named is_vehicle and a method named has_key():

if item_under_repair.has_key('tires'):
    is_vehicle = True

The has_key() method and is_vehicle variable support a plain English reading of the code: “if the item under repair has a key named ‘tires,’ then it’s true that the item is a vehicle.”
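Because item_under_repair isn’t defined in the snippet above, here’s a runnable sketch built around a hypothetical RepairItem class. Only the names has_key() and is_vehicle come from the text. The class and its parts attribute are assumptions. (Python 2 dictionaries had a has_key() method that Python 3 removed; this one is unrelated.)

```python
class RepairItem:
    """A hypothetical item in a repair shop, described by its parts."""

    def __init__(self, parts):
        self.parts = set(parts)

    def has_key(self, part_name):
        # The "has" prefix tells readers this method returns a Boolean.
        return part_name in self.parts


item_under_repair = RepairItem(['tires', 'engine', 'seats'])
is_vehicle = False
if item_under_repair.has_key('tires'):
    is_vehicle = True
print(is_vehicle)  # True
```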

Similarly, adding units to your names can provide useful information. A weight variable that stores a floating-point value is ambiguous: is the weight in pounds, kilograms, or tons? This unit information isn’t a data type, so including a prefix or suffix of kg or lbs or tons isn’t the same as Hungarian notation. If you aren’t using a weight-specific data type that contains unit information, naming the variable something like weight_kg could be prudent. Indeed, in 1999 the Mars Climate Orbiter robotic space probe was lost when software supplied by Lockheed Martin produced calculations in imperial standard units, whereas NASA’s systems used metric, resulting in an incorrect trajectory. The spacecraft reportedly cost $125 million.
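When values in different units must meet, a unit suffix on every name makes a mismatch visible at a glance. In this sketch, lbs_to_kg() and the package variables are hypothetical. The conversion factor of 0.45359237 kilograms per pound is the exact international definition.

```python
KG_PER_LB = 0.45359237  # Exact: the international pound is defined in kilograms.


def lbs_to_kg(weight_lbs):
    """Convert a weight in pounds to kilograms."""
    return weight_lbs * KG_PER_LB


package_weight_lbs = 10
package_weight_kg = lbs_to_kg(package_weight_lbs)
# Adding package_weight_lbs to any *_kg value would now look suspicious.
print(round(package_weight_kg, 3))  # 4.536
```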

Sequential Numeric Suffixes in Names

Sequential numeric suffixes in your names indicate that you might need to change the variable’s data type or add different details to the name. Numbers alone often don’t provide enough information to distinguish these names.

Variable names like payment1, payment2, and payment3 don’t tell the person reading the code what the difference is between these payment values. The programmer should probably refactor these three variables into a single list or tuple variable named payments that contains three values.
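A minimal sketch of that refactoring, with made-up amounts: one tuple named payments replaces three numbered variables, and code that processes the payments no longer needs to repeat itself for each number.

```python
# Before: payment1 = 250.00; payment2 = 125.50; payment3 = 80.25
payments = (250.00, 125.50, 80.25)

for payment_number, payment in enumerate(payments, start=1):
    print(f'Payment {payment_number}: {payment:.2f}')

total_paid = sum(payments)
print(total_paid)  # 455.75
```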

Functions with calls like makePayment1(amount), makePayment2(amount), and so on should probably be refactored into a single function that accepts an integer argument: makePayment(1, amount), makePayment(2, amount), and so on. If these functions have different behaviors that justify separate functions, the meaning behind the numbers should be stated in the name: makeLowPriorityPayment(amount) and makeHighPriorityPayment(amount), or make1stQuarterPayment(amount) and make2ndQuarterPayment(amount), for example.
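Here is one way the single-function version might look, written in PEP 8 snake_case as make_payment(). Treating the integer as a quarter number from 1 to 4 is an assumption made for the example, as is the returned receipt string.

```python
def make_payment(quarter, amount):
    """Record a payment for the given quarter (hypothetical example)."""
    # One function with a parameter replaces make_payment1(), make_payment2(), ...
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f'quarter must be 1 to 4, not {quarter!r}')
    return f'Quarter {quarter} payment: {amount:.2f}'


print(make_payment(1, 300))  # Quarter 1 payment: 300.00
print(make_payment(2, 300))  # Quarter 2 payment: 300.00
```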

If you have a valid reason for choosing names with sequential numeric suffixes, it’s fine to use them. But if you’re using these names because it’s an easy choice to make, consider revising them.

Make Names Searchable

For all but the smallest programs, you’ll probably need to use your editor or IDE’s Ctrl-F “find” feature to locate where your variables and functions are referenced. If you choose a short, generic variable name, such as num or a, you’ll end up with several false matches. To make the name easy to find immediately, form unique names by using longer variable names that contain specific details.
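To see why generic names produce false matches, this sketch mimics a plain Ctrl-F search with str.count(), which counts substring occurrences roughly the way a basic find feature does. The two code snippets are invented for the example.

```python
def count_matches(code, search_text):
    """Count substring matches, like a plain Ctrl-F search."""
    return code.count(search_text)


generic_version = """
num = 3
number_of_cats = num + 1
for index, value in enumerate(values):
    print(index, value)
"""

specific_version = """
starting_cat_count = 3
number_of_cats = starting_cat_count + 1
for index, value in enumerate(values):
    print(index, value)
"""

# 'num' also matches inside number_of_cats and enumerate.
print(count_matches(generic_version, 'num'))                  # 4
print(count_matches(specific_version, 'starting_cat_count'))  # 2
```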

Some IDEs have refactoring features that can identify names based on how your program uses them. For example, a common feature is a “rename” tool that can differentiate between variables named num and number, as well as between local num and global num variables. But you should still choose names as though these tools weren’t available.

Keeping this rule in mind will naturally help you pick descriptive names instead of generic ones. The name email is vague, so consider a more descriptive name like emailAddress, downloadEmailAttachment, emailMessage, or replyToAddress. Not only would such a name be more precise, it would be easier to find in your source code files as well.

Avoid Jokes, Puns, and Cultural References

At one of my previous software jobs, our codebase contained a function named gooseDownload(). I had no idea what this meant, because the product we were creating had nothing to do with birds or the downloading of birds. When I found the more-senior co-worker who had originally written the function, he explained that goose was meant as a verb, as in “goose the engine.” I had no idea what this phrase meant, either. He had to further explain that “goose the engine” was automotive jargon that meant press down on the gas pedal to make the engine go faster. Thus, gooseDownload() was a function to make downloads go faster. I nodded my head and went back to my desk. Years later, after this co-worker left the company, I renamed his function to increaseDownloadSpeed().

When choosing names in your program, you might be tempted to use jokes, puns, or cultural references to add some levity to your code. Don’t do this. Jokes can be hard to convey in text, and the joke probably won’t be as funny in the future. Puns can also be easy to miss, and handling repeat bug reports from co-workers who confused a pun for a typo can be quite punishing.

Culture-specific references can get in the way of communicating your code’s intent clearly. The internet makes it easier than ever to share source code with strangers around the world who won’t necessarily be fluent in English or understand English jokes. As noted earlier in the chapter, the names spam, eggs, and bacon used in Python documentation reference a Monty Python comedy sketch, but we use these as metasyntactic variables only; it’s inadvisable to use them in real-world code.

The best policy is to write your code in a way that non-native English speakers can readily understand: polite, direct, and humorless. My former co-worker might have thought gooseDownload() was a funny joke, but nothing kills a joke faster than having to explain it.

Don’t Overwrite Built-in Names

You should also never use Python’s built-in names for your own variables. For example, if you name a variable list or set, you’ll overwrite Python’s list() and set() functions, possibly causing bugs later in your code. The list() function creates list objects. But overwriting it can lead to this error:

>>> list(range(5))
[0, 1, 2, 3, 4]
>>> list = ['cat', 'dog', 'moose']
>>> list(range(5))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable

If we assign a list value to the name list, we lose the original list() function, and any later attempt to call list() results in a TypeError. To find out whether Python is already using a name, type it into the interactive shell or try to import it. You’ll get a NameError or ModuleNotFoundError if the name isn’t being used. For example, Python uses the names open and test but doesn’t use spam and eggs:

>>> open 
<built-in function open>
>>> import test
>>> spam 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'spam' is not defined
>>> import eggs 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'eggs'

Some commonly overwritten Python names are all, any, date, email, file, format, hash, id, input, list, min, max, object, open, random, set, str, sum, test, and type. Don’t use these names for your identifiers.
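The interactive-shell check can also be scripted. This sketch is a hypothetical helper, name_conflicts(), built on the standard builtins, keyword, and importlib.util modules. It reports whether a name is a keyword, a built-in, or an importable module. Because importlib.util.find_spec() searches the same paths an import statement would, it also finds your own .py files on the import path, such as those in the script’s folder. It won’t flag names like date that live inside a module (datetime.date).

```python
import builtins
import importlib.util
import keyword


def name_conflicts(name):
    """Return a list describing what an identifier would collide with."""
    conflicts = []
    if keyword.iskeyword(name):
        conflicts.append('keyword')
    if hasattr(builtins, name):  # list, set, open, sum, ...
        conflicts.append('built-in')
    if importlib.util.find_spec(name) is not None:  # random, email, ...
        conflicts.append('importable module')
    return conflicts


for candidate in ['list', 'random', 'for', 'temperature_variance']:
    print(candidate, name_conflicts(candidate))
```

If you’ve already shadowed a built-in in the interactive shell, running del list deletes your variable, and the name list refers to the built-in function again.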

Another common problem is giving your .py files the same names as third-party modules. For example, if you installed the third-party Pyperclip module but also created a pyperclip.py file, an import pyperclip statement imports pyperclip.py instead of the Pyperclip module. When you try to call Pyperclip’s copy() or paste() functions, you’ll get an error saying they don’t exist:

>>> # Run this code with a file named pyperclip.py in the current folder.
>>> import pyperclip # This imports your pyperclip.py, not the real one.
>>> pyperclip.copy('hello') 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pyperclip' has no attribute 'copy'

Watch out for overwriting existing names in your Python code, especially if you’re unexpectedly getting these “has no attribute” error messages.
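When you suspect this kind of shadowing, checking the module’s __file__ attribute shows which file Python actually loaded: for the Pyperclip example, printing pyperclip.__file__ would reveal your own pyperclip.py. This sketch wraps the check in a hypothetical module_location() helper. Built-in modules such as sys have no __file__, so it returns None for them.

```python
import importlib


def module_location(module_name):
    """Return the file a module was loaded from, or None for built-in modules."""
    module = importlib.import_module(module_name)
    return getattr(module, '__file__', None)


print(module_location('json'))  # A path inside the standard library folder.
print(module_location('sys'))   # None: sys is compiled into the interpreter.
```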

The Worst Possible Variable Names Ever

The name data is a terrible, generic variable name, because literally all variables contain data. The same goes for naming variables var, which is a bit like naming your pet dog “Dog.” The name temp is common for variables that temporarily hold data but is still a poor choice: after all, from a Zen perspective, all variables are temporary. Unfortunately, these names occur frequently despite their vagueness; avoid using them in your code.

If you need a variable to hold the statistical variance of your temperature data, please use the name temperatureVariance. It should go without saying that the name tempVarData would be a poor choice.
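For example, with the standard statistics module and a few made-up readings in degrees Celsius, the descriptive name (written here in snake_case) reads naturally where it’s computed.

```python
import statistics

daily_temperatures_c = [20.0, 22.0, 24.0]  # Made-up sample readings.
# statistics.variance() computes the sample variance.
temperature_variance = statistics.variance(daily_temperatures_c)
print(temperature_variance)  # 4.0
```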

Summary

Choosing names has nothing to do with algorithms or computer science, and yet it’s a vital part of writing readable code. Ultimately, the names you use in your code are up to you, but be aware of the many guidelines that exist. The PEP 8 document recommends several naming conventions, such as lowercase names for modules and PascalCase names for classes. Names shouldn’t be too short or too long. But it’s often better to err on the side of being too descriptive rather than not descriptive enough.

A name should be concise but descriptive. A name that is easy to find using a Ctrl-F search feature is the sign of a distinct and descriptive variable. Think about how searchable your name is to determine whether you’re using a name that is too generic. Also, consider whether a programmer who doesn’t speak fluent English would understand the name: avoid using jokes, puns, and cultural references in your names; instead, choose names that are polite, direct, and humorless.

Although many of the suggestions in this chapter are simply guidelines, you should always avoid names Python’s standard library already uses, such as all, any, date, email, file, format, hash, id, input, list, min, max, object, open, random, set, str, sum, test, and type. Using these names could cause subtle bugs in your code.

The computer doesn’t care whether your names are descriptive or vague. Names make code easier to read by humans, not easier to run by computers. If your code is readable, it’s easy to understand. If it’s easy to understand, it’s easy to change. And if it’s easy to change, it’s easier to fix bugs or add new features. Using understandable names is a foundational step to producing quality software.
