Chapter 7. Formatting

No other topic generates more heat and less light than code formatting. Everybody has their own style, and attempts to impose another style are met with ferocious resistance. So why am I willingly sticking my head into this buzz saw?

The first reason is “because it is there.” I want to push patterns to their limit, to see how well they apply to a detail-oriented, exception-ridden, and emotion-filled topic. I wrote these patterns over the course of a couple of months. As new special cases came up, I had to modify the patterns, add a new pattern, or format my code according to the existing patterns. Before long, I was no longer finding cases where I had to change the patterns. I am quite pleased that all of formatting in Smalltalk fits into ten patterns.

The second reason is “because it is important.” Formatting code according to some consistent set of rules, not necessarily these patterns, gives a team a smoothness to their interaction. If everybody formats the same way, then reviews and code transfer are never delayed while someone “cleans up” the code. Done right, formatting can convey a lot of information about the structure of code at a glance.

The third reason is to advance the discussion of formatting. Stating the rules as patterns makes my goals and tradeoffs explicit. If you have a different style, you can use these patterns as an example to cast your own rules as patterns. Then you can compare, explicitly, what problems each set of patterns solves and what problems each ignores.

The priorities of these patterns are:

1. To make the gross structure of the method apparent at a glance. Complex messages and blocks, in particular, should jump out at the reader.

2. To preserve vertical space. There is a huge difference between reading a method that fits into the text pane of a browser and reading one that forces you to scroll. Keeping methods compact vertically lets you have smaller browsers and still be able to read methods without scrolling. This reduces window management overhead and leaves more screen space for other programming tools.

3. To be easy to remember. I have seen style guides that have 50–100 rules for formatting. Formatting is important but it shouldn’t take that much brain power.

There are many styles of Smalltalk coding for which these formatting patterns would be a disaster. Long methods and methods that use complex expressions look terrible formatted this way. However, thorough use of Composed Method and Explaining Temporary Variable, along with the attitude that you are writing for a reader and not the computer, will go a long way towards helping you produce code that is simple to format and simple to read.

Inline Message Pattern

You are about to write a method for an Intention Revealing Selector (p. 49).

• How do you format the message pattern?

One alternative is to write the keyword/argument pairs one per line. This makes it easy to see what the selector of the method is by reading straight down from the top left corner. However, this style of formatting will often take up three or four lines of vertical space. Composed Methods are generally only a few lines long. It seems a waste of space to have more introduction than content.

Another reason for lining up the keywords vertically is that early text editors did not have line wrapping, so if you wanted to see all the parameters, you had to scroll horizontally. All current Smalltalks have line wrapping available in the source code editor, so all arguments are available regardless of window size or message pattern width.

The problem of reading the selector as a whole is solved by the browser. You never look at a method as a raw piece of text. Methods always appear in the context of a browser. The selector is always presented near the method. If you forget what method you are working on, a quick glance above the method will answer your question.

By saving the vertical space otherwise taken up by the message pattern, you can quickly scan many methods in a smaller browser than is otherwise possible. This allows you to have more information on the screen at the same time, if it is useful. There is a big difference between browsing a program without ever having to scroll the text of a method and browsing where you are constantly scrolling.

Write the message pattern without explicit line breaks.

Here is a message pattern formatted with this pattern:

I’ve seen this formatted like this:

or worse:

Both of these spend vertical space, increasing the chance that you won’t be able to see the body of the method (the part that matters and the part that is likely to be surprising) without scrolling.
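As a minimal sketch of the contrast, here is the same method written both ways. The class it would live in, the selector, the parameter names, and the body are all hypothetical, invented for illustration; the two listings are alternative sources for one method as they would appear in a browser, not a file to be loaded as is.

```smalltalk
transfer: aNumber from: sourceAccount to: targetAccount
	"Inline Message Pattern: the whole pattern occupies one line."
	sourceAccount withdraw: aNumber.
	targetAccount deposit: aNumber

transfer: aNumber
from: sourceAccount
to: targetAccount
	"One keyword/argument pair per line: three lines of introduction for two lines of content."
	sourceAccount withdraw: aNumber.
	targetAccount deposit: aNumber
```

In the second form the introduction is already longer than the body, which is the waste of vertical space this pattern is meant to avoid.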

Use Type Suggesting Parameter Names (p. 174) for parameters.

Type Suggesting Parameter Name

You are writing an Inline Message Pattern (p. 172). You might be completing a Double Dispatch (p. 55).

• What do you call a method parameter?

There are two important pieces of information associated with every variable—what messages it receives (its type) and what role it plays in the computation. Understanding the type and role of variables is important for understanding a piece of code.

Keywords communicate their associated parameter’s role. Since the keywords and parameters are together at the head of every method, the reader can easily understand a parameter’s role without any help from the name.

Smalltalk doesn’t have a strong notion of types. The set of messages sent to a variable appears nowhere in the language or programming environment. Because of this lack, there is no direct way to communicate types.

Classes sometimes play the role of types. You would expect a Number to be able to respond to messages like +, -, *, and /; or a Collection to do: and includes:.

Name parameters according to their most general expected class, preceded by “a” or “an.” If there is more than one parameter with the same expected class, precede the class with a descriptive word.

An Array that requires Integer keys names the parameters to at:put: as

A Dictionary, where the key can be any object, names the parameters:
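Applying the rule literally suggests message patterns along the following lines. This is a sketch derived from the rule itself; the names used in any particular class library may differ.

```smalltalk
at: anInteger put: anObject
	"Array: the key must be an Integer, the value can be any object."

at: keyObject put: valueObject
	"Dictionary: both parameters can be any object, so each gets a descriptive word in front of the class."
```

In the Dictionary case both parameters have the same most general expected class, Object, which is why the descriptive words “key” and “value” are needed to tell them apart.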

After you have named the parameters, you are ready to write the method. You may have to declare Role Suggesting Temporary Variable Names (p. 110). You may need to format an Indented Control Flow (p. 175). You may have to use a Guard Clause (p. 178) to protect the execution of the body of the method.

Indented Control Flow

• How do you indent messages?

The conflicting needs of formatting to produce both few lines and short lines are thrown into high relief with this pattern. The only saving grace is that Composed Method creates methods with little enough functionality that you never need to deal with hundreds or thousands of words in a method.

One extreme would be to place all the keywords and arguments on the same line, no matter how long the method. This minimizes the length of the method but makes it difficult to read.

If there are multiple keywords to a message, the fact that they all appear is important to communicate quickly to a scanning reader. By placing each keyword/argument pair on its own line, you can make it easy for the reader to recognize the presence of complex messages.

Arguments do not need to be aligned, unlike keywords, because readers seldom scan all the arguments. Arguments are only interesting in the context of their keyword.

Put zero or one argument messages on the same lines as their receiver. For messages with two or more keywords put each keyword/argument pair on its own line, indented one tab.

Here are some zero and one argument messages formatted with Indented Control Flow:

Here are some two argument messages formatted with Indented Control Flow:
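Both cases can be sketched in one workspace snippet. The variable names and values are assumptions chosen for illustration; only standard collection and Boolean messages are used.

```smalltalk
| letters size |
letters := Array new: 3.

"Zero and one argument messages stay on the same line as their receiver."
size := letters size.
letters isEmpty.

"Two keywords: each keyword/argument pair on its own line, indented one tab."
letters
	at: 1
	put: $a.
size > 2
	ifTrue: [Transcript show: 'big']
	ifFalse: [Transcript show: 'small'].
```

Reading straight down the indented keywords gives the full selector in each case: at:put: and ifTrue:ifFalse:.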

Many people have complex exceptions for formatting control statements, like ifTrue: and whileTrue:. One of the things I really like about this pattern is that it gives reasonable results while treating conditional statements as just another message send (which they are, after all).

Formatting code like this makes reading the whole selector easy. You can easily read that the message in this example is #copyFrom:to:with:startingAt:

Rectangular Block (p. 177) formats blocks. Guard Clause (p. 178) prevents indenting from marching across the page.

Rectangular Block

You are writing an expression with Indented Control Flow (p. 175).

• How do you format blocks?

Smalltalk distinguishes between code that is executed immediately upon the activation of a method and code whose execution is deferred. To read code accurately, you must be able to quickly distinguish which code in a method falls into which category.

Code should occupy as few lines as possible, consistent with readability. Short methods are easier to assimilate quickly and they fit more easily into a browser. On the other hand, making it easy for the eye to pick out blocks is a reasonable use of extra lines.

One more resource we can bring to bear on this problem is the tendency of the eye to distinguish and interpolate vertical and horizontal lines. The square brackets used to signify blocks lead the eye to create the illusion of a whole rectangle even though one isn’t there. Therefore:

Make blocks rectangular. Use the square brackets as the upper left and bottom right corners of the rectangle. If the statement in the block is simple, the block can fit on one line. If the statement is compound, bring the block onto its own line and indent.

Here are a couple of one line blocks:

Here is a block that takes two lines because it contains two statements:

Here is a block that takes two lines because it contains a two parameter message:
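The three cases can be sketched as a workspace snippet. The variable names are assumptions made for illustration, and the last block simply applies Indented Control Flow inside the rectangle, so its exact line count may differ from other renderings of the same idea.

```smalltalk
| numbers squares total table |
numbers := #(1 2 3).
total := 0.
table := Dictionary new.

"A simple statement: the whole block fits on one line."
squares := numbers collect: [:each | each * each].

"Two statements: the block moves to its own lines, and the brackets
 mark the upper left and bottom right corners of the rectangle."
numbers do:
	[:each |
	total := total + each.
	Transcript show: each printString].

"A two keyword message inside the block is indented again within the rectangle."
numbers do:
	[:each |
	table
		at: each
		put: each * each].
```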

Guard Clause

You are writing an expression with an Indented Control Flow (p. 175).

• How do you format code that shouldn’t execute if a condition holds?

In the bad old days of Fortran programming, when it was possible to have multiple entries and exits to a single routine, tracing the flow of control was a nightmare. Which statements in a routine got executed, and when, was impossible to determine statically. This led to the commandment “Every routine shall have one entry and one exit.”

Smalltalk labors under few of the constraints of long-ago Fortran, but the prohibition against multiple exits persists. When routines are only a few lines long, understanding flow of control within a routine is simple. It is the flow between routines that becomes the legitimate focus of attention.

Multiple returns can simplify the formatting of code, particularly conditionals. What’s more, the multiple return version of a method is often a more direct expression of the programmer’s intent. Therefore:

Format the one-branch conditional with an explicit return.

Let’s say you have a method that connects a communication device only if the device isn’t already connected. The single exit version of the method might be:

You can read this as “If I am not already connected, connect my connection.” The guard clause version of the same method is:

You can read this as “Don’t do anything if I am connected. Connect my connection.” The guard clause is more a statement of fact, or an invariant, than a path of control to be followed.
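The two readings above can be sketched as alternative sources for the same method. The selectors isConnected and connectConnection are assumptions inferred from the prose, and the class that would hold the method is hypothetical.

```smalltalk
connect
	"Single exit version: the real work sits inside the conditional."
	self isConnected ifFalse: [self connectConnection]

connect
	"Guard Clause version: leave early, then state the work at the top level."
	self isConnected ifTrue: [^self].
	self connectConnection
```

With one guard the difference is small. With two or three, the single exit form nests a level deeper for each condition, while the guard clause form stays flat.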

Conditional Expression

• How do you format conditional expressions where both branches assign or return a value?

Most programming languages make a distinction between statements that work solely by side effect and expressions that return values. For example, control structures in C and Pascal work only by controlling how other statements execute.

In Smalltalk, there are no pure statements. All control structures are implemented in terms of messages, and all messages return values. This leads to the possibility of using the value of control structures.

Programmers new to Smalltalk are likely to be surprised the first time they encounter loops or conditionals used as expressions. New Smalltalkers are likely to write:

These expressions can be translated into the following without changing the meaning:

Is the simpler form worth the possibility of confusion for beginners? It more directly communicates the intent of the expression. You don’t mean “There are two paths of expression, one of which sets the value of cost to the result of sending myself calculateCost and the other of which sets the value of cost to 0.” You mean, “Set cost to one of two values, either the result of sending myself calculateCost or 0.”

Format conditionals so their value is used where it clearly expresses the intent of the method.

Assignment and return are often found in both branches of a conditional. Look for opportunities to factor both to the outside of the conditional.
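A minimal workspace sketch of factoring the assignment out of the conditional follows. The variable quantity and the arithmetic are assumptions standing in for whatever condition and calculation the real method would use.

```smalltalk
| quantity cost |
quantity := 3.

"Statement style: the assignment is repeated in both branches."
quantity > 0
	ifTrue: [cost := quantity * 5]
	ifFalse: [cost := 0].

"Conditional Expression: one assignment, and the conditional supplies the value."
cost := quantity > 0
	ifTrue: [quantity * 5]
	ifFalse: [0].
```

Both forms leave the same value in cost. The second says directly that cost is being set to one of two values.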

Here is an example of a return on both branches of a conditional:

If I write code like this, I don’t mean, “Here are two alternative paths of execution.” I mean, “Here are two alternative values to be returned.” Thus, a Conditional Expression expresses my intent more clearly:

I commonly see code in which both sides of a conditional expression evaluate to a Boolean. Start with this:

Using Conditional Expression we first factor out the assignment:

We can go a step further and eliminate the conditional entirely. The following code is equivalent to the preceding:
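The three steps can be sketched in a workspace as follows. The variable names and values are assumptions chosen for illustration.

```smalltalk
| a b isLarger |
a := 7.
b := 3.

"Starting point: both branches assign a Boolean."
a > b
	ifTrue: [isLarger := true]
	ifFalse: [isLarger := false].

"Step one: factor the assignment out of the conditional."
isLarger := a > b
	ifTrue: [true]
	ifFalse: [false].

"Step two: the comparison already answers the Boolean, so the conditional disappears."
isLarger := a > b.
```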

You may be able to express one or both branches of the conditional more explicitly by using a Composed Method (p. 21).

Simple Enumeration Parameter

• What do you call the parameter to an enumeration block?

It is tempting to try to pack as much meaning as possible into every name. Certainly, classes, instance variables, and messages deserve careful attention. Each of these elements can communicate volumes about your intent as you program.

Some variables just don’t deserve such attention. Variables that are always used the same way, where their meaning can be easily understood from context, call for consistency over creativity. The effort to carefully name such variables is wasted because no non-obvious information is communicated to the reader. Elaborate names may even be counterproductive if the reader tries to impute meaning to the variable that isn’t there.

Call the parameter “each.” If you have nested enumeration blocks, append a descriptive word to all parameter names.

For example, the meaning of “each” in

is clear. If the block is more complicated, “each” may not be descriptive enough. In that case, you should invoke Composed Method to turn the block into a single message. The Type Suggesting Parameter Name in the new method will clarify the meaning of the object.

The typical example of nested blocks is iterating over the two dimensions of a bitmap:

Nested blocks that iterate over unlike collections should probably be factored with Composed Method.
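Both the simple and the nested case can be sketched in a workspace. The collections and the running total are assumptions for illustration; the nested loops stand in for the two dimensions of a bitmap.

```smalltalk
| total |
total := 0.

"A single enumeration block: the parameter is simply each."
#(1 2 3) do: [:each | total := total + each].

"Nested enumeration blocks: every parameter gets a descriptive word appended."
(1 to: 2) do:
	[:eachX |
	(1 to: 3) do: [:eachY | total := total + (eachX * eachY)]].
```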

You may need Composed Method to simplify the enumeration block.

Cascade

• How do you format multiple messages to the same receiver?

The simplest solution is to just repeat the expression that created the receiver. Such code looks like this:

For complex expressions, the first simplification is to use an Explaining Temporary Variable to hold the value of the expression:

One of Smalltalk’s few syntactic quirks is a solution to this problem. Rather than having to repeat an expression or create a temporary variable to hold the expression, Smalltalk lets you say at the end of one message “Here’s another message to the same receiver.”

Use a Cascade to send several messages to the same receiver. Separate the messages with a semicolon. Put each message on its own line and indent one tab. Only use Cascades for messages with zero or one argument.

The code above becomes:
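As a runnable stand-in for the progression from repeated receiver to Cascade, here is a workspace sketch that uses a WriteStream as the receiver. The receiver and the strings are assumptions for illustration, not the example the surrounding text refers to.

```smalltalk
| stream |
stream := WriteStream on: String new.

"Repeating the receiver for every message."
stream nextPutAll: 'width'.
stream space.
stream print: 11.

"The same kind of sequence as a Cascade: one message per line,
 indented one tab, separated by semicolons."
stream
	nextPutAll: ' height';
	space;
	print: 17.
```

Every message in the Cascade takes zero or one argument, in keeping with the restriction stated in the pattern.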

Whether or not you use a Cascade is really a matter of intent. If you want to communicate “Here are a bunch of messages all going to the same object,” that’s a good time to use a Cascade. If you just happen to be sending messages to the same object, but it’s not really part of the essence of the code that the two messages are going to the same object, don’t use a Cascade.

One confusion that sometimes arises about Cascade is if the initial expression is complex, where do the cascaded messages get sent? For example, in:

the indentation cues you that both adds get sent to the new OrderedCollection, not the class itself. Here’s the rule: All subsequent messages in a Cascade go to the same receiver as the first message in the cascade (in this case, #add:). Any preceding parts of the expression that got you the receiver are irrelevant.

The restriction that Cascades only be used with zero or one argument messages comes from the difficulty in visually parsing Cascades with varying numbers of arguments. In the example above, what if you could send height:width: as a single message? Using Cascade, the code would look like:

At a glance, you can’t tell whether height and width are set separately or together. The readability gains of a Cascade are quickly lost if you have to spend any time figuring out the messages that are sent. Fortunately, when several messages (especially more than two) go to the same receiver, the messages are usually simple.

You may have to use Yourself (p. 186) if you are using the value of a Cascade.

Yourself

You need to use the value of a Cascade (p. 183).

• How can you use the value of a Cascade if the last message doesn’t return the receiver of the message?

This has got to be the number one confusing method in all of Smalltalk. There it is in Object, where every new Smalltalker stumbles across it:

Or, if the programmer was really clever (and didn’t know about Interesting Return Value):
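In the images I am aware of, the method reads roughly as follows; the comment text varies by dialect, so treat this as a sketch rather than a quotation from any particular class library. The second listing is the “clever” variant, which relies on a method with no statements returning its receiver.

```smalltalk
yourself
	"Answer the receiver."
	^self

yourself
	"No statements at all: the receiver is returned by default."
```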

What’s going on?

Let’s say you want to add a bunch of elements to an OrderedCollection. Collection>>add: anObject is defined to return anObject, not the receiver of the message. If you want to assign the Collection to a variable:

will result in the value of all being 7. There are two solutions to this problem. The first is to put the variable assignment in parentheses:

When you need the value of a Cascade and the last message does not return the receiver, append the message “yourself” to the Cascade.

Our example becomes:

Sending “yourself” returns the receiver, the new instance of OrderedCollection. That’s the object that gets assigned to the variable.
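The difference can be checked in a workspace. The variable name last is an assumption added so that both results can be inspected side by side.

```smalltalk
| last all |

"Without yourself, the variable receives the value of the final add:, which is 7."
last := OrderedCollection new
	add: 5;
	add: 7.

"With yourself, the final message answers the receiver, the new collection."
all := OrderedCollection new
	add: 5;
	add: 7;
	yourself.
```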

I’ve seen folks become defensive about “yourself,” tacking it onto every Cascade they write. You shouldn’t do this. “yourself” is there to communicate to your reader that you really want the value of the receiver used, not the result of sending a message. If you aren’t using the value of a Cascade, don’t use “yourself.” For example, I wouldn’t use it in Point>>printOn:, because I don’t assign the value of the Cascade to a variable or return it as the value of the method.

Having written this, I’m not sure why I prefer Cascades to the parenthesized format. Perhaps it’s because there is a big psychological difference in parsing a method with parentheses and one without. If I can avoid parentheses and still have a method that reads clearly, I will.

Another use of “yourself” is with #inject:into:. Suppose you want to put all the children of a collection of parents together in a Set. You might be tempted to write:

But this wouldn’t work because the result of sending #addAll: is the argument (in this case the children), not the receiver. To get this to work as expected, you have to write:
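A runnable sketch of the working form follows. To keep it self-contained, each “parent” is represented directly by an Array of its children's names, which is an assumption made for illustration; with real parent objects the block would send each children instead.

```smalltalk
| families allChildren |
families := #(#('ann' 'bob') #('cat' 'ann')).

"yourself makes the block answer the Set, so the next iteration
 receives the Set rather than the collection that was just added."
allChildren := families
	inject: Set new
	into: [:sum :each | sum addAll: each; yourself].
```

Without yourself, the second iteration would send addAll: to the first Array of children instead of to the Set.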

Interesting Return Value

• When do you explicitly return a value at the end of a method?

All messages return a value. If a method does not explicitly return a value, the receiver of the message is returned by default. This causes some confusion for new programmers, who may be used to Pascal’s distinction between procedures and functions, or C’s lack of a definition of the return value of a procedure with no explicit return. To compensate, some programmers always explicitly return a value from every method.

The distinction between methods that do their work by side effect and those that are valuable for the result they return is important. An unfamiliar reader wanting to quickly understand the expected use of a method should be able to glance at the last line and instantly understand whether a useful return value is generated or not. Therefore:

Return a value only when you intend for the sender to use the value.

For example, consider the implementation of topComponent in VisualWorks. Visual components form a tree, with a ScheduledWindow at the root. Any component in the tree can fetch the root by sending itself the message “topComponent.” VisualPart (the superclass of interior nodes and leaves) implements this message by asking the container for its topComponent:

ScheduledWindow implements the base case of the recursion by returning itself. The simplest implementation would be to have a method with no statements. It would return the receiver. However, using Interesting Return Value, because the result is intended to be used by the sender, it explicitly returns “self.”
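The two implementations can be sketched as follows. The instance variable name container is an assumption based on the description above, and the listings are method sources for the two classes rather than code quoted from VisualWorks.

```smalltalk
topComponent
	"In VisualPart: delegate to the container, and return the answer because the sender wants it."
	^container topComponent

topComponent
	"In ScheduledWindow: the base case. The explicit return signals that the value is meant to be used."
	^self
```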
