Chapter 1. Software qualities and a problem to solve

This chapter covers

  • Evaluating software from different points of view and for different objectives
  • Distinguishing internal from external software qualities
  • Distinguishing functional from nonfunctional software qualities
  • Assessing interactions and trade-offs between software qualities

The core idea of this book is to convey the mindset of an experienced developer by comparing and contrasting different code qualities (aka nonfunctional requirements). Most of these qualities—like performance or readability—are universal, in the sense that they’re relevant to any piece of software. To emphasize this fact, you’ll revisit the same recurring example in each chapter: a simple class representing a system of water containers.

In this chapter, I’ll introduce the software qualities that this book addresses, and I’ll present the specifications for the water container example, followed by a preliminary implementation.

1.1. Software qualities

In this book, you should interpret the word quality as a characteristic that a piece of software may or may not have, not as its overall value. That’s why I talk about multiple qualities. You can’t consider all characteristics qualities; for example, the programming language in which a piece of software is written is certainly a characteristic of that software but not a quality. Qualities are characteristics that you can grade on a scale, at least in principle.

As with all products, the software qualities that people are most interested in are those that measure the extent to which the system fulfills its requirements. Unfortunately, just describing the requirements of a piece of software, let alone fulfilling them, is no easy task. Indeed, the entire field of requirements analysis is devoted to it. How is that possible? Isn’t it enough for the system to reliably and consistently offer the services its users need?

First of all, often the users themselves don’t exactly know what services they need; they need time and assistance to figure that out. Second, fulfilling those needs isn’t the end of the story at all. Those services may be offered more or less quickly, with more or less accuracy, after a long user training or after just a quick glance at a well-designed UI, and so on. In addition, over time you need to modify, fix, or improve any system, which leads to more quality variables: How easy is it to understand the system’s inner workings? How easy is it to modify and extend it without breaking other parts? The list goes on and on.

To put some order in this multitude of criteria, experts suggest organizing them according to two characteristics: internal versus external and functional versus nonfunctional.

1.1.1. Internal vs. external qualities

The end user can perceive external qualities while interacting with the system, whereas you can appraise internal ones only by looking at the source code. The boundary between these two categories isn’t clear-cut. The end user can indirectly perceive some internal qualities. Vice versa, all external qualities ultimately depend on the source code.

Software quality standards

The ISO and IEC standardization bodies have defined software quality requirements since 1991 in standard 9126, which was superseded by standard 25010 in 2011.

For example, maintainability (how easy it is to modify, fix, or extend the software) is an internal attribute, but end users will become aware of it if a defect is found and programmers take a long time to fix it. Conversely, robustness to incorrect inputs is generally considered an external attribute, but it becomes internal when the piece of software under consideration—perhaps a library—isn’t exposed to the end user and only interacts with other system modules.

1.1.2. Functional vs. nonfunctional qualities

The second distinction is between qualities that apply to what the software does (functional qualities) and those that refer to how the software is (nonfunctional qualities) (figure 1.1). The internal-external dichotomy applies to this distinction as well: if the software does something, its effect is visible to the end user, one way or another. Therefore, all functional qualities are external. On the other hand, nonfunctional qualities can be either internal or external, depending on whether they’re more related to the code itself or to its emerging traits. The following sections contain examples of both kinds. In the meantime, take a look at figure 1.2, which puts all the qualities addressed in this chapter in a 2D spectrum, representing the internal-external distinction on the horizontal axis and the functional versus nonfunctional distinction on the vertical one. The next section presents the main software qualities that the end user can directly appraise.

Figure 1.1. Functional and nonfunctional requirements pull software in different directions. It’s your job to find a balance.

1.2. Mostly external software qualities

External software qualities pertain to the observable behavior of the program and as such are naturally the primary concern of the development process. Besides attributing these qualities to software, I’ll discuss them in relation to a plain old toaster to try and frame them in the most general and intuitive sense. The following subsections provide a description of the most important external qualities.

1.2.1. Correctness

Adherence to stated objectives, aka requirements or specifications

For a toaster to be correct, it must cook sliced bread until it’s brown and crispy. Software, on the other hand, must offer the functionalities that were agreed on with the customer. This is the functional quality, by definition.

There’s no secret recipe for correctness, but people employ a variety of best practices and development processes to improve the likelihood of writing correct software in the first place, and catching defects after the fact. In this book, I’ll focus on the small-scale techniques that a single programmer can employ on the job, regardless of the specific development process their company has adopted.

First of all, there can be no correctness if the developer doesn’t have a clear idea of the specifications they’re aiming at. Thinking of specifications in terms of contracts and implementing safeguards to enforce those contracts are useful ideas I explore in chapter 5. The primary way to catch the inevitable defects is to put the software through simulated interactions, that is, testing. Chapter 6 discusses systematic ways to design test cases and measure their effectiveness. Finally, adopting the best practices for readable code benefits correctness by helping both the original author and their peers spot problems, before and after they’re exposed by failed tests. Chapter 7 presents a selection of such best practices.

1.2.2. Robustness

Resilience to incorrect inputs or adverse/unanticipated external conditions (such as the lack of some resource)

Correctness and robustness are sometimes lumped together as reliability. A robust toaster doesn’t catch fire if a bagel, a fork, or nothing at all is pushed in instead of bread. It has safeguards in place against overheating, and so on.[1]

[1] Toaster robustness is no joke: an estimated 700 people worldwide are killed every year in toaster-related accidents.

Robust software, among other things, checks that its inputs are valid values. If they’re not, it signals the problem and reacts accordingly. If the error condition is fatal, a robust program aborts after salvaging as much as possible of the user data or the computation that has been performed so far. Chapter 5 addresses robustness by promoting rigorous specification and runtime monitoring of method contracts and class invariants.
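As a minimal sketch of this kind of input checking, consider the toaster once more. The Toaster class below, its method names, and the 1 to 5 range of browning levels are all assumptions made up for illustration; they don’t come from this book’s code. The point is only that an invalid input is detected and signaled, rather than silently accepted:

```java
// Hypothetical example: the class, its names, and the 1..5 range are illustrative assumptions.
class Toaster {
    static final int MIN_LEVEL = 1;
    static final int MAX_LEVEL = 5;

    private int level = MIN_LEVEL;

    /** Sets the browning level, rejecting values outside the supported range. */
    void setLevel(int newLevel) {
        if (newLevel < MIN_LEVEL || newLevel > MAX_LEVEL) {
            // Signal the problem instead of silently accepting an invalid input
            throw new IllegalArgumentException(
                "Level must be between " + MIN_LEVEL + " and " + MAX_LEVEL
                + ", got " + newLevel);
        }
        level = newLevel;
    }

    int getLevel() {
        return level;
    }
}
```

Because the check comes before the assignment, a rejected call leaves the previous level untouched.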

1.2.3. Usability

A measure of the effort needed to learn how to use software and to achieve its goals; ease of use

Modern pop-up toasters are very easy to use, making do with a lever to push the bread in and start toasting, and a knob to adjust the amount of toasting desired. Software usability is tied to the design of its user interface (UI) and is addressed by such disciplines as human-computer interaction and user experience (UX) design. This book doesn’t address usability because it’s focused on software systems with no direct exposure to the end user.

1.2.4. Efficiency

Adequate consumption of resources

Toaster efficiency may refer to how much time and electricity is needed to complete its toasting task. For software, time and space (memory) are the two resources that all programs consume. Chapters 3 and 4 deal with time and space efficiency, respectively. Many programs also require network bandwidth, database connections, and many other resources. Trade-offs commonly arise between different resources. A more powerful toaster may be faster but require more (peak) electricity. Analogously, some programs may be made faster by employing more memory (more on this later).

Although I’m listing efficiency among the external qualities, its true nature is ambiguous. For example, execution speed is definitely noticeable on the part of the end user, especially when it’s limited. Consumption of other resources, like network bandwidth, is instead hidden from the user, and you can appraise it only with specialized tools or by analyzing the source code. That’s why I put efficiency somewhat in the middle in figure 1.2.

Figure 1.2. Software qualities classified according to two dichotomies: internal versus external (horizontal axis) and functional versus nonfunctional (vertical axis). The qualities that I specifically address in the book have a thick border.

Efficiency is a mostly nonfunctional quality, because in general the user doesn’t care if some service is offered in one or two milliseconds, or whether one or two kilobytes of data is sent over the network. It becomes a functional issue in two contexts:

  • In performance-critical applications: In these cases, performance guarantees are part of the specifications. Think of an embedded device that interacts with physical sensors and actuators. The response time of its software must obey precise timeouts. Failure to do so may result in functional inconsistencies in the best case, all the way to life-threatening incidents in industrial, medical, or automotive applications.
  • Whenever the efficiency is so bad that it affects normal operations: Even for a consumer-oriented, noncritical program, there’s a limit to the sluggishness and memory hunger that the user is willing to put up with. Beyond that, the lack of efficiency rises to the level of a functional defect.

1.3. Mostly internal software qualities

You can appraise internal qualities better by looking at the source code of a program than by running it. The following subsections provide a list of the most important internal qualities.

1.3.1. Readability

Clarity, understandability by fellow programmers

It may seem odd to speak of toaster readability, until we realize that, as for all internal qualities, we are talking about its structure and design. In fact, the relevant international standard for software qualities dubs this characteristic analyzability. So, a readable toaster, once opened for inspection, is easy to analyze, revealing a clear internal layout, with the heating elements well separated from the electronics, an easily identifiable power circuit and timer, and so on.

A readable program is just what it sounds like: easy to understand by another programmer, or by the author after the program’s mental model has faded from their mind. Readability is an extremely important, and often undervalued, code quality. It’s the topic of chapter 7 of this book.

1.3.2. Reusability

Ease of reusing the code to solve similar problems, and the number of changes needed to do so (aka adaptability)

You may consider a toaster reusable in our sense if the company that makes it can adapt its design and its parts to build other appliances. For example, its power cord is likely to be standard and, as such, compatible with similar small appliances. Perhaps its timer could be used in a microwave, and so on.

Code reuse was one of the historical selling points of the object-oriented (OO) paradigm. Experience has proven that the vision of building complex systems out of widely reusable software components was exaggerated. The modern trend, instead, favors libraries and frameworks that are intentionally designed for reusability, on top of which lies a not-so-thin layer of application-specific code that doesn’t aim at reusability. I address reusability in chapter 9 of this book.

1.3.3. Testability

The ability to write tests that can trigger all relevant program behaviors and observe their effects, and how easy it is to do so

Before discussing testable toasters, let’s try to figure out what a toaster test might look like.[2] A reasonable test procedure would involve inserting suitable thermometers into the slots and starting a toasting run. You’d measure success by the temperature change over time being sufficiently close to a predetermined nominal one. A testable toaster makes this procedure easy to perform repeatedly and automatically, with as little human intervention as possible. For example, a toaster that you can start by pushing a button is more testable than a toaster requiring a lever to be pulled down, because it’s easier for a machine to push or bypass a button than to pull or bypass a lever.

[2] According to some reports, “how to test a toaster” is a recurring question in software engineering job interviews.

Testable code exposes an API that allows the caller to verify all expected behaviors. For example, a void method (aka a procedure) is less testable than a method returning a value. This book addresses testing techniques and testability in chapter 6.
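To make the void-versus-value comparison concrete, here is a small sketch. Both classes and their names are hypothetical, invented only for this illustration. The first offers a test nothing to observe, whereas the second reports the effect of each call:

```java
// Hypothetical classes, made up for illustration.
class SilentAccumulator {
    private int total;

    // A void method with no accessor: a test has no way to observe what it did.
    void add(int value) {
        total += value;
    }
}

class ObservableAccumulator {
    private int total;

    // Returning the updated total lets a test check the effect of each call.
    int add(int value) {
        total += value;
        return total;
    }
}
```

A test for ObservableAccumulator can simply compare the returned values with the expected ones. A test for SilentAccumulator would need some other channel, such as an extra getter, to learn anything about the object’s state.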

1.3.4. Maintainability

Ease of finding and fixing bugs, as well as evolving the software

A maintainable toaster is easy to pull apart and service. Its schematics are widely available, and its components are replaceable. Similarly, maintainable software is readable and modular, with different parts having clearly defined responsibilities and interacting in clearly defined ways. Testability and readability, addressed in chapters 6 and 7, are among the main contributors to maintainability.

The FURPS model

Large companies with strong technical traditions develop their own quality model for their software development processes. For example, Hewlett-Packard developed the well-known FURPS model, which classifies software characteristics in five groups: Functionality, Usability, Reliability, Performance, and Supportability.

Table 1.1. Typical interactions between code qualities: ↓ stands for “hurts” and - for “no interaction.” Inspired by Figure 20-1 in Code Complete (see the Further reading section at the end of this chapter).

                   Readability   Robustness   Space efficiency   Time efficiency
Readability
Robustness              -
Space efficiency        ↓             -
Time efficiency         ↓             ↓              ↓

1.4. Interactions between software qualities

Some software qualities represent contrasting objectives, while others go hand-in-hand. The result is a balancing act common to all engineering specialties. Mathematicians have a name for this type of problem: multi-criteria optimization; that is, finding optimal solutions with respect to multiple competing quality measures. Unlike the objectives of an abstract mathematical problem, software qualities may be impossible to quantify (think readability). Luckily, you don’t need to find a truly optimal solution, just one that’s good enough for your purposes.

Table 1.1 summarizes the relationships between four of the qualities that we examine in this book. Both time and space efficiency may hinder readability. Seeking maximum performance leads to sacrificing abstraction and writing lower-level code. In Java, this may entail using primitive types instead of objects, plain arrays instead of collections, or, in extreme cases, writing performance-critical parts in a lower-level language like C and connecting them with the main program using the Java Native Interface.
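The following sketch contrasts the two styles on a deliberately tiny task. The SumDemo class and its method names are hypothetical. The collection-based version accepts any list of boxed Integer objects, whereas the array-based one works on primitives and so avoids boxing. Whether the difference is measurable depends on the JVM and the workload:

```java
import java.util.List;

// Hypothetical example, made up for illustration.
class SumDemo {
    // Higher-level version: works with any list of boxed Integer values.
    static int sumOfList(List<Integer> values) {
        int sum = 0;
        for (int v : values) {   // each element is unboxed from Integer to int
            sum += v;
        }
        return sum;
    }

    // Lower-level version: a plain array of primitives, with no boxing involved.
    static int sumOfArray(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```

On such a small method the two versions read almost the same. The readability cost of the low-level style tends to show up in larger code, where arrays bring manual index and capacity management that collections hide.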

Minimizing memory requirements also favors the use of primitive types, as well as special encodings, where a single value is used as a compact way to represent different things. (You’ll see an example of this in section 4.4.) All these techniques tend to hurt readability, and hence maintainability. Conversely, readable code uses more temporary variables and support methods and shies away from those low-level performance hacks.

Time and space efficiency also conflict with each other. For example, a common strategy for improving performance involves storing extra information in memory, instead of computing it every time it’s needed. A prominent example is the difference between singly and doubly linked lists. Even though the “previous” link of every node could in principle be computed by scanning the list, storing and maintaining those links allows for constant-time deletion of arbitrary nodes. The class in section 4.4 trades improved space efficiency for increased running time.
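As a rough sketch of the linked-list example, the hypothetical node class below (its name and methods are assumptions, not taken from any library) pays one extra reference per node for the “previous” link and in return can unlink itself without scanning the list:

```java
// Hypothetical minimal doubly linked node, for illustration only.
class DoublyLinkedNode {
    final int value;
    DoublyLinkedNode prev;   // extra link: costs one reference per node
    DoublyLinkedNode next;

    DoublyLinkedNode(int value) {
        this.value = value;
    }

    // Inserts the given node right after this one.
    void insertAfter(DoublyLinkedNode newNode) {
        newNode.prev = this;
        newNode.next = next;
        if (next != null) {
            next.prev = newNode;
        }
        next = newNode;
    }

    // Unlinks this node from its neighbors in constant time, with no scan.
    void unlink() {
        if (prev != null) {
            prev.next = next;
        }
        if (next != null) {
            next.prev = prev;
        }
        prev = null;
        next = null;
    }
}
```

With a singly linked node, unlink would first have to find the predecessor by walking the list from its head, which takes time proportional to the node’s position.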

Maximizing robustness requires adding code that checks for abnormal circumstances and reacts in the proper way. Such checks incur a performance overhead, albeit usually quite limited. Space efficiency need not be impacted in any way. Similarly, in principle, there’s no reason why robust code should be less readable.

Software metrics

Software qualities are related to software metrics, which are quantifiable properties of a piece of software. Hundreds of metrics have been proposed in the literature, two of the most common being the mere number of lines of code (aka LOC) and the cyclomatic complexity (a measure of the amount of branching, that is, of the number of independent paths through the code). Metrics provide objective means of evaluating and monitoring a project that are intended to support decisions related to project development. For example, a method having high cyclomatic complexity may require more testing effort.

Modern IDEs automatically compute common software metrics either natively or via plugins. The relative merits of these metrics, their relationships with the general software qualities described in this chapter, and their effective use are highly debated topics in the software engineering community. In this book, we’ll make use of code coverage metrics in chapter 6.

Opposite to these software qualities sits another force that works against them all: development time. Business reasons push for writing software quickly, but maximizing any quality attribute requires deliberate effort and time. Even when management is sensitive to the prospective benefits of carefully designed software, it may be tricky to estimate how much time is enough time for a high-quality result. Development processes, of which there are a rich variety, propose different solutions to this problem, some advocating the use of the software metrics mentioned in the sidebar.

This book doesn’t enter into the process debate (sometimes it feels like “war” is a more appropriate term), instead focusing on those software qualities that remain meaningful when applied to a small software unit consisting of a single class with a fixed API. Time and space efficiency make the cut, together with reliability, readability, and generality. I exclude other qualities, such as usability or security, from this analysis.

1.5. Special qualities

In addition to the quality attributes I’ve described in the previous sections, I’ll consider two properties of a class that are not formally software qualities: thread safety and succinctness.

1.5.1. Thread safety

The ability of a class to work seamlessly in a multithreaded environment

This isn’t a general software quality because it applies only to the specific context of multithreaded programs. Still, such context has become so ubiquitous and thread synchronization issues are so tricky that knowing your way around basic concurrency primitives is a valuable skill to have in any programmer’s toolbox.

It’s tempting to put thread safety among the internal qualities, but that would be a mistake. What’s truly hidden from the user is whether a program is sequential or multithreaded. In the realm of multithreaded programs, thread safety is a basic prerequisite to correctness, and as such a very visible quality. Incidentally, thread safety issues lead to some of the hardest bugs to detect because of their apparent randomness and poor reproducibility. That’s why in figure 1.2 I put thread safety in the same area as correctness and robustness. Chapter 8 is devoted to ensuring thread safety while avoiding common concurrency pitfalls.
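To see why thread safety is a prerequisite to correctness, consider the following minimal sketch. The SafeCounter class and its method names are hypothetical, and synchronized methods are only one of several ways to protect shared state in Java:

```java
// Hypothetical example; class and method names are illustrative assumptions.
class SafeCounter {
    private int count;

    // synchronized makes the read-modify-write sequence atomic with respect to
    // the other synchronized methods of the same object.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Starts the given number of threads, each incrementing a shared counter
    // incrementsPerThread times, and returns the final count.
    static int countInParallel(int threads, int incrementsPerThread) {
        SafeCounter counter = new SafeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();   // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
        return counter.get();
    }
}
```

With the synchronized keyword in place, countInParallel(4, 10000) returns 40000. Without it, count++ is no longer atomic, so concurrent updates may be lost and the result may come out lower, and differently on each run.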

1.5.2. Succinctness

Writing the shortest possible program for a given task

Generally speaking, this isn’t a code quality at all. On the contrary, it leads to horrible, obscure code. I’ve included it in this book (in appendix A) as a fun exercise that pushes the language to its limits and challenges your knowledge of Java or any programming language of your choice.

Still, you can find practical scenarios where succinctness is a desired objective. Low-end embedded systems like smart cards, found in phones and credit cards, may be equipped with so little memory that the program must not only occupy little memory while running, but also exhibit a small footprint when stored on persistent memory. Indeed, most smart cards these days feature 4 KB of RAM and 512 KB of persistent storage. In such cases, the sheer number of bytecode instructions becomes a relevant issue, and shorter source code may lead to fewer issues in that area.

1.6. The recurring example: A system of water containers

In this section, I’ll describe the programming problem that you’ll solve repeatedly in the rest of the book, each time aiming at a different software quality objective. You’ll learn the desired API, followed by a simple use case and a preliminary implementation.

Suppose you need to implement the core infrastructure for a new social network. People can register and, of course, connect with each other. Connections are symmetric (if I’m connected to you, you’re automatically connected to me, as with Facebook), and one special feature of this network is that users can send a message to all the users to whom they’re connected, directly or indirectly. In this book, I’ll take the essential features of this scenario and put them in a simpler setting, where we don’t have to worry about the content of the messages or the attributes of the people.

Instead of people, you’ll deal with a set of water containers, all identical and equipped with a virtually unlimited capacity. At any given time, a container holds a certain amount of liquid, and any two containers can be permanently connected by a pipe. Instead of sending messages, you can pour water in or remove it from a container. Whenever two or more containers are connected, they become communicating vessels, and from that time on they split the liquid contained in them equally.

1.6.1. The API

This section describes the desired API for the water containers. At the very least, you’ll build a Container class, endowed with a public constructor that takes no arguments and creates an empty container, and the following three methods:

  • public double getAmount()—Return the amount of water currently held in this container.
  • public void connectTo(Container other)—Permanently connect this container with other.
  • public void addWater(double amount)—Pour amount units of water into this container. This method automatically and equally distributes water among all containers that are connected, directly or indirectly, to this one. You can also use this method with a negative amount to remove water from this container. In that case, the group of connected containers should be holding enough water to satisfy the request—you wouldn’t want to leave a negative amount of water in a container.

Most of the implementations I present in the following chapters conform exactly to this API, save for a couple of clearly marked exceptions, where tweaking the API helps optimize a certain software quality.

A connection between two containers is symmetric: water can flow in both directions. A set of containers connected by symmetric links forms what is known in computer science as an undirected graph. See the sidebar to learn the basic notions about such graphs.

Undirected graphs

In computer science, networks of pairwise connected items are called graphs. In this context, items are also known as nodes and their connections as edges. If connections are symmetric, the graph is called undirected because the connections don’t have a specific direction. A set of items that are connected, directly or indirectly, is called a connected component. In this book, a maximal connected component is simply called a group.

The elements of graphs according to computer science

A proper implementation of addWater in the container scenario requires that you know what components are connected because you have to spread (or remove) water evenly among all connected containers. In fact, the main algorithmic problem underlying the proposed scenario consists of maintaining knowledge of the connected components under node creation (new Container) and edge insertion (connectTo method), a type of dynamic graph connectivity problem.

Such problems are central to many applications involving networks of items: in a social network, connected components represent groups of people linked by friendship; in image processing, connected (in the sense of adjacent) regions of same-color pixels help identify objects in a scene; in computer networks, discovering and maintaining connected components is a basic step in routing. Chapter 9 explores the reach and the limits of our specific version of the problem.

1.6.2. The use case

This section presents a simple use case that exemplifies the API outlined in the previous section. You’ll create four containers, put some water in two of them, and then progressively connect them until they form a single group (figure 1.3). For this preliminary example, you’ll insert the water first and then connect the containers. In general, you can freely interleave these two operations. What’s more, you can create new containers at any time.

Figure 1.3. The four steps of the use case: from four empty isolated containers to a single group of connected containers

I’ve divided the use case (class UseCase in the online repository (https://bitbucket.org/mfaella/exercisesinstyle)) into four parts so that in the other chapters you can easily refer to specific points and examine how different implementations fulfill the same requests. The four steps are illustrated in figure 1.3. In the first part, which coincides with the following code snippet, you simply create four containers. Initially, they’re empty and isolated (not connected).

Container a = new Container();
Container b = new Container();
Container c = new Container();
Container d = new Container();

Next, you add water to the first and last containers and connect the first two with a pipe. At the end, you print the water amount in each container to the screen to check that everything worked according to the specifications.

a.addWater(12);
d.addWater(8);
a.connectTo(b);
System.out.println(a.getAmount()+" "+b.getAmount()+" "+
                   c.getAmount()+" "+d.getAmount());

At the end of the previous snippet, containers a and b are connected, so they share the water that you put into a, whereas containers c and d are isolated. The following is the desired output from the println:

6.0 6.0 0.0 8.0

Let’s move on and connect c to b to check whether adding a new connection automatically redistributes the water among all connected containers.

b.connectTo(c);
System.out.println(a.getAmount()+" "+b.getAmount()+" "+
                   c.getAmount()+" "+d.getAmount());

At this point, c is connected to b and, indirectly, to a. Now a, b, and c are communicating vessels, and the total amount of water contained in all of them distributes equally among them. Container d is unaffected, leading to this output:

4.0 4.0 4.0 8.0

Pay special attention to the current point in the use case, as I will use it in the following chapters as a standard scenario to show how different implementations represent the same situation in memory.

Finally, connect d to b so that all containers form a single connected group:

b.connectTo(d);
System.out.println(a.getAmount()+" "+b.getAmount()+" "+
                   c.getAmount()+" "+d.getAmount());

As a consequence, in the final output, the water level is equal in all containers:

5.0 5.0 5.0 5.0

1.7. Data model and representations

Now that you know the requirements for your water container class, you can turn to designing an actual implementation. The specifications fix the public API, so the next step is to figure out which fields each Container object needs, and possibly which fields the class itself needs (aka static fields). The examples in later chapters show that you can come up with a surprisingly large number of different field choices, depending on which quality objective you’re aiming for. This section presents some general observations that apply regardless of the specific quality objective.

First of all, the objects must include enough information to offer the services that the specifications require. Once this basic criterion is met, you still have two types of decisions to make:

  1. Do you store any extra information, even if not strictly necessary?
  2. How do you encode all the information you want to store? Which data types or structures are the most appropriate? And which object(s) will be responsible for it?

Regarding question 1, you may want to store unnecessary information for two possible reasons. First, you may do so for performance; this is the case of information that you could derive from other fields, but you prefer to have it ready because deriving it is more expensive than maintaining it. Think of a linked list storing its length in a field, even if that information could be computed on the fly by scanning the list and counting the number of nodes. Second, you sometimes store extra information to make room for future extensions. You’ll encounter an example of this in section 1.7.2.
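The linked-list case can be sketched as follows. The IntList class is a hypothetical, stripped-down list written only for this illustration. It keeps a redundant length field up to date on every insertion, so that reading the length doesn’t require a scan:

```java
// Hypothetical minimal singly linked list of ints, for illustration only.
class IntList {
    private static class Cell {
        final int value;
        final Cell next;

        Cell(int value, Cell next) {
            this.value = value;
            this.next = next;
        }
    }

    private Cell head;
    private int length;   // redundant: derivable by scanning, kept for speed

    // Adds a value at the front and keeps the stored length up to date.
    void addFirst(int value) {
        head = new Cell(value, head);
        length++;
    }

    // Constant time, thanks to the extra field.
    int length() {
        return length;
    }

    // Linear time: recomputes the same information from the cells.
    int lengthByScanning() {
        int n = 0;
        for (Cell c = head; c != null; c = c.next) {
            n++;
        }
        return n;
    }
}
```

The price of the extra field is that every method that changes the list must remember to update it; otherwise the two lengths drift apart.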

Once you establish what information is to be stored, it’s time to answer question 2 by equipping classes and objects with fields of appropriate types. Even in a relatively simple scenario like our water containers, this step can be far from trivial. As the whole book tries to prove, several competing solutions may exist, all valid in different contexts and with different quality objectives in mind.

Focusing on our scenario, the information describing the current state of a container is composed of two aspects: the amount of water held in it and its connections with other containers. The next two sections deal with each aspect separately.

1.7.1. Storing water amounts

First of all, the presence of the getAmount method requires containers to “know” the amount of water in them. By “knowing,” I don’t mean that you should necessarily store this information in the container. It’s too early to make that call. What I mean is simply that the container has some way to appraise that value and return it. Additionally, the API dictates that such an amount be represented by a double. The natural implementation choice is indeed to include an amount field of type double in each container. Under closer inspection, you might notice that each container in a group of connected containers holds the same amount of water. So, it might be preferable to store such amount information only once, in a separate object representing a group of containers. In this way, you’ll only need to update a single object when addWater is called, even if the current container is connected to many others.

Finally, instead of a separate object, you also could store the group amount in a special container, chosen as the representative for its group. Summarizing, at least three approaches seem to make sense at this point:

  1. Each container holds an up-to-date “amount” field.
  2. A separate “group” object holds the “amount” field.
  3. Only one container in each group—the representative—holds the up-to-date amount value, which applies to all containers in the group.

In the following chapters, various implementations side with each of these three alternative approaches (as well as a couple of extra approaches), and I’ll discuss the pros and cons of each approach in detail.
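To make the second approach more concrete, here is a minimal sketch. It is only an assumption about how such a design could look: the names SharedGroup and GroupedContainer are hypothetical, and the implementations in later chapters may differ in every detail. The point to notice is that addWater touches a single object, no matter how large the group is.

```java
import java.util.HashSet;
import java.util.Set;

// Approach 2 only; all names here are hypothetical.
class SharedGroup {
    double amountPerContainer;   // stored once for the whole group
    final Set<GroupedContainer> members = new HashSet<>();
}

class GroupedContainer {
    private SharedGroup group = new SharedGroup();

    GroupedContainer() { group.members.add(this); }

    double getAmount() { return group.amountPerContainer; }

    void addWater(double amount) {
        // A single update, however many containers share the group.
        group.amountPerContainer += amount / group.members.size();
    }

    void connectTo(GroupedContainer other) {
        SharedGroup absorbed = other.group;
        if (absorbed == group) return;   // already in the same group
        int size1 = group.members.size();
        int size2 = absorbed.members.size();
        double total = group.amountPerContainer * size1
                     + absorbed.amountPerContainer * size2;
        group.members.addAll(absorbed.members);
        for (GroupedContainer member : absorbed.members) member.group = group;
        group.amountPerContainer = total / (size1 + size2);
    }
}
```

The first approach would instead keep an amount field in every container, and the third would move the fields of SharedGroup into one designated container per group.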

1.7.2. Storing connections

When adding water to a container, the liquid must be distributed equally over all containers that are connected (directly or indirectly) to it. Each container therefore must be able to identify all the containers that are connected to it. An important decision is whether to distinguish direct from indirect connections. A direct connection between a and b can be established only via the call a.connectTo(b) or b.connectTo(a), whereas indirect connections arise as a consequence of direct ones.[3]

[3] In mathematical terms, indirect connections correspond to the transitive closure of direct ones.

Picking the Information To Be Stored

The operations that our specifications require don’t distinguish direct from indirect connections, so you could just store the more general type: indirect connections. However, suppose that at some point in the future you want to add a “disconnectFrom” operation whose intent is to undo a previous “connectTo” operation. If you mix up direct and indirect connections, you can’t hope to correctly implement “disconnectFrom.”

Indeed, consider the two scenarios represented in figure 1.4, where direct connections are drawn as lines between containers. If you store only indirect connections in memory, the two scenarios are indistinguishable: in both cases, all containers are mutually connected. Hence, if the same sequence of operations is applied to both scenarios, they’re bound to react in the exact same way. On the other hand, consider what should happen if the client issues the following operations:

a.disconnectFrom(b);
a.addWater(1);

Figure 1.4. Two three-container scenarios. Lines between containers represent direct connections.

If these two lines are executed on the first scenario (figure 1.4, left), the three containers are still mutually connected, so the extra water must be split equally among all of them. Conversely, in the second scenario (figure 1.4, right) disconnecting a from b makes container a isolated, so the extra water must be added to a only. This shows that only storing indirect connections is incompatible with a future “disconnectFrom” operation.
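The following sketch shows what storing direct connections could look like, under the assumption that each container keeps a set of its direct neighbors and derives its group by traversing them. The class LinkedNode and its methods are hypothetical, and water handling is left out to keep the focus on connectivity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: only direct connections are stored.
class LinkedNode {
    private final Set<LinkedNode> direct = new HashSet<>();

    void connectTo(LinkedNode other) {
        direct.add(other);
        other.direct.add(this);
    }

    void disconnectFrom(LinkedNode other) {
        direct.remove(other);
        other.direct.remove(this);
    }

    /** All nodes reachable through direct connections, including this one. */
    Set<LinkedNode> group() {
        Set<LinkedNode> seen = new HashSet<>();
        Deque<LinkedNode> pending = new ArrayDeque<>();
        seen.add(this);
        pending.push(this);
        while (!pending.isEmpty()) {
            for (LinkedNode next : pending.pop().direct) {
                if (seen.add(next)) pending.push(next);
            }
        }
        return seen;
    }
}
```

With this representation, disconnecting a from b in the triangle of figure 1.4 (left) leaves a group of three, whereas doing the same in the chain of figure 1.4 (right) leaves a alone. A representation that stores only the groups can't tell the two cases apart.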

Summarizing, if you think that the future addition of a “disconnectFrom” operation is likely, you may have reason to store direct connections explicitly and separately from indirect ones. However, if you don’t have specific information about the future evolution of your software, you should be wary of such temptations. Programmers are known to be prone to overgeneralization and tend to weigh the hypothetical benefits of an extra feature more heavily than the certain costs that come with it. Consider that those costs aren’t limited to development time, as each unnecessary class member needs to be tested, documented, and maintained just like the necessary ones.

Also, there’s no limit to the amount of extra information you may want to include. What if you later want to remove all connections older than one hour? You should store the time when each connection was made! What if you want to know how many threads have created connections? You should store the set of all threads that have ever created a connection, and so on. In the following chapters, I’ll generally stick to storing only the information that’s necessary for present purposes,[4] with a few clearly marked exceptions.

[4] This principle has been formalized as the “You aren’t gonna need it” (YAGNI) slogan by the Extreme Programming movement.

Picking A Representation

Finally, assuming you’re satisfied with storing indirect connections, the next step is to pick an actual representation for them. In this respect, the preliminary choice is between explicitly forging a new class, say Pipe, to represent the connection between two containers, or storing the corresponding information directly inside the container objects (an implicit representation).

The first choice is more in line with the OO orthodoxy. In the real world, containers are connected by pipes, and pipes are real objects, clearly distinguished from containers. Hence, the story goes, they deserve to be modeled separately. On the other hand, the specifications laid out in this chapter don’t mention any Pipe objects, so they would remain hidden within containers, unknown to the clients. Moreover, and more importantly, those pipe objects would contain very little behavior. Each pipe would hold two references to the containers being connected, with no other attributes or nontrivial methods.

Balancing these reasons, it seems there would be a pretty meager benefit from having this extra class around, so you might as well follow the practical, implicit route and avoid it altogether. Containers will be able to reach their group companions without resorting to a dedicated “pipe” object. But how exactly will you arrange the references linking the connected containers? The core language and its API offer a variety of solutions: plain arrays, lists, sets, and more. We won’t analyze them here because many of them occur naturally in the following chapters (especially chapters 4 and 5) when optimizing for different code qualities.
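For comparison, here is roughly what the explicit route would amount to. This is a hypothetical sketch: the text names a Pipe class only as a possibility, and Tank is a placeholder standing in for the container class.

```java
// Placeholder for the container class (hypothetical name).
class Tank { }

// Hypothetical explicit representation of a connection.
final class Pipe {
    private final Tank first;
    private final Tank second;

    Pipe(Tank first, Tank second) {
        this.first = first;
        this.second = second;
    }

    // Two trivial accessors are all the behavior there is.
    Tank first() { return first; }
    Tank second() { return second; }
}
```

A class reduced to two references and their accessors adds little beyond what a reference stored inside the containers already provides.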

1.8. Hello containers!    [Novice]

To break the ice, in this section we’ll consider a Container implementation that could be authored by an inexperienced programmer who’s just picked up Java after some exposure to a structured language like C. This class is the first in the long sequence of versions that you’ll encounter throughout the book. I’ve assigned each version a nickname to help you navigate and compare them. The nickname for this version is Novice, and its fully qualified name in the repository is eis.chapter1.novice.Container.

1.8.1. Fields and constructor

Even seasoned professionals have been beginners at some point, navigating the syntax of a new language, unaware of the vast API hiding just around the corner. At first, arrays are the data structure of choice, and resolving syntax errors is too demanding to also worry about coding style issues. After some trial and error, the beginning programmer puts together a class that compiles and seems to fulfill the requirements. Perhaps it starts somewhat like listing 1.1.

Listing 1.1. Novice: Fields and constructor
public class Container {

   Container[] g;  // The group of connected containers
   int n;          // The actual size of the group
   double x;       // The water amount in this container

   public Container() {
      g = new Container[1000];  // Look: a magic number!
      g[0] = this;              // Puts this container in its group
      n = 1;
      x = 0;
   }

These few lines contain a wealth of small and not-so-small defects. Let’s focus on the ones that are superficial and easy to fix, as the others will become apparent when we move to better versions in subsequent chapters.

The intent for the three instance fields is the following:

  • g is the array of all containers connected to this one, including this one (as is clear from the constructor)
  • n is the number of containers in g
  • x is the amount of liquid in this container

The single quirk that immediately marks the code as amateurish is the choice of variable names: very short and completely uninformative. A pro wouldn’t call the group g if a mobster gave them 60 seconds to hack into a super-secure system of water containers. Jokes aside, meaningful naming is the first rule of readable code, as you’ll see in chapter 7.

Then we have the visibility issue. Fields should be private instead of default. Recall that default visibility is more open than private; it allows access from other classes residing in the same package. Information hiding (aka encapsulation) is a fundamental OO principle, enabling classes to ignore the internals of other classes and interact with them via a well-defined public interface (a form of separation of concerns). In turn, this allows classes to modify their internal representation without affecting existing clients.

The principle of separation of concerns also provides the very footing for this book. The many implementations I present in the following chapters comply with the same public API, and therefore, in principle, clients can use them interchangeably. The way each implementation realizes the API is appropriately hidden from the outside, thanks to the visibility specifiers. At a deeper level, the very notion of individually optimizing different software qualities is an extreme instance of separation of concerns. It’s so extreme, in fact, as to be merely a didactic tool and not an approach to pursue in practice.

Moving along, the array size, as shown in the sixth line of code in listing 1.1, is defined by a so-called magic number: a constant that’s not given any name. Best practices dictate that you assign all constants to some final variable, so that (a) the variable name can document the meaning of the constant, and (b) you set the value of that constant at a single point, which is especially useful if you use the constant multiple times.

The choice of a plain array is not very appropriate, as it puts an a priori bound on the maximum number of connected containers: too small a bound, and the program is bound to fail; too large, and space is wasted. Moreover, using an array forces us to manually keep track of the number of containers actually in the group (field n here). Better options exist in the Java API, and I discuss them in chapter 2. Nevertheless, plain arrays will come in handy in chapter 5, where the primary objective will be to save space.
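As an illustration, the three superficial fixes discussed so far (descriptive names, private fields, and a named constant) could be applied to listing 1.1 along these lines. The identifiers TidyContainer, MAX_GROUP_SIZE, group, groupSize, and amount are hypothetical choices, and the sketch deliberately keeps the plain array.

```java
// Hypothetical cleanup of listing 1.1; the array-based design is unchanged.
class TidyContainer {
    // The former magic number, now named and defined in one place.
    private static final int MAX_GROUP_SIZE = 1000;

    private TidyContainer[] group;   // containers connected to this one
    private int groupSize;           // number of cells of group in use
    private double amount;           // water held by this container

    TidyContainer() {
        group = new TidyContainer[MAX_GROUP_SIZE];
        group[0] = this;             // a new container is alone in its group
        groupSize = 1;
        amount = 0;
    }

    double getAmount() { return amount; }
}
```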

1.8.2. Methods getAmount and addWater

Let’s proceed and examine the source code for the first two methods, as shown in the following listing.

Listing 1.2. Novice: Methods getAmount and addWater
   public double getAmount() { return x; }

   public void addWater(double x) {
      double y = x / n;
      for (int i=0; i<n; i++)
         g[i].x = g[i].x + y;
   }

getAmount is a trivial getter, and addWater shows the usual naming problems with variables x and y, whereas i is acceptable as the traditional name for an array index. If the last statement of the listing used the += operator, it wouldn’t mention g[i].x twice, and you wouldn’t have to look back and forth to make sure the statement is actually incrementing the same variable.
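A possible rewrite of addWater with the compound assignment operator is sketched below. The enclosing class CompoundAssignDemo and the names amount and amountPerContainer are hypothetical; the fields keep the Novice names so that the loop can be compared with listing 1.2.

```java
// Hypothetical wrapper class; fields mirror listing 1.1.
class CompoundAssignDemo {
    CompoundAssignDemo[] g = new CompoundAssignDemo[1000];
    int n = 1;
    double x;

    CompoundAssignDemo() { g[0] = this; }

    double getAmount() { return x; }

    void addWater(double amount) {
        double amountPerContainer = amount / n;
        for (int i = 0; i < n; i++) {
            g[i].x += amountPerContainer;   // the target is named only once
        }
    }
}
```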

Notice that addWater doesn’t check whether its argument is negative and, in that case, whether the group holds enough water to satisfy the request. I’ll deal with robustness issues like this one specifically in chapter 6.

1.8.3. Method connectTo

Finally, our novice programmer implements the connectTo method, whose task is to merge two groups of containers with a new connection. After this operation, all containers in the two groups must hold the same amount of water because they all become communicating vessels. First, the method will compute the total amount of water in both groups and the total size of the two groups. The water amount per container, after the merge, is simply the former divided by the latter.
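The arithmetic can be checked in isolation with a small helper. The class MergeArithmetic and its method are hypothetical and exist only for this example: two containers holding 6.0 each are merged with a single empty container.

```java
// Hypothetical helper illustrating the merge arithmetic.
class MergeArithmetic {
    /** Per-container amount after merging two groups. */
    static double amountAfterMerge(double amount1, int size1,
                                   double amount2, int size2) {
        double totalWater = amount1 * size1 + amount2 * size2;
        int totalSize = size1 + size2;
        return totalWater / totalSize;
    }

    public static void main(String[] args) {
        // Total water is 6.0 * 2 + 0.0 * 1 = 12.0, shared by 3 containers.
        System.out.println(amountAfterMerge(6.0, 2, 0.0, 1));  // prints 4.0
    }
}
```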

You’ll also need to update the arrays of all containers in the two groups. The naive way to do so involves appending all containers in the second group to all the arrays belonging to the first group, and vice versa. That’s what the following listing does, using two nested loops. Finally, the method updates the size field n and the amount field x of all affected containers.

Listing 1.3. Novice: Method connectTo
   public void connectTo(Container c) {
      // Amount per container after merge
      double z = (x*n + c.x*c.n) / (n + c.n);

      for (int i=0; i<n; i++)         // For each container g[i] in 1st group
         for (int j=0; j<c.n; j++) {  // For each container c.g[j] in 2nd group
            g[i].g[n+j] = c.g[j];     // Appends c.g[j] to group of g[i]
            c.g[j].g[c.n+i] = g[i];   // Appends g[i] to group of c.g[j]
         }

      n += c.n;

      for (int i=0; i<n; i++) {  // Updates sizes and amounts
         g[i].n = n;
         g[i].x = z;
      }
   }

As you can see, the connectTo method is where the naming issues hurt the most. All those single-letter names make it really hard to understand what’s going on. For a dramatic comparison, you may want to jump ahead and take a quick look at the readability-optimized version in chapter 7.

Readability would also be improved by replacing the three for loops with enhanced for statements (known as foreach in C#), but the representation based on fixed-size arrays makes that a little cumbersome. Indeed, imagine you replaced the last loop from listing 1.3 with the following:

      for (Container member: g) {  // Can't be called c: that's the parameter
         member.n = n;
         member.x = z;
      }

This new loop is certainly more readable, but it’s going to crash with a NullPointerException as soon as the loop variable reaches a cell that doesn’t contain a reference to a container. The remedy is simple enough. Exit the loop as soon as you detect a null reference:

      for (Container member: g) {
         if (member==null) break;
         member.n = n;
         member.x = z;
      }

Despite being utterly unreadable, the connectTo method in listing 1.3 is logically correct, with some restrictions. Indeed, consider what happens if this and c are already connected before you call the method. Let’s make it concrete and assume the following use case, involving two brand new containers:

a.connectTo(b);
a.connectTo(b);

Can you see what’s going to happen? Is the method tolerant to this slight misstep by the caller? Really think about it before reading ahead. I’ll wait . . .

The answer is that connecting two already connected containers messes up their state. Container a ends up with two references to itself and two references to b in its group array, and a size field n equal to 4 instead of 2. Something similar happens to b. What’s worse, the defect manifests itself even if this and c were only indirectly connected, which can’t be considered ill usage on the part of the caller. I’m talking about a scenario like the following (once again, a, b, and c are three brand new containers):

a.connectTo(b);
b.connectTo(c);
c.connectTo(a);

Before the last line, containers a and c are already connected, albeit indirectly (as in figure 1.4, right). The last line adds a direct connection between them, which is legitimate according to the specifications and leads to the situation depicted in figure 1.4, left. But the connectTo implementation in listing 1.3, instead, adds a second copy of all three containers to all group arrays, while erroneously setting all group sizes to 6 instead of 3.
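Both failure scenarios can be reproduced with a small driver. The class below copies the logic of listings 1.1 to 1.3 under the hypothetical name NoviceContainer (non-public, so that it fits in a single file), and DoubleConnectDemo is likewise an illustrative name.

```java
// Same logic as listings 1.1-1.3, under a hypothetical class name.
class NoviceContainer {
    NoviceContainer[] g;
    int n;
    double x;

    NoviceContainer() {
        g = new NoviceContainer[1000];
        g[0] = this;
        n = 1;
        x = 0;
    }

    double getAmount() { return x; }

    void addWater(double x) {
        double y = x / n;
        for (int i = 0; i < n; i++)
            g[i].x = g[i].x + y;
    }

    void connectTo(NoviceContainer c) {
        double z = (x*n + c.x*c.n) / (n + c.n);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < c.n; j++) {
                g[i].g[n+j] = c.g[j];
                c.g[j].g[c.n+i] = g[i];
            }
        n += c.n;
        for (int i = 0; i < n; i++) {
            g[i].n = n;
            g[i].x = z;
        }
    }
}

class DoubleConnectDemo {
    public static void main(String[] args) {
        // Scenario 1: the same direct connection requested twice.
        NoviceContainer a = new NoviceContainer();
        NoviceContainer b = new NoviceContainer();
        a.connectTo(b);
        a.connectTo(b);
        System.out.println("a.n = " + a.n);   // prints a.n = 4

        // Scenario 2: closing a triangle of indirectly connected containers.
        NoviceContainer p = new NoviceContainer();
        NoviceContainer q = new NoviceContainer();
        NoviceContainer r = new NoviceContainer();
        p.connectTo(q);
        q.connectTo(r);
        r.connectTo(p);
        System.out.println("p.n = " + p.n);   // prints p.n = 6
    }
}
```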

Another obvious limitation of this implementation is that if the merged group contains more than 1,000 members (the magic number), one of these two lines in listing 1.3:

g[i].g[n+j] = c.g[j];
c.g[j].g[c.n+i] = g[i];

will crash the program with an ArrayIndexOutOfBoundsException.
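One way to guard against both defects while keeping the array-based design is sketched below. It rests on two assumptions that the specifications in this chapter don't spell out: that connecting two containers already in the same group should be a no-op, and that exceeding the capacity should be reported with an IllegalStateException. All names (GuardedContainer, MAX_GROUP_SIZE, isConnectedTo, and so on) are hypothetical.

```java
// Hypothetical guarded variant of the array-based design.
class GuardedContainer {
    private static final int MAX_GROUP_SIZE = 1000;

    private final GuardedContainer[] group = new GuardedContainer[MAX_GROUP_SIZE];
    private int groupSize = 1;
    private double amount;

    GuardedContainer() { group[0] = this; }

    double getAmount() { return amount; }
    int groupSize() { return groupSize; }

    void addWater(double extra) {
        double perContainer = extra / groupSize;
        for (int i = 0; i < groupSize; i++) group[i].amount += perContainer;
    }

    /** Linear scan of this container's group. */
    private boolean isConnectedTo(GuardedContainer other) {
        for (int i = 0; i < groupSize; i++)
            if (group[i] == other) return true;
        return false;
    }

    void connectTo(GuardedContainer other) {
        if (isConnectedTo(other)) return;   // assumed no-op: same group already
        int mySize = groupSize;
        int otherSize = other.groupSize;
        int mergedSize = mySize + otherSize;
        if (mergedSize > MAX_GROUP_SIZE)
            throw new IllegalStateException(
                "Merged group would exceed " + MAX_GROUP_SIZE + " containers");
        double newAmount = (amount * mySize + other.amount * otherSize) / mergedSize;
        for (int i = 0; i < mySize; i++)
            for (int j = 0; j < otherSize; j++) {
                group[i].group[mySize + j] = other.group[j];
                other.group[j].group[otherSize + i] = group[i];
            }
        for (int i = 0; i < mergedSize; i++) {
            group[i].groupSize = mergedSize;
            group[i].amount = newAmount;
        }
    }
}
```

The guards remove the two failures, but every other weakness of the design remains, including the fixed capacity and the quadratic cost of merging.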

In the next chapter, I’ll present a reference implementation that solves most of the superficial issues I’ve noted here, while striking a balance between different code qualities.

Summary

  • You can distinguish between internal and external software qualities, as well as functional and nonfunctional software qualities.
  • Some software qualities contrast with each other, and some go hand-in-hand.
  • This book addresses software qualities using a system of water containers as a unifying example.

Further reading

This book tries to squeeze into 300 pages a varied range of topics that are seldom treated together. To pull this off, I can only scratch the surface of each topic. That’s why I end each chapter with a short list of resources you can refer to for in-depth information on the chapter’s content.

  • Steve McConnell. Code Complete. Microsoft Press, 2nd edition, 2004. A valuable book on coding style and all-around good software. Among many other things, it discusses code qualities and their interactions.
  • Diomidis Spinellis. Code Quality: The Open Source Perspective. Addison Wesley, 2006. The author takes you on a journey through quality attributes not unlike the one offered by this book, but with an almost opposite guiding principle: instead of a single running example, he employs a wealth of code fragments taken from various popular open source projects.
  • Stephen H. Kan. Metrics and Models in Software Quality Engineering. Addison Wesley, 2003. Kan provides a systematic, in-depth treatment of software metrics, including statistically sound ways to measure them and use them to monitor and manage software development processes.
  • Christopher W.H. Davis. Agile Metrics in Action. Manning Publications, 2015. Chapter 8 of this book discusses software qualities and the metrics you can use to estimate them.