Tip 2: Insist on Correctness
[White Belt] These considerations are essential to your coding from day one.

In toy programs it’s easy to tell the difference between correct and incorrect. Does factorial(n) return the correct number? That’s easy to check: one number goes in, and another number comes out. But in big programs, there are potentially many inputs—not just function parameters, but also state within the system—and many outputs or other side effects. That’s not so easy to check.

Isolation and Side Effects

Textbooks love to use math problems for programming examples, partly because computers are good at math, but mostly because it’s easy to reason about numbers in isolation. You can call factorial(5) all day long, and it’ll return the same thing. Network connections, files on disk, or (especially) users have a nasty habit of not being so predictable.

When a function changes something outside its local variables—for example, it writes data to a file or a network socket—it’s said to have side effects. The opposite, a pure function, always returns the same thing when given the same arguments and does not change any outside state. Obviously, pure functions are a lot easier to test than functions with side effects.
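
To make the distinction concrete, here is a minimal sketch in Ruby. The names log_factorial and log_path are hypothetical, made up for this illustration. The first method is pure; the second wraps it with a side effect.

```ruby
# Pure: the result depends only on the argument, and nothing outside changes.
def factorial(n)
  (1..n).reduce(1, :*)
end

# Impure: appends a line to a file on disk, so every call changes outside state.
# (log_factorial and log_path are hypothetical names for this sketch.)
def log_factorial(n, log_path)
  result = factorial(n)
  File.open(log_path, 'a') { |f| f.puts("factorial(#{n}) = #{result}") }
  result
end
```

Calling factorial(5) twice tells you nothing new the second time. Calling log_factorial(5, path) twice leaves two lines in the file, so a test has to account for whatever was there before.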

Most programs have a mix of pure and impure code; however, not many programmers think about which parts are which. You might see something like this:

ReadStudentGrades.rb
 
def self.import_csv(filename)
  File.open(filename) do |file|
    file.each_line do |line|
      name, grade = line.split(',')

      # Convert numeric grade to letter grade
      grade = case grade.to_i
              when 90..100 then 'A'
              when 80..89 then 'B'
              when 70..79 then 'C'
              when 60..69 then 'D'
              else 'F'
              end

      Student.add_to_database(name, grade)
    end
  end
end

This function is doing three things: reading lines from a file (impure), doing some analysis (pure), and updating a global data structure (impure). As this is written, you can’t easily test any one piece.

Said this way, it’s obvious that each task should be isolated so it can be tested separately. We’ll discuss the file part shortly in Interactions. Let’s pull the analysis bit into its own method:

ReadStudentGrades2.rb
 
def self.numeric_to_letter_grade(numeric)
  case numeric
  when 90..100 then 'A'
  when 80..89 then 'B'
  when 70..79 then 'C'
  when 60..69 then 'D'
  when 0..59 then 'F'
  else raise ArgumentError.new(
    "#{numeric} is not a valid grade")
  end
end

Now numeric_to_letter_grade is a pure function that’s easy to test in isolation:

ReadStudentGrades2.rb
 
def test_convert_numeric_to_letter_grade
  assert_equal 'A',
    Student.numeric_to_letter_grade(100)
  assert_equal 'B',
    Student.numeric_to_letter_grade(85)
  assert_equal 'F',
    Student.numeric_to_letter_grade(50)
  assert_equal 'F',
    Student.numeric_to_letter_grade(0)
end

def test_raise_on_invalid_input
  assert_raise(ArgumentError) do
    Student.numeric_to_letter_grade(-1)
  end

  assert_raise(ArgumentError) do
    Student.numeric_to_letter_grade("foo")
  end

  assert_raise(ArgumentError) do
    Student.numeric_to_letter_grade(nil)
  end
end

This example may be trivial, but what happens when the business logic is complex and it’s buried in a function that has five different side effects? (Answer: it doesn’t get tested very well.) Teasing apart the knots of pure and impure code can help you test correctness both for new code and when maintaining legacy code.

Interactions

Now what about those side effects? It’s a huge pain to augment your code with constructs like “If in test mode, don’t actually connect to the database….” Instead, most languages have a mechanism for creating test doubles that take the place of the resource your function wants to use.

Let’s say we rewrote the previous example so that import_csv() handles only the file processing and passes the rest of the work off to Student.new():

ReadStudentGrades3.rb
 
def self.import_csv(filename)
  File.open(filename) do |file|
    file.each_line do |line|
      name, grade = line.split(',')

      Student.new(name, grade.to_i)
    end
  end
end

What we need is a test double for the file, something that will intercept the call to File.open() and yield some canned data. We need the same for Student.new(), ideally intercepting the call in a way that verifies the data passed into it.

Ruby’s Mocha framework allows us to do exactly this:

ReadStudentGrades3.rb
 
def test_import_from_csv
  File.expects(:open).yields('Alice,99')
  Student.expects(:new).with('Alice', 99)

  Student.import_csv(nil)
end

This illustrates two points about testing interactions between methods:

  • Unit tests must not pollute the state of the system by leaving stale file handles around, objects in a database, or other cruft. A framework for test doubles should let you intercept these.

  • This kind of test double is known as a mock object, which verifies expectations you program into it. In this example, if Student.new() was not called or was called with different parameters than we specified in the test, Mocha would fail the test.
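
If a mocking framework isn't at hand, a similar effect can be sketched by passing the collaborators in as parameters and writing the double yourself. This is only an illustration under assumptions: the import_csv signature below and the RecordingStudentFactory class are hypothetical, not part of the earlier listings.

```ruby
# Hypothetical importer: its collaborators are parameters, so a test can
# hand in doubles instead of touching the disk or the database.
def import_csv(lines, student_factory)
  lines.each do |line|
    name, grade = line.strip.split(',')
    student_factory.create(name, grade.to_i)
  end
end

# Hand-rolled double: records every call so the test can inspect them later.
class RecordingStudentFactory
  attr_reader :calls

  def initialize
    @calls = []
  end

  def create(name, grade)
    @calls << [name, grade]
  end
end

factory = RecordingStudentFactory.new
import_csv(["Alice,99\n", "Bob,72\n"], factory)
# factory.calls now holds [["Alice", 99], ["Bob", 72]]
```

The trade-off is that the production code has to accept its collaborators from outside, which is more typing than Mocha's interception but leaves nothing patched behind after the test.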

Of course, Ruby and Mocha make the problem too easy. What about those of us who suffer with million-line C programs? Even C can be instrumented with test doubles, but it takes more effort.

You can generalize the problem to this: how do you replace one set of functions at runtime with another set of functions? (If you’re nerdy enough to think “That sounds like a dynamic dispatch table,” you’re right.) Sticking with the example of opening and reading a file, here’s one approach:

TestDoubles.c
 
struct fileops {
    FILE* (*fopen)(const char *path,
                   const char *mode);
    size_t (*fread)(void *ptr,
                    size_t size,
                    size_t nitems,
                    FILE *stream);
    // ...
};

FILE*
stub_fopen(const char *path, const char *mode)
{
    // Just return fake file pointer
    return (FILE*) 0x12345678;
}

// ...

struct fileops real_fileops = {
    .fopen = fopen
};

struct fileops stub_fileops = {
    .fopen = stub_fopen
};

The fileops structure has pointers to functions that match the standard C library API. In the case of the real_fileops structure, we fill in these pointers with the real functions. In the case of stub_fileops, they point to our own stubbed-out versions. Using the structure isn’t much different from just calling a function:

TestDoubles.c
 
// Assume that ops is a function parameter or global
struct fileops *ops;
ops = &stub_fileops;

FILE* file = (*ops->fopen)("foo", "r");
// ...

Now the program can flip between “real mode” and “test mode” by just reassigning a pointer.

Type Systems

When you refer to something like 42 in code, is that a number, a string, or what? If you have a function like factorial(n), what kind of thing is supposed to go into it, and what’s supposed to come out? The type of elements, functions, and expressions is very important. How a language deals with types is called its type system.

The type system can be an important tool for writing correct programs. For example, in Java you could write a method like this:

 
public long factorial(long n) {
    // ...
}

In this case, both the reader (you) and the compiler can easily deduce that factorial() should take a number and return a number. Java is statically typed because it checks types when code is compiled. Trying to pass in a string simply won’t compile.

Compare this with Ruby:

 
def factorial(n)
  # ...
end

What is acceptable input to this method? You can’t tell just by looking at the signature. Ruby is dynamically typed because it waits until runtime to verify types. This gives you tremendous flexibility but also means that some failures that would be caught at compile time won’t be caught until runtime.

Both approaches to types have their pros and cons, but for the purposes of correctness, keep in mind the following:

  • Static types help to communicate the proper use of functions and provide some safety from abuse. If your factorial function takes a long and returns a long, the compiler won’t let you pass it a string instead. However, it’s not a magic bullet: if you call factorial(-1), the type system won’t complain, so the failure will happen at runtime.

  • To make good use of a static type system, you have to play by its rules. A common example is the use of const in C++: when you start using const to declare that some things cannot be changed, then the compiler gets really finicky about every function properly declaring the const-ness of its parameters. It’s valuable if you completely play by the rules; it’s just a huge hassle if your commitment is anything less than 100 percent.

  • Dynamically typed languages may let you play fast and loose with types, but it still doesn’t make sense to call factorial() on a string. You need to use contract-oriented unit tests, discussed in Tip 3, Design with Tests, to ensure that your functions adequately check the sanity of their parameters.
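
As a rough sketch of what such a sanity check might look like in Ruby, here is one way factorial() could guard its parameter. The exact rule (non-negative Integer only) and the error message are assumptions chosen for illustration.

```ruby
# Sketch: validate the argument up front, since Ruby won't do it for us.
def factorial(n)
  unless n.is_a?(Integer) && n >= 0
    raise ArgumentError, "#{n.inspect} is not a non-negative integer"
  end
  (1..n).reduce(1, :*)
end

factorial(5)        # => 120

begin
  factorial("5")
rescue ArgumentError => e
  puts e.message    # prints: "5" is not a non-negative integer
end
```

Note that the same check also rejects factorial(-1), the case a static type of long would have let through.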

Regardless of the language’s type system, get in the habit of documenting your expectations of each parameter—they usually aren’t as self-explanatory as the factorial(n) example. See Tip 6, Be Stylish for further discussion of documentation and code comments.

The Misnomer of 100 Percent Coverage

A common (but flawed) metric for answering “Have I tested enough?” is code coverage. That is, what percentage of your application code is exercised by running the unit tests? Ideally, every line of code in your application gets run at least once while running the unit tests—coverage is 100 percent.

Less than 100 percent coverage means you have some cases that are not tested. Junior programmers will assume that the converse is true: when they hit 100 percent coverage, they have enough tests. However, that’s not true: 100 percent coverage absolutely does not mean that all cases are covered.

Consider the following C code:

BadStringReverse.c
 
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void reverse(char *str) // BAD BAD BAD
{
    int len = strlen(str);
    char *copy = malloc(len);

    for (int i = 0; i < len; i++) {
        copy[i] = str[len - i - 1];
    }
    copy[len] = 0;

    strcpy(str, copy);
}

int main()
{
    char str[] = "fubar";
    reverse(str);
    assert(strcmp(str, "rabuf") == 0);
    printf("Ta-da, it works! "); // Not quite
}

The test covers 100 percent of the reverse function. Does that mean the function is correct? No: the memory allocated by malloc() is never freed, and the allocated buffer is one byte too small.

Don’t be lulled into complacency by 100 percent coverage: it means nothing about the quality of your code or your tests. Writing good tests, just like writing good application code, requires thought, diligence, and good judgment.
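
The same trap exists in any language. Here is a small Ruby sketch, where average is a hypothetical helper invented for this illustration. One passing check runs every line, yet two ordinary inputs still misbehave.

```ruby
# Hypothetical helper with latent bugs.
def average(values)
  values.sum / values.size
end

# This single check executes every line of average: 100 percent line coverage.
raise 'unexpected average' unless average([2, 4, 6]) == 4

# Yet untested cases still go wrong:
#   average([1, 2]) returns 1, because Integer division truncates
#   average([])     raises ZeroDivisionError
```

A coverage tool would report nothing amiss here; only thinking about the inputs reveals the missing cases.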

Less Than 100 Percent Coverage

Some cases can be extremely hard to unit test. Here are a few examples:

  • Kernel drivers that interface with hardware rely on hardware state changes outside your code’s control, and creating a high-fidelity test double is near impossible.

  • Multithreaded code can have timing problems that require sheer luck to fall into.

  • Third-party code provided as binaries often can’t be provoked to return failures at will.

So, how do you get 100 percent coverage from your tests? With enough wizardry, it’s surely possible, but is it worth it? That’s a value judgment that may come down to no. In those situations, discuss the issue with your team’s tech lead. They may be able to think of a test method that’s not too painful. If nothing else, you will need them to review your code.

Don’t be dissuaded if you can’t hit 100 percent, and don’t use that as an excuse to punt on testing entirely. Prove what’s reasonable with tests; subject everything else to review by a senior programmer.

Further Reading

Kent Beck’s Test-Driven Development: By Example [Bec02] remains a foundational work on unit testing. Although it uses Java in its examples, the principles apply to any language. (While reading it, try to solve the example problem in your own way; you may come up with a more elegant solution.) We’ll discuss the test-driven aspect in Tip 3, Design with Tests.

For complete coverage of the Ruby Way to unit testing, Ruby programmers should pick up The RSpec Book [CADH09].

C programmers should look to Test Driven Development for Embedded C [Gre10] for techniques on TDD and building test harnesses.

There’s a nomenclature around test doubles; terms like mocks and stubs have specific definitions. Martin Fowler has a good article online[4] that explains the details.

There’s a whole theory around type systems and using them to build correct code; see Pierce’s Types and Programming Languages [Pie02] for the gory details. Also, Kim Bruce’s Foundations of Object-Oriented Languages: Types and Semantics [Bru02] has specific emphasis on OOP.

Actions

  • Look up the unit testing frameworks available for each programming language you use. Most languages will have both the usual bases covered (assertions, test setup, and teardown) and some facility for fake objects (mocks, stubs). Install any tools you need to get these running.

  • This tip has bits and pieces of a program that reads lines of comma-separated data from a file, splits them apart, and uses them to create objects. Create a program that does this in the language of your choice, complete with unit tests that assure the correctness of every line of application code.
