Chapter 8. Impossible Programs

The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents.

H. P. Lovecraft

In this book, we’ve explored different models of computers and programming languages, including several kinds of abstract machine. Some of those machines are more powerful than others, and two varieties in particular come with pretty obvious limitations: finite automata can’t solve problems that involve unrestricted counting, like deciding whether a string of brackets is correctly balanced, and pushdown automata can’t handle any problem where information needs to be reused in more than one place, like deciding whether a string contains the same number of a, b, and c characters.

But the most advanced device we’ve seen, the Turing machine, seems to have everything that we need: it’s got unlimited storage that can be accessed in any order, arbitrary loops, conditionals, and subroutines. The extremely minimal programming language from Chapter 6, the lambda calculus, turned out to be surprisingly powerful too: with a little ingenuity it allows us to represent simple values and complex data structures as pure code, as well as implement operations that manipulate those representations. And in Chapter 7, we saw many other simple systems that, like the lambda calculus, have the same universal power as Turing machines.

How much further can we push this progression of increasingly powerful systems? Perhaps not indefinitely: our attempts to make Turing machines more powerful by adding features didn’t get us anywhere, which suggests there may be a hard limit on computational power. So what are computers and programming languages fundamentally capable of, and is there anything that they can’t do? Are there any impossible programs?

The Facts of Life

These are pretty deep questions, so before we try to tackle them, let’s take a tour of some fundamental facts from the world of computation. Some of these facts are obvious, others less so, but they’re all important prerequisites for thinking about the capabilities and limitations of computing machines.

Universal Systems Can Perform Algorithms

What, generally speaking, can we do with universal systems like Turing machines, the lambda calculus, and partial recursive functions? If we can properly understand the capabilities of these systems, then we’ll be able to investigate their limitations.

The practical purpose of a computing machine is to perform algorithms. An algorithm is a list of instructions describing some process for turning an input value into an output value, as long as those instructions fulfill certain criteria:

Finiteness

There are a finite number of instructions.

Simplicity

Each instruction is simple enough that it can be performed by a person with a pencil and paper without using any ingenuity.

Termination

A person following the instructions will finish within a finite number of steps for any input.

Correctness

A person following the instructions will produce the right answer for any input.

For example, one of the oldest known algorithms is Euclid’s algorithm, which dates from around 300 BC. It takes two positive integers and returns the largest integer that will divide them both exactly—their greatest common divisor. Here are its instructions:

  1. Give the two numbers the names x and y.

  2. Decide which of x or y is the larger number.

  3. Subtract the smaller number from the larger. (If x is larger, subtract y from it and make this the new value of x; vice versa if y is larger.)

  4. Repeat steps 2 and 3 until x and y are equal.

  5. When x and y become equal, their value is the greatest common divisor of the original two numbers.

We’re happy to recognize this as an algorithm, because it appears to meet the basic criteria. It only contains a few instructions, and they all seem simple enough to be performed with pencil and paper by someone who doesn’t have any special insight into the overall problem. With a bit of thought, we can also see that it must finish within a finite number of steps for any input: every repetition of step 3 causes one of the two numbers to get smaller, so they must eventually reach the same value[64] and cause the algorithm to finish. It’s not quite so obvious that it’ll always give the correct answer, but a few lines of elementary algebra are enough to show that the result must always be the greatest common divisor of the original numbers.

So Euclid’s algorithm is worthy of its name, but like any algorithm, it’s just a collection of ideas expressed as human-readable words and symbols. If we want to do something useful with it—maybe we’d like to explore its mathematical properties, or design a machine that performs it automatically—we need to translate the algorithm into a stricter, less ambiguous form that’s suitable for mathematical analysis or mechanical execution.

We already have several models of computation that we could use for this: we could try to write down Euclid’s algorithm as a Turing machine rulebook, or a lambda calculus expression, or a partial recursive function definition, but all of those would involve a lot of housekeeping and other uninteresting detail. For the moment, let’s just translate it into unrestricted Ruby:[65]

def euclid(x, y)
  until x == y
    if x > y
      x = x - y
    else
      y = y - x
    end
  end

  x
end

This #euclid method contains essentially the same instructions as the natural language description of Euclid’s algorithm, but this time, they’re written in a way that has a strictly defined meaning (according to the operational semantics of Ruby) and therefore can be interpreted by a machine:

>> euclid(18, 12)
=> 6
>> euclid(867, 5309)
=> 1

In this specific case, it’s been easy to take an informal, human-readable description of an algorithm and turn it into unambiguous instructions for a machine to follow. Having Euclid’s algorithm in a machine-readable form is very convenient; now we can perform it quickly, repeatedly, and reliably without having to employ manual labor.
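Having the algorithm in executable form also lets us watch the termination argument from earlier in action. Here's an instrumented variant (a sketch of our own, not part of the original algorithm) that prints x and y before each subtraction, making it visible that one of the two numbers shrinks on every repetition of step 3:

```ruby
# Instrumented version of Euclid's algorithm: prints x and y before each
# subtraction so we can watch one of the numbers get smaller every time.
def euclid_trace(x, y)
  until x == y
    puts "x = #{x}, y = #{y}"
    if x > y
      x = x - y
    else
      y = y - x
    end
  end

  x
end
```

Running euclid_trace(18, 12) prints x = 18, y = 12 followed by x = 6, y = 12 before returning 6: two subtractions, each shrinking one of the numbers, until both reach the greatest common divisor.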

Note

Hopefully it’s clear that we could just as well have implemented this algorithm with the lambda calculus by using similar techniques to the ones we saw in Numeric Operations, or as a partial recursive function built from the operations in Partial Recursive Functions, or as a collection of Turing machine rules like the ones used for simple arithmetic in Rules.

This raises an important question: can any algorithm be turned into instructions suitable for execution by a machine? Superficially that seems like a trivial thing to ask—it was pretty obvious how to turn Euclid’s algorithm into a program, and as programmers, we have a natural tendency to think of the two things as interchangeable—but there’s a real difference between the abstract, intuitive idea of an algorithm and the concrete, logical implementation of that algorithm within a computational system. Could there ever be an algorithm so large, complex, and unusual that its essence can’t be captured by an unthinking mechanical process?

Ultimately there can be no rigorous answer, because the question is philosophical rather than scientific. The instructions of an algorithm must be “simple” and “without ingenuity” so that it “can be performed by a person,” but those are imprecise ideas about human intuition and capability, not mathematical assertions of the kind that can be used to prove or disprove a hypothesis.

We can still collect evidence one way or the other by coming up with lots of algorithms and seeing whether our computing system of choice—Turing machines, or lambda calculus, or partial recursive functions, or Ruby—can implement them. Mathematicians and computer scientists have been doing exactly that since the 1930s, and so far, nobody has managed to devise a reasonable algorithm that can’t be performed by these systems, so we can be pretty confident about our empirical hunch: it certainly looks as though a machine can perform any algorithm.

Another strong piece of evidence is the fact that most of these systems were developed independently as attempts to capture and analyze the informal idea of an algorithm, and were only later found to be exactly equivalent to each other. Every historical attempt to model the idea of an algorithm has produced a system whose capabilities are identical to those of a Turing machine, and that’s a pretty good hint that a Turing machine adequately represents what an algorithm can do.

The idea that any algorithm can be performed by a machine—specifically a deterministic Turing machine—is called the Church–Turing thesis, and although it’s just a conjecture rather than a proven fact, it has enough evidence in its favor to be generally accepted as true.

Note

“Turing machines can perform any algorithm” is a philosophical claim about the relationship between the intuitive idea of algorithms and the formal systems that we use to implement them. What it actually means is a matter of interpretation: we could see it as a statement about what can and cannot be computed, or just as a firmer definition of the word “algorithm.”

Either way, it’s called the “Church–Turing thesis,” not the “Church–Turing theorem,” because it’s an informal claim rather than a provable mathematical assertion—it can’t be expressed in purely mathematical language, so there’s no way to construct a mathematical proof. It’s widely believed to be true because it matches our intuition about the nature of computation and the evidence of what algorithms are capable of, but we still call it a “thesis” to remind ourselves that it has a different status from provable ideas like Pythagoras’ theorem.

The Church–Turing thesis implies that Turing machines, despite their simplicity, have all the power required to perform any computation that can in principle be carried out by a person following simple instructions. Many people go further than this and claim that, since all attempts to codify algorithms have led to universal systems that are equivalent in power to Turing machines, it’s just not possible to do any better: any real-world computer or programming language can only ever do as much as a Turing machine can do, and no more. Whether it’s ultimately possible to build a machine that’s more powerful than a Turing machine—that can use exotic laws of physics to perform tasks beyond what we think of as “algorithms”—is not definitively known, but it’s definitely true that we don’t currently know how to do it.

Programs Can Stand In for Turing Machines

As we saw in Chapter 5, the Turing machine’s simplicity makes it cumbersome to design a rulebook for a particular task. To avoid our investigation of computability being overshadowed by the fiddly details of Turing machine programming, we’ll use Ruby programs as a substitute, just as we did for Euclid’s algorithm.

This sleight of hand is justified by universality: in principle, we can translate any Ruby program into an equivalent Turing machine and vice versa, so a Ruby program is no more or less powerful than a Turing machine, and anything we can discover about the limitations of Ruby’s capabilities should apply equally to Turing machines.

A sensible objection is that Ruby has lots of practical functionality that Turing machines don’t. A Ruby program can access the filesystem, send and receive messages across the network, accept user input, draw graphics on a bitmapped display, and so on, whereas even the most sophisticated set of Turing machine rules can only ever read and write characters on a tape. But that isn’t a fundamental problem, because all of this extra functionality can be simulated with a Turing machine: if necessary, we can designate certain parts of the tape as representing “the filesystem” or “the network” or “the display” or whatever, and treat reading and writing to those tape regions as though it was genuine interaction with the outside world. None of these enhancements changes the underlying computational power of a Turing machine; they just provide higher-level interpretations of its activity on the tape.

In practice, we can sidestep the objection completely by restricting ourselves to simple Ruby programs that avoid any controversial language features. For the rest of this chapter, we’ll stick to writing programs that read a string from standard input, do some computation, and then write a string to standard output when they’re finished; the input string is analogous to the initial contents of a Turing machine’s tape, and the output string is like the final tape contents.

Code Is Data

Programs live a double life. As well as being instructions to control a particular system, we can also think of a program as pure data: a tree of expressions, a raw string of characters, or even a single large number. This duality is usually taken for granted by us as programmers, but for general-purpose computers it’s vitally important that programs can be represented as data so that they can be used as input to other programs; it’s the unification of code and data that makes software possible in the first place.

We’ve already seen programs-as-data in the case of the universal Turing machine, which expects another Turing machine’s rulebook to be written on its tape as a sequence of characters. In fancy homoiconic programming languages like Lisp[66] and XSLT, programs are explicitly written as data structures that the language itself can manipulate: every Lisp program is a nested list called an s-expression, and every XSLT stylesheet is an XML document.

In Ruby, only the interpreter (which, at least in the case of MRI, is not itself written in Ruby) usually gets to see a structured representation of the program, but the code-as-data principle still applies. Consider this simple Ruby program:

puts 'hello world'

To an observer who understands the syntax and semantics of Ruby, this is a program that sends a puts message to the main object with the string 'hello world', which results in the Kernel#puts method printing hello world to standard output. But on a lower level, it’s just a sequence of characters, and because characters are represented as bytes, ultimately that sequence can be viewed as a large number:

>> program = "puts 'hello world'"
=> "puts 'hello world'"
>> bytes_in_binary = program.bytes.map { |byte| byte.to_s(2).rjust(8, '0') }
=> ["01110000", "01110101", "01110100", "01110011", "00100000", "00100111",
    "01101000", "01100101", "01101100", "01101100", "01101111", "00100000",
    "01110111", "01101111", "01110010", "01101100", "01100100", "00100111"]
>> number = bytes_in_binary.join.to_i(2)
=> 9796543849500706521102980495717740021834791

In a sense, puts 'hello world' is Ruby program number 9796543849500706521102980495717740021834791.[67] Conversely, if someone tells us the number of a Ruby program, we can easily turn it back into the program itself and run it:

>> number = 9796543849500706521102980495717740021834791
=> 9796543849500706521102980495717740021834791
>> bytes_in_binary = number.to_s(2).scan(/.+?(?=(?:.{8})*\z)/)
=> ["1110000", "01110101", "01110100", "01110011", "00100000", "00100111",
    "01101000", "01100101", "01101100", "01101100", "01101111", "00100000",
    "01110111", "01101111", "01110010", "01101100", "01100100", "00100111"]
>> program = bytes_in_binary.map { |string| string.to_i(2).chr }.join
=> "puts 'hello world'"
>> eval program
hello world
=> nil

Of course, this scheme of encoding a program as a large number is what makes it possible to store it on disk, send it over the Internet, and feed it to a Ruby interpreter (which is itself just a large number on disk!) to make a particular computation happen.

Note

Since every Ruby program has a unique number, we can automatically generate all possible programs: start by generating program number 1, then generate program number 2, and so on.[68] If we did this for long enough, we’d eventually generate the next hot asynchronous web development framework and retire to a life of leisure.
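That enumeration is easy to sketch, reusing the bytes-in-binary decoding from above and Ruby's standard Ripper library to weed out the numbers that don't decode to syntactically valid programs. (Almost every number decodes to gibberish, so this is a thought experiment rather than a practical way to write software. The method names here are our own.)

```ruby
require 'ripper'

# Decode a program number back into a string of characters, using the
# same bytes-in-binary scheme as above.
def number_to_program(number)
  number.to_s(2).scan(/.+?(?=(?:.{8})*\z)/).map { |bits| bits.to_i(2).chr }.join
end

# Ripper.sexp returns nil when the string isn't valid Ruby syntax.
def valid_ruby?(source)
  !Ripper.sexp(source).nil?
end

# Enumerate program numbers from 1 upward, keeping only the ones that
# decode to syntactically valid Ruby programs.
def first_valid_programs(count)
  (1..Float::INFINITY).lazy.
    map { |number| number_to_program(number) }.
    select { |source| valid_ruby?(source) }.
    first(count)
end
```

The first few "programs" this finds are just whitespace characters, which parse as empty programs; the hot web framework is a very long way further down the list.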

Universal Systems Can Loop Forever

We’ve seen that general-purpose computers are universal: we can design a Turing machine that is capable of simulating any other Turing machine, or write a program that can evaluate any other program. Universality is a powerful idea that allows us to use a single adaptable machine for a variety of tasks rather than many specialized devices, but it also has an inconvenient consequence: any system that’s powerful enough to be universal will inevitably allow us to construct computations that loop forever without halting.

So why must every universal system bring nontermination along for the ride? Isn’t there some ingenious way to restrict Turing machines so that they’ll always halt, without compromising their usefulness? How do we know we won’t someday design a programming language that’s just as powerful as Ruby but doesn’t have infinite loops in it? There are all sorts of specific examples of why this can’t be done, but there’s also a more general argument, so let’s go through it.

Ruby is a universal programming language, so it must be possible to write Ruby code that evaluates Ruby code. In principle, we can define a method called #evaluate, which takes the source code of a Ruby program and a string to provide to that program on standard input, and returns the result (i.e., the string sent to standard output) of evaluating that program with that input.

The implementation of #evaluate is far too complicated to be contained within this chapter, but here’s the broadest possible outline of how it would work:

def evaluate(program, input)
  # parse program
  # evaluate program on input while capturing output
  # return output
end

#evaluate is essentially a Ruby interpreter written in Ruby. Although we haven’t given its implementation, it’s certainly possible to write it: first turn program into a sequence of tokens and parse them to build a parse tree (see Parsing with Pushdown Automata), then evaluate that parse tree according to the operational semantics of Ruby (see Operational Semantics). It’s a large and complex job, but it can definitely be done; otherwise, Ruby wouldn’t qualify as universal.

For simplicity, we’ll assume that our imaginary implementation of #evaluate is bug-free and won’t crash while it’s evaluating program—if we’re going to imagine some code, we may as well imagine that it’s perfect. Of course it might return some result that indicates that program raised an exception while it was being evaluated, but that’s not the same as #evaluate itself actually crashing.

Note

Ruby happens to have a built-in Kernel#eval method that can evaluate a string of Ruby code, but taking advantage of it here would be a bit of a cheat, not least because (in MRI) it’s implemented in C, not Ruby. It’s also just unnecessary for the current discussion; we’re using Ruby as a representative example of any universal programming language, but many universal languages don’t have a built-in eval.

But hey, since it’s there, it’d be a shame not to use it to make #evaluate less imaginary. Here’s a rough attempt, with apologies for cheating:

require 'stringio'

def evaluate(program, input)
  old_stdin, old_stdout = $stdin, $stdout
  $stdin, $stdout = StringIO.new(input), (output = StringIO.new)

  begin
    eval program
  rescue Exception => e
    output.puts(e)
  ensure
    $stdin, $stdout = old_stdin, old_stdout
  end

  output.string
end

This implementation has many practical and philosophical problems that could all be avoided by writing a pure-Ruby #evaluate. On the other hand, it’s short enough to include here and works well enough for demonstration purposes:

>> evaluate('print $stdin.read.reverse', 'hello world')
=> "dlrow olleh"

The existence of #evaluate allows us to define another method, #evaluate_on_itself, which returns the result of evaluating program with its own source as input:

def evaluate_on_itself(program)
  evaluate(program, program)
end

This might sound like a weird thing to do, but it’s totally legitimate; program is just a string, so we’re perfectly entitled to treat it both as a Ruby program and as input to that program. Code is data, right?

>> evaluate_on_itself('print $stdin.read.reverse')
=> "esrever.daer.nidts$ tnirp"

Since we know we can implement #evaluate and #evaluate_on_itself in Ruby, we must therefore be able to write the complete Ruby program does_it_say_no.rb:

def evaluate(program, input)
  # parse program
  # evaluate program on input while capturing output
  # return output
end

def evaluate_on_itself(program)
  evaluate(program, program)
end

program = $stdin.read

if evaluate_on_itself(program) == 'no'
  print 'yes'
else
  print 'no'
end

This program is a straightforward application of existing code: it defines #evaluate and #evaluate_on_itself, then reads another Ruby program from standard input and passes it to #evaluate_on_itself to see what that program does when run with itself as input. If the resulting output is the string 'no', does_it_say_no.rb outputs 'yes', otherwise, it outputs 'no'. For example:[69]

$ echo 'print $stdin.read.reverse' | ruby does_it_say_no.rb
no

That’s the result we expected; as we saw above, when we run print $stdin.read.reverse with itself as input, we get the output esrever.daer.nidts$ tnirp, which is not equal to no. What about a program that does output no?

$ echo 'if $stdin.read.include?("no") then print "no" end' | ruby does_it_say_no.rb
yes

Again, just as expected.

So here’s the big question: what happens when we run ruby does_it_say_no.rb < does_it_say_no.rb?[70] Bear in mind that does_it_say_no.rb is a real program—one that we could write out in full if we had enough time and enthusiasm—so it must do something, but it’s not immediately obvious what that is. Let’s try to work it out by considering all the possibilities and eliminating the ones that don’t make sense.

First, running this particular program with its own source as input can’t possibly produce the output yes. By the program’s own logic, the output yes can only be produced when running does_it_say_no.rb on its own source produces the output no, which contradicts the original premise. So that’s not the answer.

Okay, so maybe it outputs no instead. But again, the structure of the program means that it can only output no if exactly the same computation doesn’t output no—another contradiction.

Is it conceivable that it could output some other string, like maybe, or even the empty string? That would be contradictory too: if evaluate_on_itself(program) doesn’t return no then the program prints no, not something different.

So it can’t output yes or no, it can’t output something else, and it can’t crash unless #evaluate contains bugs, which we’re assuming it doesn’t. The only remaining possibility is that it doesn’t produce any output, and that can only happen if the program never finishes: #evaluate must loop forever without returning a result.

Note

In practice it’s almost certain that ruby does_it_say_no.rb < does_it_say_no.rb will exhaust the finite memory of the host machine, causing ruby to crash, rather than actually looping forever. This is a resource constraint imposed from outside the program, though, not a property of the program itself; in principle, we could keep adding more memory to the computer as necessary and let the computation run indefinitely.

This might seem like an unnecessarily complicated way of demonstrating that Ruby lets us write nonhalting programs. After all, while true do end is a much simpler example that does the same thing.

But by thinking about the behavior of does_it_say_no.rb, we’ve shown that nonhalting programs are an inevitable consequence of universality, regardless of the specific features of the system. Our argument doesn’t rely on any particular abilities of Ruby other than its universality, so the same ideas can be applied to Turing machines, or the lambda calculus, or any other universal system. Whenever we’re working with a language that is powerful enough to evaluate itself, we know that it must be possible to use its equivalent of #evaluate to construct a program that never halts, without having to know anything else about the language’s capabilities.

In particular, it’s impossible to remove features (e.g., while loops) from a programming language in a way that prevents us from writing nonhalting programs while keeping the language powerful enough to be universal. If removing a particular feature makes it impossible to write a program that loops forever, it must also have made it impossible to implement #evaluate.

Languages that have been carefully designed to ensure that their programs must always halt are called total programming languages, as opposed to the more conventional partial programming languages whose programs sometimes halt with an answer and sometimes don’t. Total programming languages are still very powerful and capable of expressing many useful computations, but one thing they can’t do is interpret themselves.
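Ruby itself can't enforce totality, but we can imitate the discipline of a total language: if every loop is an Integer#times loop whose bound is computed before the loop begins, a program written in that style can't run forever. Here's a sketch of the greatest common divisor computed under that restriction, by bounded trial division rather than Euclid's open-ended subtraction loop:

```ruby
# GCD in a "total" style: the only loop is #times over a bound that is
# fixed before the loop starts, so the method halts on every input.
# (This is a programming discipline, not a guarantee Ruby can check.)
def bounded_gcd(x, y)
  result = 1
  [x, y].min.times do |n|
    candidate = n + 1
    result = candidate if x % candidate == 0 && y % candidate == 0
  end
  result
end
```

It produces the same answers as #euclid, but its termination is obvious from its shape: the loop runs at most min(x, y) times, no matter what.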

Note

That’s surprising, since the equivalent of #evaluate for a total programming language must by definition always halt, yet it still can’t be implemented in that language—if it could, we’d be able to use the does_it_say_no.rb technique to make it loop forever.

This gives us our first tantalizing glimpse of an impossible program: we can’t write an interpreter for a total programming language in itself, even though there’s a perfectly respectable, guaranteed-to-halt algorithm for interpreting it. In fact it’s so respectable that we could write it in another, more sophisticated total programming language, but that new total language wouldn’t be able to implement its own interpreter either.

An interesting curiosity, but total languages are designed to have artificial limits; we were looking for things that no computer or programming language could do. We’d better keep going.

Programs Can Refer to Themselves

The self-referential trick used by does_it_say_no.rb hinges on our ability to construct a program that can read its own source code, but perhaps it seems a bit like cheating to assume that this will always be possible. In our example, the program received its own source as an explicit input, thanks to functionality provided by the surrounding environment (i.e., the shell); if that hadn’t been an option, it could also have read the data directly from disk with File.read(__FILE__), taking advantage of Ruby’s filesystem API and the special __FILE__ constant that always contains the name of the current file.

But we were supposed to be making a general argument that depended only on the universality of Ruby, not on the capabilities of the operating system or the File class. What about compiled languages like Java and C that may not have access to their source at runtime? What about JavaScript programs that get loaded into memory over a network connection, and may not be stored locally on a filesystem at all? What about self-contained universal systems like Turing machines and the lambda calculus, where the notions of “filesystem” and “standard input” don’t exist?

Fortunately, the does_it_say_no.rb argument can withstand this objection, because having a program read its own source from standard input is just a convenient shorthand for something that all universal systems can do, again regardless of their environment or other features. This is a consequence of a fundamental mathematical result called Kleene’s second recursion theorem, which guarantees that any program can be converted into an equivalent one that is able to calculate its own source code. The recursion theorem provides reassurance that our shorthand was justified: we could have replaced the line program = $stdin.read with some code to generate the source of does_it_say_no.rb and assign it to program without having to do any I/O at all.

Let’s see how to do the conversion on a simple Ruby program. Take this one, for example:

x = 1
y = 2
puts x + y

We want to transform it into a program that looks something like this…

program = ''
x = 1
y = 2
puts x + y

…where program is assigned a string containing the source of the complete program. But what should the value of program be?

A naïve approach is to try to concoct a simple string literal that can be assigned to program, but that quickly gets us into trouble, because the literal would be part of the program’s source code and would therefore have to appear somewhere inside itself. That would require program to begin with the string 'program =' followed by the value of program, which would be the string 'program =' again followed by the value of program, and so on forever:

program = %q{program = %q{program = %q{program = %q{program = %q{program = %q{}}}}}}
x = 1
y = 2
puts x + y

Note

Ruby’s %q syntax allows us to quote noninterpolated strings with a pair of delimiter characters, in this case curly brackets, instead of single quotes. The advantage is that the string literal can contain unescaped instances of the delimiters as long as they’re correctly balanced:

>> puts %q{Curly brackets look like { and }.}
Curly brackets look like { and }.
=> nil
>> puts %q{An unbalanced curly bracket like } is a problem.}
SyntaxError: syntax error, unexpected tIDENTIFIER, expecting end-of-input

Using %q instead of single quotes helps to avoid character-escaping headaches in strings that contain their own delimiters:

program = 'program = \'program = \\\'program = ...\\\'\''

The way out of this infinitely deep hole is to take advantage of the fact that a value used by a program doesn’t have to appear literally in its source code; it can also be computed dynamically from other data. This means we can construct the converted program in three parts:

  1. Assign a string literal to a variable (call it data).

  2. Use that string to compute the current program’s source code and assign it to program.

  3. Do whatever other work the program is supposed to do (i.e., the code we originally started with).

So the structure of the program will be more like this:

data = ''
program = 
x = 1
y = 2
puts x + y

That sounds plausible as a general strategy, but it’s a bit light on specific details. How do we know what string to actually assign to data in part A, and how do we use it in part B to compute program? Here’s one solution:

  • In part A, create a string literal that contains the source code of parts B and C, and assign that string to data. This string won’t need to “contain itself,” because it’s not the source of the full program, only the section of the program that comes after part A.

  • In part B, first compute a string that contains the source code of part A. We can do that because part A mostly consists of a big string literal whose value is available as data, so we just need to prefix data’s value with 'data =' to recreate part A’s source. Then simply concatenate the result with data to get the source of the entire program (since data contains the source of parts B and C) and assign it to program.

This plan still sounds circular—part A produces the source of part B, and part B produces the source of part A—but it narrowly avoids an infinite regress by ensuring that part B just computes the source of part A without having to literally contain it.

We can start making progress by filling in the bits we already know about. We have most of the source of parts B and C already, so we can partially complete the value of data:

data = %q{
program = 
x = 1
y = 2
puts x + y
}
program = 
x = 1
y = 2
puts x + y

Note

data needs to contain newline characters. By representing these as actual newlines inside an uninterpolated string literal, rather than as interpolated escape sequences, we are able to include the source of parts B and C verbatim without any special encoding or escaping.[71] This straightforward copy-and-paste makes the source of part A easier to compute.

We also know that the source of part A is just the string 'data = %q{}' with the value of data filling the gap between the curly braces, so we can partially complete the value of program too:

data = %q{
program = 
x = 1
y = 2
puts x + y
}
program = "data = %q{#{data}}" + 
x = 1
y = 2
puts x + y

Now all that’s missing from program is the source code of parts B and C, which is exactly what data contains, so we can append the value of data to program to finish it off:

data = %q{
program = 
x = 1
y = 2
puts x + y
}
program = "data = %q{#{data}}" + data
x = 1
y = 2
puts x + y

Finally, we can go back and fix up the value of data to reflect what part B actually looks like:

data = %q{
program = "data = %q{#{data}}" + data
x = 1
y = 2
puts x + y
}
program = "data = %q{#{data}}" + data
x = 1
y = 2
puts x + y

And that’s it! This program does the same thing as the original, but now it has an extra local variable containing its own source code—an admittedly hollow victory, since it doesn’t actually do anything with that variable. So what happens if we apply the same transformation to a program that expects a local variable called program and does something with it? Take the classic example:

puts program

This is a program that is trying to print its own source code,[72] but it’s obviously going to fail in that form, because program is an undefined variable. If we run it through the self-referencing transformation we get this result:

data = %q{
program = "data = %q{#{data}}" + data
puts program
}
program = "data = %q{#{data}}" + data
puts program

That’s a bit more interesting. Let’s see what this code does on the console:

>> data = %q{
   program = "data = %q{#{data}}" + data
   puts program
   }
=> "
program = "data = %q{#{data}}" + data
puts program
"
>> program = "data = %q{#{data}}" + data
=> "data = %q{
program = "data = %q{#{data}}" + data
puts program
}
program = "data = %q{#{data}}" + data
puts program
"
>> puts program
data = %q{
program = "data = %q{#{data}}" + data
puts program
}
program = "data = %q{#{data}}" + data
puts program
=> nil

Sure enough, the line puts program really does print out the source code of the whole program.

Hopefully it’s clear that this transformation doesn’t depend on any special properties of the program itself, so it will work for any Ruby program, and the use of $stdin.read or File.read(__FILE__) to read a program’s own source can therefore always be eliminated.[73] It also doesn’t depend on any special properties of Ruby itself—just the ability to compute new values from old ones, like any other universal system—which implies that any Turing machine can be adapted to refer to its own encoding, any lambda calculus expression can be extended to contain a lambda-calculus representation of its own syntax, and so on.

Decidability

So far we’ve seen that Turing machines have a lot of power and flexibility: they can execute arbitrary programs encoded as data, perform any algorithm we can think of, run for an unlimited amount of time, and calculate their own descriptions. And in spite of their simplicity, these little imaginary machines have turned out to be representative of universal systems in general.

If they’re so powerful and flexible, is there anything that Turing machines—and therefore real-world computers and programming languages—can’t do?

Before we can answer that question we need to make it a bit more precise. What kind of thing can we ask a Turing machine to do, and how can we tell whether it’s done it? Do we need to investigate every possible kind of problem, or is it enough to consider just some of them? Are we looking for problems whose solutions are merely beyond our current understanding, or problems that we already know we’ll never be able to solve?

We can narrow the scope of the question by focusing on decision problems. A decision problem is any question with a yes or no answer, like “is 2 less than 3?” or “does the regular expression (a(|b))* match the string 'abaab'?” Decision problems are easier to handle than function problems whose answer is a number or some other non-Boolean value, like “what is the greatest common divisor of 18 and 12?,” but they’re still interesting enough to be worth investigating.
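Both example decision problems can be answered by one-line Ruby expressions that return true or false; the anchors in the regular expression below are ours, added so that the match must cover the whole string.

```ruby
# Two decision problems, each answered with a Boolean.
puts 2 < 3                            # true
puts /\A(a(|b))*\z/.match?('abaab')   # true ('abaab' = "ab" + "a" + "ab")
```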

A decision problem is decidable (or computable) if there’s an algorithm that’s guaranteed to solve it in a finite amount of time for any possible input. The Church–Turing thesis claims that every algorithm can be performed by a Turing machine, so for a problem to be decidable, we have to be able to design a Turing machine that always produces the correct answer and always halts if we let it run for long enough. It’s easy enough to interpret a Turing machine’s final configuration as a “yes” or “no” answer: we could look to see whether it’s written a Y or N character at the current tape location, for example, or we could ignore the tape contents altogether and just check whether its final state is an accept (“yes”) or a nonaccept (“no”) state.

All the decision problems we’ve seen in earlier chapters are decidable. Some of them, like “does this finite automaton accept this string?” and “does this regular expression match this string?,” are self-evidently decidable because we’ve written Ruby programs to solve them by simulating finite automata directly. Those programs could be laboriously translated into Turing machines given enough time and energy, and because their execution involves a finite number of steps—each step of a DFA simulation consumes a character of input, and the input has a finite number of characters—they are guaranteed to always halt with a yes-or-no answer, so the original problems qualify as decidable.
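Here's a sketch of our own (not the earlier chapters' implementation) showing why the DFA problem is so obviously decidable: each step of the simulation consumes one character, so the loop runs exactly string.length times and must halt.

```ruby
# Simulate a DFA given as a start state, a list of accept states, and a
# rulebook mapping [state, character] pairs to next states. The loop
# performs one step per input character, so it always terminates.
def dfa_accepts?(start_state, accept_states, rulebook, string)
  state = start_state
  string.each_char { |character| state = rulebook[[state, character]] }
  accept_states.include?(state)
end

# A two-state DFA that accepts strings containing an odd number of 'a's:
rulebook = {
  [1, 'a'] => 2, [1, 'b'] => 1,
  [2, 'a'] => 1, [2, 'b'] => 2
}

puts dfa_accepts?(1, [2], rulebook, 'ab')    # true  (one 'a')
puts dfa_accepts?(1, [2], rulebook, 'abab')  # false (two 'a's)
```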

Other problems are a bit more subtle. “Does this pushdown automaton accept this string?” might appear to be undecidable, because we’ve seen that our direct simulation of a pushdown automaton in Ruby has the potential to loop forever and never produce an answer. However, there happens to be a way to calculate exactly how many simulation steps a particular pushdown automaton will need in order to accept or reject an input string of a given length,[74] so the problem is decidable after all: we just calculate the number of steps needed, run the simulation for that many steps, and then check whether or not the input has been accepted.

So, can we do this every time? Is there always a clever way to sneak around a problem and find a way to implement a machine, or a program, that is guaranteed to solve it in a finite amount of time?

Well, no, unfortunately not. There are many decision problems—infinitely many—and it turns out that a lot of them are undecidable: there is no guaranteed-to-halt algorithm for solving them. Each of these problems is undecidable not because we just haven’t found the right algorithm for it yet, but because the problem itself is fundamentally impossible to solve for some inputs, and we can even prove that no suitable algorithm will ever be found.

The Halting Problem

A lot of undecidable problems are concerned with the behavior of machines and programs during their execution. The most famous of these, the halting problem, is the task of deciding whether the execution of a particular Turing machine with a particular initial tape will ever halt. Thanks to universality, we can restate the same problem in more practical terms: given a string containing the source code of a Ruby program, and another string of data for that program to read from standard input, will running that program ultimately result in an answer or just an infinite loop?

Building a Halting Checker

It’s not obvious why the halting problem should be considered undecidable. After all, it’s easy to come up with individual programs for which the question is answerable. Here’s a program that will definitely halt, regardless of its input string:

input = $stdin.read
puts input.upcase

Note

We’ll assume that $stdin.read always immediately returns a value—namely, that every program’s standard input is finite and nonblocking—because we’re interested in the internal behavior of the program, not its interactions with the operating system.

Conversely, a small addition to the source code produces a program that will clearly never halt:

input = $stdin.read

while true
  # do nothing
end

puts input.upcase

We can certainly write a halting checker to distinguish between only these two cases. Just testing whether the program’s source code contains the string while true is enough:

def halts?(program, input)
  if program.include?('while true')
    false
  else
    true
  end
end

This implementation of #halts? gives the right answers for the two example programs:

>> always = "input = $stdin.read
puts input.upcase"
=> "input = $stdin.read
puts input.upcase"
>> halts?(always, 'hello world')
=> true
>> never = "input = $stdin.read
while true
# do nothing
end
puts input.upcase"
=> "input = $stdin.read
while true
# do nothing
end
puts input.upcase"
>> halts?(never, 'hello world')
=> false

But #halts? is likely to be wrong for other programs. For example, there are programs whose halting behavior depends on the value of their input:

input = $stdin.read

if input.include?('goodbye')
  while true
    # do nothing
  end
else
  puts input.upcase
end

We can always extend our halting checker to cope with specific cases like this, since we know what to look for:

def halts?(program, input)
  if program.include?('while true')
    if program.include?("input.include?('goodbye')")
      if input.include?('goodbye')
        false
      else
        true
      end
    else
      false
    end
  else
    true
  end
end

Now we have a checker that gives the correct answers for all three programs and any possible input strings:

>> halts?(always, 'hello world')
=> true
>> halts?(never, 'hello world')
=> false
>> sometimes = "input = $stdin.read
if input.include?('goodbye')
while true
# do nothing
end
else
puts input.upcase
end"
=> "input = $stdin.read
if input.include?('goodbye')
while true
# do nothing
end
else
puts input.upcase
end"
>> halts?(sometimes, 'hello world')
=> true
>> halts?(sometimes, 'goodbye world')
=> false

We could go on like this indefinitely, adding more checks and more special cases to support an expanding repertoire of example programs, but we’d never get a solution to the full problem of deciding whether any program will halt. A brute-force implementation can be made more and more accurate, but it will always have blind spots; the simple-minded approach of looking for specific patterns of syntax can’t possibly scale to all programs.

To make #halts? work in general, for any possible program and input, seems like it would be difficult. If a program contains any loops at all—whether explicit, like while loops, or implicit, like recursive method calls—then it has the potential to run forever, and sophisticated analysis of the program’s meaning is required if we want to predict anything about how it behaves for a given input. As humans, we can see immediately that this program always halts:

input = $stdin.read
output = ''

n = input.length
until n.zero?
  output = output + '*'
  n = n - 1
end

puts output

But why does it always halt? Certainly not for any straightforward syntactic reason. The explanation is that IO#read always returns a String, and String#length always returns a nonnegative Integer, and repeatedly calling -(1) on a nonnegative Integer always eventually produces an object whose #zero? method returns true. This chain of reasoning is subtle and highly sensitive to small modifications; if the n = n - 1 statement inside the loop is changed to n = n - 2, the program will only halt for even-length inputs. A halting checker that knew all these facts about Ruby and numbers, as well as how to connect facts together to make accurate decisions about this kind of program, would need to be large and complex.
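To watch that chain of reasoning play out, the program above can be wrapped in a method (our own packaging of the same code): one '*' is appended for each input character, and the loop ends when n reaches zero.

```ruby
# The always-halting star-printing program, as a method: the loop body
# runs input.length times, decrementing n until #zero? returns true.
def stars(input)
  output = ''
  n = input.length
  until n.zero?
    output = output + '*'
    n = n - 1
  end
  output
end

puts stars('hello')  # *****
puts stars('')       # prints an empty line
```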

Note

The fundamental difficulty is that it’s hard to predict what a program will do without actually running it. It’s tempting to run the program with #evaluate to see whether it halts, but that’s no good: if the program doesn’t halt, #evaluate will run forever and we won’t get an answer from #halts? no matter how long we wait. Any reliable halting-detection algorithm must find a way to produce a definitive answer in a finite amount of time just by inspecting and analyzing the text of the program, not by simply running it and waiting.

It’ll Never Work

Okay, so our intuition tells us that #halts? would be hard to implement correctly, but that doesn’t necessarily mean that the halting problem is undecidable. There are plenty of difficult problems (e.g., writing #evaluate) that turn out to be solvable given enough effort and ingenuity; if the halting problem is undecidable, that means #halts? is impossible to write, not just extremely difficult.

How do we know that a proper implementation of #halts? can’t possibly exist? If it’s just an engineering problem, why can’t we throw lots of programmers at it and eventually come up with a solution?

Too good to be true

Let’s pretend temporarily that the halting problem is decidable. In this imaginary world, it’s possible to write a full implementation of #halts?, so a call to halts?(program, input) always comes back with a true or false answer for any program and input, and that answer always correctly predicts whether program would halt if it was run with input on standard input. The rough structure of the #halts? method might be something like this:

def halts?(program, input)
  # parse program
  # analyze program
  # return true if program halts on input, false if not
end

If we can write #halts?, then we can construct does_it_halt.rb, a program that reads another program as input and prints yes or no depending on whether that program halts when it reads the empty string:[75]

def halts?(program, input)
  # parse program
  # analyze program
  # return true if program halts on input, false if not
end

def halts_on_empty?(program)
  halts?(program, '')
end

program = $stdin.read

if halts_on_empty?(program)
  print 'yes'
else
  print 'no'
end

Once we have does_it_halt.rb, we can use it to solve very difficult problems. Consider this famous claim made by Christian Goldbach in 1742:

Every even integer greater than 2 can be written as the sum of two primes.

This is the Goldbach conjecture, and it’s famous because nobody has been able to prove whether it’s true or false. The evidence suggests that it’s true, because an even number picked at random can invariably be broken apart into two primes—12 = 5 + 7, 34 = 3 + 31, 567890 = 7 + 567883 and so forth—and computers have been used to check that this can be done for all even numbers between four and four quintillion (4,000,000,000,000,000,000). But there are infinitely many even numbers, so no computer can check them all, and there’s no known proof that every even number must break apart in this way. There is a possibility, however small, that some very large even number is not the sum of two primes.

Proving the Goldbach conjecture is one of the holy grails of number theory; in the year 2000, the publisher Faber and Faber offered a million-dollar prize to anyone who could produce a proof. But wait: we already have the tools to discover whether the conjecture is true! We just need to write a program that searches for a counterexample:

require 'prime'

def primes_less_than(n)
  Prime.each(n - 1).entries
end

def sum_of_two_primes?(n)
  primes = primes_less_than(n)
  primes.any? { |a| primes.any? { |b| a + b == n } }
end

n = 4

while sum_of_two_primes?(n)
  n = n + 2
end

print n

This establishes a connection between the truth of the Goldbach conjecture and the halting behavior of a program. If the conjecture is true, this program will never find a counterexample no matter how high it counts, so it will loop forever; if the conjecture’s false, n will eventually be assigned the value of an even number that isn’t the sum of two primes, and the program will halt. So we just have to save it as goldbach.rb and run ruby does_it_halt.rb < goldbach.rb to find out whether it’s a halting program, and that will tell us whether the Goldbach conjecture is true. A million dollars is ours![76]
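Before trusting goldbach.rb with a million dollars, we can at least check its #sum_of_two_primes? helper against the decompositions mentioned earlier (the definitions are repeated here so the snippet stands alone):

```ruby
require 'prime'

def primes_less_than(n)
  Prime.each(n - 1).entries
end

def sum_of_two_primes?(n)
  primes = primes_less_than(n)
  primes.any? { |a| primes.any? { |b| a + b == n } }
end

puts sum_of_two_primes?(12)  # true: 12 = 5 + 7
puts sum_of_two_primes?(34)  # true: 34 = 3 + 31
```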

Well, obviously this is too good to be true. To write a program that can accurately predict the behavior of goldbach.rb would require a proficiency in number theory beyond our current understanding. Mathematicians have been working for hundreds of years to try to prove or disprove the Goldbach conjecture; it’s unlikely that a bunch of avaricious software engineers could construct a Ruby program that can miraculously solve not only this problem but any unsolved mathematical conjecture that can be expressed as a looping program.

Fundamentally impossible

So far we’ve seen strong evidence that the halting problem is undecidable, but not conclusive proof. Our intuition may say it’s unlikely that we can prove or disprove the Goldbach conjecture just by turning it into a program, but computation can be pretty counterintuitive at times, so we shouldn’t allow ourselves to be convinced solely by arguments about how improbable something is. If the halting problem really is undecidable, as opposed to simply very difficult, we should be able to prove it.

Here’s why #halts? can never work. If it did work, we’d be able to construct a new method #halts_on_itself? that calls #halts? to determine what a program does when run with its own source code as input:[77]

def halts_on_itself?(program)
  halts?(program, program)
end

Like #halts?, the #halts_on_itself? method will always finish and return a Boolean value: true if program halts when run with itself as input, false if it loops forever.

Given working implementations of #halts? and #halts_on_itself?, we can write a program called do_the_opposite.rb:

def halts?(program, input)
  # parse program
  # analyze program
  # return true if program halts on input, false if not
end

def halts_on_itself?(program)
  halts?(program, program)
end

program = $stdin.read

if halts_on_itself?(program)
  while true
    # do nothing
  end
end

This code reads program from standard input, finds out whether it would halt if run on itself, and promptly does the exact opposite: if program would halt, do_the_opposite.rb loops forever; if program would loop forever, do_the_opposite.rb halts.

Now, what does ruby do_the_opposite.rb < do_the_opposite.rb do?[78] Just as we saw earlier with does_it_say_no.rb, this question creates an inescapable contradiction.

#halts_on_itself? must return either true or false when given the source of do_the_opposite.rb as an argument. If it returns true to indicate a halting program, then ruby do_the_opposite.rb < do_the_opposite.rb will loop forever, which means #halts_on_itself? was wrong about what would happen. On the other hand, if #halts_on_itself? returns false, that’ll make do_the_opposite.rb immediately halt, again contradicting #halts_on_itself?’s prediction.

It’s wrong to pick on #halts_on_itself? here—it’s just an innocent one-liner that delegates to #halts? and relays its answer. What we’ve really shown is that it’s not possible for #halts? to return a satisfactory answer when called with do_the_opposite.rb as both program and input arguments; no matter how hard it works, any result it produces will automatically be wrong. That means there are only two possible fates for any real implementation of #halts?:

  • It sometimes gives the wrong answer, e.g., predicting that do_the_opposite.rb will loop forever even though it halts (or vice versa).

  • It sometimes loops forever and never returns any answer, just like #evaluate does in ruby does_it_say_no.rb < does_it_say_no.rb.

So a fully correct implementation of #halts? can never exist: there must be inputs for which it makes either an incorrect prediction or no prediction at all.

Remember the definition of decidability:

A decision problem is decidable if there’s an algorithm that’s guaranteed to solve it in a finite amount of time for any possible input.

We’ve proved it’s impossible to write a Ruby program that completely solves the halting problem, and since Ruby programs are equivalent in power to Turing machines, it must be impossible with a Turing machine too. The Church–Turing thesis says that every algorithm can be performed by a Turing machine, so if there’s no Turing machine for solving the halting problem, there’s no algorithm either; in other words, the halting problem is undecidable.

Other Undecidable Problems

It’s discouraging that there’s an easily defined problem (“does this program halt?”) that computers can’t reliably solve. However, that specific problem is relatively abstract, and the do_the_opposite.rb program we used to illustrate it is impractical and contrived; it doesn’t seem likely that we’ll ever want to actually implement #halts?, or write a program like do_the_opposite.rb, as part of a real-world application. Perhaps we can disregard undecidability as an academic curiosity and get on with our lives.

Unfortunately it’s not that simple, because the halting problem is not the only undecidable problem. There are plenty of others that we might realistically want to solve in our day-to-day work of building software, and their undecidability has real consequences for the practical limitations of automated tools and processes.

Let’s look at a toy example. Suppose we’ve been given the job of developing a Ruby program to print the string 'hello world'. That sounds simple enough, but in our capacity as inveterate procrastinators[79] we’re also going to develop an automated tool that can reliably decide whether or not a particular program prints hello world when supplied with a particular input.[80] Armed with this tool, we can analyze our final program and check that it does what it’s supposed to.

Now, imagine we succeed in developing a method #prints_hello_world? that can correctly make that decision about any program. Omitting implementation details, the method has this general form:

def prints_hello_world?(program, input)
  # parse program
  # analyze program
  # return true if program prints "hello world", false if not
end

Once we’ve finished writing our original program, we can use #prints_hello_world? to verify that it does the right thing; if it does, we’ll check it into source control, email the boss, and everyone’s happy. But the situation is even better than that, because we can also use #prints_hello_world? to implement another interesting method:

def halts?(program, input)
  hello_world_program = %Q{
    program = #{program.inspect}
    input = $stdin.read
    evaluate(program, input) # evaluate program, ignoring its output
    print 'hello world'
  }

  prints_hello_world?(hello_world_program, input)
end

Note

The %Q syntax quotes a string in the same way as %q and then performs interpolation, so #{program.inspect} gets replaced with a Ruby string literal containing the value of program.
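A quick illustration of the %Q-plus-inspect trick (the program string here is just an example of ours): #inspect wraps the value in quotes and escapes it, producing a valid Ruby string literal.

```ruby
# program.inspect produces a quoted, escaped Ruby string literal, which
# %Q then interpolates into the generated source code.
program = "puts 'hi'"
wrapped = %Q{program = #{program.inspect}}

puts wrapped  # program = "puts 'hi'"
```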

Our new version of #halts? works by constructing a special program, hello_world_program, which does two main things:

  1. Evaluates program with input available on its standard input

  2. Prints hello world

hello_world_program is constructed so that its execution has only two possible outcomes: either evaluate(program, input) will finish successfully, in which case hello world will be printed, or evaluate(program, input) will loop forever and there’ll be no output at all.

This special program is fed into #prints_hello_world? to find out which of those two outcomes will happen. If #prints_hello_world? returns true, that means evaluate(program, input) will eventually finish and allow hello world to be printed, so #halts? returns true to indicate that program halts on input. If #prints_hello_world? instead returns false, that must be because hello_world_program will never reach its final line, so #halts? returns false to say that evaluate(program, input) loops forever.

Our new implementation of #halts? shows that the halting problem is reducible to the problem of checking whether a program prints hello world. In other words, any algorithm that computes #prints_hello_world? can be adapted to make an algorithm that computes #halts?.

We already know that a working #halts? can’t exist, so the obvious conclusion is that a complete implementation of #prints_hello_world? can’t exist either. And if it’s impossible to implement, the Church–Turing thesis says there’s no algorithm for it, so “does this program print hello world?” is another undecidable problem.

In reality, nobody cares about automatically detecting whether a program prints a particular string, but the structure of this undecidability proof points to something larger and more general. We only had to construct a program that exhibits the “prints hello world” property whenever some other program halts, and that was enough to show undecidability. What stops us from reusing this argument for any property of program behavior, including properties that we actually do care about?

Well, nothing does. This is Rice’s theorem: any nontrivial property of program behavior is undecidable, because the halting problem can always be reduced to the problem of deciding whether that property is true; if we could invent an algorithm for deciding that property, we’d be able to use it to build another algorithm that decides the halting problem, and that’s impossible.

Note

Roughly speaking, a “nontrivial property” is a claim about what a program does, not how it does it. For example, Rice’s theorem doesn’t apply to a purely syntactic property like “does this program’s source code contain the string 'reverse'?,” because that’s an incidental implementation detail that can be refactored away without changing the program’s externally visible behavior. On the other hand, a semantic property like “does this program output the reverse of its input?” is within the scope of Rice’s theorem and therefore undecidable.

Rice’s theorem tells us there are a huge number of undecidable problems that are concerned with what a program will do when it’s executed.

Depressing Implications

Undecidability is an inconvenient fact of life. The halting problem is disappointing, because it shows we can’t have everything: we want the unrestricted power of a universal programming language, but we also want to write programs that produce a result without getting stuck in an infinite loop, or at least programs whose subroutines halt as part of some larger long-running task (see Very Long-Running Computations).

This disappointment is bleakly summarized in a classic paper from 2004:

There is a dichotomy in language design, because of the halting problem. For our programming discipline we are forced to choose between

  A. Security—a language in which all programs are known to terminate.

  B. Universality—a language in which we can write

    (i) all terminating programs

    (ii) silly programs which fail to terminate

    and, given an arbitrary program we cannot in general say if it is (i) or (ii).

Five decades ago, at the beginning of electronic computing, we chose (B).

Yes, we’d all like to avoid writing silly programs, but that’s just tough luck. There’s no way to tell whether an arbitrary program is silly, so we can’t completely avoid writing them without sacrificing universality.[81]

The implications of Rice’s theorem are depressing too: not only is the question “does this program halt?” undecidable, but so is “does this program do what I want it to do?” We live in a universe where there’s no way to build a machine that can accurately predict whether a program will print hello world, calculate a particular mathematical function, or make a particular operating system call, and that’s just the way it is.

That’s frustrating, because it would be really useful to be able to check program properties mechanically; the reliability of modern software would be improved by a tool that decides whether a program conforms to its specification or contains any bugs. Those properties might be mechanically checkable for individual programs, but unless they’re checkable in general, we’ll never be able to trust machines to do the job for us.

For example, say we invent a new software platform and decide to make money by selling compatible programs from an online shop—an “application superstore,” if you like—on behalf of our platform’s third-party developers. We want our customers to be able to shop with confidence, so we decide to only sell programs that meet certain criteria: they must not crash, they must not call private APIs, and they must not execute arbitrary code downloaded from the Internet.

When thousands of developers start submitting programs to us, how do we review each one to see whether it falls within our guidelines? It would save a lot of time and money if we could use an automated system to check every submission for compliance, but thanks to undecidability, it’s not possible to build a system that does the job accurately. We have no choice but to hire a small army of human beings to manually test those programs by running them, disassembling them, and instrumenting the operating system to profile their dynamic behavior.

Manual review is slow, expensive, and error-prone, with the added drawback that each program can only be run for a short amount of time, providing a limited snapshot of its dynamic behavior. So even if nobody makes a mistake, occasionally something undesirable will slip through the net and we’ll have a load of angry customers on our hands. Thanks a lot, undecidability.

Beneath all this inconvenience are two fundamental problems. The first is that we don’t have the power to look into the future and see what will happen when a program is executed; the only general way to find out what a program does is to run it for real. While some programs are simple enough to have behavior that’s straightforwardly predictable, a universal language will always permit programs whose behavior can’t be predicted just by analyzing their source code.[82]

The second problem is that, when we do decide to run a program, there’s no reliable way to know how long it will take to finish. The only general solution is to run it and wait, but since we know that programs in a universal language can loop forever without halting, there will always be some programs for which no finite amount of waiting is long enough.

Why Does This Happen?

In this chapter, we’ve seen that all universal systems are powerful enough to refer to themselves. Programs operate on numbers, numbers can represent strings, and strings are how the instructions of a program are written down, so programs are perfectly capable of operating on their own source code.

This capacity for self-reference makes it impossible to write a program that can reliably predict program behavior. Once a particular behavior-checking program has been written, we are always able to construct a larger program that defeats it: the new program incorporates the checker as a subroutine, checks its own source code, and then immediately does the exact opposite of whatever the checker said it would do. These self-contradictory programs are curiosities rather than something we’d ever write in practice, but they’re just a symptom, not the cause, of the underlying problem: in general, program behavior is too powerful to be accurately predicted.

Note

Human language has similar power and similar problems. “This sentence is a lie” (the liar paradox) is an English sentence that can’t be true or false. But the liar paradox depends on the special self-referential word “this”; as we saw in Programs Can Refer to Themselves, any computer program can be made self-referential by construction, without requiring any special language features.

When it comes down to it, there are two basic reasons why the behavior of programs is so hard to predict:

  1. Any system with enough power to be self-referential can’t correctly answer every question about itself.[83] We will always be able to construct a program like do_the_opposite.rb whose behavior can’t be predicted by the system. To avoid this problem, we need to step outside the self-referential system and use a different, more powerful system to answer questions about it.

  2. But in the case of universal programming languages, there is no more powerful system for us to upgrade to. The Church–Turing thesis tells us that any usable algorithm we invent for making predictions about the behavior of programs can itself be performed by a program, so we’re stuck with the capabilities of universal systems.

Coping with Uncomputability

The whole point of writing a program is to get a computer to do something useful. As programmers, how are we supposed to cope with a world where checking that a program works properly is an unsolvable problem?

Denial is a tempting response: Ignore the whole issue. It would be nice if we could automatically verify program behavior, but we can’t, so let’s just hope for the best and never try to check that a program does its job correctly.

But that would be an overreaction, because the situation isn’t as bad as it sounds. Rice’s theorem doesn’t mean that program analysis is impossible, just that we can’t write a nontrivial analyzer that will always halt and produce the right answer. As we saw in Building a Halting Checker, there’s nothing to stop us from writing a tool that gives the right answer for some programs, as long as we can tolerate the fact that there will always be other programs for which it either gives the wrong answer or loops forever without returning anything.

Here are some practical ways of analyzing and predicting program behavior, in spite of undecidability:

  • Ask undecidable questions, but give up if an answer can’t be found. For example, to check whether a program prints a particular string, we can run it and wait; if it doesn’t print that string within a particular period of time, say 10 seconds, we just terminate the program and assume it’s no good. We might accidentally throw out a program that produces the expected output after 11 seconds, but in many cases that’s an acceptable risk, especially since slow programs can be undesirable in their own right.
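This run-and-wait strategy is easy to sketch with Ruby's standard Timeout module (the time budget, the expected string, and the helper's name are arbitrary choices for this illustration):

```ruby
require 'timeout'
require 'stringio'

# Run a block of code, but give up if it exceeds the time budget.
# Returns true only if the block finishes in time and prints the
# expected string; a timeout counts as failure.
def prints_within?(expected, seconds, &program)
  captured = StringIO.new
  original, $stdout = $stdout, captured
  begin
    Timeout.timeout(seconds) { program.call }
  rescue Timeout::Error
    return false # no answer within the budget; assume the program's no good
  ensure
    $stdout = original
  end
  captured.string.include?(expected)
end

prints_within?('hello', 1) { puts 'hello' } # => true
prints_within?('hello', 0.2) { loop {} }    # => false
```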

  • Ask several small questions whose answers, when taken together, provide empirical evidence for the answer to a larger question. When performing automated acceptance testing, we’re usually not able to check that a program does the right thing for every possible input, but we can try running it for a limited number of example inputs to see what happens. Each test run gives us information about how the program behaves in that specific case, and we can use that information to become more confident of how it’s likely to behave in general. This leaves open the possibility that there are other untested inputs that cause wildly different behavior, but we can live with that as long as our test cases do a good job of representing most kinds of realistic input.

    Another example of this approach is the use of unit tests to verify the behavior of small pieces of a program individually rather than trying to verify the program as a whole. A well-isolated unit test concentrates on checking the properties of a simple unit of code, and makes assumptions about other parts of the program by representing them with test doubles (i.e., stubs and mocks). Individual unit tests that exercise small pieces of well-understood code can be simple and fast, minimizing the danger that any one test will run forever or give a misleading answer.

    By unit testing all the program’s pieces in this way, we can set up a chain of assumptions and implications that resembles a mathematical proof: “if piece A works, then piece B works, and if piece B works, then piece C works.” Deciding whether all of these assumptions are justified is the responsibility of human reasoning rather than automated verification, although integration and acceptance testing can improve our confidence that the entire system does what it’s supposed to do.
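As a small illustration of that idea (the Invoice and tax-service names are invented for this sketch), a test double lets us check one unit in isolation, replacing a collaborator with something fast and predictable:

```ruby
# The unit under test depends on a collaborator for tax calculation.
class Invoice
  def initialize(tax_service)
    @tax_service = tax_service
  end

  def total(amount)
    amount + @tax_service.tax_for(amount)
  end
end

# A stub standing in for the real tax service: simple, predictable, and
# incapable of looping forever, unlike arbitrary code it might replace.
class FakeTaxService
  def tax_for(amount)
    amount / 10 # a flat 10% on whole amounts, for the sake of the test
  end
end

invoice = Invoice.new(FakeTaxService.new)
invoice.total(100) # => 110
```

Each such test tells us only that Invoice#total works when its collaborator behaves as assumed; chaining those assumptions together is the human-reasoning step described above.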

  • Ask decidable questions by being conservative where necessary. The above suggestions involve actually running parts of a program to see what happens, which always introduces the risk of hitting an infinite loop, but there are useful questions that can be answered just by inspecting a program’s source code statically. The most obvious example is “does this program contain any syntax errors?,” but we can answer more interesting questions if we’re prepared to accept safe approximations in cases where the real answer is undecidable.

    A common analysis is to look through a program’s source to see if it contains dead code that computes values that are never used, or unreachable code that never gets evaluated. We can’t always tell whether code is truly dead or unreachable, in which case we have to be conservative and assume it isn’t, but there are cases where it’s obvious: in some languages, we know that an assignment to a local variable that’s never mentioned again is definitely dead, and that a statement that immediately follows a return is definitely unreachable.[84] An optimizing compiler like GCC uses these techniques to identify and eliminate unnecessary code, making programs smaller and faster without affecting their behavior.
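A conservative dead-store check can be sketched over a toy representation of straight-line code (the statement format here is invented for illustration): an assignment is flagged only when its variable is provably never read afterward, and anything doubtful is assumed live.

```ruby
# Statements are either [:assign, name] or [:read, name].
# An assignment is reported dead only if its variable is never read later.
# This errs on the safe side: a store that's overwritten before any read
# is really dead too, but this check leaves it alone.
def dead_assignments(statements)
  statements.each_index.select do |i|
    kind, name = statements[i]
    kind == :assign &&
      statements[(i + 1)..].none? { |k, n| k == :read && n == name }
  end
end

program = [
  [:assign, :x],
  [:assign, :y],
  [:read,   :x],
]
dead_assignments(program) # => [1] (y is assigned but never read)
```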

  • Approximate a program by converting it into something simpler, then ask decidable questions about the approximation. This important idea is the subject of the next chapter.



[64] The smallest value x and y can reach is 1, so they’ll meet there if all else fails.

[65] Ruby already has a built-in version of Euclid’s algorithm, Integer#gcd, but that’s beside the point.

[66] Lisp is really a family of programming languages—including Common Lisp, Scheme, and Clojure—that share very similar syntax.

[67] It would be more useful to only assign numbers to the syntactically valid Ruby programs, but doing that is more complicated.

[68] Most of those numbers won’t represent syntactically valid Ruby programs, but we can feed each potential program to the Ruby parser and reject it if it has any syntax errors.

[69] We’re using Unix shell syntax here. On Windows, it’s necessary to omit the single quotes around the argument to echo, or to put the text in a file and feed it to ruby with the < input redirection operator.

[70] This is the shell command to run does_it_say_no.rb with its own source code as input.

[71] We can only get away with this because parts B and C happen not to contain any difficult characters like backslashes or unbalanced curly brackets. If they did, we’d have to escape them somehow and then undo that escaping as part of assembling the value of program.

[72] Douglas Hofstadter coined the name quine for a program that prints itself.

[73] Can you resist the urge to write a Ruby program that can perform this transformation on any Ruby program? If you use %q{} to quote data’s value, how will you handle backslashes and unbalanced curly brackets in the original source?

[74] Briefly, for the curious: every pushdown automaton has an equivalent context-free grammar and vice versa; any CFG can be rewritten in Chomsky normal form; and any CFG in that form must take exactly 2n − 1 steps to generate a string of length n. So we can turn the original PDA into a CFG, rewrite the CFG into Chomsky normal form, and then turn that CFG back into a PDA. The resulting pushdown automaton recognizes the same language as the original, but now we know exactly how many steps it’ll take to do it.

[75] The choice of the empty string is unimportant; it’s just an arbitrary fixed input. The plan is to run does_it_halt.rb on self-contained programs that don’t read anything from standard input, so it doesn’t matter what input is.

[76] Faber’s million-dollar prize expired in 2002, but anyone who produced a proof today would still be in for some serious fortune and glory on the rock star mathematician circuit.

[77] This is reminiscent of #evaluate_on_itself from Universal Systems Can Loop Forever, with #halts? in place of #evaluate.

[78] Or equivalently: what does #halts_on_itself? return if we call it with do_the_opposite.rb’s source code as its argument?

[79] Surely “responsible software engineering professionals”?

[80] Again, that input might be irrelevant if the program doesn’t actually read anything from $stdin, but we’re including it for the sake of completeness and consistency.

[81] Total programming languages are a potential solution to this problem, but so far they haven’t taken off, perhaps because they can be more difficult to understand than conventional languages.

[82] Stephen Wolfram coined the name computational irreducibility for the idea that a program’s behavior can’t be predicted without running it.

[83] This is roughly the content of Gödel’s first incompleteness theorem.

[84] The Java language specification requires the compiler to reject any program that contains unreachable code. See http://docs.oracle.com/javase/specs/jls/se7/html/jls-14.html#jls-14.21 for a lengthy explanation of how a Java compiler is meant to decide which parts of a program are potentially reachable without running any of it.
