The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents.
In this book, we’ve explored different models of computers and programming languages, including several
kinds of abstract machine. Some of those machines are more powerful than others, and two
varieties in particular come with pretty obvious limitations: finite automata can’t solve
problems that involve unrestricted counting, like deciding whether a string of brackets is
correctly balanced, and pushdown automata can’t handle any problem where information needs to be
reused in more than one place, like deciding whether a string contains the same number of
a, b, and c characters.
But the most advanced device we’ve seen, the Turing machine, seems to have everything that we need: it’s got unlimited storage that can be accessed in any order, arbitrary loops, conditionals, and subroutines. The extremely minimal programming language from Chapter 6, the lambda calculus, turned out to be surprisingly powerful too: with a little ingenuity it allows us to represent simple values and complex data structures as pure code, as well as implement operations that manipulate those representations. And in Chapter 7, we saw many other simple systems that, like the lambda calculus, have the same universal power as Turing machines.
How much further can we push this progression of increasingly powerful systems? Perhaps not indefinitely: our attempts to make Turing machines more powerful by adding features didn’t get us anywhere, which suggests there may be a hard limit on computational power. So what are computers and programming languages fundamentally capable of, and is there anything that they can’t do? Are there any impossible programs?
These are pretty deep questions, so before we try to tackle them, let’s take a tour of some fundamental facts from the world of computation. Some of these facts are obvious, others less so, but they’re all important prerequisites for thinking about the capabilities and limitations of computing machines.
What, generally speaking, can we do with universal systems like Turing machines, the lambda calculus, and partial recursive functions? If we can properly understand the capabilities of these systems, then we’ll be able to investigate their limitations.
The practical purpose of a computing machine is to perform algorithms. An algorithm is a list of instructions describing some process for turning an input value into an output value, as long as those instructions fulfill certain criteria:
There are a finite number of instructions.
Each instruction is simple enough that it can be performed by a person with a pencil and paper without using any ingenuity.
A person following the instructions will finish within a finite number of steps for any input.
A person following the instructions will produce the right answer for any input.
For example, one of the oldest known algorithms is Euclid’s algorithm, which dates from around 300 BC. It takes two positive integers and returns the largest integer that will divide them both exactly—their greatest common divisor. Here are its instructions:
1. Give the two numbers the names x and y.
2. Decide which of x or y is the larger number.
3. Subtract the smaller number from the larger. (If x is larger, subtract y from it and make this the new value of x; vice versa if y is larger.)
4. Repeat steps 2 and 3 until x and y are equal. When x and y become equal, their value is the greatest common divisor of the original two numbers.
We’re happy to recognize this as an algorithm, because it appears to meet the basic criteria. It only contains a few instructions, and they all seem simple enough to be performed with pencil and paper by someone who doesn’t have any special insight into the overall problem. With a bit of thought, we can also see that it must finish within a finite number of steps for any input: every repetition of step 3 causes one of the two numbers to get smaller, so they must eventually reach the same value[64] and cause the algorithm to finish. It’s not quite so obvious that it’ll always give the correct answer, but a few lines of elementary algebra are enough to show that the result must always be the greatest common divisor of the original numbers.
So Euclid’s algorithm is worthy of its name, but like any algorithm, it’s just a collection of ideas expressed as human-readable words and symbols. If we want to do something useful with it—maybe we’d like to explore its mathematical properties, or design a machine that performs it automatically—we need to translate the algorithm into a stricter, less ambiguous form that’s suitable for mathematical analysis or mechanical execution.
We already have several models of computation that we could use for this: we could try to write down Euclid’s algorithm as a Turing machine rulebook, or a lambda calculus expression, or a partial recursive function definition, but all of those would involve a lot of housekeeping and other uninteresting detail. For the moment, let’s just translate it into unrestricted Ruby:[65]
def euclid(x, y)
  until x == y
    if x > y
      x = x - y
    else
      y = y - x
    end
  end

  x
end
This #euclid
method contains essentially the same
instructions as the natural language description of Euclid’s algorithm, but this time,
they’re written in a way that has a strictly defined meaning (according to the operational
semantics of Ruby) and therefore can be interpreted by a machine:
>> euclid(18, 12)
=> 6
>> euclid(867, 5309)
=> 1
In this specific case, it’s been easy to take an informal, human-readable description of an algorithm and turn it into unambiguous instructions for a machine to follow. Having Euclid’s algorithm in a machine-readable form is very convenient; now we can perform it quickly, repeatedly, and reliably without having to employ manual labor.
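As a quick sanity check, we can redefine #euclid and compare its answers with Ruby's built-in Integer#gcd on a few hundred random inputs. (The built-in method is just a convenience for checking; it isn't part of the algorithm.)

```ruby
# Euclid's algorithm, as implemented above, repeated here so the snippet
# stands alone.
def euclid(x, y)
  until x == y
    if x > y
      x = x - y
    else
      y = y - x
    end
  end

  x
end

# Compare against Ruby's built-in greatest common divisor on random pairs.
100.times do
  x, y = rand(1..500), rand(1..500)
  raise "mismatch for #{x} and #{y}" unless euclid(x, y) == x.gcd(y)
end
```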
Hopefully it’s clear that we could just as well have implemented this algorithm with the lambda calculus by using similar techniques to the ones we saw in Numeric Operations, or as a partial recursive function built from the operations in Partial Recursive Functions, or as a collection of Turing machine rules like the ones used for simple arithmetic in Rules.
This raises an important question: can any algorithm be turned into instructions suitable for execution by a machine? Superficially that seems like a trivial thing to ask—it was pretty obvious how to turn Euclid’s algorithm into a program, and as programmers, we have a natural tendency to think of the two things as interchangeable—but there’s a real difference between the abstract, intuitive idea of an algorithm and the concrete, logical implementation of that algorithm within a computational system. Could there ever be an algorithm so large, complex, and unusual that its essence can’t be captured by an unthinking mechanical process?
Ultimately there can be no rigorous answer, because the question is philosophical rather than scientific. The instructions of an algorithm must be “simple” and “without ingenuity” so that it “can be performed by a person,” but those are imprecise ideas about human intuition and capability, not mathematical assertions of the kind that can be used to prove or disprove a hypothesis.
We can still collect evidence one way or the other by coming up with lots of algorithms and seeing whether our computing system of choice—Turing machines, or lambda calculus, or partial recursive functions, or Ruby—can implement them. Mathematicians and computer scientists have been doing exactly that since the 1930s, and so far, nobody has managed to devise a reasonable algorithm that can’t be performed by these systems, so we can be pretty confident about our empirical hunch: it certainly looks as though a machine can perform any algorithm.
Another strong piece of evidence is the fact that most of these systems were developed independently as attempts to capture and analyze the informal idea of an algorithm, and were only later found to be exactly equivalent to each other. Every historical attempt to model the idea of an algorithm has produced a system whose capabilities are identical to those of a Turing machine, and that’s a pretty good hint that a Turing machine adequately represents what an algorithm can do.
The idea that any algorithm can be performed by a machine—specifically a deterministic Turing machine—is called the Church–Turing thesis, and although it’s just a conjecture rather than a proven fact, it has enough evidence in its favor to be generally accepted as true.
“Turing machines can perform any algorithm” is a philosophical claim about the relationship between the intuitive idea of algorithms and the formal systems that we use to implement them. What it actually means is a matter of interpretation: we could see it as a statement about what can and cannot be computed, or just as a firmer definition of the word “algorithm.”
Either way, it’s called the “Church–Turing thesis,” not the “Church–Turing theorem,” because it’s an informal claim rather than a provable mathematical assertion—it can’t be expressed in purely mathematical language, so there’s no way to construct a mathematical proof. It’s widely believed to be true because it matches our intuition about the nature of computation and the evidence of what algorithms are capable of, but we still call it a “thesis” to remind ourselves that it has a different status from provable ideas like Pythagoras’ theorem.
The Church–Turing thesis implies that Turing machines, despite their simplicity, have all the power required to perform any computation that can in principle be carried out by a person following simple instructions. Many people go further than this and claim that, since all attempts to codify algorithms have led to universal systems that are equivalent in power to Turing machines, it’s just not possible to do any better: any real-world computer or programming language can only ever do as much as a Turing machine can do, and no more. Whether it’s ultimately possible to build a machine that’s more powerful than a Turing machine—that can use exotic laws of physics to perform tasks beyond what we think of as “algorithms”—is not definitively known, but it’s definitely true that we don’t currently know how to do it.
As we saw in Chapter 5, the Turing machine’s simplicity makes it cumbersome to design a rulebook for a particular task. To avoid our investigation of computability being overshadowed by the fiddly details of Turing machine programming, we’ll use Ruby programs as a substitute, just as we did for Euclid’s algorithm.
This sleight of hand is justified by universality: in principle, we can translate any Ruby program into an equivalent Turing machine and vice versa, so a Ruby program is no more or less powerful than a Turing machine, and anything we can discover about the limitations of Ruby’s capabilities should apply equally to Turing machines.
A sensible objection is that Ruby has lots of practical functionality that Turing machines don’t. A Ruby program can access the filesystem, send and receive messages across the network, accept user input, draw graphics on a bitmapped display, and so on, whereas even the most sophisticated set of Turing machine rules can only ever read and write characters on a tape. But that isn’t a fundamental problem, because all of this extra functionality can be simulated with a Turing machine: if necessary, we can designate certain parts of the tape as representing “the filesystem” or “the network” or “the display” or whatever, and treat reading and writing to those tape regions as though it was genuine interaction with the outside world. None of these enhancements changes the underlying computational power of a Turing machine; they just provide higher-level interpretations of its activity on the tape.
In practice, we can sidestep the objection completely by restricting ourselves to simple Ruby programs that avoid any controversial language features. For the rest of this chapter, we’ll stick to writing programs that read a string from standard input, do some computation, and then write a string to standard output when they’re finished; the input string is analogous to the initial contents of a Turing machine’s tape, and the output string is like the final tape contents.
Programs live a double life. As well as being instructions to control a particular system, we can also think of a program as pure data: a tree of expressions, a raw string of characters, or even a single large number. This duality is usually taken for granted by us as programmers, but for general-purpose computers it’s vitally important that programs can be represented as data so that they can be used as input to other programs; it’s the unification of code and data that makes software possible in the first place.
We’ve already seen programs-as-data in the case of the universal Turing machine, which expects another Turing machine’s rulebook to be written on its tape as a sequence of characters. In fancy homoiconic programming languages like Lisp[66] and XSLT, programs are explicitly written as data structures that the language itself can manipulate: every Lisp program is a nested list called an s-expression, and every XSLT stylesheet is an XML document.
In Ruby, only the interpreter (which, at least in the case of MRI, is not itself written in Ruby) usually gets to see a structured representation of the program, but the code-as-data principle still applies. Consider this simple Ruby program:
puts 'hello world'
To an observer who understands the syntax and semantics of Ruby, this is a program that
sends a puts
message to the main
object with the string 'hello world'
,
which results in the Kernel#puts
method printing hello world
to standard output. But on a lower level, it’s just
a sequence of characters, and because characters are represented as bytes, ultimately that
sequence can be viewed as a large number:
>> program = "puts 'hello world'"
=> "puts 'hello world'"
>> bytes_in_binary = program.bytes.map { |byte| byte.to_s(2).rjust(8, '0') }
=> ["01110000", "01110101", "01110100", "01110011", "00100000", "00100111",
    "01101000", "01100101", "01101100", "01101100", "01101111", "00100000",
    "01110111", "01101111", "01110010", "01101100", "01100100", "00100111"]
>> number = bytes_in_binary.join.to_i(2)
=> 9796543849500706521102980495717740021834791
In a sense, puts 'hello world'
is Ruby program number
9796543849500706521102980495717740021834791.[67] Conversely, if someone tells us the number of a Ruby
program, we can easily turn it back into the program itself and run
it:
>> number = 9796543849500706521102980495717740021834791
=> 9796543849500706521102980495717740021834791
>> bytes_in_binary = number.to_s(2).scan(/.+?(?=(?:.{8})*\z)/)
=> ["1110000", "01110101", "01110100", "01110011", "00100000", "00100111",
    "01101000", "01100101", "01101100", "01101100", "01101111", "00100000",
    "01110111", "01101111", "01110010", "01101100", "01100100", "00100111"]
>> program = bytes_in_binary.map { |string| string.to_i(2).chr }.join
=> "puts 'hello world'"
>> eval program
hello world
=> nil
Of course, this scheme of encoding a program as a large number is what makes it possible to store it on disk, send it over the Internet, and feed it to a Ruby interpreter (which is itself just a large number on disk!) to make a particular computation happen.
Since every Ruby program has a unique number, we can automatically generate all possible programs: start by generating program number 1, then generate program number 2, and so on.[68] If we did this for long enough, we’d eventually generate the next hot asynchronous web development framework and retire to a life of leisure.
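The two directions of this encoding are easy to package up as a pair of helper methods. The method names here are our own invention, but the round trip should be lossless for any program whose first byte is nonzero:

```ruby
# Encode a program string as a single large number by concatenating the
# binary representations of its bytes.
def program_to_number(program)
  program.bytes.map { |byte| byte.to_s(2).rjust(8, '0') }.join.to_i(2)
end

# Decode a number back into a program: split its binary digits into chunks
# whose remaining length is a multiple of 8 (the first chunk may be shorter,
# because leading zeros are dropped), then turn each chunk into a character.
def number_to_program(number)
  number.to_s(2).scan(/.+?(?=(?:.{8})*\z)/).map { |bits| bits.to_i(2).chr }.join
end
```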
We’ve seen that general-purpose computers are universal: we can design a Turing machine that is capable of simulating any other Turing machine, or write a program that can evaluate any other program. Universality is a powerful idea that allows us to use a single adaptable machine for a variety of tasks rather than many specialized devices, but it also has an inconvenient consequence: any system that’s powerful enough to be universal will inevitably allow us to construct computations that loop forever without halting.
So why must every universal system bring nontermination along for the ride? Isn’t there some ingenious way to restrict Turing machines so that they’ll always halt, without compromising their usefulness? How do we know we won’t someday design a programming language that’s just as powerful as Ruby but doesn’t have infinite loops in it? There are all sorts of specific examples of why this can’t be done, but there’s also a more general argument, so let’s go through it.
Ruby is a universal programming language, so it must be possible to write Ruby code that
evaluates Ruby code. In principle, we can define a method called #evaluate
, which takes the source code of a Ruby program and a string to
provide to that program on standard input, and returns the result (i.e., the string sent to
standard output) of evaluating that program with that input.
The implementation of #evaluate
is far too complicated to be contained within this chapter, but here’s
the broadest possible outline of how it would work:
def evaluate(program, input)
  # parse program
  # evaluate program on input while capturing output
  # return output
end
#evaluate
is essentially a Ruby
interpreter written in Ruby. Although we haven’t given its
implementation, it’s certainly possible to write it: first turn program
into a sequence of tokens and parse
them to build a parse tree (see Parsing with Pushdown Automata),
then evaluate that parse tree according to the operational semantics of
Ruby (see Operational Semantics). It’s a large and
complex job, but it can definitely be done; otherwise, Ruby wouldn’t
qualify as universal.
For simplicity, we’ll assume that our imaginary implementation of #evaluate
is bug-free and won’t crash while it’s evaluating
program
—if we’re going to imagine some code, we may as
well imagine that it’s perfect. Of course it might return some result that indicates that
program
raised an exception while it was being
evaluated, but that’s not the same as #evaluate
itself
actually crashing.
Ruby happens to have a built-in Kernel#eval
method that can evaluate a
string of Ruby code, but taking advantage of it here would be a bit of
a cheat, not least because (in MRI) it’s implemented in C, not Ruby.
It’s also just unnecessary for the current discussion; we’re using
Ruby as a representative example of any universal programming
language, but many universal languages don’t have a built-in eval
.
But hey, since it’s there, it’d be a shame not to use it to make
#evaluate
less imaginary. Here’s a
rough attempt, with apologies for cheating:
require 'stringio'

def evaluate(program, input)
  old_stdin, old_stdout = $stdin, $stdout
  $stdin, $stdout = StringIO.new(input), (output = StringIO.new)

  begin
    eval program
  rescue Exception => e
    output.puts(e)
  ensure
    $stdin, $stdout = old_stdin, old_stdout
  end

  output.string
end
This implementation has many practical and philosophical
problems that could all be avoided by writing a pure-Ruby #evaluate
. On the other hand, it’s short
enough to include here and works well enough for demonstration
purposes:
>> evaluate('print $stdin.read.reverse', 'hello world')
=> "dlrow olleh"
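As a small check of the claim that a crashing program doesn't crash #evaluate itself, we can feed it a program that raises an exception. The eval-based definition is repeated here so the snippet stands alone:

```ruby
require 'stringio'

def evaluate(program, input)
  old_stdin, old_stdout = $stdin, $stdout
  $stdin, $stdout = StringIO.new(input), (output = StringIO.new)

  begin
    eval program
  rescue Exception => e
    output.puts(e)
  ensure
    $stdin, $stdout = old_stdin, old_stdout
  end

  output.string
end

# The exception raised by the evaluated program is rescued and reported as
# output, rather than propagating out of #evaluate.
evaluate('raise "boom"', '')  # => "boom\n"
```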
The existence of #evaluate
allows us to define another method, #evaluate_on_itself
, which returns the result
of evaluating program
with
its own source as input:
def evaluate_on_itself(program)
  evaluate(program, program)
end
This might sound like a weird thing to do, but it’s totally
legitimate; program
is just a string,
so we’re perfectly entitled to treat it both as a Ruby program and as
input to that program. Code is data,
right?
>> evaluate_on_itself('print $stdin.read.reverse')
=> "esrever.daer.nidts$ tnirp"
Since we know we can implement #evaluate
and #evaluate_on_itself
in Ruby, we must therefore
be able to write the complete Ruby program does_it_say_no.rb:
def evaluate(program, input)
  # parse program
  # evaluate program on input while capturing output
  # return output
end

def evaluate_on_itself(program)
  evaluate(program, program)
end

program = $stdin.read

if evaluate_on_itself(program) == 'no'
  print 'yes'
else
  print 'no'
end
This program is a straightforward application of existing code: it defines #evaluate
and #evaluate_on_itself
, then reads another Ruby program from standard input and
passes it to #evaluate_on_itself
to see what that program
does when run with itself as input. If the resulting output is the string 'no'
, does_it_say_no.rb
outputs 'yes'
, otherwise, it outputs 'no'
. For example:[69]
$ echo 'print $stdin.read.reverse' | ruby does_it_say_no.rb
no
That’s the result we expected; as we saw above, when we run print $stdin.read.reverse
with itself as input, we get the output esrever.daer.nidts$ tnirp
, which is not equal to no
. What about a program that does output
no
?
$ echo 'if $stdin.read.include?("no") then print "no" end' | ruby does_it_say_no.rb
yes
Again, just as expected.
So here’s the big question: what happens when we run ruby does_it_say_no.rb < does_it_say_no.rb?[70] Bear in mind that does_it_say_no.rb is a real program—one that we could write out in full if we had enough time and enthusiasm—so it must do something, but it’s not immediately obvious what that is. Let’s try to work it out by considering all the possibilities and eliminating the ones that don’t make sense.
First, running this particular program with its own source as input can’t possibly
produce the output yes
. By the program’s own logic, the
output yes
can only be produced when running does_it_say_no.rb on its own source produces the output
no
, which contradicts the original premise. So that’s
not the answer.
Okay, so maybe it outputs no
instead. But again, the
structure of the program means that it can only output no
if exactly the same computation doesn’t output no
—another contradiction.
Is it conceivable that it could output some other string, like maybe
, or even the empty string? That would be contradictory too: if evaluate_on_itself(program)
doesn’t return no
then the program prints no
, not something different.
So it can’t output yes
or no
, it can’t output something else, and it can’t crash unless #evaluate
contains bugs, which we’re assuming it doesn’t. The
only remaining possibility is that it doesn’t produce any output, and that can only happen
if the program never finishes: #evaluate
must loop
forever without returning a result.
In practice it’s almost certain that ruby does_it_say_no.rb < does_it_say_no.rb will exhaust the finite memory of the host machine, causing ruby to crash, rather than actually looping forever. This is a resource constraint imposed from outside the program, though, not a property of the program itself; in principle, we could keep adding more memory to the computer as necessary and let the computation run indefinitely.
This might seem like an unnecessarily complicated way of demonstrating that Ruby lets us
write nonhalting programs. After all, while true do end is a much simpler example that does the same thing.
But by thinking about the behavior of does_it_say_no.rb, we’ve shown that nonhalting programs are an inevitable
consequence of universality, regardless of the specific features of the system. Our argument
doesn’t rely on any particular abilities of Ruby other than its universality, so the same
ideas can be applied to Turing machines, or the lambda calculus, or any other universal
system. Whenever we’re working with a language that is powerful enough to evaluate itself,
we know that it must be possible to use its equivalent of #evaluate
to construct a program that never halts, without having to know
anything else about the language’s capabilities.
In particular, it’s impossible to remove features (e.g., while
loops) from a programming language in a way that prevents us from writing
nonhalting programs while keeping the language powerful enough to be universal. If removing
a particular feature makes it impossible to write a program that loops forever, it must also
have made it impossible to implement #evaluate
.
Languages that have been carefully designed to ensure that their programs must always halt are called total programming languages, as opposed to the more conventional partial programming languages whose programs sometimes halt with an answer and sometimes don’t. Total programming languages are still very powerful and capable of expressing many useful computations, but one thing they can’t do is interpret themselves.
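To make the idea of totality concrete, here's a sketch of the usual trick such languages enforce. This is our own illustration in Ruby, not a real total language: every loop consumes a natural-number amount of "fuel" that is fixed before the loop starts, so it can't run forever.

```ruby
# Euclid's algorithm in a "total" style: the loop may run at most fuel times.
# For positive integers, fuel = x + y is always enough, because each
# subtraction makes x + y strictly smaller.
def euclid_total(x, y)
  fuel = x + y

  fuel.times do
    return x if x == y
    if x > y
      x = x - y
    else
      y = y - x
    end
  end

  x # unreachable for positive inputs, but guarantees the method always returns
end
```

However the fuel is chosen, the loop is guaranteed to stop, which is exactly the property a total language buys at the cost of being unable to interpret itself.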
That’s surprising, since the equivalent of #evaluate
for a total programming language
must by definition always halt, yet it still can’t be implemented in
that language—if it could, we’d be able to use the does_it_say_no.rb technique to make it loop
forever.
This gives us our first tantalizing glimpse of an impossible program: we can’t write an interpreter for a total programming language in itself, even though there’s a perfectly respectable, guaranteed-to-halt algorithm for interpreting it. In fact it’s so respectable that we could write it in another, more sophisticated total programming language, but that new total language wouldn’t be able to implement its own interpreter either.
An interesting curiosity, but total languages are designed to have artificial limits; we were looking for things that no computer or programming language could do. We’d better keep going.
The self-referential trick used by does_it_say_no.rb hinges on our ability to construct a program that can read its own source code, but
perhaps it seems a bit like cheating to assume that this will always be possible. In our
example, the program received its own source as an explicit input, thanks to functionality
provided by the surrounding environment (i.e., the shell); if that hadn’t been an option, it
could also have read the data directly from disk with File.read(__FILE__)
, taking advantage of Ruby’s filesystem API and the special
__FILE__
constant that always contains the name of the
current file.
But we were supposed to be making a general argument that depended only on the
universality of Ruby, not on the capabilities of the operating system or the File
class. What about compiled languages like Java and C that
may not have access to their source at runtime? What about JavaScript programs that get
loaded into memory over a network connection, and may not be stored locally on a filesystem
at all? What about self-contained universal systems like Turing machines and the lambda
calculus, where the notions of “filesystem” and “standard input” don’t exist?
Fortunately, the does_it_say_no.rb argument can
withstand this objection, because having a program read its own source from standard input
is just a convenient shorthand for something that all universal systems can do, again
regardless of their environment or other features. This is a consequence of a fundamental
mathematical result called Kleene’s second recursion theorem, which
guarantees that any program can be converted into an equivalent one that is able to
calculate its own source code. The recursion theorem provides reassurance that our shorthand
was justified: we could have replaced the line program =
$stdin.read
with some code to generate the source of does_it_say_no.rb and assign it to program
without having to do any I/O at all.
Let’s see how to do the conversion on a simple Ruby program. Take this one, for example:
x = 1
y = 2
puts x + y
We want to transform it into a program that looks something like this…
program = '…'
x = 1
y = 2
puts x + y
…where program
is assigned a
string containing the source of the complete program. But what should
the value of program
be?
A naïve approach is to try to concoct a simple string literal that can be assigned to
program
, but that quickly gets us into trouble, because
the literal would be part of the program’s source code and would therefore have to appear
somewhere inside itself. That would require program
to
begin with the string 'program ='
followed by the value
of program
, which would be the string 'program ='
again followed by the value of program
, and so on forever:
program = %q{program = %q{program = %q{program = %q{program = %q{program = %q{…}}}}}}
x = 1
y = 2
puts x + y
Ruby’s %q
syntax allows us to quote noninterpolated
strings with a pair of delimiter characters, in this case curly brackets, instead of
single quotes. The advantage is that the string literal can contain unescaped instances of
the delimiters as long as they’re correctly balanced:
>> puts %q{Curly brackets look like { and }.}
Curly brackets look like { and }.
=> nil
>> puts %q{An unbalanced curly bracket like } is a problem.}
SyntaxError: syntax error, unexpected tIDENTIFIER, expecting end-of-input
Using %q
instead of single
quotes helps to avoid character-escaping headaches in strings that
contain their own delimiters:
program = 'program = \'program = \\\'program = … \\\'\''
The way out of this infinitely deep hole is to take advantage of the fact that a value used by a program doesn’t have to appear literally in its source code; it can also be computed dynamically from other data. This means we can construct the converted program in three parts:
A. Assign a string literal to a variable (call it data).
B. Use that string to compute the current program’s source code and assign it to program.
C. Do whatever other work the program is supposed to do (i.e., the code we originally started with).
So the structure of the program will be more like this:
data = '…'
program = …
x = 1
y = 2
puts x + y
That sounds plausible as a general strategy, but it’s a bit light
on specific details. How do we know what string to actually assign to
data
in part A, and how do we use it
in part B to compute program
? Here’s
one solution:
In part A, create a string literal that contains the source code of parts B and C, and assign that string to data. This string won’t need to “contain itself,” because it’s not the source of the full program, only the section of the program that comes after part A.
In part B, first compute a string that contains the source code of part A. We can do that because part A mostly consists of a big string literal whose value is available as data, so we just need to wrap data’s value in data = %q{…} to recreate part A’s source. Then simply concatenate the result with data to get the source of the entire program (since data contains the source of parts B and C) and assign it to program.
This plan still sounds circular—part A produces the source of part B, and part B produces the source of part A—but it narrowly avoids an infinite regress by ensuring that part B just computes the source of part A without having to literally contain it.
We can start making progress by filling in the bits we already
know about. We have most of the source of parts B and C already, so we
can partially complete the value of data
:
data = %q{
program = …
x = 1
y = 2
puts x + y
}
program = …
x = 1
y = 2
puts x + y
data
needs to contain newline
characters. By representing these as actual newlines inside an
uninterpolated string literal, rather than as interpolated
escape sequences, we are able to include
the source of parts B and C verbatim without any special encoding or
escaping.[71] This straightforward copy-and-paste makes the source of
part A easier to compute.
We also know that the source of part A is just the string data = %q{…} with the value of data filling the gap between the curly braces, so we can partially complete the value of program too:
data = %q{
program = …
x = 1
y = 2
puts x + y
}
program = "data = %q{#{data}}" + …
x = 1
y = 2
puts x + y
Now all that’s missing from program is the source code of parts B and C, which is exactly what data contains, so we can append the value of data to program to finish it off:
data = %q{
program = …
x = 1
y = 2
puts x + y
}
program = "data = %q{#{data}}" + data
x = 1
y = 2
puts x + y
Finally, we can go back and fix up the value of data to reflect what part B actually looks like:
data = %q{
program = "data = %q{#{data}}" + data
x = 1
y = 2
puts x + y
}
program = "data = %q{#{data}}" + data
x = 1
y = 2
puts x + y
And that’s it! This program does the same thing as the original, but now it has an extra local variable containing its own source code—an admittedly hollow victory, since it doesn’t actually do anything with that variable. So what if we use the same transformation on a program that expects a local variable called program and actually does something with it? Take the classic example:
puts program
This is a program that is trying to print its own source code,[72] but it’s obviously going to fail in that form, because program is an undefined variable. If we run it through the self-referencing transformation, we get this result:
data = %q{
program = "data = %q{#{data}}" + data
puts program
}
program = "data = %q{#{data}}" + data
puts program
That’s a bit more interesting. Let’s see what this code does on the console:
>> data = %q{
program = "data = %q{#{data}}" + data
puts program
}
=> "\nprogram = \"data = %q{\#{data}}\" + data\nputs program\n"
>> program = "data = %q{#{data}}" + data
=> "data = %q{\nprogram = \"data = %q{\#{data}}\" + data\nputs program\n}\nprogram = \"data = %q{\#{data}}\" + data\nputs program\n"
>> puts program
data = %q{
program = "data = %q{#{data}}" + data
puts program
}
program = "data = %q{#{data}}" + data
puts program
=> nil
Sure enough, the line puts program really does print out the source code of the whole program.
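We can sanity-check this mechanically by evaluating the finished program with its output captured and comparing that output against the source string itself. The heredoc and StringIO plumbing below are our own scaffolding, not part of the program:

```ruby
require 'stringio'

# The self-printing program, exactly as assembled above. The single-quoted
# heredoc (<<~'RUBY') stops #{data} from being interpolated prematurely.
source = <<~'RUBY'
  data = %q{
  program = "data = %q{#{data}}" + data
  puts program
  }
  program = "data = %q{#{data}}" + data
  puts program
RUBY

# Run it with standard output captured, then compare against the source.
captured = StringIO.new
original_stdout = $stdout
$stdout = captured
eval(source)
$stdout = original_stdout

puts captured.string == source
```

This prints true: the program’s output is character-for-character identical to its own source.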
Hopefully it’s clear that this transformation doesn’t depend on any special properties of the program itself, so it will work for any Ruby program, and the use of $stdin.read or File.read(__FILE__) to read a program’s own source can therefore always be eliminated.[73] It also doesn’t depend on any special properties of Ruby itself—just the ability to compute new values from old ones, like any other universal system—which implies that any Turing machine can be adapted to refer to its own encoding, any lambda calculus expression can be extended to contain a lambda calculus representation of its own syntax, and so on.
So far we’ve seen that Turing machines have a lot of power and flexibility: they can execute arbitrary programs encoded as data, perform any algorithm we can think of, run for an unlimited amount of time, and calculate their own descriptions. And in spite of their simplicity, these little imaginary machines have turned out to be representative of universal systems in general.
If they’re so powerful and flexible, is there anything that Turing machines—and therefore real-world computers and programming languages—can’t do?
Before we can answer that question we need to make it a bit more precise. What kind of thing can we ask a Turing machine to do, and how can we tell whether it’s done it? Do we need to investigate every possible kind of problem, or is it enough to consider just some of them? Are we looking for problems whose solutions are merely beyond our current understanding, or problems that we already know we’ll never be able to solve?
We can narrow the scope of the question by focusing on decision
problems. A decision problem is any question with a yes or no answer, like “is 2
less than 3?” or “does the regular expression (a(|b))*
match the string 'abaab'
?” Decision problems are easier to
handle than function problems whose answer is a number or some other
non-Boolean value, like “what is the greatest common divisor of 18 and 12?,” but they’re still
interesting enough to be worth investigating.
A decision problem is decidable (or
computable) if there’s an algorithm that’s guaranteed to solve it in
a finite amount of time for any possible input. The Church–Turing thesis claims that every algorithm can be performed by a Turing machine, so for a
problem to be decidable, we have to be able to design a Turing machine that always produces
the correct answer and always halts if we let it run for long enough. It’s easy enough to
interpret a Turing machine’s final configuration as a “yes” or “no” answer: we could look to
see whether it’s written a Y
or N
character at the current tape location, for example, or we could ignore the
tape contents altogether and just check whether its final state is an accept (“yes”) or a
nonaccept (“no”) state.
All the decision problems we’ve seen in earlier chapters are decidable. Some of them, like “does this finite automaton accept this string?” and “does this regular expression match this string?,” are self-evidently decidable because we’ve written Ruby programs to solve them by simulating finite automata directly. Those programs could be laboriously translated into Turing machines given enough time and energy, and because their execution involves a finite number of steps—each step of a DFA simulation consumes a character of input, and the input has a finite number of characters—they are guaranteed to always halt with a yes-or-no answer, so the original problems qualify as decidable.
Other problems are a bit more subtle. “Does this pushdown automaton accept this string?” might appear to be undecidable, because we’ve seen that our direct simulation of a pushdown automaton in Ruby has the potential to loop forever and never produce an answer. However, there happens to be a way to calculate exactly how many simulation steps a particular pushdown automaton will need in order to accept or reject an input string of a given length,[74] so the problem is decidable after all: we just calculate the number of steps needed, run the simulation for that many steps, and then check whether or not the input has been accepted.
So, can we do this every time? Is there always a clever way to sneak around a problem and find a way to implement a machine, or a program, that is guaranteed to solve it in a finite amount of time?
Well, no, unfortunately not. There are many decision problems—infinitely many—and it turns out that a lot of them are undecidable: there is no guaranteed-to-halt algorithm for solving them. Each of these problems is undecidable not because we just haven’t found the right algorithm for it yet, but because the problem itself is fundamentally impossible to solve for some inputs, and we can even prove that no suitable algorithm will ever be found.
A lot of undecidable problems are concerned with the behavior of machines and programs during their execution. The most famous of these, the halting problem, is the task of deciding whether the execution of a particular Turing machine with a particular initial tape will ever halt. Thanks to universality, we can restate the same problem in more practical terms: given a string containing the source code of a Ruby program, and another string of data for that program to read from standard input, will running that program ultimately result in an answer or just an infinite loop?
It’s not obvious why the halting problem should be considered undecidable. After all, it’s easy to come up with individual programs for which the question is answerable. Here’s a program that will definitely halt, regardless of its input string:
input = $stdin.read

puts input.upcase
We’ll assume that $stdin.read
always immediately
returns a value—namely, that every program’s standard input is finite and
nonblocking—because we’re interested in the internal behavior of the program, not its
interactions with the operating system.
Conversely, a small addition to the source code produces a program that will clearly never halt:
input = $stdin.read

while true
  # do nothing
end

puts input.upcase
We can certainly write a halting checker to distinguish between only these two cases. Just testing whether the program’s source code contains the string 'while true' is enough:
def halts?(program, input)
  if program.include?('while true')
    false
  else
    true
  end
end
This implementation of #halts?
gives the right answers for the two example programs:
>> always = "input = $stdin.read
puts input.upcase"
=> "input = $stdin.read\nputs input.upcase"
>> halts?(always, 'hello world')
=> true
>> never = "input = $stdin.read
while true
  # do nothing
end
puts input.upcase"
=> "input = $stdin.read\nwhile true\n  # do nothing\nend\nputs input.upcase"
>> halts?(never, 'hello world')
=> false
But #halts? is likely to be wrong for other programs. For example, there are programs whose halting behavior depends on the value of their input:
input = $stdin.read

if input.include?('goodbye')
  while true
    # do nothing
  end
else
  puts input.upcase
end
We can always extend our halting checker to cope with specific cases like this, since we know what to look for:
def halts?(program, input)
  if program.include?('while true')
    if program.include?("input.include?('goodbye')")
      if input.include?('goodbye')
        false
      else
        true
      end
    else
      false
    end
  else
    true
  end
end
Now we have a checker that gives the correct answers for all three programs and any possible input strings:
>> halts?(always, 'hello world')
=> true
>> halts?(never, 'hello world')
=> false
>> sometimes = "input = $stdin.read
if input.include?('goodbye')
  while true
    # do nothing
  end
else
  puts input.upcase
end"
=> "input = $stdin.read\nif input.include?('goodbye')\n  while true\n    # do nothing\n  end\nelse\n  puts input.upcase\nend"
>> halts?(sometimes, 'hello world')
=> true
>> halts?(sometimes, 'goodbye world')
=> false
We could go on like this indefinitely, adding more checks and more special cases to support an expanding repertoire of example programs, but we’d never get a solution to the full problem of deciding whether any program will halt. A brute-force implementation can be made more and more accurate, but it will always have blind spots; the simple-minded approach of looking for specific patterns of syntax can’t possibly scale to all programs.
To make #halts?
work in general, for any possible
program and input, seems like it would be difficult. If a program contains any loops at
all—whether explicit, like while
loops, or implicit, like
recursive method calls—then it has the potential to run forever, and sophisticated analysis
of the program’s meaning is required if we want to predict anything
about how it behaves for a given input. As humans, we can see immediately that this program
always halts:
input = $stdin.read
output = ''
n = input.length

until n.zero?
  output = output + '*'
  n = n - 1
end

puts output
But why does it always halt? Certainly not for any straightforward
syntactic reason. The explanation is that IO#read
always
returns a String
, and String#length
always returns a nonnegative Integer
, and repeatedly calling -(1)
on a
nonnegative Integer
always eventually produces an object
whose #zero?
method returns true
. This chain of reasoning is subtle and highly sensitive to small
modifications; if the n = n - 1
statement inside the loop
is changed to n = n - 2
, the program will only halt for
even-length inputs. A halting checker that knew all these facts about Ruby and numbers, as
well as how to connect facts together to make accurate decisions about this kind of program,
would need to be large and complex.
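We can watch that sensitivity directly by simulating just the loop counter from the n = n - 2 variant (the halts_for? helper is our own illustrative scaffolding, not something a real checker could rely on):

```ruby
# Simulate the counting loop from the n = n - 2 variant. Once the counter
# goes negative, #zero? can never return true, so the real program would
# loop forever from that point on.
def halts_for?(input)
  n = input.length
  until n.zero?
    return false if n < 0  # skipped past zero: the loop never terminates
    n = n - 2
  end
  true
end

puts halts_for?('even')  # length 4: counts 4, 2, 0 and halts
puts halts_for?('odd')   # length 3: counts 3, 1, -1, … and never halts
```

This prints true then false: a one-character change to the input flips the program between halting and looping forever, which is exactly the kind of behavior a syntactic checker can’t see.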
The fundamental difficulty is that it’s hard to predict what a program will do without
actually running it. It’s tempting to run the program with #evaluate
to see whether it halts, but that’s no good: if the program doesn’t halt, #evaluate
will run forever and we won’t get an answer from #halts?
no matter how long we wait. Any reliable
halting-detection algorithm must find a way to produce a definitive answer in a finite
amount of time just by inspecting and analyzing the text of the program, not by simply
running it and waiting.
Okay, so our intuition tells us that #halts?
would be
hard to implement correctly, but that doesn’t necessarily mean that the halting problem is
undecidable. There are plenty of difficult problems (e.g., writing #evaluate
) that turn out to be solvable given enough effort and ingenuity; if
the halting problem is undecidable, that means #halts?
is
impossible to write, not just extremely difficult.
How do we know that a proper implementation of #halts?
can’t possibly exist? If it’s just an
engineering problem, why can’t we throw lots of programmers at it and
eventually come up with a solution?
Let’s pretend temporarily that the halting problem is decidable. In this imaginary
world, it’s possible to write a full implementation of #halts?
, so a call to halts?(program,
input)
always comes back with a true
or
false
answer for any program
and input
, and that answer always
correctly predicts whether program
would halt if it was
run with input
on standard input. The rough structure
of the #halts?
method might be something like
this:
def halts?(program, input)
  # parse program
  # analyze program
  # return true if program halts on input, false if not
end
If we can write #halts?
, then we can construct
does_it_halt.rb, a program that reads another
program as input and prints yes
or no
depending on whether that program halts when it reads the
empty string:[75]
def halts?(program, input)
  # parse program
  # analyze program
  # return true if program halts on input, false if not
end

def halts_on_empty?(program)
  halts?(program, '')
end

program = $stdin.read

if halts_on_empty?(program)
  print 'yes'
else
  print 'no'
end
Once we have does_it_halt.rb, we can use it to solve very difficult problems. Consider this famous claim made by Christian Goldbach in 1742:
Every even integer greater than 2 can be written as the sum of two primes.
This is the Goldbach conjecture, and it’s famous because nobody has been able to prove whether it’s true or false. The evidence suggests that it’s true, because an even number picked at random can invariably be broken apart into two primes—12 = 5 + 7, 34 = 3 + 31, 567890 = 7 + 567883 and so forth—and computers have been used to check that this can be done for all even numbers between four and four quintillion (4,000,000,000,000,000,000). But there are infinitely many even numbers, so no computer can check them all, and there’s no known proof that every even number must break apart in this way. There is a possibility, however small, that some very large even number is not the sum of two primes.
Proving the Goldbach conjecture is one of the holy grails of number theory; in the year 2000, the publisher Faber and Faber offered a million-dollar prize to anyone who could produce a proof. But wait: we already have the tools to discover whether the conjecture is true! We just need to write a program that searches for a counterexample:
require 'prime'

def primes_less_than(n)
  Prime.each(n - 1).entries
end

def sum_of_two_primes?(n)
  primes = primes_less_than(n)
  primes.any? { |a| primes.any? { |b| a + b == n } }
end

n = 4

while sum_of_two_primes?(n)
  n = n + 2
end

n
This establishes a connection between the truth of the Goldbach
conjecture and the halting behavior of a program. If the conjecture is
true, this program will never find a counterexample no matter how high
it counts, so it will loop forever; if the conjecture’s false,
n
will eventually be assigned the
value of an even number that isn’t the sum of two primes, and the
program will halt. So we just have to save it as goldbach.rb and run ruby does_it_halt.rb < goldbach.rb to
find out whether it’s a halting program, and that will tell us whether
the Goldbach conjecture is true. A million dollars is ours![76]
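Although nothing can run goldbach.rb to completion if the conjecture is true, a bounded version of the same search is perfectly decidable. Here’s a sketch that checks every even number up to a fixed limit, with a hand-rolled trial-division primality test so it doesn’t depend on the prime library:

```ruby
# Trial-division primality test, adequate for small numbers.
def prime?(n)
  n >= 2 && (2..Math.sqrt(n)).none? { |d| (n % d).zero? }
end

def sum_of_two_primes?(n)
  primes = (2...n).select { |m| prime?(m) }
  primes.any? { |a| primes.any? { |b| a + b == n } }
end

# Search a finite range for a counterexample; an empty result means the
# conjecture holds for every even number we checked.
counterexamples = (4..1_000).step(2).reject { |n| sum_of_two_primes?(n) }
puts counterexamples.empty?
```

This prints true. The gap between this decidable bounded check and the undecidable unbounded one is the whole story: any fixed limit leaves infinitely many even numbers unexamined.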
Well, obviously this is too good to be true. To write a program that can accurately predict the behavior of goldbach.rb would require a proficiency in number theory beyond our current understanding. Mathematicians have been working for hundreds of years to try to prove or disprove the Goldbach conjecture; it’s unlikely that a bunch of avaricious software engineers could construct a Ruby program that can miraculously solve not only this problem but any unsolved mathematical conjecture that can be expressed as a looping program.
So far we’ve seen strong evidence that the halting problem is undecidable, but not conclusive proof. Our intuition may say it’s unlikely that we can prove or disprove the Goldbach conjecture just by turning it into a program, but computation can be pretty counterintuitive at times, so we shouldn’t allow ourselves to be convinced solely by arguments about how improbable something is. If the halting problem really is undecidable, as opposed to simply very difficult, we should be able to prove it.
Here’s why #halts? can never work. If it did work, we’d be able to construct a new method #halts_on_itself? that calls #halts? to determine what a program does when run with its own source code as input:[77]
def halts_on_itself?(program)
  halts?(program, program)
end
Like #halts?, the #halts_on_itself? method will always finish and return a Boolean value: true if program halts when run with itself as input, false if it loops forever.
Given working implementations of #halts? and #halts_on_itself?, we can write a program called do_the_opposite.rb:
def halts?(program, input)
  # parse program
  # analyze program
  # return true if program halts on input, false if not
end

def halts_on_itself?(program)
  halts?(program, program)
end

program = $stdin.read

if halts_on_itself?(program)
  while true
    # do nothing
  end
end
This code reads program from standard input, finds out whether it would halt if run on itself, and promptly does the exact opposite: if program would halt, do_the_opposite.rb loops forever; if program would loop forever, do_the_opposite.rb halts.
Now, what does ruby do_the_opposite.rb < do_the_opposite.rb do?[78] Just as we saw earlier with does_it_say_no.rb, this question creates an inescapable contradiction.
#halts_on_itself? must return either true or false when given the source of do_the_opposite.rb as an argument. If it returns true to indicate a halting program, then ruby do_the_opposite.rb < do_the_opposite.rb will loop forever, which means #halts_on_itself? was wrong about what would happen. On the other hand, if #halts_on_itself? returns false, that’ll make do_the_opposite.rb immediately halt, again contradicting #halts_on_itself?’s prediction.
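We can watch this trap spring in miniature by plugging in a concrete (and deliberately naive) checker, the 'while true' substring test from earlier, and handing it its own nemesis:

```ruby
# A concrete stand-in for #halts?: the naive substring check from earlier.
def naive_halts?(program, input)
  !program.include?('while true')
end

# do_the_opposite.rb, built around that same naive checker.
do_the_opposite = <<~'RUBY'
  def naive_halts?(program, input)
    !program.include?('while true')
  end

  program = $stdin.read
  if naive_halts?(program, program)
    while true
      # do nothing
    end
  end
RUBY

# The checker sees 'while true' in the source and predicts "loops forever"…
prediction = naive_halts?(do_the_opposite, do_the_opposite)
puts prediction

# …but running the program on itself would consult the same checker, get
# false, skip the loop, and halt immediately. The prediction is wrong.
```

The same trick works against any concrete checker we substitute for naive_halts?, which is the whole point of the proof.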
It’s wrong to pick on #halts_on_itself? here—it’s just an innocent one-liner that delegates to #halts? and relays its answer. What we’ve really shown is that it’s not possible for #halts? to return a satisfactory answer when called with do_the_opposite.rb as both program and input arguments; no matter how hard it works, any result it produces will automatically be wrong. That means there are only two possible fates for any real implementation of #halts?:
It sometimes gives the wrong answer, e.g., predicting that do_the_opposite.rb will loop forever even though it halts (or vice versa).
It sometimes loops forever and never returns any answer, just like #evaluate does in ruby does_it_say_no.rb < does_it_say_no.rb.
So a fully correct implementation of #halts? can never exist: there must be inputs for which it makes either an incorrect prediction or no prediction at all.
Remember the definition of decidability:
A decision problem is decidable if there’s an algorithm that’s guaranteed to solve it in a finite amount of time for any possible input.
We’ve proved it’s impossible to write a Ruby program that completely solves the halting problem, and since Ruby programs are equivalent in power to Turing machines, it must be impossible with a Turing machine too. The Church–Turing thesis says that every algorithm can be performed by a Turing machine, so if there’s no Turing machine for solving the halting problem, there’s no algorithm either; in other words, the halting problem is undecidable.
It’s discouraging that there’s an easily defined problem (“does this program
halt?”) that computers can’t reliably solve. However, that specific
problem is relatively abstract, and the do_the_opposite.rb program we used to
illustrate it is impractical and contrived; it doesn’t seem likely that
we’ll ever want to actually implement #halts?
, or write a program like do_the_opposite.rb, as part of a real-world
application. Perhaps we can disregard undecidability as an academic
curiosity and get on with our lives.
Unfortunately it’s not that simple, because the halting problem is not the only undecidable problem. There are plenty of others that we might realistically want to solve in our day-to-day work of building software, and their undecidability has real consequences for the practical limitations of automated tools and processes.
Let’s look at a toy example. Suppose we’ve been given the job of developing a Ruby program
to print the string 'hello world'
. That sounds simple
enough, but in our capacity as inveterate procrastinators[79] we’re also going to develop an automated tool that can reliably decide whether or
not a particular program prints hello world
when supplied
with a particular input.[80] Armed with this tool, we can analyze our final program and check that it does what
it’s supposed to.
Now, imagine we succeed in developing a method #prints_hello_world? that can correctly make that decision about any program. Omitting implementation details, the method has this general form:
def prints_hello_world?(program, input)
  # parse program
  # analyze program
  # return true if program prints "hello world", false if not
end
Once we’ve finished writing our original program, we can use #prints_hello_world? to verify that it does the right thing; if it does, we’ll check it into source control, email the boss, and everyone’s happy. But the situation is even better than that, because we can also use #prints_hello_world? to implement another interesting method:
def halts?(program, input)
  hello_world_program = %Q{
    program = #{program.inspect}
    input = $stdin.read
    evaluate(program, input) # evaluate program, ignoring its output
    print 'hello world'
  }

  prints_hello_world?(hello_world_program, input)
end
The %Q syntax quotes a string in the same way as %q and then performs interpolation, so #{program.inspect} gets replaced with a Ruby string literal containing the value of program.
Our new version of #halts? works by constructing a special program, hello_world_program, which does two main things:
Evaluates program with input available on its standard input
Prints hello world
hello_world_program is constructed so that its execution has only two possible outcomes: either evaluate(program, input) will finish successfully, in which case hello world will be printed, or evaluate(program, input) will loop forever and there’ll be no output at all.
This special program is fed into #prints_hello_world? to find out which of those two outcomes will happen. If #prints_hello_world? returns true, that means evaluate(program, input) will eventually finish and allow hello world to be printed, so #halts? returns true to indicate that program halts on input. If #prints_hello_world? instead returns false, that must be because hello_world_program will never reach its final line, so #halts? returns false to say that evaluate(program, input) loops forever.
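The construction step on its own is easy to write down; only the analysis is impossible. Here’s a sketch of just the program-building half (build_hello_world_program is our own name for it, and evaluate is assumed to exist in the generated program’s environment, as in the text):

```ruby
# Build the wrapper program: run the embedded program, then print the marker.
# String#inspect safely embeds arbitrary source as a Ruby string literal.
def build_hello_world_program(program)
  %Q{
    program = #{program.inspect}
    input = $stdin.read
    evaluate(program, input) # run the embedded program, ignoring its output
    print 'hello world'
  }
end

source = build_hello_world_program('x = 1 + 1')
puts source.include?('program = "x = 1 + 1"')  # the source was safely embedded
```

Because #inspect escapes quotes and newlines, the embedded program can itself contain strings, interpolation, or anything else without breaking the wrapper.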
Our new implementation of #halts? shows that the halting problem is reducible to the problem of checking whether a program prints hello world. In other words, any algorithm that computes #prints_hello_world? can be adapted to make an algorithm that computes #halts?.
We already know that a working #halts? can’t exist, so the obvious conclusion is that a complete implementation of #prints_hello_world? can’t exist either. And if it’s impossible to implement, the Church–Turing thesis says there’s no algorithm for it, so “does this program print hello world?” is another undecidable problem.
In reality, nobody cares about automatically detecting whether a program prints a
particular string, but the structure of this undecidability proof points to something larger
and more general. We only had to construct a program that exhibits the “prints hello world
” property whenever some other program halts, and that
was enough to show undecidability. What stops us from reusing this argument for
any property of program behavior, including properties that we actually
do care about?
Well, nothing does. This is Rice’s theorem: any nontrivial property of program behavior is undecidable, because the halting problem can always be reduced to the problem of deciding whether that property is true; if we could invent an algorithm for deciding that property, we’d be able to use it to build another algorithm that decides the halting problem, and that’s impossible.
Roughly speaking, a “nontrivial property” is a claim about what a
program does, not how it does it. For example, Rice’s theorem doesn’t
apply to a purely syntactic property like “does this program’s source code contain the
string 'reverse'
?,” because that’s an incidental
implementation detail that can be refactored away without changing the program’s externally
visible behavior. On the other hand, a semantic property like “does this program output the
reverse of its input?” is within the scope of Rice’s theorem and therefore
undecidable.
Rice’s theorem tells us there are a huge number of undecidable problems that are concerned with what a program will do when it’s executed.
Undecidability is an inconvenient fact of life. The halting problem is disappointing, because it shows we can’t have everything: we want the unrestricted power of a universal programming language, but we also want to write programs that produce a result without getting stuck in an infinite loop, or at least programs whose subroutines halt as part of some larger long-running task (see Very Long-Running Computations).
This disappointment is bleakly summarized in a classic paper from 2004:
There is a dichotomy in language design, because of the halting problem. For our programming discipline we are forced to choose between
(A) Security—a language in which all programs are known to terminate.
(B) Universality—a language in which we can write
(i) all terminating programs
(ii) silly programs which fail to terminate
and, given an arbitrary program we cannot in general say if it is (i) or (ii).
Five decades ago, at the beginning of electronic computing, we chose (B).
—David Turner, Total Functional Programming
Yes, we’d all like to avoid writing silly programs, but that’s just tough luck. There’s no way to tell whether an arbitrary program is silly, so we can’t completely avoid writing them without sacrificing universality.[81]
The implications of Rice’s theorem are depressing too: not only is the question “does this program halt?”
undecidable, but so is “does this program do what I want it to do?” We live in a universe
where there’s no way to build a machine that can accurately predict whether a program will
print hello world
, calculate a particular mathematical
function, or make a particular operating system call, and that’s just the way it is.
That’s frustrating, because it would be really useful to be able to check program properties mechanically; the reliability of modern software would be improved by a tool that decides whether a program conforms to its specification or contains any bugs. Those properties might be mechanically checkable for individual programs, but unless they’re checkable in general, we’ll never be able to trust machines to do the job for us.
For example, say we invent a new software platform and decide to make money by selling compatible programs from an online shop—an “application superstore,” if you like—on behalf of our platform’s third-party developers. We want our customers to be able to shop with confidence, so we decide to only sell programs that meet certain criteria: they must not crash, they must not call private APIs, and they must not execute arbitrary code downloaded from the Internet.
When thousands of developers start submitting programs to us, how do we review each one to see whether it falls within our guidelines? It would save a lot of time and money if we could use an automated system to check every submission for compliance, but thanks to undecidability, it’s not possible to build a system that does the job accurately. We have no choice but to hire a small army of human beings to manually test those programs by running them, disassembling them, and instrumenting the operating system to profile their dynamic behavior.
Manual review is slow, expensive, and error-prone, with the added drawback that each program can only be run for a short amount of time, providing a limited snapshot of its dynamic behavior. So even if nobody makes a mistake, occasionally something undesirable will slip through the net and we’ll have a load of angry customers on our hands. Thanks a lot, undecidability.
Beneath all this inconvenience are two fundamental problems. The first is that we don’t have the power to look into the future and see what will happen when a program is executed; the only general way to find out what a program does is to run it for real. While some programs are simple enough to have behavior that’s straightforwardly predictable, a universal language will always permit programs whose behavior can’t be predicted just by analyzing their source code.[82]
The second problem is that, when we do decide to run a program, there’s no reliable way to know how long it will take to finish. The only general solution is to run it and wait, but since we know that programs in a universal language can loop forever without halting, there will always be some programs for which no finite amount of waiting is long enough.
In this chapter, we’ve seen that all universal systems are powerful enough to refer to themselves. Programs operate on numbers, numbers can represent strings, and strings are how the instructions of a program are written down, so programs are perfectly capable of operating on their own source code.
This capacity for self-reference makes it impossible to write a program that can reliably predict program behavior. Once a particular behavior-checking program has been written, we are always able to construct a larger program that defeats it: the new program incorporates the checker as a subroutine, checks its own source code, and then immediately does the exact opposite of whatever the checker said it would do. These self-contradictory programs are curiosities rather than something we’d ever write in practice, but they’re just a symptom, not the cause, of the underlying problem: in general, program behavior is too powerful to be accurately predicted.
Human language has similar power and similar problems. “This sentence is a lie” (the liar paradox) is an English sentence that can’t be true or false. But the liar paradox depends on the special self-referential word “this”; as we saw in Programs Can Refer to Themselves, any computer program can be made self-referential by construction, without requiring any special language features.
When it comes down to it, there are two basic reasons why the behavior of programs is so hard to predict:
Any system with enough power to be self-referential can’t correctly answer every question about itself.[83] We will always be able to construct a program like do_the_opposite.rb whose behavior can’t be predicted by the system. To avoid this problem, we need to step outside the self-referential system and use a different, more powerful system to answer questions about it.
But in the case of universal programming languages, there is no more powerful system for us to upgrade to. The Church–Turing thesis tells us that any usable algorithm we invent for making predictions about the behavior of programs can itself be performed by a program, so we’re stuck with the capabilities of universal systems.
The whole point of writing a program is to get a computer to do something useful. As programmers, how are we supposed to cope with a world where checking that a program works properly is an unsolvable problem?
Denial is a tempting response: ignore the whole issue. It would be nice if we could automatically verify program behavior, but we can’t, so let’s just hope for the best and never try to check that a program does its job correctly.
But that would be an overreaction, because the situation isn’t as bad as it sounds. Rice’s theorem doesn’t mean that program analysis is impossible, just that we can’t write a nontrivial analyzer that will always halt and produce the right answer. As we saw in Building a Halting Checker, there’s nothing to stop us from writing a tool that gives the right answer for some programs, as long as we can tolerate the fact that there will always be other programs for which it either gives the wrong answer or loops forever without returning anything.
Here are some practical ways of analyzing and predicting program behavior, in spite of undecidability:
Ask undecidable questions, but give up if an answer can’t be found. For example, to check whether a program prints a particular string, we can run it and wait; if it doesn’t print that string within a particular period of time, say 10 seconds, we just terminate the program and assume it’s no good. We might accidentally throw out a program that produces the expected output after 11 seconds, but in many cases that’s an acceptable risk, especially since slow programs can be undesirable in their own right.
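This "run it and give up" strategy can be sketched in a few lines. Here the candidate source is evaluated in-process with its output captured, and a timeout is treated as failure; `prints_within?` is a hypothetical helper, not part of any library:

```ruby
require 'timeout'
require 'stringio'

# Run a string of Ruby code for at most `seconds`, capturing what it
# prints, and check for the expected output. A timeout counts as failure.
def prints_within?(source, expected, seconds: 10)
  captured = StringIO.new
  original = $stdout
  $stdout = captured
  begin
    Timeout.timeout(seconds) { eval(source) }
    captured.string.include?(expected)
  rescue Timeout::Error
    false # took too long -- give up and assume the program is no good
  ensure
    $stdout = original
  end
end
```

So `prints_within?('puts "hello"', 'hello')` returns `true`, while `prints_within?('loop {}', 'hello', seconds: 1)` gives up after a second and returns `false` — possibly wrongly, if the loop would eventually have printed something.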
Ask several small questions whose answers, when taken together, provide empirical evidence for the answer to a larger question. When performing automated acceptance testing, we’re usually not able to check that a program does the right thing for every possible input, but we can try running it for a limited number of example inputs to see what happens. Each test run gives us information about how the program behaves in that specific case, and we can use that information to become more confident of how it’s likely to behave in general. This leaves open the possibility that there are other untested inputs that cause wildly different behavior, but we can live with that as long as our test cases do a good job of representing most kinds of realistic input.
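The idea can be made concrete with a tiny example. Here a hypothetical reimplementation of Euclid’s algorithm stands in for whatever program we want to check, and a handful of representative inputs stand in for a full acceptance suite:

```ruby
# A hypothetical unit under test: Euclid's algorithm for the greatest
# common divisor.
def gcd(x, y)
  x, y = y, x % y until y.zero?
  x
end

# We can't check every pair of integers, but a few well-chosen examples
# cover the interesting cases: one number divides the other, the numbers
# are coprime, and the numbers are equal.
examples = { [12, 8] => 4, [9, 28] => 1, [7, 7] => 7 }
examples.each do |(x, y), expected|
  raise "gcd(#{x}, #{y}) != #{expected}" unless gcd(x, y) == expected
end
```

Passing these checks doesn’t prove `gcd` correct for every input, but each one adds to our empirical confidence.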
Another example of this approach is the use of unit tests to verify the behavior of small pieces of a program individually rather than trying to verify the program as a whole. A well-isolated unit test concentrates on checking the properties of a simple unit of code, and makes assumptions about other parts of the program by representing them with test doubles (i.e., stubs and mocks). Individual unit tests that exercise small pieces of well-understood code can be simple and fast, minimizing the danger that any one test will run forever or give a misleading answer.
By unit testing all the program’s pieces in this way, we can set up a chain of assumptions and implications that resembles a mathematical proof: “if piece A works, then piece B works, and if piece B works, then piece C works.” Deciding whether all of these assumptions are justified is the responsibility of human reasoning rather than automated verification, although integration and acceptance testing can improve our confidence that the entire system does what it’s supposed to do.
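A well-isolated unit test of this kind might look like the following sketch, with a hand-rolled test double standing in for a collaborator; `Greeter` and `FakeClock` are hypothetical examples, not from the text:

```ruby
# The unit under test: formats a greeting based on a clock it's given.
class Greeter
  def initialize(clock)
    @clock = clock
  end

  # The real clock would consult the system time; a double lets the test
  # pin the hour to a known value, so the test is fast and deterministic.
  def greeting
    @clock.hour < 12 ? 'Good morning' : 'Good afternoon'
  end
end

# A hand-rolled test double (a stub) standing in for the real clock.
FakeClock = Struct.new(:hour)

raise unless Greeter.new(FakeClock.new(9)).greeting == 'Good morning'
raise unless Greeter.new(FakeClock.new(15)).greeting == 'Good afternoon'
```

Because the double can’t loop or misbehave, each test exercises only the small piece of code it’s meant to check.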
Ask decidable questions by being conservative where necessary. The above suggestions involve actually running parts of a program to see what happens, which always introduces the risk of hitting an infinite loop, but there are useful questions that can be answered just by inspecting a program’s source code statically. The most obvious example is “does this program contain any syntax errors?,” but we can answer more interesting questions if we’re prepared to accept safe approximations in cases where the real answer is undecidable.
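The syntax-error question really is answerable without running anything. One way to see this in Ruby is with the standard library’s Ripper parser, whose `Ripper.sexp` returns `nil` when the source fails to parse:

```ruby
require 'ripper'

# "Does this program contain any syntax errors?" is decidable: we can
# answer it by parsing the source, without evaluating any of it.
def syntactically_valid?(source)
  !Ripper.sexp(source).nil?
end
```

So `syntactically_valid?('puts 1 + 2')` is `true`, and `syntactically_valid?('puts 1 +')` is `false` — and neither call risks running the program under inspection.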
A common analysis is to look through a program’s source to see if it contains dead code that computes values that are never used, or unreachable code that never gets evaluated. We can’t always tell whether code is truly dead or unreachable, in which case we have to be conservative and assume it isn’t, but there are cases where it’s obvious: in some languages, we know that an assignment to a local variable that’s never mentioned again is definitely dead, and that a statement that immediately follows a return is definitely unreachable.[84] An optimizing compiler like GCC uses these techniques to identify and eliminate unnecessary code, making programs smaller and faster without affecting their behavior.
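A toy version of this conservative approach, working on a flat list of statements rather than a real parse tree: anything after a bare `return` is definitely unreachable, and everything else is conservatively assumed to be reachable. (`definitely_unreachable` is a hypothetical helper for illustration only.)

```ruby
# Given a flat list of statement strings from a method body, report the
# statements that are definitely unreachable: everything after the first
# bare `return`. If there's no `return`, we conservatively report nothing.
def definitely_unreachable(statements)
  index = statements.index { |s| s.strip.start_with?('return') }
  index ? statements.drop(index + 1) : []
end

definitely_unreachable(['x = 1', 'return x', 'puts "never"'])
# => ['puts "never"']
```

A real analyzer would work on a parse tree and track control flow, but the safe-approximation shape is the same: say “unreachable” only when it’s certain.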
Approximate a program by converting it into something simpler, then ask decidable questions about the approximation. This important idea is the subject of the next chapter.
[64] The smallest value x and y can reach is 1, so they’ll meet there if all else fails.
[65] Ruby already has a built-in version of Euclid’s algorithm, Integer#gcd, but that’s beside the point.
[66] Lisp is really a family of programming languages—including Common Lisp, Scheme, and Clojure—that share very similar syntax.
[67] It would be more useful to only assign numbers to the syntactically valid Ruby programs, but doing that is more complicated.
[68] Most of those numbers won’t represent syntactically valid Ruby programs, but we can feed each potential program to the Ruby parser and reject it if it has any syntax errors.
[69] We’re using Unix shell syntax here. On Windows, it’s necessary to omit the single quotes around the argument to echo, or to put the text in a file and feed it to ruby with the < input redirection operator.
[70] This is the shell command to run does_it_say_no.rb with its own source code as input.
[71] We can only get away with this because parts B and C happen not to contain any difficult characters like backslashes or unbalanced curly brackets. If they did, we’d have to escape them somehow and then undo that escaping as part of assembling the value of program.
[72] Douglas Hofstadter coined the name quine for a program that prints itself.
[73] Can you resist the urge to write a Ruby program that can perform this transformation on any Ruby program? If you use %q{} to quote data’s value, how will you handle backslashes and unbalanced curly brackets in the original source?
[74] Briefly, for the curious: every pushdown automaton has an equivalent context-free grammar and vice versa; any CFG can be rewritten in Chomsky normal form; and any CFG in that form must take exactly 2n − 1 steps to generate a string of length n. So we can turn the original PDA into a CFG, rewrite the CFG into Chomsky normal form, and then turn that CFG back into a PDA. The resulting pushdown automaton recognizes the same language as the original, but now we know exactly how many steps it’ll take to do it.
[75] The choice of the empty string is unimportant; it’s just an arbitrary fixed input. The plan is to run does_it_halt.rb on self-contained programs that don’t read anything from standard input, so it doesn’t matter what input is.
[76] Faber’s million-dollar prize expired in 2002, but anyone who produced a proof today would still be in for some serious fortune and glory on the rock star mathematician circuit.
[77] This is reminiscent of #evaluate_on_itself from Universal Systems Can Loop Forever, with #halts? in place of #evaluate.
[78] Or equivalently: what does #halts_on_itself? return if we call it with do_the_opposite.rb’s source code as its argument?
[79] Surely “responsible software engineering professionals”?
[80] Again, that input might be irrelevant if the program doesn’t actually read anything from $stdin, but we’re including it for the sake of completeness and consistency.
[81] Total programming languages are a potential solution to this problem, but so far they haven’t taken off, perhaps because they can be more difficult to understand than conventional languages.
[82] Stephen Wolfram coined the name computational irreducibility for the idea that a program’s behavior can’t be predicted without running it.
[83] This is roughly the content of Gödel’s first incompleteness theorem.
[84] The Java language specification requires the compiler to reject any program that contains unreachable code. See http://docs.oracle.com/javase/specs/jls/se7/html/jls-14.html#jls-14.21 for a lengthy explanation of how a Java compiler is meant to decide which parts of a program are potentially reachable without running any of it.