© Hillel Wayne 2018
Hillel Wayne, Practical TLA+, https://doi.org/10.1007/978-1-4842-3829-5_7

7. Algorithms

Hillel Wayne, Chicago, Illinois, USA

One of the benefits of TLA+ being a specification language is that operators can be far more expressive and powerful than program functions can be. This is also a drawback: if your spec uses a “too powerful” operator, you cannot directly translate it to code. Usually this is fine: if you’re specifying a large system, you probably aren’t worrying that your sort function is correct.

If you’re directly writing a new sorting algorithm, though, you want to specify it. This chapter is about how we can write and verify algorithms with TLA+. While we will be implementing them, our focus is on the verification, not the implementation.

By “algorithm,” we mean code intended to terminate and produce an output, rather than run forever or continuously interact with its environment.

Single-Process Algorithms

Most single-process algorithm specifications follow a template:
---- MODULE name ----
EXTENDS \* whatever
Expected(input) == \* ...
Helpers == \* ...
(*--algorithm name
variables
  input \in \* ...
  output; \* ...
  \* helper variables
begin
  \* algorithm implementation
  assert output = Expected(input);
end algorithm; *)
====

Expected is what we’re actually trying to implement: it takes some input and returns the value our algorithm should, if correct, output. Helpers are anything the algorithm will use that is outside of our verification scope. For example, if we were specifying some code for Python, we might make a Sort helper operator, since Python would give us sorted() by default.

For the PlusCal algorithm, we want to verify that it works for a given range of inputs, and we will store the return value in output. Here we aren’t defining an initial value for output, since that’s something the algorithm will have to assign. TLC will create the model value defaultInitValue and initialize output to it. We also place any helper variables here, as we can’t define new variables in the middle of an algorithm (barring use of with). In the body, we write our implementation of the algorithm. Finally, we make sure that whatever output value we get matches our expectation.

Of course, this is just a starting guideline, not a strict rule. If our algorithm is complex, we might add procedures and macros to break it into parts. We might add assert statements as guards or sanity checks. Or we might want to add a global invariant to hold at every step of the spec, like we do with our larger systems.

Here’s what this might look like, all filled out:
EXTENDS Integers, TLC
add(a, b) == a + b
(*--algorithm add
variables
  in_a \in -5..5,
  in_b \in -5..5,
  output;
begin
  output := in_a + in_b;
  assert output = add(in_a, in_b);
end algorithm; *)

Let’s do some examples.

Max

Given a sequence of numbers, return the largest element of that sequence. For example, max(<<1, 1, 2, -1>>) = 2.

First of all, we need our expected operator. We know that for a set, we can get the maximum with CHOOSE x \in set: \A y \in set: y <= x. The maximum of a sequence would just be the maximum of its range. Putting those together:
EXTENDS Sequences
Max(seq) ==
  LET set == {seq[i]: i \in 1..Len(seq)}
  IN CHOOSE x \in set: \A y \in set: y <= x
We could also find the index that gives us the largest number, and then return the number at that index. It’s some duplicated effort, but some people might find it clearer.
Max(seq) ==
  LET index ==
    CHOOSE x \in 1..Len(seq):
      \A y \in 1..Len(seq): seq[y] <= seq[x]
  IN seq[index]
Either way, here’s a full version of the algorithm:
EXTENDS Integers, Sequences, TLC
CONSTANTS IntSet, MaxSeqLen
ASSUME IntSet \subseteq Int
ASSUME MaxSeqLen > 0
PT == INSTANCE PT
Max(seq) ==
  LET set == {seq[i]: i \in 1..Len(seq)}
  IN CHOOSE x \in set: \A y \in set: y <= x
AllInputs == PT!SeqOf(IntSet, MaxSeqLen)
(*--algorithm max
variables seq \in AllInputs, i = 1, max;
begin
  max := seq[1];
  while i <= Len(seq) do
    if max < seq[i] then
      max := seq[i];
    end if;
    i := i + 1;
  end while;
  assert max = Max(seq);
end algorithm; *)
While AllInputs is “too powerful” to use in our algorithm, we only use it to generate inputs, not to implement the algorithm itself. In the model, set
defaultInitValue <- [ model value ]
IntSet <- -5..5
MaxSeqLen <- 5

This fails. Looking at the error, it tried to calculate <<>>[1], which is undefined. This is, incidentally, a reason why we don’t give output an initial value in the variables clause. Try replacing the declaration of max with max = seq[1] and comparing the two error outputs.

We can make this precondition explicit by adding assert Len(seq) > 0 to the beginning of the algorithm. That tells the reader that this implementation is only valid if you pass in a nonempty list. After that, it is fine for us to remove the empty sequence from our possible initial states, as we made it explicit that <<>> is a bad value. This means we will also remove {<<>>} from AllInputs.
AllInputs == PT!SeqOf(IntSet, MaxSeqLen) \ {<<>>}
(*--algorithm max
variables seq \in AllInputs, i = 1, max;
begin
  assert Len(seq) > 0;
  max := seq[1];

DEFINITION OVERRIDES

Some people don’t like the AllInputs approach. You could argue that we’re saying the algorithm is only specified for sequences of at most five elements! It’d be better to say the input can be any sequence of integers: seq \in Seq(Int).

But Int is an infinite set of numbers and Seq(Int) is an infinite set of sequences. TLC can’t enumerate that. If we want to write the spec this way, we need to tell TLC to use a different value when checking the model. We might, say, have it replace Int with 1..5. This is called a definition override . See Chapter 4 for more information on using definition overrides.
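As a sketch of what that might look like (the bounds here are examples, and the override lines are entered in the model, not the spec):

\* In the spec, the input set is fully general:
AllInputs == Seq(Int)

\* In the model, under Definition Override, replace the infinite
\* sets with finite stand-ins, for example:
\*   Int    <- -5..5
\*   Seq(S) <- PT!SeqOf(S, 5)

Note that overriding Int alone isn’t enough: Seq(S) is an infinite set even for a finite S, so it needs a bounded stand-in too.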

This should pass (1,576,685 states), meaning our implementation of “find max value” is correct, at least for the parameters we tested.

Leftpad

Given a character, a length, and a string, return a string padded on the left with that character to length n. If n is less than the length of the string, output the original string. For example, Leftpad(" ", 5, "foo") = "  foo", while Leftpad(" ", 1, "foo") = "foo".

Leftpad is a popular milestone in learning a theorem prover. It’s a simple algorithm with a surprisingly complex complete specification. For the TLA+ version, we will use a sequence of characters instead of a string, since that works somewhat better with the TLA+ operators.

Call the operator Leftpad(c, n, str). The complete specification is the following:
  1. Len(Leftpad(c, n, str)) = Max(n, Len(str)).

  2. The suffix of the output matches str.

  3. All of the characters before str are c.

In other words, some number of characters c prepended to str, such that the final length is n.
Leftpad(c, n, str) ==
  LET
    outlength == PT!Max(Len(str), n)
    padlength ==
      CHOOSE padlength \in 0..n:
        padlength + Len(str) = outlength
  IN
    [x \in 1..padlength |-> c] \o str
>> Leftpad(" ", 1, <<"f", "o", "o">>)
<<"f", "o", "o">>
>> Leftpad(" ", 5, <<"f", "o", "o">>)
<<" ", " ", "f", "o", "o">>
Since we can pad with any character, the state space explodes very quickly. For optimization reasons we should not test this with all possible alphanumeric characters. Rather, we should choose some restricted subset for both c and str.
Characters == {"a", "b", "c"}
(*--algorithm leftpad
variables
  in_c \in Characters \union {" "},
  in_n \in 0..6,
  in_str \in PT!SeqOf(Characters, 6),
  output;
begin
  output := in_str;
  while Len(output) < in_n do
    output := <<in_c>> \o output;
  end while;
  assert output = Leftpad(in_c, in_n, in_str);
end algorithm; *)

This passes with 125,632 states. Try adding errors to see that TLC catches them. What happens when we replace Len(output) < in_n with Len(output) <= in_n?

One odd case is if we replace in_n \in -1..6. The error is that there is no padding that satisfies Leftpad. This is because 0..-1 is the empty set, so padlength is undefined in Leftpad. This means either our definition is wrong, because it doesn’t define what it means to pad to a negative length; or the spec is wrong, because we’re not supposed to be able to pad to a negative length. In other words, does Leftpad take any integer, or only nonnegative integers?

The integer case is simple enough. We just have to expand the definition of Leftpad to be str for n < 0.
Leftpad(c, n, str) ==
  IF n < 0 THEN str ELSE
  LET
    outlength == PT!Max(Len(str), n)
    padlength ==
      CHOOSE padlength \in 0..n:
        padlength + Len(str) = outlength
  IN
    [x \in 1..padlength |-> c] \o str
If Leftpad is supposed to take nonnegative integers, then it’s correct and our spec is wrong. As with max, we need to add a precondition.
(*--algorithm leftpad
variables
  in_c \in Characters \union {" "},
  in_n \in 0..6,
  in_str \in PT!SeqOf(Characters, 6),
  output;
begin
  assert in_n >= 0;
  output := in_str;
  while Len(output) < in_n do
    output := <<in_c>> \o output;
  end while;
  assert output = Leftpad(in_c, in_n, in_str);
end algorithm; *)

Properties of Algorithms

Verifying correctness is easy enough: just run the spec and confirm you have the right result at the end. Verifying other properties, like performance characteristics or bounds, is more difficult. We can sometimes handle this by adding auxiliary variables and asserting their values at the end.

Let’s take binary search. A correct implementation of binary search will take approximately log2(n) comparisons. Can we verify an algorithm does that?

First, let’s write a “binary search.” The only additional operator we need for a binary search is the set of all ordered sequences. We can get these by taking PT!SeqOf and filtering out all of the unordered ones.
OrderedSeqOf(set, n) ==
  { seq \in PT!SeqOf(set, n):
    \A x \in 2..Len(seq):
      seq[x] >= seq[x-1] }
Putting it all together:
MaxInt == 4
Range(f) == {f[x]: x \in DOMAIN f}
(*--algorithm definitely_binary_search
variables i = 1,
          seq \in OrderedSeqOf(1..MaxInt, MaxInt),
          target \in 1..MaxInt,
          found_index = 0;
begin
  Search:
    while i <= Len(seq) do
      if seq[i] = target then
        found_index := i;
        goto Result;
      else
        i := i + 1;
      end if;
    end while;
  Result:
    if target \in Range(seq) then
      assert seq[found_index] = target;
    else
      \* 0 is out of DOMAIN seq, so can represent "not found"
      assert found_index = 0;
    end if;
end algorithm; *)

Definitely a binary search! It works (1,666 states) and always gets the correct result, so it’s binary search, no questions asked.

Okay, maybe one question: binary search has a worst case of O(log(n)), while this looks like a worst case of O(n). While we can’t compute the exact runtime, we can count the number of times we iterate in the while loop and use that as a rough measure of runtime complexity. Instead of defining Log, let’s go the other way: if we take the number of loop iterations and exponentiate it, the result should be bounded by the length of the sequence. We can define Pow2 in a similar way to how we defined factorial back in Chapter 3, by defining a recursive function over the set 0..n.
Pow2(n) ==
  LET f[x \in 0..n] ==
    IF x = 0
    THEN 1
    ELSE 2*f[x-1]
  IN f[n]
>> {Pow2(x): x \in 0..5}
{1, 2, 4, 8, 16, 32}

Note

As mentioned back in Chapter 3, we could also make a generalized exponent function as a binary operator, defined as a ** n and writing a * f[x-1] instead of 2 * f[x-1]. For simplicity, we’re not doing that here.
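As a sketch, such an operator could follow the same recursive-function pattern as Pow2:

a ** n ==
  LET f[x \in 0..n] ==
    IF x = 0
    THEN 1
    ELSE a * f[x-1]
  IN f[n]

With this, Pow2(n) would just be 2 ** n.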

Our complexity assertion then becomes that for some iteration counter counter, Pow2(counter) <= Len(seq). In practice, though, we need to subtract one from counter before exponentiating it. To see why, a list of one element should require at most one iteration (if the single element matches target, we’re done), or 2^0. For two and three elements, we need two checks (2^1), while for four elements, we need at most three. However, 2^3 = 8, so Pow2(3) = 8 > 4 = Len(seq). If we subtract one, the invariant holds (2^(3-1) = 4). Similarly, for 10 elements, we should need four iterations, and 2^(4-1) < 10 < 2^4. This doesn’t change the complexity, though, as 2^(n-1) = (1/2) * 2^n, and we can ignore constant factors when determining algorithmic complexity.
variables i = 1,
          seq \in OrderedSeqOf(1..MaxInt, MaxInt),
          target \in 1..MaxInt,
          found_index = 0,
          counter = 0;
Search:
  while i <= Len(seq) do
    counter := counter + 1;
    if seq[i] = target then
      found_index := i;
      goto Result;
    end if;
    i := i + 1;
  end while;
Result:
  if Len(seq) > 0 then
    assert Pow2(counter-1) <= Len(seq);
  end if;
  if target \in PT!Range(seq) then
    assert seq[found_index] = target;
  else
    assert found_index = 0;
  end if;
Now this fails, as our “binary search” is too inefficient. By contrast, this is a real binary search:
(*--algorithm binary_search
variables low = 1,
          seq \in OrderedSeqOf(1..MaxInt, MaxInt),
          high = Len(seq),
          target \in 1..MaxInt,
          found_index = 0,
          counter = 0;
begin
Search:
  while low <= high do
    counter := counter + 1;
    with
      m = (high + low) \div 2
    do
        if seq[m] = target then
            found_index := m;
            goto Result;
        elsif seq[m] < target then
            low := m + 1;
        else
            high := m - 1;
        end if;
    end with;
  end while;
  Result:
    if Len(seq) > 0 then
      assert Pow2(counter-1) <= Len(seq);
    end if;
    if target \in Range(seq) then
      assert seq[found_index] = target;
    else
      assert found_index = 0;
    end if;
end algorithm; *)

This passes (1,483 states). Try again with MaxInt == 7, which also passes (141,675 states). Testing higher values of MaxInt requires us to modify Advanced Options > Cardinality of Largest Enumerable Set in our model, so let’s avoid that. We’ve shown how to test asymptotic complexity for a worst-case scenario. Testing average-case and best-case complexity is outside the scope of what we can easily do with TLA+, unfortunately; for that, you should reach for another tool.

Sharp readers might have noticed a subtle bug in our implementation of binary search. While it works as an abstract algorithm, low + high might overflow a machine integer. To see this, let’s save that computation and assert it’s under MaxInt:
while low <= high do
    counter := counter + 1;
    with
      lh = low + high,
      m = lh \div 2
    do
      assert lh <= MaxInt;
      if seq[m] = target then
         found_index := m;
         goto Result;
This fails: if the sequence has MaxInt elements, then low + high = MaxInt + 1. This bug was first discovered in 2006,1 years after we “proved” binary search correct.2 The proposed fix is to instead write
    with
      lh = high - low,
      m = high - (lh \div 2)
    do
which no longer overflows and still has the same performance and correctness characteristics. If you’re writing specs in a context where overflow matters, you’d ideally want an invariant that no operation makes a variable overflow. We could do that here by making m and lh global variables and then adding an invariant over all of the relevant variables:
NoOverflows ==
  \A x \in {m, lh, low, high}:
    x <= MaxInt

Multiprocess Algorithms

Multiprocess algorithms are similar to single-process algorithms, except that we want to check our assertion once all of the processes have terminated. Instead of hard-coding an assertion in, we should encode it as a liveness requirement. This means using the “eventually always” (<>[]) operator, which checks that the property eventually becomes true and then stays true, i.e., that the algorithm ends with it holding.

Remember to use fair processes if you don’t want to simulate your algorithm crashing midway through.
EXTENDS Integers, Sequences, TLC
(*--algorithm counter_incrementer
variables
  counter = 0,
  goal = 3;
define
  Success == <>[](counter = goal)
end define;
fair process incrementer \in 1..3
variable local = 0
begin
  Get:
    local := counter;
  Increment:
    counter := local + 1;
end process;
end algorithm; *)

This, unsurprisingly, fails, as our processes can increment based on a stale read of counter. If we merge the two labels into one label, this succeeds with 22 states.
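A sketch of the merged version (the label name here is our own choice): since the read and the write of counter now happen in a single atomic step, no other process can interleave between them.

fair process incrementer \in 1..3
variable local = 0
begin
  GetAndIncrement:
    local := counter;
    counter := local + 1;
end process;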

Summary

We verified that single-process algorithms were correct, along with some additional nonfunctional properties, such as their worst-case performance and that they didn’t overflow our computer’s maximum integer value. We also briefly summarized how to extend these ideas to multiprocess algorithms, using temporal properties instead of bare assertions.

Many algorithms are defined for specific data structures. And many specs for systems are designed assuming you have your data organized in a specific way. In the next chapter, we will show how to write reusable data structures for algorithms and specifications.
