© Hillel Wayne 2018
Hillel WaynePractical TLA+https://doi.org/10.1007/978-1-4842-3829-5_8

8. Data Structures

Hillel Wayne1 
(1)
Chicago, Illinois, USA
 

When we want to write a specification involving some data structure, we need some sort of definition of the data structure . Further, we need one that’s independent of the algorithm. That means we should write data structures as separate modules that are extended or instantiated in our spec. We’ll use the example of linked lists (LL), in a file we’ll call LinkedLists.tla.

Warning

If you’re making a new specification for this, do not make LinkedLists.tla the root file. Instead, make the root file something else, such as main.tla, and add LinkedLists.tla as a secondary module. This will make it easier to test later. You can do this under File > Open Module > Add TLA+ Module.

A linked list is a low-level data structure where each element (node) of the LL is a data structure containing the data and a pointer to the next node. The last node in the list points to a null element, which is how we know it’s the last one. Critically, though, the LL might not have a last element that points to null. Instead, what would be the “last” element could instead point to an earlier memory address. This is called having a cycle.

In most cases, LLs with cycles are unwanted and indicate there is a bug in the system. This gives us several uses for speccing them: we may want to ensure some algorithm never produces LLs with cycles, or we may want to write an algorithm that detects cycles, or we may want to ensure a system still works properly even if fed a cyclic LL. To support all of these use cases, we want LinkedLists.tla to generate all possible LLs and let us select the subset that has the properties we need for our spec.

In TLA+, we generally represent data structures as functions or structures (which are also functions). By convention the module should have a LinkedLists(Nodes) operator that generates all matching functions where Nodes is the set of memory addresses.

While LL’s have data in them, that data is not central to the core topology of a linked list. All that matters for the base case is that, for a given node, we know what the next node will be. Then our linked list will be some element of the function set [Nodes -> Nodes]. We’ll start by defining all possible mappings between nodes.
PointerMaps(Nodes) == [Nodes -> Nodes]
LinkedLists(Nodes) == * ...

Next, we need a concept of a final node. It’s simply a node that points to a null value, which means we need a null value. We can add a NULL constant and then assert that none of the nodes are in NULL. This means using TLC to get Assert. We will use LOCAL INSTANCE instead of EXTENDS, so that any spec extending LinkedLists.tla does not also import the TLC operators.

Here’s what we have so far:
CONSTANT NULL
LOCAL INSTANCE TLC * For Assert
PointerMaps(Nodes) == [Nodes -> Nodes union {NULL}]
LinkedLists(Nodes) ==
  IF NULL in Nodes THEN Assert(FALSE, "NULL cannot be in Nodes") ELSE
  * ...
Almost there. PointerMaps is the set of possible memory mappings. But not all possible mappings are LLs. Consider the mapping [n in Nodes |-> NULL] (Figure 8-1). That’s not a single LL, that’s multiple disjoint LLs , each one element long. We need some way of restricting our function space to “actual” LLs. That’s one where, if you start from the appropriate initial element and keep going to the next node, you eventually reach all of the other nodes and eventually either cycle or hit NULL.
../images/462201_1_En_8_Chapter/462201_1_En_8_Fig1_HTML.jpg
Figure 8-1

[n in {"a", "b", "c"} |-> NULL]

How do we collect the subset of nodes reachable from a given starting point? One way is to use a recursive operator, such as PT!ReduceSet. We’d begin to add some first node, then whatever add first connects to, then whatever that connects to, and so on. But recursive operators are messy and hard to get right, plus we would need some way of finding first before we even start.

A better approach would be to notice that if we “follow” the nodes in the LL, we have a sequence. So if a given PointerMap is a single LL, we can find a sequence of nodes where every subsequent element is the next node of the element before it. For example, if our LL was [a |-> b, b |-> NULL, c |-> a], that would be the sequence <<c, a, b>>.

Furthermore, all of the nodes must appear in the sequence. Otherwise, for the mapping [a |-> NULL, b |-> NULL], we could select the sequence <<a>> and claim the mapping is a valid LL.
* PointerMap is an element of PointerMaps
isLinkedList(PointerMap) ==
  LET
    nodes == DOMAIN PointerMap
    all_seqs == [1..Cardinality(nodes) -> nodes]
  IN E ordering in all_seqs:
      * each node points to the next node in the ordering
      / A i in 1..Len(ordering)-1:
        PointerMap[ordering[i]] = ordering[i+1]
      * all nodes in the mapping appear in the ordering
      / A n in nodes:
        E i in 1..Len(ordering):
          ordering[i] = n
The last clause reduces to nodes subseteq Range(ordering), which we can use instead for simplicity. Now we can use isLinkedList to select the pointer maps that correspond to linked lists.
* While Range is defined in PT, we don't want
* a generic module reliant on PT!
Range(f) == {f[x]: x in DOMAIN f}
isLinkedList(PointerMap)      ==
  LET
    nodes == DOMAIN PointerMap
    all_seqs == [1..Cardinality(nodes) -> nodes]
  IN E ordering in all_seqs:
      / A i in 1..Len(ordering)-1:
        PointerMap[ordering[i]] = ordering[i+1]
      / nodes subseteq Range(ordering)
LinkedLists(Nodes)  ==
  IF NULL in Nodes THEN Assert(FALSE, "NULL cannot be in Nodes") ELSE
  {pm in PointerMaps(Nodes) : isLinkedList(pm)}
If we call LinkedLists(Nodes), though, we’ll only pass in the pointermaps that have all of the nodes in their domain, so we will only get linked lists of length Cardinality(Nodes). To get smaller LLs, all we need to do is extend LinkedLists to generate all possible subsets of nodes, define all of the pointermaps for each subset, and call isLinkedList on all of the pointermaps we generated.
LinkedLists(Nodes)  ==
  IF NULL in Nodes THEN Assert(FALSE, "NULL cannot be in Nodes") ELSE
  LET
    node_subsets == (SUBSET Nodes {{}})
    pointer_maps_sets == {PointerMaps(subn): subn in node_subsets}
    * pointer_maps_sets is a set of set of functions,
    * so we need to union them all together
    all_pointer_maps == UNION pointer_maps_sets
  IN {pm in all_pointer_maps : isLinkedList(pm)}
Every linked list should have a starting point. Can we define it as a node with no other element in the LL pointing to it? Not exactly. For any linked list, there is at most one node that isn’t pointed to by any other nodes. If there is one such orphan node , it has to be the first node. But there are some cases, called “rings,” where there are no orphan nodes: the last element of the LL points back to the first one (Figure 8-2).
>> CHOOSE ll in LinkedLists({"a", "b"}): {"a", "b"} subseteq Range(ll) [a |-> "b", b  |-> "a"]
../images/462201_1_En_8_Chapter/462201_1_En_8_Fig2_HTML.jpg
Figure 8-2

[a |-> "b", b |-> "a"]

In that specific case, it doesn’t matter which node we start from, so we might as well pick one arbitrarily. For the rest, we should pick the orphan node as our starting point.
Ring(LL)  ==  (DOMAIN  LL  =  Range(LL))
First(LL) ==
  IF Ring(LL)
  THEN CHOOSE node in DOMAIN LL:
        TRUE
  ELSE CHOOSE node in DOMAIN LL:
        node otin Range(LL)

Tip

We could also write First as

First(LL) ==  CHOOSE node in DOMAIN LL:    ~Ring(LL) => node otin Range(LL)

If the linked list is a ring, then we have FALSE => node otin Range(LL), which is always TRUE.

We defined Ring(LL) as a Boolean operator so we could use it in conjunction with other operators, as you see with First. If, for your spec, you want data structures that match specific criteria, it’s common practice to get them by first defining a operator that tests if a given instance matches those criteria and then using that operator in conjunction with set filters and CHOOSE. That way you can easily compose criteria in your spec. For example, here is how we can choose a cyclic LL that is not a ring:
Cyclic(LL) == NULL otin Range(LL)
>> CHOOSE ll in LinkedLists({"a", "b"}): Cyclic(ll) / ~Ring(ll)
[a |-> "b", b |-> "b"]
../images/462201_1_En_8_Chapter/462201_1_En_8_Fig3_HTML.jpg
Figure 8-3

[a |-> "b", b |-> "b"]

While we’ve written a lot of operators, most of them are internal to the module and we should make them LOCAL . Putting everything together:
---- MODULE LinkedLists ----
CONSTANT NULL
LOCAL INSTANCE FiniteSets * For Cardinality
LOCAL INSTANCE Sequences * For len
LOCAL INSTANCE TLC * For Assert
LOCAL INSTANCE Integers * For a..b
LOCAL PointerMaps(Nodes) == [Nodes -> Nodes union {NULL}]
LOCAL Range(f) == {f[x]: x in DOMAIN f}
LOCAL isLinkedList(PointerMap) ==
  LET
    nodes == DOMAIN PointerMap
    all_seqs == [1..Cardinality(nodes) -> nodes]
  IN E ordering in all_seqs:
    / A i in 1..Len(ordering)-1:
      PointerMap[ordering[i]] = ordering[i+1]
    / nodes subseteq Range(ordering)
LinkedLists(Nodes) ==
  IF NULL in Nodes THEN Assert(FALSE, "NULL cannot be in Nodes") ELSE
  LET
    node_subsets == (SUBSET Nodes {{}})
    pointer_maps_sets == {PointerMaps(subn): subn in node_subsets}
    all_pointer_maps == UNION pointer_maps_sets
  IN {pm in all_pointer_maps : isLinkedList(pm)}
Cyclic(LL) == NULL otin Range(LL)
Ring(LL) == (DOMAIN LL = Range(LL))
First(LL) ==
  IF Ring(LL)
  THEN CHOOSE node in DOMAIN LL:
        TRUE
  ELSE CHOOSE node in DOMAIN LL:
        node otin Range(LL)
====

Validation

Our definition of a linked list is complex enough that we should probably sanity check it. The way we can do this is to make a spec that imports LinkedLists and has some evaluation operator, like Valid. This is why I had you make a seperate root file for this project: so we could do all of the sanity checking in that root file. Here are some things we might want to check:
  • There should be some LL with a cycle.

  • There should be some LL without a cycle.

  • For every set of nodes, there is some ring that covers all of those nodes.

  • All LLs have at most one node without a parent, at most one node with two parents (in the case of a cycle), and no nodes with more than two parents.

You should try writing all of these as practice . Let’s go through one of them: we defined Cyclic as “there is no node that points to NULL.” We could also define it as “there exists a node in the LL with two parents.” Let’s show that the two are equivalent.
---- MODULE main ----
EXTENDS TLC, Integers, FiniteSets, Sequences
CONSTANTS Nodes, NULL
INSTANCE LinkedLists WITH NULL <- NULL
AllLinkedLists == LinkedLists(Nodes)
CycleImpliesTwoParents(ll) ==
  Cyclic(ll) <=>
    E n in DOMAIN ll:
      Cardinality({p in DOMAIN ll: ll[p] = n}) = 2
Valid ==
  / A ll in AllLinkedLists:
      / Assert(CycleImpliesTwoParents(ll), <<"Counterexample:", ll>>)
====
We want No Behavior Spec, since we’re not verifying any algorithms, only that the data structures are correct. We can then run Valid in Evaluate Constant Expression.
NULL <- [ model value ]
Nodes <- [ model value ] {a, b, c}
>> Valid
<<"Counterexample:", (a :> a)>>
We forgot that rings are cycles where every node has one parent. Our definition of LinkedLists may be correct, but CycleImpliesTwoParents is incorrect. Let’s adjust it to account for rings.
CycleImpliesRingOrTwoParents(ll) ==
  Cyclic(ll) <=>
    / Ring(ll)
    / E n in DOMAIN ll:
        Cardinality({p in DOMAIN ll: ll[p] = n}) = 2
Valid ==
  / A ll in AllLinkedLists:
      / Assert(CycleImpliesRingOrTwoParents(ll), <<"Counterexample:", ll>>)
>> Valid
TRUE

Try adding a few more tests as conjunctions to Valid. Do you find anything surprising? Are there any useful operators you’d want to add to LinkedLists?

Example

We went through all that trouble to make a linked list module, so we might as well use it in an algorithm. The Tortoise and the Hare algorithm is a famous way of detecting cycles in linked lists. You start two iterators, a slow “tortoise” and a fast “hare” at the beginning of the LL. At every step, you move the tortoise one node and the hare two nodes. If the two pointers ever land on the same node, the LL has a cycle. This is a single-process algorithm, so we should be able to use the standard template we learned about in the last chapter.
EXTENDS TLC
CONSTANTS Nodes, NULL
INSTANCE LinkedLists
(*--fair algorithm tortoise_and_hare
variables
  ll in LinkedLists(Nodes),
  tortoise = First(ll),
  hare = tortoise;
macro advance(pointer) begin
  pointer := ll[pointer];
  if pointer = NULL then
    assert ~Cyclic(ll);
    goto Done;
  end if;
end macro;
begin
  while TRUE do
    advance(tortoise);
    advance(hare);
    advance(hare);
    if tortoise = hare then
      assert Cyclic(ll);
      goto Done;
    end if;
  end while;
end algorithm; *)

Try checking Termination with Nodes <- [ model value ] {a, b, c, d}. It should pass with 2,248 states. Try removing one of the advances to see how the broken spec fails. Why do we need to write fair algorithm instead of just algorithm? Try removing it to see what happens.

Summary

We showed how to write a data structure that we can reuse in other specs. Our example was making a linked list module, which we wrote and validated. We also used this module as part of another algorithm.

In the next chapter, we will learn how to use state machines to help design and add detail to abstract specs.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.58.194