Chapter 5. Composite Types

Rosetta’s composite types define homogeneous collections of data. The composite type subsystem provides mechanisms for defining and manipulating three primary data containers, sets, multisets, and sequences. All composite types are homogeneous and are declared using type formers. Values from composite types are formed using associated value formers, whose syntax parallels type formers.

The set type defines unordered, homogeneous collections of unique items. A primary use for the set type is defining new types and subtypes. Additionally, sets provide excellent mechanisms for defining collections of items in abstract specifications. The multiset type defines an unordered, homogeneous collection of items, where multiple instances of an element are allowed. A primary use for multiset types is manipulating collections of values, where the number of values in a collection is significant. The sequence type defines indexed, homogeneous collections of data that resemble arrays and lists. A primary use for sequence types is ordering or structuring data collections. The sequence type has associated subtypes that represent strings and bit vectors as sequences of characters and bits, respectively.

Type Formers

All Rosetta composite types are defined using type formers. A type former is a function that takes one or more content types and generates a composite type with appropriate properties from them. All type formers have the following format:

     container(T)

where container is the composite type and T is the type associated with items in the container. For example, in the definition:

     bitset::set(bit);

set is the type former while bit is the content type. This declaration creates a new variable called bitset whose value is constrained to a set of values from the bit type. A constant is defined similarly:

  mask::sequence(bit) is b";1100";

In this case, sequence is the type former while bit is again the content type. This declaration creates a new constant called mask whose type is a sequence of bit and whose constant value is the bitvector "1100".

The type former itself also defines a type. Excluding the content type from the previous declaration results in an item whose value is a set, but could be a set of elements from any arbitrary type:

   anything::set

The use of set without the content type is the same as using top as the content type. Specifically, the preceding declaration is equivalent semantically to:

   anything::set(top);

Type formers are simply Rosetta functions. Users can define their own type formers using techniques that will be documented in Chapter 8.

Set Types

A set is a unordered collection of unique items. The type former for set types is the keyword set and the form of a set declaration is:

     s::set(T)

where s is the new item, and T is the type of elements contained in the set. Values associated with s may be any collection of items from T, including collections with zero elements. The distinction between this declaration and the declaration:

     v::T;

is that in the second definition v must be a single element from T. For example, the definition:

     x::set(integer)

defines a new item whose value can be any subset of integer. In contrast, the declaration:

   v::integer

defines an item whose value is a single integer.

Of note is the type formed by using top as the type argument to set. The declaration set(top) defines a type that includes all possible sets of Rosetta values. If an item has type set(top) we know that it is a set and nothing more. The set type former with no arguments denotes this type. Specifically, the following two declarations are semantically equivalent:

   s::set(top) // Any set

   s::set // Any set

Table 5.1 provides a list of operators defined over all types formed using the set type former. These operations include standard set theory operations as well as special-purpose formers for commonly used set values.

Table 5.1. Operators defined over sets

Operation

Syntax

Return Type

Set Former

{a,b,c,d}

set

Union, intersection, and difference

A+B, A*B, A-B

set

Set element

a in B

boolean

Subset and Superset

A=<B, B>=A

boolean

Proper Subset and Proper Superset

A<B, A>B

boolean

Cardinality

#A

natural

Contents

~A

set

Empty Set

{}

set

Integer Set

{i,..j}

set(integer)

Character Set

{c,..d}

set(character)

Powerset

set(A)

set(set(A))

Not to be confused with the set type former, the set former is used to package expressions into sets. The set former uses the classical form for defining sets by extension. The notation {a,b,c} forms a set from the result of evaluating a, b, and c. For example:

      {1,2,3}          // The set containing 1,2,3

      {inc(1),2,3}     // The set containing 2,3

      {'a'}            // The singleton set containing 'a'

The set former is not a mechanism for defining set literals. It is syntactic sugar for a function that forms sets from values specified by arguments. The set former is strict, implying that if any of its arguments is bottom, then its value is bottom. This implies that no set can be formed with bottom as an element.

The type of a set former application is the least common supertype of the elements specified as its arguments. For example:

   {1,2,3}::set(posint)

   {-1,0,1}::set(integer)

   {inc(1.1)}::set(posreal)

   {1,'1'}::set(top)

   {{1,2},{2,3}}::set(set(posint))

The application of the set former with no arguments, {}, is defined as the set containing no elements, or simply the empty set. The empty set is a subset of all sets, no set is a subset of {}, and the size of {} is zero. Furthermore, {} unioned with any set is the original set and {} intersected with any set is {}. The type of the empty set presents interesting problems because it is a subset of all sets and thus an element of every formed set type. By convention, the empty set can be inferred to have any set type and this may be used as an operand to operators over sets involving any set type.

The element operator indicates when an item is an element of a set. The in operator is used to represent element and the notation a in S is read “a is an element of S.” The relation is true when the item a is contained in the set S. For example:

    3 in {1,2,3} == true

    4 in {1,2,3} == false

The subset relation is true when all elements of its first set argument are contained in its second set argument. The =< and >= operators are used to represent subset operations. A=<B is read “A is a subset of B” while B>=A is read “B is a superset of A.” Proper subset relationships hold when one set is a subset of another but is not equal to it. The the > and < operations provide proper containment. Thus, A<B is true if A=<B is true and A/=B. Likewise, A>B is true if A>=B and A/=B. For example:

    {2,3} =< {1,2,3} == true

    {2,3} < {1,2,3} == true

    {1,2,3} < {1,2,3} == false

    {1,2,3} >= {2,3} == true

    {1,2,3} > {2,3} == true

    {1,2,3} > {1,2,3} == false

Special operations are provided to form sets that contain sequences of integer and character values. The notation {i,..j} forms the set containing all integers between i and j inclusively if i and j are integer values. If i and j are character values, then a set of characters is formed. For example:

    {2,..5} == {2,3,4,5}

    {'a',..'f'} == {'a','b','c','d','e','f'}

This notation is useful for numerous mapping operations, where sets of integer and character values must be generated.

Set union, intersection, and difference provide constructors for new sets from existing sets. The operator, +, generates the set union. Thus, A+B contains all elements contained in either A or B. For example:

   {1,2,3} + {1,2,3} == {1,2,3}

   {1,2,3} + {2,3} == {1,2,3}

   {1,2,3} + {3,4,5,6} == {1,2,3,4,5,6}

   {1,2,3} + {} == {1,2,3}

Intersection generates a new set containing all elements shared in its argument set. The operator, *, generates the set intersection. Thus, A*B generates a new set containing elements contained simultaneously in both set A and set B. For example:

    {1,2,3} * {1,2,3} == {1,2,3}
    {1,2,3} * {2,3} == {2,3}
    {1,2,3} * {3,4,5,6} == {3}
    {1,2,3} * {} == {}

Set difference generates a new set containing those elements in its first argument set that are not in its second argument set. The operator, –, generates the set difference. Thus, AB generates a new set containing elements from A that are not in B. For example:

   {1,2,3} – {1,2,3} == {}

   {1,2,3} – {2,3} == {1}

   {1,2,3} – {3,4,5,6} == {1,2}

   {1,2,3} – {} == {1,2,3}

The cardinality operator, #, returns the size of a set. Thus, #A is the number of elements contained in A. For example:

   #{1,2,3} == 3

   #{} == 0

The contents operator, ~A, returns the contents of A as a set. For sets, the contents operator is an identity operator, as the elements of a set are the set itself. The contents operator is defined over all composite types and is included over set types for completeness.

The power set operator, set, is used to generate collections of all possible subsets from a set. Thus, the notation set (A) generates the set of all possible subsets of A. For example:

    set({1,2}) == {{},{1},{2},{1,2 }}

It should be noted that the powerset operator is most often used for defining new types, because any Rosetta set can be used as a type. It is not a coincidence that the former for set types is the same as the powerset operation. For example, the declaration:

    s ::set(integer);

asserts that s is an element of the powerset of integers. Because the elements of the powerset include all subsets of integers, the value of s is constrained to be one such subset. Thus, the set operator is used both to generate powersets and to define new set types. Because sets can be used as types in Rosetta, it is possible to construct user-defined types by providing functions that evaluate to sets. Examples of such usage can be found in Chapter 8.

The collection of functions defined over set types is defined in Table 5.2. The choose operator selects an arbitrary element from a set. Choose is not deterministic in that any element of its argument can be returned. For example:

  choose({1,2,3}) == 1

  choose({1,2,3}) == 2

Table 5.2. Functions defined over sets

Function

Syntax

Return Type

Random choice

choose(s::A)

A

Function image

image(f,s)

ran(f)

Filter a set

filter(p,s::set(A))

set (A)

The image function takes a unary function and a set and returns a new set resulting from the application of the unary operator to each element of the original set. For example, assuming that inc is defined as an increment function over integer:

  image(inc,{1,2,3}) == {2,3,4}

The filter function takes a boolean predicate and a set and returns a new set that includes exactly those items from the original set that satisfy the predicate.

For example, assuming that gtz checks its argument to determine if it is greater than zero:

     filter(gtz,{-1,0,1}) == {1}

Filter can be used to define new subtypes from types by filtering out values. The formal definition of the positive integer numbers is defined as the following set of values:

     filter(gtz,natural) == posint

Multiset Types

A multiset is a collection of items that allows duplication of elements. Sometimes called bags, multisets differ from sets in that repeated elements are allowed and the number of element occurrences can be measured. The type former for multisets is the function multiset. The form of a multiset declaration is:

   m::multiset(T)

where m is the new item and T is the type of elements contained in the multiset. Values associated with m may be any collection of items from T, including collections with zero elements. For example, the definition:

   x::multiset(integer)

defines a new item whose value can be any multiset of integer values.

Of note is the type formed by using top as the type argument to multiset. The declaration multiset(top) defines a type that includes all possible multisets of Rosetta values. If an item has type multiset(top) we know that it is a multiset and nothing more. The multiset type former with no arguments denotes this type. Specifically, the following two declarations are semantically equivalent:

   s::multiset(top) // Any multiset

   s::multiset // Any multiset

Table 5.3 defines the collection of operations define over multisets.

Table 5.3. Operations defined over multisets

Operator

Syntax

Return Type

Multiset former

{*a,b,a,d*}

multiset

Union, intersection and difference

A+B, A*B, A-B

multiset

Multiset element

a in M

boolean

Multisubset and multisuperset

A=<B, B>=A

boolean

Proper multisubset and proper multisuperset

A<B, B>A

boolean

Cardinality

#B

natural

Number in

x # M

natural

Contents

~A:: multiset(A)

{set}(A)

Empty multiset

{**}

multiset

Integer multiset

{*i,..j*}

multiset(integer)

Character multiset

{*‘c’,..‘d’*}

multiset(character)

Power Multiset

multiset (A)

set(multiset(A))

The multiset former is used to package expressions into multisets and defines multisets by extension. The notation {*a,b,c*} forms a multiset from the result of evaluating a, b, and c. For example:

   {* 1,2,1,3 *}      // The multiset containing 1,1,2,3

   {* inc(1),2,1,3 *} // The multiset containing 1,2,2,3

   {* 'a' *}          // The singleton multiset containing 'a'

The multiset former is not a mechanism for defining multiset literals. It is syntactic sugar for functions that form multisets from values specified by arguments to the former. The multiset former is strict, implying that if any of its arguments is bottom, then its value is bottom. This implies that no multiset can be formed with bottom as an element.

The type of a multiset former application is the least common supertype of the elements specified as its arguments. For example:

   {* 1,2,1,3 *}::multiset(posint)

   {* -1,-1,0,1,1 *}::multiset(integer)

   {* inc(1.1) *}::multiset(posreal)

   {* 1,'1' *}::multiset(top)

   {*{1,2,{1,2*}::multiset(set(posint))

   {*{*1,2*},{*1,2*}*}::multiset(multiset(posint))

The application of the multiset former with no arguments, {**}, is defined as the multiset containing no elements, or simply the empty multiset. The empty multiset is a sub-multiset of all multisets, no multiset is a sub-multiset of {**}, and the size of {**} is zero. Furthermore, {**} unioned with any multiset is the original multiset and {**} intersected with any multiset is {**}. The type of the empty multiset presents interesting problems because it is a sub-multiset of all multisets and thus an element of every formed multiset type. By convention, the empty multiset can be inferred to have any multiset type and may be used as an operand to operators over multisets involving any multiset type.

The element operator indicates when an item is an element of a multiset. The in operator is used to represent element and the notation a in M is read “a is an element of M.” The relation is true when the item a is contained in the multiset M. For example:

    3 in {*1,2,3,3*} == true

    4 in {*1,1,2,3*} ==  false

The cardinality operator for multisets returns the number of elements in the multiset. The unary application of# to a multiset returns the cardinality of the multiset. Specifically:

    #{*1,1,2,1*} == 4

    #{**} == 0

The cardinality operator may also be used as a binary operation that returns the number of occurrences of an element in a multiset. Specifically:

    1#{*1,1,2,1*} == 4

    2#{*1,1,2,1*} == 1

    4#{*1,1,2,1*} == 0

Using cardinality in this fashion is not defined for sets. If it were, the number of occurrences of any element in the set would be 1.

The contents operator defined on multisets returns the elements of the multiset as a set. The operation ~M takes an arbitrary multiset and returns its elements as a set, removing duplicate entries. For example:

    ~{*1,1,2,1*} == {1,2}

    ~{*1,2,3,2,1*} == {1,2,3}

    ~{**} == {}

The sub-multiset relation is true when all elements of its first multiset argument are contained in its second multiset argument in at least as many occurrences as the first. The =< and >= operators are used to represent sub-multiset operations. A=<B is read “A is a sub-multiset of B,” while B>=A is read “B is a super-multiset of A.” Proper sub-multiset relationships hold when one multiset is a subset of another but is not equal to it. Two multisets are equal if they contain the same elements in the same quantities. The the > and < operations provide proper sub-multiset. Thus, A<B is true if A=<B is true and A/=B. Likewise, A>B is true if A>=B and A/=B. For example:

    {*2,3*} =< {*1,2,3,3*} == true

    {*2,3,3*} < {*1,2,3*} == false

    {*2,3*} < {*y1,2,3*} == true

    {*1,2,3*} < {*1,2,3*} == false

    {*1,2,3,3*} >= {*2,3*} == true

    {*1,2,3*} > {*2,3,3*} == false

    {*1,2,3*} > {*2,3*} == true

    {*1,2,3*} > {*1,2,3*} == false

Special operations are provided to form multisets that contain sequences of integer and character values. The notation {*i,..j*} forms the multiset containing one instance of all integers between i and j inclusively, if i and j are integer values. If i and j are character values, then a multiset of characters is formed. For example:

    {*2,..5*} == {*2,3,4,5*}

    {*'a',..'f'*} == {*'a','b','c','d','e','f'*}

Multiset union, intersection, and difference provide constructors for new multisets from existing multisets. The operator, +, generates the multiset union. Thus, A+B contains all elements contained in either A or B. Further, the number of any element in A+B is the sum of the numbers in A and B individually. If ‘a’#A=1 and ‘a’#B=2, then ‘a’#(A+B)=3. For example:

   {*1,2,3*} + {*1,2,3*} == {*1,1,2,2,3,3*}

   {*1,2,3*} + {*2,3*} == {*1,2,2,3,3*}

   {*1,2,3*} + {*3,4,5,6*} == {*1,2,3,3,4,5,6*}

   {*1,2,3*} + {**} == {*1,2,3*}

Intersection generates a new multiset containing all elements shared in its argument multisets. The operator, *, generates the multiset intersection. Thus, A*B generates a new multiset containing elements contained simultaneously in both multiset A and multiset B. Further, the number of any element in the intersection is the minimum number in A and B individually. If ‘a’#A==1 and ‘a’#B==2, then ‘a’#(A*B)=1. For example:

   {*1,2,3*}*{*1,2,3*} == {*1,2,3*}

   {*1,2,2,3*}*{*2,3*} == {*2,3*}

   {*1,2,3*}*{*3,3,4,5,6*} ==  {*3*}

   {*1,2,3*}*{**} == {**}

Multiset difference generates a new multiset containing those elements in its first argument multiset that are not in its second argument multiset. The operator, –, generates the multiset difference. Thus, AB generates a new multiset containing elements from A that are not in B. Further, the number of any element in the difference is the difference between the number in the first multiset and the second. If ‘a’#A=3 and ‘a’#B=1, then ‘a’#(AB)=2. For example:

   {*1,2,3*} – {*1,2,3*} == {**}

   {*1,2,3*} – {*2,3*} == {*1*}

   {*1,2,3,3*} – {*3,4*} ==  {*1,2,3*}

   {*1,2,3,3*} – {*2,3,3*} = {*1*}

   {*1,2,3*} – {**} == {*1,2,3*}

The power multiset operation, multiset(A), forms the set of all possible multisets formed from elements of its single set argument. It serves as the type former for multiset types. Unlike set(A), which is finite for all finite A, the value of multiset(A) is always infinite unless A is empty. This is due to multisets allowing arbitrary numbers of the same element. For example:

  multiset({1}) == {{**},{*1*},{*1,1*},{*1,1,1*},...}

The definition of multiset(A) can also be defined as the set of multisets whose elements are taken from the set A. Formally, if a in multiset (A), then ~a =< A. The multiset operator is used almost exclusively for defining new multiset types, as it cannot be evaluated.

Functions defined over multisets are shown in Table 5.4. The choose operator selects an arbitrary element from a multiset. Choose is not deterministic in that any element of its argument can be returned. For example:

   choose({*1,2,2,3*})  == 2

   choose({*1,2,2,3*}) == 3

Table 5.4. Functions defined over multisets

Operator

Syntax

Return Type

Random choice

choose(m::A)

A

Image over multiset

image(f,m)

multiset(ran(f))

Filter over multiset

filter(p,m:: multiset (A))

multiset(A)

Convert set to multiset

set2multiset(s::set(A))

multiset (A)

The image function takes a unary function and a multiset and returns a new multiset resulting from the application of the unary operator to each element of the original multiset. For example, assuming that inc is defined as an increment function over integer:

    image(inc,{*1,2,2,3,3*}) == {*2,3,3,4,4*}

The filter function takes a Boolean predicate and a multiset and returns a new multiset that includes exactly those items from the original multiset that satisfy the predicate. For example, assuming that gtz checks its argument to determine if it is greater than zero:

    filter(gtz,{*-1,-1,0,1,1,2,1*}) == {*1,1,2,1*}

Keep in mind that, like sets, the ordering of multiset elements has no bearing on its value. Two multisets are equivalent if they have the same contents. For example:

    {*1,1,1,2,3,4*} == {*4,1,3,1,2,1*}

Sequence Types

A sequence is an indexed collection of items. Its fundamental use in the Rosetta system is defining ordered collections or lists of items. Sequences support random access and ordering of elements using their associated index, and in this regard behave much like arrays. Sequences also support concatenation allowing arbitrary creation of new sequences. The type former for sequences is the keyword sequence. The form of a sequence declaration is:

  s::sequence(T);

where s is the new item, and T is the type of elements contained in the sequence. Legal elements of s are sequences of elements from T, including sequences with zero elements. For example, the definition:

    x::sequence(integer)

defines a new item whose value can be any sequence of values from integer.

The declaration sequence(top) defines a type that includes all possible sequences of Rosetta values. If an item has type sequence (top) we know that it is a sequence and nothing more. The sequence type former with no arguments denotes this type. Specifically, the following two declarations are semantically equivalent:

   s::sequence(top) // Any sequence

   s::sequence // Any sequence

Table 5.5 provides a list of operators defined over all types formed using the sequence type former. Operators common to sequences as well as those defined for lists in other languages are included. In addition, operators for creating useful multiset values are defined.

Table 5.5. Operators defined over sequence types

Operator

Syntax

Return Type

Sequence Former

[1,2,3,5]

sequence

Concatenation

s&t

sequence(A)

Random access

s::sequence(A)(n)

A

Subscription

s::sequence(A) sub i

sequence (A)

Integer Sequence

[i,..j]

sequence (integer)

Character Sequence

[‘c’,..‘d’]

sequence (character)

Subsequence relations

s<t, s=<t, t>=s,t>s

boolean

Minimum and maximum

s mint, s max t

sequence

Cardinality

#s

natural

Sequence Contents

~s::sequence(A)

multiset (A)

Empty sequence

[]

sequence

The sequence former uses the classical form for defining sequences by extension. The notation [a,b,c] forms a sequence from the result of evaluating a, b, and c. The order of elements in the sequence former defines the order in the resulting sequence. The first element is associated with index 0 and other elements follow. For example:

  [1,2,3]      // The sequence containing 1,2,3 in order

  [inc(1),2,3] // The sequence containing 2,2,3

  ['a']        // The sequence containing 'a'

The sequence former is not a mechanism for defining sequence literals. Like set and multiset formers, it is syntactic sugar for functions that form sequences from values specified by arguments to the former. The sequence former is strict, implying that if any of its arguments is bottom, then its value is bottom. This implies that no sequence can be formed with bottom as an element.

The type of a sequence former application is a sequence of the least common supertype of the elements specified as arguments. For example:

   [1,2,3]::sequence(natural)

   [-1,0,1]::sequence(integer)

   [inc(1.1)]::sequence(posreal)

   [1,'1']::sequence(top)

   [{1,2},{2,3}]::sequence(set(posint))

The application of the sequence former with no arguments, [], is defined as the sequence containing no elements, or simply the empty sequence. The empty sequence shares membership properties with the empty set and the empty multiset. The type of the empty sequence can be inferred to have any sequence type and may be used as an operand to operators over sets involving any set type.

Individual elements of a sequence type are accessed using their associated positional index. The sequence is used as a function name and the single argument is used to access an element. The notation s(2) accesses the element associated with index 2 in the sequence s; s(0) references the first element of the sequence. For example, assuming that s is equal to the sequence [3,3,2,1]:

   s(0) == 3

   s(2) == 2

   s(4) == bottom

As the example demonstrates, accessing a sequence element outside the known size of the sequence results in bottom.

It is not necessary to name the sequence to access its elements. The label, s, can be replaced by its value in the preceding example with equivalent results:

   [3,3,2,1](0) == 3

   [3,3,2,1](2) == 2

   [3,3,2,1](4) == bottom

Sequences of sequences are allowed and referencing is achieved by multiple levels of accessor functions. For example, s(0)(1) accesses the second element of the first sequence in s; s(0) references the first element of s. If s is a sequence of sequences, then s(0) is itself a sequence, thus s(0)(1) accesses an element in the sequence. Sequences can be arbitrarily nested in this way to provide functionality similar to a multidimensional array. For example, assume that we would like to define a two-dimensional array of integer values. The declaration for an example of this structure is:

  intarray ::sequence(sequence(integer));

Assume that intarray =[[1,2,3],[4,5,6],[7,8]]:

  intarray(0) == [1,2,3]

  intarray(1) == [4,5,6]

  intarray(1)(0) == [1,2,3](0) == 1

  intarray(2)(2) == [7,8](2) == bottom

The contents function, ~s, returns the contents of a sequence as a multiset. Thus the number of each element in the returned multiset is equal to the number of element occurrences in the sequence.

  ~[1,2,3,2,1] == {*1,1,2,2,3*}

  ~[1,1,1] == {*1,1,1*}

  ~[] == {**}

Because all individual sequences are finite, the contents multiset can always be formed.

The contents operator can be applied twice to obtain the set of values appearing in a sequence. The first application results in a multiset and the second results in a set. For example:

   ~(~[1,2,3,2,1]) == ~{*1,1,2,2,3*} == {1,2,3}

   ~(~[1,1,1]) == ~{*1,1,1*} == {1}

   ~(~[]) == ~{**} == {}

As with set and multiset, the size of a sequence type can be found using the size operator, #s; #s returns the natural number associated with the size of s. For example:

   #[1,2,3,2,1] == 5

   #[1,1,1] == 3

   #[] == 0

The concatenation operator s&t is used to concatenate two sequences of the same type. Sequences are homogeneous data structures, thus all elements contained within them must be of a common type. The notation s&t produces a new sequence, with the elements of s preceding the elements of t. For example:

   [1,2,3]&[4,5] == [1,2,3,4,5]

   [1,2,3]&[] == [1,2,3]

   []&[1,2,3] == [1,2,3]

Assume the following declarations:

Example 5.1. Sequence Forming and Concatenation

  S::sequence(integer) is [1,2,3];

  T::sequence(integer) is [1,2];

The following relationships hold:

  S&T == [1,2,3,1,2];
  T&S == [1,2,1,2,3];
  S&S == [1,2,3,1,2,3];

  S(0) == 1;
  S(#S-1) == 3;
  (S&T)(0) == S(0) == 1;
  (S&T)(#S-1) == S(#S-1) == 3;
  (S&T)(#S) == 1;
  (S&T)(#(S&T)-1) == T(#T-1) == 2;

  ~S == {1,2,3};
  ~T == {1,2};

Assume the following declaration of a multidimensional sequence:

  S::sequence(sequence(integer)) is [[1,2,3],[4,5,6]];

The following relationships hold:

   S(0) == [1,2,3];
   S(0)(1) == 2

   S(1) == [4,5,6];
   S(1)(0) == 4;

   ~S =={*[1,2,3],[4,5,6]*};

Note that concatenating a sequence of integers with the sequence S is not legal. To add a new sequence to the end of S requires concatenating with a sequence of integer sequences of the form [[5,6]]. Note that length is not an issue. Bounded sequences will be defined in a later section.

It is possible to extract a collection of elements from a sequence using an operation called subscription. If s is a sequence and i is a sequence of natural numbers, then sub i extracts the elements of s associated with values from i, generating a new sequence from those elements in the order specified by i. For example:

  [1,2,3,4] sub [0,1] == [1,2]

  [1,2,3,4] sub [1,0] == [2,1]

  [1,2,3,4] sub [1,1,0,0,3,3,2,2] == [2,2,1,1,4,4,3,3]

Special operations are provided to form sequences of ordered integer and character values. The notation [i,..j] forms the sequence containing all integers in order between i and j inclusively, if i and j are integer values. If i and j are character values, then a sequence of characters is formed. For example:

  [2,..5] == [2,3,4,5]

  ['a',..'f'] == ['a','b','c','d','e','f']

The =< and >= operators are used to represent subsequence relations between sequences. A=<B holds when A is a subsequence of B and A>=B when B is a subsequence of A. Similarly, the > and < operations provide strict lexicographical ordering relationships. A<B is true if A=<B is true and A/=B. Likewise, A>B is true if A>=B and A/=B. For example:

  [1,2] =< [1,2,3,3] == true

  [1,2,3] < [1,2,3] == false

  [2,3] < [1,2,3] == true

  [1,2,3] =< [1,2,3] == true

Sequence max and min operations are based on the subsequence relationships just defined. A max B returns A if A>B and B otherwise. Similarly, A min B returns A if A<B and B otherwise. Both operators can be defined using the if expression:

   a min b == if a < b then a else b

   a max b == if a < b then b else a

where a and b are both sequences. Table 5.6 lists functions available for sequence types.

Table 5.6. Functions defined over sequence types

Function

Syntax

Return Type

Power Sequence

sequence (T)

set(sequence(T))

List accessors

head(s), tail(s), last(s)

element, sequence, element

Add to front

cons(e,s)

sequence

Reverse a sequence

reverse(s)

sequence

Image of function

image(F,s)

sequence

Comprehension

filter(P,s)

sequence

Reduce and Reduce tail

reduce(F,i,s), reduce_tail(F,i,s)

element of the range of F

Element replacement

replace(s,n,v)

sequence

Zip

zip(F,s1,s2)

sequence

Assume the following declarations of integer sequences:

Example 5.2. Subsequences

   s1::sequence(integer) is [1,2,3];
   s2::sequence(integer) is [1,2,3,4];

Subsequence definitions state that the following relationships must hold:

   s1 < s2;
   s1 =< s2;
   s1 =< s1;

   s1 = s2 sub [0,..2];
   s1 max s2 == s2 == [1,2,3,4];
   s1 min s2 == s1 == [1,2,3];

Three operations are defined that allow sequences to be treated as lists. The accessors head and tail return the first element of a sequence and the remaining elements, respectively. The constructor cons adds a single element to the front of a sequence. Together, cons, head, and tail provide a list capability similar to that defined in traditional functional languages. The last function is also available to access the last element of a sequence. The head, tail and last functions are defined only on nonempty lists. Examples of their application include:

   head([1,2,3]) == 1

   tail([1,2,3]) == [2,3]
   cons(1,[1,2,3]) == [1,1,2,3]

   tail(cons(1,[1,2,3])) == [1,2,3]

   head(cons(1,[1,2,3])) == 1

   last([1,2,3]) == 3

   cons(1,[]) == [1]

   head([]) == bottom

   tail([]) == bottom

   last([]) == bottom

Example 5.3. Head, Tail, and Cons Using Subscription and Concatenation

The functions head, tail, and cons can be defined using subscription, integer sequence, and the sequence former. The head of a sequence can be obtained by using the indexing operation to extract the first element. Formally, the head of a sequence can be defined as:

  head(s) == s(0)

The tail of a sequence can be obtained by extracting all but the first element. This is a bit trickier, but the integer sequence operation can be used with subscription to extract all but the first element. The integer sequence operation generates the sequence 1 through length of S minus 1. Subscription is then used to extract all but the first element of the sequence:

   tail(S) == s sub [1,..(#s-1)];

The cons constructor can be defined using the sequence former and concatenation to add an element to the beginning of a sequence:

cons(e,s) == [e]&s;

The sequence former creates a new sequence from e that is one element long. It is then added to the front of the sequence using the concatenation operator.

The reverse function is defined over all sequences and evaluates to a sequence that contains the same elements as s, but in the reverse order. For example:

  reverse([1,2,3]) == [3,2,1]

  reverse([1]) == [1]

  reverse([]) == []

The replace function generates a new sequence by replacing the element at position i with v. If replace attempts to access a sequence position beyond the length of s, the result is bottom. Unlike array operations in traditional languages, the original sequence is not modified in any way. A new sequence is generated whose elements match those in s except for position i contains the new value. For example:

  replace(['a','b','c'],0,'b') == ['b','b','c']

  replace(['a','b','c'],4,'b') == bottom

Example 5.4. Updating a Sequence

Sequences, like all other items, can be used as variables rather than as constant values. Updating sequences is quite easy using the replacement operation, as in the following statement:

   seq' = replace(seq,2,5)

The result of evaluating this expression is constraining the value of seq in the next state, seq’, to be the updated sequence from the current state — specifically, seq with the element indexed by 2 replaced with 5. Using the tick notation in this fashion will be fully explained in Chapter 12. For now, it is enough to simply realize that updating sequences like arrays can be achieved using replacement.

The image and filter functions map functions onto sequence elements. The image function returns the sequence formed by applying function to each element of a sequence. If any application results in bottom, then image results in bottom. Specifically, if inc is a function that increments its argument by 1:

   image(inc,[1,1,2,3]) == [inc(1),inc(1),inc(2),inc(3)] == [2,2,3,4]

   image(inc,[]) == []

The filter function behaves much like the image function except that only the elements that a predicate holds true for are kept in the resultant list. Should any predicate application result in bottom, then the application of filter results in bottom. Specifically, if gtz is a function over integers that is true, if its argument is greater than zero:

 filter(gtz,[-2,-1,0,1,2]) == [1,2]

 filter(gtz,[]) == []

The reduce operator is a fold over sequences. It has two distinct forms that allow both left (reduce) and right (reduce_tail) associativity. Both forms take a sequence, a binary function whose domain is the same type as the sequence elements, and an initial value of the same type as the function range. The function is first applied to the initial value and the first sequence element. The function is then applied to each sequence element using the result of the previous application as the other argument.

A simple application of reduce defines the standard summation operator:

    summation(s::sequence (integer))::integer is
      reduce(__+__,0,s);

The newly defined function will start with 0 and subsequently sum sequence values from left to right.

The reduce_tail function is similar except that the function is first applied to the initial value and the last element of the sequence, working from right to left. For operations like summation, where the applied function is commutative, the operators generate the same result. For example:

  reduce(__+__,0,[1,2,3]) == (((0+1)+2)+3)

  reduce_tail(__+__,0,[1,2,3]) == (1+(2+(3+0)))

In this case, the results are the same. However, for non-commutative operators like subtraction, the distinction is clear:

  reduce(__-__,0,[1,2,3]) == (((0-1)-2)-3) == -6

  reduce_tail(__-__,0,[1,2,3]) == (1-(2-(3-0))) == 2

Both forms are useful, but care must be taken when using them in definitions.

The zip is much like the image function except that it accepts two sequences an a binary function. The zip returns a list of elements resulting from the pair-wise application of the argument function to elements in the same position in the two lists.

If any function application results in bottom, then the application of zip also results in bottom. For example:

   zip(__+__,[1,2,3],[0,1,2]) == [1+0,2+1,3+2] == [1,3,5]

   zip(__/__,[1,2,3],[0,1,2]) == [1/0,2/1,3/2] == bottom

The power sequence operation, sequence (A), forms the set of all possible sequences formed from elements of its single set argument. It serves as the type former for sequence types. Like multiset(A), sequence (A) evaluates to an infinite set unless A is empty:

 sequence(1) == {[],[1],[1,1],[1,1,1],...}

The definition of sequences (A) can also be defined as the set of sequences whose elements are taken from the set A. Formally, if a in sequence(A) then ~(~a) =< A. Like the multiset operator, the sequence operator is used almost exclusively for defining sequence types.

The String Type

A special case of sequence is the string type. Formally, string is defined as:

   string == sequence(character);

The following notation defines a variable of type string:

   str ::string;

A shorthand for forming constant strings is the classical notation embedding a sequence of characters in quotations. Specifically, "ABcdEF" is equivalent to the sequence [‘A’,‘B’,‘c’,‘d’,‘E’,‘F’]. Thus, a string constant such as "$" can be defined as:

   dollarString ::string is "$";

All operations that apply to sequences also apply to strings because strings are simply character sequences. In particular, the notation "abc"&"def" is useful for concatenation of strings. Ordering operations for sequences provide lexicographical ordering for strings based on the ordering of Unicode characters. Relationships like "ab"<"abc" are true as a result of subsequence operations.

Example 5.5. String Usage

Rosetta’s string type provides operations that are typical of other similar languages. By treating strings as sequences of characters, operations are provided with little additional semantics. The following examples represent some useful operations over strings:

  allcaps(s::string)::string is map uc s;

  firstn(s::string, n::natural)::string is
     s sub [0..,n-1];

  pal(s::string)::boolean is s == reverse(s);

  contains(s::string, c::set(character))::boolean is

    (filter <*(x::character)::boolean is not(x in c)*> s) == [];

The Bitvector and Word Types

A second special case of sequence is the bitvector type used heavily in digital systems design. Formally, bitvector is defined:

  bitvector == sequence(bit);

Thus, a bitvector is an array that contains only elements 0 and 1.

Bitvector literals may be specified either as binary or hexadecimal strings preceded by a the bitvector literal indicator b for binary or x for hexadecimal. The following bitvector literals define the same value:

  b"10001100"

  x"8C"

Operations over bit are generalized to bitvectors by performing each operation on similarly indexed bits from the two bitvectors using the zip operation. One such example of this is the definition of bitvector and. Assuming that b0 and b1 are equally sized bitvectors:

  b0 and b1 = zip(__and__,b0,b1)

When applying bit operations to bitvectors, it is mandatory that the arguments be of the same length. The padding operators can easily be used to assure this requirement is satisfied. Assuming that b0 and b1 are bitvectors of unknown length less than 16 bits, they can be treated as 16-bit values as follows:

  padl(b0,16,0) and padl(b1,16,0)

Table 5.7. Operations defined on bitvectors in addition to traditional sequence operations

Operation

Syntax

Return Type

Binary operations

A and B, A or B, A nand B, A nor B, A xor B, A xnor B, not A

bitvector

Conversion to natural

bv2nat(b)

natural

Conversion from natural

nat2bv(n)

bitvector

Complement

twos(A)

 

Shift operations

lshr(A), lshl(A),

bitvector

 

ashr(A), ashl(A)

bitvector

Rotate operations

rotr(A), rotl(A)

bitvector

Pad operations

padr(A,l,n), padl(A,l,n), sext(A,l)

bitvector

This use of padding assures that both operands have 16 bits. As we shall see later, Rosetta provides general padding and sign extension operators.

The operations bv2nat and nat2bv provide standard mechanisms for converting between binary and natural numbers. The binary value is treated as an unsigned value and converted in the canonical fashion. For example:

  bv2nat(b"10") == 2

  bv2nat(b"101") == 5

  nat2bv(7) == "111"

  nat2bv(0) == b"0"

It is always true that bv2n(n2bv(x)) == x.

The operation twos(bv) takes the two’s complement of a binary value by negating and adding 1 to the result. This operator provides a negation function for fixed-length binary values. For example:

  twos(b"011") == b"101"

  twos(b"101") == b"011"

It is always true that twos(twos(x)) == x.

Two’s complement operators operate on bitvectors with known lengths. To accommodate this, the twos function assumes that the resulting bitvector and the argument bitvector have the same length. The pad function can be used to appropriately size bitvectors prior to taking their complement, while the sign extend operator can be used to size the bitvectors after taking their complement.

Shifting operations are provided for manipulation of bitvectors. The lshr and lshl operations provide logical shift right and left, while ashr and ashl provide logical and arithmetic shifts right and left, the distinction being that logical shift operations shift in 0s while arithmetic shift operations shift in 1. The rotr and rotl operations provide rotation or circular shift. Examples include:

   lshr(b"1100") == b"0110"

   ashl(b"0011") == b"0111"

   rotr(b"0011") == b"1001"

   rotl(b"1011") == b"0111"

The padr(A,l,n) and padl(A,l,n) operations pad or concatenate a bitvector. Both functions take three arguments, a bitvector, a length value of type natural, and a pad value of type bit. If the length value is less than the length of the input bitvector, padr removes bits to the right and padl removes bits to the left, resulting in a vector of the specified length. In this case, the pad value is ignored. If the length value is greater than the length of the input bitvector, padr adds enough copies of the pad value to the right of the vector so that the return value is of the specified length. The padl operates similarly except that bits are added to the left of the bitvector argument. Some examples include:

  padr(b"1100",2,0) == b"11"

  padl(b"1100",2,0) == b"00"

  padr(b"1100",6,1) == b"110011"

  padl(b"1100",6,0) == b"001100"

The sext operation is a special case of padl and performs a sign extension on its argument. Sign extension is useful when padding a two’s complement value without losing its sign. The operand is extended to the left using the value of its most significant bit rather than an explicit parameter. Some examples include:

   sext(b"1100",2) == b"00"

   sext(b"1100",6) == b"111100"

   sext(b"0011",6) == b"000011"

Digital systems rarely deal with arbitrary bitvectors, but instead process fixed-length bitvectors. The type word is provided to define such fixed-length bitvector types. To define a word, the function word(n) is used, where n is the word length. For example:

   w::word(8);

defines a variable w that can take the value of any bitvector of length 8. Neither shorter nor longer bitvectors belong to this type. Because word is a subtype of bitvector, all bitvector operations are defined on words.

It should be noted that bitvector and word types are not the same as binary numbers. Thus, the expression:

    w = 210001001;

is not legal because the argument types do not match. The type of w is word(8) while the type of 210001001 is posint. Thus, the equivalence cannot be defined. The expressions:

     w = nat2bv(210001001);

     w = nat2bv(1689);

are legal because in each case the number literal is converted to a bitvector prior to equating it with w.

Example 5.6. Slicing Bitvectors

An excellent example of subscription use is decoding an instruction in a CPU. Assume that I is a sequence of bits representing a typical 16-bit instruction. It is possible to decode I by extracting sub-sequence associated with instruction fields. Assuming that the instruction format is an instruction ID followed by three 4-bit register IDs, the following functions decode the instruction into its constituent parts:

   op = I sub [15,..12];

   rs = I sub [11,..8];

   rt = I sub [7,..4];

   rd = I sub [3,..0];

The value of rd may be used as a two’s complement offset value using sign extend and the pad operation:

   offset = sext(padr(rd,5,0),16)

In this case, a word address is generated by padding with 0 to the right. The sign extend function is then used to extend the 5-bit offset value to 16 bits without losing sign information.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.142.2