2
Introduction to Semantics of Programming Languages

This chapter introduces intuitively the notions of name, environment, memory, etc., along with a first formal description of these notions. It allows readers to familiarize themselves with the semantic approach of programming that we share with a number of other authors [ACC 92, DOW 09, DOW 11, FRI 01, WIN 93].

Any high-level programming language uses names to denote the entities handled by programs. These names are generally known as identifiers, drawing attention to the fact that they are constructed in accordance with the syntactic rules of the chosen language. They may be used to denote program-specific values or values computed during execution. They may also denote locations (i.e. addresses in the memory), they are then called mutable variables. And identifiers can also denote operators, functions, procedures, modules, objects, etc., according to the constructs present in the language. For example, pi is often used to denote an approximate value of π; + is also an identifier, denoting an addition operator and often placed between the two operands, i.e. in infix position, as in 2 + 3. The expression 2 * x + 1 uses the identifier x and to compute its value, we need to know the value denoted by x. Retrieving the value associated with a given identifier is a mechanism at the center of any high-level language. The semantics of a language provides a model of this mechanism, presented – in a simplified form – in section 2.1.

All the formal definitions of languages, instructions, algorithms, etc., given in the following are coded in the programming languages OCaml and Python, trying to paraphrase these definitions and produce very similar versions of code in these two languages, even if developers in these languages may find the programming style used here rather unusual. For readers not introduced to these languages, some very brief explanations are given in the codes’ presentation. But almost all features of OCaml and Python will be considered either in this first volume or in the second, where object-oriented programming is considered. We hope that these two encodings of formal notions can help readers who are not truly familiar with mathematical formalism.

2.1. Environment, memory and state

2.1.1. Evaluation environment

Let X be a set of identifiers and V a set of values. The association of an identifier x X with a value v V is called a binding (of the identifier to its value), and a set Env of bindings is called an execution environment or evaluation environment. Env(x) denotes the value associated with the identifier x in Env. The set of environments is denoted as E.

In practice, the set of identifiers X that are actually used is finite: usually, we only consider those identifiers that appear in a program. An environment may thus be represented by a list of bindings, also called Env:

image

where {x1 ,x2, . . ., xn} denotes a finite subset of X, known as the domain of the environment and denoted as dom(Env). By convention, Env(x) denotes the value v, which appears in the first (x, v) binding encountered when reading the list Env from the head (here, from left to right).

In this model, a binding can be added to an environment using the operator ⊕. By convention, bindings are added at (the left of) the head of the list representing the environment:

image

Suppose that a certain operation introduces a new binding of an identifier, which is already present in the environment, for example (x2, vnew):

image

The so-obtained environment (x2,vnew) ⊕ Env contains two bindings for x2. Searching for a binding starts at the head of the environment, and, with our convention, new bindings are added at the head. So the most recent addition, (x2,vnew), will be the first found. The binding (x2, v2) is not deleted, but it is said to be masked by the new binding (x2, vnew). Several bindings for a single identifier x may therefore exist within the same environment, and the last binding added for x will be used to determine the associated value of x in the environment. Formally, the environment (x, v) ⊕ Env verifies the following property:

image

By convention, the notation (x2,v2) ⊕ (x1 ,v1) ⊕ Env is used to denote the environment (x2,v2) ⊕ ((x1,v1) ⊕ Env). For example, ((x,v2) ⊕ (x,v1) ⊕Env)(x) = v2

2.1.2. Memory

The formal model of the memory presented below makes no distinction between the different varieties of physical memory described in Chapter 1. The state of the memory is described by associating a value with the location of the cell in which it is stored. The locations themselves are considered as values, called references. As we have seen, high-level languages allow us to name a location c, containing a value v, by an identifier x bound to the reference r of c.

Let R be a set of references and V a set of values. The association of a reference r ∈ R with a value v V is represented by a pair (r, v), and a set Mem of such pairs is called here a memory. Mem(r) denotes the value stored at the reference r in Mem. Let M be the set of memories. In practice, the set of references, which is actually used, is finite: once again, only those locations used by a program are generally considered. This means that the memory can be represented by a list, also called Mem:

image

The existence of a pair (r, v) in a memory records that an initialization or a writing operation has been carried out at this location. Every referenced memory cell may be consulted through reading and can be assigned a new value by writing. In this case, the value previously held in the cell is deleted, it has “crashed”.

Writing a value v at an address r transforms a memory Mem into a memory denoted as Mem[r := v]; if a value was stored at this location r in Mem, then this “old” value is replaced by v; otherwise, a new pair is added to Mem to take account of the writing operation. There is no masking, contrary to the case of the environments. Writing a new value v at a location r that already contains a value deletes the old value Mem(r):

image

The domain of a memory dom(Mem) depends on the current environment and represents the space of references, which are accessible (directly or indirectly) from bound and non-masked identifiers in the current execution environment. The addition of a binding (x, r) to an environment Env has a twofold effect, creating (x, r) ⊕ Env and extending Mem to Mem[r := v].

NOTE.– Depending on the language or program in question, the value v may be supplied just when x is introduced, or later, or never. If no value is provided prior to its first use, the result of the program is unpredictable, leading to errors called initialization errors. Indeed, a location always contains a value, which does not need to be suited to the current computation if it has not been explicitly determined by the program.

Note that the addition of a binding (x, r) in an environment Env, of which the domain contains x, may mask a previous binding of x in Env, but will not add a new pair (r, v) to Mem if r was already present in the domain of Mem. Thus, any list of pairs representing a memory cannot contain two different pairs for the same reference. The memory Mem[r := v] verifies the following property:

image

The memory (Mem[r1 := v1 ])[r2 := v2] is denoted as Mem[r1 := v1 ][r1 := v2].

For example, (Mem[r1:= v1 ][r2:= v2])(r) = v2.

2.1.3. State

A state is defined as a pair (Env, Mem) ∈ E × M such that any reference in the domain of Mem is accessible from a binding in Env. A reference is said to be accessible if its value can be read or written from an identifier contained in Env by a series of operations of reading, writing, or reference manipulation.

Given an environment Env, the set of identifiers X is partitioned into two subsets: image which contains the identifiers bound to a reference, and image, which contains the others:

image

The value associated with an identifier x in image is a reference Env(x) = r where a value Mem(r) is stored, which can be modified by writing. Identifiers of image are generally called mutable variables.

2.2. Evaluation of expressions

The value of an expression is computed according to an evaluation environment and a memory, i.e. in a given state. This computation is defined by the evaluation semantics of the expression.

2.2.1. Syntax

The language of expressions Exp1 used here will be extended in Chapters 3 and 4. Its syntax is defined in Table 2.1.

Table 2.1. Language of expressions Exp1

e ::= k Integer constant (k ∈ ℤ)
| x Identifier (xX)
| e1 + e2 Addition (e1, e2Exp1)
| !x Dereferencing (xX)

Thus, an expression eExp1 is either an integer constant k ℤ, an identifier x ∈ X, an expression obtained by applying an addition operator to two expressions in Exp1 or an expression of the form !x denoting the value stored in the memory at the location bound to the mutable variable x. Thus, this is an inductive definition of the set Exp1. Note that Exp1 does not include an assignment construct. This is a deliberate choice. This point will be discussed in greater detail in section 2.3 by means of an extension of Exp1.

NOTE. – The symbol + used in defining the syntax of expressions does not denote the integer addition operator. It could be replaced by any other symbol (for example ⊠). Its meaning will be assigned by the evaluation semantics. The same is true of the constant symbols: for example, the symbol 4 may be interpreted as a natural integer, a relative integer or a character.

EXAMPLE 2.1.– !x + y is an expression of Exp1 in the same way as (x + 2) + 3. Parentheses are used here to structure the expression, they are part of the so-called concrete syntax and will disappear in the AST.

2.2.2. Values

Given a state (Env, Mem), we determine the evaluation semantics of an expression e ∈ Exp1 by computing the value of e in this state, i.e. by evaluating e in this state. Values may be relative integers or references, hence V = ℤ U R. An additional, specific value Err is added to the set V; this result is returned as the value of “meaningless” expressions. The result of the evaluation of an expression in Exp1 will therefore be a value belonging to the set image

2.2.3. Evaluation semantics

There are several formalisms that may be used to describe the evaluation of an expression. These will be introduced later. Let us construct an evaluation function:

image

The evaluation of the expression e in the environment Env and memory state Mem is denoted as image with vV. Table 2.2 contains the recursive definition of the function image

Table 2.2. Evaluation of the expressions of Exp1

image (k ∈ ℤ)
image if xX and x dom(Env)
image if xX and x ∉ dom(Env)
image image
image image
image image
image image

The value of an integer constant is the integer that it represents. The value of an identifier is that which is bound to it in the environment, or Err. The value of an expression constructed with an addition symbol and two expressions e1 and e2 is obtained by adding the relative integers resulting from the evaluations of e1 and e2; the result will be Err if e1 or e2 is not an integer. The value of !x is the value stored at the reference Env(x) when x is a mutable variable, and Err otherwise.

Thus, if e is evaluated as a reference, then e can only be an identifier. Furthermore, certain expressions in Exp1 are syntactically correct, but meaningless: for example, the expression !x when x is not a mutable variable, i.e. when x does not bind a reference in the environment, or x1 + x2 when x1 (or x2) is a mutable variable. On the other hand, !x + y is a meaningful expression that denotes a value when y binds an integer and x binds a reference to an integer.

EXAMPLE 2.2.– Let us evaluate the expression !x + y

in the state image and image

image

2.3. Definition and assignment

2.3.1. Defining an identifier

The language Def 1 extends Exp1 by adding definitions of identifiers. There are two constructs that make it possible to introduce an identifier naming a mutable or non-mutable variable (as defined in section 2.1.3). Note that, in both cases, the initial value must be provided. This value corresponds to a constant or to the result of a computation specified by an expression eExp1. These constructs modify the current state of the system; after computing image, the next step in evaluating let x = e; is to add the binding (x, image) to the environment, while the evaluation of var x = e; adds a binding (x, rx) to the environment and writes the value image to the reference rx. In this case, we assume that the location denoted by the reference rx is computed by an external mechanism responsible for memory allocation.

Table 2.3. Language Def 1 of definitions

d ::= let x = e; Definition of a non-mutable variable (xX, eExp1)
| var x = e; Definition of a mutable variable (xX, eExp1)

The evaluation of a definition is expressed as follows:

(2.1)image

This evaluation →Def 1 defines a relation between a state, a definition and a resulting state, or, in formal terms:

image

Starting with a finite sequence of definitions d = [d1; . . . dn] and an initial state (Env0, Mem0), this relation produces the state (Envn, Memn):

image

This sequence of transitions may, more simply, be noted image (Envn, Memn).

EXAMPLE 2.3.– Starting with a memory with no accessible references and an “empty” environment, the sequence [var y = 2; let x = !y + 3;] builds the following state:

image

In the environment Env = [(x, 5), (y, ry)], we obtain = {y} and = {x}.

NOTE.– In the definition of the two transitions in [2.1], we presume that the result of the evaluation of the expression e, denoted as image, is not an error result. In the case of an error, no state will be produced and the evaluation stops.

2.3.2. Assignment

The language Lang1 extends Def 1 by adding assignment. The syntax of an assignment instruction is:

x : = e

where xX and eExp1. When the mutable variable x is already bound in the current environment, this instruction enables us to modify the value of !x. Formally, execution of the instruction x := e modifies the memory of the current state, and it is described by the following transition:

image

NOTE.– Once again, if the identifier x is not bound in the environment or if the evaluation of e results in an error, no state is generated and evaluation stops.

EXAMPLE 2.4.– Based on the state obtained in example 2.3, the following two assignments can be executed:

image

NOTE.– In the presentation above, assignment concerns only mutable variables. Similarly, the dereferencing operator ! can only be applied to a reference to obtain the stored value at this location. Thus, the assignment of a value to a mutable variable from a given reference requires here the use of the dedicated syntactic structures: x := ! x + 1. However, in many languages, assignment does not explicitly mention the dereferencing operator, and the syntactic use of variables is identical on both sides of the assignment: x = x + 1. Hence, an identifier x denotes two different notions: the value Env(x) on the left of the assignment symbol, and the value Mem(Env(x)) on the right of the assignment symbol.

These languages mask the different roles of a variable according to its position in the assignment. When a variable x is positioned to the left of the assignment, it is known as an l-value and it denotes a location where the value to be assigned should be stored. In this way, it acts as a pointer. The variable x on the right of the assignment is known as the r-value, and implicitly acts as a dereferenced pointer: the value to fetch is found at the location denoted by the variable. Thus, in the expression x = x + 1, even if x is used in the same way from a syntactic perspective, the instance on the right implicitly denotes ! x. In some languages, variable declaration implicitly involves the creation of a reference, unless otherwise stated; the name of the variable represents the location where the compiler will store the values assigned to it.

2.4. Exercises

Exercise 2.1

Consider the state etat0 defined by:

image
  • 1) compute image and image.
  • 2) Give a sequence of definitions to obtain the state etat0 from the empty state ([ ], [ ]). Are there other sequences that can be used to obtain this state?
  • 3) compute image
  • 4) Give three examples of expressions that generate three different error types when evaluated in the state etat0.
  • 5) Consider the sequence
    image

    Determine the states etat1 = (Env1, Mem2) and etat2 = (Env2, Mem2) and the sets image, image, image.

  • 6) Consider the sequence
image

Determine the states etat3 = (Env3, Mem3) and etat4 = (Env4, Mem4). Does image

Exercise 2.2

The two constructs x + + and + + x:

image

are added to the language Exp1. In these two new constructs, the identifier x must be a mutable variable. Intuitively, we see that the evaluation of the expression x + + produces the value stored in the location denoted by x and increments the value in the memory by 1. The expression + + x is evaluated differently: the value stored at the location denoted by x is incremented by 1, and this new value is the result of the evaluation.

  • 1) Define an evaluation function:
    image

    for expressions in the language Exp1, extended so that image expresses the fact that evaluation of the expression e in the state (Env, Mem) transforms this state into a state (Env’, Mem’) and produces the value v.

  • 2) Let Env = [(x, rx)] and Mem = [(rx, 6)]. Compute:
    image
  • 3) Show that, for any state (Env, Mem):
    image

    Now, we extend the language Lang1 by considering the expressions of the extended Def 1 language and adding the construction x+ := e. In informal terms, the execution of this instruction in a state (Env, Mem) consists of first finding the value vx stored at the reference v in Mem, then evaluating the expression e in this state to obtain its value ve and a new state (Env’, Mem’), and finally, assigning to v the result of the addition of ve and vx. If at least one of the two values ve and vx is not an integer, the execution fails.

  • 4) Redefine the relation →Lang1 for the extended Lang1 language.
  • 5) Let Env0 = [(x, rx)] and Mem0 = [(rx, 2)]. Determine the state etat1 such that image
  • 6) Do the assignments x := x + +, x := + + x and x+ := 1 produce the same states when executed in the same state?
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.168.172