9.5 Abstract Syntax

Consider the string ((lambda (x) (f x)) (g y)) representing an expression in λ-calculus. An implementation of a programming language, such as an interpreter or compiler, reads strings like this, typically from standard input, and processes them. This program string is an external representation (i.e., it is external to the system processing it) and uses concrete syntax. Programs in concrete syntax are not readily processable. Scheme, however, is a homoiconic language, meaning that program codes and data are both represented using the same representation—in the case of Scheme as a list. In consequence, the availability of a Scheme program as an S-expression is convenient for any system processing it. The (read) facility in Scheme reads from standard input and returns the data read as an S-expression, sometimes called a list-and-symbol representation:

A set of four code lines in Scheme with the facility read.
Description
Continuation of the code in Scheme with the facility read consisting of four lines.
Description

While an S-expression representing a program can be more easily processed using calls to car and cdr than can a string representing a program, we still consider the former as concrete syntax.

Accessing the individual lexemes of this program to evaluate this expression requires cryptic and lengthy chains of calls to car and cdr. For instance, consider accessing the operand x of the call to f:

A set of two code lines in Scheme with the expressions car and c d r.
Description

Notably, the preceding program is more manipulable and, thus, processable when represented using the following definition of an expression data type:

A set of nine code lines with the definition of an expression data type.
Description

An abstract-syntax tree (AST) is similar to a parse tree, except that it uses abstract syntax or an internal representation (i.e., it is internal to the system processing it) rather than concrete syntax. Specifically, while the structure of a parse tree depicts how a sentence (in concrete syntax) conforms to a grammar, the structure of an abstract-syntax tree illustrates how the sentence is represented internally, typically with an inductive, variant record data type. For instance, Figure 9.1 illustrates an AST for the λ-calculus expression ((lambda (x) (f x)) (g y)). Abstract syntax is a representation of a program as a data structure—in this case, an inductive variant record. Consider the following grammar for λ-calculus, which is annotated with variants of this expression inductive variant record data type above the right-hand side of each production rule:3

A flow diagram of an abstract-syntax tree for an expression.

Figure 9.1 Abstract-syntax tree for ((lambda (x) (f x)) (g y)).

Description
A list of three rules for a lambda-calculus expression.
Description

Use of the expression data type makes other language-processing functions, such as occurs-free? (discussed in Chapter 6), more readable, because it eliminates cryptic and lengthy chains of calls to car and cdr:

A set of 11 code lines for the definition of occurs-free question mark.
Description

Recall, from Chapter 3, that parsing is the process of determining whether a string is a sentence (in some language) and, if so, typically converting the concrete representation of that sentence into an abstract representation that facilitates the intended subsequent processing. An abstract representation does not contain the details of the concrete representation that are irrelevant to the subsequent processing. The parser component of an interpreter or compiler typically converts the source program, once syntactically validated, into an abstract, or more easily manipulable, representation.

It is easier to (parse and) convert a list-and-symbol representation of a λ-calculus expression into abstract syntax than a string representation of the same expression. The following concrete2abstract function converts a concrete-syntax representation—in this case, a list-and-symbol S-expression—of a λ-calculus expression into its abstract-syntax representation:

A set of three code lines with the definition of concrete 2 abstract.
Description
Continuation of the code with the definition of concrete 2 abstract consisting of 11 code lines.
Description

Now consider an application of concrete2abstract to the λ-calculus expression:

A set of 10 code lines with an application of concrete 2 abstract to the lambda-calculus expression.
Description

Use of abstract syntax makes data representing code easier to manipulate and a program that processes code (i.e., programs) more readable.

3. This is the annotative style used in Friedman, Wand, and Haynes (2001).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.203.96