Consider the string ((lambda (x) (f x)) (g y)) representing an expression in λ-calculus. An implementation of a programming language, such as an interpreter or compiler, reads strings like this, typically from standard input, and processes them. This program string is an external representation (i.e., it is external to the system processing it) and uses concrete syntax. Programs in concrete syntax are not readily processable. Scheme, however, is a homoiconic language, meaning that program codes and data are both represented using the same representation—in the case of Scheme as a list. In consequence, the availability of a Scheme program as an S-expression is convenient for any system processing it. The (read) facility in Scheme reads from standard input and returns the data read as an S-expression, sometimes called a list-and-symbol representation:
While an S-expression representing a program can be more easily processed using calls to car and cdr than can a string representing a program, we still consider the former as concrete syntax.
Accessing the individual lexemes of this program to evaluate this expression requires cryptic and lengthy chains of calls to car and cdr. For instance, consider accessing the operand x of the call to f:
Notably, the preceding program is more manipulable and, thus, processable when represented using the following definition of an expression data type:
An abstract-syntax tree (AST) is similar to a parse tree, except that it uses abstract syntax or an internal representation (i.e., it is internal to the system processing it) rather than concrete syntax. Specifically, while the structure of a parse tree depicts how a sentence (in concrete syntax) conforms to a grammar, the structure of an abstract-syntax tree illustrates how the sentence is represented internally, typically with an inductive, variant record data type. For instance, Figure 9.1 illustrates an AST for the λ-calculus expression ((lambda (x) (f x)) (g y)). Abstract syntax is a representation of a program as a data structure—in this case, an inductive variant record. Consider the following grammar for λ-calculus, which is annotated with variants of this expression inductive variant record data type above the right-hand side of each production rule:3
Use of the expression data type makes other language-processing functions, such as occurs-free? (discussed in Chapter 6), more readable, because it eliminates cryptic and lengthy chains of calls to car and cdr:
Recall, from Chapter 3, that parsing is the process of determining whether a string is a sentence (in some language) and, if so, typically converting the concrete representation of that sentence into an abstract representation that facilitates the intended subsequent processing. An abstract representation does not contain the details of the concrete representation that are irrelevant to the subsequent processing. The parser component of an interpreter or compiler typically converts the source program, once syntactically validated, into an abstract, or more easily manipulable, representation.
It is easier to (parse and) convert a list-and-symbol representation of a λ-calculus expression into abstract syntax than a string representation of the same expression. The following concrete2abstract function converts a concrete-syntax representation—in this case, a list-and-symbol S-expression—of a λ-calculus expression into its abstract-syntax representation:
Now consider an application of concrete2abstract to the λ-calculus expression:
Use of abstract syntax makes data representing code easier to manipulate and a program that processes code (i.e., programs) more readable.
3. This is the annotative style used in Friedman, Wand, and Haynes (2001).
3.143.203.96