10.6 A First Camille Interpreter

10.6.1 Front End for Camille

Language processing starts with a program to convert Camille program text (i.e., a string) into an abstract-syntax tree. In other words, we need a scanner and a parser, referred to as a front end (shown on the left-hand side of Figure 10.1), which can accept a string, verify that it is a sentence in Camille, and translate it into an abstract-syntax representation. Recall from Chapter 3 that scanning culls out the lexemes, determines whether all are valid, and returns a list of tokens. Parsing determines whether the list of tokens is in the correct order and, if so, structures this list into an abstract-syntax tree. A parser generator is a program that accepts lexical and syntactic specifications and automatically generates a scanner and parser from them. We use the PLY (Python Lex-Yacc) parser generator for Python introduced in Chapter 3 (i.e., the Python analog for lex and yacc in C). The following code is a generator in PLY for the front end of Camille:

A set of nine code lines in Camille that is a generator in P L Y for the front end of Camille.
Description
Continuation of the code in Camille that is a generator in P L Y for the front end of Camille, consisting of 63 lines.
Description
Continuation of the code in Camille that is a generator in P L Y for the front end of Camille, consisting of 58 lines.
Description

Lines 9–63 and 65–112 constitute the lexical and syntactic specifications, respectively. Comments in Camille programs begin with the lexeme --- (i.e., three consecutive dashes) and continue to the end of the line. Multi-line comments are not supported. Comments are ignored by the scanner (lines 51–53). Recall from Chapter 3 that the lex.lex() (line 62) generates a scanner. Similarly, the function yacc.yacc() generates a parser and is called in the interpreter from the REPL definition (Section 10.6.4). Notice that the p_line_expr function (lines 82–85) has changed slightly from the version shown on lines 135–139 in the parser generator listing in Section 9.6.2. In particular, lines 138–139 in the original definition

A set of five code lines in Camille with the p underscore line underscore e x p r function.
Description

are replaced with line 85 in the current definition:

A set of four code lines in Camille with the p underscore line underscore e x p r function.
Description

Rather than assign the final abstract-syntax tree to the global variable global_tree (line 139) so that it can be referenced by a function that invokes the parser (e.g., the concrete2abstract function), now we pass the tree to the interpreter (i.e., the evaluate_expr function) on line 85.

For details on PLY, see https://www.dabeaz.com/ply/. The use of a scanner/parser generator facilitates this incremental development approach, which leads to a more malleable interpreter/language. Thus, the lexical and syntactic specifications given here can be used as is, and the scanner and parser generated from them can be considered black boxes.

10.6.2 Simple Interpreter for Camille

A simple interpreter for Camille follows:

A set of 19 code lines in Camille with a simple interpreter.
Description
Continuation of the code in Camille with a simple interpreter consisting of 63 lines.
Description
Continuation of the code in Camille with a simple interpreter consisting of 18 lines.
Description

This segment of code contains both the definitions of the abstract-syntax tree data structure (lines 144–163) and the evaluate_expr function (lines 194–228). Notice that for each variant (lines 147–150) of the TreeNode data type (lines 152–162) that represents a Camille expression, there is a corresponding action in the evaluate_expr function (lines 194–228). Each variant in the TreeNode variant record2 has a case in the evaluate_expr function. This interpreter is the component on the right-hand side of Figure 4.1, replicated here as Figure 10.1.

10.6.3 Abstract-Syntax Trees for Arguments Lists

We briefly discuss how the arguments to a primitive operator are represented in the abstract-syntax tree and evaluated. The following rules are used to represent the list of arguments to a primitive operator (or a function, which we encounter in Chapter 11):

A list of three rules for representing grammar.
Description

Since all primitive operators in Camille accept arguments, the rule <arguments> ::= ε applies to (forthcoming) user-defined functions that may or may not accept arguments (as discussed in Chapter 11).

Consider the expression *(7,x) and its abstract-syntax tree presented in Figure 10.2. The top half of each node represents the type field of the TreeNode, the bottom right quarter of each node represents one member of the children list, and bottom left quarter of each node represents the leaf field. The ntExpressionList variant of TreeNode represents an argument list.

A flow diagram of the abstract-syntax tree for a Camille expression.

Figure 10.2 Abstract-syntax tree for the Camille expression *(7,x).

Description

The ntExpressionList variant of an abstract-syntax tree constructed by the parser is flattened into a Python list by the interpreter for subsequent processing. A post-order traversal of the ntExpressionList variant is conducted, with the values in the leaf nodes being inserted into a Python list in the order in which they appear in the application of the primitive operator in the Camille source code. Each leaf is evaluated using evaluate_expr and its value is inserted into the Python list. Lines 205–211 of the evaluate_expr function (replicated here) demonstrate this process:

A set of seven code lines in Python with the evaluate underscore e x p r function.
Description

If a child exists, it becomes the next ntExpressionList node to be (recursively) traversed (line 210). This flattening process continues until a ntExpressionList node without a child is reached. The list returned by the recursive call to evaluate_expr is appended to the list created with the leaf of the node (line 210).

10.6.4 REPL: Read-Eval-Print Loop

To make this interpreter operable (i.e., to test it), we need an interface for entering Camille expressions and running programs. The following is a read-eval-print loop (REPL) interface to the Camille interpreter:

A set of 42 code lines in Camille with R E P L interface.
Description

The function yacc.yacc() invoked on line 232 generates a parser and returns an object (here, named parser) that contains a function (named parse). This function accepts a string (representing a Camille program) and parses it (line 118 in the parser generator listing).

This REPL supports two ways of running Camille programs: interactively and non-interactively. In interactive mode (lines 238–259), the function main_func prints the prompt, reads a string from standard input (line 243), and passes that string to the parser (line 245). In non-interactive mode (lines 261–268), the prompt for input is not printed. Instead, the REPL receives one or more Camille programs in a single source code file passed as a command-line argument (line 262), reads it as a string (line 263), and passes that string to the parser (line 264).

10.6.5 Connecting the Components

The following diagram depicts how the components of the interpreter are connected.

A flow diagram is as follows: R E P L with the function parser parse in line 118 leads to front end, which, with the function evaluate underscore e x p r leads in 85, leads to the interpreter.

The REPL reads a string and passes it to the front end (parser.parse; line 118). The front end parses that string, while concomitantly building an abstract-syntax representation/tree for it, and passes that tree to the interpreter (evaluate_expr—the entry point of the interpreter; line 85). The interpreter traverses the tree to evaluate the program that the tree represents. Notice that this diagram is an instantiated view of Figure 10.1 with respect to the components of the Camille interpreter presented here.

10.6.6 How to Run a Camille Program

A bash script named run is available for use with each version of the Camille interpreter:

A set of two code lines in Camille with a bash script.
Description

Interactive mode is invoked by executing run without any command-line argument. The following is an interactive session with the Camille interpreter:

A set of 15 code lines in Camille with the run script.
Description

Non-interactive mode is invoked by passing the run script a single source code filename representing one or more Camille programs:

A set of 17 code lines in Camille with the run script in a non-interactive mode.
Description

In both interactive and non-interactive modes, Camille programs must be separated by a blank line—which explains the blank lines after each input expression in these transcripts from the Camille interpreter. We use this blank line after each program to support both the evaluation of multi-line programs at the REPL (in interactive mode) and the evaluation of multiple programs in a single source code file (in non-interactive mode).

2. Technically, it is not a variant record as strictly defined, but rather a data type with fixed fields, where one of the fields, the type flag, indicates the interpretation of the fields.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.28.9