Language processing starts with a program to convert Camille program text (i.e., a string) into an abstract-syntax tree. In other words, we need a scanner and a parser, referred to as a front end (shown on the left-hand side of Figure 10.1), which can accept a string, verify that it is a sentence in Camille, and translate it into an abstract-syntax representation. Recall from Chapter 3 that scanning culls out the lexemes, determines whether all are valid, and returns a list of tokens. Parsing determines whether the list of tokens is in the correct order and, if so, structures this list into an abstract-syntax tree. A parser generator is a program that accepts lexical and syntactic specifications and automatically generates a scanner and parser from them. We use the PLY (Python Lex-Yacc) parser generator for Python introduced in Chapter 3 (i.e., the Python analog for lex and yacc in C). The following code is a generator in PLY for the front end of Camille:
Lines 9–63 and 65–112 constitute the lexical and syntactic specifications, respectively. Comments in Camille programs begin with the lexeme --- (i.e., three consecutive dashes) and continue to the end of the line. Multi-line comments are not supported. Comments are ignored by the scanner (lines 51–53). Recall from Chapter 3 that the lex.lex() (line 62) generates a scanner. Similarly, the function yacc.yacc() generates a parser and is called in the interpreter from the REPL definition (Section 10.6.4). Notice that the p_line_expr function (lines 82–85) has changed slightly from the version shown on lines 135–139 in the parser generator listing in Section 9.6.2. In particular, lines 138–139 in the original definition
are replaced with line 85 in the current definition:
Rather than assign the final abstract-syntax tree to the global variable global_tree (line 139) so that it can be referenced by a function that invokes the parser (e.g., the concrete2abstract function), now we pass the tree to the interpreter (i.e., the evaluate_expr function) on line 85.
For details on PLY, see https://www.dabeaz.com/ply/. The use of a scanner/parser generator facilitates this incremental development approach, which leads to a more malleable interpreter/language. Thus, the lexical and syntactic specifications given here can be used as is, and the scanner and parser generated from them can be considered black boxes.
A simple interpreter for Camille follows:
This segment of code contains both the definitions of the abstract-syntax tree data structure (lines 144–163) and the evaluate_expr function (lines 194–228). Notice that for each variant (lines 147–150) of the TreeNode data type (lines 152–162) that represents a Camille expression, there is a corresponding action in the evaluate_expr function (lines 194–228). Each variant in the TreeNode variant record2 has a case in the evaluate_expr function. This interpreter is the component on the right-hand side of Figure 4.1, replicated here as Figure 10.1.
We briefly discuss how the arguments to a primitive operator are represented in the abstract-syntax tree and evaluated. The following rules are used to represent the list of arguments to a primitive operator (or a function, which we encounter in Chapter 11):
Since all primitive operators in Camille accept arguments, the rule <arguments> ::= ε applies to (forthcoming) user-defined functions that may or may not accept arguments (as discussed in Chapter 11).
Consider the expression *(7,x) and its abstract-syntax tree presented in Figure 10.2. The top half of each node represents the type field of the TreeNode, the bottom right quarter of each node represents one member of the children list, and bottom left quarter of each node represents the leaf field. The ntExpressionList variant of TreeNode represents an argument list.
The ntExpressionList variant of an abstract-syntax tree constructed by the parser is flattened into a Python list by the interpreter for subsequent processing. A post-order traversal of the ntExpressionList variant is conducted, with the values in the leaf nodes being inserted into a Python list in the order in which they appear in the application of the primitive operator in the Camille source code. Each leaf is evaluated using evaluate_expr and its value is inserted into the Python list. Lines 205–211 of the evaluate_expr function (replicated here) demonstrate this process:
If a child exists, it becomes the next ntExpressionList node to be (recursively) traversed (line 210). This flattening process continues until a ntExpressionList node without a child is reached. The list returned by the recursive call to evaluate_expr is appended to the list created with the leaf of the node (line 210).
To make this interpreter operable (i.e., to test it), we need an interface for entering Camille expressions and running programs. The following is a read-eval-print loop (REPL) interface to the Camille interpreter:
The function yacc.yacc() invoked on line 232 generates a parser and returns an object (here, named parser) that contains a function (named parse). This function accepts a string (representing a Camille program) and parses it (line 118 in the parser generator listing).
This REPL supports two ways of running Camille programs: interactively and non-interactively. In interactive mode (lines 238–259), the function main_func prints the prompt, reads a string from standard input (line 243), and passes that string to the parser (line 245). In non-interactive mode (lines 261–268), the prompt for input is not printed. Instead, the REPL receives one or more Camille programs in a single source code file passed as a command-line argument (line 262), reads it as a string (line 263), and passes that string to the parser (line 264).
The following diagram depicts how the components of the interpreter are connected.
The REPL reads a string and passes it to the front end (parser.parse; line 118). The front end parses that string, while concomitantly building an abstract-syntax representation/tree for it, and passes that tree to the interpreter (evaluate_expr—the entry point of the interpreter; line 85). The interpreter traverses the tree to evaluate the program that the tree represents. Notice that this diagram is an instantiated view of Figure 10.1 with respect to the components of the Camille interpreter presented here.
A bash script named run is available for use with each version of the Camille interpreter:
Interactive mode is invoked by executing run without any command-line argument. The following is an interactive session with the Camille interpreter:
Non-interactive mode is invoked by passing the run script a single source code filename representing one or more Camille programs:
In both interactive and non-interactive modes, Camille programs must be separated by a blank line—which explains the blank lines after each input expression in these transcripts from the Camille interpreter. We use this blank line after each program to support both the evaluation of multi-line programs at the REPL (in interactive mode) and the evaluation of multiple programs in a single source code file (in non-interactive mode).
2. Technically, it is not a variant record as strictly defined, but rather a data type with fixed fields, where one of the fields, the type flag, indicates the interpretation of the fields.
3.147.28.9