Building an Abstract Syntax Tree

Currently, our parser interprets the input code and evaluates it in the same pass. Most compilers and interpreters; however, create an intermediate data structure before actually running the program: an Abstract Syntax Tree (AST). Using an AST offers some interesting possibilities; for example, it provides you with a structured representation of your program that you can then analyze. Also, you can use the AST and transform it back into a text-based program (maybe of another language).

An AST is a tree that represents the structure of a program. The first step to building an AST-based parser is to design the tree's object model: which classes are needed and in which way are they associated to another. The following figure shows the first draft for an object model that can be used to describe mathematical expressions:

Building an Abstract Syntax Tree

The (preliminary) object model for our Abstract Syntax Tree

In this model, nearly all classes implement the Expression interface. This interface prescribes the evaluate() method, which can be provided by the implementations of this interface to actually execute the operation modeled by the respective tree node. Let's start by implementing the PacktChp8DSLASTExpression interface:

namespace PacktChp8DSLAST; 
 
interface Expression 
{ 
    public function evaluate() 
} 

The next step is the Number class with its two subclasses: Integer and Decimal. As we're going to be using PHP 7's type hinting feature, and both the Integer and Decimal classes work exclusively with either the int or float variables; we cannot make much use of inheritance, forcing us to leave the Number class empty:

namespace PacktChp8DSLAST; 
 
abstract class Number implements Expression 
{} 

The Integer class can be initialized with a PHP integer value. As this class models a literal integer value; the only thing that the evaluate() method needs to do in this class is to return this value again:

namespace PacktChp8DSLAST; 
 
class Integer extends Number 
{ 
    private $value; 
 
    public function __construct(int $value) 
    { 
        $this->value = $value; 
    } 
 
    public function evaluate(): int 
    { 
        return $this->value; 
    } 
} 

The Decimal class can be implemented the same way; in this case, simply use float instead of int as type hints:

namespace PacktChp8DSLAST; 
 
class Decimal extends Number 
{ 
    private $value; 
 
    public function __construct(float $value) 
    { 
        $this->value = $value; 
    } 
 
    public function evaluate(): float 
    { 
        return $this->value; 
    } 
} 

For the classes Addition, Subtraction, Multiplication and Division, we'll be using a common base class, PacktChp8DSLASTBinaryOperation. This class will hold the constructor that you then won't have to implement over and over again:

namespace PacktChp8DSLAST; 
 
abstract class BinaryOperation implements Expression 
{ 
    protected $left; 
    protected $right; 
 
    public function __construct(Expression $left, Expression $right) 
    { 
        $this->left  = $left; 
        $this->right = $right; 
    } 
} 

Implementing the actual classes modeling the operations becomes easy. Let's consider the Addition class as an example:

namespace PacktChp8DSLAST; 
 
class Addition extends BinaryOperation 
{ 
    public function evaluate() 
    { 
        return $this->left->evaluate() + $this->right->evaluate(); 
    } 
} 

The remaining classes called Subtraction, Multiplication and Division can be implemented in a way similar to the Addition class. For the sake of brevity, the actual implementation of these classes is left as an exercise for you.

What's left now is to actually build the AST in the parser. This is relatively easy, as we can now simply modify the already existing hook functions that are called by the parser when individual rules are matched.

Let's start with the rules for parsing numbers:

Integer: value:('-'? Digit+) 
    function value(array &$result, array $sub) { 
        $result['node'] = new Integer((int) $sub['text']); 
    } 
 
Decimal: value:('-'? Digit* '.' Digit+) 
    function value(array &$result, array $sub) { 
        $result['node']  = new Decimal((float) $sub['text']); 
    } 
 
Number: Decimal | Integer 
    function Decimal(&$result, $sub) { 
        $result['node']  = $sub['node']; 
    } 
    function Integer(&$result, $sub) { 
        $result['node']  = $sub['node']; 
    } 

When the Integer or Decimal rule matches, we create a new AST node of the Integer or Decimal class and save it in the return array's node property. When the Number rule matches, we simply take over the already created node stored in the matched symbol.

We can adjust the Product rule in a similar way:

Product: left:Value (operand:(> operator:('*'|'/') > right:Value))* 
    function left(array &$result, array $sub) { 
        $result['node']  = $sub['node']; 
    } 
    function right(array &$result, array $sub) { 
        $result['node']  = $sub['node']; 
    } 
    function operator(array &$result, array $sub) { 
        $result['operator'] = $sub['text']; 
    } 
    function operand(array &$result, array $sub) { 
        if ($sub['operator'] == '*') { 
            $result['node'] = new Multiplication($result['node'], $sub['node']); 
        } else { 
            $result['node'] = new Division($result['node'], $sub['node']); 
        } 
    } 

As our AST model treats operations such as multiplications strictly as binary operations, the parser will deconstruct input expressions such as 1 * 2 * 3 * 4 into a chain of binary multiplications (similar to 1 * (2 * (3 * 4)) as shown in the following figure):

Building an Abstract Syntax Tree

The expression 1 * 2 * 3 * 4 as a syntax tree

Continue by adjusting your Sum rule in the same way:

Sum: left:Product (operand:(> operator:('+'|'-') > right:Product))* 
    function left(&$result, $sub) { 
        $result['node']  = $sub['node']; 
    } 
    function right(&$result, $sub) { 
        $result['node']  = $sub['node']; 
    } 
    function operator(&$result, $sub) { $result['operator'] = $sub['text']; } 
    function operand(&$result, $sub) { 
        if ($sub['operator'] == '+') { 
            $result['node'] = new Addition($result['node'], $sub['node']); 
        } else { 
            $result['node'] = new Subtraction($result['node'], $sub['node']); 
        } 
    } 

Now, all that's left is to read the created AST node in the Value and Expr rules is follows:

Value: Number | '(' > Expr > ')' 
    function Number(array &$result, array $sub) { 
        $result['node'] = $sub['node']; 
    } 
 
Expr: Sum 
    function Sum(array &$result, array $sub) { 
        $result['node'] = $sub['node']; 
    } 

In your test script, you can now test if the AST is correctly built by extracting the node property from the match_Expr() function's return value. You can then get the expression's result by calling the evaluate() method on the AST's root node:

$astRoot = (new Parser('1 + 2 * 3'))->match_Expr()['node']; 
var_dump($astRoot, $astRoot->evaluate()); 
 
$astRoot = (new Parser('(1 + 2) * 3'))->match_Expr()['node']; 
var_dump($astRoot, $astRoot->evaluate()); 

Note that the two expressions in this test script should yield two different syntax trees (both shown in the following figure) and evaluate to 7 and 9, respectively.

Building an Abstract Syntax Tree

The two syntax trees resulting from parsing the 1+2*' and (1+2)*' expressions

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.13.192