Currently, our parser interprets the input code and evaluates it in the same pass. Most compilers and interpreters; however, create an intermediate data structure before actually running the program: an Abstract Syntax Tree (AST). Using an AST offers some interesting possibilities; for example, it provides you with a structured representation of your program that you can then analyze. Also, you can use the AST and transform it back into a text-based program (maybe of another language).
An AST is a tree that represents the structure of a program. The first step to building an AST-based parser is to design the tree's object model: which classes are needed and in which way are they associated to another. The following figure shows the first draft for an object model that can be used to describe mathematical expressions:
In this model, nearly all classes implement the Expression
interface. This interface prescribes the evaluate()
method, which can be provided by the implementations of this interface to actually execute the operation modeled by the respective tree node. Let's start by implementing the PacktChp8DSLASTExpression
interface:
namespace PacktChp8DSLAST; interface Expression { public function evaluate() }
The next step is the Number
class with its two subclasses: Integer
and Decimal
. As we're going to be using PHP 7's type hinting feature, and both the Integer
and Decimal
classes work exclusively with either the int
or float
variables; we cannot make much use of inheritance, forcing us to leave the Number
class empty:
namespace PacktChp8DSLAST; abstract class Number implements Expression {}
The Integer
class can be initialized with a PHP integer value. As this class models a literal integer value; the only thing that the evaluate()
method needs to do in this class is to return this value again:
namespace PacktChp8DSLAST; class Integer extends Number { private $value; public function __construct(int $value) { $this->value = $value; } public function evaluate(): int { return $this->value; } }
The Decimal
class can be implemented the same way; in this case, simply use float
instead of int
as type hints:
namespace PacktChp8DSLAST; class Decimal extends Number { private $value; public function __construct(float $value) { $this->value = $value; } public function evaluate(): float { return $this->value; } }
For the classes Addition
, Subtraction
, Multiplication
and Division
, we'll be using a common base class, PacktChp8DSLASTBinaryOperation
. This class will hold the constructor that you then won't have to implement over and over again:
namespace PacktChp8DSLAST; abstract class BinaryOperation implements Expression { protected $left; protected $right; public function __construct(Expression $left, Expression $right) { $this->left = $left; $this->right = $right; } }
Implementing the actual classes modeling the operations becomes easy. Let's consider the Addition
class as an example:
namespace PacktChp8DSLAST;
class Addition extends BinaryOperation
{
public function evaluate()
{
return $this->left->evaluate() + $this->right->evaluate();
}
}
The remaining classes called Subtraction
, Multiplication
and Division
can be implemented in a way similar to the Addition
class. For the sake of brevity, the actual implementation of these classes is left as an exercise for you.
What's left now is to actually build the AST in the parser. This is relatively easy, as we can now simply modify the already existing hook functions that are called by the parser when individual rules are matched.
Let's start with the rules for parsing numbers:
Integer: value:('-'? Digit+) function value(array &$result, array $sub) { $result['node'] = new Integer((int) $sub['text']); } Decimal: value:('-'? Digit* '.' Digit+) function value(array &$result, array $sub) { $result['node'] = new Decimal((float) $sub['text']); } Number: Decimal | Integer function Decimal(&$result, $sub) { $result['node'] = $sub['node']; } function Integer(&$result, $sub) { $result['node'] = $sub['node']; }
When the Integer
or Decimal
rule matches, we create a new AST node of the Integer
or Decimal
class and save it in the return array's node property. When the Number
rule matches, we simply take over the already created node stored in the matched symbol.
We can adjust the Product
rule in a similar way:
Product: left:Value (operand:(> operator:('*'|'/') > right:Value))* function left(array &$result, array $sub) { $result['node'] = $sub['node']; } function right(array &$result, array $sub) { $result['node'] = $sub['node']; } function operator(array &$result, array $sub) { $result['operator'] = $sub['text']; } function operand(array &$result, array $sub) { if ($sub['operator'] == '*') { $result['node'] = new Multiplication($result['node'], $sub['node']); } else { $result['node'] = new Division($result['node'], $sub['node']); } }
As our AST model treats operations such as multiplications strictly as binary operations, the parser will deconstruct input expressions such as 1 * 2 * 3 * 4
into a chain of binary multiplications (similar to 1 * (2 * (3 * 4))
as shown in the following figure):
Continue by adjusting your Sum
rule in the same way:
Sum: left:Product (operand:(> operator:('+'|'-') > right:Product))* function left(&$result, $sub) { $result['node'] = $sub['node']; } function right(&$result, $sub) { $result['node'] = $sub['node']; } function operator(&$result, $sub) { $result['operator'] = $sub['text']; } function operand(&$result, $sub) { if ($sub['operator'] == '+') { $result['node'] = new Addition($result['node'], $sub['node']); } else { $result['node'] = new Subtraction($result['node'], $sub['node']); } }
Now, all that's left is to read the created AST node in the Value
and Expr
rules is follows:
Value: Number | '(' > Expr > ')' function Number(array &$result, array $sub) { $result['node'] = $sub['node']; } Expr: Sum function Sum(array &$result, array $sub) { $result['node'] = $sub['node']; }
In your test script, you can now test if the AST is correctly built by extracting the node
property from the match_Expr()
function's return value. You can then get the expression's result by calling the evaluate()
method on the AST's root node:
$astRoot = (new Parser('1 + 2 * 3'))->match_Expr()['node']; var_dump($astRoot, $astRoot->evaluate()); $astRoot = (new Parser('(1 + 2) * 3'))->match_Expr()['node']; var_dump($astRoot, $astRoot->evaluate());
Note that the two expressions in this test script should yield two different syntax trees (both shown in the following figure) and evaluate to 7 and 9, respectively.
3.147.13.192