In this recipe, you will learn how to parse a simple expression. A simple expression may consist of numeric values, identifiers, function calls, function declarations, and function definitions. Each type of expression needs its own parser logic.
We need the custom-defined language (the TOY language in this case) and the stream of tokens generated by the lexer. We have already defined the ASTs above. Here, we parse each expression and invoke the AST constructor that matches its type.
To parse simple expressions, proceed with the following code flow:
Open the toy.cpp file as follows:
$ vi toy.cpp
The toy.cpp file already contains the lexer logic. All of the code that follows needs to be appended after the lexer code in that file.
Define the parser function for a numeric expression as follows:
static BaseAST *numeric_parser() {
  BaseAST *Result = new NumericAST(Numeric_Val);
  next_token();
  return Result;
}
Define the parser function for an identifier expression. Note that an identifier can be a variable reference or a function call. The two are distinguished by checking whether the next token is (. This is implemented as follows:
static BaseAST* identifier_parser() {
  std::string IdName = Identifier_string;
  next_token();
  // No '(' after the identifier: it is a variable reference.
  if(Current_token != '(')
    return new VariableAST(IdName);
  next_token();
  // Otherwise it is a call: collect the comma-separated arguments.
  std::vector<BaseAST*> Args;
  if(Current_token != ')') {
    while(1) {
      BaseAST* Arg = expression_parser();
      if(!Arg) return 0;
      Args.push_back(Arg);
      if(Current_token == ')') break;
      if(Current_token != ',') return 0;
      next_token();
    }
  }
  next_token();
  return new FunctionCallAST(IdName, Args);
}
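The lookahead on ( can be tried out in isolation. The following is a minimal, self-contained sketch and not the recipe's code: Node, Cursor, and parse_identifier are hypothetical names, the cursor replaces the lexer's global state, and arguments are limited to identifiers or nested calls for brevity (the recipe parses full expressions there).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified node standing in for VariableAST / FunctionCallAST.
struct Node {
  bool is_call;                              // false: variable reference
  std::string name;
  std::vector<std::unique_ptr<Node>> args;   // filled only for calls
  explicit Node(std::string n) : is_call(false), name(std::move(n)) {}
};

// Hypothetical cursor over the source text; it replaces the lexer's globals.
struct Cursor {
  std::string text;
  std::size_t pos;
  explicit Cursor(std::string t) : text(std::move(t)), pos(0) {}

  // Skip whitespace and return the next character without consuming it.
  char peek() {
    while (pos < text.size() &&
           std::isspace(static_cast<unsigned char>(text[pos])))
      ++pos;
    return pos < text.size() ? text[pos] : '\0';
  }

  // Consume and return a run of alphanumeric characters (may be empty).
  std::string read_identifier() {
    peek();
    std::string id;
    while (pos < text.size() &&
           std::isalnum(static_cast<unsigned char>(text[pos])))
      id += text[pos++];
    return id;
  }
};

// Returns a variable node, a call node, or nullptr on malformed input.
std::unique_ptr<Node> parse_identifier(Cursor &c) {
  std::unique_ptr<Node> node(new Node(c.read_identifier()));
  if (node->name.empty()) return nullptr;
  if (c.peek() != '(') return node;   // no '(' follows: variable reference
  ++c.pos;                            // consume '('
  node->is_call = true;
  if (c.peek() == ')') { ++c.pos; return node; }  // empty argument list
  while (true) {
    std::unique_ptr<Node> arg = parse_identifier(c);
    if (!arg) return nullptr;
    node->args.push_back(std::move(arg));
    if (c.peek() == ')') { ++c.pos; return node; }
    if (c.peek() != ',') return nullptr;
    ++c.pos;                          // consume ',' and parse the next argument
  }
}

// Usage example: one variable reference and one nested call.
void identifier_example() {
  Cursor var("x");
  assert(!parse_identifier(var)->is_call);
  Cursor call("f(a, g(b), c)");
  assert(parse_identifier(call)->args.size() == 3);
}
```

The sketch uses std::unique_ptr so that partially built argument lists are released on the error paths; this is a design choice of the sketch, whereas the recipe itself uses raw pointers.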
Define the parser function for the function declaration as follows:
static FunctionDeclAST *func_decl_parser() {
  if(Current_token != IDENTIFIER_TOKEN)
    return 0;
  std::string FnName = Identifier_string;
  next_token();
  if(Current_token != '(')
    return 0;
  // Argument names are read as consecutive identifiers, with no commas.
  std::vector<std::string> Function_Argument_Names;
  while(next_token() == IDENTIFIER_TOKEN)
    Function_Argument_Names.push_back(Identifier_string);
  if(Current_token != ')')
    return 0;
  next_token();
  return new FunctionDeclAST(FnName, Function_Argument_Names);
}
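One consequence of the argument loop in func_decl_parser is that argument names are separated by whitespace only, so a declaration such as foo(x, y) is rejected at the comma. The sketch below illustrates this under stated assumptions: DeclSketch, Prototype, and the token values are hypothetical, and the small next_token inside it only approximates the recipe's lexer, which is not shown in this section.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token codes; any other character is returned as its own value.
enum Token_Type { EOF_TOKEN = -1, IDENTIFIER_TOKEN = -2 };

// Hypothetical result type standing in for FunctionDeclAST.
struct Prototype {
  std::string name;
  std::vector<std::string> args;
};

class DeclSketch {
 public:
  explicit DeclSketch(std::string src)
      : src_(std::move(src)), pos_(0), current_(EOF_TOKEN) {
    next_token();  // prime the first token before parsing starts
  }

  // Same control flow as func_decl_parser, returning success or failure.
  bool parse(Prototype &out) {
    if (current_ != IDENTIFIER_TOKEN) return false;
    out.name = identifier_;
    next_token();
    if (current_ != '(') return false;
    out.args.clear();
    while (next_token() == IDENTIFIER_TOKEN) out.args.push_back(identifier_);
    if (current_ != ')') return false;  // e.g. a ',' ends the loop and fails here
    next_token();
    return true;
  }

 private:
  // Minimal lexer: skips whitespace, then yields an identifier or one character.
  int next_token() {
    while (pos_ < src_.size() &&
           std::isspace(static_cast<unsigned char>(src_[pos_])))
      ++pos_;
    if (pos_ >= src_.size()) return current_ = EOF_TOKEN;
    unsigned char ch = static_cast<unsigned char>(src_[pos_]);
    if (std::isalpha(ch)) {
      identifier_.clear();
      while (pos_ < src_.size() &&
             std::isalnum(static_cast<unsigned char>(src_[pos_])))
        identifier_ += src_[pos_++];
      return current_ = IDENTIFIER_TOKEN;
    }
    ++pos_;
    return current_ = ch;
  }

  std::string src_;
  std::size_t pos_;
  int current_;
  std::string identifier_;
};

// Usage example: whitespace-separated names parse, a comma does not.
void decl_example() {
  Prototype ok;
  assert(DeclSketch("foo(x y)").parse(ok) && ok.args.size() == 2);
  Prototype rejected;
  assert(!DeclSketch("foo(x, y)").parse(rejected));
}
```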
Define the parser function for the function definition as follows:
static FunctionDefnAST *func_defn_parser() {
  next_token();
  FunctionDeclAST *Decl = func_decl_parser();
  if(Decl == 0)
    return 0;
  if(BaseAST* Body = expression_parser())
    return new FunctionDefnAST(Decl, Body);
  return 0;
}
Note that the expression_parser function used in the preceding code parses an expression. It can be defined as follows:
static BaseAST* expression_parser() {
  BaseAST *LHS = Base_Parser();
  if(!LHS) return 0;
  return binary_op_parser(0, LHS);
}
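Base_Parser and binary_op_parser are not defined in this section. As a rough illustration of how such a pair can cooperate, the sketch below uses precedence climbing, which is one common technique and only an assumption about how the recipe's binary_op_parser works. ExprSketch, the precedence table, and the prefix-text output are all hypothetical; operands are limited to unsigned integers and parenthesised expressions, and error handling is omitted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical precedence table: higher binds tighter, -1 means "not an operator".
static int precedence(char op) {
  switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return -1;
  }
}

// Hypothetical parser that renders the tree as prefix text instead of AST objects.
class ExprSketch {
 public:
  explicit ExprSketch(std::string s) : src_(std::move(s)), pos_(0) {}

  // Mirrors expression_parser: parse one operand, then any operators after it.
  std::string expression() {
    std::string lhs = primary();
    return binary_op(0, lhs);
  }

 private:
  // Skip whitespace and return the next character without consuming it.
  char peek() {
    while (pos_ < src_.size() &&
           std::isspace(static_cast<unsigned char>(src_[pos_])))
      ++pos_;
    return pos_ < src_.size() ? src_[pos_] : '\0';
  }

  // Stand-in for Base_Parser: a parenthesised expression or an integer literal.
  std::string primary() {
    if (peek() == '(') {
      ++pos_;                          // consume '('
      std::string inner = expression();
      if (peek() == ')') ++pos_;       // consume ')'
      return inner;
    }
    std::string num;
    while (pos_ < src_.size() &&
           std::isdigit(static_cast<unsigned char>(src_[pos_])))
      num += src_[pos_++];
    return num;
  }

  // Stand-in for binary_op_parser: fold operators with precedence >= min_prec.
  std::string binary_op(int min_prec, std::string lhs) {
    while (true) {
      char op = peek();
      int prec = precedence(op);
      if (prec < min_prec) return lhs; // not an operator, or binds too loosely
      ++pos_;                          // consume the operator
      std::string rhs = primary();
      // A tighter operator on the right takes rhs as its left operand first.
      if (precedence(peek()) > prec) rhs = binary_op(prec + 1, rhs);
      lhs = "(" + std::string(1, op) + " " + lhs + " " + rhs + ")";
    }
  }

  std::string src_;
  std::size_t pos_;
};

// Usage example: '*' binds tighter than '+', and '-' groups to the left.
void expression_example() {
  assert(ExprSketch("1 + 2 * 3").expression() == "(+ 1 (* 2 3))");
  assert(ExprSketch("1 - 2 - 3").expression() == "(- (- 1 2) 3)");
}
```

Passing 0 as the minimum precedence, as expression_parser does, means every operator in the table is accepted at the top level, while a non-operator character (precedence -1) ends the expression.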
When a numeric token is encountered, the parser invokes the constructor for the numeric expression and returns the resulting AST object, which holds the numeric value.
Similarly, for identifier expressions, the parsed data will be either a variable or a function call. For function declarations and definitions, the name of the function and the function arguments are parsed, and the corresponding AST class constructors are invoked.