Parsing simple expressions

In this recipe, you will learn how to parse a simple expression. A simple expression may consist of numeric values, identifiers, function calls, function declarations, and function definitions. Each type of expression needs its own parser logic.

Getting ready

We need the custom-defined language (the TOY language in this case) and the stream of tokens generated by the lexer. The ASTs have already been defined. Next, we parse each expression and invoke the AST constructor that matches its type.

How to do it…

To parse simple expressions, follow these steps:

  1. Open the toy.cpp file as follows:
    $ vi toy.cpp

    We already have lexer logic present in the toy.cpp file. Whatever code follows needs to be appended after the lexer code in the toy.cpp file.

  2. Define the parser function for a numeric expression as follows:
    static BaseAST *numeric_parser() {
      BaseAST *Result = new NumericAST(Numeric_Val);
      next_token();
      return Result;
    }
  3. Define the parser function for an identifier expression. Note that an identifier can be either a variable reference or a function call. The two are distinguished by checking whether the next token is (. This is implemented as follows:
    static BaseAST* identifier_parser() {
      std::string IdName = Identifier_string;

      next_token();  // consume the identifier

      // No '(' follows, so this is a plain variable reference.
      if(Current_token != '(')
        return new VariableAST(IdName);

      next_token();  // consume '('

      // Parse the comma-separated argument list of the call.
      std::vector<BaseAST*> Args;
      if(Current_token != ')') {
        while(1) {
          BaseAST* Arg = expression_parser();
          if(!Arg) return 0;
          Args.push_back(Arg);

          if(Current_token == ')') break;

          if(Current_token != ',')
            return 0;
          next_token();  // consume ','
        }
      }
      next_token();  // consume ')'

      return new FunctionCallAST(IdName, Args);
    }
  4. Define the parser function for the function declaration as follows:
    static FunctionDeclAST *func_decl_parser() {
      // A declaration must start with the function name.
      if(Current_token != IDENTIFIER_TOKEN)
        return 0;

      std::string FnName = Identifier_string;
      next_token();

      if(Current_token != '(')
        return 0;

      // Collect consecutive identifier tokens as the argument names.
      std::vector<std::string> Function_Argument_Names;
      while(next_token() == IDENTIFIER_TOKEN)
        Function_Argument_Names.push_back(Identifier_string);
      if(Current_token != ')')
        return 0;

      next_token();  // consume ')'

      return new FunctionDeclAST(FnName, Function_Argument_Names);
    }
  5. Define the parser function for the function definition as follows:
    static FunctionDefnAST *func_defn_parser() {
      next_token();  // advance to the start of the declaration
      FunctionDeclAST *Decl = func_decl_parser();
      if(Decl == 0) return 0;

      // The function body is a single expression.
      if(BaseAST* Body = expression_parser())
        return new FunctionDefnAST(Decl, Body);
      return 0;
    }

    Note that the expression_parser function used in the preceding code parses an expression. Because identifier_parser and func_defn_parser call it, it must be declared before them in toy.cpp. The function can be defined as follows:

    static BaseAST* expression_parser() {
      BaseAST *LHS = Base_Parser();
      if(!LHS) return 0;
      return binary_op_parser(0, LHS);
    }
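
The variable-versus-call check in step 3 can be tried in isolation. What follows is a minimal sketch, not the toy.cpp code: Node, Cursor, and parse_identifier are hypothetical stand-ins for the recipe's AST classes, lexer state, and identifier_parser, and arguments are restricted to identifier expressions so that the example does not need expression_parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for VariableAST and FunctionCallAST.
struct Node {
  enum Kind { Variable, Call } kind;
  std::string name;
  std::vector<std::unique_ptr<Node>> args;  // filled only for calls
};

// Hypothetical cursor over raw text; it plays the role that Current_token
// and next_token() play in the recipe.
struct Cursor {
  std::string text;
  std::size_t pos = 0;
  explicit Cursor(std::string t) : text(std::move(t)) {}

  void skip_spaces() {
    while (pos < text.size() &&
           std::isspace(static_cast<unsigned char>(text[pos])))
      ++pos;
  }
  // Returns the next non-space character without consuming it.
  char peek() {
    skip_spaces();
    return pos < text.size() ? text[pos] : '\0';
  }
  // Consumes and returns a run of alphanumeric characters (may be empty).
  std::string read_identifier() {
    skip_spaces();
    std::string id;
    while (pos < text.size() &&
           std::isalnum(static_cast<unsigned char>(text[pos])))
      id += text[pos++];
    return id;
  }
};

// Parses `name` or `name(arg, arg, ...)`. Returns nullptr on a syntax error.
std::unique_ptr<Node> parse_identifier(Cursor &c) {
  std::string name = c.read_identifier();
  if (name.empty()) return nullptr;

  auto node = std::make_unique<Node>();
  node->name = name;

  // No '(' after the name: a plain variable reference.
  if (c.peek() != '(') {
    node->kind = Node::Variable;
    return node;
  }

  node->kind = Node::Call;
  ++c.pos;  // consume '('
  if (c.peek() != ')') {
    while (true) {
      auto arg = parse_identifier(c);
      if (!arg) return nullptr;
      node->args.push_back(std::move(arg));
      if (c.peek() == ')') break;
      if (c.peek() != ',') return nullptr;
      ++c.pos;  // consume ','
    }
  }
  assert(c.peek() == ')');
  ++c.pos;  // consume ')'
  return node;
}
```

Under these assumptions, parsing the text x yields a Variable node, while foo(a, bar(b), c) yields a Call node with three arguments, the second of which is itself a call. The sketch uses std::unique_ptr so that a failed parse frees the nodes built so far; the recipe's code uses raw pointers instead.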

How it works…

When a numeric token is encountered, the parser invokes the constructor for the numeric expression, which stores the numeric value in a new AST object, and then returns that object.
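
The order of operations matters here: the value is captured before the parser advances to the next token. The following sketch illustrates that pattern under stated assumptions. Token, TokenStream, NumericNode, and parse_numeric are hypothetical names, and an int value is assumed for simplicity; none of this is the toy.cpp code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical token: a flag saying whether it is a number, plus its value.
struct Token {
  bool is_number;
  int value;
};

// Hypothetical stand-in for NumericAST.
struct NumericNode {
  int value;
};

// Hypothetical token stream standing in for the lexer state.
struct TokenStream {
  std::vector<Token> tokens;
  std::size_t pos = 0;
  explicit TokenStream(std::vector<Token> t) : tokens(std::move(t)) {}

  bool done() const { return pos >= tokens.size(); }
  const Token &current() const {
    assert(!done());
    return tokens[pos];
  }
  void advance() { ++pos; }
};

// Builds a node from the current numeric token, then moves past it.
// Returns nullptr if the current token is not a number.
std::unique_ptr<NumericNode> parse_numeric(TokenStream &ts) {
  if (ts.done() || !ts.current().is_number) return nullptr;
  auto node = std::make_unique<NumericNode>();
  node->value = ts.current().value;  // capture the value first
  ts.advance();                      // then consume the token
  return node;
}
```

Calling parse_numeric twice on a stream holding 42 and 7 returns nodes with those values in order, and a third call returns nullptr because the stream is exhausted.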

Similarly, an identifier expression is parsed as either a variable reference or a function call. For function declarations and definitions, the name of the function and the names of its arguments are parsed, and the corresponding AST class constructors are invoked.
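
As written, func_decl_parser collects consecutive identifier tokens as argument names, so a comma inside the parentheses ends the loop and the declaration is rejected. The sketch below mirrors that behavior on plain strings. Prototype, tokenize, and parse_prototype are hypothetical names introduced for illustration only and are not part of toy.cpp.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for FunctionDeclAST.
struct Prototype {
  std::string name;
  std::vector<std::string> params;
};

// A token counts as an identifier if it starts with a letter.
static bool is_identifier(const std::string &tok) {
  return !tok.empty() && std::isalpha(static_cast<unsigned char>(tok[0]));
}

// Splits text into identifier tokens and single-character tokens.
std::vector<std::string> tokenize(const std::string &text) {
  std::vector<std::string> toks;
  std::size_t i = 0;
  while (i < text.size()) {
    unsigned char ch = static_cast<unsigned char>(text[i]);
    if (std::isspace(ch)) {
      ++i;
    } else if (std::isalpha(ch)) {
      std::string id;
      while (i < text.size() &&
             std::isalnum(static_cast<unsigned char>(text[i])))
        id += text[i++];
      toks.push_back(id);
    } else {
      toks.push_back(std::string(1, text[i++]));
    }
  }
  return toks;
}

// Parses `name ( id id ... )`. Returns nullptr on a syntax error.
std::unique_ptr<Prototype> parse_prototype(const std::vector<std::string> &toks) {
  std::size_t pos = 0;
  if (pos >= toks.size() || !is_identifier(toks[pos])) return nullptr;

  auto proto = std::make_unique<Prototype>();
  proto->name = toks[pos++];
  assert(!proto->name.empty());

  if (pos >= toks.size() || toks[pos] != "(") return nullptr;
  ++pos;  // consume "("

  // Collect consecutive identifiers as parameter names.
  while (pos < toks.size() && is_identifier(toks[pos]))
    proto->params.push_back(toks[pos++]);

  // Anything other than ")" at this point, including ",", is an error.
  if (pos >= toks.size() || toks[pos] != ")") return nullptr;
  return proto;
}
```

With this sketch, foo(x y) parses into the name foo with parameters x and y, whereas foo(x, y) returns nullptr, matching the way the recipe's loop stops at the first token that is not an identifier.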
