Defining Abstract Syntax Tree

AST is a tree representation of the abstract syntactic structure of the source code of a programming language. The ASTs of programming constructs, such as expressions, flow control statements, and so on, are grouped into operators and operands. ASTs represent relationships between programming constructs, and not the ways they are generated by grammar. ASTs ignore unimportant programming elements such as punctuations and delimiters. ASTs generally contain additional properties of every element in it, which are useful in further compilation phases. Location of source code is one such property, which can be used to throw an error line number if an error is encountered in determining the correctness of the source code in accordance with the grammar (location, line number, column number, and so on, and other related properties are stored in an object of the SourceManager class in Clang frontend for C++).

The AST is used intensively during semantic analysis, where the compiler checks for correct usage of the elements of the program and the language. The compiler also generates symbol tables based on the AST during semantic analysis. A complete traversal of the tree allows verification of the correctness of the program. After verifying correctness, the AST serves as the base for code generation.

Getting ready

We must have run the lexer by now to obtain the tokens that will be used in generating the AST. The languages we intend to parse consist of expressions, function definitions, and function declarations. Again we have various types of expressions—variables, binary operators, numeric expressions, and so on.

How to do it…

To define AST structure, proceed with the following steps:

  1. Open the toy.cpp file as follows:
    $ vi toy.cpp

    Below the lexer code, define ASTs.

  2. A base class is defined for parsing an expression as follows:
    class BaseAST {
      public :
      virtual ~BaseAST();
    };

    Then, several derived classes are defined for every type of expression to be parsed.

  3. An AST class for variable expressions is defined as follows:
    class VariableAST  : public BaseAST{
      std::string Var_Name;  
    // string object to store name of
    // the variable.
      public:
      VariableAST (std::string &name) : Var_Name(name) {}  // ..// parameterized constructor of variable AST class to be initialized with the string passed to the constructor.
    };
  4. The language has some numeric expressions. The AST class for such numeric expressions can be defined as follows:
    class NumericAST : public BaseAST {
      int numeric_val;
      public :
      NumericAST (intval) :numeric_val(val)  {}
    };
  5. For expressions involving binary operation, the AST class can be defined as follows:
    Class BinaryAST : public BaseAST {
      std::string Bin_Operator;  // string object to store
      // binary operator
      BaseAST  *LHS, *RHS;  // Objects used to store LHS and   
    // RHS of a binary Expression. The LHS and RHS binary   
    // operation can be of any type, hence a BaseAST object 
    // is used to store them.
      public:
      BinaryAST (std::string op, BaseAST *lhs, BaseAST *rhs ) :
      Bin_Operator(op), LHS(lhs), RHS(rhs) {}  // Constructor
      //to initialize binary operator, lhs and rhs of the binary
      //expression.
    };
  6. The AST class for function declaration can be defined as follows:
    class FunctionDeclAST {
      std::string Func_Name;
      std::vector<std::string> Arguments;
      public:
      FunctionDeclAST(const std::string &name, const       std::vector<std::string> &args) :
      Func_Name(name), Arguments(args) {};
    };
  7. The AST class for function definition can be defined as follows:
    class FunctionDefnAST {
      FunctionDeclAST *Func_Decl;
      BaseAST* Body;
      public:
      FunctionDefnAST(FunctionDeclAST *proto, BaseAST *body) :
      Func_Decl(proto), Body(body) {}
    };
  8. The AST class for function call can be defined as follows:
    class FunctionCallAST : public BaseAST {
      std::string Function_Callee;
      std::vector<BaseAST*> Function_Arguments;
      public:
      FunctionCallAST(const std::string &callee, std::vector<BaseAST*> &args) :
      Function_Callee(callee), Function_Arguments(args) {}
    };

The basic skeleton of the AST is now ready to use.

How it works…

The AST acts as a data structure for storing various information about the tokens given by the lexer. This information is generated in the parser logic and ASTs are filled up according to the type of token being parsed.

See also

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.163.250