Generating code for loops

Loops make a language powerful enough to perform the same operation several times, with limited lines of code. Loops are present in almost every language. This recipe demonstrates how loops are handled in the TOY language.

Getting ready

A loop typically has a start that initializes the induction variable, a step that indicates an increment or decrement in the induction variable, and an end condition for termination of the loop. The loop in our TOY language can be defined as follows:

for i = 1, i < n, 1 in
     x + y;

The start expression is the initialization of i = 1. The end condition for the loop is i<n. The first line of the code indicates i be incremented by 1.

As long as the end condition is true, the loop iterates and, after each iteration, the induction variable, i, is incremented by 1. An interesting thing called PHI node comes into the picture to decide which value the induction variable, i, will take. Remember that our IR is in the single static assignment (SSA) form. In a control flow graph, for a given variable, the values can come from two different blocks. To represent SSA in LLVM IR, the phi instruction is defined. Here is an example of phi:

%i = phi i32 [ 1, %entry ], [ %nextvar, %loop ]

The preceding IR indicates that the value for i can come from two basic blocks: %entry and %loop. The value from the %entry block will be 1, while the %nextvar variable will be from %loop. We will see the details after implementing the loop for our toy compiler.

How to do it...

Like any other expression, loops are also handled by including states in lexer, defining the AST data structure to hold loop values, and defining the parser and the Codegen() function to generate the LLVM IR:

  1. The first step is to define tokens in the lexer in toy.cpp file:
    enum Token_Type {
      …
      …
      FOR_TOKEN,
      IN_TOKEN
      …
      …
    };
  2. Then we include the logic in the lexer:
    static int get_token() {
      …
      …
    if (Identifier_string == "else")
          return ELSE_TOKEN;
        if (Identifier_string == "for")
          return FOR_TOKEN;
        if (Identifier_string == "in")
          return IN_TOKEN;
      …
      …
    }
  3. The next step is to define the AST for the for loop:
    class ExprForAST  : public BaseAST {
      std::string Var_Name;
      BaseAST *Start, *End, *Step, *Body;
    
    public:
      ExprForAST (const std::string &varname, BaseAST *start, BaseAST *end,
                 BaseAST *step, BaseAST *body)
          : Var_Name(varname), Start(start), End(end), Step(step), Body(body) {}
      Value *Codegen() override;
    };
  4. Then we define the parser logic for the loop:
    static BaseAST *For_parser() {
      next_token();
    
      if (Current_token != IDENTIFIER_TOKEN)
        return 0;
    
      std::string IdName = Identifier_string;
      next_token();
    
      if (Current_token != '=')
        return 0;
      next_token();
    
      BaseAST *Start = expression_parser();
      if (Start == 0)
        return 0;
      if (Current_token != ',')
        return 0;
      next_token();
    
      BaseAST *End = expression_parser();
      if (End == 0)
        return 0;
    
      BaseAST *Step = 0;
      if (Current_token == ',') {
        next_token();
        Step = expression_parser();
        if (Step == 0)
          return 0;
      }
    
      if (Current_token != IN_TOKEN)
        return 0;
      next_token();
    
      BaseAST *Body = expression_parser();
      if (Body == 0)
        return 0;
    
      return new ExprForAST (IdName, Start, End, Step, Body);
    }
  5. Next we define the Codegen() function to generate the LLVM IR:
    Value *ExprForAST::Codegen() {
    
      Value *StartVal = Start->Codegen();
      if (StartVal == 0)
        return 0;
    
      Function *TheFunction = Builder.GetInsertBlock()->getParent();
      BasicBlock *PreheaderBB = Builder.GetInsertBlock();
      BasicBlock *LoopBB =
          BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
    
      Builder.CreateBr(LoopBB);
    
      Builder.SetInsertPoint(LoopBB);
    
      PHINode *Variable = Builder.CreatePHI(Type::getInt32Ty(getGlobalContext()), 2, Var_Name.c_str());
      Variable->addIncoming(StartVal, PreheaderBB);
    
      Value *OldVal = Named_Values[Var_Name];
      Named_Values[Var_Name] = Variable;
    
      if (Body->Codegen() == 0)
        return 0;
    
      Value *StepVal;
      if (Step) {
        StepVal = Step->Codegen();
        if (StepVal == 0)
          return 0;
      } else {
        StepVal = ConstantInt::get(Type::getInt32Ty(getGlobalContext()), 1);
      }
    
      Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
    
      Value *EndCond = End->Codegen();
      if (EndCond == 0)
        return EndCond;
    
      EndCond = Builder.CreateICmpNE(
          EndCond, ConstantInt::get(Type::getInt32Ty(getGlobalContext()), 0), "loopcond");
    
      BasicBlock *LoopEndBB = Builder.GetInsertBlock();
      BasicBlock *AfterBB =
          BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
    
      Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
    
      Builder.SetInsertPoint(AfterBB);
    
      Variable->addIncoming(NextVar, LoopEndBB);
    
      if (OldVal)
        Named_Values[Var_Name] = OldVal;
      else
        Named_Values.erase(Var_Name);
    
      return Constant::getNullValue(Type::getInt32Ty(getGlobalContext()));
    }

How it works...

Do the following steps:

  1. Compile the toy.cpp file:
    $ g++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core ` -O3 -o toy
    
  2. Open an example file:
    $ vi example
    
  3. Write the following code for a for loop in the example file:
    def printstar(n x)
      for i = 1, i < n, 1.0 in
        x + 1
  4. Compile the example file with the TOY compiler:
    $ ./toy example
    
  5. The LLVM IR for the preceding for loop code will be generated, as follows:
    ; ModuleID = 'my compiler'
    target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
    
    define i32 @printstar(i32 %n, i32 %x) {
    entry:
      br label %loop
    
    loop:                                             ; preds = %loop, %entry
      %i = phi i32 [ 1, %entry ], [ %nextvar, %loop ]
      %nextvar = add i32 %i, 1
      %cmptmp = icmp ult i32 %i, %n
      br i1 %cmptmp, label %loop, label %afterloop
    
    afterloop:                                        ; preds = %loop
      ret i32 0
    }

The parser you just saw identifies the loop, initialization of the induction variable, the termination condition, the step value for the induction variable, and the body of the loop. It then converts each of the blocks in LLVM IR, as seen previously.

As seen previously, a phi instruction gets two values for the variable i from two basic blocks: %entry and %loop. In the preceding case, the %entry block represents the value assigned to the induction variable at the start of the loop (this is 1). The next updated value of i comes from the %loop block, which completes one iteration of the loop.

See also

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.2.225