Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Generating code for loops

Loops make a language powerful enough to perform the same operation several times, with limited lines of code. Loops are present in almost every language. This recipe demonstrates how loops are handled in the TOY language.

Getting ready

A loop typically has a start that initializes the induction variable, a step that indicates an increment or decrement in the induction variable, and an end condition for termination of the loop. The loop in our TOY language can be defined as follows:

for i = 1, i < n, 1 in
     x + y;

The start expression is the initialization of i = 1. The end condition for the loop is i<n. The first line of the code indicates i be incremented by 1.

As long as the end condition is true, the loop iterates and, after each iteration, the induction variable, i, is incremented by 1. An interesting thing called PHI node comes into the picture to decide which value the induction variable, i, will take. Remember that our IR is in the single static assignment (SSA) form. In a control flow graph, for a given variable, the values can come from two different blocks. To represent SSA in LLVM IR, the phi instruction is defined. Here is an example of phi:

%i = phi i32 [ 1, %entry ], [ %nextvar, %loop ]

The preceding IR indicates that the value for i can come from two basic blocks: %entry and %loop. The value from the %entry block will be 1, while the %nextvar variable will be from %loop. We will see the details after implementing the loop for our toy compiler.

How to do it...

Like any other expression, loops are also handled by including states in lexer, defining the AST data structure to hold loop values, and defining the parser and the Codegen() function to generate the LLVM IR:

The first step is to define tokens in the lexer in toy.cpp file:

enum Token_Type {
  …
  …
  FOR_TOKEN,
  IN_TOKEN
  …
  …
};

Then we include the logic in the lexer:

static int get_token() {
  …
  …
if (Identifier_string == "else")
      return ELSE_TOKEN;
    if (Identifier_string == "for")
      return FOR_TOKEN;
    if (Identifier_string == "in")
      return IN_TOKEN;
  …
  …
}

The next step is to define the AST for the for loop:

class ExprForAST  : public BaseAST {
  std::string Var_Name;
  BaseAST *Start, *End, *Step, *Body;

public:
  ExprForAST (const std::string &varname, BaseAST *start, BaseAST *end,
             BaseAST *step, BaseAST *body)
      : Var_Name(varname), Start(start), End(end), Step(step), Body(body) {}
  Value *Codegen() override;
};

Then we define the parser logic for the loop:

static BaseAST *For_parser() {
  next_token();

  if (Current_token != IDENTIFIER_TOKEN)
    return 0;

  std::string IdName = Identifier_string;
  next_token();

  if (Current_token != '=')
    return 0;
  next_token();

  BaseAST *Start = expression_parser();
  if (Start == 0)
    return 0;
  if (Current_token != ',')
    return 0;
  next_token();

  BaseAST *End = expression_parser();
  if (End == 0)
    return 0;

  BaseAST *Step = 0;
  if (Current_token == ',') {
    next_token();
    Step = expression_parser();
    if (Step == 0)
      return 0;
  }

  if (Current_token != IN_TOKEN)
    return 0;
  next_token();

  BaseAST *Body = expression_parser();
  if (Body == 0)
    return 0;

  return new ExprForAST (IdName, Start, End, Step, Body);
}

Next we define the Codegen() function to generate the LLVM IR:

Value *ExprForAST::Codegen() {

  Value *StartVal = Start->Codegen();
  if (StartVal == 0)
    return 0;

  Function *TheFunction = Builder.GetInsertBlock()->getParent();
  BasicBlock *PreheaderBB = Builder.GetInsertBlock();
  BasicBlock *LoopBB =
      BasicBlock::Create(getGlobalContext(), "loop", TheFunction);

  Builder.CreateBr(LoopBB);

  Builder.SetInsertPoint(LoopBB);

  PHINode *Variable = Builder.CreatePHI(Type::getInt32Ty(getGlobalContext()), 2, Var_Name.c_str());
  Variable->addIncoming(StartVal, PreheaderBB);

  Value *OldVal = Named_Values[Var_Name];
  Named_Values[Var_Name] = Variable;

  if (Body->Codegen() == 0)
    return 0;

  Value *StepVal;
  if (Step) {
    StepVal = Step->Codegen();
    if (StepVal == 0)
      return 0;
  } else {
    StepVal = ConstantInt::get(Type::getInt32Ty(getGlobalContext()), 1);
  }

  Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");

  Value *EndCond = End->Codegen();
  if (EndCond == 0)
    return EndCond;

  EndCond = Builder.CreateICmpNE(
      EndCond, ConstantInt::get(Type::getInt32Ty(getGlobalContext()), 0), "loopcond");

  BasicBlock *LoopEndBB = Builder.GetInsertBlock();
  BasicBlock *AfterBB =
      BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);

  Builder.CreateCondBr(EndCond, LoopBB, AfterBB);

  Builder.SetInsertPoint(AfterBB);

  Variable->addIncoming(NextVar, LoopEndBB);

  if (OldVal)
    Named_Values[Var_Name] = OldVal;
  else
    Named_Values.erase(Var_Name);

  return Constant::getNullValue(Type::getInt32Ty(getGlobalContext()));
}

How it works...

Do the following steps:

Compile the toy.cpp file:

$ g++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core ` -O3 -o toy

Open an example file:
```
$ vi example
```
Write the following code for a for loop in the example file:
```
def printstar(n x)
  for i = 1, i < n, 1.0 in
    x + 1
```
Compile the example file with the TOY compiler:
```
$ ./toy example
```

The LLVM IR for the preceding for loop code will be generated, as follows:

; ModuleID = 'my compiler'
target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"

define i32 @printstar(i32 %n, i32 %x) {
entry:
  br label %loop

loop:                                             ; preds = %loop, %entry
  %i = phi i32 [ 1, %entry ], [ %nextvar, %loop ]
  %nextvar = add i32 %i, 1
  %cmptmp = icmp ult i32 %i, %n
  br i1 %cmptmp, label %loop, label %afterloop

afterloop:                                        ; preds = %loop
  ret i32 0
}

The parser you just saw identifies the loop, initialization of the induction variable, the termination condition, the step value for the induction variable, and the body of the loop. It then converts each of the blocks in LLVM IR, as seen previously.

As seen previously, a phi instruction gets two values for the variable i from two basic blocks: %entry and %loop. In the preceding case, the %entry block represents the value assigned to the induction variable at the start of the loop (this is 1). The next updated value of i comes from the %loop block, which completes one iteration of the loop.

Table of Contents for
Generating code for loops

Generating code for loops

Getting ready

How to do it...

How it works...

See also

Table of Contents for Generating code for loops

Create new playlist

Sign In

Sign Up

Generating code for loops

Getting ready

How to do it...

How it works...

See also

Table of Contents for
Generating code for loops