Loops make a language powerful enough to perform the same operation several times, with limited lines of code. Loops are present in almost every language. This recipe demonstrates how loops are handled in the TOY language.
A loop typically has a start that initializes the induction variable, a step that indicates an increment or decrement in the induction variable, and an end condition for termination of the loop. The loop in our TOY language can be defined as follows:
for i = 1, i < n, 1 in x + y;
The start expression is the initialization of i = 1
. The end condition for the loop is i<n
. The first line of the code indicates i
be incremented by 1
.
As long as the end condition is true, the loop iterates and, after each iteration, the induction variable, i
, is incremented by 1. An interesting thing called PHI node comes into the picture to decide which value the induction variable, i
, will take. Remember that our IR is in the single static assignment (SSA) form. In a control flow graph, for a given variable, the values can come from two different blocks. To represent SSA in LLVM IR, the phi
instruction is defined. Here is an example of phi
:
%i = phi i32 [ 1, %entry ], [ %nextvar, %loop ]
The preceding IR indicates that the value for i
can come from two basic blocks: %entry
and %loop
. The value from the %entry
block will be 1
, while the %nextvar
variable will be from %loop
. We will see the details after implementing the loop for our toy compiler.
Like any other expression, loops are also handled by including states in lexer, defining the AST data structure to hold loop values, and defining the parser and the Codegen()
function to generate the LLVM IR:
toy.cpp
file:enum Token_Type { … … FOR_TOKEN, IN_TOKEN … … };
static int get_token() { … … if (Identifier_string == "else") return ELSE_TOKEN; if (Identifier_string == "for") return FOR_TOKEN; if (Identifier_string == "in") return IN_TOKEN; … … }
for
loop:class ExprForAST : public BaseAST { std::string Var_Name; BaseAST *Start, *End, *Step, *Body; public: ExprForAST (const std::string &varname, BaseAST *start, BaseAST *end, BaseAST *step, BaseAST *body) : Var_Name(varname), Start(start), End(end), Step(step), Body(body) {} Value *Codegen() override; };
static BaseAST *For_parser() { next_token(); if (Current_token != IDENTIFIER_TOKEN) return 0; std::string IdName = Identifier_string; next_token(); if (Current_token != '=') return 0; next_token(); BaseAST *Start = expression_parser(); if (Start == 0) return 0; if (Current_token != ',') return 0; next_token(); BaseAST *End = expression_parser(); if (End == 0) return 0; BaseAST *Step = 0; if (Current_token == ',') { next_token(); Step = expression_parser(); if (Step == 0) return 0; } if (Current_token != IN_TOKEN) return 0; next_token(); BaseAST *Body = expression_parser(); if (Body == 0) return 0; return new ExprForAST (IdName, Start, End, Step, Body); }
Codegen()
function to generate the LLVM IR:Value *ExprForAST::Codegen() { Value *StartVal = Start->Codegen(); if (StartVal == 0) return 0; Function *TheFunction = Builder.GetInsertBlock()->getParent(); BasicBlock *PreheaderBB = Builder.GetInsertBlock(); BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); Builder.CreateBr(LoopBB); Builder.SetInsertPoint(LoopBB); PHINode *Variable = Builder.CreatePHI(Type::getInt32Ty(getGlobalContext()), 2, Var_Name.c_str()); Variable->addIncoming(StartVal, PreheaderBB); Value *OldVal = Named_Values[Var_Name]; Named_Values[Var_Name] = Variable; if (Body->Codegen() == 0) return 0; Value *StepVal; if (Step) { StepVal = Step->Codegen(); if (StepVal == 0) return 0; } else { StepVal = ConstantInt::get(Type::getInt32Ty(getGlobalContext()), 1); } Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar"); Value *EndCond = End->Codegen(); if (EndCond == 0) return EndCond; EndCond = Builder.CreateICmpNE( EndCond, ConstantInt::get(Type::getInt32Ty(getGlobalContext()), 0), "loopcond"); BasicBlock *LoopEndBB = Builder.GetInsertBlock(); BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); Builder.CreateCondBr(EndCond, LoopBB, AfterBB); Builder.SetInsertPoint(AfterBB); Variable->addIncoming(NextVar, LoopEndBB); if (OldVal) Named_Values[Var_Name] = OldVal; else Named_Values.erase(Var_Name); return Constant::getNullValue(Type::getInt32Ty(getGlobalContext())); }
Do the following steps:
toy.cpp
file:$ g++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core ` -O3 -o toy
$ vi example
for
loop in the example file:def printstar(n x) for i = 1, i < n, 1.0 in x + 1
$ ./toy example
for
loop code will be generated, as follows:; ModuleID = 'my compiler' target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" define i32 @printstar(i32 %n, i32 %x) { entry: br label %loop loop: ; preds = %loop, %entry %i = phi i32 [ 1, %entry ], [ %nextvar, %loop ] %nextvar = add i32 %i, 1 %cmptmp = icmp ult i32 %i, %n br i1 %cmptmp, label %loop, label %afterloop afterloop: ; preds = %loop ret i32 0 }
The parser you just saw identifies the loop, initialization of the induction variable, the termination condition, the step value for the induction variable, and the body of the loop. It then converts each of the blocks in LLVM IR, as seen previously.
As seen previously, a phi
instruction gets two values for the variable i
from two basic blocks: %entry
and %loop
. In the preceding case, the %entry
block represents the value assigned to the induction variable at the start of the loop (this is 1
). The next updated value of i
comes from the %loop
block, which completes one iteration of the loop.
3.149.253.210