Chapter 2. Building LLVM IR

A high-level programming language facilitates human interaction with the target machine. Most popular high-level languages today share certain basic elements such as variables, loops, if-else decision statements, blocks, functions, and so on. A variable holds a value of some data type; a block delimits the scope of a variable. An if-else decision statement selects one of several paths through the code. A function makes a block of code reusable. High-level languages may differ in type checking, type casting, variable declarations, complex data types, and so on. However, almost every language has the basic building blocks listed above.

A language may have its own parser, which tokenizes each statement and extracts meaningful information such as an identifier and its data type; a function name with its declaration, definition, and calls; a loop condition; and so on. This information may be stored in a data structure from which the flow of the code can be easily retrieved. The Abstract Syntax Tree (AST) is a popular tree representation of source code. ASTs can be used for further transformation and analysis.
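To make the idea of an AST concrete, here is a minimal sketch of a tree for integer arithmetic expressions. The class names (ExprAST, NumberExpr, BinaryExpr) and the helper buildSampleTree() are illustrative assumptions, not part of LLVM or of any particular front end; a real compiler's AST would also carry variables, types, and source locations.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Base class for every expression node in this illustrative AST.
struct ExprAST {
  virtual ~ExprAST() = default;
  virtual int eval() const = 0;         // a simple analysis: evaluate the tree
  virtual std::string str() const = 0;  // render the tree back to text
};

// Leaf node: an integer literal.
struct NumberExpr : ExprAST {
  int Value;
  explicit NumberExpr(int V) : Value(V) {}
  int eval() const override { return Value; }
  std::string str() const override { return std::to_string(Value); }
};

// Interior node: a binary operator applied to two sub-expressions.
struct BinaryExpr : ExprAST {
  char Op;
  std::unique_ptr<ExprAST> LHS, RHS;
  BinaryExpr(char O, std::unique_ptr<ExprAST> L, std::unique_ptr<ExprAST> R)
      : Op(O), LHS(std::move(L)), RHS(std::move(R)) {
    assert(LHS && RHS && "a binary node needs two operands");
  }
  int eval() const override {
    int L = LHS->eval(), R = RHS->eval();
    switch (Op) {
    case '+': return L + R;
    case '-': return L - R;
    case '*': return L * R;
    default:  throw std::invalid_argument("unsupported operator");
    }
  }
  std::string str() const override {
    return "(" + LHS->str() + " " + Op + " " + RHS->str() + ")";
  }
};

// Builds the tree a parser might produce for the source text "(1 + 2) * 3".
std::unique_ptr<ExprAST> buildSampleTree() {
  auto Sum = std::make_unique<BinaryExpr>('+', std::make_unique<NumberExpr>(1),
                                          std::make_unique<NumberExpr>(2));
  return std::make_unique<BinaryExpr>('*', std::move(Sum),
                                      std::make_unique<NumberExpr>(3));
}
```

Calling buildSampleTree()->eval() walks the tree bottom-up, which is the same traversal pattern a code generator would use when emitting IR for each node.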

A language parser can be written in various ways, with tools such as lex and yacc, or it can be handwritten. Writing an efficient parser is an art in itself, but it is not what we intend to cover in this chapter. We focus instead on LLVM IR and on how a high-level language, once parsed, can be converted to LLVM IR using the LLVM libraries.

This chapter will cover how to construct basic working LLVM sample code, which includes the following:

  • Creating an LLVM module
  • Emitting a function in a module
  • Adding a block to a function
  • Emitting a global variable
  • Emitting a return statement
  • Emitting function arguments
  • Emitting a simple arithmetic statement in a basic block
  • Emitting if-else condition IR
  • Emitting LLVM IR for loops

Creating an LLVM module

In the previous chapter, we got an idea of what LLVM IR looks like. In LLVM, a module represents a single unit of code that is to be processed together. The LLVM Module class is the top-level container for all other LLVM IR objects. A module contains global variables, functions, the data layout, the target triple, and so on. Let's create a simple LLVM module.

LLVM provides the Module() constructor for creating a module. The first argument is the name of the module. The second argument is a reference to an LLVMContext. Let's obtain a context and create a module as demonstrated here:

static LLVMContext &Context = getGlobalContext();
static Module *ModuleOb = new Module("my compiler", Context);

For this code to compile, we need to include the corresponding header files. The complete program looks like this:

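#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// The context owns LLVM's shared state such as types and constants.
static LLVMContext &Context = getGlobalContext();
// The module is the top-level container for everything we will emit.
static Module *ModuleOb = new Module("my compiler", Context);

int main(int argc, char *argv[]) {
  // Print the module's textual IR to standard error.
  ModuleOb->dump();
  return 0;
}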

Put this code in a file, say toy.cpp, and compile it:

$ clang++ -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy
$ ./toy

The output will be as follows:

; ModuleID = 'my compiler'