Chapter 4. Basic IR Transformations

Until now, we have seen how the IR is independent of its target and how it can be used to generate code for a specific backend. To generate efficient code for the backend, we optimize the IR produced by the frontend by running a series of analysis and transformation passes using the LLVM pass manager. Going ahead, we will look at some of the important optimizations in LLVM, the role of the LLVM pass infrastructure, and how we can use the opt tool to perform different optimizations.

We must note that most of the optimizations in a compiler take place on the IR. One reason is that the IR is retargetable: the same set of optimizations is valid for a number of targets, which avoids writing the same optimization separately for every target. (There are some target-specific optimizations too; they happen at the selection DAG level, which we will see later.) Another reason for the IR being the target of optimization is that LLVM IR is in SSA form, which means every variable is assigned exactly once, and every new assignment to a variable creates a new variable. One very visible benefit of this representation is that we don't have to perform reaching definition analysis to find out which assignment a given use of a variable refers to. SSA representation also helps a number of optimizations, such as constant propagation and dead code elimination, as the following sketch illustrates.
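
To make this concrete, here is a minimal sketch of what single assignment looks like at the IR level (the value names %x1 and %x2 are hypothetical):

%x1 = add i32 %a, %b      ; the first "assignment" defines a new value
%x2 = mul i32 %x1, 2      ; a source-level reassignment defines yet another value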

In this chapter, we will cover the following topics:

  • The opt tool
  • Pass and Pass Manager
  • Using other Pass info in your own Pass
  • IR simplification examples
  • IR combination examples

Opt Tool

opt is the LLVM optimizer and analyzer tool; it runs on LLVM IR to optimize the IR or to produce analyses about it. We saw a very basic introduction to the opt tool in the first chapter, along with how to use it to run analysis and transformation passes. In this section, we will see what else the opt tool can do. We must note that opt is a developer tool; all of the optimizations it provides can also be invoked from the frontend.

With the opt tool, we can specify the level of optimization that we need, from O0 through O3 (O0 producing the least optimized code and O3 the most optimized code). Apart from these, there are also the Os and Oz optimization levels, which deal with space optimization. The syntax to invoke one of these optimization levels is:

$ opt -Ox -S input.ll

Here, x represents the optimization level and can take a value from 0 to 3, or s or z. These optimization levels are similar to the ones the Clang frontend accepts. -O0 means no optimization, whereas -O1 enables only a few optimizations. -O2 is a moderate level of optimization, and -O3 is the highest level; it is similar to -O2 but enables optimizations that take longer to perform or may generate larger code (the O3 level does not guarantee the most optimized and efficient code; it only means the compiler tries harder to optimize, possibly at the cost of compile time and code size). -Os means optimize for size: it skips optimizations that tend to increase code size (for example, the SLP vectorizer is disabled) and runs optimizations that reduce code size (for example, instruction combining). -Oz goes further and optimizes for size even more aggressively.
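
For example, to produce size-optimized output:

$ opt -Os -S input.ll -o output.ll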

We can also direct the opt tool to run a specific pass that we require. These passes can be one of the already defined passes listed at http://llvm.org/docs/Passes.html or one of the passes we have written ourselves. Many of the passes listed at that link are run as part of the -O1, -O2, and -O3 levels. To view which passes are run at a certain optimization level, use the -debug-pass=Structure command-line option along with the opt tool.
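
For example, the first command below runs a single pass by name, and the second prints the pass structure of the O2 pipeline (note that -debug-pass=Structure applies to the legacy pass manager):

$ opt -instcombine -S input.ll -o output.ll
$ opt -O2 -debug-pass=Structure input.ll -o /dev/null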

Let's take an example to demonstrate the difference between the O1 and O2 levels of optimization. The O3 level generally adds only one or two passes beyond O2, so we will compare O1 and O2 and see how much the O2 level optimizes the code. Write the following test code into a test.ll file:

define internal i32 @test(i32* %X, i32* %Y) {
    %A = load i32, i32* %X
    %B = load i32, i32* %Y
    %C = add i32 %A, %B
    ret i32 %C
}

define internal i32 @caller(i32* %B) {
    %A = alloca i32
    store i32 1, i32* %A
    %C = call i32 @test(i32* %A, i32* %B)
    ret i32 %C
}

define i32 @callercaller() {
    %B = alloca i32
    store i32 2, i32* %B
    %X = call i32 @caller(i32* %B)
    ret i32 %X
}

In this test code, the callercaller function calls the caller function, which in turn calls the test function; test adds the two loaded values and returns the sum to caller, which in turn returns it to callercaller.

Now, run the O1 and O2 levels of optimization, as shown:

$ opt -O1 -S test.ll > 1.ll
$ opt -O2 -S test.ll > 2.ll

Comparing 1.ll and 2.ll shows the difference in the optimized code between the O1 and O2 levels.
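
At O1, the output typically still contains all three functions, whereas O2 collapses the whole program into a single function. On the LLVM version used here, 2.ll reduces to roughly the following (function attributes and metadata omitted):

define i32 @callercaller() {
    ret i32 3
}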

As we can see, the O2 output has optimized away the function calls as well as the add operation, and returns the result directly from the callercaller function. This happens because O2 runs the function inlining pass (-inline), which inlines these small function calls and lets the rest of the pipeline treat the code as one big function (at O1, only the always-inliner runs, and it inlines only functions explicitly marked alwaysinline). O2 then runs the globaldce pass, which eliminates unreachable internal functions from the code, and the constmerge pass, which merges duplicate global constants into a single constant. It also performs global value numbering (-gvn), which eliminates partially or fully redundant instructions, including redundant load instructions.
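
You can approximate this result by chaining individual passes yourself. For example, the following combination (legacy pass names; the real O2 pipeline runs many more passes) produces similar output for this test case:

$ opt -inline -mem2reg -instcombine -globaldce -S test.ll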
