Writing a pass for memory optimization

In this recipe, we will briefly discuss a transformation pass that deals with memory optimization.

Getting ready

For this recipe, you will need the opt tool installed.

How to do it…

  1. Write the test code on which we will run the memcpy optimization pass:
    $ cat memcopytest.ll
    @cst = internal constant [3 x i32] [i32 -1, i32 -1, i32 -1], align 4
    
    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1) nounwind
    declare void @foo(i32*) nounwind
    
    define void @test1() nounwind {
      %arr = alloca [3 x i32], align 4
      %arr_i8 = bitcast [3 x i32]* %arr to i8*
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arr_i8, i8* bitcast ([3 x i32]* @cst to i8*), i64 12, i32 4, i1 false)
      %arraydecay = getelementptr inbounds [3 x i32], [3 x i32]* %arr, i64 0, i64 0
      call void @foo(i32* %arraydecay) nounwind
      ret void
    }
    
  2. Run the memcpyopt pass on the preceding test case:
    $ opt -memcpyopt -S memcopytest.ll
    ; ModuleID = 'memcopytest.ll'
    
    @cst = internal constant [3 x i32] [i32 -1, i32 -1, i32 -1], align 4
    
    ; Function Attrs: nounwind
    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32, i1) #0
    
    ; Function Attrs: nounwind
    declare void @foo(i32*) #0
    
    ; Function Attrs: nounwind
    define void @test1() #0 {
      %arr = alloca [3 x i32], align 4
      %arr_i8 = bitcast [3 x i32]* %arr to i8*
      call void @llvm.memset.p0i8.i64(i8* %arr_i8, i8 -1, i64 12, i32 4, i1 false)
      %arraydecay = getelementptr inbounds [3 x i32], [3 x i32]* %arr, i64 0, i64 0
      call void @foo(i32* %arraydecay) #0
      ret void
    }
    
    ; Function Attrs: nounwind
    declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) #0
    
    attributes #0 = { nounwind }
    

How it works…

The memcpyopt pass eliminates memcpy calls wherever possible, or transforms them into other calls, such as memset.

Consider this memcpy call:

call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arr_i8, i8* bitcast ([3 x i32]* @cst to i8*), i64 12, i32 4, i1 false)

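Each i32 -1 in @cst has the bit pattern 0xFFFFFFFF, so every one of the 12 bytes being copied is 0xFF. Copying @cst is therefore equivalent to filling the destination with the byte 0xFF (i8 -1), and in the preceding test case this pass converts the memcpy into a memset call: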

call void @llvm.memset.p0i8.i64(i8* %arr_i8, i8 -1, i64 12, i32 4, i1 false)
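The rewrite depends on @cst being a "byte splat", that is, every byte of its memory image is the same. The following Python sketch models that check on the raw bytes of the constant. The helpers splat_byte and as_i8 are hypothetical names for illustration, and struct.pack merely stands in for how the [3 x i32] initializer is laid out in memory; LLVM performs the equivalent check on IR constants in C++, not like this.

```python
import struct
from typing import Optional


def splat_byte(data: bytes) -> Optional[int]:
    """Return the repeated byte if every byte of `data` is the same.

    Simplified model of a byte-splat check (not LLVM's API).
    Returns None for empty input or for mixed byte values.
    """
    if not data:
        return None
    first = data[0]
    return first if all(b == first for b in data) else None


def as_i8(byte: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a signed i8 (-128..127)."""
    return byte - 256 if byte >= 128 else byte


# Memory image of @cst = [3 x i32] [i32 -1, i32 -1, i32 -1] (little-endian, no padding).
cst = struct.pack("<3i", -1, -1, -1)

value = splat_byte(cst)
if value is not None:
    # All 12 bytes are 0xFF, so the memcpy can become memset(dst, i8 -1, 12).
    print(f"memset with i8 {as_i8(value)}, length {len(cst)}")
```

A constant such as [i32 -1, i32 0, i32 -1] mixes 0xFF and 0x00 bytes, so splat_byte returns None and no memset rewrite is possible.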

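If we look into the source code of the pass, in the MemCpyOptimizer.cpp file in llvm/lib/Transforms/Scalar, we find that this particular transformation is performed by the processMemCpy function: when the source of a memcpy is a constant global variable whose initializer is a single byte value repeated throughout (as determined by isBytewiseValue), the memcpy is replaced by a memset of that byte.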

When scanning forward over instructions, the tryMergingIntoMemset function also looks for other patterns to fold away: it looks for stores to neighboring memory locations and, on finding enough consecutive ones, attempts to merge them into a single memset.
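To make the store-merging idea concrete, here is a simplified Python model, not the pass itself. It assumes the stores are given in program order as (byte offset, bytes written) pairs relative to one base pointer, and it only extends a run when the next store begins exactly where the previous one ended and writes the same repeated byte. The Store and Memset classes, merge_stores, and the min_stores threshold are all hypothetical; the real pass works on IR instructions and applies its own profitability heuristic.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Store:
    offset: int  # byte offset from the shared base pointer
    data: bytes  # bytes written by the store


@dataclass
class Memset:
    offset: int  # first byte written
    value: int   # repeated byte value (0..255)
    length: int  # number of bytes written


def splat(data: bytes) -> Optional[int]:
    """Return the repeated byte if all bytes of `data` are equal, else None."""
    return data[0] if data and len(set(data)) == 1 else None


def merge_stores(stores: List[Store], min_stores: int = 4) -> List[Union[Store, Memset]]:
    """Replace runs of adjacent same-byte stores with a single memset.

    `stores` must be in program order; a run grows only while the next
    store starts exactly where the previous one ended and writes the
    same repeated byte. `min_stores` is a hypothetical threshold.
    """
    result: List[Union[Store, Memset]] = []
    i = 0
    while i < len(stores):
        value = splat(stores[i].data)
        end = stores[i].offset + len(stores[i].data)
        j = i + 1
        while (value is not None and j < len(stores)
               and stores[j].offset == end and splat(stores[j].data) == value):
            end += len(stores[j].data)
            j += 1
        if value is not None and j - i >= min_stores:
            result.append(Memset(stores[i].offset, value, end - stores[i].offset))
        else:
            result.extend(stores[i:j])  # leave short or non-splat runs alone
        i = j
    return result


# Four 4-byte zero stores at consecutive offsets, like zeroing a [4 x i32].
zero_stores = [Store(off, b"\x00" * 4) for off in (0, 4, 8, 12)]
print(merge_stores(zero_stores))
```

Requiring a minimum run length reflects the intuition that replacing one or two stores with a memset call is rarely a win; the specific value 4 is only a placeholder.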

The processMemSet function calls tryMergingIntoMemset to look for other memsets or stores neighboring a given memset, so that they can be combined into a single, wider memset.
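A similar simplified model applies to widening neighboring memsets. The sketch below walks memsets in program order and folds each one into the preceding one when both write the same byte and their byte ranges touch or overlap; because both write the same byte, their union has the same effect as executing them one after the other. It assumes a single shared base pointer and no intervening reads or writes, and the Memset class and widen_memsets function are hypothetical names, not LLVM APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memset:
    offset: int  # first byte written, relative to a shared base pointer
    value: int   # byte value written (0..255)
    length: int  # number of bytes written


def widen_memsets(memsets: List[Memset]) -> List[Memset]:
    """Fold each memset into the preceding one when both write the same
    byte and their byte ranges touch or overlap.

    `memsets` must be in program order, with no other memory accesses
    between them (a simplifying assumption of this model).
    """
    result: List[Memset] = []
    for m in memsets:
        if result:
            last = result[-1]
            last_end = last.offset + last.length
            m_end = m.offset + m.length
            touches = m.offset <= last_end and last.offset <= m_end
            if last.value == m.value and touches:
                start = min(last.offset, m.offset)
                result[-1] = Memset(start, m.value, max(last_end, m_end) - start)
                continue
        result.append(Memset(m.offset, m.value, m.length))
    return result


# memset(p, 0, 8) followed by memset(p + 8, 0, 4) becomes memset(p, 0, 12).
print(widen_memsets([Memset(0, 0, 8), Memset(8, 0, 4)]))
```

Only consecutive memsets are folded, so a memset of a different byte between two same-byte memsets keeps all three separate and preserves the original write order.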

See also

To see the details of the various types of memory optimization passes, go to http://llvm.org/docs/Passes.html#memcpyopt-memcpy-optimization.
