A simple profiling example

Go supports two kinds of profiling: CPU profiling and memory profiling. It is not recommended that you profile an application for both kinds at the same time, because these two different kinds of profiling do not work well with each other. The profileMe.go application is an exception, however, because it is used for illustrating the two techniques.

The Go code to be profiled is saved as profileMe.go, and it will be presented in five parts. The first part of profileMe.go is shown in the following Go code:

package main 
import ( 
    "fmt" 
    "math" 
    "os" 
    "runtime" 
    "runtime/pprof" 
    "time" 
) 
func fibo1(n int) int64 { 
    if n == 0 || n == 1 { 
          return int64(n) 
    } 
    time.Sleep(time.Millisecond) 
    return int64(fibo2(n-1)) + int64(fibo2(n-2)) 
}

Notice that it is compulsory to import the runtime/pprof package directly or indirectly for your program to create profiling data. The reason for calling time.Sleep() in the fibo1() function is to slow it down a bit. You will learn why, near the end of this section.

The second code segment of profileMe.go follows next:

func fibo2(n int) int { 
    fn := make(map[int]int) 
    for i := 0; i <= n; i++ { 
        var f int 
        if i <= 2 { 
            f = 1 
        } else { 
            f = fn[i-1] + fn[i-2] 
        } 
        fn[i] = f 
    } 
    time.Sleep(50 * time.Millisecond) 
    return fn[n] 
}

The preceding code contains the implementation of another Go function that uses a different algorithm for calculating the numbers of the Fibonacci sequence.

The third part of profileMe.go contains the following Go code:

func N1(n int) bool { 
    k := math.Floor(float64(n/2 + 1)) 
    for i := 2; i < int(k); i++ { 
         if (n % i) == 0 { 
               return false 
         } 
    } 
    return true 
} 
func N2(n int) bool { 
    for i := 2; i < n; i++ { 
        if (n % i) == 0 { 
            return false 
        } 
    } 
    return true 
}

Both the N1() and N2() functions are used for finding out whether a given integer is a prime number or not. The first function is optimized because its for loop iterates over approximately half the numbers used in the for loop of N2(). As both functions are relatively slow, there is no need for a call to time.Sleep() here.

The fourth code segment of profileMe.go is as follows:

func main() { 
    cpuFile, err := os.Create("/tmp/cpuProfile.out") 
    if err != nil { 
         fmt.Println(err) 
         return 
    } 
    pprof.StartCPUProfile(cpuFile) 
    defer pprof.StopCPUProfile() 
    total := 0 
    for i := 2; i < 100000; i++ { 
        n := N1(i) 
        if n { 
            total = total + 1 
        } 
    } 
    fmt.Println("Total primes:", total) 
    total = 0 
    for i := 2; i < 100000; i++ { 
        n := N2(i) 
        if n { 
            total = total + 1 
        } 
    } 
    fmt.Println("Total primes:", total) 
    for i := 1; i < 90; i++ { 
        n := fibo1(i) 
        fmt.Print(n, " ") 
    } 
    fmt.Println() 
    for i := 1; i < 90; i++ { 
        n := fibo2(i) 
        fmt.Print(n, " ") 
    } 
    fmt.Println() 
    runtime.GC()

The call to os.Create() is used to have a file to which to write the profiling data. The pprof.StartCPUProfile() call begins the CPU profiling of the program, and the call to pprof.StopCPUProfile() stops it.

The last part of profileMe.go follows next:

   // Memory profiling! 
   memory, err := os.Create("/tmp/memoryProfile.out") 
   if err != nil { 
         fmt.Println(err) 
         return 
   } 
   defer memory.Close() 
   for i := 0; i < 10; i++ { 
         s := make([]byte, 50000000) 
         if s == nil { 
               fmt.Println("Operation failed!") 
         } 
         time.Sleep(50 * time.Millisecond) 
   } 
   err = pprof.WriteHeapProfile(memory) 
   if err != nil { 
         fmt.Println(err) 
         return 
   } 
}

In the last part, you can see how the memory profiling technique works. It is pretty similar to CPU profiling, and once again you will need a file for writing out the profiling data.

Executing profileMe.go will generate the following output:

$ go run profileMe.go
Total primes: 9592
Total primes: 9592
1 2 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986 102334155 165580141 267914296 433494437 701408733 1134903170 1836311903 2971215073 4807526976 7778742049 12586269025 20365011074 32951280099 53316291173 86267571272 139583862445 225851433717 365435296162 591286729879 956722026041 1548008755920 2504730781961 4052739537881 6557470319842 10610209857723 17167680177565 27777890035288 44945570212853 72723460248141 117669030460994 190392490709135 308061521170129 498454011879264 806515533049393 1304969544928657 2111485077978050 3416454622906707 5527939700884757 8944394323791464 14472334024676221 23416728348467685 37889062373143906 61305790721611591 99194853094755497 160500643816367088 259695496911122585 420196140727489673 679891637638612258 1100087778366101931 1779979416004714189
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986 102334155 165580141 267914296 433494437 701408733 1134903170 1836311903 2971215073 4807526976 7778742049 12586269025 20365011074 32951280099 53316291173 86267571272 139583862445 225851433717 365435296162 591286729879 956722026041 1548008755920 2504730781961 4052739537881 6557470319842 10610209857723 17167680177565 27777890035288 44945570212853 72723460248141 117669030460994 190392490709135 308061521170129 498454011879264 806515533049393 1304969544928657 2111485077978050 3416454622906707 5527939700884757 8944394323791464 14472334024676221 23416728348467685 37889062373143906 61305790721611591 99194853094755497 160500643816367088 259695496911122585 420196140727489673 679891637638612258 1100087778366101931 1779979416004714189

Apart from the output, the program will also collect the profiling data in two files:

$ cd /tmp
$ ls -l *Profile*
-rw-r--r-- 1 mtsouk wheel 1965 Mar 8 16:53 cpuProfile.out
-rw-r--r-- 1 mtsouk wheel 484 Mar 8 16:53 memoryProfile.out

It is only after collecting the profiling data that you can start to inspect it. Thus, you can now start the command-line profiler to examine the CPU data as follows:

$ go tool pprof /tmp/cpuProfile.out
Main binary filename not available.
Type: cpu
Time: Mar 8, 2018 at 4:53pm (EET)
Duration: 19.12s, Total samples = 4.18s (21.86%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

Typing help while in the profiler shell will generate the following output:

(pprof) help
Commands:
callgrind Outputs a graph in callgrind format
comments Output all profile comments
disasm Output assembly listings annotated with samples
dot Outputs a graph in DOT format
eog Visualize graph through eog
evince Visualize graph through evince
gif Outputs a graph image in GIF format
gv Visualize graph through gv
kcachegrind Visualize report in KCachegrind
list Output annotated source for functions matching regexp
pdf Outputs a graph in PDF format
peek Output callers/callees of functions matching regexp
png Outputs a graph image in PNG format
proto Outputs the profile in compressed protobuf format
ps Outputs a graph in PS format
raw Outputs a text representation of the raw profile
svg Outputs a graph in SVG format
tags Outputs all tags in the profile
text Outputs top entries in text form
top Outputs top entries in text form
topproto Outputs top entries in compressed protobuf format
traces Outputs all profile samples in text form
tree Outputs a text rendering of call graph
web Visualize graph through web browser
weblist Display annotated source in a web browser
o/options List options and their current values
quit/exit/^D Exit pprof
Options:
call_tree Create a context-sensitive call tree
compact_labels Show minimal headers
divide_by Ratio to divide all samples before visualization
drop_negative Ignore negative differences
edgefraction Hide edges below <f>*total
focus Restricts to samples going through a node matching regexp
hide Skips nodes matching regexp
ignore Skips paths going through any nodes matching regexp
mean Average sample value over first value (count)
nodecount Max number of nodes to show
nodefraction Hide nodes below <f>*total
normalize Scales profile based on the base profile.
output Output filename for file-based outputs
positive_percentages Ignore negative samples when computing percentages
prune_from Drops any functions below the matched frame.
relative_percentages Show percentages relative to focused subgraph
sample_index Sample value to report (0-based index or name)
show Only show nodes matching regexp
source_path Search path for source files
tagfocus Restricts to samples with tags in range or matched by regexp
taghide Skip tags matching this regexp
tagignore Discard samples with tags in range or matched by regexp
tagshow Only consider tags matching this regexp
trim Honor nodefraction/edgefraction/nodecount defaults
unit Measurement units to display
Option groups (only set one per group):
cumulative
cum Sort entries based on cumulative weight
flat Sort entries based on own weight
granularity
addresses Aggregate at the function level.
addressnoinlines Aggregate at the function level, including functions' addresses in the output.
files Aggregate at the file level.
functions Aggregate at the function level.
lines Aggregate at the source code line level.
noinlines Aggregate at the function level.
: Clear focus/ignore/hide/tagfocus/tagignore
type "help <cmd|option>" for more information
(pprof)

Do find the time to try out all of the commands of the go tool pprof utility and familiarize yourself with them.

The top command returns the top 10 entries in text form:

(pprof) top
Showing nodes accounting for 4120ms, 98.56% of 4180ms total
Dropped 26 nodes (cum <= 20.90ms)
Showing top 10 nodes out of 21
flat flat% sum% cum cum%
1950ms 46.65% 46.65% 1950ms 46.65% main.N2
1170ms 27.99% 74.64% 1170ms 27.99% runtime.usleep
910ms 21.77% 96.41% 910ms 21.77% main.N1
90ms 2.15% 98.56% 90ms 2.15% runtime.mach_semaphore_signal
0 0% 98.56% 2890ms 69.14% main.main
0 0% 98.56% 80ms 1.91% runtime.goready
0 0% 98.56% 80ms 1.91% runtime.goready.func1
0 0% 98.56% 80ms 1.91% runtime.goroutineReady
0 0% 98.56% 90ms 2.15% runtime.mach_semrelease
0 0% 98.56% 2890ms 69.14% runtime.main

As the first line of the output states, the functions presented are responsible for 98.56% of the total execution time of the program. The main.N2 function in particular is responsible for the 46.65% of the execution time of the program.

The top10 --cum command returns the cumulative time for each function:

(pprof) top10 --cum
Showing nodes accounting for 4030ms, 96.41% of 4180ms total
Dropped 26 nodes (cum <= 20.90ms)
Showing top 10 nodes out of 21
flat flat% sum% cum cum%
0 0% 0% 2890ms 69.14% main.main
0 0% 0% 2890ms 69.14% runtime.main
1950ms 46.65% 46.65% 1950ms 46.65% main.N2
0 0% 46.65% 1180ms 28.23% runtime.mstart
0 0% 46.65% 1180ms 28.23% runtime.mstart1
0 0% 46.65% 1180ms 28.23% runtime.sysmon
1170ms 27.99% 74.64% 1170ms 27.99% runtime.usleep
910ms 21.77% 96.41% 910ms 21.77% main.N1
0 0% 96.41% 130ms 3.11% runtime.systemstack
0 0% 96.41% 110ms 2.63% runtime.timerproc

What if you want to find out what is happening with a particular function? You can use the list command followed by the function name, combined with the package name, and you'll get more detailed information about the performance of that function:

(pprof) list main.N1
Total: 4.18s
ROUTINE ======= main.N1 in /Users/mtsouk/ch11/code/profileMe.go
910ms 910ms (flat, cum) 21.77% of Total
. . 32:     return fn[n]
. . 33:}
. . 34:
. . 35:func N1(n int) bool {
. . 36:     k := math.Floor(float64(n/2 + 1))
60ms 60ms 37:     for i := 2; i < int(k); i++ {
850ms 850ms 38:         if (n % i) == 0 {
. . 39:                 return false
. . 40:           }
. . 41:     }
. . 42:     return true
. . 43:}
(pprof)

The output shows that the for loop of main.N1 is responsible for almost all of the execution time of the entire function. Specifically, the if (n % i) == 0 statement is responsible for 850ms out of 910ms of the execution time of the entire function.

You can also create a PDF output of the profiling data from the shell of the Go profiler, using the pdf command:

(pprof) pdf
Generating report in profile001.pdf

Note that you will need Graphviz in order to generate a PDF file that can be viewed using your favorite PDF reader.

Finally, a warning: if your program executes too quickly, then the profiler will not have enough time to get its required samples and you might see the Total samples = 0 output when you load the data file. In that case, you will not be able to get any useful information from the profiling process. This is the reason for using the time.Sleep() function in some of the functions of profileMe.go:

$ go tool pprof /tmp/cpuProfile.out
Type: cpu
Time: Mar 7, 2018 at 10:25pm (EET)
Duration: 202.29ms, Total samples = 0

Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

Table of Contents for A simple profiling example

Create new playlist

Sign In

Sign Up

Table of Contents for
A simple profiling example