Go supports two kinds of profiling: CPU profiling and memory profiling. It is not recommended that you profile an application for both kinds at the same time, because these two different kinds of profiling do not work well with each other. The profileMe.go application is an exception, however, because it is used for illustrating the two techniques.
The Go code to be profiled is saved as profileMe.go, and it will be presented in five parts. The first part of profileMe.go is shown in the following Go code:
package main import ( "fmt" "math" "os" "runtime" "runtime/pprof" "time" ) func fibo1(n int) int64 { if n == 0 || n == 1 { return int64(n) } time.Sleep(time.Millisecond) return int64(fibo2(n-1)) + int64(fibo2(n-2)) }
Notice that it is compulsory to import the runtime/pprof package directly or indirectly for your program to create profiling data. The reason for calling time.Sleep() in the fibo1() function is to slow it down a bit. You will learn why, near the end of this section.
The second code segment of profileMe.go follows next:
func fibo2(n int) int { fn := make(map[int]int) for i := 0; i <= n; i++ { var f int if i <= 2 { f = 1 } else { f = fn[i-1] + fn[i-2] } fn[i] = f } time.Sleep(50 * time.Millisecond) return fn[n] }
The preceding code contains the implementation of another Go function that uses a different algorithm for calculating the numbers of the Fibonacci sequence.
The third part of profileMe.go contains the following Go code:
func N1(n int) bool { k := math.Floor(float64(n/2 + 1)) for i := 2; i < int(k); i++ { if (n % i) == 0 { return false } } return true } func N2(n int) bool { for i := 2; i < n; i++ { if (n % i) == 0 { return false } } return true }
Both the N1() and N2() functions are used for finding out whether a given integer is a prime number or not. The first function is optimized because its for loop iterates over approximately half the numbers used in the for loop of N2(). As both functions are relatively slow, there is no need for a call to time.Sleep() here.
The fourth code segment of profileMe.go is as follows:
func main() { cpuFile, err := os.Create("/tmp/cpuProfile.out") if err != nil { fmt.Println(err) return } pprof.StartCPUProfile(cpuFile) defer pprof.StopCPUProfile() total := 0 for i := 2; i < 100000; i++ { n := N1(i) if n { total = total + 1 } } fmt.Println("Total primes:", total) total = 0 for i := 2; i < 100000; i++ { n := N2(i) if n { total = total + 1 } } fmt.Println("Total primes:", total) for i := 1; i < 90; i++ { n := fibo1(i) fmt.Print(n, " ") } fmt.Println() for i := 1; i < 90; i++ { n := fibo2(i) fmt.Print(n, " ") } fmt.Println() runtime.GC()
The call to os.Create() is used to have a file to which to write the profiling data. The pprof.StartCPUProfile() call begins the CPU profiling of the program, and the call to pprof.StopCPUProfile() stops it.
The last part of profileMe.go follows next:
// Memory profiling! memory, err := os.Create("/tmp/memoryProfile.out") if err != nil { fmt.Println(err) return } defer memory.Close() for i := 0; i < 10; i++ { s := make([]byte, 50000000) if s == nil { fmt.Println("Operation failed!") } time.Sleep(50 * time.Millisecond) } err = pprof.WriteHeapProfile(memory) if err != nil { fmt.Println(err) return } }
In the last part, you can see how the memory profiling technique works. It is pretty similar to CPU profiling, and once again you will need a file for writing out the profiling data.
Executing profileMe.go will generate the following output:
$ go run profileMe.go Total primes: 9592 Total primes: 9592 1 2 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986 102334155 165580141 267914296 433494437 701408733 1134903170 1836311903 2971215073 4807526976 7778742049 12586269025 20365011074 32951280099 53316291173 86267571272 139583862445 225851433717 365435296162 591286729879 956722026041 1548008755920 2504730781961 4052739537881 6557470319842 10610209857723 17167680177565 27777890035288 44945570212853 72723460248141 117669030460994 190392490709135 308061521170129 498454011879264 806515533049393 1304969544928657 2111485077978050 3416454622906707 5527939700884757 8944394323791464 14472334024676221 23416728348467685 37889062373143906 61305790721611591 99194853094755497 160500643816367088 259695496911122585 420196140727489673 679891637638612258 1100087778366101931 1779979416004714189 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986 102334155 165580141 267914296 433494437 701408733 1134903170 1836311903 2971215073 4807526976 7778742049 12586269025 20365011074 32951280099 53316291173 86267571272 139583862445 225851433717 365435296162 591286729879 956722026041 1548008755920 2504730781961 4052739537881 6557470319842 10610209857723 17167680177565 27777890035288 44945570212853 72723460248141 117669030460994 190392490709135 308061521170129 498454011879264 806515533049393 1304969544928657 2111485077978050 3416454622906707 5527939700884757 8944394323791464 14472334024676221 23416728348467685 37889062373143906 61305790721611591 99194853094755497 160500643816367088 259695496911122585 420196140727489673 679891637638612258 1100087778366101931 1779979416004714189
Apart from the output, the program will also collect the profiling data in two files:
$ cd /tmp $ ls -l *Profile* -rw-r--r-- 1 mtsouk wheel 1965 Mar 8 16:53 cpuProfile.out -rw-r--r-- 1 mtsouk wheel 484 Mar 8 16:53 memoryProfile.out
It is only after collecting the profiling data that you can start to inspect it. Thus, you can now start the command-line profiler to examine the CPU data as follows:
$ go tool pprof /tmp/cpuProfile.out Main binary filename not available. Type: cpu Time: Mar 8, 2018 at 4:53pm (EET) Duration: 19.12s, Total samples = 4.18s (21.86%) Entering interactive mode (type "help" for commands, "o" for options) (pprof)
Typing help while in the profiler shell will generate the following output:
(pprof) help Commands: callgrind Outputs a graph in callgrind format comments Output all profile comments disasm Output assembly listings annotated with samples dot Outputs a graph in DOT format eog Visualize graph through eog evince Visualize graph through evince gif Outputs a graph image in GIF format gv Visualize graph through gv kcachegrind Visualize report in KCachegrind list Output annotated source for functions matching regexp pdf Outputs a graph in PDF format peek Output callers/callees of functions matching regexp png Outputs a graph image in PNG format proto Outputs the profile in compressed protobuf format ps Outputs a graph in PS format raw Outputs a text representation of the raw profile svg Outputs a graph in SVG format tags Outputs all tags in the profile text Outputs top entries in text form top Outputs top entries in text form topproto Outputs top entries in compressed protobuf format traces Outputs all profile samples in text form tree Outputs a text rendering of call graph web Visualize graph through web browser weblist Display annotated source in a web browser o/options List options and their current values quit/exit/^D Exit pprof Options: call_tree Create a context-sensitive call tree compact_labels Show minimal headers divide_by Ratio to divide all samples before visualization drop_negative Ignore negative differences edgefraction Hide edges below <f>*total focus Restricts to samples going through a node matching regexp hide Skips nodes matching regexp ignore Skips paths going through any nodes matching regexp mean Average sample value over first value (count) nodecount Max number of nodes to show nodefraction Hide nodes below <f>*total normalize Scales profile based on the base profile. output Output filename for file-based outputs positive_percentages Ignore negative samples when computing percentages prune_from Drops any functions below the matched frame. relative_percentages Show percentages relative to focused subgraph sample_index Sample value to report (0-based index or name) show Only show nodes matching regexp source_path Search path for source files tagfocus Restricts to samples with tags in range or matched by regexp taghide Skip tags matching this regexp tagignore Discard samples with tags in range or matched by regexp tagshow Only consider tags matching this regexp trim Honor nodefraction/edgefraction/nodecount defaults unit Measurement units to display Option groups (only set one per group): cumulative cum Sort entries based on cumulative weight flat Sort entries based on own weight granularity addresses Aggregate at the function level. addressnoinlines Aggregate at the function level, including functions' addresses in the output. files Aggregate at the file level. functions Aggregate at the function level. lines Aggregate at the source code line level. noinlines Aggregate at the function level. : Clear focus/ignore/hide/tagfocus/tagignore type "help <cmd|option>" for more information (pprof)
The top command returns the top 10 entries in text form:
(pprof) top Showing nodes accounting for 4120ms, 98.56% of 4180ms total Dropped 26 nodes (cum <= 20.90ms) Showing top 10 nodes out of 21 flat flat% sum% cum cum% 1950ms 46.65% 46.65% 1950ms 46.65% main.N2 1170ms 27.99% 74.64% 1170ms 27.99% runtime.usleep 910ms 21.77% 96.41% 910ms 21.77% main.N1 90ms 2.15% 98.56% 90ms 2.15% runtime.mach_semaphore_signal 0 0% 98.56% 2890ms 69.14% main.main 0 0% 98.56% 80ms 1.91% runtime.goready 0 0% 98.56% 80ms 1.91% runtime.goready.func1 0 0% 98.56% 80ms 1.91% runtime.goroutineReady 0 0% 98.56% 90ms 2.15% runtime.mach_semrelease 0 0% 98.56% 2890ms 69.14% runtime.main
As the first line of the output states, the functions presented are responsible for 98.56% of the total execution time of the program. The main.N2 function in particular is responsible for the 46.65% of the execution time of the program.
The top10 --cum command returns the cumulative time for each function:
(pprof) top10 --cum Showing nodes accounting for 4030ms, 96.41% of 4180ms total Dropped 26 nodes (cum <= 20.90ms) Showing top 10 nodes out of 21 flat flat% sum% cum cum% 0 0% 0% 2890ms 69.14% main.main 0 0% 0% 2890ms 69.14% runtime.main 1950ms 46.65% 46.65% 1950ms 46.65% main.N2 0 0% 46.65% 1180ms 28.23% runtime.mstart 0 0% 46.65% 1180ms 28.23% runtime.mstart1 0 0% 46.65% 1180ms 28.23% runtime.sysmon 1170ms 27.99% 74.64% 1170ms 27.99% runtime.usleep 910ms 21.77% 96.41% 910ms 21.77% main.N1 0 0% 96.41% 130ms 3.11% runtime.systemstack 0 0% 96.41% 110ms 2.63% runtime.timerproc
What if you want to find out what is happening with a particular function? You can use the list command followed by the function name, combined with the package name, and you'll get more detailed information about the performance of that function:
(pprof) list main.N1 Total: 4.18s ROUTINE ======= main.N1 in /Users/mtsouk/ch11/code/profileMe.go 910ms 910ms (flat, cum) 21.77% of Total . . 32: return fn[n] . . 33:} . . 34: . . 35:func N1(n int) bool { . . 36: k := math.Floor(float64(n/2 + 1)) 60ms 60ms 37: for i := 2; i < int(k); i++ { 850ms 850ms 38: if (n % i) == 0 { . . 39: return false . . 40: } . . 41: } . . 42: return true . . 43:} (pprof)
The output shows that the for loop of main.N1 is responsible for almost all of the execution time of the entire function. Specifically, the if (n % i) == 0 statement is responsible for 850ms out of 910ms of the execution time of the entire function.
You can also create a PDF output of the profiling data from the shell of the Go profiler, using the pdf command:
(pprof) pdf Generating report in profile001.pdf
Note that you will need Graphviz in order to generate a PDF file that can be viewed using your favorite PDF reader.
Finally, a warning: if your program executes too quickly, then the profiler will not have enough time to get its required samples and you might see the Total samples = 0 output when you load the data file. In that case, you will not be able to get any useful information from the profiling process. This is the reason for using the time.Sleep() function in some of the functions of profileMe.go:
$ go tool pprof /tmp/cpuProfile.out Type: cpu Time: Mar 7, 2018 at 10:25pm (EET) Duration: 202.29ms, Total samples = 0
Entering interactive mode (type "help" for commands, "o" for options) (pprof)