Callgrind

Callgrind is a call-graph generating profiler that also collects information about processor cache hit rates and branch prediction. It is only useful if your bottleneck is CPU-bound; it does not help when heavy I/O or multiple processes are involved.

Callgrind runs as a tool inside Valgrind. Valgrind does not require any kernel configuration, but it does need debug symbols. It is available as a target package in both the Yocto Project and Buildroot (BR2_PACKAGE_VALGRIND).

You run Callgrind in Valgrind on the target, like so:

# valgrind --tool=callgrind <program>

This produces a file called callgrind.out.<PID>, which you can copy to the host and analyze with callgrind_annotate.
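As a rough sketch of the host-side step, the helper below picks the most recently modified callgrind.out.* file in a directory and hands it to callgrind_annotate. The function names newest_profile and annotate_newest and the directory argument are assumptions made for illustration; they are not part of Valgrind.

```shell
# newest_profile [DIR]: print the most recently modified callgrind.out.* file in DIR.
# (Hypothetical helper; DIR defaults to the current directory.)
newest_profile() {
    dir=${1:-.}
    # ls -t sorts by modification time, newest first; errors are hidden
    # so that an empty or missing directory simply yields no output.
    ls -t "$dir"/callgrind.out.* 2>/dev/null | head -n 1
}

# annotate_newest [DIR]: summarize the newest profile with callgrind_annotate.
annotate_newest() {
    profile=$(newest_profile "$1")
    if [ -z "$profile" ]; then
        echo "no callgrind.out.* files in ${1:-.}" >&2
        return 1
    fi
    callgrind_annotate "$profile"
}
```

After copying the profiles from the target into a directory on the host, you might call annotate_newest with that directory as its argument.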

The default is to capture data for all the threads together in a single file. If you add the --separate-threads=yes option when capturing, there will be a separate profile for each thread in files named callgrind.out.<PID>-<thread id>, for example, callgrind.out.122-01, callgrind.out.122-02, and so on.
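With one file per thread, it can be convenient to annotate them in a loop. The following is a minimal sketch that assumes the callgrind.out.<PID>-<thread id> naming described above; the helper names thread_profiles and annotate_threads are hypothetical.

```shell
# thread_profiles PID [DIR]: list the per-thread profiles of one run, one per line.
thread_profiles() {
    pid=$1
    dir=${2:-.}
    for f in "$dir"/callgrind.out."$pid"-*; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# annotate_threads PID [DIR]: run callgrind_annotate on each per-thread profile.
annotate_threads() {
    thread_profiles "$1" "$2" | while IFS= read -r f; do
        echo "== $f =="
        callgrind_annotate "$f"
    done
}
```

For the example above, annotate_threads 122 would annotate callgrind.out.122-01, callgrind.out.122-02, and so on, in the current directory.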

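Callgrind can simulate the processor's L1 and L2 caches and report on cache misses. Capture the trace with the --simulate-cache=yes option (spelled --cache-sim=yes in current Valgrind releases). L2 misses are much more expensive than L1 misses, so pay attention to code with high D2mr or D2mw counts, which are the L2 data read and write misses. Recent Valgrind releases model the last-level cache instead and name these events DLmr and DLmw.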
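A small wrapper can keep the cache-simulation options in one place. This sketch uses --cache-sim=yes, the spelling in the current Valgrind documentation; older releases use --simulate-cache=yes for the same purpose. The VALGRIND variable is a hypothetical override for the path to the valgrind binary, introduced here only for illustration.

```shell
# profile_with_cache PROGRAM [ARGS...]: run PROGRAM under Callgrind with
# cache simulation enabled.
# VALGRIND is a hypothetical override for the valgrind binary, not a Valgrind feature.
profile_with_cache() {
    "${VALGRIND:-valgrind}" --tool=callgrind --cache-sim=yes "$@"
}
```

On the target, you would call it with the program to profile followed by that program's own arguments, and then inspect the miss counts in the resulting callgrind.out.<PID> file with callgrind_annotate.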
