One way to obtain useful information about the execution of a program is to use the GNU profiler utility gprof
. This utility, which is part of most standard Linux installations, provides information on functions your program calls. By analyzing the data it provides, you can often improve the execution speed of a program by revising slow, inefficient sections of code. Additionally, profiling may illuminate bugs that may not have surfaced. The following sections present an overview of how to use gprof
. Interested readers seeking additional information on this utility are encouraged to read the manual page for the utility and to visit the official GNU website (www.gnu.org) and peruse the full set of online documentation on gprof
.
Profiling a program is a three-step process:
A selection sort program is used to demonstrate the profiling process. A selection sort is a general-purpose sort that is a variation of an exchange sort. Basically, this sort passes through a list of elements and finds the index (location) of the smallest (least) element. It exchanges the data at this index with the data stored in the first element of the list. It then repeats the selection process with the remaining list to find the second smallest element, and exchanges the data at this index with the data stored in the second element of the list. The process is repeated until the list is ordered.
There are numerous ways to code a selection sort. Below is a C++ version of this sort.
File : ss.cxx | #include <iostream> | using namespace std; // Function prototypes | void Get_Data ( int [], int ); | void Display ( int [], int ); + void Do_S_Sort ( int [], int ); | int Find_Least( const int [], int, int ); | int Compare ( const int & , const int & ); | void Swap ( int &, int & ); | int 10 main( ) { | const int max = 10; | int List[max]; | Get_Data ( List, max ); // Obtain data | cout << "Initial list" << endl; + Display ( List, max ); // Show it | Do_S_Sort( List, max ); // Sort it | cout << "Sorted list" << endl; | Display ( List, max ); // Show it again | return 0; 20 } | // Obtain data to sort from standard input | void | Get_Data(int a[], int n) { | cout << "Please enter " << n << " integers" << endl; + for(int i=0; i < n; ++i) | cin >> a[i]; | } | // Display the current contents of list | void 30 Display(int a[], int n) { | for(int i=0; i < n; ++i) | cout << " " << a[i]; | cout << endl; | } + // Do the Selection Sort, Display after each pass | void | Do_S_Sort( int a[], int n ){ | int index; | for (int i=0; i < n-1; ++i){ 40 index=Find_Least( a, i, n ); | if ( i != index ) | Swap( a[i], a[index] ); | cout << "After pass " << i+1 << " : "; | Display( a, n ); + } | } | // Find the index of the least element in list | int | Find_Least( const int a[], int start, int stop ){ 50 int Index_of_Least = start; | for (int i=start+1; i < stop; ++i ) | if ( Compare(a[i], a[Index_of_Least]) ) | Index_of_Least = i; | return Index_of_Least; + } | // Compare two data elements | int | Compare( const int &a, const int &b ){ | return ( a < b ); 60 } | // Exchange two data elements | void | Swap( int &a, int &b ){ | int temp; + temp = a; | a = b; | b = temp; | }
As shown, six functions are used to implement the sort
FUNCTION | PURPOSE |
---|---|
| Obtains the data to be sorted from standard input. |
| Prints the current contents of the list to standard output. |
| Performs the selection sort routine. |
| Traverses a portion of the list to find (and ultimately return) the index of the smallest element. |
| Compares two list elements and returns a nonzero value if the first element is less than the second element. |
| Exchanges the data for two elements in the list. |
At first glance, using so many functions for such a simple algorithm may seem to be overkill. However, coding some statements (such as the compare in line 52) as a function call will facilitate referencing when generating profile information with gprof
.
A good portion of the work done by the selection sort is actually performed by the Find_Least
function (see line 49). This function is passed the list and two integer values. The first integer value is the starting point for the search through the remainder of the list. The second value is the size of the list (which in this case reflects its end or stopping point). This function works by storing, in the variable Index_of_Least
, the index of the element having the smallest value. Initially, this is the index of the current start of the list (see line 50). It then uses a for
loop to pass through the reminder of the list, checking each location against the current smallest value to determine if it has found a lesser value. Each check is carried out by the Compare
function. If the value is less, the index (not the value) of the location is stored. When the loop finishes, the index of the element with the smallest value is returned to the calling function. If the returned index is not the index of the current head of list, the two values are exchanged by calling the Swap
function. If the current value is already in order, no exchange is needed.
To collect profiling information, a program must be compiled and linked with profiling enabled. To accomplish this, the command-line option -pg
is passed to the compiler. For example, if our source program was ss.cxx
, we would use the following command:
linux$ g++ ss.cxx -pg -o ss
If your program consists of separate source files, which are compiled individually, each intermediate compilation should be passed the -pg
option if you want to profile the functions found in the source file. If the debug command-line option -g
is also specified, the gprof
utility will generate additional line-by-line profiling information. This technique is discussed in detail at the end of this appendix.
Once successfully compiled, the program is run using its normal arguments, input files, and so on. The profiling data generated by the program's execution is stored in a file called gmon.out
in the current directory. Should there be an existing gmon.out
file, it is overwritten. Note that the gmon.out
file will not be produced if the program does not exit normally (i.e., does not return from main
or call exit
).[1] The gmon.out
file is a data file not a plain text file and as such should not be directly printed or displayed on the screen.
The profile data stored in the gmon.out
file is analyzed and displayed using the gprof
utility. If called with no options or command-line arguments, gprof
assumes that the executable it is working with is a.out
and that the profile data for this executable is in the file gmon.out
. Also, unless directed otherwise, gprof
generates and displays a fully annotated flat profile and a call graph using the collected data. Like most utilities, gprof
supports an extensive set of options. We will examine a small subset of its options.
Staying with our selection sort example, as shown previously, the command-line sequence
linux$ g++ ss.cxx -pg -o ss
compiles the source file ss.cxx
into the executable file ss
. The program is then run. In this example the program obtains its input, via command-line redirection, from a data file called data_reverse
. The data file contains the numbers 1 through 10 in reverse order. The output from the program's execution is displayed on the screen.
linux$ ss < data_reverse <-- 1 Please enter 10 integers Initial list <-- 2 10 9 8 7 6 5 4 3 2 1 After pass 1 : 1 9 8 7 6 5 4 3 2 10 After pass 2 : 1 2 8 7 6 5 4 3 9 10 After pass 3 : 1 2 3 7 6 5 4 8 9 10 After pass 4 : 1 2 3 4 6 5 7 8 9 10 After pass 5 : 1 2 3 4 5 6 7 8 9 10 <-- 3 After pass 6 : 1 2 3 4 5 6 7 8 9 10 After pass 7 : 1 2 3 4 5 6 7 8 9 10 After pass 8 : 1 2 3 4 5 6 7 8 9 10 After pass 9 : 1 2 3 4 5 6 7 8 9 10 Sorted list 1 2 3 4 5 6 7 8 9 10
Once the program is run, the gprof
utility is used to analyze and view the profile data. If options are to be passed to gprof
, they should be specified individually on the command line following the call to the utility. The name of the executable file comes after the options. If not specified, the default is a.out
. The executable file name is optionally followed by the profile file name, which, if not given, defaults to gmon.out
. In the example below the -b
option (for brief) is specified.
linux$ gprof -b ss <-- 1
The output generated by this invocation is divided into two sections, the first of which is a flat profile. This section shows how much time the program spent in each function and how many times the function was called. Sorting 10 items is a trivial task even with the somewhat inefficient selection sort. Therefore, to produce a more instructive data set, the number of values to be sorted was increased to 1,000 and the program recompiled and rerun. To reduce the time needed for display, the output of the executable was discarded by directing it to /dev/null
.
linux$ ss < data_reverse > /dev/null
The gprof
output below is based on sorting 1,000 values.
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 100.00 0.04 0.04 999 40.04 40.04 Find_Least(int const *, int, int) 0.00 0.04 0.00 499500 0.00 0.00 Compare(int const &, int const &) 0.00 0.04 0.00 1001 0.00 0.00 Display(int *, int) 0.00 0.04 0.00 500 0.00 0.00 Swap(int &, int &) 0.00 0.04 0.00 1 0.00 40000.00 Do_S_Sort(int *, int) 0.00 0.04 0.00 1 0.00 0.00 Get_Data(int *, int)
In the flat profile section each function is listed in decreasing order based on its runtime (if the runtime is too small to be recorded, it is reported as 0). The meaning of each field in the flat profile is listed in the following table.
FIELD NAME | MEANING |
---|---|
| The percentage of total execution time. |
| Total seconds executing this function and all those listed above it in the table. |
| Total seconds executing just this function (the primary sort key). |
| The number of times this function was called (blank if not called). |
| Average number of |
| Per call, the average number of |
| Function name (acts as the secondary sort key) |
As might be anticipated with a list of 1,000 elements, the Compare
function in our example is called 499,500 times (the summation of 999 + 998 + 997 + 1, the series representing the number of compares). Clearly, the Find_Least
function is the most time intensive of the group.
The flat profile is followed by a call graph that details the amount of time spent in each function. The output in this section is divided into a series of entries, one per function. Each entry, which is one or more lines in length, is separated from the next by a series of dashes. Within each entry there is one primary line. The primary line indicates the function associated with the entry data. The primary line is easily identified, as it begins with an index number surrounded by set of square brackets. In each entry lines that precede the primary line are functions that call the function, while lines that follow the primary line are functions that are called by the function (in call graph-speak, its children). If the caller of a function cannot be determined, <spontaneous>
is printed instead.
Here is the call graph for our selection sort program when it sorts a list of 1,000 values, which were initially in inverse order. For ease of reference, each primary line has been highlighted.
Call graph granularity: each sample hit covers 4 byte(s) for 25.00% of 0.04 seconds index % time self children called name 0.04 0.00 999/999 Do_S_Sort(int *, int) [2] [1] 100.0 0.04 0.00 999 Find_Least(int const *, int, int) [1] 0.00 0.00 499500/499500 Compare(int const &, int const &) [4] ----------------------------------------------- 0.00 0.04 1/1 main [3] [2] 100.0 0.00 0.04 1 Do_S_Sort(int *, int) [2] 0.04 0.00 999/999 Find_Least(int const *, int, int) [1] 0.00 0.00 999/1001 Display(int *, int) [5] 0.00 0.00 500/500 Swap(int &, int &) [6] ----------------------------------------------- <spontaneous> [3] 100.0 0.00 0.04 main [3] 0.00 0.04 1/1 Do_S_Sort(int *, int) [2] 0.00 0.00 2/1001 Display(int *, int) [5] 0.00 0.00 1/1 Get_Data(int *, int) [7] ----------------------------------------------- 0.00 0.00 499500/499500 Find_Least(int const *, int, int) [1] [4] 0.0 0.00 0.00 499500 Compare(int const &, int const &) [4] ----------------------------------------------- 0.00 0.00 2/1001 main [3] 0.00 0.00 999/1001 Do_S_Sort(int *, int) [2] [5] 0.0 0.00 0.00 1001 Display(int *, int) [5] ----------------------------------------------- 0.00 0.00 500/500 Do_S_Sort(int *, int) [2] [6] 0.0 0.00 0.00 500 Swap(int &, int &) [6] ----------------------------------------------- 0.00 0.00 1/1 main [3] [7] 0.0 0.00 0.00 1 Get_Data(int *, int) [7] ----------------------------------------------- Index by function name [4] Compare(int const &, int const &) [2] Do_S_Sort(int *, int) [7] Get_Data(int *, int) [5] Display(int *, int) [1] Find_Least(int const *, int, int) [6] Swap(int &, int &)
The meaning of each field is based on its context (i.e., the line's designation): primary, function's callers (call this function), or called functions (called by this function).
FIELD | PRIMARY | FUNCTION'S CALLERS | CALLED FUNCTIONS |
---|---|---|---|
| Index number of this function. | ||
| Percent of total time spent in this function and its children. | ||
| Total amount of time just spent in this function (same as | Estimate of time spent in this function when invoked by the .caller function. | Estimate of time spent in called function. |
| Total amount of time spent in the function calls made by this function. | Estimate of time spent in calls to its children. | Estimate of time spent in the children of the called function. |
| Number of times this function was called. A +is used to separate recursive calls. | Number of times this function is called; the total number of nonrecursive calls. | Number of times this function is called; the total number of nonrecursive calls. |
| The name and index number of the function. | The name and index number of the function. | The name and index number of the function. |
If we look at the first entry in the call graph output, we note that the primary line is flagged by [1]
. The associated function, Find_Least
, is called by the Do_S_Sort
function 999 times. In turn, the Find_Least
function calls the Compare
function 499,500 times.
The gprof
utility can be directed to display an annotated source code listing where it identifies the number of calls for the function. To produce this output, the source program must be compiled with the -g
option, and gprof
passed the -A
option (indicating annotated source). Using our same source program, this sequence would be
linux$ g++ -g ss.cxx -pg -o ss linux$ ss < data_reverse > /dev/null linux$ gprof -A ss
As shown below, gprof
's output lists the program source code and indicates the number of times each function is called. At the end of the listing, it displays a top 10 list indicating the top 10 lines based on their execution activity. Following the top 10 list is an execution summary.
*** File /home/faculty/gray/revision/profile/ss.cxx: #include <iostream> using namespace std; // Function prototypes void Get_Data ( int [], int ); void Display ( int [], int ); void Do_S_Sort ( int [], int ); int Find_Least( const int [], int, int ); int Compare ( const int &, const int & ); void Swap ( int &, int & ); int ##### -> main( ) { const int max = 1000; int List[max]; Get_Data ( List, max ); // Obtain data cout << "Initial list" << endl; Display ( List, max ); // Show it Do_S_Sort( List, max ); // Sort it cout << "Sorted list" << endl; Display ( List, max ); // Show it again return 0; } // Obtain data to sort from standard input void 1 -> Get_Data(int a[], int n) { cout << "Please enter " << n << " integers" << endl; for(int i=0; i < n; ++i) cin >> a[i]; } // Display the current contents of list void 1001 -> Display(int a[], int n) { for(int i=0; i < n; ++i) cout << " " << a[i]; cout << endl; } // Do the Selection Sort, Display after each pass void 1 -> Do_S_Sort( int a[], int n ){ int index; for (int i=0; i < n-1; ++i){ index=Find_Least( a, i, n ); if ( i != index ) Swap( a[i], a[index] ); cout << "After pass " << i+1 << " : "; Display( a, n ); } } // Find the index of the least element in list int 999 -> Find_Least( const int a[], int start, int stop ){ int Index_of_Least = start; for (int i=start+1; i < stop; ++i ) if ( Compare(a[i], a[Index_of_Least]) ) Index_of_Least = i; return Index_of_Least; } // Compare two data elements int 499500 -> Compare( const int &a, const int &b ){ return ( a < b ); } // Exchange two data elements void 500 -> Swap( int &a, int &b ){ int temp; temp = a; a = b; b = temp; ##### -> } Top 10 Lines: Line Count 58 499500 30 1001 49 999 63 500 23 1 37 1 Execution Summary: 8 Executable lines in this file 8 Lines executed 100.00 Percent of the file executed 502002 Total number of line executions 62750.25 Average executions per line
While the gprof
utility is helpful in analyzing the execution of programs, the current Linux version of this utility does have a serious limitation. As implemented, the utility uses a signal handler to collect profile information. The signal handler is invoked by the periodic generation of a SIGPROF (27) signal. When a child process is generated via a fork
or a new thread is created with a call to pthread_create
, they do not, by default, receive these signals. As a result, no profile data is collected for the child process or new thread.
[1] In this case calling _exit
would not work, as it does not perform the cleanup activities that a call to exit
does.
18.119.28.94