Appendix D. Profiling Programs

Introduction

One way to obtain useful information about the execution of a program is to use the GNU profiler utility gprof. This utility, which is part of most standard Linux installations, provides information on functions your program calls. By analyzing the data it provides, you can often improve the execution speed of a program by revising slow, inefficient sections of code. Additionally, profiling may illuminate bugs that may not have surfaced. The following sections present an overview of how to use gprof. Interested readers seeking additional information on this utility are encouraged to read the manual page for the utility and to visit the official GNU website (www.gnu.org) and peruse the full set of online documentation on gprof.

Profiling a program is a three-step process:

  1. The program is compiled and linked with profiling enabled.

  2. The executable version of the program is run to generate a special profile data file.

  3. The gprof utility is used to analyze and display the profiling data.

Sample Program for Profiling

A selection sort program is used to demonstrate the profiling process. A selection sort is a general-purpose sort that is a variation of an exchange sort. Basically, this sort passes through a list of elements and finds the index (location) of the smallest (least) element. It exchanges the data at this index with the data stored in the first element of the list. It then repeats the selection process with the remaining list to find the second smallest element, and exchanges the data at this index with the data stored in the second element of the list. The process is repeated until the list is ordered.

There are numerous ways to code a selection sort. Below is a C++ version of this sort.

File : ss.cxx
  |     #include <iostream>
  |     using namespace std;               // Function prototypes
  |     void Get_Data  ( int [], int );
  |     void Display   ( int [], int );
  +     void Do_S_Sort ( int [], int );
  |     int  Find_Least( const int [], int, int );
  |     int  Compare   ( const int & , const int & );
  |     void Swap      ( int &, int & );
  |     int
 10     main( ) {
  |       const int max = 10;
  |       int List[max];
  |       Get_Data ( List, max );              // Obtain data
  |       cout << "Initial list" << endl;
  +       Display  ( List, max );              // Show it
  |       Do_S_Sort( List, max );              // Sort it
  |       cout << "Sorted list" << endl;
  |       Display  ( List, max );              // Show it again
  |       return 0;
 20     }
  |     // Obtain data to sort from standard input
  |     void
  |     Get_Data(int a[], int n) {
  |       cout << "Please enter " << n << " integers" << endl;
  +       for(int i=0; i < n; ++i)
  |          cin >> a[i];
  |     }
  |     // Display the current contents of list
  |     void
 30     Display(int a[], int n) {
  |       for(int i=0; i < n; ++i)
  |          cout << " " << a[i];
  |       cout << endl;
  |     }
  +     //  Do the Selection Sort, Display after each pass
  |     void
  |     Do_S_Sort( int a[], int n ){
  |       int index;
  |       for (int i=0; i < n-1; ++i){
 40         index=Find_Least( a, i, n );
  |         if ( i != index )
  |           Swap( a[i], a[index] );
  |         cout << "After pass " << i+1 << " : ";
  |         Display( a, n );
  +       }
  |     }
  |     //  Find the index of the least element in list
  |     int
  |     Find_Least( const int a[], int start, int stop ){
 50       int Index_of_Least = start;
  |       for (int i=start+1; i < stop; ++i )
  |         if ( Compare(a[i], a[Index_of_Least]) )
  |           Index_of_Least = i;
  |       return Index_of_Least;
  +     }
  |     //  Compare two data elements
  |     int
  |     Compare( const int &a, const int &b ){
  |       return ( a < b );
 60     }
  |     //  Exchange two data elements
  |     void
  |     Swap( int &a, int &b ){
  |       int temp;
  +       temp = a;
  |       a    = b;
  |       b    = temp;
  |     }

As shown, six functions are used to implement the sort

FUNCTION

PURPOSE

Get_Data

Obtains the data to be sorted from standard input.

Display

Prints the current contents of the list to standard output.

Do_S_Sort

Performs the selection sort routine.

Find_Least

Traverses a portion of the list to find (and ultimately return) the index of the smallest element.

Compare

Compares two list elements and returns a nonzero value if the first element is less than the second element.

Swap

Exchanges the data for two elements in the list.

At first glance, using so many functions for such a simple algorithm may seem to be overkill. However, coding some statements (such as the compare in line 52) as a function call will facilitate referencing when generating profile information with gprof.

A good portion of the work done by the selection sort is actually performed by the Find_Least function (see line 49). This function is passed the list and two integer values. The first integer value is the starting point for the search through the remainder of the list. The second value is the size of the list (which in this case reflects its end or stopping point). This function works by storing, in the variable Index_of_Least, the index of the element having the smallest value. Initially, this is the index of the current start of the list (see line 50). It then uses a for loop to pass through the reminder of the list, checking each location against the current smallest value to determine if it has found a lesser value. Each check is carried out by the Compare function. If the value is less, the index (not the value) of the location is stored. When the loop finishes, the index of the element with the smallest value is returned to the calling function. If the returned index is not the index of the current head of list, the two values are exchanged by calling the Swap function. If the current value is already in order, no exchange is needed.

Generating Profile Data

To collect profiling information, a program must be compiled and linked with profiling enabled. To accomplish this, the command-line option -pg is passed to the compiler. For example, if our source program was ss.cxx, we would use the following command:

linux$ g++ ss.cxx -pg -o ss

If your program consists of separate source files, which are compiled individually, each intermediate compilation should be passed the -pg option if you want to profile the functions found in the source file. If the debug command-line option -g is also specified, the gprof utility will generate additional line-by-line profiling information. This technique is discussed in detail at the end of this appendix.

Once successfully compiled, the program is run using its normal arguments, input files, and so on. The profiling data generated by the program's execution is stored in a file called gmon.out in the current directory. Should there be an existing gmon.out file, it is overwritten. Note that the gmon.out file will not be produced if the program does not exit normally (i.e., does not return from main or call exit).[1] The gmon.out file is a data file not a plain text file and as such should not be directly printed or displayed on the screen.

Viewing and Interpreting Profile Data

The profile data stored in the gmon.out file is analyzed and displayed using the gprof utility. If called with no options or command-line arguments, gprof assumes that the executable it is working with is a.out and that the profile data for this executable is in the file gmon.out. Also, unless directed otherwise, gprof generates and displays a fully annotated flat profile and a call graph using the collected data. Like most utilities, gprof supports an extensive set of options. We will examine a small subset of its options.

Staying with our selection sort example, as shown previously, the command-line sequence

linux$ g++ ss.cxx -pg -o ss

compiles the source file ss.cxx into the executable file ss. The program is then run. In this example the program obtains its input, via command-line redirection, from a data file called data_reverse. The data file contains the numbers 1 through 10 in reverse order. The output from the program's execution is displayed on the screen.

linux$ ss < data_reverse        <-- 1
Please enter 10 integers
Initial list        <-- 2
10 9 8 7 6 5 4 3 2 1
After pass 1 :  1 9 8 7 6 5 4 3 2 10
After pass 2 :  1 2 8 7 6 5 4 3 9 10
After pass 3 :  1 2 3 7 6 5 4 8 9 10
After pass 4 :  1 2 3 4 6 5 7 8 9 10
After pass 5 :  1 2 3 4 5 6 7 8 9 10        <-- 3
After pass 6 :  1 2 3 4 5 6 7 8 9 10
After pass 7 :  1 2 3 4 5 6 7 8 9 10
After pass 8 :  1 2 3 4 5 6 7 8 9 10
After pass 9 :  1 2 3 4 5 6 7 8 9 10
Sorted list
1 2 3 4 5 6 7 8 9 10
  • (1)Program is run normally; obtains input from a data file.

  • (2)Its output is displayed on the screen.

  • (3)Given the unique nature of the input data (it starts in descending order) and the algorithm's logic, the data is actually sorted before the program terminates!

Once the program is run, the gprof utility is used to analyze and view the profile data. If options are to be passed to gprof, they should be specified individually on the command line following the call to the utility. The name of the executable file comes after the options. If not specified, the default is a.out. The executable file name is optionally followed by the profile file name, which, if not given, defaults to gmon.out. In the example below the -b option (for brief) is specified.

linux$ gprof -b ss        <-- 1
  • (1)Call gprof and pass it the -b option and the name of the executable file (ss). The profile data file, which is not specified, is assumed to be gmon.out.

The output generated by this invocation is divided into two sections, the first of which is a flat profile. This section shows how much time the program spent in each function and how many times the function was called. Sorting 10 items is a trivial task even with the somewhat inefficient selection sort. Therefore, to produce a more instructive data set, the number of values to be sorted was increased to 1,000 and the program recompiled and rerun. To reduce the time needed for display, the output of the executable was discarded by directing it to /dev/null.

linux$ ss < data_reverse > /dev/null

The gprof output below is based on sorting 1,000 values.

Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative  self           self     total
 time  seconds  seconds  calls  us/call  us/call  name
100.00   0.04    0.04     999    40.04    40.04  Find_Least(int const *, int, int)
  0.00   0.04    0.00  499500     0.00     0.00  Compare(int const &, int const &)
  0.00   0.04    0.00    1001     0.00     0.00  Display(int *, int)
  0.00   0.04    0.00     500     0.00     0.00  Swap(int &, int &)
  0.00   0.04    0.00       1     0.00 40000.00  Do_S_Sort(int *, int)
  0.00   0.04    0.00       1     0.00     0.00  Get_Data(int *, int)

In the flat profile section each function is listed in decreasing order based on its runtime (if the runtime is too small to be recorded, it is reported as 0). The meaning of each field in the flat profile is listed in the following table.

FIELD NAME

MEANING

% time

The percentage of total execution time.

cumulative seconds

Total seconds executing this function and all those listed above it in the table.

self seconds

Total seconds executing just this function (the primary sort key).

calls

The number of times this function was called (blank if not called).

self Ts/call

Average number of T seconds spent in this function (m = milli, u = micro, n = nano, etc.).

total Ts/call

Per call, the average number of T seconds in this function and its descendants.

name

Function name (acts as the secondary sort key)

As might be anticipated with a list of 1,000 elements, the Compare function in our example is called 499,500 times (the summation of 999 + 998 + 997 + 1, the series representing the number of compares). Clearly, the Find_Least function is the most time intensive of the group.

The flat profile is followed by a call graph that details the amount of time spent in each function. The output in this section is divided into a series of entries, one per function. Each entry, which is one or more lines in length, is separated from the next by a series of dashes. Within each entry there is one primary line. The primary line indicates the function associated with the entry data. The primary line is easily identified, as it begins with an index number surrounded by set of square brackets. In each entry lines that precede the primary line are functions that call the function, while lines that follow the primary line are functions that are called by the function (in call graph-speak, its children). If the caller of a function cannot be determined, <spontaneous> is printed instead.

Here is the call graph for our selection sort program when it sorts a list of 1,000 values, which were initially in inverse order. For ease of reference, each primary line has been highlighted.

                        Call graph
granularity: each sample hit covers 4 byte(s) for 25.00% of 0.04 seconds

index % time   self  children    called  name
               0.04    0.00     999/999     Do_S_Sort(int *, int) [2]
[1]    100.0   0.04    0.00     999       Find_Least(int const *, int, int) [1]
               0.00    0.00  499500/499500  Compare(int const &, int const &) [4]
-----------------------------------------------
               0.00    0.04       1/1        main [3]
[2]    100.0   0.00    0.04       1     Do_S_Sort(int *, int) [2]
               0.04    0.00     999/999      Find_Least(int const *, int, int) [1]
               0.00    0.00     999/1001     Display(int *, int) [5]
               0.00    0.00     500/500      Swap(int &, int &) [6]
-----------------------------------------------
                                             <spontaneous>
[3]    100.0   0.00    0.04             main [3]
               0.00    0.04       1/1        Do_S_Sort(int *, int) [2]
               0.00    0.00       2/1001     Display(int *, int) [5]
               0.00    0.00       1/1        Get_Data(int *, int) [7]
-----------------------------------------------
               0.00    0.00  499500/499500   Find_Least(int const *, int, int) [1]
[4]      0.0   0.00    0.00  499500     Compare(int const &, int const &) [4]
-----------------------------------------------
               0.00    0.00       2/1001     main [3]
               0.00    0.00     999/1001     Do_S_Sort(int *, int) [2]
[5]      0.0   0.00    0.00    1001     Display(int *, int) [5]
-----------------------------------------------
               0.00    0.00     500/500      Do_S_Sort(int *, int) [2]
[6]      0.0   0.00    0.00     500     Swap(int &, int &) [6]
-----------------------------------------------
               0.00    0.00       1/1        main [3]
[7]      0.0   0.00    0.00       1     Get_Data(int *, int) [7]
-----------------------------------------------
Index by function name
[4] Compare(int const &, int const &) [2] Do_S_Sort(int *, int)
[7] Get_Data(int *, int)              [5] Display(int *, int)
[1] Find_Least(int const *, int, int) [6] Swap(int &, int &)

The meaning of each field is based on its context (i.e., the line's designation): primary, function's callers (call this function), or called functions (called by this function).

FIELD

PRIMARY

FUNCTION'S CALLERS

CALLED FUNCTIONS

index

Index number of this function.

  

% time

Percent of total time spent in this function and its children.

  

self

Total amount of time just spent in this function (same as self seconds value in flat profile).

Estimate of time spent in this function when invoked by the .caller function.

Estimate of time spent in called function.

children

Total amount of time spent in the function calls made by this function.

Estimate of time spent in calls to its children.

Estimate of time spent in the children of the called function.

called

Number of times this function was called. A +is used to separate recursive calls.

Number of times this function is called; the total number of nonrecursive calls.

Number of times this function is called; the total number of nonrecursive calls.

name

The name and index number of the function.

The name and index number of the function.

The name and index number of the function.

If we look at the first entry in the call graph output, we note that the primary line is flagged by [1]. The associated function, Find_Least, is called by the Do_S_Sort function 999 times. In turn, the Find_Least function calls the Compare function 499,500 times.

The gprof utility can be directed to display an annotated source code listing where it identifies the number of calls for the function. To produce this output, the source program must be compiled with the -g option, and gprof passed the -A option (indicating annotated source). Using our same source program, this sequence would be

linux$ g++ -g ss.cxx -pg -o ss
linux$ ss < data_reverse > /dev/null
linux$ gprof -A ss

As shown below, gprof's output lists the program source code and indicates the number of times each function is called. At the end of the listing, it displays a top 10 list indicating the top 10 lines based on their execution activity. Following the top 10 list is an execution summary.

*** File /home/faculty/gray/revision/profile/ss.cxx:
                #include <iostream>
                using namespace std;                   // Function prototypes
                void Get_Data  ( int [], int );
                void Display   ( int [], int );
                void Do_S_Sort ( int [], int );
                int  Find_Least( const int [], int, int );
                int  Compare   ( const int &, const int & );
                void Swap      ( int &, int & );
                int
       ##### -> main( ) {
                  const int max = 1000;
                  int List[max];
                  Get_Data ( List, max );              // Obtain data
                  cout << "Initial list" << endl;
                  Display  ( List, max );              // Show it
                  Do_S_Sort( List, max );              // Sort it
                  cout << "Sorted list" << endl;
                  Display  ( List, max );              // Show it again
                  return 0;
                }
                // Obtain data to sort from standard input
                void
           1 -> Get_Data(int a[], int n) {
                  cout << "Please enter " << n << " integers" << endl;
                  for(int i=0; i < n; ++i)
                     cin >> a[i];
                }
                // Display the current contents of list
                void
        1001 -> Display(int a[], int n) {
                  for(int i=0; i < n; ++i)
                     cout << " " << a[i];
                  cout << endl;
                }
                //  Do the Selection Sort, Display after each pass
                void
           1 -> Do_S_Sort( int a[], int n ){
                  int index;
                  for (int i=0; i < n-1; ++i){
                    index=Find_Least( a, i, n );
                    if ( i != index )
                      Swap( a[i], a[index] );
                    cout << "After pass " << i+1 << " : ";
                    Display( a, n );
                  }
                }
                //  Find the index of the least element in list
                int
         999 -> Find_Least( const int a[], int start, int stop ){
                  int Index_of_Least = start;
                  for (int i=start+1; i < stop; ++i )
                    if ( Compare(a[i], a[Index_of_Least]) )
                      Index_of_Least = i;
                  return Index_of_Least;
                }
                //  Compare two data elements
                int
      499500 -> Compare( const int &a, const int &b ){
                  return ( a < b );
                }
                //  Exchange two data elements
                void
         500 -> Swap( int &a, int &b ){
                  int temp;
                  temp = a;
                  a    = b;
                  b    = temp;
       ##### -> }
Top 10 Lines:
     Line      Count
       58     499500
       30       1001
       49        999
       63        500
       23          1
       37          1
Execution Summary:
        8   Executable lines in this file
        8   Lines executed
   100.00   Percent of the file executed
   502002   Total number of line executions
 62750.25   Average executions per line

While the gprof utility is helpful in analyzing the execution of programs, the current Linux version of this utility does have a serious limitation. As implemented, the utility uses a signal handler to collect profile information. The signal handler is invoked by the periodic generation of a SIGPROF (27) signal. When a child process is generated via a fork or a new thread is created with a call to pthread_create, they do not, by default, receive these signals. As a result, no profile data is collected for the child process or new thread.



[1] In this case calling _exit would not work, as it does not perform the cleanup activities that a call to exit does.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.28.94