Day 17: Profiling and Optimization

When your program is complete and runs well, it is time to make it smaller and faster. The compiler can help, but there are advanced professional techniques and tools that can do a far better job than any compiler.

Today you will learn

Optimizing Your Code

Programming is, to coin a phrase, the art of compromise. You constantly are trading off speed for size, memory for storage, development time for maintenance time, and so on. Most professional level C++ compilers do an excellent job optimizing your code for size or performance, but you must decide which to accentuate.

For some applications, size is critical. Your program must fit on one disk, or it must be transferred easily by modem when your customer is paying by the minute. For most applications, however, size is less important than performance. Most customers will put up with a slightly larger program in exchange for a slightly faster program.

Programming Optimization

There are a number of places you can optimize your own code even before your compiler lends a hand. Inline code can be seen as an optimization of performance for size. Each inline function is repeated every time it is called, thus bloating the size of the code. The overhead of a function call is avoided, however, which slightly speeds up your program. The rule of thumb I use is, if the function is one line, consider making it inline.

This and most other hand-optimizations, however, are best left to the compiler. Modern C++ compilers can optimize away inline code, the creation of unneeded temporaries, the creation of unreferenced variables, and so on. In order to go beyond what your compiler can give you, you need to use a profiler, as described in the next section.

Compiler Optimization

Don't underestimate the power of your compiler to optimize your code. Borland, Microsoft, Symantec, and all the major vendors offer optimizing compilers that can do a fantastic job of making your program smaller and faster. In fact, many programmers choose their compiler principally on its success at optimization. Check the reviews of the various C++ compilers in the programming magazines to find out how they stack up against one another in speed and size optimization and flexibility.

One thing to keep in mind is that while you are creating and debugging your code, turn optimization off! An optimizing compiler is free to rearrange your code, to collapse variables, to stash away data in registers, and so on. Although these actions are highly desirable in your run-time commercial release, it can undermine your debugging efforts, as described on day 20.

Most professional compilers offer a variety of settings for optimization, but the three most important follow:

  1. Turn off optimization.

  2. Minimize space.

  3. Optimize speed.

Using Profilers

To go beyond what your compiler can offer you in performance optimization, you need to examine your program in detail to find out where it is spending its time. This is called profiling your program, you generate a profile of what gets called, and how often and how long each function call or other event takes.

Studies have shown that most programmers don't profile their programs, and that most programmers guess incorrectly about where to optimize their code. The distinguishing characteristic of very fast programs is not the programmer's skill in writing tight, fast code, but instead the programmer's willingness to profile the final product and then to tweak the important areas.

Because profilers are programs, and because each compiler vendor offers a different profiler, most advanced programming books avoid discussing them. It is difficult to be precise about how a profiler works, because each profiling program works differently.

Nonetheless, profiling is a critical subject for a professional programmer, and there are many things that can be said here that will apply to all profilers. The overall approach is very similar across the various profilers, and the techniques you learn today will apply regardless of which product you end up using. Of course, you should carefully review the documentation of your specific profiler to find out what it offers and how to invoke it.

For purposes of illustration, I'll be using the profiler provided with Microsoft Visual C++ version 1.5. I will not be discussing the syntax of invoking this profiler, but I will examine those aspects of its reports that are universal to all profilers. Once again, your output might look slightly different, but the information should be essentially the same.

Profiling Your Program

To illustrate the points made in this chapter, reexamine the selection sort shown on day 4. Listing 17.1 is the code for the first version of this program.

Listing 17.1 Using a Selection Sort

    1:     // Listing 4.1 - Using a Selection Sort
    2:
    3:     #include <iostream.h>
    4:     #include <string.h>
    5:
    6:     void Swap( char *Array, int index1, int index2);
    7:     void SelectionSort(char *Array);
    8:
    9:     int main()
    10:    {
    11:       char buffer[100];
    12:       cout << "Enter up to 100 characters: ";
    13:       cin.getline(buffer,100);
    14:       cout << "Unsorted:\t" << buffer << endl;
    15:       SelectionSort(buffer);
    16:       cout << "Sorted:\t\t" << buffer << endl;
    17:       return 0;
    18:    }
    19:
    20:    // Read through each member of the array in turn
    21:    // For every member, examine every remaining member and
    22:    // swap with smallest
    23:    void SelectionSort(char *Array)
    24:    {
    25:       int ArrayLen = strlen(Array);
    26:       for (int i = 0; i<ArrayLen; i++)
    27:       {
    28:          int min = i;
    29:          for (int j = i+1; j< ArrayLen; j++)
    30:             if (Array[j] < Array[min])
    31:                min = j;
    32:          Swap(Array,min,i);
    33:       }
    34:    }
    35:
    36:    void Swap( char *Array, int left, int right)
    37:    {
    38:       char tmp = Array[left];
    39:       Array[left]=Array[right];
    40:       Array[right]=tmp;
    41:    }
Output:
    Enter up to 100 characters: Eternal vigilance is the price of liberty

    Unsorted:       Eternal vigilance is the price of liberty

    Sorted:               Eaabcceeeeefghiiiiilllnnoprrrstttvy

Profiling Function Count

The first and simplest statistic about this program is how many function calls it makes. An analysis of this simple question often can point out where a program is spending much of its time. Listing 17.2 shows the function count report for this program.

Listing 17.2 Function Count Report

    1:     Microsoft PLIST Version 1.30.200
    2:
    3:     Profile: Function counting, sorted by counts.
    4:     Date:    Sat Jan 21 16:44:47 1995
    5:
    6:
    7:     Program Statistics
    8:     ------------------
    9:         Total functions: 8
    10:        Total hits: 54
    11:        Function coverage: 100.0%
    12:
    13:    Module Statistics for d:\112\day17\1701.exe
    14:    -------------------------------------------
    15:        Functions in module: 8
    16:        Hits in module: 54
    17:        Module function coverage: 100.0%
    18:
    19:     Hit
    20:    count   %   Function
    21:    --------------------
    22:       41  75.9 Swap (1701.cpp:37)
    23:        4   7.4 ostream::operator<< (ostream.h:85)
    24:        2   3.7 ostream::operator<< (ostream.h:88)
    25:        2   3.7 flush (ostream.h:118)
    26:        2   3.7 endl (ostream.h:119)
    27:        1   1.9 istream::getline (istream.h:118)
    28:        1   1.9 main (1701.cpp:10)
    29:        1   1.9 SelectionSort (1701.cpp:24)
    30:
Analysis:

The first few lines of this report and all profiler reports identify what you tested and when. Lines 9 and 10 indicate that the profiler found calls to eight different functions and that a total of 54 function calls were made.

Of the 54 function calls, nearly 76 percent (41 calls) were made to the Swap() function. The report indicates that this is shown in line 37 of 1701.cpp, the file that holds this program.

Note: The line numbers included in these reports will not always correspond to the line numbers shown in the listings in this book, due to special publishing considerations. In your own programs, these line numbers should correspond to the line numbers in your program.

Note that operator<< is reported on twice, due to operator overloading. The profiler reports on the different types of calls to cout operator <<.

Examining Time versus Occurrence

So far, all you know is that Swap() is being called a lot. Although it is tempting to examine whether you can reduce the number of calls, the first thing you must determine is how long each call takes. After all, if calling Swap() 66 times takes less time than calling operator << six times, then you may want to turn your attention to cout. The next report, shown in listing 17.3, is a report of where this program is spending its time.

Listing 17.3 Time Report

    1:     Microsoft PLIST Version 1.30.200
    2:
    3:     Profile: Function timing, sorted by time.
    4:     Date:    Sat Jan 21 16:47:51 1995
    5:
    6:     Program Statistics
    7:     ------------------
    8:         Total time: 6907.295 milliseconds
    9:         Time outside of functions: 2.803 milliseconds
    10:        Call depth: 5
    11:        Total functions: 8
    12:        Total hits: 54
    13:        Function coverage: 100.0%
    14:
    15:    Module Statistics for d:\112\day17\1701.exe
    16:    -------------------------------------------
    17:        Time in module: 6904.491 milliseconds
    18:        Percent of time in module: 100.0%
    19:        Functions in module: 8
    20:        Hits in module: 54
    21:        Module function coverage: 100.0%
    22:
    23:           Func        Func+Child        Hit
    24:           Time   %       Time      %   count Function
    25:    --------------------------------------------------
    26:       6881.761  99.7    6881.761  99.7     1 istream::getline (istream.h:118)
    27:         22.008   0.3      22.008   0.3     2 flush (ostream.h:118)
    28:          0.361   0.0    6904.491 100.0     1 main (1701.cpp:10)
    29:          0.144   0.0       0.144   0.0    41 Swap (1701.cpp:37)
    30:          0.117   0.0      22.184   0.3     4 ostream::operator<< (ostream.h:85)
    31:          0.058   0.0       0.058   0.0     2 ostream::operator<< (ostream.h:88)
    32:          0.041   0.0       0.185   0.0     1 SelectionSort (1701.cpp:24)
    33:          0.002   0.0      22.132   0.3     2 endl (ostream.h:119)
Analysis:

This report indicates that the entire running time of the program was just less than seven seconds (there are 1,000 milliseconds in each second). The report goes on to tell you that it took 2.803 milliseconds (line 9) between the start of the program and the first function call.

Line 10 tells you that the deepest nesting of functions was five. This information can be vital when examining recursive functions.

The timing section reveals critical information. Although the Swap() method was called 41 times, and getline was called only once, 99.7 percent of the time spent in this program was spent in getline!

This would seem to be a significant datum indeed, until you realize that nearly all that time was spent waiting for the user to input data. Clearly this line has no significance, and if you run the program again and type slowly, this number will skyrocket!

An examination of the rest of the numbers is interesting, however; 22 milliseconds were spent on the two flush calls, but only .144 millisecond was spent in all 41 calls to Swap(). Eureka! If you do want to optimize this program, the calls to flush() are clearly candidates for reexamination.

But where is the call to flush()? Try as you might, you will not find it in listing 17.3. Remember, however, that endl does call flush(). There's the culprit. If speed were critical in this program, you might want to consider replacing the two endl calls with a simple line feed ("\n").

Profiling by Line

In addition to telling you how often your function is called and how long the program spends in each function, it is possible to break down your profile by line number. Listing 17.4 illustrates this by profiling every line in program 1701.cpp.

Listing 17.4 Profiling by Line

    1:     Microsoft PLIST Version 1.30.200
    2:
    3:     Profile: Line timing, sorted by time.
    4:     Date:    Sat Jan 21 17:03:15 1995
    5:
    6:
    7:     Program Statistics
    8:     ------------------
    9:         Total time: 5767.145 milliseconds
    10:        Time before any line: 1.080 milliseconds
    11:        Total lines: 28
    12:        Total hits: 2077
    13:        Line coverage: 100.0%
    14:
    15:    Module Statistics for d:\112\day17\1701.exe
    16:    -------------------------------------------
    17:        Time in module: 5766.065 milliseconds
    18:        Percent of time in module: 100.0%
    19:        Lines in module: 28
    20:        Hits in module: 2077
    21:        Module line coverage: 100.0%
    22:
    23:           Line        Hit
    24:           Time   %   count Line
    25:    ----------------------------
    26:       5706.451  99.0     1 (istream.h:118)
    27:         23.561   0.4   820 (1701.cpp:32)
    28:         16.695   0.3   820 (1701.cpp:30)
    29:          9.828   0.2     2 (ostream.h:118)
    30:          3.164   0.1    41 (1701.cpp:39)
    31:          1.793   0.0     1 (1701.cpp:18)
    32:          1.162   0.0    86 (1701.cpp:31)
    33:          0.660   0.0    41 (1701.cpp:40)
    34:          0.532   0.0    41 (1701.cpp:29)
    35:          0.377   0.0    41 (1701.cpp:37)
    36:          0.320   0.0    41 (1701.cpp:41)
    37:          0.289   0.0    41 (1701.cpp:38)
    38:          0.254   0.0    41 (1701.cpp:33)
    39:          0.234   0.0     1 (1701.cpp:14)
    40:          0.218   0.0    41 (1701.cpp:28)
    41:          0.135   0.0     1 (1701.cpp:12)
    42:          0.125   0.0     1 (1701.cpp:16)
    43:          0.082   0.0     2 (ostream.h:88)
    44:          0.060   0.0     1 (1701.cpp:17)
    45:          0.035   0.0     4 (ostream.h:85)
    46:          0.022   0.0     1 (1701.cpp:15)
    47:          0.018   0.0     1 (1701.cpp:10)
    48:          0.016   0.0     2 (ostream.h:119)
    49:          0.009   0.0     1 (1701.cpp:25)
    50:          0.008   0.0     1 (1701.cpp:13)
    51:          0.007   0.0     1 (1701.cpp:26)
    52:          0.006   0.0     1 (1701.cpp:34)
    53:          0.005   0.0     1 (1701.cpp:24)
Analysis:

In this particular case, listing 17.4 provides little new information. It is likely, however, that when you are examining larger, more complex functions, the capability to zero in on a particular line can be very useful.

Note that line-by-line profiling can take a very long time and produce a very large report. Most profilers will let you specify which range of lines and which functions you want to profile.

Using Coverage Profiling

For large programs, where profiling can take a very long time, a good first approximation can be achieved with a coverage analysis. This analysis tells you which lines have been executed without telling you how often they are called or how long they take. The idea is to tell you what parts of your program actually are being used, and which are laying dormant. Listing 17.5 shows the coverage report.

Listing 17.5 Coverage Report

    1:     Microsoft PLIST Version 1.30.200
    2:
    3:     Profile: Line coverage, sorted by line.
    4:     Date:    Sat Jan 21 17:10:38 1995
    5:
    6:
    7:     Program Statistics
    8:     ------------------
    9:         Total lines: 28
    10:        Line coverage: 100.0%
    11:
    12:    Module Statistics for d:\112\day17\1701.exe
    13:    -------------------------------------------
    14:        Lines in module: 28
    15:        Module line coverage: 100.0%
    16:
    17:    Source file: c:\msvc\include\ostream.h
    18:
    19:     >>> analysis here of header files!
    20:
    21:    Source file: d:\112\day17\1701.cpp
    22:
    23:      Line Covered Source
    24:    ---------------------
    25:        1:         //Listing 4.1 - Demonstrates Selection Sort
    26:        2:
    27:        3:         #include <iostream.h>
    28:        4:         #include <string.h>
    29:        5:
    30:        6:         void Swap( char *Array, int index1, int index2);
    31:        7:         void SelectionSort(char *Array);
    32:        8:
    33:        9:         int main()
    34:       10:    *    {
    35:       11:            char buffer[100];
    36:       12:    *       cout << "Enter up to 100 characters: ";
    37:       13:    *       cin.getline(buffer,100);
    38:       14:    *       cout << "Unsorted:\t" << buffer << endl;
    39:       15:    *       SelectionSort(buffer);
    40:       16:    *       cout << "Sorted:\t\t" << buffer << endl;
    41:       17:    *       return 0;
    42:       18:    *    }
    43:       19:
    44:       20:         // Read through each member of the array in turn
    45:       21:         // For every member, examine every remaining member and
    46:       22:         // swap with smallest
    47:       23:         void SelectionSort(char *Array)
    48:       24:    *    {
    49:       25:    *       int ArrayLen = strlen(Array);
    50:       26:    *       for (int i = 0; i<ArrayLen; i++)
    51:       27:            {
    52:       28:    *          int min = i;
    53:       29:    *          for (int j = i+1; j< ArrayLen; j++)
    54:       30:    *             if (Array[j] < Array[min])
    55:       31:    *                min = j;
    56:       32:    *          Swap(Array,min,i);
    57:       33:    *       }
    58:       34:    *    }
    59:       35:
    60:       36:         void Swap( char *Array, int left, int right)
    61:       37:    *    {
    62:       38:    *       char tmp = Array[left];
    63:       39:    *       Array[left]=Array[right];
    64:       40:    *       Array[right]=tmp;
    65:       41:    *    }
Analysis:

A complete coverage analysis of the included files normally would appear in lines 17 through 21. In this case, however, those files are copyright by Microsoft, and therefore are not reproduced here. This coverage report shows an asterisk (*) next to each line that actually is called. In listing 17.5, every line except the header file lines is called.

Typically, you can ask for coverage on a line-by-line or function-by-function basis, just as you can for time and count.

Putting the Profile to Use

On day 4, I illustrated two versions of the Bubble Sort. Listing 17.6 shows the earlier, nonoptimized version.

Listing 17.6 Bubble Sort

    1:         #include <iostream.h>
    2:         #include <string.h>
    3:         void BubbleSort(char *Array);
    4:         void swap(char& i, char& j);
    5:
    6:         // Ask for a buffer full of characters
    7:         // Use the bubble sort to sort 'em
    8:         int main()
    9:         {
    10:
    11:           char * buffer = "Teach Yourself More C++ in 21 Days by Jesse
                  Liberty";
    12:
    13:           cout << "Unsorted:\t" << buffer << endl;
    14:
    15:           BubbleSort(buffer);
    16:
    17:           cout << "Sorted:\t\t" << buffer << endl;
    18:
    19:           return 0;
    20:        }
    21:
    22:        enum BOOL {FALSE, TRUE};
    23:
    24:        // Examine each member in turn, bubbling into place
    25:        // any smaller member
    26:        void BubbleSort(char *Input)
    27:        {
    28:           int N = strlen(Input);
    29:           for (int i = 0; i< N; i++)
    30:           {
    31:              cout << "i: " << i << " buffer: " << Input << endl;
    32:              for (int j = N-1; j>0; j--)
    33:                 if (Input[j-1] > Input[j])
    34:                    swap(Input[j-1],Input[j]);
    35:           }
    36:        }
    37:
    38:        void swap(char& i, char& j)
    39:        {
    40:           char temp;
    41:           temp = j;
    42:           j = i;
    43:           i = temp;
    44:        }

Listing 17.7 shows the count analysis of this program, and listing 17.8 shows the time analysis. Note that the program has been modified to use a preset string for sorting. This eliminates the variable of user input time.

Listing 17.7 Count Analysis of Bubble Sort

    1:     Microsoft PLIST Version 1.30.200
    2:
    3:     Profile: Function counting, sorted by counts.
    4:     Date:    Sat Jan 21 17:22:34 1995
    5:
    6:
    7:     Program Statistics
    8:     ------------------
    9:         Total functions: 7
    10:        Total hits: 822
    11:        Function coverage: 100.0%
    12:
    13:    Module Statistics for d:\112\day17\1706.exe
    14:    -------------------------------------------
    15:        Functions in module: 7
    16:        Hits in module: 822
    17:        Module function coverage: 100.0%
    18:
    19:     Hit
    20:    count   %   Function
    21:    --------------------
    22:      555  67.5 swap (1701.cpp:39)
    23:      106  12.9 ostream::operator<< (ostream.h:85)
    24:       53   6.4 ostream::operator<< (ostream.h:88)
    25:       53   6.4 flush (ostream.h:118)
    26:       53   6.4 endl (ostream.h:119)
    27:        1   0.1 main (1701.cpp:9)
    28:        1   0.1 BubbleSort (1701.cpp:27)
Analysis:

Note that the vast majority of hits are on the Swap() function (67.5 percent). Examination of the time spent in the Swap() function is shown in listing 17.8.

Listing 17.8 Time Analysis of Bubble Sort

    1:     Microsoft PLIST Version 1.30.200
    2:
    3:     Profile: Function timing, sorted by time.
    4:     Date:    Sat Jan 21 17:23:20 1995
    5:
    6:
    7:     Program Statistics
    8:     ------------------
    9:         Total time: 648.279 milliseconds
    10:        Time outside of functions: 2.441 milliseconds
    11:        Call depth: 6
    12:        Total functions: 7
    13:        Total hits: 822
    14:        Function coverage: 100.0%
    15:
    16:    Module Statistics for d:\112\day17\1706.exe
    17:    -------------------------------------------
    18:        Time in module: 645.838 milliseconds
    19:        Percent of time in module: 100.0%
    20:        Functions in module: 7
    21:        Hits in module: 822
    22:        Module function coverage: 100.0%
    23:
    24:          Func       Func+Child        Hit
    25:          Time   %      Time      %   count Function
    26:    ------------------------------------------------
    27:       609.947  94.4    609.947  94.4    53 flush (ostream.h:118)
    28:        11.671   1.8     11.671   1.8   555 swap (1701.cpp:39)
    29:        10.744   1.7    617.940  95.7     1 BubbleSort (1701.cpp:27)
    30:         5.844   0.9    623.206  96.5   106 ostream::operator<< (ostream.h:85)
    31:         4.267   0.7      4.267   0.7    53 ostream::operator<< (ostream.h:88)
    32:         3.149   0.5    620.073  96.0    53 endl (ostream.h:119)
    33:         0.216   0.0    645.838 100.0     1 main (1701.cpp:9)
Analysis:

Once again, flush is very expensive, as one would expect. The second most expensive activity in this program is the repeated calls to Swap(). Listing 17.9 provides a time analysis of every line in the Bubble Sort.

Listing 17.9 Time Analysis of Each Line of Bubble Sort

    1:     Microsoft PLIST Version 1.30.200
    2:
    3:     Profile: Line timing, sorted by line.
    4:     Date:    Sat Jan 21 17:23:54 1995
    5:
    6:
    7:     Program Statistics
    8:     ------------------
    9:         Total time: 897.300 milliseconds
    10:        Time before any line: 1.037 milliseconds
    11:        Total lines: 25
    12:        Total hits: 8808
    13:        Line coverage: 100.0%
    14:
    15:    Module Statistics for d:\112\day17\1706.exe
    16:    -------------------------------------------
    17:        Time in module: 896.263 milliseconds
    18:        Percent of time in module: 100.0%
    19:        Lines in module: 25
    20:        Hits in module: 8808
    21:        Module line coverage: 100.0%
    22:
    23:    Source file: c:\msvc\include\ostream.h
    24:
    25:
    26:    Source file: d:\112\day17\1706.cpp
    27:
    28:                  Line        Hit
    29:      Line        Time   %   count Source
    30:    -------------------------------------
    31:        1:                             #include <iostream.h>
    32:        2:                             #include <string.h>
    33:        3:                             void BubbleSort(char *Array);
    34:        4:                             void swap(char& i, char& j);
    35:        5:
    36:        6:                             // Ask for a buffer full of characters
    37:        7:                             // Use the bubble sort to sort 'em
    38:        8:                             int main()
    39:        9:       0.022   0.0     1     {
    40:       10:
    41:       11:       0.080   0.0     1        char * buffer = "Teach Yourself ...
    42:       12:
    43:       13:       0.180   0.0     1        cout << "Unsorted:\t" << buffer <<
                                                 endl;
    44:       14:
    45:       15:       0.034   0.0     1        BubbleSort(buffer);
    46:       16:
    47:       17:       0.136   0.0     1        cout << "Sorted:\t\t" << buffer <<
                                                 endl;
    48:       18:
    49:       19:       0.034   0.0     1        return 0;
    50:       20:       1.471   0.2     1     }
    51:       21:
    52:       22:                             enum BOOL {FALSE, TRUE};
    53:       23:
    54:       24:                             // Examine each member in turn, bubbling
                                                 into place
    55:       25:                             // any smaller member
    56:       26:                             void BubbleSort(char *Input)
    57:       27:       0.018   0.0     1     {
    58:       28:       0.022   0.0     1        int N = strlen(Input);
    59:       29:       0.018   0.0     1        for (int i = 0; i< N; i++)
    60:       30:                                {
    61:       31:      15.928   1.8    51           cout << "i: " << i << " buffer: "
                                                    << Input << endl;
    62:       32:       3.014   0.3    51           for (int j = N-1; j>0; j--)
    63:       33:     107.410  12.0  2550              if (Input[j-1] > Input[j])
    64:       34:      12.074   1.3   555                 swap(Input[j-1],Input[j]);
    65:       35:      81.884   9.1  2550        }
    66:       36:       0.018   0.0     1     }
    67:       37:
    68:       38:                             void swap(char& i, char& j)
    69:       39:      30.049   3.4   555     {
    70:       40:                                char temp;
    71:       41:      19.049   2.1   555        temp = j;
    72:       42:      14.348   1.6   555        j = i;
    73:       43:      36.621   4.1   555        i = temp;
    74:       44:      14.992   1.7   555     }
Analysis:

In many ways, this is the most interesting report. It indicates that the conditional test in line 63, if (Input[j-1] > Input[j], is very expensive, taking 12 percent of the time and being called 2,550 times. The 555 calls to Swap() pales next to this call.

False Optimizations

Programmers often are called on to tighten up their code, making their programs smaller, faster, or both. Although it is tempting to dive in and make a few quick optimizations, it is imperative that you first profile your code, or you run the risk of optimizing the wrong thing. There is little benefit in optimizing a segement of code that is called only once, while ignoring a second section of code that is called thousands of times.

On day 4, I optimized the bubble sort as shown in listing 17.10.

Listing 17.10 Optimized Bubble Sort

    1:          #include <iostream.h>
    2:         #include <string.h>
    3:         void BubbleSort(char *);
    4:         void BetterBubble(char *);
    5:         void swap(char& i, char& j);
    6:
    7:         // Ask for a buffer full of characters
    8:         // Use the bubble sort to sort 'em
    9:         int main()
    10:        {
    11:
    12:           char * buffer = "Teach Yourself More C++ in 21 Days by Jesse
                  Liberty";
    13:           char * buff2 = "Teach Yourself More C++ in 21 Days by Jesse Liberty";
    14:
    15:           cout << "Unsorted:\t" << buffer << endl;
    16:
    17:           BubbleSort(buffer);
    18:
    19:           cout << "Sorted:\t\t" << buffer << endl;
    20:
    21:           BetterBubble(buff2);
    22:
    23:           cout << "Sorted:\t\t" << buff2 << endl;
    24:
    25:           return 0;
    26:        }
    27:
    28:        enum BOOL {FALSE, TRUE};
    29:        // Examine each member in turn, bubbling into place
    30:        // any smaller member
    31:        void BubbleSort(char *Input)
    32:        {
    33:           int N = strlen(Input);
    34:           int compare = 0;
    35:           int didSwap = 0;
    36:           for (int i = 0; i< N; i++)
    37:           {
    38:              for (int j = N-1; j>0; j--)
    39:              {
    40:                compare++;
    41:                 if (Input[j-1] > Input[j])
    42:                 {
    43:                    didSwap++;
    44:                    swap(Input[j-1],Input[j]);
    45:                 }
    46:              }
    47:           }
    48:           cout << compare << " compares; " << didSwap << " swaps" << endl;
    49:        }
    50:
    51:       void BetterBubble(char *Input)
    52:       {
    53:          int n = strlen(Input);
    54:           int compare = 0;
    55:           int didSwap = 0;
    56:          BOOL swapped = TRUE;
    57:
    58:          for (int i=0; swapped; i++)
    59:          {
    60:             swapped = FALSE;
    61:             for (int j=n-1;j>i; j--)
    62:             {
    63:                compare++;
    64:                if (Input[j-1] > Input[j])
    65:                {
    66:                   swap(Input[j-1], Input[j]);
    67:                   swapped = TRUE;
    68:                   didSwap++;
    69:                }
    70:             }
    71:          }
    72:           cout << compare << " compares; " << didSwap << " swaps" << endl;
    73:       }
    74:
    75:        void swap(char& i, char& j)
    76:        {
    77:           char temp;
    78:           temp = j;
    79:           j = i;
    80:           i = temp;
    81:       }
Analysis:

This program took the basic bubble sort, and improved on it, producing BetterBubble(). The basis of the improvement was a pair of observations. The first is that if you are sorting upward, after you move a value into position, there is no reason to compare other values with that value ever again. That is, if you bubble the lowest number to the top of the array, there is no reason to compare other numbers to see whether they are lower than that number; after all, that is what a bubble sort does, it moves the lowest number up into position, and then it moves the second lowest, and so on.

The second observation is that if you make no swaps in any comparison all the way through one iteration, you are done; no future iterations will create swaps either. The driver program runs both approaches to illustrate the performance improvement.

Profile analysis will show how much of an improvement actually was registered. Listing 17.11 shows the count profile.

Listing 17.11 Count Analysis of the Improved Bubble Sort

    1:     Microsoft PLIST Version 1.30.200
    2:
    3:     Profile: Function counting, sorted by counts.
    4:     Date:    Sat Jan 21 17:40:13 1995
    5:
    6:
    7:     Program Statistics
    8:     ------------------
    9:         Total functions: 8
    10:        Total hits: 1138
    11:        Function coverage: 100.0%
    12:
    13:    Module Statistics for d:\112\day17\1701.exe
    14:    -------------------------------------------
    15:        Functions in module: 8
    16:        Hits in module: 1138
    17:        Module function coverage: 100.0%
    18:
    19:     Hit
    20:    count   %   Function
    21:    --------------------
    22:     1110  97.5 swap (1701a.cpp:76)
    23:       10   0.9 ostream::operator<< (ostream.h:85)
    24:        5   0.4 ostream::operator<< (ostream.h:88)
    25:        5   0.4 flush (ostream.h:118)
    26:        5   0.4 endl (ostream.h:119)
    27:        1   0.1 main (1701a.cpp:10)
    28:        1   0.1 BubbleSort (1701a.cpp:32)
    29:        1   0.1 BetterBubble (1701a.cpp:52)
Analysis:

This report is not promising. There is no distinction made between the various calls to Swap(), although you do note that swap is called 1,110 times. When only the original Bubble() program was run, it was called 555 times, so there seems to be no savings there. Listing 17.12 provides the timing analysis.

Listing 17.12 Timing Analysis

    1:     Microsoft PLIST Version 1.30.200
    2:
    3:     Profile: Function timing, sorted by time.
    4:     Date:    Sat Jan 21 17:40:32 1995
    5:
    6:
    7:     Program Statistics
    8:     ------------------
    9:         Total time: 129.902 milliseconds
    10:        Time outside of functions: 2.398 milliseconds
    11:        Call depth: 6
    12:        Total functions: 8
    13:        Total hits: 1138
    14:        Function coverage: 100.0%
    15:
    16:    Module Statistics for d:\112\day17\1701.exe
    17:    -------------------------------------------
    18:        Time in module: 127.505 milliseconds
    19:        Percent of time in module: 100.0%
    20:        Functions in module: 8
    21:        Hits in module: 1138
    22:        Module function coverage: 100.0%
    23:
    24:          Func       Func+Child        Hit
    25:          Time   %      Time      %   count Function
    26:    ------------------------------------------------
    27:        59.561  46.7     59.561  46.7     5 flush (ostream.h:118)
    28:        33.126  26.0     33.126  26.0  1110 swap (1701a.cpp:76)
    29:        30.072  23.6     51.260  40.2     1 BubbleSort (1701a.cpp:32)
    30:         3.818   3.0     19.460  15.3     1 BetterBubble (1701a.cpp:52)
    31:         0.585   0.5     60.302  47.3    10 ostream::operator<< (ostream.h:85)
    32:         0.187   0.1    127.505 100.0     1 main (1701a.cpp:10)
    33:         0.115   0.1      0.115   0.1     5 ostream::operator<< (ostream.h:88)
    34:         0.041   0.0     59.996  47.1     5 endl (ostream.h:119)
Analysis:

Here, you do see substantial improvement. The call to BubbleSort() took 30 milliseconds, and the call to BetterBubble() took only 3 millisecondsa substantial savings.

Applying the Lessons

Although these sample programs show only trivial improvements in performance, the lesson is clear. Profiling can help you target those areas of your program where changes can make a big difference. The question now arises, "What kind of changes?"

Tightening code is rarely an answer. Although making a given function inline or using a particular construct might save you a minimal amount of time, the really big savings are in changing your algorithm.

If you have a function that is called hundreds or thousands of times, it only makes sense to make sure that it runs very, very fast.

If you have a slow function that is called only once or twice, in most large programs, fixing that function will not help very much. This is not so in the tiny programs examined so far, but in most real-world programs, the savings from changing an endl() call to a new line will not amount to a significant percentage of the actual program time.

Summary

Today you saw how to use a profiler to examine your code. You saw how to count the number of calls to a function and how often a line of code is used. Finally, you learned how to examine the amount of time each function or line of code takes to execute.

Modern compilers can offer tremendous optimization without any effort on your part, but targeted improvements to critical areas of the code can make a significant difference in the overall performance of the program. Profilers differ in the reports they generate and the syntax used to invoke them, but all provide a way to analyze how your program is running and where to put your efforts.

Q&A

Q: Why bother profiling your code when the optimizing compiler already has done all it can?

A: The point of profiling is to find those areas of the code where you can change an algorithm or otherwise manipulate your code to achieve significant savings.

Q: What is the purpose of coverage analysis?

A: Coverage analysis helps you focus on those areas of your code where the action is. There is little point in fine-tuning a function that is rarely called.

Q: How big must a program be to warrant profiling?

A: Any program larger than a single function can benefit from profiling. Larger programs, of course, get the most benefit because they have more corners in which inefficiencies can hide.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, and exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix A, and make sure that you understand the answers before continuing to the next chapter.

Quiz

  1. What is the difference between optimization and profiling?

  2. If function A() is called 1,000 percent more often than function B(), which one is the better candidate for performance improvement?

  3. In listing 17.3, the total time was reported as 6907.295 milliseconds. How many seconds is this?

[Click here for Answers]

Exercises

Note that the exercises for the profiler are dependent on the specific software provided by your compiler vendor, and the results will vary based on the speed of your computer and its disks. Answers are not provided for these exercises, but you are strongly encouraged to examine your results and to use your profiler sufficiently to become comfortable with its operation and results.

  1. Read your documentation and determine how your profiler works. Run a profile of the QuickSort() example in listing 4.10, shown in day 4.

  2. Examine the results, speculate on what changes might be made, and in which areas to focus.

  3. Examine the improvement to QuickSort() in listing 4.11 of day 4. Run a profile and compare the results to those obtained from exercise 2.

  4. Run a profile of the entire ROBIN program shown at the end of day 16. Are any of the timings surprising?

Go to: Table of Contents | Next Page