One long-standing piece of conventional wisdom that shouldn't be left unmentioned is the advice that when you run into a performance bottleneck, you should recode in a low-level language. If you're coding in C++, the low-level language might be assembler. If you're coding in Python, the low-level language might be C. Recoding in a lowlevel language tends to improve both speed and code size. Here is a typical approach to optimizing with a low-level language:
Write 100 percent of an application in a high-level language.
Fully test the application, and verify that it's correct.
If performance improvements are needed after that, profile the application to identify hot spots. Since about 5 percent of a program usually accounts for about 50 percent of the running time, you can usually identify small pieces of the program as hot spots.
For details on the phenomenon of a small percentage of a program accounting for most of its run time, see "The Pareto Principle" in Introduction to Code Tuning.
Recode a few small pieces in a low-level language to improve overall performance.
Whether you follow this well-beaten path depends on how comfortable you are with low-level languages, how well-suited the problem is to low-level languages, and on your level of desperation. I got my first exposure to this technique on the Data Encryption Standard program I mentioned in the previous chapter. I had tried every optimization I'd ever heard of, and the program was still twice as slow as the speed goal. Recoding part of the program in assembler was the only remaining option. As an assembler novice, about all I could do was make a straight translation from a high-level language to assembler, but I got a 50 percent improvement even at that rudimentary level.
Suppose you have a routine that converts binary data to uppercase ASCII characters. The next example shows the Delphi code to do it:
Example 26-41. Delphi Example of Code That's Better Suited to Assembler
procedure HexExpand( var source: ByteArray; var target: WordArray; byteCount: word ); var index: integer; targetIndex: integer; begin targetIndex := 1; for index := 1 to byteCount do begin target[ targetIndex ] := ( (source[ index ] and $F0) shr 4 ) + $41; target[ targetIndex+1 ] := (source[ index ] and $0f) + $41; targetIndex := targetIndex + 2; end; end;
Although it's hard to see where the fat is in this code, it contains a lot of bit manipulation, which isn't exactly Delphi's forte. Bit manipulation is assembler's forte, however, so this code is a good candidate for recoding. Here's the assembler code:
Example 26-42. Example of a Routine Recoded in Assembler
procedure HexExpand( var source; var target; byteCount : Integer ); label EXPAND; asm MOV ECX,byteCount // load number of bytes to expand MOV ESI,source // source offset MOV EDI,target // target offset XOR EAX,EAX // zero out array offset EXPAND: MOV EBX,EAX // array offset MOV DL,[ESI+EBX] // get source byte MOV DH,DL // copy source byte AND DH,$F // get msbs ADD DH,$41 // add 65 to make upper case SHR DL,4 // move lsbs into position AND DL,$F // get lsbs ADD DL,$41 // add 65 to make upper case SHL BX,1 // double offset for target array offset MOV [EDI+EBX],DX // put target word INC EAX // increment array offset LOOP EXPAND // repeat until finished end;
Rewriting in assembler in this case was profitable, resulting in a time savings of 41 percent. It's logical to assume that code in a language that's more suited to bit manipulation— C++, for instance—would have less to gain than Delphi code would. Here are the results:
Language |
High-Level Time |
Assembler Time |
Time Savings |
---|---|---|---|
C++ |
4.25 |
3.02 |
29% |
Delphi |
5.18 |
3.04 |
41% |
The "before" picture in these measurements reflects the two languages' strengths at bit manipulation. The "after" picture looks virtually identical, and it appears that the assembler code has minimized the initial performance differences between Delphi and C++.
The assembler routine shows that rewriting in assembler doesn't have to produce a huge, ugly routine. Such routines are often quite modest, as this one is. Sometimes assembler code is almost as compact as its high-level-language equivalent.
A relatively easy and effective strategy for recoding in assembler is to start with a compiler that generates assembler listings as a byproduct of compilation. Extract the assembler code for the routine you need to tune, and save it in a separate source file. Using the compiler's assembler code as a base, hand-optimize the code, checking for correctness and measuring improvements at each step. Some compilers intersperse the high-level-language statements as comments in the assembler code. If yours does, you might keep them in the assembler code as documentation.
3.139.82.4