Recoding in a Low-Level Language

One long-standing piece of conventional wisdom that shouldn't be left unmentioned is the advice that when you run into a performance bottleneck, you should recode in a low-level language. If you're coding in C++, the low-level language might be assembler. If you're coding in Python, the low-level language might be C. Recoding in a lowlevel language tends to improve both speed and code size. Here is a typical approach to optimizing with a low-level language:

  1. Write 100 percent of an application in a high-level language.

  2. Fully test the application, and verify that it's correct.

  3. If performance improvements are needed after that, profile the application to identify hot spots. Since about 5 percent of a program usually accounts for about 50 percent of the running time, you can usually identify small pieces of the program as hot spots.

    Cross-Reference

    For details on the phenomenon of a small percentage of a program accounting for most of its run time, see "The Pareto Principle" in Introduction to Code Tuning.

  4. Recode a few small pieces in a low-level language to improve overall performance.

Whether you follow this well-beaten path depends on how comfortable you are with low-level languages, how well-suited the problem is to low-level languages, and on your level of desperation. I got my first exposure to this technique on the Data Encryption Standard program I mentioned in the previous chapter. I had tried every optimization I'd ever heard of, and the program was still twice as slow as the speed goal. Recoding part of the program in assembler was the only remaining option. As an assembler novice, about all I could do was make a straight translation from a high-level language to assembler, but I got a 50 percent improvement even at that rudimentary level.

Suppose you have a routine that converts binary data to uppercase ASCII characters. The next example shows the Delphi code to do it:

Example 26-41. Delphi Example of Code That's Better Suited to Assembler

procedure HexExpand(
   var source: ByteArray;
   var target: WordArray;
   byteCount: word
);
var
   index: integer;
   targetIndex: integer;
begin
   targetIndex := 1;
   for index := 1 to byteCount do begin
      target[ targetIndex ] := ( (source[ index ] and $F0) shr 4 ) + $41;
      target[ targetIndex+1 ] := (source[ index ] and $0f) + $41;
      targetIndex := targetIndex + 2;
   end;
end;

Although it's hard to see where the fat is in this code, it contains a lot of bit manipulation, which isn't exactly Delphi's forte. Bit manipulation is assembler's forte, however, so this code is a good candidate for recoding. Here's the assembler code:

Example 26-42. Example of a Routine Recoded in Assembler

procedure HexExpand(
   var source;
   var target;
   byteCount : Integer
);
   label
   EXPAND;

   asm
         MOV   ECX,byteCount      // load number of bytes to expand
         MOV   ESI,source         // source offset
         MOV   EDI,target         // target offset
         XOR   EAX,EAX            // zero out array offset

   EXPAND:
         MOV   EBX,EAX            // array offset
         MOV   DL,[ESI+EBX]       // get source byte
         MOV   DH,DL              // copy source byte

         AND   DH,$F              // get msbs
         ADD   DH,$41             // add 65 to make upper case

         SHR   DL,4               // move lsbs into position
         AND   DL,$F              // get lsbs
         ADD   DL,$41             // add 65 to make upper case
         SHL   BX,1               // double offset for target array offset
         MOV   [EDI+EBX],DX       // put target word

         INC   EAX                // increment array offset
         LOOP  EXPAND             // repeat until finished
   end;

Rewriting in assembler in this case was profitable, resulting in a time savings of 41 percent. It's logical to assume that code in a language that's more suited to bit manipulation— C++, for instance—would have less to gain than Delphi code would. Here are the results:

Language

High-Level Time

Assembler Time

Time Savings

C++

4.25

3.02

29%

Delphi

5.18

3.04

41%

The "before" picture in these measurements reflects the two languages' strengths at bit manipulation. The "after" picture looks virtually identical, and it appears that the assembler code has minimized the initial performance differences between Delphi and C++.

The assembler routine shows that rewriting in assembler doesn't have to produce a huge, ugly routine. Such routines are often quite modest, as this one is. Sometimes assembler code is almost as compact as its high-level-language equivalent.

A relatively easy and effective strategy for recoding in assembler is to start with a compiler that generates assembler listings as a byproduct of compilation. Extract the assembler code for the routine you need to tune, and save it in a separate source file. Using the compiler's assembler code as a base, hand-optimize the code, checking for correctness and measuring improvements at each step. Some compilers intersperse the high-level-language statements as comments in the assembler code. If yours does, you might keep them in the assembler code as documentation.

cc2e.com/2672

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.139.82.4