SubstringFinder

SubstringFinder uses the parallel_for template in a substring matching program. For each position in a string, the program finds the length and location of the largest matching substring elsewhere in the string. For instance, take the string OregonOrmereg. Starting a scan at the first character (position 0), the largest substring with a match elsewhere in the string is Or at position 6. Starting at position 1, the largest match is reg at position 10. The position of such matches, and the length of the match, is recorded for every position in the string being searched.

Example 11-9 shows a serial version (lines 12–31) and a parallel version (lines 33–55). Note how lines 15–31 and 39–55 are nearly identical. The only difference is that the serial version does all the work directly on the input array, whereas the parallel version works on a range passed to it in the blocked_range r. For the sake of simplicity, both versions declare an array of static size to hold the output.

Example 11-9. Serial and parallel SubstringFinder

 1 #include <iostream>
 2 #include <string>
 3 #include <algorithm>
 4 #include "tbb/task_scheduler_init.h"
 5 #include "tbb/parallel_for.h"
 6 #include "tbb/blocked_range.h"
 7 #include "tbb/tick_count.h"
 8 using namespace tbb;
 9 using namespace std;
10 static const size_t N = 22;
11
12 void SerialSubStringFinder ( const string &str,
13                              size_t *max_array,
14                              size_t *pos_array) {
15  for ( size_t i = 0; i < str.size(); ++i ) {
16    size_t max_size = 0, max_pos = 0;
17    for (size_t j = 0; j < str.size(); ++j)
18     if (j != i) {
19      size_t limit = str.size()-( i > j ? i : j );
20      for (size_t k = 0; k < limit; ++k) {
21       if (str[i + k] != str[j + k]) break;
22       if (k > max_size) {
23        max_size = k;
24        max_pos = j;
25       }
26      }
27     }
28    max_array[i] = max_size;
29    pos_array[i] = max_pos;
30   }
31 }
32
33 classSubStringFinder {
34  const string str;
35  size_t *max_array;
36  size_t *pos_array;
37  public:
38  void operator( ) ( const blocked_range<size_t>& r ) const {
39   for ( size_t i = r.begin(); i != r.end( ); ++i ) {
40    size_t max_size = 0, max_pos = 0;
41    for (size_t j = 0; j < str.size( ); ++j)
42     if (j != i) {
43      size_t limit = str.size( )-( i > j ? i : j );
44      for (size_t k = 0; k < limit; ++k) {
45       if (str[i + k] != str[j + k]) break;
46       if (k > max_size) {
47        max_size = k;
48        max_pos = j;
49       }
50      }
51     }
52    max_array[i] = max_size;
53    pos_array[i] = max_pos;
54   }
55  }
56
57  SubStringFinder(string &s, size_t *m, size_t *p) :
58   str(s), max_array(m), pos_array(p) { }
59 };

The main program (Example 11-11) sets up a string (with a Fibonacci string of “a” and “b” characters that is 17,711 characters long: "babbababbabbababbababbabbababbabb…”).

The result of the SubstringFinder is an array of positions and an array of lengths of the matches found at the corresponding position. Two pairs of arrays are prepared: one filled in by the serial version and the other by the parallel version. The results are checked to be sure they are identical.

The parallel_for is used with a blocked_range with a grainsize of 100. The initial range is set to [0, 17711).

When this example was run on a dual-core machine, the task scheduler made 254 calls to operator(). At first, this may seem excessive. You might have assumed it would split the work into [0, 8855) and [8856, 17711) and let the two processor cores do equal work. But there are two reasons not to do this. The first is that operator() is assumed not to take the same amount of processing on every invocation. Because short substrings take less time to process than long ones, and other variations exist, the time will vary.

The second reason is that each thread is subject to differing demands on the processor core on which it is run. It is possible that one processor’s core may be interrupted more often than the other. It is also possible in the future that processor cores will not all be the same.

For these reasons, in this simple example, the run split up the work into 254 tasks. Some of the ranges are shown in Example 11-10. You might also note that the grain size of 100 means that ranges can become as small as 50. (In this example, 69 was a favorite because 17711/2/2/2/2/2/2/2/2 = 69.1836). The split stopped at 69 because ranges larger than the grain size, which was set to 100, were considered splittable. You can also see how the ranges are kept far apart until the end in order to avoid cache contention. Interestingly enough, starting one thread in the middle and moving away from the other thread might perform slightly better by avoiding the slight potential for conflict at the end. There are always things to think about for future versions of the task scheduler.

Tip

Aha! grainsize is the largest size not to split. Ranges half the size of grainsize are possible, or smaller depending on the logic of the splitter. All that is certain is that the splitter is supposed to reduce the range passed to it, so both subranges created will be smaller.

Example 11-10. Ranges passed to operator() on dual-core machine

  1 [0,69)
  2 [8924,8993)
  3 [69,138)
  4 [8993,9062)
  5 [138,207)
  6 [9062,9131)
  7 [207,276)
  8 [9131,9200)
  9 [276,345)
 10 [9200,9269)
 11 [345,414)
 12 [9269,9338)
 13 [9338,9408)
 14 [414,483)
 15 [9408,9477)
 16 [483,553)
 17 [553,622)
 18 [9477,9546)
 19 [622,691)
 20 [9546,9615)
 21 [691,760)
 22 [9615,9685)
 23 [760,829)
 24 [9685,9754)
 25 [829,898)
 26 [9754,9823)
 27 [898,967)
 28 [9823,9892)
 29 [967,1036)
 30 [9892,9962)
 31 [1036,1106)
 32 [9962,10031)
 33 [1106,1175)
 34 [10031,10100)
 35 [10100,10169)
 36 [1175,1244)
 37 [10169,10238)
 38 [1244,1313)
 39 [10238,10307)
 40 [1313,1382)
 41 [10307,10376)
...
244 [8231,8301)
245 [17572,17641)
246 [8301,8370)
247 [17641,17711)
248 [8578,8647)
249 [8370,8439)
250 [8647,8716)
251 [8439,8508)
252 [8716,8785)
253 [8508,8578)
254 [8785,8855)

Example 11-11 runs the two substring finders, and the results are shown in Example 11-12, SubstringFinder. The speedup of 1.9 is near the ideal 2X (there were two processors used). The timer functions from Threading Building Blocks (described in Chapter 7) are used in this example to determine the times.

Example 11-11. Driver (main) for SubstringFinder

 1 int main(size_t argc, char *argv[]) {
 2   task_scheduler_init init;
 3
 4   string str[N] = { string("a"), string("b") };
 5   for (size_t i = 2; i < N; ++i) str[i] = str[i-1]+str[i-2];
 6   string &to_scan = str[N-1];
 7
 8   size_t *max = new size_t[to_scan.size( )];
 9   size_t *max2 = new size_t[to_scan.size( )];
10   size_t *pos = new size_t[to_scan.size( )];
11   size_t *pos2 = new size_t[to_scan.size( )];
12   cout << " Done building string." << endl;
13
14   tick_count serial_t0 = tick_count::now( );
15   SerialSubStringFinder(to_scan, max2, pos2);
16   tick_count serial_t1 = tick_count::now( );
17   cout << " Done with serial version." << endl;
18
19   tick_count parallel_t0 = tick_count::now( );
20   parallel_for(blocked_range<size_t>(0, to_scan.size( ), 100),
21                SubStringFinder( to_scan, max, pos ) );
22   tick_count parallel_t1 = tick_count::now( );
23   cout << " Done with parallel version." << endl;
24
25   for (size_t i = 0; i < to_scan.size( ); ++i) {
26     if (max[i] != max2[i] || pos[i] != pos2[i]) {
27       cout << "ERROR: Serial and Parallel Results are Different!" << endl;
28     }
29   }
30   cout << " Done validating results." << endl;
31
32   cout << "Serial version ran in " <<
33          (serial_t1 - serial_t0).seconds( ) << " seconds" << endl
34        << "Parallel version ran in " <<
35          (parallel_t1 - parallel_t0).seconds( ) << " seconds" << endl
36        << "Resulting in a speedup of " <<
37          (serial_t1 - serial_t0).seconds( ) /
38          (parallel_t1 - parallel_t0).seconds( ) << endl;
39   return 0;
40 }

Example 11-12. SubstringFinder run on a dual-core machine

Done building string.
 Done with serial version.
 Done with parallel version.
 Done validating results.
Serial version ran in 19.476 seconds
Parallel version ran in 10.1276 seconds
Resulting in a speedup of 1.92305
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.240.21