Rationalization of the Naive Search Algorithm

The naive search algorithm takes O((n - m + 1)m) time, which is a tight bound on the worst case. We can imagine a worst case of the naive search algorithm if we have a text string with the character a repeating for n times, that is, an (such as a5 = "aaaaa"), and the pattern am (for m <= n). In this case, we have to execute the inner loop m times to validate the shift.

The naive search algorithm can be improved if we know that all characters in pattern P are different. In this case, whenever we fail validating a shift because P[j] doesn't match T[i + j], we don't need to backtrack. Instead, we can start validating the next shift on (i + j), therefore reducing the running time of the algorithm to O(n).

For example, if P = "abcd" and T = "abcaabcd", when i = 0 and j = 3, we find a mismatch ('a' != 'd'). Instead of repeating the comparisons for i = 1, we can start on i = 3, because we're sure there's no other a between i = 0 and i = 3 (remember that all characters of P are different). These kinds of observations on the pattern P are the basis of the Boyer-Moore algorithm.

In this first section, we introduced the string matching problem and solved it using a naive algorithm. In the following section, we'll introduce a much more efficient algorithm to solve this problem—the Boyer-Moore algorithm.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.5.15