Naive Search Algorithm

The string matching problem has two inputs, as follows:

  • An array T[1, 2, ...n] of length n
  • An array P[1, 2, ...m] of length m (<= n)

The elements of T and P are characters from the same finite alphabet (usually called ∑).

For instance, we may be searching in binary strings, in which case our alphabet is {0, 1}, or we may be searching in strings of lowercase letters, in which case our alphabet is {a, b… z}.

The following diagram represents this terminology:

Figure 5.1: Representation of text array T, pattern array P, and finite alphabet ∑

The character arrays P and T are usually called "strings of characters". We're interested in finding occurrences of pattern P in text T.

We say that pattern P occurs in text T if we can align the pattern P with text T so that all characters in P match the ones in T. When aligning, we need to shift pattern P zero or more times to the right.

Therefore, in the string matching problem, we're interested in valid shifts with which pattern P occurs in text T. We say that the pattern P occurs with a shift s in text T if the pattern P occurs beginning at position s + 1 in text T. In other words, we need to shift P from the start of text T s times to the right, in order to find a match. In its essence, the string matching problem aims to find all valid shifts with which pattern P occurs in a given text T.

Two common examples, besides text-editing programs, are finding patterns in DNA sequences and finding web pages that are relevant to queries in internet search engines.

Now that we've formalized the string matching problem, let's look at the naive algorithm to solve it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.18.198