The aim here is to develop a code in Java for implementing the Rabin-Karp algorithm for matching a string from a set of alphabetical characters that have decimal digits.
Perform the following steps:
- Convert the text and pattern strings into digits.
- Use if and for loops to calculate the number of matching characters.
- Put everything together to implement the Rabin-Karp algorithm. The following Snippet 5.9 shows the pre-compute part of the algorithm:
long q = BigInteger.probablePrime(31, new Random()).longValue();
// Precompute d^(m-1) % q for use when removing leading digit
long dm = 1;
for (int i = 1; i <= m - 1; i++)
dm = (d * dm) % q;
// Precompute p and t0
long ph = 0;
long th = 0;
for (int i = 0; i < m; i++) {
ph = (d * ph + P.charAt(i)) % q;
th = (d * th + T.charAt(i)) % q;
}
Snippet 5.9: Implementation of the Rabin-Karp algorithm. Source class name: RabinKarp
In the previous implementation, we chose q as a large prime number (using the BigInteger API). We did that so that we have a good hash function and avoided the most of false positives from the p = ti comparison. This is a similar technique to the one we saw in the remainder method for hash tables in Chapter 3, Hash Tables and Binary Search Trees.
Despite being outside the scope of this book, the Rabin-Karp algorithm generalizes well to having a set of patterns to be found in the same text. For that purpose, it is frequently used in plagiarism detection.