The Soundex Class

Searching is a common need in many applications. An effective search should find matches even if the user misspells words. Folks misspell my name in endless ways: Langer, Lang, Langur, Lange, and Lutefisk, to name a few. I’d prefer they find me regardless.

In this chapter, we will test-drive a Soundex class that can improve the search capability in an application. The long-standing Soundex algorithm encodes words into a letter plus three digits, mapping similar-sounding words to the same encoding. Here are the rules for Soundex, per Wikipedia:[4]

  1. Retain the first letter. Drop all other occurrences of a, e, i, o, u, y, h, w.

  2. Replace consonants with digits (after the first letter):

    • b, f, p, v: 1

    • c, g, j, k, q, s, x, z: 2

    • d, t: 3

    • l: 4

    • m, n: 5

    • r: 6

  3. If two adjacent letters encode to the same number, encode them instead as a single number. Also, do so if two letters with the same number are separated by h or w (but code them twice if separated by a vowel). This rule also applies to the first letter.

  4. Stop when you have a letter and three digits. Zero-pad if needed.
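
To make the four rules concrete, here is a minimal sketch of them in Python. It is only an illustration of the rules as listed above, not the Soundex class this chapter test-drives. The names CODES and soundex are hypothetical, and the handling of empty input and non-letter characters is an assumption, since the rules do not cover those cases.

```python
# Rule 2: consonant-to-digit table, built from the groups listed above.
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Encode word as a letter plus three digits, following rules 1-4."""
    # Assumption: ignore anything that is not a letter (spaces, hyphens).
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""  # Assumption: empty input yields an empty encoding.

    # Rule 1: retain the first letter.
    encoding = letters[0].upper()
    # Rule 3 also applies to the first letter, so remember its digit.
    last_digit = CODES.get(letters[0], "")

    for ch in letters[1:]:
        if len(encoding) == 4:
            break  # Rule 4: stop at a letter and three digits.
        digit = CODES.get(ch, "")
        if digit:
            # Rule 3: skip a digit that repeats the previous one.
            if digit != last_digit:
                encoding += digit
            last_digit = digit
        elif ch not in "hw":
            # A vowel (or y) separates duplicates, so the same digit may
            # be coded again. An h or w leaves last_digit untouched.
            last_digit = ""

    # Rule 4: zero-pad if needed.
    return encoding.ljust(4, "0")


print(soundex("Robert"))    # R163
print(soundex("Ashcraft"))  # A261
print(soundex("Langer"))    # L526
```

With this sketch, Langer and Langur both encode to L526, so a search on either spelling would find the other. Lang and Lange both encode to L520, which shows the limits of the approach: not every misspelling lands on the same code.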
