2.14. Calculating String Difference

Problem

Your application needs to compare two strings and print out the difference.

Solution

Use StringUtils.difference( ), StringUtils.indexOfDifference( ), and StringUtils.getLevenshteinDistance( ). StringUtils.difference( ) returns the portion of the second string that differs from the first, StringUtils.indexOfDifference( ) returns the index at which two strings begin to differ, and StringUtils.getLevenshteinDistance( ) returns the “edit distance” between two strings. The following example demonstrates all three of these methods:

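// Commons Lang 2.x package; Lang 3.x uses org.apache.commons.lang3
import org.apache.commons.lang.StringUtils;

int dist = StringUtils.getLevenshteinDistance( "Word", "World" );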
String diff = StringUtils.difference( "Word", "World" );
int index = StringUtils.indexOfDifference( "Word", "World" );

System.out.println( "Edit Distance: " + dist );
System.out.println( "Difference: " + diff );
System.out.println( "Diff Index: " + index );

This code compares the strings “Word” and “World,” producing the following output:

Edit Distance: 1
Difference: ld
Diff Index: 3

Discussion

StringUtils.difference( ) returns the remainder of the second string, beginning at the first position where it differs from the first string. StringUtils.indexOfDifference( ) returns the index of that position. The difference between “ABC” and “ABE” is “E,” and the index of the difference is 2. If the two strings are equal, difference( ) returns an empty string and indexOfDifference( ) returns -1. Here’s a more complex example:

String a = "Strategy";
String b = "Strategic";

String difference = StringUtils.difference( a, b );
int differenceIndex = StringUtils.indexOfDifference( a, b );

System.out.println( "difference(Strategy, Strategic) = " +
                    difference );
System.out.println( "index(Strategy, Strategic) = " +
                    differenceIndex );

a = "The Secretary of the UN is Kofi Annan.";
b = "The Secretary of State is Colin Powell.";

difference = StringUtils.difference( a, b );
differenceIndex = StringUtils.indexOfDifference( a, b );

System.out.println( "difference(..., ...) = " +
                    difference );
System.out.println( "index(..., ...) = " +
                    differenceIndex );

This produces the following output, showing the differences between two strings:

difference(Strategy, Strategic) = ic
index(Strategy, Strategic) = 7
difference(..., ...) = State is Colin Powell.
index(..., ...) = 17
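
To make the semantics of these two methods concrete, here is a minimal plain-Java sketch that reproduces the behavior described above without Commons Lang. It is a hand-rolled illustration, not the library's source; the class name DifferenceSketch is hypothetical, and the sketch assumes both arguments are non-null.

```java
// Hand-rolled illustration of the semantics described above.
// Not the Commons Lang implementation; assumes non-null arguments.
class DifferenceSketch {

    // Index of the first position where a and b differ, or -1 if they are equal.
    static int indexOfDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        // No mismatch found and both strings are exhausted: they are identical.
        if (i == a.length() && i == b.length()) {
            return -1;
        }
        return i;
    }

    // Remainder of b from the point where it diverges from a; empty if equal.
    static String difference(String a, String b) {
        int at = indexOfDifference(a, b);
        return at == -1 ? "" : b.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(indexOfDifference("Strategy", "Strategic")); // 7
        System.out.println(difference("Strategy", "Strategic"));        // ic
        System.out.println(indexOfDifference("ABC", "ABC"));            // -1
    }
}
```

Note that the result is not symmetric: the returned text always comes from the second argument, so swapping the arguments in the first call above yields “y” rather than “ic.”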

The Levenshtein distance is the minimum number of single-character insertions, deletions, and replacements needed to turn one string into another, which is why it is also known as the edit distance. The distance between “Boat” and “Coat” is 1: a single letter replacement. The distance between “Remember” and “Alamo” is 7: with the shared “m” left in place, it takes four letter replacements and three deletions. The following example demonstrates the getLevenshteinDistance( ) method:

int distance1 =
    StringUtils.getLevenshteinDistance( "Boat", "Coat" );
int distance2 =
    StringUtils.getLevenshteinDistance( "Remember", "Alamo" );
int distance3 =
    StringUtils.getLevenshteinDistance( "Steve", "Stereo" );

System.out.println( "distance(Boat, Coat): " + distance1 );
System.out.println( "distance(Remember, Alamo): " + distance2 );
System.out.println( "distance(Steve, Stereo): " + distance3 );

This produces the following output, showing the Levenshtein (or edit) distance between various strings:

distance(Boat, Coat): 1
distance(Remember, Alamo): 7
distance(Steve, Stereo): 2
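
If you want to check an edit distance by hand, the textbook dynamic-programming formulation is short enough to write out. The following sketch is one common way to compute it and is not necessarily how Commons Lang implements getLevenshteinDistance( ); the class name LevenshteinSketch is hypothetical, and non-null arguments are assumed.

```java
// Textbook dynamic-programming edit distance, for illustration only.
// Not the Commons Lang implementation; assumes non-null arguments.
class LevenshteinSketch {

    static int distance(String s, String t) {
        // prev[j] holds the distance between the first i-1 chars of s
        // and the first j chars of t; curr is the row being filled in.
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];

        // Turning an empty prefix of s into j chars of t takes j insertions.
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // i deletions turn i chars of s into the empty string
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int replacement = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), replacement);
            }
            // Reuse the two arrays instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Boat", "Coat"));      // 1
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(distance("flaw", "lawn"));      // 2
    }
}
```

Keeping only two rows uses memory proportional to the length of the second string rather than the product of both lengths, which matters when comparing long strings.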

See Also

The Levenshtein distance has a number of different applications, including pattern recognition and correcting spelling mistakes. For more information about the Levenshtein distance, see http://www.merriampark.com/ld.htm, which explains the algorithm and provides links to implementations of this algorithm in 15 different languages.
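
As a rough illustration of the spelling-correction use case, the sketch below picks the dictionary word with the smallest edit distance to a misspelled input. Everything here is hypothetical: the class name SpellingSketch, the closest( ) helper, and the four-word dictionary are invented for the example, and the distance( ) method is a plain-Java stand-in for StringUtils.getLevenshteinDistance( ).

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: suggest the dictionary word nearest to a misspelling.
// All names and the word list are hypothetical.
class SpellingSketch {

    // Plain-Java stand-in for an edit-distance call (two-row dynamic programming).
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // Returns the first dictionary entry with the smallest distance to input,
    // or null if the dictionary is empty.
    static String closest(String input, List<String> dictionary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = distance(input, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Word", "World", "Work", "Ward");
        System.out.println(closest("Wrld", words)); // World
    }
}
```

A real spelling corrector would also weigh word frequency and cap the acceptable distance; this sketch only shows where the edit distance fits in.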
