2. Finding the first non-repeated character

There are different solutions to finding the first non-repeated character in a string. Mainly, the problem can be solved in a single traversal of the string or in more complete/partial traversals.

In the single traversal approach, we populate an array that's meant to store the indexes of all of the characters that appear exactly once in the string. With this array, simply return the smallest index containing a non-repeated character:

private static final int EXTENDED_ASCII_CODES = 256;
...
public char firstNonRepeatedCharacter(String str) {

int[] flags = new int[EXTENDED_ASCII_CODES];

for (int i = 0; i < flags.length; i++) {
flags[i] = -1;
}

for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);
if (flags[ch] == -1) {
flags[ch] = i;
} else {
flags[ch] = -2;
}
}

int position = Integer.MAX_VALUE;

for (int i = 0; i < EXTENDED_ASCII_CODES; i++) {
if (flags[i] >= 0) {
position = Math.min(position, flags[i]);
}
}

return position == Integer.MAX_VALUE ?
Character.MIN_VALUE : str.charAt(position);
}

This solution assumes that every character from the string is part of the extended ASCII table (256 codes). Having codes greater than 256 requires us to increase the size of the array accordingly (http://www.alansofficespace.com/unicode/unicd99.htm). The solution will work as long as the array size is not extended beyond the largest value of the char type, which is Character.MAX_VALUE, that is, 65,535. On the other hand, Character.MAX_CODE_POINT returns the maximum value of a Unicode code point, 1,114,111. To cover this range, we need another implementation based on codePointAt() and codePoints()

Thanks to the single traversal approach, this is pretty fast. Another solution consists of looping the string for each character and counting the number of occurrences. Every second occurrence (duplicate) simply breaks the loop, jumps to the next character, and repeats the algorithm. If the end of the string is reached, then it returns the current character as the first non-repeatable character. This solution is available in the code bundled with this book.

Another solution that's presented here relies on LinkedHashMap. This Java map is an insertion-order map (it maintains the order in which the keys were inserted into the map) and is very convenient for this solution. LinkedHashMap is populated with characters as keys and the number of occurrences as values. Once LinkedHashMap is complete, it will return the first key that has a value equal to 1. Thanks to the insertion-order feature, this is the first non-repeatable character in the string:

public char firstNonRepeatedCharacter(String str) {

Map<Character, Integer> chars = new LinkedHashMap<>();

// or use for(char ch: str.toCharArray()) { ... }
for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);

chars.compute(ch, (k, v) -> (v == null) ? 1 : ++v);
}

for (Map.Entry<Character, Integer> entry: chars.entrySet()) {
if (entry.getValue() == 1) {
return entry.getKey();
}
}

return Character.MIN_VALUE;
}

In the code bundled with this book, the aforementioned solution has been written in Java 8 functional style. Moreover, the functional style solution for supporting ASCII, 16-bit Unicode, and Unicode surrogate pairs is as follows:

public static String firstNonRepeatedCharacter(String str) {

Map<Integer, Long> chs = str.codePoints()
.mapToObj(cp -> cp)
.collect(Collectors.groupingBy(Function.identity(),
LinkedHashMap::new, Collectors.counting()));

int cp = chs.entrySet().stream()
.filter(e -> e.getValue() == 1L)
.findFirst()
.map(Map.Entry::getKey)
.orElse(Integer.valueOf(Character.MIN_VALUE));

return String.valueOf(Character.toChars(cp));
}

To understand this code in more detail, please consider the What about Unicode characters? subsection of the Counting duplicate characters section. 

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.139.86.18