Preface
  What this book covers
  What you need for this book
  Who this book is for
  Conventions
  Reader feedback
  Customer support
  Downloading the example code
  Errata
  Piracy
  Questions

Getting Started with Regular Expressions
  Introduction to regular expressions
  A bit of history of regular expressions
  Various flavors of regular expressions
  What type of problems need regular expressions to solve
  The basic rules of regular expressions
  Constructs of the standard regular expression and meta characters
  Some basic regular expression examples
  Eager matching
  The effect of eager matching on regular expression alternation
  Summary

Understanding the Core Constructs of Java Regular Expressions
  Understanding the core constructs of regular expressions
  Quantifiers
  Basic quantifiers
  Examples using quantifiers
  Greedy versus reluctant (lazy) matching using quantifiers
  Possessive quantifiers
  Boundary constructs
  Examples using boundary constructs
  Character classes
  Examples of character classes
  Range inside a character class
  Examples of character range
  Escaping special regex metacharacters and escaping rules inside the character classes
  Escaping inside a character class
  Examples of escaping rules inside the character class
  Literally matching a string that may contain special regex metacharacters
  Negated character classes
  Examples of negated character classes
  Predefined shorthand character classes
  POSIX character classes
  Unicode support in Java regular expressions
  Commonly used Unicode character properties
  Negation of the preceding regex directives
  Unicode scripts support
  Examples of matching Unicode text in regular expressions
  Double escaping in a Java String when defining regular expressions
  Embedded regular expression mode modifiers
  The placement of embedded modes in a Java regular expression
  Disabling mode modifiers
  Summary

Working with Groups, Capturing, and References
  Capturing groups
  Group numbering
  Named groups
  Non-capturing groups
  Advantages of non-capturing groups
  Back references
  Back reference of a named group
  Replacement reference of a named group
  Forward references
  Invalid (non-existing) backward or forward references
  Summary

Regular Expression Programming Using Java String and Scanner APIs
  Introduction to the Java String API for regular expressions' evaluation
  Method - boolean matches(String regex)
  Example of the matches method
  Method - String replaceAll(String regex, String replacement)
  Examples of the replaceAll method
  Method - String replaceFirst(String regex, String replacement)
  Examples of the replaceFirst method
  Methods - String split methods
  The limit parameter rules
  Examples of the split method
  Example of the split method using the limit parameter
  Using regular expressions in Java Scanner API
  Summary

Introduction to Java Regular Expression APIs - Pattern and Matcher Classes
  The MatchResult interface
  The Pattern class
  Examples using the Pattern class
  Filtering a list of tokens using the asPredicate() method
  The Matcher class
  Examples using the Matcher class
  Method boolean lookingAt()
  The matches() method
  The find() and find(int start) methods
  The appendReplacement(StringBuffer sb, String replacement) method
  The appendTail(StringBuffer sb) method
  Example of the appendReplacement and appendTail methods
  Summary

Exploring Zero-Width Assertions, Lookarounds, and Atomic Groups
  Zero-width assertions
  Predefined zero-width assertions
  Regex defined zero-width assertions
  \G boundary assertion
  Atomic groups
  Lookahead assertions
  Positive lookahead
  Negative lookahead
  Lookbehind assertions
  Positive lookbehind
  Negative lookbehind
  Capturing text from overlapping matches
  Be careful with capturing groups inside a lookahead or lookbehind atomic group
  Lookbehind limitations in Java regular expressions
  Summary

Understanding the Union, Intersection, and Subtraction of Character Classes
  The union of character classes
  The intersection of character classes
  The subtraction of character classes
  Why should you use composite character classes?
  Summary

Regular Expression Pitfalls, Optimization, and Performance Improvements
  Common pitfalls and ways to avoid them while writing regular expressions
  Do not forget to escape regex metacharacters outside a character class
  Avoid escaping every non-word character
  Avoid unnecessary capturing groups to reduce memory consumption
  However, don't forget to use the required group around alternation
  Use predefined character classes instead of longer versions
  Use the limiting quantifier instead of repeating a character or pattern multiple times
  Do not use an unescaped hyphen in the middle of a character class
  The mistake of calling matcher.group() without a prior call to matcher.find(), matcher.matches(), or matcher.lookingAt()
  Do not use regular expressions to parse XML / HTML data
  How to test and benchmark your regular expression performance
  Catastrophic or exponential backtracking
  How to avoid catastrophic backtracking
  Optimization and performance enhancement tips
  Use a compiled form of regular expressions
  Use a negated character class instead of the greedy and slow .* or .+
  Avoid unnecessary grouping
  Use lazy quantifiers strategically instead of greedy quantifiers that cause excessive backtracking
  Make use of possessive quantifiers to avoid backtracking
  Extract common repeating substrings out of alternation
  Use atomic group to avoid backtracking and fail fast
  Summary