Do not use regular expressions to parse XML / HTML data

Using regular expressions to parse XML or HTML text is probably the most frequently committed mistake. Regular expressions are very useful, but they have their limits, and those limits are usually reached when you try to use them for XML or HTML parsing. Neither HTML nor XML is a regular language: elements can nest to arbitrary depth, and pairing each opening tag with its own closing tag is something a regular expression cannot do reliably.
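To make the limitation concrete, here is a minimal sketch of how a seemingly reasonable pattern goes wrong on nested elements. The class name `NestedTagRegexDemo`, the helper `firstDivBody`, and the sample markup are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedTagRegexDemo {
    // Naive pattern: reluctantly match everything between <div> and the
    // nearest </div>. DOTALL lets '.' match line breaks as well.
    private static final Pattern DIV =
            Pattern.compile("<div>(.*?)</div>", Pattern.DOTALL);

    // Returns what the regex believes is the body of the first <div>,
    // or null when the pattern does not match at all.
    public static String firstDivBody(String html) {
        Matcher m = DIV.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<div>outer <div>inner</div> tail</div>";
        // The match stops at the inner </div>, so the outer element is cut short.
        System.out.println(firstDivBody(html)); // prints: outer <div>inner
    }
}
```

The real body of the outer element is `outer <div>inner</div> tail`, but the pattern has no way of knowing that the first `</div>` belongs to the inner element. Switching to a greedy quantifier only moves the problem: it would then run through to the last `</div>` in the input, swallowing any sibling elements in between.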

Luckily, Java offers other tools for this purpose. The JDK contains readily available classes that parse these formats into a Document Object Model (DOM) tree, or that process them on the fly using the event-based SAX parsing model.
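As a rough sketch of both approaches for XML, the following uses the JDK's `javax.xml.parsers` package. The class name `XmlParsingSketch`, its two helper methods, and the sample document are assumptions made for illustration; production code would normally also harden the parser factories against untrusted input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParsingSketch {
    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // DOM: build the whole tree in memory, then query it.
    public static List<String> textOf(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(stream(xml));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // SAX: react to events while the input streams by; no tree is built.
    public static int countElements(String xml, String tagName) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(tagName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<library><book><title>Java Regex</title></book>"
                + "<book><title>XML Basics</title></book></library>";
        System.out.println(textOf(xml, "title"));       // prints: [Java Regex, XML Basics]
        System.out.println(countElements(xml, "book")); // prints: 2
    }
}
```

DOM is convenient when the document fits in memory and needs to be navigated freely. SAX suits large inputs or single-pass processing, because it never holds the whole document at once.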

Do not use regular expressions for a task when a more specific parser exists for it. The existence of a readily available, dedicated tool is itself a hint that regular expressions are probably not the best choice in that case. After all, that is the reason why the authors of the XML and HTML parsers started their work.
