lxml is an XML toolkit with a rich library set for processing XML and HTML. It is preferred over other XML libraries in Python for its high speed and efficient memory management. It also provides features for handling both small and large XML files. Python programmers use lxml to process XML and HTML documents. For more detailed information on lxml and its library support, please visit https://lxml.de/.
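One way lxml keeps memory use low on large files is incremental parsing. The following is a minimal sketch using `etree.iterparse()`. It assumes a hypothetical `<catalog>`/`<book>` layout held in memory through `io.BytesIO`, and the `iter_titles` helper is also hypothetical. With a real file, you would pass its path instead.

```python
import io
from lxml import etree

# Hypothetical sample data; a real workload would read a large file from disk.
xml_bytes = b"""<catalog>
  <book id="1"><title>First</title></book>
  <book id="2"><title>Second</title></book>
  <book id="3"><title>Third</title></book>
</catalog>"""

def iter_titles(source):
    """Hypothetical helper: yield each <book>'s title, discarding parsed elements."""
    # An "end" event fires once an element and all of its children are parsed;
    # tag="book" restricts the events to <book> elements.
    for _event, elem in etree.iterparse(source, events=("end",), tag="book"):
        yield elem.findtext("title")
        # Free this element's content and delete already-processed siblings
        # so the in-memory tree does not keep growing.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]

titles = list(iter_titles(io.BytesIO(xml_bytes)))
print(titles)  # ['First', 'Second', 'Third']
```

The clear-and-delete pattern trades random access to earlier elements for a roughly constant memory footprint.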
lxml provides native support for XPath and XSLT and is built on two powerful C libraries: libxml2 and libxslt. Its library set is normally used with XML or HTML for XPath queries, parsing, validation, serialization, and transformation, and it extends the features of ElementTree (http://effbot.org/zone/element-index.htm#documentation). Its parsing, ElementTree traversal, XPath, and CSS selector-like features make lxml handy for tasks such as web scraping. lxml is also used as a parser engine by Beautiful Soup (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) and pandas (https://pandas.pydata.org/).
XSLT is a language for transforming XML documents into HTML, XHTML, text, and so on. XSLT uses XPath to navigate XML documents. XSLT is template-based: the templates in a stylesheet match parts of the source XML document and define how those parts are transformed into a new document.
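As an illustration, here is a minimal XSLT 1.0 stylesheet applied with lxml's `etree.XSLT` class. It turns a hypothetical `<books>` document into an HTML list. The template matching `/` produces the output structure, and the XPath expressions in `select` attributes choose which source nodes to read.

```python
import io
from lxml import etree

# Hypothetical input document, parsed into an ElementTree.
xml_doc = etree.parse(io.BytesIO(b"""<books>
  <book><title>XML Basics</title></book>
  <book><title>Learning XPath</title></book>
</books>"""))

# Minimal stylesheet: one template emits <ul>, with one <li> per book title.
xslt_doc = etree.XML(b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="books/book">
        <li><xsl:value-of select="title"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(xslt_doc)  # compile the stylesheet (backed by libxslt)
result = transform(xml_doc)       # apply it; returns a result tree
print(str(result))                # serialized HTML output
```

The exact whitespace in the printed HTML depends on libxslt's output settings, so the result tree itself is the more reliable thing to inspect.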
The lxml library contains important modules, as listed here:
- lxml.etree (https://lxml.de/api/lxml.etree-module.html): Parses XML and implements the ElementTree API; supports XPath, iteration, and more
- lxml.html (https://lxml.de/api/lxml.html-module.html): Parses HTML; supports XPath, CSS selectors (cssselect), HTML forms, and form submission
- lxml.cssselect (https://lxml.de/api/lxml.cssselect-module.html): Converts CSS selectors into XPath expressions; accepts a CSS selector or CSS Query as an expression
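To connect these modules to a scraping task, the sketch below parses a hypothetical HTML fragment with `lxml.html`. It queries the fragment with XPath and reads a form field. It then tries `lxml.cssselect.CSSSelector`, which relies on the separately installed `cssselect` package, so that part is guarded with `try/except ImportError`.

```python
import lxml.html

# Hypothetical HTML standing in for a downloaded page.
page = """<html><body>
  <div class="product"><h2>Widget</h2><span class="price">9.99</span></div>
  <div class="product"><h2>Gadget</h2><span class="price">19.99</span></div>
  <form action="/search" method="get">
    <input type="text" name="q" value="lxml">
  </form>
</body></html>"""

doc = lxml.html.fromstring(page)

# XPath on the parsed HTML tree.
titles = doc.xpath('//div[@class="product"]/h2/text()')

# Forms are exposed as objects; .fields maps input names to their values.
form = doc.forms[0]
query = form.fields["q"]

print(titles)  # ['Widget', 'Gadget']
print(query)   # lxml

# CSS selectors need the external "cssselect" package (pip install cssselect).
try:
    from lxml.cssselect import CSSSelector
    selector = CSSSelector("div.product span.price")
    print(selector.path)                      # XPath generated from the CSS selector
    print([el.text for el in selector(doc)])  # ['9.99', '19.99']
except ImportError:
    print("cssselect is not installed")
```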