Example 1 – reading XML from file and traversing through its elements

In this example, we will read the XML content from the food.xml file:

from lxml import etree
with open("food.xml", "rb") as f:  #open the XML file in binary mode
    xml = f.read()  #read the XML content as bytes

The XML content read by the preceding code needs to be parsed before it can be traversed, which is done using lxml.etree.XML(). The XML() function parses the XML document and returns its root node, which is menus in this case. Please refer to https://lxml.de/api/lxml.etree-module.html for more detailed information on lxml.etree:

tree = etree.XML(xml)
#tree = etree.fromstring(xml)
#tree = etree.parse("food.xml").getroot()  #parse() takes a file name or file object, not the content

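The fromstring() and parse() functions, shown commented out in the preceding code, also pass content to a default or chosen parser used by lxml.etree. Note that parse() expects a file name, URL, or file-like object rather than the content itself, and returns an ElementTree object.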

lxml provides a number of parsers, such as XMLParser and HTMLParser. The default one used in the code can be found by calling etree.get_default_parser(), which in the preceding case returns an <lxml.etree.XMLParser> object.
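As a minimal sketch of how these entry points relate, the following code uses a short inline document as a stand-in for food.xml (the sample content is an assumption, not the actual file). It also passes an explicitly created XMLParser through the parser argument instead of relying on the default one:

```python
from io import BytesIO
from lxml import etree

# Hypothetical, abbreviated stand-in for the content of food.xml
xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<menus>
  <food><name>Orange Juice</name><price>$2.99</price></food>
</menus>"""

root_a = etree.XML(xml)          # returns the root element
root_b = etree.fromstring(xml)   # also returns the root element
doc = etree.parse(BytesIO(xml))  # parse() reads from a file name or file-like object
root_c = doc.getroot()           # parse() returns an ElementTree, so ask for its root

print(root_a.tag, root_b.tag, root_c.tag)

# An explicit parser can be supplied instead of the default one
parser = etree.XMLParser(remove_blank_text=True)
root_d = etree.fromstring(xml, parser=parser)
print(type(etree.get_default_parser()).__name__, root_d.tag)
```

All three routes should end up at the same menus root; the difference lies in what each function accepts as input and what it returns.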

Let's verify the tree received after parsing:

print(tree)  
print(type(tree))

<Element menus at 0x3aa1548>
<class 'lxml.etree._Element'>

The preceding two lines of output confirm that tree is an XML root element of the lxml.etree._Element type. To traverse all of the elements inside a tree, tree iteration can be used, which yields the elements in document order.

Tree iteration is performed using the iter() method. An element's tag name can be accessed using its tag property; similarly, an element's text can be accessed using its text property, as shown in the following code:

for element in tree.iter():
    print("%s - %s" % (element.tag, element.text))

The preceding tree iteration will result in the following output:

menus - 
food -

name - Butter Milk with Vanilla
price - $3.99
description - Rich tangy buttermilk with vanilla essence
rating - 5.0
feedback - 6
.............
food -

name - Orange Juice
price - $2.99
description - Fresh Orange juice served
rating - 4.9
feedback - 10
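The blank lines printed after menus and food appear because those container elements hold only whitespace (line breaks and indentation) as their text. A possible way to tidy this up, sketched here on a hypothetical, abbreviated stand-in for food.xml, is to strip the text before printing it:

```python
from lxml import etree

# Hypothetical, abbreviated stand-in for food.xml
xml = b"""<menus>
  <food>
    <name>Orange Juice</name>
    <price>$2.99</price>
  </food>
</menus>"""

tree = etree.XML(xml)

rows = []
for element in tree.iter():
    # element.text may be None or whitespace-only for container elements
    text = (element.text or "").strip()
    rows.append((element.tag, text))
    print("%s - %s" % (element.tag, text))
```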

We can also pass tag names (here, price and name) as arguments to the tree iterator to obtain only the selected elements. For each element returned by tree.iter(), the tag and the text content can be obtained using element.tag and element.text, respectively, as shown in the following code:

#iterate through selected elements found in the tree
for element in tree.iter('price','name'):
    print("%s - %s" % (element.tag, element.text))

name - Butter Milk with Vanilla
price - $3.99
name - Fish and Chips
price - $4.99
...........
name - Eggs and Bacon
price - $5.50
name - Orange Juice
price - $2.99

Note that the food.xml file has been opened in rb mode and not in r mode. When dealing with local file-based content that carries an encoding declaration, such as <?xml version="1.0" encoding="UTF-8"?>, passing a Unicode string to the parser raises the error ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration. Reading the file in binary mode, or encoding the string to bytes before parsing, avoids this error.
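The following sketch reproduces the situation on a hypothetical one-line document (not the actual food.xml): the bytes input parses normally, the decoded string is rejected, and re-encoding the string to bytes works again:

```python
from lxml import etree

# Hypothetical document carrying an encoding declaration
xml_bytes = b'<?xml version="1.0" encoding="UTF-8"?><menus><food/></menus>'
xml_text = xml_bytes.decode("utf-8")  # comparable to reading the file in "r" mode

# Bytes input: the parser handles the declared encoding itself
root = etree.XML(xml_bytes)
print(root.tag)

# str input with an encoding declaration is rejected
try:
    etree.XML(xml_text)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Encoding the text back to bytes is one way around the error
root_again = etree.XML(xml_text.encode("utf-8"))
print(root_again.tag)
```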

To deal with the preceding condition, or to read content from a file, HTTP URL, or FTP, parse() is an effective approach. It uses the default parser unless one is supplied to it as an extra argument. The following code demonstrates the use of the parse() function, iterating over the name elements to obtain their text:

from lxml import etree

#read and parse the file
tree = etree.parse("food.xml")

#iterate through 'name' and print text content
for element in tree.iter('name'):
    print(element.text)

The preceding code results in the following output, which is the text of each name element in the food.xml file:

Butter Milk with Vanilla
Fish and Chips
Egg Roll
Pineapple Cake
Eggs and Bacon
Orange Juice
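Unlike XML() and fromstring(), parse() returns an ElementTree object rather than the root element, and it accepts a parser as its second argument. The following is a minimal sketch of both points; it writes a hypothetical, abbreviated stand-in for food.xml to a temporary file so that it can run on its own, and the encoding option passed to XMLParser is only an illustrative choice:

```python
import os
import tempfile
from lxml import etree

# Hypothetical, abbreviated stand-in for food.xml, written to a temporary file
content = b"""<?xml version="1.0" encoding="UTF-8"?>
<menus>
  <food><name>Egg Roll</name></food>
  <food><name>Orange Juice</name></food>
</menus>"""

with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as handle:
    handle.write(content)
    path = handle.name

try:
    parser = etree.XMLParser(encoding="utf-8")  # optional; the default parser is used if omitted
    doc = etree.parse(path, parser)             # returns an ElementTree, not an Element
    root = doc.getroot()                        # the menus root element
    names = [element.text for element in doc.iter("name")]
finally:
    os.remove(path)  # clean up the temporary file

print(type(doc).__name__, root.tag, names)
```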

Multiple elements can also be iterated over, as seen here:

for element in tree.iter('name','rating','feedback'):
    print("{} - {}".format(element.tag, element.text))

name - Butter Milk with Vanilla
rating - 5.0
feedback - 6
name - Fish and Chips
rating - 5.0
...........
feedback - 4
name - Orange Juice
rating - 4.9
feedback - 10
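The flat output above does not show which values belong to which food item. One possible way to keep them together, sketched here on a hypothetical, abbreviated stand-in for food.xml, is to iterate over the food elements first and then run the multi-tag iteration inside each one:

```python
from lxml import etree

# Hypothetical, abbreviated stand-in for food.xml
xml = b"""<menus>
  <food><name>Butter Milk with Vanilla</name><price>$3.99</price><rating>5.0</rating><feedback>6</feedback></food>
  <food><name>Orange Juice</name><price>$2.99</price><rating>4.9</rating><feedback>10</feedback></food>
</menus>"""

tree = etree.XML(xml)

records = []
for food in tree.iter("food"):
    # Restrict the inner iteration to the tags of interest
    record = {child.tag: child.text for child in food.iter("name", "rating", "feedback")}
    records.append(record)

for record in records:
    print(record)
```

Each dictionary then holds the name, rating, and feedback of a single food element, with price left out because it was not requested.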
