Beautiful Soup not only searches, navigates, and modifies HTML/XML documents, but also outputs their content in a readable format. It can handle different types of printing, such as formatted printing and unformatted printing.
Apart from these, Beautiful Soup provides different formatters to control how the output is written. Since the HTML tree can be modified after it is created, these output methods help in viewing the modified tree.
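As a minimal sketch of these formatters, the snippet below passes the formatter argument that decode() and prettify() accept. The markup is a hypothetical example, and Python's built-in "html.parser" is used so that lxml is not required. "minimal" (the default) escapes only &, < and >; "html" also converts characters such as é into named entities; None performs no substitution, so the result may not be valid HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup containing an accented character and an ampersand
soup = BeautifulSoup("<p>Caf&eacute; &amp; bar</p>", "html.parser")
paragraph = soup.p

# Default formatter: only &, < and > are escaped
minimal_out = paragraph.decode(formatter="minimal")
# "html" formatter: non-ASCII characters become named HTML entities where possible
html_out = paragraph.decode(formatter="html")
# No formatter: strings are written out exactly as stored, without escaping
raw_out = paragraph.decode(formatter=None)

print(minimal_out)
print(html_out)
print(raw_out)
```

The same formatter values can be passed to prettify(), for example soup.prettify(formatter="html").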
Also in this chapter, we will discuss a simple method of getting only the text stored within a web page.
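Beautiful Soup's get_text() method is one such way: it returns all the text inside a tag or document as a single string. A minimal sketch, using a trimmed-down version of this chapter's markup (an assumption for brevity) and the built-in "html.parser". The optional separator argument is placed between text fragments, and strip=True removes surrounding whitespace and drops whitespace-only fragments.

```python
from bs4 import BeautifulSoup

# A trimmed-down version of the producer list used in this chapter
html_markup = """<ul id="producers">
  <li class="producerlist">
    <div class="name">plants</div>
    <div class="number">100000</div>
  </li>
</ul>"""
soup = BeautifulSoup(html_markup, "html.parser")

# All text, including the whitespace between tags
all_text = soup.get_text()
# Text fragments stripped of whitespace and joined with a separator
clean_text = soup.get_text(separator="|", strip=True)

print(repr(all_text))
print(clean_text)
```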
Beautiful Soup supports two ways of printing. The first is formatted printing, which converts the current Beautiful Soup object into a formatted Unicode string. Each tag is printed on a separate line and indented according to its nesting depth, which makes the structure of the document easy to read. Beautiful Soup has the built-in method prettify() for formatted printing. For example:
from bs4 import BeautifulSoup
html_markup = """<p class="ecopyramid">
<ul id="producers">
  <li class="producerlist">
    <div class="name">plants</div>
    <div class="number">100000</div>
  </li>
  <li class="producerlist">
    <div class="name">algae</div>
    <div class="number">100000</div>
  </li>
</ul>"""
# The "lxml" parser requires the lxml package to be installed
soup = BeautifulSoup(html_markup, "lxml")
print(soup.prettify())
The following screenshot shows the output:
In the output, we can see that <html> and <body> tags have been added. This is because we passed "lxml" as the parser: lxml treats the input string as an HTML document and wraps the fragment in <html> and <body> tags while parsing, so prettify() prints those extra tags as part of the tree.
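The wrapping comes from the parser rather than from prettify() itself. A minimal sketch that parses the same hypothetical fragment with Python's built-in "html.parser" and, if it is installed, with lxml. FeatureNotFound is the exception Beautiful Soup raises when a requested parser is unavailable.

```python
from bs4 import BeautifulSoup, FeatureNotFound

fragment = "<p>hi</p>"

# Python's built-in parser keeps the fragment as-is
builtin_out = str(BeautifulSoup(fragment, "html.parser"))

# lxml's HTML parser wraps the fragment in <html> and <body>
try:
    lxml_out = str(BeautifulSoup(fragment, "lxml"))
except FeatureNotFound:
    lxml_out = None  # lxml is not installed in this environment

print(builtin_out)
print(lxml_out)
```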
The prettify() method can be called either on a BeautifulSoup object or on any Tag object. For example:
producer_entry = soup.ul
print(producer_entry.prettify())
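Calling prettify() on a tag formats only that tag and its descendants; sibling and ancestor elements are left out. For comparison, str() on the same tag returns the subtree on a single line without added indentation, which Beautiful Soup's documentation calls non-pretty printing. A short sketch, assuming a hypothetical fragment and the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a <p> sibling followed by the <ul> we want to print
html_markup = '<p class="ecopyramid">pyramid</p><ul id="producers"><li class="producerlist">plants</li></ul>'
soup = BeautifulSoup(html_markup, "html.parser")

# soup.ul returns the first <ul> tag in the tree
producer_entry = soup.ul

# Formatted: one tag or string per line, indented one space per nesting level
pretty = producer_entry.prettify()
# Unformatted: the same subtree as a single string with no added whitespace
compact = str(producer_entry)

print(pretty)
print(compact)
```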