Chapter 7. Output in Beautiful Soup

Beautiful Soup not only searches, navigates, and modifies HTML/XML documents, but also outputs their content in a readable format. Beautiful Soup supports two types of printing:

  • Formatted printing
  • Unformatted printing

Apart from these, Beautiful Soup provides different formatters that control how the output is rendered. Since the HTML tree can be modified after it is created, these output methods help us view the modified tree.
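As a minimal sketch of what a formatter changes, the snippet below renders the same tag with the "minimal" and "html" formatters that ship with Beautiful Soup 4, passing them through the formatter argument of decode(). It assumes the bs4 package is installed and uses Python's built-in "html.parser" instead of lxml so the output is not wrapped in extra tags. The markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup containing a non-ASCII character and an ampersand.
soup = BeautifulSoup("<p>Caf\u00e9 &amp; bar</p>", "html.parser")

# "minimal" (the default) escapes only what HTML requires, such as "&".
minimal = soup.p.decode(formatter="minimal")

# "html" also converts characters to named HTML entities where one exists.
html_entities = soup.p.decode(formatter="html")

print(minimal)        # <p>Café &amp; bar</p>
print(html_entities)  # <p>Caf&eacute; &amp; bar</p>
```

The same formatter argument is also accepted by prettify() and encode().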

Also in this chapter, we will discuss a simple method of getting only the text stored within a web page.
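One way to do this in Beautiful Soup 4 is the get_text() method. The following is a minimal sketch, assuming bs4 is installed; the markup is hypothetical and "html.parser" is used to keep the example free of external dependencies.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>plants</p><p>algae</p></div>", "html.parser")

# get_text() concatenates every string in the tree and drops the tags.
text = soup.get_text()

# A separator plus strip=True keeps neighbouring strings apart and trims whitespace.
spaced = soup.get_text(" ", strip=True)

print(text)    # plantsalgae
print(spaced)  # plants algae
```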

Formatted printing

Beautiful Soup supports two ways of printing. The first is formatted printing, which renders the current Beautiful Soup object as a formatted Unicode string. Each tag is printed on a separate line with proper indentation, which makes the structure of the document easy to read. Beautiful Soup has the built-in method prettify() for formatted printing. For example:

from bs4 import BeautifulSoup

html_markup = """<p class="ecopyramid">
<ul id="producers">
  <li class="producerlist">
    <div class="name">plants</div>
    <div class="number">100000</div>
  </li>
  <li class="producerlist">
    <div class="name">algae</div>
    <div class="number">100000</div>
  </li>
</ul>"""
# Parse the markup with the lxml parser, then print it one tag per line
soup = BeautifulSoup(html_markup, "lxml")
print(soup.prettify())

The following screenshot shows the output:

[Screenshot: Formatted printing]

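In the output, we can see that <html> and <body> tags have been added. This is because we asked Beautiful Soup to use the lxml parser, which treats any string passed to it as an HTML document and wraps the content in these tags if they are missing. The extra tags are added when the markup is parsed, so they appear in any output produced from the soup.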
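To see that the wrapping comes from the parser, the sketch below parses the same fragment with Python's built-in "html.parser" and with lxml and compares the unformatted output. It assumes bs4 is installed; lxml is optional, and the example falls back gracefully (via bs4's FeatureNotFound exception) if it is missing. The fragment is a simplified, hypothetical piece of the markup above.

```python
from bs4 import BeautifulSoup, FeatureNotFound

markup = "<p>plants</p>"

# html.parser (Python's built-in parser) keeps the fragment as it is.
plain = str(BeautifulSoup(markup, "html.parser"))
print(plain)  # <p>plants</p>

# lxml wraps the fragment in <html><body>; skip if lxml is not installed.
try:
    wrapped = str(BeautifulSoup(markup, "lxml"))
    print(wrapped)  # <html><body><p>plants</p></body></html>
except FeatureNotFound:
    wrapped = None
    print("lxml is not installed")
```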

The prettify() method can be called either on a Beautiful Soup object or any of the tag objects. For example:

producer_entry = soup.ul
print(producer_entry.prettify())
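The sketch below, assuming bs4 is installed and using a shortened, hypothetical version of the producer list, shows that prettify() called on a tag formats only that tag and its descendants, while str() gives the unformatted form of the same tag on one line.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<ul id="producers"><li>plants</li></ul>', "html.parser")
producer_entry = soup.ul

# prettify() on a Tag puts each tag and string on its own line,
# indented by one space per nesting level.
pretty = producer_entry.prettify()
print(pretty)

# str() returns the same tag without any added whitespace.
compact = str(producer_entry)
print(compact)  # <ul id="producers"><li>plants</li></ul>
```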