There's more...

Regexes can be used as well as input in the .find() and .find_all() methods. For example, this search uses the h2 and h3 tags:

>>> page.find_all(re.compile('^h(2|3)'))
[<h2>Sample Web Page</h2>, <h3><a name="contents">CONTENTS</a></h3>, <h3><a name="basics">1. Creating a Web Page</a></h3>, <h3><a name="syntax">2. HTML Syntax</a></h3>, <h3><a name="chars">3. Special Characters</a></h3>, <h3><a name="convert">4. Converting Plain Text to HTML</a></h3>, <h3><a name="effects">5. Effects</a></h3>, <h3><a name="lists">6. Lists</a></h3>, <h3><a name="links">7. Links</a></h3>,
<h3><a name="tables">8. Tables</a></h3>, <h3><a name="install">9. Installing Your Web Page on the Internet</a></h3>, <h3><a name="more">10. Where to go from here</a></h3>]

Another useful find parameter is including the CSS class with the class_ parameter. This will be shown later in the book.

The full Beautiful Soup documentation can be found here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.17.244