How to do it...

Import BeautifulSoup and requests:

>>> import requests
>>> from bs4 import BeautifulSoup

Set up the URL of the page to download and retrieve it:

>>> URL = 'http://www.columbia.edu/~fdc/sample.html'
>>> response = requests.get(URL)
>>> response
<Response [200]>
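In a script, rather than an interactive session, the status check can be automated. The following is a minimal sketch, not part of the original recipe: the helper name `fetch_html` and the 10-second default timeout are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its body as text.

    Hypothetical helper: the name and the default timeout are
    illustrative assumptions, not part of the recipe.
    """
    # A timeout keeps the call from waiting indefinitely on a slow server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.exceptions.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(URL)` with the URL defined above would return the same text as `response.text`, or raise an exception if the download fails.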

Parse the downloaded page:

>>> page = BeautifulSoup(response.text, 'html.parser')

Obtain the title of the page and check that it matches the title displayed in the browser:

>>> page.title
<title>Sample Web Page</title>
>>> page.title.string
'Sample Web Page'
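Not every page has a title, and `page.title` is `None` when the tag is missing. A small sketch of a defensive lookup follows; the function name `page_title` and the inline HTML strings are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def page_title(html):
    """Return the stripped page title, or None if there is no usable title.

    Hypothetical helper, named here only for illustration.
    """
    page = BeautifulSoup(html, 'html.parser')
    # page.title is None when the document has no <title> tag, and
    # .string is None when the tag is empty or has several children.
    if page.title is None or page.title.string is None:
        return None
    return page.title.string.strip()


with_title = page_title('<html><head><title> Sample Web Page </title></head></html>')
without_title = page_title('<html><body><p>No title here</p></body></html>')
```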

Find all the h3 elements in the page to identify its sections:

>>> page.find_all('h3')
[<h3><a name="contents">CONTENTS</a></h3>, <h3><a name="basics">1. Creating a Web Page</a></h3>, <h3><a name="syntax">2. HTML Syntax</a></h3>, <h3><a name="chars">3. Special Characters</a></h3>, <h3><a name="convert">4. Converting Plain Text to HTML</a></h3>, <h3><a name="effects">5. Effects</a></h3>, <h3><a name="lists">6. Lists</a></h3>, <h3><a name="links">7. Links</a></h3>, <h3><a name="tables">8. Tables</a></h3>, <h3><a name="install">9. Installing Your Web Page on the Internet</a></h3>, <h3><a name="more">10. Where to go from here</a></h3>]
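The list of tags can be reduced to a mapping from anchor name to section title, which is easier to read and to query. This is a sketch under assumptions: `SAMPLE_HTML` is a shortened stand-in for the downloaded page, and the `sections` dictionary is an illustrative name.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real page (an assumption for this example).
SAMPLE_HTML = """
<html><body>
<h3><a name="contents">CONTENTS</a></h3>
<h3><a name="basics">1. Creating a Web Page</a></h3>
<h3><a name="links">7. Links</a></h3>
</body></html>
"""

page = BeautifulSoup(SAMPLE_HTML, 'html.parser')

sections = {}
for heading in page.find_all('h3'):
    anchor = heading.find('a')
    # Skip headings that have no named anchor inside them.
    if anchor is not None and anchor.get('name'):
        sections[anchor['name']] = heading.get_text(strip=True)
```

With the real page, the same loop would produce one entry per heading shown in the output above.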

Extract the text of the section on links. Stop when you reach the next <h3> tag:

>>> link_section = page.find('a', attrs={'name': 'links'})
>>> section = []
>>> for element in link_section.next_elements:
...     if element.name == 'h3':
...         break
...     section.append(element.string or '')
...
>>> result = ''.join(section)
>>> result
'7. Links

Links can be internal within a Web page (like to
the Table of ContentsTable of Contents at the top), or they
can be to external web pages or pictures on the same website, or they
can be to websites, pages, or pictures anywhere else in the world.



Here is a link to the Kermit
Project home pageKermit
Project home page.



Here is a link to Section 5Section 5 of this document.



Here is a link to
Section 4.0Section 4.0
of the C-Kermit
for Unix Installation InstructionsC-Kermit
for Unix Installation Instructions.



Here is a link to a picture:
CLICK HERECLICK HERE to see it.


'

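Notice that there are no HTML tags; it's all raw text. The text of each link appears twice because next_elements yields both the <a> tag and the string inside it, and each one contributes the same .string value.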
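If the repeated link text in the output is unwanted, one option is to collect only the string nodes and skip the tags. The sketch below is a variation on the loop above, not the recipe's own method; `SAMPLE_HTML` is a shortened, assumed stand-in for the real page.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Shortened stand-in for the "7. Links" section (an assumption).
SAMPLE_HTML = (
    '<h3><a name="links">7. Links</a></h3>'
    '<p>Here is a link to <a href="#effects">Section 5</a> of this document.</p>'
    '<h3><a name="tables">8. Tables</a></h3>'
)

page = BeautifulSoup(SAMPLE_HTML, 'html.parser')
link_section = page.find('a', attrs={'name': 'links'})

parts = []
for element in link_section.next_elements:
    # Stop at the heading that opens the next section.
    if getattr(element, 'name', None) == 'h3':
        break
    # Keep only text nodes, so each piece of text is added once.
    if isinstance(element, NavigableString):
        parts.append(str(element))

text = ''.join(parts)
```

Here the link text "Section 5" appears a single time in `text`, because the enclosing `<a>` tag is skipped and only its string child is appended.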
