Getting links from a URL with urllib2

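This script shows how to extract the links from a web page using urllib2 and HTMLParser. HTMLParser is a standard-library module for parsing text formatted as HTML: we subclass it and override handler methods such as handle_starttag, which the parser calls once for every opening tag it encounters.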

You can get more information at https://docs.python.org/2/library/htmlparser.html.
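
To make the callback model concrete, here is a minimal sketch that records every opening tag the parser reports. The TagRecorder class and the sample HTML string are hypothetical, introduced only for illustration; the try/except import is an assumption added so the snippet also runs on Python 3, where the module is named html.parser.

```python
try:
    from html.parser import HTMLParser  # Python 3 name of the module
except ImportError:
    from HTMLParser import HTMLParser   # Python 2, as used in this section


class TagRecorder(HTMLParser):
    """Hypothetical helper: stores each start tag with its attributes."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # tag is the lowercased tag name; attrs is a list of
        # (name, value) tuples, one per attribute on the tag.
        self.seen.append((tag, attrs))


recorder = TagRecorder()
recorder.feed('<p>See <a href="https://www.python.org" class="ext">Python</a></p>')
for tag, attrs in recorder.seen:
    print("%s %s" % (tag, attrs))
```

Each attribute arrives as a (name, value) pair, which is why the script in this section loops over attrs looking for the pair whose name is href.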

You can find the following code in the get_links_from_url.py file:

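#!/usr/bin/python
# Python 2 script: print the absolute links found on a page.
import urllib2
from HTMLParser import HTMLParser


class myParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                # Print only absolute http(s) links; value is None for a bare href
                if name == 'href' and value and value.startswith('http'):
                    print(value)


# The script prepends the scheme, so enter a bare domain such as www.python.org
web = raw_input("Enter url: ")
url = "http://" + web
request = urllib2.Request(url)
handle = urllib2.urlopen(request)
parser = myParser()
# Assumes the page is UTF-8 encoded
parser.feed(handle.read().decode('utf-8'))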
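
The script above prints links as it finds them and skips relative ones. As a hedged variation, the sketch below collects the links in a list and resolves relative hrefs against the page URL with urljoin. LinkCollector, extract_links, and fetch_links are hypothetical names that are not part of get_links_from_url.py, and the dual imports are an assumption so that the same code runs on Python 2 and Python 3.

```python
try:
    from html.parser import HTMLParser   # Python 3
    from urllib.parse import urljoin
    from urllib.request import urlopen
except ImportError:
    from HTMLParser import HTMLParser    # Python 2
    from urlparse import urljoin
    from urllib2 import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical variant of myParser that stores links instead of printing."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs such as "/about" against the page URL.
                absolute = urljoin(self.base_url, value)
                # Keep web links only; this drops mailto:, javascript:, etc.
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the http(s) links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def fetch_links(url):
    """Download url and return its links; assumes the page is UTF-8."""
    handle = urlopen(url)
    try:
        html = handle.read().decode("utf-8", "replace")
    finally:
        handle.close()
    return extract_links(html, url)


# Offline demonstration on a small, made-up HTML fragment.
sample = ('<a href="/about">About</a> '
          '<a href="https://docs.python.org/">Docs</a> '
          '<a href="mailto:x@example.org">Mail</a>')
links = extract_links(sample, "http://www.python.org/")
for link in links:
    print(link)
```

Returning a list rather than printing makes the result easy to filter or to pass to another function; calling fetch_links("http://www.python.org") would perform the same extraction on a live page.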

In the following screenshot, we can see the script in execution for the python.org domain:
