Getting links from a URL with urllib2

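This script shows how to extract the links from a page using urllib2 and HTMLParser. HTMLParser is a Python 2 standard library module for parsing text formatted as HTML. We subclass it and override handle_starttag, which the parser calls for every opening tag, passing the tag name and a list of (name, value) attribute pairs.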

You can get more information at https://docs.python.org/2/library/htmlparser.html.
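To see what handle_starttag receives before working with a live page, the following minimal sketch feeds a hard-coded HTML string to a parser subclass. The class name TagLogger, the seen attribute, and the sample markup are hypothetical and used only for illustration. The import falls back to html.parser, which is where the same class lives in Python 3.

```python
try:
    from HTMLParser import HTMLParser      # Python 2, as used in this section
except ImportError:
    from html.parser import HTMLParser     # Python 3 location of the same class


class TagLogger(HTMLParser):
    """Hypothetical helper: records every start tag and its attributes."""

    def __init__(self):
        # Explicit base call works on both Python 2 and Python 3
        HTMLParser.__init__(self)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, one per attribute
        self.seen.append((tag, attrs))


# Sample markup is an assumption made for this example
sample = '<p>See <a href="http://example.com/" class="ext">this</a></p>'
logger = TagLogger()
logger.feed(sample)
logger.close()

for tag, attrs in logger.seen:
    print(tag, attrs)
```

Each entry in logger.seen pairs a tag name with its attribute list, which is why the script in this section loops over attrs looking for the href entry.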

You can find the following code in the get_links_from_url.py file:

#!/usr/bin/python
import urllib2
from HTMLParser import HTMLParser

class myParser(HTMLParser):
    # Called by the parser for every opening tag;
    # attrs is a list of (name, value) pairs
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Print only absolute links; value is None when
                # the href attribute has no value
                if name == 'href' and value and value.startswith('http'):
                    print(value)

# Ask for a host name and prepend the scheme
web = raw_input("Enter url: ")
url = "http://" + web
request = urllib2.Request(url)
handle = urllib2.urlopen(request)
parser = myParser()
# Assumes the page is UTF-8 encoded
parser.feed(handle.read().decode('utf-8'))

In the following screenshot, we can see the script in execution for the python.org domain:
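As a variation on the script above, the links can be collected in a list instead of being printed, and relative links can be resolved against the page URL. This is a minimal sketch rather than the book's own code: the names LinkCollector and extract_links are hypothetical, and the use of urljoin is an addition not present in the original script. The imports fall back to their Python 3 locations so the same code runs on either version.

```python
try:
    from HTMLParser import HTMLParser        # Python 2
    from urlparse import urljoin
except ImportError:
    from html.parser import HTMLParser       # Python 3
    from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers href values from <a> tags."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip href attributes that are missing a value
            if name == "href" and value:
                # Resolve relative links against the page URL
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in the HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Sample markup and URLs are assumptions made for this example
page = '<a href="/about">About</a> <a href="http://example.org/x">X</a>'
for link in extract_links(page, "http://example.com/"):
    print(link)
```

To use it with a downloaded page, pass the decoded response body and the URL that was requested, for example the text returned by handle.read().decode('utf-8') in the script above.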
