Extending the login script to update content

Now that the login automation is working, we can make the script more interesting by extending it to interact with the website and update the country data. The code used in this section is available at https://bitbucket.org/wswp/code/src/tip/chapter06/edit.py. You may have noticed an Edit link at the bottom of each country page.

When logged in, this link leads to another page where each property of a country can be edited.

We will make a script to increase the population of a country by one person each time it is run. The first step is to extract the current values of the country by reusing the parse_form() function:

>>> import pprint
>>> import login
>>> from login import parse_form  # assumes parse_form() is defined in login.py
>>> COUNTRY_URL = 'http://example.webscraping.com/edit/United-Kingdom-239'
>>> opener = login.login_cookies()
>>> country_html = opener.open(COUNTRY_URL).read()
>>> data = parse_form(country_html)
>>> pprint.pprint(data)
{'_formkey': '4cf0294d-ea71-4cd8-ae2a-43d4ca0d46dd',
 '_formname': 'places/5402840151359488',
 'area': '244820.00',
 'capital': 'London',
 'continent': 'EU',
 'country': 'United Kingdom',
 'currency_code': 'GBP',
 'currency_name': 'Pound',
 'id': '5402840151359488',
 'iso': 'GB',
 'languages': 'en-GB,cy-GB,gd',
 'neighbours': 'IE',
 'phone': '44',
 'population': '62348447',
 'postal_code_format': '@# #@@|@## #@@|@@# #@@|@@## #@@|@#@ #@@|@@#@ #@@|GIR0AA',
 'postal_code_regex': '^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$',
 'tld': '.uk'}
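For readers who do not have the earlier parse_form() definition to hand, the extraction step above can be sketched roughly as follows. This is a minimal Python 3 illustration using only the standard library, not the book's own implementation: the names FormInputParser and parse_form_sketch and the sample HTML are hypothetical, and the sketch assumes that every editable value lives in a named input tag.

```python
from html.parser import HTMLParser


class FormInputParser(HTMLParser):
    """Collect name/value pairs from <input> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.data = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />
        if tag != 'input':
            return
        attrs = dict(attrs)
        name = attrs.get('name')
        if name:  # skip unnamed inputs, e.g. a bare submit button
            self.data[name] = attrs.get('value')


def parse_form_sketch(html):
    """Return a dict of the named input fields found in html."""
    parser = FormInputParser()
    parser.feed(html)
    return parser.data


# Made-up sample form, not the real page markup
sample = '''
<form>
  <input name="population" value="62348447" />
  <input name="_formkey" type="hidden" value="abc123" />
  <input type="submit" value="Update" />
</form>
'''
fields = parse_form_sketch(sample)
```

Hidden fields are collected along with the visible ones, which is why entries such as _formkey and _formname appear in the output above.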

Now we increase the population by one and submit the updated version to the server:

>>> import urllib
>>> import urllib2
>>> data['population'] = int(data['population']) + 1
>>> encoded_data = urllib.urlencode(data)
>>> request = urllib2.Request(COUNTRY_URL, encoded_data)  # passing data makes this a POST
>>> response = opener.open(request)
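The session above uses the Python 2 modules urllib and urllib2. As a rough Python 3 equivalent, the same increment-and-encode step might look like the sketch below. The helper name build_update_request and the sample field values are assumptions made for illustration, and the request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_update_request(url, form_data):
    """Return a POST request that raises 'population' by one (sketch)."""
    updated = dict(form_data)  # copy so the caller's dict is left untouched
    updated['population'] = int(updated['population']) + 1
    # urlencode() applies str() to non-string values, so the int is accepted
    body = urllib.parse.urlencode(updated).encode('utf-8')
    # Supplying data makes urllib issue a POST rather than a GET
    return urllib.request.Request(url, data=body)


# Illustrative values only; a real run would pass the parsed form dict
request = build_update_request(
    'http://example.webscraping.com/edit/United-Kingdom-239',
    {'population': '62348447', '_formkey': 'abc123'},
)
```

To submit it, the request would be passed to an opener that carries the login cookies, as opener.open(request) does above. All other fields, including the hidden ones, are sent back unchanged.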

When we return to the country page, we can verify that the population has increased to 62,348,448.

Feel free to test and modify the other fields too; the database is restored to the original country data each hour to keep the data sane. Note that the example covered here is not strictly web scraping, but falls under the wider scope of online bots. However, the form techniques used here can also be applied to interacting with complex forms when scraping.
