Now that the login automation is working, we can make the script more interesting by extending it to interact with the website and update the country data. The code used in this section is available at https://bitbucket.org/wswp/code/src/tip/chapter06/edit.py. You may have noticed an Edit link at the bottom of each country page.
When you are logged in, this link leads to another page where each property of a country can be edited.
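Rather than hard-coding the edit URL, a script could follow the Edit link from a country page. The following is a minimal standard-library sketch for Python 3. `EditLinkFinder` and `find_edit_url()` are hypothetical names. The sketch assumes that the link's visible text is exactly `Edit` and that the country page lives at a `/view/` URL. The sample markup is invented for illustration, so the real page may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EditLinkFinder(HTMLParser):
    """Collects the href of every <a> element whose visible text is 'Edit'."""

    def __init__(self):
        super().__init__()
        self._current_href = None  # href of the <a> we are currently inside
        self._text = []            # text fragments seen inside that <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._current_href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._current_href is not None:
            if ''.join(self._text).strip() == 'Edit':
                self.links.append(self._current_href)
            self._current_href = None


def find_edit_url(page_url, html):
    """Return the absolute URL of the first 'Edit' link, or None if absent."""
    finder = EditLinkFinder()
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.links[0]) if finder.links else None


# Hypothetical markup and view URL (assumptions, not taken from the site).
sample = '<p>Country details</p><a href="/edit/United-Kingdom-239">Edit</a>'
print(find_edit_url('http://example.webscraping.com/view/United-Kingdom-239', sample))
# http://example.webscraping.com/edit/United-Kingdom-239
```

Resolving the href with `urljoin()` handles both absolute-path links like the one above and relative ones.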
We will make a script that increases the population of a country by one person each time it is run. The first step is to extract the country's current values by reusing the parse_form() function:
>>> import pprint
>>> import login
>>> COUNTRY_URL = 'http://example.webscraping.com/edit/United-Kingdom-239'
>>> opener = login.login_cookies()
>>> country_html = opener.open(COUNTRY_URL).read()
>>> data = parse_form(country_html)
>>> pprint.pprint(data)
{'_formkey': '4cf0294d-ea71-4cd8-ae2a-43d4ca0d46dd',
 '_formname': 'places/5402840151359488',
 'area': '244820.00',
 'capital': 'London',
 'continent': 'EU',
 'country': 'United Kingdom',
 'currency_code': 'GBP',
 'currency_name': 'Pound',
 'id': '5402840151359488',
 'iso': 'GB',
 'languages': 'en-GB,cy-GB,gd',
 'neighbours': 'IE',
 'phone': '44',
 'population': '62348447',
 'postal_code_format': '@# #@@|@## #@@|@@# #@@|@@## #@@|@#@ #@@|@@#@ #@@|GIR0AA',
 'postal_code_regex': '^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3} [A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3} [A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$',
 'tld': '.uk'}
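The parse_form() helper used above was defined earlier in the chapter. The following Python 3 sketch shows the idea in isolation using only the standard library. `parse_form_sketch()` and `FormInputParser` are hypothetical stand-ins, not the chapter's implementation. Judging from the output above, the sketch assumes that every editable field is an `<input>` element with a `name` attribute. It collects every named `<input>` on the page, whether or not it sits inside a `<form>`, and it ignores `<select>` and `<textarea>` elements. The sample markup is invented.

```python
from html.parser import HTMLParser


class FormInputParser(HTMLParser):
    """Collects name/value pairs from <input> elements in document order."""

    def __init__(self):
        super().__init__()
        self.data = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            attrs = dict(attrs)
            name = attrs.get('name')
            if name:  # inputs without a name are never submitted, so skip them
                self.data[name] = attrs.get('value') or ''


def parse_form_sketch(html):
    """Return {name: value} for every named <input> in html."""
    parser = FormInputParser()
    parser.feed(html)
    parser.close()
    return parser.data


# Hypothetical markup modelled on two of the fields printed above.
sample = '''
<form method="post">
  <input name="population" type="text" value="62348447">
  <input name="_formkey" type="hidden" value="4cf0294d-ea71-4cd8-ae2a-43d4ca0d46dd">
  <input type="submit" value="Update">
</form>
'''
print(parse_form_sketch(sample))
# {'population': '62348447', '_formkey': '4cf0294d-ea71-4cd8-ae2a-43d4ca0d46dd'}
```

Hidden inputs such as `_formkey` are collected along with the visible fields. This matters because they need to be sent back with the update.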
Now we increase the population by one and submit the updated version to the server:
>>> import urllib, urllib2
>>> data['population'] = int(data['population']) + 1
>>> encoded_data = urllib.urlencode(data)
>>> request = urllib2.Request(COUNTRY_URL, encoded_data)  # passing data makes this a POST
>>> response = opener.open(request)
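The session above is written in Python 2. In Python 3, `urllib2.Request` became `urllib.request.Request` and `urllib.urlencode` became `urllib.parse.urlencode`, and POST data must be bytes. The following sketch builds the same update request without sending it, so that the encoded body can be inspected. `build_update_request()` is a hypothetical helper, and the `form` dict is a trimmed copy of the values printed earlier. The hidden `_formkey` and `_formname` fields are resubmitted unchanged. The server presumably uses them to validate the submission, so dropping them would likely make the update fail.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

COUNTRY_URL = 'http://example.webscraping.com/edit/United-Kingdom-239'


def build_update_request(url, form_data):
    """Return a POST Request that resubmits form_data with population + 1."""
    data = dict(form_data)  # copy so the caller's dict is not mutated
    data['population'] = int(data['population']) + 1
    body = urlencode(data).encode('ascii')  # Python 3 needs bytes for POST data
    return Request(url, data=body)  # supplying data makes this a POST request


# Trimmed-down copy of the form values shown earlier (hidden fields included).
form = {'_formkey': '4cf0294d-ea71-4cd8-ae2a-43d4ca0d46dd',
        '_formname': 'places/5402840151359488',
        'population': '62348447'}
request = build_update_request(COUNTRY_URL, form)
print(request.get_method())                                  # POST
print(parse_qs(request.data.decode('ascii'))['population'])  # ['62348448']
```

Nothing is sent here. Passing `request` to an opener's `open()` method would submit it, as the last line of the session does.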
When we return to the country page, we can verify that the population has increased to 62,348,448.
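Checking the page by eye works once, but the script can perform the same check itself by re-fetching the edit form after the POST. The following Python 3 sketch wraps the whole cycle in a hypothetical `increment_and_verify()` function. The opener and parser are passed in so that the sketch can run offline. `FakeOpener` and `fake_parse_form` are stand-ins invented for this demonstration and are not part of the chapter's code. With a Python 3 port of the chapter's login code, you would pass its cookie-aware opener and parse_form() instead.

```python
import io
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def increment_and_verify(opener, url, parse_form):
    """Add one to the population on an edit page, then re-read it to confirm.

    Assumes opener.open() accepts a URL or a Request and returns a file-like
    response, and that parse_form() returns the form fields as a dict.
    Returns (before, after) as ints.
    """
    data = parse_form(opener.open(url).read())
    before = int(data['population'])
    data['population'] = before + 1
    opener.open(Request(url, data=urlencode(data).encode('ascii')))
    # Fetch the page again rather than trusting the POST response.
    after = int(parse_form(opener.open(url).read())['population'])
    if after != before + 1:
        raise RuntimeError('expected %d, server reports %d' % (before + 1, after))
    return before, after


class FakeOpener:
    """Hypothetical offline stand-in for the real opener; stores one value."""

    def __init__(self, population):
        self.population = population

    def open(self, target):
        if isinstance(target, Request) and target.data is not None:
            fields = parse_qs(target.data.decode('ascii'))
            self.population = int(fields['population'][0])
        return io.BytesIO(('population=%d' % self.population).encode('ascii'))


def fake_parse_form(body):
    # The fake "page" is just a query string such as b'population=62348447'.
    return {k: v[0] for k, v in parse_qs(body.decode('ascii')).items()}


print(increment_and_verify(FakeOpener(62348447), 'http://example.invalid/edit', fake_parse_form))
# (62348447, 62348448)
```

Raising an exception on a mismatch makes a silently rejected update (for example, an expired session) visible instead of letting the script report success.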
Feel free to test and modify the other fields too; the database is restored to the original country data every hour to keep it sane. Note that this example is not strictly web scraping but falls under the wider scope of online bots. However, the same form techniques apply whenever scraping requires interacting with complex forms.