© Paul Gerrard 2016
Paul Gerrard, Lean Python, DOI 10.1007/978-1-4842-2385-7_9

9. Accessing the Web

Paul Gerrard
(1) Maidenhead, Berkshire, UK
Python has standard1 libraries that enable programmers to write both clients and servers for Internet services such as electronic mail, File Transfer Protocol (FTP), and, of course, web sites.
In this chapter we look at how Python can be used to access web sites and services. Suppose you want to download a page from a web site and save the HTML that is retrieved. The user needs to enter a URL for the site. Perhaps you also want to append a query string to the URL to pass data with the request, and then display the response or save it to disk.
You would design your program to work in stages, of course:
  1. Ask the user for a URL.
  2. Ask for the query string to append to the URL.
  3. Ask whether to save the response to disk.
The listing of program webtest.py is shown here.
1   import requests
2   from urllib.parse import urlparse
3   
4   url=input('Web url to fetch:')
5   urlparts=urlparse(url)
6   if urlparts[0]=='':
7       url=''.join(('http://',url))
8   
9   qstring=input('Enter query string:')
10  if len(qstring)>0:
11      url='?'.join((url,qstring))
12  
13  save=input('Save downloaded page to disk [y/n]?')
14      
15  print('Requesting',url)
16  
17  try:
18      response = requests.get(url)
19      if save.lower()=='y':
20          geturl=response.url
21          urlparts=urlparse(geturl)
22          netloc=urlparts[1]
23          if len(netloc)==0:
24              fname='save.html'
25          else:
26              fname='.'.join((netloc,'html'))
27          print('saving to',fname,'...')
28          fp=open(fname,'w')
29          fp.write(response.text)
30          fp.close()
31      else:
32          print(response.text)
33  except Exception as e:
34      print(e.__class__.__name__,e)
Let’s walk through this program.2
  • Lines 1 and 2 import the required modules (requests and urlparse).
  • Lines 4 through 7 get a URL from the user. If the user doesn’t include the http:// part of the URL, the program adds the prefix.
  • Lines 9 through 11 ask the user for a query string and, if one is given, append it to the URL after a ? character.
  • Lines 13 and 15 ask the user whether to save the output to a file, then print the full URL to be requested.
  • Lines 17 through 34 do most of the work; any exception is trapped by lines 33 and 34.
  • Line 18 gets the URL and saves the response in response.
  • Lines 19 through 30 create a file name based on the web site’s network location (or use save.html), then save the response text to that file.
  • Line 32 prints the response content to the screen.
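The scheme check on line 6 and the network-location lookup on line 22 both rely on how urlparse splits a URL into components. A quick sketch of that behavior, using the URL from the sample run:

```python
from urllib.parse import urlparse

# A full URL splits into (scheme, netloc, path, params, query, fragment)
parts = urlparse('http://uktmf.com?q=node/5277')
print(parts[0])  # the scheme tested on line 6: 'http'
print(parts[1])  # the netloc used on line 22: 'uktmf.com'

# Without a scheme, urlparse treats the whole string as a path,
# leaving the scheme empty -- which is why the program prepends 'http://'
print(urlparse('uktmf.com')[0])
```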
When I ran this program, this is what I saw:
D:\LeanPython\programs>python webtest.py
Web url to fetch:uktmf.com
Enter query string:q=node/5277
Save downloaded page to disk [y/n]?y
Requesting http://uktmf.com?q=node/5277
saving to uktmf.com.html ...
D:\LeanPython\programs>
The contents of the downloaded page were saved in uktmf.com.html.
The requests library is flexible: for example, you can use the HTTP POST verb with requests.post().
You can provide data to POST requests as follows:
data = {'param1': 'value 1','param2': 'value 2'}
response = requests.post(url,data=data)
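To see what requests would actually send, you can prepare a request without transmitting it. This is a sketch only; the URL here is a placeholder, not a real endpoint:

```python
import requests

data = {'param1': 'value 1', 'param2': 'value 2'}

# Build and prepare (but do not send) a POST request,
# so we can inspect the encoded form body and headers
prepared = requests.Request('POST', 'http://example.com/form', data=data).prepare()
print(prepared.body)                     # param1=value+1&param2=value+2
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
```

The dictionary is URL-encoded into the request body, and the Content-Type header is set for you.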
Where web sites or web services require it, you can provide credentials for authentication and obtain the content as JSON data. You can provide custom headers to requests and see the headers returned in the response easily, too.
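As a sketch of the authentication and header support (the URL and credentials below are placeholders), preparing a request again shows what would go on the wire; on a successful response, response.json() would decode a JSON body:

```python
import requests

# Placeholder endpoint and credentials, for illustration only
req = requests.Request(
    'GET',
    'http://example.com/api',
    headers={'Accept': 'application/json'},  # custom request header
    auth=('user', 'secret'),                 # HTTP basic authentication
)
prepared = req.prepare()
print(prepared.headers['Accept'])         # application/json
print(prepared.headers['Authorization'])  # Basic <base64-encoded credentials>
```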
The requests module can be used to test web sites and web services quite comprehensively.
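For instance, a minimal check along these lines (the helper name check_page is my own, not from the listing) might assert that a page responds with the expected HTTP status code:

```python
import requests

def check_page(url, expected_status=200):
    """Fetch url and report whether it returned the expected HTTP status."""
    response = requests.get(url, timeout=10)
    return response.status_code == expected_status
```

A fuller test might also assert on response.headers or on text expected in response.text.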
Footnotes
1. Some familiarity with the operation of web servers, browsers, and HTML is assumed in this chapter.
2. Yes, it’s a program, the first that really does something you might actually find useful.