You would like to check the existence of a web page without downloading its HTML content. This means that we need to send a HEAD request, rather than a GET request, from our HTTP client. According to Wikipedia, a HEAD request asks for a response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers without having to transport the entire content.
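The difference between the two methods can be seen without touching the network. The following is a minimal sketch for Python 3 (where httplib is named http.client); the DemoHandler class, the fetch() and compare_head_and_get() helpers, and the throwaway local server are assumptions made for illustration and are not part of the recipe.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one small HTML page."""

    BODY = b"<html><body>Hello</body></html>"

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.BODY)

    def do_HEAD(self):
        # Same headers as GET, but no body is written.
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(method, port):
    """Send one request to the local demo server.

    Returns a tuple of (status, Content-Length header, body bytes).
    """
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        response = conn.getresponse()
        return (response.status,
                response.getheader("Content-Length"),
                response.read())
    finally:
        conn.close()


def compare_head_and_get():
    """Start the demo server, issue HEAD then GET, and return both results."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch("HEAD", port), fetch("GET", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head_result, get_result = compare_head_and_get()
    print("HEAD:", head_result)
    print("GET: ", get_result)
```

Both requests should report the same status and Content-Length header, while only the GET result carries the page bytes.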
We would like to send a HEAD request to www.python.org. This will not download the content of the home page; it only checks whether the server returns one of the responses we accept, for example, OK, FOUND, MOVED PERMANENTLY, and so on.
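These response names correspond to the numeric status codes 200, 302, and 301. As a small illustration, assuming Python 3 and its standard http.HTTPStatus enumeration, the mapping can be inspected as follows; describe_status() and is_good() are hypothetical helper names, not part of the recipe.

```python
from http import HTTPStatus

# The same three codes the recipe treats as acceptable.
GOOD_CODES = (HTTPStatus.OK, HTTPStatus.FOUND, HTTPStatus.MOVED_PERMANENTLY)


def describe_status(code):
    """Return '<code> <reason phrase>' for a numeric status code."""
    try:
        return "%d %s" % (int(code), HTTPStatus(code).phrase)
    except ValueError:
        # Not a status code known to the standard library.
        return "%d Unknown" % int(code)


def is_good(code):
    """True if the code is one of the accepted responses."""
    return code in GOOD_CODES


if __name__ == "__main__":
    for code in (200, 301, 302, 404):
        print(describe_status(code), "accepted" if is_good(code) else "rejected")
```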
Listing 4.6 shows how to check a web page with a HEAD request:
#!/usr/bin/env python
# Python Network Programming Cookbook -- Chapter - 4
# This program is optimized for Python 2.7.
# It may run on any other version with/without modifications.

import argparse
import httplib
import urlparse

DEFAULT_URL = 'http://www.python.org'
HTTP_GOOD_CODES = [httplib.OK, httplib.FOUND, httplib.MOVED_PERMANENTLY]


def get_server_status_code(url):
    """
    Download just the header of a URL and
    return the server's status code.
    """
    # Elements 1 and 2 of the parsed URL are the host and the path
    host, path = urlparse.urlparse(url)[1:3]
    try:
        conn = httplib.HTTPConnection(host)
        conn.request('HEAD', path)
        return conn.getresponse().status
    except StandardError:
        # Covers socket-level failures such as an unresolvable host
        return None


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Example HEAD Request')
    parser.add_argument('--url', action="store", dest="url", default=DEFAULT_URL)
    given_args = parser.parse_args()
    url = given_args.url
    if get_server_status_code(url) in HTTP_GOOD_CODES:
        print "Server: %s status is OK!" % url
    else:
        print "Server: %s status is NOT OK!" % url
Running this script reports whether the HEAD request found the page:
$ python 4_6_checking_webpage_with_HEAD_request.py
Server: http://www.python.org status is OK!
$ python 4_6_checking_webpage_with_HEAD_request.py --url=http://www.zytho.org
Server: http://www.zytho.org status is NOT OK!
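We used the HTTPConnection class of httplib, whose request() method can send a HEAD request to a server. We can specify the path if necessary; here, the host and path are taken from the given URL, so by default the script checks the home page of www.python.org. If the URL is not correct, either the connection fails and the function returns None, or the server answers with a status code that is not in the list of accepted codes. In both cases, the script reports that the status is NOT OK.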
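For readers on Python 3, where httplib and urlparse were renamed http.client and urllib.parse, the same check might look like the following sketch. It is not the book's listing: the timeout parameter, the HTTPS handling, and the fallback to "/" for an empty path are assumptions added for illustration.

```python
import http.client
import urllib.parse

GOOD_CODES = (http.client.OK, http.client.FOUND, http.client.MOVED_PERMANENTLY)


def get_server_status_code(url, timeout=5):
    """Send a HEAD request to url and return the status code.

    Returns None if the URL has no host or the request fails.
    """
    parts = urllib.parse.urlsplit(url)
    if not parts.netloc:
        return None  # nothing to connect to

    # Assumption: pick the connection class from the URL scheme.
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection

    # An empty path means the home page.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    conn = None
    try:
        conn = conn_class(parts.netloc, timeout=timeout)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (http.client.HTTPException, OSError):
        # Malformed host, DNS failure, refused connection, timeout, and so on.
        return None
    finally:
        if conn is not None:
            conn.close()


def is_page_available(url):
    """True if the server answers the HEAD request with an accepted code."""
    return get_server_status_code(url) in GOOD_CODES
```

Calling is_page_available('http://www.python.org') would then mirror the OK / NOT OK decision made by the script. Note that this sketch, like the original, does not follow redirects; it only treats the redirect codes themselves as acceptable.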