Checking whether a web page exists with the HEAD request

Suppose you would like to check whether a web page exists without downloading its HTML content. To do this, we send a HEAD request instead of a GET request from an HTTP client. According to Wikipedia, a HEAD request asks for a response identical to the one that a GET request would produce, but without the response body. This is useful for retrieving the meta-information in the response headers without having to transfer the entire content.
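To see the difference between GET and HEAD without touching the network, the following minimal sketch starts a throwaway HTTP server on localhost and sends both requests to it. It is written for Python 3 (the listing in this recipe targets Python 2.7), and the names DemoHandler, fetch, and run_demo are hypothetical helpers introduced only for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>Hello</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one fixed page."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # Same headers as GET, but no body is written.
        self._send_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(method, port):
    """Send one request; return (status, Content-Length header, body bytes)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Length"), resp.read()
    finally:
        conn.close()


def run_demo():
    """Start the local server, issue a GET and a HEAD, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch("GET", port), fetch("HEAD", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    get_result, head_result = run_demo()
    print("GET :", get_result)
    print("HEAD:", head_result)
```

Both requests report the same status and the same Content-Length header, but only the GET request carries the page itself; the HEAD response body is empty.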

How to do it...

We would like to send a HEAD request to www.python.org. This does not download the content of the home page; instead, it checks whether the server returns one of the responses we accept as valid, for example, OK, FOUND, or MOVED PERMANENTLY.
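The response names above correspond to numeric HTTP status codes. As a small illustration, the following sketch prints them using Python 3's http.HTTPStatus enumeration; this is an assumption made for the example, since the Python 2.7 listing reads the same values as plain integer constants from httplib. The name GOOD_CODES is hypothetical.

```python
from http import HTTPStatus

# The three responses this recipe treats as "good".
GOOD_CODES = [HTTPStatus.OK, HTTPStatus.FOUND, HTTPStatus.MOVED_PERMANENTLY]

for code in GOOD_CODES:
    # Prints: 200 OK, 302 Found, 301 Moved Permanently
    print(code.value, code.phrase)
```

Because FOUND and MOVED PERMANENTLY are redirects, accepting them means that a page that has moved to another address still counts as existing.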

Listing 4.6 shows how to check a web page with the HEAD request, as follows:

#!/usr/bin/env python
# Python Network Programming Cookbook -- Chapter - 4
# This program is optimized for Python 2.7.
# It may run on any other version with/without modifications.
import argparse
import httplib
import urlparse

DEFAULT_URL = 'http://www.python.org'
HTTP_GOOD_CODES = [httplib.OK, httplib.FOUND, httplib.MOVED_PERMANENTLY]

def get_server_status_code(url):
  """
  Download just the header of a URL and
  return the server's status code.
  """
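  host, path = urlparse.urlparse(url)[1:3]
  try:
    conn = httplib.HTTPConnection(host)
    # An empty path is not a valid request target, so fall back to '/'.
    conn.request('HEAD', path or '/')
    return conn.getresponse().status
  except StandardError:
    # The host could not be resolved or reached.
    return None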

if __name__ == '__main__':
  parser = argparse.ArgumentParser(description='Example HEAD Request')
  parser.add_argument('--url', action="store", dest="url",
                      default=DEFAULT_URL)
  given_args = parser.parse_args()
  url = given_args.url
  if get_server_status_code(url) in HTTP_GOOD_CODES:
    print "Server: %s status is OK!" % url
  else:
    print "Server: %s status is NOT OK!" % url

Running this script reports whether the page was found by the HEAD request, as follows:

$ python 4_6_checking_webpage_with_HEAD_request.py 
Server: http://www.python.org status is OK!
$ python 4_6_checking_webpage_with_HEAD_request.py --url=http://www.zytho.org
Server: http://www.zytho.org status is NOT OK!

How it works...

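We used the HTTPConnection class of httplib to open a connection to the host, and then called its request() method with the HEAD verb and the path taken from the URL. The getresponse() call returns the server's response, whose status attribute holds the numeric status code. For the default URL, this checks the home page of www.python.org. If the URL is not correct or the server cannot be reached, the function returns None, or the server answers with an error code such as 404. In either case, the result is not in the list of accepted status codes, so the script reports that the status is NOT OK.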
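The listing targets Python 2.7. In Python 3, httplib was renamed to http.client and urlparse moved into urllib.parse, so the same idea can be sketched as follows. The timeout parameter, the HTTPS handling, and the check for a missing host are additions made for this sketch and are not part of the original listing.

```python
import http.client
from http import HTTPStatus
from urllib.parse import urlparse

HTTP_GOOD_CODES = [HTTPStatus.OK, HTTPStatus.FOUND,
                   HTTPStatus.MOVED_PERMANENTLY]


def get_server_status_code(url, timeout=5):
    """Send a HEAD request and return the status code, or None on failure."""
    parts = urlparse(url)
    if not parts.netloc:
        # No host in the URL (for example, the scheme was left out).
        return None
    # Assumption for this sketch: choose the connection class by scheme.
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    try:
        conn = conn_class(parts.netloc, timeout=timeout)
        try:
            # An empty path is not a valid request target; fall back to '/'.
            conn.request("HEAD", parts.path or "/")
            return conn.getresponse().status
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a malformed reply.
        return None


if __name__ == "__main__":
    url = "http://www.python.org"
    if get_server_status_code(url) in HTTP_GOOD_CODES:
        print("Server: %s status is OK!" % url)
    else:
        print("Server: %s status is NOT OK!" % url)
```

Returning None on any connection error keeps the calling code simple: None is never in HTTP_GOOD_CODES, so an unreachable host and an error status such as 404 are both reported as NOT OK.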
