Downloading data from an HTTP server

You would like to write a simple HTTP client to fetch some data from any web server using the native HTTP protocol. This can be the very first steps towards creating your own HTTP browser.

How to do it...

Let us access www.python.org with our Pythonic minimal browser that uses Python's httplib.

Listing 4.1 explains the following code for a simple HTTP client:

#!/usr/bin/env python
# Python Network Programming Cookbook -- Chapter - 4
# This program is optimized for Python 2.7.
# It may run on any other version with/without modifications.


import argparse
import httplib

REMOTE_SERVER_HOST = 'www.python.org'
REMOTE_SERVER_PATH = '/'

class HTTPClient:

  def __init__(self, host):
    self.host = host

  def fetch(self, path):
    http = httplib.HTTP(self.host)

    # Prepare header
    http.putrequest("GET", path)
    http.putheader("User-Agent", __file__)
    http.putheader("Host", self.host)
    http.putheader("Accept", "*/*")
    http.endheaders()

    try:
      errcode, errmsg, headers = http.getreply()

    except Exception, e:
      print "Client failed error code: %s message:%s headers:%s" 
%(errcode, errmsg, headers)
    else: 
      print "Got homepage from %s" %self.host 

    file = http.getfile()
    return file.read()

if __name__ == "__main__":
  parser = argparse.ArgumentParser(description='HTTP Client 
Example')
  parser.add_argument('--host', action="store", dest="host",  
default=REMOTE_SERVER_HOST)
  parser.add_argument('--path', action="store", dest="path",  
default=REMOTE_SERVER_PATH)
  given_args = parser.parse_args() 
  host, path = given_args.host, given_args.path
  client = HTTPClient(host)
  print client.fetch(path)

This recipe will by default fetch a page from www.python.org. You can run this recipe with or without the host and path arguments. If this script is run, it will show the following output:

$  python 4_1_download_data.py --host=www.python.org 
Got homepage from www.python.org
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.og/1999/xhtml" xml:lang="en" lang="en">

<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>Python Programming Language &ndash; Official Website</title>
....

If you run this recipe with an invalid path, it will show the following server response:

$ python 4_1_download_data.py --host='www.python.org' --path='/not-
exist'
Got homepage from www.python.org
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>Page Not Found</title>
  <meta name="keywords" content="Page Not Found" />
  <meta name="description" content="Page Not Found" />

How it works...

This recipe defines an HTTPClient class that fetches data from the remote host. It is built using Python's native httplib library. In the fetch() method, it uses the HTTP() function and other auxiliary functions to create a dummy HTTP client, such as putrequest() or putheader(). It first puts the GET/path string that is followed by setting up a user agent, which is the name of the current script (__file__).

The main request getreply()method is put inside a try-except block. The response is retrieved from the getfile() method and the stream's content is read.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.100.62