Networking often refers to connecting multiple computers together for the purpose of allowing some communication among them. But, for our purposes, we are less interested in allowing computers to communicate with one another and more interested in allowing processes to communicate with one another. Whether the processes are on the same computer or different computers is irrelevant for the techniques that we’re going to show.
This chapter will focus on writing Python programs that connect to other processes using the standard socket library (as well as libraries built on top of socket) and then interacting with those other processes.
While servers sit and wait for a client to connect to them, clients initiate connections. The Python Standard Library contains implementations of many commonly used network clients. This section will discuss some of the more common and frequently useful ones.
The socket module provides a Python interface to your operating system's socket implementation. This means that you can do whatever can be done to or with sockets, using Python. In case you have never done any network programming before, this chapter does provide a brief overview of networking. It should give you a flavor of what kinds of things you can do with the Python networking libraries.
The socket module provides the factory function socket(). The socket() function, in turn, returns a socket object. While there are a number of arguments to pass to socket() for specifying the kind of socket to create, calling the socket() factory function with no arguments returns a socket object with sensible defaults, namely a TCP/IP socket:
In [1]: import socket

In [2]: s = socket.socket()

In [3]: s.connect(('192.168.1.15', 80))

In [4]: s.send("GET / HTTP/1.0\n\n")
Out[4]: 16

In [5]: s.recv(200)
Out[5]: 'HTTP/1.1 200 OK\r\nDate: Mon, 03 Sep 2007 18:25:45 GMT\r\nServer: Apache/2.0.55 (Ubuntu) DAV/2 PHP/5.1.6\r\nContent-Length: 691\r\nConnection: close\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n<!DOCTYPE HTML P'

In [6]: s.close()
This example created a socket object called s from the socket() factory function. It then connected to a local default web server, indicated by port 80, which is the default port for HTTP. Then, it sent the server the text string "GET / HTTP/1.0\n\n" (which is simply an HTTP request). Following the send, it received the first 200 bytes of the server's response, which is a 200 OK status message and HTTP headers. Finally, we closed the connection.
The socket methods demonstrated in this example represent the methods that you are likely to find yourself using most often. Connect() establishes a communication channel between your socket object and the remote end (specifically meaning "not this socket object"). Send() transmits data from your socket object to the remote end. Recv() receives any data that the remote end has sent back. And close() terminates the communication channel between the two sockets. This is a really simple example that shows the ease with which you can create socket objects and then send and receive data over them.
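The same connect/send/recv/close cycle can be exercised without a remote web server. The sketch below, written in modern Python 3 syntax (where sockets exchange bytes rather than strings), stands up a throwaway listener in a thread and talks to it over loopback; the listener is an assumption made for illustration, not part of the original example.

```python
# A minimal connect/send/recv/close round trip over loopback.
# Python 3 sockets deal in bytes, hence the b"..." literals.
import socket
import threading

def answer_once(server_sock):
    # accept a single connection, read the request, send a canned reply
    conn, _ = server_sock.accept()
    conn.recv(1024)
    conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
    conn.close()

server = socket.socket()
server.bind(('127.0.0.1', 0))          # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=answer_once, args=(server,)).start()

s = socket.socket()
s.connect(('127.0.0.1', port))
s.sendall(b"GET / HTTP/1.0\r\n\r\n")
chunks = []
while True:                            # read until the server closes
    data = s.recv(200)
    if not data:
        break
    chunks.append(data)
reply = b"".join(chunks)
s.close()
server.close()
print(reply.split(b"\r\n")[0])
```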
Now we’ll look at a slightly more useful example. Suppose you have a server that is running some sort of network application, such as a web server. And suppose that you are interested in watching this server to be sure that, over the course of a day, you can make a socket connection to the web server. This sort of monitoring is minimal, but it proves that the server itself is still up and that the web server is still listening on some port. See Example 5-1.
#!/usr/bin/env python

import socket
import re
import sys

def check_server(address, port):
    #create a TCP socket
    s = socket.socket()
    print "Attempting to connect to %s on port %s" % (address, port)
    try:
        s.connect((address, port))
        print "Connected to %s on port %s" % (address, port)
        return True
    except socket.error, e:
        print "Connection to %s on port %s failed: %s" % (address, port, e)
        return False

if __name__ == '__main__':
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option("-a", "--address", dest="address", default='localhost',
                      help="ADDRESS for server", metavar="ADDRESS")
    parser.add_option("-p", "--port", dest="port", type="int", default=80,
                      help="PORT for server", metavar="PORT")
    (options, args) = parser.parse_args()
    print 'options: %s, args: %s' % (options, args)
    check = check_server(options.address, options.port)
    print 'check_server returned %s' % check
    sys.exit(not check)
All of the work occurs in the check_server() function. Check_server() creates a socket object. Then, it tries to connect to the specified address and port number. If it succeeds, it returns True. If it fails, the socket.connect() call will throw an exception, which is handled, and the function returns False. The main section of the code calls check_server(). This "main" section parses the arguments from the user and puts the user-requested arguments into an appropriate format to pass in to check_server(). This whole script prints out status messages as it goes along. The last thing it prints out is the return value of check_server(). The script returns the opposite of the check_server() return code to the shell. The reason that we return the opposite of this return code is to make this script a useful scriptable utility. Typically, utilities like this return 0 to the shell on success and something other than 0 on failure (typically something positive). Here is an example of the piece of code successfully connecting to the web server we connected to earlier:
jmjones@dinkgutsy:code$ python port_checker_tcp.py -a 192.168.1.15 -p 80
options: {'port': 80, 'address': '192.168.1.15'}, args: []
Attempting to connect to 192.168.1.15 on port 80
Connected to 192.168.1.15 on port 80
check_server returned True
The last output line, which contains check_server returned True, means that the connection was a success.
Here is an example of a connection call that failed:
jmjones@dinkgutsy:code$ python port_checker_tcp.py -a 192.168.1.15 -p 81
options: {'port': 81, 'address': '192.168.1.15'}, args: []
Attempting to connect to 192.168.1.15 on port 81
Connection to 192.168.1.15 on port 81 failed: (111, 'Connection refused')
check_server returned False
The last log line, which contains check_server returned False, means that the connection was a failure. In the penultimate output line, which contains Connection to 192.168.1.15 on port 81 failed, we also see the reason, 'Connection refused'. Just a wild guess here, but it may have something to do with there being nothing running on port 81 of this particular server.
We've created three examples to demonstrate how you can use this utility in shell scripts. First, we give a shell command to run the script and to print out SUCCESS if the script succeeds. We use the && operator in place of an if-then statement:
$ python port_checker_tcp.py -a 192.168.1.15 -p 80 && echo "SUCCESS"
options: {'port': 80, 'address': '192.168.1.15'}, args: []
Attempting to connect to 192.168.1.15 on port 80
Connected to 192.168.1.15 on port 80
check_server returned True
SUCCESS
This script succeeded, so after executing and printing status results, the shell prints SUCCESS:
$ python port_checker_tcp.py -a 192.168.1.15 -p 81 && echo "FAILURE"
options: {'port': 81, 'address': '192.168.1.15'}, args: []
Attempting to connect to 192.168.1.15 on port 81
Connection to 192.168.1.15 on port 81 failed: (111, 'Connection refused')
check_server returned False
This script failed, so it never printed FAILURE:
$ python port_checker_tcp.py -a 192.168.1.15 -p 81 || echo "FAILURE"
options: {'port': 81, 'address': '192.168.1.15'}, args: []
Attempting to connect to 192.168.1.15 on port 81
Connection to 192.168.1.15 on port 81 failed: (111, 'Connection refused')
check_server returned False
FAILURE
This script failed, but we changed the && to ||. This just means if the script returns a failure result, print FAILURE. So it did.
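The exit-status convention the script relies on can be sketched in a couple of lines of plain truthiness arithmetic (not code from the examples above):

```python
# sys.exit(not check) maps the boolean onto shell conventions:
# True (check passed) exits 0, False (check failed) exits 1,
# because not True is False (0) and not False is True (1).
success_status = int(not True)    # what the shell sees on success
failure_status = int(not False)   # what the shell sees on failure
print(success_status, failure_status)
```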
The fact that a web server allows a connection on port 80 doesn’t mean that there is an HTTP server available for the connection. A test that will help us better determine the status of a web server is whether the web server generates HTTP headers with the expected status code for some specific URL. Example 5-2 does just that.
#!/usr/bin/env python

import socket
import re
import sys

def check_webserver(address, port, resource):
    #build up HTTP request string
    if not resource.startswith('/'):
        resource = '/' + resource
    request_string = "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (resource, address)
    print 'HTTP request:'
    print '|||%s|||' % request_string

    #create a TCP socket
    s = socket.socket()
    print "Attempting to connect to %s on port %s" % (address, port)
    try:
        s.connect((address, port))
        print "Connected to %s on port %s" % (address, port)
        s.send(request_string)
        #we should only need the first 100 bytes or so
        rsp = s.recv(100)
        print 'Received 100 bytes of HTTP response'
        print '|||%s|||' % rsp
    except socket.error, e:
        print "Connection to %s on port %s failed: %s" % (address, port, e)
        return False
    finally:
        #be a good citizen and close your connection
        print "Closing the connection"
        s.close()
    lines = rsp.splitlines()
    print 'First line of HTTP response: %s' % lines[0]
    try:
        version, status, message = re.split(r'\s+', lines[0], 2)
        print 'Version: %s, Status: %s, Message: %s' % (version, status, message)
    except ValueError:
        print 'Failed to split status line'
        return False
    if status in ['200', '301']:
        print 'Success - status was %s' % status
        return True
    else:
        print 'Status was %s' % status
        return False

if __name__ == '__main__':
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option("-a", "--address", dest="address", default='localhost',
                      help="ADDRESS for webserver", metavar="ADDRESS")
    parser.add_option("-p", "--port", dest="port", type="int", default=80,
                      help="PORT for webserver", metavar="PORT")
    parser.add_option("-r", "--resource", dest="resource", default='index.html',
                      help="RESOURCE to check", metavar="RESOURCE")
    (options, args) = parser.parse_args()
    print 'options: %s, args: %s' % (options, args)
    check = check_webserver(options.address, options.port, options.resource)
    print 'check_webserver returned %s' % check
    sys.exit(not check)
Similar to the previous example where check_server() did all the work, check_webserver() does all the work in this example, too. First, check_webserver() builds up the HTTP request string. The HTTP protocol, in case you don't know, is a well-defined way that HTTP clients and servers communicate. The HTTP request that check_webserver() builds is nearly the simplest HTTP request possible. Next, check_webserver() creates a socket object, connects to the server, and sends the HTTP request to the server. Then, it reads back the response from the server and closes the connection. When there is a socket error, check_webserver() returns False, indicating that the check failed. It then takes what it read from the server and extracts the status code from it. If the status code is either 200, meaning "OK," or 301, meaning "Moved Permanently," check_webserver() returns True; otherwise, it returns False. The main portion of the script parses the input from the user and calls check_webserver(). After it gets the result back from check_webserver(), it returns the opposite of the return value from check_webserver() to the shell. The concept here is similar to what we did with the plain socket checker. We want to be able to call this from a shell script and see if it succeeded or failed. Here is the code in action:
$ python web_server_checker_tcp.py -a 192.168.1.15 -p 80 -r apache2-default
options: {'resource': 'apache2-default', 'port': 80, 'address': '192.168.1.15'}, args: []
HTTP request:
|||GET /apache2-default HTTP/1.1
Host: 192.168.1.15

|||
Attempting to connect to 192.168.1.15 on port 80
Connected to 192.168.1.15 on port 80
Received 100 bytes of HTTP response
|||HTTP/1.1 301 Moved Permanently
Date: Wed, 16 Apr 2008 23:31:24 GMT
Server: Apache/2.0.55 (Ubuntu)
|||
Closing the connection
First line of HTTP response: HTTP/1.1 301 Moved Permanently
Version: HTTP/1.1, Status: 301, Message: Moved Permanently
Success - status was 301
check_webserver returned True
The last four output lines show that the HTTP status code for /apache2-default on this web server was 301, so this run was successful.
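The status-line handling inside check_webserver() comes down to one re.split() call. Here it is in isolation on a canned status line, in Python 3 syntax; maxsplit=2 keeps a multi-word reason phrase such as "Moved Permanently" in one piece.

```python
import re

# split "version status message" into exactly three fields
line = 'HTTP/1.1 301 Moved Permanently'
version, status, message = re.split(r'\s+', line, maxsplit=2)
print(version, status, message)
```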
Here is another run. This time, we'll intentionally specify a resource that isn't there to show what happens when the HTTP check fails:
$ python web_server_checker_tcp.py -a 192.168.1.15 -p 80 -r foo
options: {'resource': 'foo', 'port': 80, 'address': '192.168.1.15'}, args: []
HTTP request:
|||GET /foo HTTP/1.1
Host: 192.168.1.15

|||
Attempting to connect to 192.168.1.15 on port 80
Connected to 192.168.1.15 on port 80
Received 100 bytes of HTTP response
|||HTTP/1.1 404 Not Found
Date: Wed, 16 Apr 2008 23:58:55 GMT
Server: Apache/2.0.55 (Ubuntu) DAV/2 PH|||
Closing the connection
First line of HTTP response: HTTP/1.1 404 Not Found
Version: HTTP/1.1, Status: 404, Message: Not Found
Status was 404
check_webserver returned False
Just as the last four lines of the previous example showed that the run was successful, the last four lines of this example show that it was unsuccessful. Because there is no /foo on this web server, this checker returned False.
This section showed how to construct low-level utilities to connect to network servers and perform basic checks on them. The purpose of these examples was to introduce you to what happens behind the scenes when clients and servers communicate with one another. If you have an opportunity to write a network component using a higher-level library than the socket module, you should take it. It is not desirable to spend your time writing network components using a low-level library such as socket.
The previous example showed how to make an HTTP request using the socket module directly. This example will show how to use the httplib module.
When should you consider using the httplib module rather than the socket module? Or, more generically, when should you consider using a higher-level library rather than a lower-level library? A good rule of thumb is any chance you get. Sometimes using a lower-level library makes sense. You might need to accomplish something that isn't already in an available library, for example, or you might need to have finer-grained control of something already in a library, or there might be a performance advantage. But in this case, there is no reason not to use a higher-level library such as httplib over a lower-level library such as socket.
Example 5-3 accomplishes the same functionality as the previous example, but it does so with the httplib module.
#!/usr/bin/env python

import httplib
import socket
import sys

def check_webserver(address, port, resource):
    #create connection
    if not resource.startswith('/'):
        resource = '/' + resource
    try:
        conn = httplib.HTTPConnection(address, port)
        print 'HTTP connection created successfully'
        #make request
        req = conn.request('GET', resource)
        print 'request for %s successful' % resource
        #get response
        response = conn.getresponse()
        print 'response status: %s' % response.status
    except socket.error, e:
        print 'HTTP connection failed: %s' % e
        return False
    finally:
        conn.close()
        print 'HTTP connection closed successfully'
    if response.status in [200, 301]:
        return True
    else:
        return False

if __name__ == '__main__':
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option("-a", "--address", dest="address", default='localhost',
                      help="ADDRESS for webserver", metavar="ADDRESS")
    parser.add_option("-p", "--port", dest="port", type="int", default=80,
                      help="PORT for webserver", metavar="PORT")
    parser.add_option("-r", "--resource", dest="resource", default='index.html',
                      help="RESOURCE to check", metavar="RESOURCE")
    (options, args) = parser.parse_args()
    print 'options: %s, args: %s' % (options, args)
    check = check_webserver(options.address, options.port, options.resource)
    print 'check_webserver returned %s' % check
    sys.exit(not check)
In its conception, this example follows the socket example pretty closely. Two of the biggest differences are that you don't have to manually create the HTTP request and that you don't have to manually parse the HTTP response. The httplib connection object has a request() method that builds and sends the HTTP request for you. The connection object also has a getresponse() method that creates a response object for you. We were able to access the HTTP status by referring to the status attribute on the response object. Even if it isn't that much less code to write, it is nice to not have to go through the trouble of keeping up with creating, sending, and receiving the HTTP request and response. This code just feels more tidy.
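In Python 3 the httplib module was renamed http.client, but the request()/getresponse() dance is the same. This sketch assumes nothing about your network: it starts the stdlib http.server in a background thread on an OS-assigned port purely to have something local to talk to.

```python
import http.client
import http.server
import threading

# a throwaway local web server so the client has something to hit
server = http.server.HTTPServer(('127.0.0.1', 0),
                                http.server.SimpleHTTPRequestHandler)
port = server.server_address[1]      # the port the OS assigned
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', port)
conn.request('GET', '/')             # builds and sends the request for us
response = conn.getresponse()        # parses the response for us
response.read()                      # drain the body
print(response.status)
conn.close()
server.shutdown()
```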
Here is a run that uses the same command-line parameters the previous successful scenario used. We're looking for / on our web server, and we find it:
$ python web_server_checker_httplib.py -a 192.168.1.15 -r /
options: {'resource': '/', 'port': 80, 'address': '192.168.1.15'}, args: []
HTTP connection created successfully
request for / successful
response status: 200
HTTP connection closed successfully
check_webserver returned True
And here is a run with the same command-line parameters as the failure scenario earlier. We're looking for /foo, and we don't find it:
$ python web_server_checker_httplib.py -a 192.168.1.15 -r /foo
options: {'resource': '/foo', 'port': 80, 'address': '192.168.1.15'}, args: []
HTTP connection created successfully
request for /foo successful
response status: 404
HTTP connection closed successfully
check_webserver returned False
As we said earlier, any time you have a chance to use a higher-level library, you should use it. Using httplib rather than using the socket module alone was a simpler, cleaner process. And the simpler you can make your code, the fewer bugs you'll have.
In addition to the socket and httplib modules, the Python Standard Library also contains an FTP client module named ftplib. ftplib is a full-featured FTP client library that will allow you to programmatically perform any tasks you would normally use an FTP client application to perform. For example, you can log in to an FTP server, list files in a particular directory, retrieve files, put files, change directories, and log out, all from within a Python script. You can even use one of the many GUI frameworks available in Python and build your own GUI FTP application.
Rather than give a full overview of this library, we’ll show you Example 5-4 and then explain how it works.
#!/usr/bin/env python

from ftplib import FTP
import ftplib
import sys

from optparse import OptionParser

parser = OptionParser()
parser.add_option("-a", "--remote_host_address", dest="remote_host_address",
                  help="REMOTE FTP HOST.",
                  metavar="REMOTE FTP HOST")
parser.add_option("-r", "--remote_file", dest="remote_file",
                  help="REMOTE FILE NAME to download.",
                  metavar="REMOTE FILE NAME")
parser.add_option("-l", "--local_file", dest="local_file",
                  help="LOCAL FILE NAME to save remote file to",
                  metavar="LOCAL FILE NAME")
parser.add_option("-u", "--username", dest="username",
                  help="USERNAME for ftp server", metavar="USERNAME")
parser.add_option("-p", "--password", dest="password",
                  help="PASSWORD for ftp server", metavar="PASSWORD")

(options, args) = parser.parse_args()

if not (options.remote_file and
        options.local_file and
        options.remote_host_address):
    parser.error('REMOTE HOST, LOCAL FILE NAME, '
                 'and REMOTE FILE NAME are mandatory')

if options.username and not options.password:
    parser.error('PASSWORD is mandatory if USERNAME is present')

ftp = FTP(options.remote_host_address)
if options.username:
    try:
        ftp.login(options.username, options.password)
    except ftplib.error_perm, e:
        print "Login failed: %s" % e
        sys.exit(1)
else:
    try:
        ftp.login()
    except ftplib.error_perm, e:
        print "Anonymous login failed: %s" % e
        sys.exit(1)

#open the local file before the try block so the finally clause
#can always close it
local_file = open(options.local_file, 'wb')
try:
    ftp.retrbinary('RETR %s' % options.remote_file, local_file.write)
finally:
    local_file.close()
    ftp.close()
The first part of the working code (past all the command-line parsing) creates an FTP object by passing the FTP server's address to FTP's constructor. Alternatively, we could have created an FTP object by passing nothing to the constructor and then calling the connect() method with the FTP server's address. The code then logs into the FTP server, using the username and password if they were provided, or anonymous authentication if they were not. Next, it creates a file object to store the data from the file on the FTP server. Then it calls the retrbinary() method on the FTP object. Retrbinary(), as the name implies, retrieves a binary file from an FTP server. It takes two parameters: the FTP retrieve command and a callback function. You might notice that our callback function is the write method on the file object we created in the previous step. It is important to note that we are not calling the write() method in this case. We are passing the write method in to the retrbinary() method so that retrbinary() can call write(). Retrbinary() will call whatever callback function we pass it with each chunk of data that it receives from the FTP server. This callback function could do anything with the data. The callback function could just log that it received N number of bytes from the FTP server. Passing in a file object's write method causes the script to write the contents of the file from the FTP server to the file object. Finally, it closes the file object and the FTP connection. We did a little error handling in the process: we set up a try block around retrieving the binary file from the FTP server and a finally block around the call to close the local file and FTP connection. If anything bad happens, we want to clean up our files before the script terminates. For a brief discussion of callbacks, see the Appendix.
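The callback mechanism itself is easy to demonstrate without an FTP server. The fake_retrbinary() function below is a made-up stand-in for FTP.retrbinary(): like the real method, it calls whatever callback it is given once per chunk of data. Passing a file-like object's write method as that callback streams the chunks straight into the "file" (here an in-memory buffer).

```python
import io

def fake_retrbinary(chunks, callback):
    # stand-in for ftplib's retrbinary: hand each chunk to the callback
    for chunk in chunks:
        callback(chunk)

local_file = io.BytesIO()   # in-memory stand-in for open(name, 'wb')
fake_retrbinary([b'first chunk, ', b'second chunk'], local_file.write)
print(local_file.getvalue())
```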
Moving up the standard library modules to a higher-level library, we arrive at urllib. When you think of urllib, it's easy to think of HTTP libraries only and forget that FTP resources also can be identified by URLs. Consequently, you might not have considered using urllib to retrieve FTP resources, but the functionality is there. Example 5-5 is the same as the ftplib example earlier, except it uses urllib.
#!/usr/bin/env python
"""
url retriever

Usage:

url_retrieve_urllib.py URL FILENAME

URL:
If the URL is an FTP URL the format should be:
ftp://[username[:password]@]hostname/filename
If you want to use absolute paths to the file to download,
you should make the URL look something like this:
ftp://user:password@host/%2Fpath/to/myfile.txt
Notice the '%2F' at the beginning of the path to the file.

FILENAME:
absolute or relative path to the filename to save downloaded file as
"""

import urllib
import sys

if '-h' in sys.argv or '--help' in sys.argv:
    print __doc__
    sys.exit(1)

if not len(sys.argv) == 3:
    print 'URL and FILENAME are mandatory'
    print __doc__
    sys.exit(1)

url = sys.argv[1]
filename = sys.argv[2]

urllib.urlretrieve(url, filename)
This script is short and sweet. It really shows off the power of urllib. There are actually more lines of usage documentation than code in it. There is even more argument parsing than code, which says a lot because there isn't much of that, either. We decided to go with a very simple argument parsing routine with this script. Since both of the "options" were mandatory, we decided to use positional arguments rather than option switches. Effectively, the only line of code in this example that performs work is this one:
urllib.urlretrieve(url, filename)
After retrieving the options with sys.argv, this line of code pulls down the specified URL and saves it to the specified local filename. It works with HTTP URLs and FTP URLs, and will even work when the username and password are included in the URL.
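In Python 3, urlretrieve() lives at urllib.request.urlretrieve() and behaves the same way. To keep this sketch self-contained it retrieves a file:// URL pointing at a temporary file instead of a live HTTP or FTP server; the filenames are invented for the example.

```python
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as d:
    # fabricate a local "remote" resource to retrieve
    src = pathlib.Path(d) / 'source.txt'
    src.write_text('hello from a URL')
    dest = pathlib.Path(d) / 'copy.txt'
    # pull down the URL and save it to the local filename
    urllib.request.urlretrieve(src.as_uri(), str(dest))
    retrieved = dest.read_text()
print(retrieved)
```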
A point worth emphasizing here is that if you think that something should be easier than the way you are doing it with another language, it probably is. There is probably some higher-level library out there somewhere that will do what you need, and frequently that library will be in the Python Standard Library. In this case, urllib did exactly what we wanted to do, and we didn't have to go anywhere beyond the standard library docs to find out about it. Sometimes, you might have to go outside the Python Standard Library, but you will find other Python resources such as the Python Package Index (PyPI) at http://pypi.python.org/pypi.
Another high-level library is urllib2. Urllib2 contains pretty much the same functionality as urllib, but expands on it. For example, urllib2 contains better authentication support and better cookie support. So if you start using urllib and think it isn't doing everything for you that it should, take a look at urllib2 to see if it meets your needs.
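For what it's worth, in Python 3 the urllib/urllib2 pair was merged into urllib.request, and the extras urllib2 was known for (custom headers, handlers, and so on) hang off the Request class. The fragment below only builds a request object, so it never touches the network; the URL and header value are placeholders.

```python
import urllib.request

# a Request carries headers and other extras that plain urlretrieve lacks
req = urllib.request.Request('http://www.example.com/',
                             headers={'User-Agent': 'psa-checker/0.1'})
print(req.host)
print(req.get_header('User-agent'))   # note: header keys are stored capitalized
```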
Typically, the reason for writing networking code is that you need interprocess communication (IPC). Often, plain IPC, such as HTTP or a plain socket, is good enough. However, there are times when it would be even more useful to execute code in a different process, or even on a different computer, as though it were in the same process as the code you are working on. If you could, in fact, execute code remotely in some other process from your Python program, you might expect that the return values from the remote calls would be Python objects, which you could deal with more easily than chunks of text that you have to parse manually. The good news is that there are several tools for remote procedure call (RPC) functionality.
XML-RPC exchanges a specifically formatted XML document between two processes to perform a remote procedure call. But you don't need to worry about the XML part; you'll probably never have to know the format of the document that is being exchanged between the two processes. The only thing you really need to know to get started using XML-RPC is that there is an implementation of both the client and the server portions in the Python Standard Library. Two things that might be useful to know are that XML-RPC is available for most programming languages and that it is very simple to use.
Example 5-6 is a simple XML-RPC server.
#!/usr/bin/env python

import SimpleXMLRPCServer
import os

def ls(directory):
    try:
        return os.listdir(directory)
    except OSError:
        return []

def ls_boom(directory):
    return os.listdir(directory)

def cb(obj):
    print "OBJECT::", obj
    print "OBJECT.__class__::", obj.__class__
    return obj.cb()

if __name__ == '__main__':
    s = SimpleXMLRPCServer.SimpleXMLRPCServer(('127.0.0.1', 8765))
    s.register_function(ls)
    s.register_function(ls_boom)
    s.register_function(cb)
    s.serve_forever()
This code creates a new SimpleXMLRPCServer object and binds it to port 8765 on 127.0.0.1, the loopback interface, which makes this accessible to processes only on this particular machine. It then registers the functions ls(), ls_boom(), and cb(), which we defined in the code. We'll explain the cb() function in a few moments. The ls() function will list the contents of the directory passed in using os.listdir() and return those results as a list. ls() masks any OSError exceptions that we may get. ls_boom() lets any exception that we hit find its way back to the XML-RPC client. Then, the code enters into the serve_forever() loop, which waits for a connection it can handle. Here is an example of this code used in an IPython shell:
In [1]: import xmlrpclib

In [2]: x = xmlrpclib.ServerProxy('http://localhost:8765')

In [3]: x.ls('.')
Out[3]:
['.svn',
 'web_server_checker_httplib.py',
....
 'subprocess_arp.py',
 'web_server_checker_tcp.py']

In [4]: x.ls_boom('.')
Out[4]:
['.svn',
 'web_server_checker_httplib.py',
....
 'subprocess_arp.py',
 'web_server_checker_tcp.py']

In [5]: x.ls('/foo')
Out[5]: []

In [6]: x.ls_boom('/foo')
---------------------------------------------------------------------------
<class 'xmlrpclib.Fault'>                 Traceback (most recent call last)
...
.
.
<<big nasty traceback>>
.
.
...
    786         if self._type == "fault":
--> 787             raise Fault(**self._stack[0])
    788         return tuple(self._stack)
    789

<class 'xmlrpclib.Fault'>: <Fault 1: "<type 'exceptions.OSError'> :[Errno 2] No such file or directory: '/foo'">
First, we created a ServerProxy() object by passing in the address of the XML-RPC server. Then, we called .ls('.') to see which files were in the server's current working directory. The server was running in a directory that contains example code from this book, so those are the files you see from the directory listing. The really interesting thing is that on the client side, x.ls('.') returned a Python list. Had this server been implemented in Java, Perl, Ruby, or C#, you could expect the same thing. The language that implements the server would have done a directory listing; created a list, array, or collection of filenames; and the XML-RPC server code would have then created an XML representation of that list or array and sent it back over the wire to your client. We also tried out ls_boom(). Since ls_boom() lacks the exception handling of ls(), we can see that the exception passes from the server back to the client. We even see a traceback on the client.
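In Python 3 the same round trip works with SimpleXMLRPCServer living in xmlrpc.server and xmlrpclib renamed xmlrpc.client. This self-contained sketch serves a trivial function (double(), invented for the demo) from a background thread and calls it through a proxy; note that the return value comes back as a real Python int.

```python
import threading
import xmlrpc.client
import xmlrpc.server

def double(n):
    return n * 2

# bind to port 0 so the OS assigns a free port
server = xmlrpc.server.SimpleXMLRPCServer(('127.0.0.1', 0),
                                          logRequests=False)
port = server.server_address[1]
server.register_function(double)
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy('http://127.0.0.1:%d' % port)
result = proxy.double(21)
print(result)
server.shutdown()
```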
The interoperability possibilities that XML-RPC opens up to you are certainly interesting. But perhaps more interesting is the fact that you can write a piece of code to run on any number of machines and be able to execute that code remotely whenever you wish.
XML-RPC is not without its limitations, though. Whether you think these limitations are problematic or not is a matter of engineering taste. For example, if you pass in a custom Python object, the XML-RPC library will convert that object to a Python dictionary, serialize it to XML, and pass it across the wire. You can certainly work around this, but it would require writing code to extract your data from the XML version of the dictionary so that you could pass it back into the original object that was dictified. Rather than go through that trouble, why not use your objects directly on your RPC server? You can’t with XML-RPC, but there are other options.
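You can watch the dictifying happen without running a server at all by marshalling an object through the xmlrpc.client serializer (the Python 3 spelling of xmlrpclib). The class below mirrors the PSACB class used later in Example 5-8: after the round trip, only the attribute dictionary survives, and the cb() method is gone.

```python
import xmlrpc.client

class PSACB:
    def __init__(self):
        self.some_attribute = 1
    def cb(self):
        return "PSA callback"

# serialize to XML-RPC and immediately deserialize it again
payload = xmlrpc.client.dumps((PSACB(),))
(roundtripped,), _method = xmlrpc.client.loads(payload)
print(roundtripped)                 # a plain dict of the attributes
print(hasattr(roundtripped, 'cb'))
```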
Pyro is one framework that alleviates the shortcomings of XML-RPC. Pyro stands for Python Remote Objects (capitalization intentional). It lets you do everything you could do with XML-RPC, but rather than dictifying your objects, it maintains their types when you pass them across. If you do want to use Pyro, you will have to install it separately; it doesn't come with Python. Also be aware that Pyro only works with Python, whereas XML-RPC can work between Python and other languages. Example 5-7 is an implementation of the same ls() functionality from the XML-RPC example.
#!/usr/bin/env python

import Pyro.core
import os

from xmlrpc_pyro_diff import PSACB

class PSAExample(Pyro.core.ObjBase):

    def ls(self, directory):
        try:
            return os.listdir(directory)
        except OSError:
            return []

    def ls_boom(self, directory):
        return os.listdir(directory)

    def cb(self, obj):
        print "OBJECT:", obj
        print "OBJECT.__class__:", obj.__class__
        return obj.cb()

if __name__ == '__main__':
    Pyro.core.initServer()
    daemon = Pyro.core.Daemon()
    uri = daemon.connect(PSAExample(), "psaexample")

    print "The daemon runs on port:", daemon.port
    print "The object's uri is:", uri

    daemon.requestLoop()
The Pyro example is similar to the XML-RPC example. First, we created a PSAExample class with ls(), ls_boom(), and cb() methods on it. We then created a daemon from Pyro's internal plumbing. Then, we associated the PSAExample with the daemon. Finally, we told the daemon to start serving requests.
Here we access the Pyro server from an IPython prompt:
In [1]: import Pyro.core
/usr/lib/python2.5/site-packages/Pyro/core.py:11: DeprecationWarning: The sre module is deprecated, please import re.
  import sys, time, sre, os, weakref

In [2]: psa = Pyro.core.getProxyForURI("PYROLOC://localhost:7766/psaexample")
Pyro Client Initialized. Using Pyro V3.5

In [3]: psa.ls(".")
Out[3]:
['pyro_server.py',
....
 'subprocess_arp.py',
 'web_server_checker_tcp.py']

In [4]: psa.ls_boom('.')
Out[4]:
['pyro_server.py',
....
 'subprocess_arp.py',
 'web_server_checker_tcp.py']

In [5]: psa.ls("/foo")
Out[5]: []

In [6]: psa.ls_boom("/foo")
---------------------------------------------------------------------------
<type 'exceptions.OSError'>               Traceback (most recent call last)

/home/jmjones/local/Projects/psabook/oreilly/<ipython console> in <module>()
.
.
...
<<big nasty traceback>>
...
.
.
--> 115             raise self.excObj
    116     def __str__(self):
    117         s=self.excObj.__class__.__name__

<type 'exceptions.OSError'>: [Errno 2] No such file or directory: '/foo'
Nifty. It returned the same output as the XML-RPC example. We
expected as much. But what happens when we pass in a custom object?
We’re going to define a new class, create an object from it, and then
pass it to the XML-RPC cb()
function and the Pyro cb()
method
from the examples above. Example 5-8 shows
the piece of code that we are going to execute.
import Pyro.core
import xmlrpclib

class PSACB:
    def __init__(self):
        self.some_attribute = 1

    def cb(self):
        return "PSA callback"

if __name__ == '__main__':
    cb = PSACB()

    print "PYRO SECTION"
    print "*" * 20
    psapyro = Pyro.core.getProxyForURI("PYROLOC://localhost:7766/psaexample")
    print "-->>", psapyro.cb(cb)
    print "*" * 20

    print "XML-RPC SECTION"
    print "*" * 20
    psaxmlrpc = xmlrpclib.ServerProxy('http://localhost:8765')
    print "-->>", psaxmlrpc.cb(cb)
    print "*" * 20
The calls to the Pyro and XML-RPC implementations of cb() should both invoke cb() on the object passed in to them. And in both instances, they should return the string PSA callback.
And here is what happens when we run it:
jmjones@dinkgutsy:code$ python xmlrpc_pyro_diff.py
/usr/lib/python2.5/site-packages/Pyro/core.py:11: DeprecationWarning: The sre module is deprecated, please import re.
  import sys, time, sre, os, weakref
PYRO SECTION
********************
Pyro Client Initialized. Using Pyro V3.5
-->> PSA callback
********************
XML-RPC SECTION
********************
-->> Traceback (most recent call last):
  File "xmlrpc_pyro_diff.py", line 23, in <module>
    print "-->>", psaxmlrpc.cb(cb)
  File "/usr/lib/python2.5/xmlrpclib.py", line 1147, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.5/xmlrpclib.py", line 1437, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.5/xmlrpclib.py", line 1201, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib/python2.5/xmlrpclib.py", line 1340, in _parse_response
    return u.close()
  File "/usr/lib/python2.5/xmlrpclib.py", line 787, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault 1: "<type 'exceptions.AttributeError'>:'dict' object has no attribute 'cb'">
The Pyro implementation worked, but the XML-RPC implementation failed and left us a traceback. The last line of the traceback says that a dict object has no cb attribute. This will make more sense when we show you the output from the XML-RPC server. Remember that the cb() function had some print statements in it to show some information about what was going on. Here is the XML-RPC server output:
OBJECT:: {'some_attribute': 1}
OBJECT.__class__:: <type 'dict'>
localhost - - [17/Apr/2008 16:39:02] "POST /RPC2 HTTP/1.0" 200 -
In dictifying the object that we created in the XML-RPC client,
some_attribute
was converted to a
dictionary key. While this one attribute was preserved, the cb()
method was not.
Here is the Pyro server output:
OBJECT: <xmlrpc_pyro_diff.PSACB instance at 0x9595a8>
OBJECT.__class__: xmlrpc_pyro_diff.PSACB
Notice that the class of the object is PSACB, just as it was when the client created it. On the Pyro server side, we had to import the module that defines PSACB, the same code the client uses. It makes sense that the Pyro server needs to import the client's code: Pyro serializes objects with the Python standard pickle module, and pickle likewise requires that the receiving side be able to import the class definition before it can reconstruct an instance.
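The difference between the two transports can be demonstrated with no RPC machinery at all. The following sketch (our own illustration, written in Python 3 unlike the book's examples) mimics each behavior: flattening an instance into a dict of its attributes loses its methods, while a pickle round trip reconstructs a real instance, provided the class definition is importable on the receiving side:

```python
import pickle

class PSACB:
    """Stand-in for the callback class from the example."""
    def __init__(self):
        self.some_attribute = 1

    def cb(self):
        return "PSA callback"

obj = PSACB()

# Roughly what XML-RPC does: marshal the instance as a dict of its
# attributes. The cb() method is not an attribute, so it is lost.
dictified = vars(obj)
print(dictified)                   # {'some_attribute': 1}
print(hasattr(dictified, 'cb'))    # False: the method is gone

# What pickle (and therefore Pyro) does: record the class so the
# receiving side can rebuild a real PSACB instance, as long as it
# can import the same class definition.
restored = pickle.loads(pickle.dumps(obj))
print(restored.cb())               # PSA callback
```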
In summary, if you want a simple RPC solution, don’t want external dependencies, can live with the limitations of XML-RPC, and think that interoperability with other languages could come in handy, then XML-RPC is probably a good choice. On the other hand, if the limitations of XML-RPC are too constraining, you don’t mind installing external libraries, and you don’t mind being limited to using only Python, then Pyro is probably a better option for you.
SSH is an incredibly powerful, widely used protocol. You can also think of it as a tool, since its most common implementation includes a command-line utility of the same name. SSH allows you to securely connect to a remote server, execute shell commands, transfer files, and forward ports in both directions across the connection.
If you have the command-line ssh utility, why would you ever want to script the SSH protocol directly? The main reason is that doing so gives you the full power of SSH combined with the full power of Python.
The SSH2 protocol is implemented for Python by a library called paramiko. From within a Python script, writing nothing but Python code, you can connect to an SSH server and accomplish those pressing SSH tasks. Example 5-9 shows how to connect to an SSH server and execute a simple command.
#!/usr/bin/env python
import paramiko

hostname = '192.168.1.15'
port = 22
username = 'jmjones'
password = 'xxxYYYxxx'

if __name__ == "__main__":
    paramiko.util.log_to_file('paramiko.log')
    s = paramiko.SSHClient()
    s.load_system_host_keys()
    s.connect(hostname, port, username, password)
    stdin, stdout, stderr = s.exec_command('ifconfig')
    print stdout.read()
    s.close()
As you can see, we import the paramiko
module and define three variables.
Next, we create an SSHClient
object.
Then we tell it to load the host keys, which, on Linux, come from the
“known_hosts” file. After that we connect to the SSH server. None of
these steps is particularly complicated, especially if you’re already
familiar with SSH.
Now we’re ready to execute a command remotely. The call to
exec_command()
executes the
command that you pass in and returns three file handles
associated with the execution of the command: standard input, standard
output, and standard error. And to show that this is being executed on a
machine with the same IP address as the address we connected to with the
SSH call, we print out the results of ifconfig
on the remote server:
jmjones@dinkbuntu:~/code$ python paramiko_exec.py
eth0      Link encap:Ethernet  HWaddr XX:XX:XX:XX:XX:XX
          inet addr:192.168.1.15  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: xx00::000:x0xx:xx0x:0x00/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9667336 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11643909 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1427939179 (1.3 GiB)  TX bytes:2940899219 (2.7 GiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:123571 errors:0 dropped:0 overruns:0 frame:0
          TX packets:123571 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:94585734 (90.2 MiB)  TX bytes:94585734 (90.2 MiB)
It looks exactly as if we had run ifconfig
on our local machine, except the IP
address is different.
Example 5-10 shows you how to
use paramiko
to SFTP files between a
remote machine and your local machine. This particular example
only retrieves files from the remote machine using the get()
method. If you want to send files to the
remote machine, use the put()
method.
#!/usr/bin/env python
import paramiko
import os

hostname = '192.168.1.15'
port = 22
username = 'jmjones'
password = 'xxxYYYxxx'
dir_path = '/home/jmjones/logs'

if __name__ == "__main__":
    t = paramiko.Transport((hostname, port))
    t.connect(username=username, password=password)
    sftp = paramiko.SFTPClient.from_transport(t)
    files = sftp.listdir(dir_path)
    for f in files:
        print 'Retrieving', f
        sftp.get(os.path.join(dir_path, f), f)
    t.close()
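One portability caveat worth noting (our observation, not part of the original example): os.path.join builds paths using the local platform's separator, so on a Windows client it would hand the SFTP server a backslash-separated path. Remote SFTP paths are always POSIX-style, so the standard library's posixpath module is a safer way to construct them. A minimal sketch:

```python
import posixpath

# Remote directory from the example above.
dir_path = '/home/jmjones/logs'

# posixpath.join always joins with forward slashes, regardless of
# the operating system the client script happens to run on.
remote_file = posixpath.join(dir_path, 'test.log')
print(remote_file)  # /home/jmjones/logs/test.log
```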
In case you want to use public/private keys rather than passwords, Example 5-11 is a modification of the remote execution example using an RSA key.
#!/usr/bin/env python
import paramiko

hostname = '192.168.1.15'
port = 22
username = 'jmjones'
pkey_file = '/home/jmjones/.ssh/id_rsa'

if __name__ == "__main__":
    key = paramiko.RSAKey.from_private_key_file(pkey_file)
    s = paramiko.SSHClient()
    s.load_system_host_keys()
    s.connect(hostname, port, pkey=key)
    stdin, stdout, stderr = s.exec_command('ifconfig')
    print stdout.read()
    s.close()
And Example 5-12 is a modification of the sftp script using an RSA key.
#!/usr/bin/env python
import paramiko
import os

hostname = '192.168.1.15'
port = 22
username = 'jmjones'
dir_path = '/home/jmjones/logs'
pkey_file = '/home/jmjones/.ssh/id_rsa'

if __name__ == "__main__":
    key = paramiko.RSAKey.from_private_key_file(pkey_file)
    t = paramiko.Transport((hostname, port))
    t.connect(username=username, pkey=key)
    sftp = paramiko.SFTPClient.from_transport(t)
    files = sftp.listdir(dir_path)
    for f in files:
        print 'Retrieving', f
        sftp.get(os.path.join(dir_path, f), f)
    t.close()
Twisted is an event-driven networking framework for Python that can tackle pretty much any type of network-related task you need it to. But such a comprehensive, single solution comes at the price of complexity. Twisted will begin to make sense after you've used it a few times, but it can be difficult to understand initially. Moreover, Twisted is such a large framework that finding a starting point for a specific problem can be daunting.
Despite that, though, we highly recommend that you become familiar with it and see if it fits the way you think. If you can easily tailor your thinking to "the Twisted way," then learning Twisted is likely to be a valuable investment. Twisted Network Programming Essentials by Abe Fettig (O'Reilly) is a good place to get started; it helps to mitigate the difficulties we just mentioned.
Twisted is event-driven, meaning that rather than writing code that initiates and drops connections and deals with the low-level details of data reception, you write code that handles those happenings when they occur.
What advantage would you gain by using Twisted? The framework encourages, and at times nearly requires, that you break your problems into small pieces. The network connection is decoupled from the logic of what occurs when connections are made, which gains you some level of automatic reusability from your code. Twisted also relieves you of much of the lower-level connection and error handling for network connections; your part in writing network code is mostly deciding what happens when certain events transpire.
Example 5-13 is a port checker that we've implemented in Twisted. It is very basic, but it will demonstrate the event-driven nature of Twisted as we go through the code. Before we do that, though, we'll go over a few basic concepts that you'll need to know: reactors, factories, protocols, and deferreds. Reactors are the heart of a Twisted application's main event loop; they handle event dispatching, network communications, and threading. Factories are responsible for spawning new protocol instances; each factory instance can spawn one type of protocol. Protocols define what to do with a specific connection; at runtime, a protocol instance is created for each connection. And deferreds are a way of chaining actions together.
#!/usr/bin/env python
from twisted.internet import reactor, protocol
import sys

class PortCheckerProtocol(protocol.Protocol):
    def __init__(self):
        print "Created a new protocol"

    def connectionMade(self):
        print "Connection made"
        reactor.stop()

class PortCheckerClientFactory(protocol.ClientFactory):
    protocol = PortCheckerProtocol

    def clientConnectionFailed(self, connector, reason):
        print "Connection failed because", reason
        reactor.stop()

if __name__ == '__main__':
    host, port = sys.argv[1].split(':')
    factory = PortCheckerClientFactory()
    print "Testing %s" % sys.argv[1]
    reactor.connectTCP(host, int(port), factory)
    reactor.run()
Notice that we defined two classes (PortCheckerProtocol
and PortCheckerClientFactory
), both of which
inherit from Twisted classes. We tied our factory, PortCheckerClientFactory
, to PortCheckerProtocol
by assigning PortCheckerProtocol
to PortCheckerClientFactory
’s class attribute
protocol
. If a factory attempts to
make a connection but fails, the factory’s clientConnectionFailed()
method will be
called. ClientConnectionFailed()
is a
method that is common to all Twisted factories and is the only method we
defined for our factory. By defining a method that “comes with” the
factory
class,
we are overriding the default behavior of the class. When a client
connection fails, we want to print out a message to that effect and stop
the reactor.
PortCheckerProtocol
is one of
the protocols we discussed earlier. An instance of this class will be
created once we have established a connection to the server whose port
we are checking. We have only defined one method on PortCheckerProtocol
: connectionMade()
. This is a method that
is common to all Twisted protocol classes. By defining this method
ourselves, we are overriding the default behavior. When a connection is
successfully made, Twisted will call this protocol’s connectionMade()
method. As you can see, it
prints out a simple message and stops the reactor. (We’ll get to the
reactor shortly.)
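Stripped of all the networking, the relationship between a factory and its protocol can be sketched in a few lines of plain Python. This is our own illustration (written in Python 3, unlike the book's examples) rather than real Twisted code, and the names EchoFactory and EchoProtocol are hypothetical:

```python
class EchoProtocol:
    """Stands in for a Twisted protocol: one instance per connection."""
    def connection_made(self):
        return "connection handled by %s" % self.__class__.__name__

class EchoFactory:
    """Stands in for a Twisted factory: spawns protocol instances."""
    protocol = EchoProtocol  # mirrors Twisted's protocol class attribute

    def build_protocol(self):
        # Twisted calls a similar hook (buildProtocol) for each new
        # connection; every connection gets a fresh protocol object.
        return self.protocol()

factory = EchoFactory()
p1 = factory.build_protocol()
p2 = factory.build_protocol()
print(p1.connection_made())
print(p1 is p2)  # False: two connections, two protocol instances
```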
In this example, both connectionMade()
and clientConnectionFailed()
demonstrate the
“event-driven” nature of Twisted. A connection being made is an event.
So also is when a client connection fails to be made. When these events
occur, Twisted calls the appropriate methods to handle the events, which
are referred to as event handlers.
In the main section of this example, we create an instance of
PortCheckerClientFactory
. We then
tell the Twisted reactor to connect to the hostname and port number,
which were passed in as command-line arguments, using the specified
factory. After telling the reactor to connect to a certain port on a
certain host, we tell the reactor to run. If we had not told the reactor
to run, nothing would have happened.
To summarize the flow chronologically, we start the reactor after
giving it a directive. In this case, the directive was to connect to a
server and port and use PortCheckerClientFactory
to help dispatch
events. If the connection to the given host and port fails, the event
loop will call clientConnectionFailed()
on PortCheckerClientFactory
. If the connection
succeeds, the factory creates an instance of the protocol, PortCheckerProtocol
, and
calls connectionMade()
on that
instance. Whether the connection succeeds or fails, the respective event
handlers will shut the reactor down and the program will stop
running.
That was a very basic example, but it showed the basics of Twisted’s event handling nature. A key concept of Twisted programming that we did not cover in this example is the idea of deferreds and callbacks. A deferred represents a promise to execute the requested action. A callback is a way of specifying an action to accomplish. Deferreds can be chained together and pass their results on from one to the next. This point is often difficult to really understand in Twisted. (Example 5-14 will elaborate on deferreds.)
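The chaining behavior is easy to mimic without Twisted at all. The toy class below is our own drastically simplified sketch (in Python 3) of the idea behind twisted.internet.defer.Deferred, not the real thing; it just feeds each callback's return value into the next one in the chain:

```python
class ToyDeferred:
    """A drastically simplified stand-in for Twisted's Deferred."""
    def __init__(self):
        self.callbacks = []

    def add_callback(self, fn):
        self.callbacks.append(fn)

    def fire(self, result):
        # Each callback receives the previous callback's return value,
        # which is how deferreds pass results down a processing chain.
        for fn in self.callbacks:
            result = fn(result)
        return result

d = ToyDeferred()
d.add_callback(lambda r: r + 1)
d.add_callback(lambda r: r * 10)
print(d.fire(4))  # 50
```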
Example 5-14 is an example of using Perspective Broker, an RPC mechanism that is unique to Twisted. This example is another implementation of the remote “ls” server that we implemented in XML-RPC and Pyro, earlier in this chapter. First, we will walk you through the server.
import os

from twisted.spread import pb
from twisted.internet import reactor

class PBDirLister(pb.Root):
    def remote_ls(self, directory):
        try:
            return os.listdir(directory)
        except OSError:
            return []

    def remote_ls_boom(self, directory):
        return os.listdir(directory)

if __name__ == '__main__':
    reactor.listenTCP(9876, pb.PBServerFactory(PBDirLister()))
    reactor.run()
This example defines one class, PBDirLister
. This is the Perspective Broker
(PB) class that will act as
a remote object when the client connects to it. This example defines
only two methods on this class: remote_ls()
and remote_ls_boom()
. Remote_ls()
is, not surprisingly, one of the
remote methods that the client will call. This remote_ls()
method will simply return a
listing of the specified directory. And remote_ls_boom()
will do the same thing that
remote_ls()
will do, except that it
won’t perform exception handling. In the main section of the example, we
tell the Perspective Broker
to bind
to port 9876 and then run the reactor.
Example 5-15, the client, is not quite as straightforward; it calls remote_ls().
#!/usr/bin/python
from twisted.spread import pb
from twisted.internet import reactor

def handle_err(reason):
    print "an error occurred", reason
    reactor.stop()

def call_ls(def_call_obj):
    return def_call_obj.callRemote('ls', '/home/jmjones/logs')

def print_ls(print_result):
    print print_result
    reactor.stop()

if __name__ == '__main__':
    factory = pb.PBClientFactory()
    reactor.connectTCP("localhost", 9876, factory)
    d = factory.getRootObject()
    d.addCallback(call_ls)
    d.addCallback(print_ls)
    d.addErrback(handle_err)
    reactor.run()
This client example defines three functions, handle_err()
, call_ls()
, and print_ls()
. Handle_err()
will handle any errors that occur
along the way. Call_ls()
will
initiate the calling of the remote “ls” method. Print_ls()
will print the results of the “ls”
call. It may seem a bit odd that one function initiates a remote call and another prints the results. But because Twisted is an asynchronous, event-driven network framework, it makes sense here: the framework intentionally encourages writing code that breaks work up into many small pieces.
The main section of this example shows how the reactor knows when
to call which callback function. First, we create a client Perspective Broker
factory and tell the
reactor to connect to localhost:9876
, using the PB
client factory to handle requests. Next, we
get a placeholder for the remote object by calling factory.getRootObject()
. This is actually a
deferred, so we can pipeline activity together by calling addCallback() on it.
The first callback that we add is the call_ls()
function call. Call_ls()
calls the callRemote()
method on the deferred object
from the previous step. CallRemote()
returns a deferred as well. The second callback in the processing chain
is print_ls()
. When the reactor calls
print_ls()
, print_ls()
prints the result of the remote
call to remote_ls()
in the previous
step. In fact, the reactor passes in the results of the remote call into
print_ls()
. Finally, we add handle_err() as an errback rather than a plain callback; it is simply an error handler that lets us know if an error occurred
along the way. When either an error occurs or the pipeline reaches
print_ls()
, the respective methods
shut the reactor down.
Here is what running this client code looks like:
jmjones@dinkgutsy:code$ python twisted_perspective_broker_client.py
['test.log']
The output is a list of files in the directory we specified, exactly as we would have expected.
This example seems a bit complicated for the simple RPC example we laid out here. The server side seems pretty comparable. Creating the client seemed to be quite a bit more work with the pipeline of callbacks, deferreds, reactors, and factories. But this was a very simple example. The structure that Twisted provides really shines when the task at hand is of a higher level of complexity.
Example 5-16 is a slight modification to the
Perspective Broker client code that we just demonstrated. Rather than
calling ls
on the remote side, it
calls ls_boom
. This will show us how
the client and server deal with exceptions.
#!/usr/bin/python
from twisted.spread import pb
from twisted.internet import reactor

def handle_err(reason):
    print "an error occurred", reason
    reactor.stop()

def call_ls(def_call_obj):
    return def_call_obj.callRemote('ls_boom', '/foo')

def print_ls(print_result):
    print print_result
    reactor.stop()

if __name__ == '__main__':
    factory = pb.PBClientFactory()
    reactor.connectTCP("localhost", 9876, factory)
    d = factory.getRootObject()
    d.addCallback(call_ls)
    d.addCallback(print_ls)
    d.addErrback(handle_err)
    reactor.run()
Here is what happens when we run this code:
jmjones@dinkgutsy:code$ python twisted_perspective_broker_client_boom.py
an error occurred [Failure instance: Traceback from remote host -- Traceback unavailable
]
And on the server:
Peer will receive following PB traceback:
Traceback (most recent call last):
...
<more traceback>
...
    state = method(*args, **kw)
  File "twisted_perspective_broker_server.py", line 13, in remote_ls_boom
    return os.listdir(directory)
exceptions.OSError: [Errno 2] No such file or directory: '/foo'
The specifics of the error were in the server code rather than the client. In the client, we only knew that an error had occurred. If Pyro or XML-RPC had behaved like this, we would have considered that to be a bad thing. However, in the Twisted client code, our error handler was called. Since this is a different model of programming from Pyro and XML-RPC (event-based), we expect to have to handle our errors differently, and the Perspective Broker code did what we would have expected it to do.
We gave a less-than-tip-of-the-iceberg introduction to Twisted here. Twisted can be a bit difficult to get started with because it is such a comprehensive project and takes such a different approach than what most of us are accustomed to. Twisted is definitely worth investigating further and having in your toolbox when you need it.
If you like writing network code, you are going to love Scapy. Scapy is an incredibly handy interactive packet manipulation program and library. It can discover networks and perform scans, traceroutes, and probes. There is also excellent documentation available for Scapy. If you like this intro, you should buy the book for even more details on Scapy.
The first thing to figure out about Scapy is that, as of this writing, it is kept in a single file. You will need to download the latest copy of Scapy here: http://hg.secdev.org/scapy/raw-file/tip/scapy.py. Once you download Scapy, you can run it as a standalone tool or import it and use it as a library. Let’s get started by using it as an interactive tool. Please keep in mind that you will need to run Scapy with root privileges, as it needs privileged control of your network interfaces.
Once you download and install Scapy, you will see something like this:
Welcome to Scapy (1.2.0.2)
>>>
You can do anything you would normally do with a Python interpreter, and there are special Scapy commands as well. The first thing we are going to do is run Scapy's ls() function, which lists all available layers:
>>> ls()
ARP           : ARP
ASN1_Packet   : None
BOOTP         : BOOTP
CookedLinux   : cooked linux
DHCP          : DHCP options
DNS           : DNS
DNSQR         : DNS Question Record
DNSRR         : DNS Resource Record
Dot11         : 802.11
Dot11ATIM     : 802.11 ATIM
Dot11AssoReq  : 802.11 Association Request
Dot11AssoResp : 802.11 Association Response
Dot11Auth     : 802.11 Authentication
[snip]
We truncated the output as it is quite verbose. Now, we'll perform a recursive DNS query of www.oreilly.com using Caltech University's public DNS server:
>>> sr1(IP(dst="131.215.9.49")/UDP()/DNS(rd=1,qd=DNSQR(qname="www.oreilly.com")))
Begin emission:
Finished to send 1 packets.
...*
Received 4 packets, got 1 answers, remaining 0 packets
IP version=4L ihl=5L tos=0x0 len=223 id=59364 flags=DF frag=0L ttl=239
proto=udp chksum=0xb1e src=131.215.9.49 dst=10.0.1.3 options='' |UDP
sport=domain dport=domain len=203 chksum=0x843 |DNS id=0 qr=1L
opcode=QUERY aa=0L tc=0L rd=1L ra=1L z=0L rcode=ok qdcount=1 ancount=2
nscount=4 arcount=3 qd=DNSQR qname='www.oreilly.com.' qtype=A qclass=IN |>
an=DNSRR rrname='www.oreilly.com.' type=A rclass=IN ttl=21600
rdata='208.201.239.36'
[snip]
Next, let’s perform a traceroute:
>>> ans,unans=sr(IP(dst="oreilly.com",
...               ttl=(4,25),id=RandShort())/TCP(flags=0x2))
Begin emission:
..............*Finished to send 22 packets.
*...........*********.***.***.*.*.*.*.*
Received 54 packets, got 22 answers, remaining 0 packets
>>> for snd, rcv in ans:
...     print snd.ttl, rcv.src, isinstance(rcv.payload, TCP)
...
[snip]
20 208.201.239.37 True
21 208.201.239.37 True
22 208.201.239.37 True
23 208.201.239.37 True
24 208.201.239.37 True
25 208.201.239.37 True
Scapy can even do pure packet dumps like tcpdump:
>>> sniff(iface="en0", prn=lambda x: x.show())
###[ Ethernet ]###
  dst= ff:ff:ff:ff:ff:ff
  src= 00:16:cb:07:e4:58
  type= IPv4
###[ IP ]###
     version= 4L
     ihl= 5L
     tos= 0x0
     len= 78
     id= 27957
     flags=
     frag= 0L
     ttl= 64
     proto= udp
     chksum= 0xf668
     src= 10.0.1.3
     dst= 10.0.1.255
     options= ''
[snip]
You can also do some very slick network visualization of traceroutes if you install Graphviz and ImageMagick. This example is borrowed from the official Scapy documentation:
>>> res,unans = traceroute(["www.microsoft.com","www.cisco.com","www.yahoo.com",
... "www.wanadoo.fr","www.pacsec.com"],dport=[80,443],maxttl=20,retry=-2)
Begin emission:
************************************************************************
Finished to send 200 packets.
******************Begin emission:
*******************************************Finished to send 110 packets.
**************************************************************Begin emission:
Finished to send 5 packets.
Begin emission:
Finished to send 5 packets.

Received 195 packets, got 195 answers, remaining 5 packets
   193.252.122.103:tcp443 193.252.122.103:tcp80 198.133.219.25:tcp443
   198.133.219.25:tcp80 207.46.193.254:tcp443 207.46.193.254:tcp80
   69.147.114.210:tcp443 69.147.114.210:tcp80 72.9.236.58:tcp443
   72.9.236.58:tcp80
You can now create a fancy graph from those results:
>>> res.graph()
>>> res.graph(type="ps",target="| lp")
>>> res.graph(target="> /tmp/graph.svg")
With Graphviz and ImageMagick installed, the network visualization will blow your mind!
The real fun in using Scapy, though, is when you create custom command-line tools and scripts. In the next section, we will take a look at Scapy the library.
Now that we can build something permanent with Scapy, an interesting tool to show right off the bat is an arping utility. Let's look at a platform-specific arping tool first:
#!/usr/bin/env python
import subprocess
import re
import sys

def arping(ipaddress="10.0.1.1"):
    """Arping function takes IP Address or Network, returns nested mac/ip list"""

    #Assuming use of arping on Red Hat Linux
    p = subprocess.Popen("/usr/sbin/arping -c 2 %s" % ipaddress,
                         shell=True, stdout=subprocess.PIPE)
    out = p.stdout.read()
    result = out.split()
    #pattern = re.compile(":")
    for item in result:
        if ':' in item:
            print item

if __name__ == '__main__':
    if len(sys.argv) > 1:
        for ip in sys.argv[1:]:
            print "arping", ip
            arping(ip)
    else:
        arping()
Now let’s look at how we can create that exact same thing using Scapy, but in a platform-neutral way:
#!/usr/bin/env python
from scapy import srp,Ether,ARP,conf
import sys

def arping(iprange="10.0.1.0/24"):
    """Arping function takes IP Address or Network, returns nested mac/ip list"""

    conf.verb=0
    ans,unans=srp(Ether(dst="ff:ff:ff:ff:ff:ff")/ARP(pdst=iprange),
                  timeout=2)

    collection = []
    for snd, rcv in ans:
        result = rcv.sprintf(r"%ARP.psrc% %Ether.src%").split()
        collection.append(result)
    return collection

if __name__ == '__main__':
    if len(sys.argv) > 1:
        for ip in sys.argv[1:]:
            print "arping", ip
            print arping(ip)
    else:
        print arping()
As you can see, the information contained in the output is quite handy, as it gives us the MAC and IP addresses of everyone on the subnet:
# sudo python scapy_arp.py
[['10.0.1.1', '00:00:00:00:00:10'], ['10.0.1.7', '00:00:00:00:00:12'],
 ['10.0.1.30', '00:00:00:00:00:11'], ['10.0.1.200', '00:00:00:00:00:13']]
From these examples, you should get the impression of how handy Scapy is and how easy it is to use.