Straddling the French-Swiss border is CERN—a particle physics research institute that would seem a good lair for a Bond villain. Luckily, its quest is not world domination but to understand how the universe works. This has always led CERN to generate prodigious amounts of data, challenging physicists and computer scientists just to keep up.
In 1989, the English scientist Tim Berners-Lee first circulated a proposal to help disseminate information within CERN and the research community. He called it the World Wide Web, and soon distilled its design into three simple ideas:
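HTTP (Hypertext Transfer Protocol): A specification for web clients and servers to interchange requests and responses
HTML (Hypertext Markup Language): A presentation format for results
URL (Uniform Resource Locator): A way to uniquely represent a server and a resource on that server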
In its simplest usage, a web client (I think Berners-Lee was the first to use the term browser) connected to a web server with HTTP, requested a URL, and received HTML.
He wrote the first web browser and server on a NeXT computer, built by a short-lived company that Steve Jobs founded during his hiatus from Apple Computer. Web awareness really expanded in 1993, when a group of students at the University of Illinois released the Mosaic web browser (for Windows, the Macintosh, and Unix) and the NCSA httpd server. When I downloaded these and started building sites, I had no idea that the Web and the Internet would soon become part of everyday life. At the time, the Internet was still officially noncommercial; there were about 500 known web servers in the world. By the end of 1994, the number of web servers had grown to 10,000. The Internet was opened to commercial use, and the authors of Mosaic founded Netscape to write commercial web software. Netscape went public as part of the Internet frenzy that was occurring at the time, and the Web’s explosive growth has never stopped.
Almost every computer language has been used to write web clients and web servers. The dynamic languages Perl, PHP, and Ruby have been especially popular. In this chapter, I’ll show why Python is a particularly good language for web work at every level:
Clients, to access remote sites
Servers, to provide data for websites and web APIs
Web APIs and services, to interchange data in other ways than viewable web pages
And while we’re at it, we’ll build an actual interactive website in the exercises at the end of this chapter.
The low-level network plumbing of the Internet is called Transmission Control Protocol/Internet Protocol, or more commonly, simply TCP/IP (“TCP/IP” goes into more detail about this). It moves bytes among computers, but doesn’t care about what those bytes mean. That’s the job of higher-level protocols—syntax definitions for specific purposes. HTTP is the standard protocol for web data interchange.
The Web is a client-server system. The client makes a request to a server: it opens a TCP/IP connection, sends the URL and other information via HTTP, and receives a response.
The format of the response is also defined by HTTP. It includes the status of the request, and (if the request succeeded) the response’s data and format.
The most well-known web client is a web browser. It can make HTTP requests in a number of ways. You might initiate a request manually by typing a URL into the location bar or clicking on a link in a web page. Very often, the data returned is used to display a website—HTML documents, JavaScript files, CSS files, and images—but it can be any type of data, not just that intended for display.
An important aspect of HTTP is that it’s stateless. Each HTTP connection that you make is independent of all the others. This simplifies basic web operations but complicates others. Here are just a few samples of the challenges:
Remote content that doesn’t change should be saved by the web client and used to avoid downloading from the server again.
A shopping website should remember the contents of your shopping cart.
Sites that require your username and password should remember them while you’re logged in.
Solutions to statelessness include cookies, in which the server sends the client enough specific information to be able to identify it uniquely when the client sends the cookie back.
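The cookie round trip can be sketched with the standard library's http.cookies module. This is only an illustration of the idea: the cookie name session_id and its value abc123 are made up for the example, and a real site would generate a long random value.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header carrying a hypothetical session id.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
set_cookie_header = outgoing.output(header="Set-Cookie:")
print(set_cookie_header)

# Client side: the browser stores the cookie and echoes the name=value
# pair back in a Cookie header on later requests.
cookie_header = "session_id=abc123"

# Server side, next request: parse the Cookie header to recognize the client.
incoming = SimpleCookie()
incoming.load(cookie_header)
session_id = incoming["session_id"].value
print(session_id)
```

Each request is still independent as far as HTTP is concerned; the cookie value is what lets the server tie them together.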
HTTP is a text-based protocol, so you can actually type it yourself for web testing. The ancient telnet program lets you connect to any server and port and type commands.
Let’s ask everyone’s favorite test site, Google, some basic information about its home page. Type this:
$ telnet www.google.com 80
If there is a web server on port 80 at google.com (I think that’s a safe bet), telnet will print some reassuring information and then display a final blank line that’s your cue to type something else:
Trying 74.125.225.177...
Connected to www.google.com.
Escape character is '^]'.
Now, type an actual HTTP command for telnet to send to the Google web server. The most common HTTP command (the one your browser uses when you type a URL in its location bar) is GET. This retrieves the contents of the specified resource, such as an HTML file, and returns it to the client. For our first test, we’ll use the HTTP command HEAD, which just retrieves some basic information about the resource:
HEAD / HTTP/1.1
That HEAD / sends the HTTP HEAD verb (command) to get information about the home page (/). Add an extra carriage return to send a blank line so the remote server knows you’re all done and want a response.
You’ll receive a response such as this (we trimmed some of the long lines using … so they wouldn’t stick out of the book):
HTTP/1.1 200 OK
Date: Sat, 26 Oct 2013 17:05:17 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=962a70e9eb3db9d9:FF=0:TM=1382807117:LM=1382807117:S=y...
    expires=Mon, 26-Oct-2015 17:05:17 GMT;
    path=/;
    domain=.google.com
Set-Cookie: NID=67=hTvtVC7dZJmZzGktimbwVbNZxPQnaDijCz716B1L56GM9qvsqqeIGb...
    expires=Sun, 27-Apr-2014 17:05:17 GMT
    path=/;
    domain=.google.com;
    HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts...
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic
Transfer-Encoding: chunked
These are HTTP response headers and their values. Some, like Date and Content-Type, are standard headers that almost every response carries. Others, such as Set-Cookie, are used to track your activity across multiple visits (we’ll talk about state management a little later in this chapter).
When you make an HTTP HEAD request, you get back only headers. If you had used the HTTP GET or POST commands, you would also receive data from the home page (a mixture of HTML, CSS, JavaScript, and whatever else Google decided to throw into its home page).
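If you would rather not type into telnet, the same exchange can be sketched in Python with a raw socket. To keep the example self-contained, it talks to a tiny local server instead of Google; the HelloHandler class and the raw_request() helper are hypothetical names invented for this sketch.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler standing in for a real website."""
    body = b"<html><body>hello</body></html>"

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()           # headers only, no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.body)    # headers, then the page data

    def log_message(self, format, *args):
        pass                           # keep the demo quiet

def raw_request(port, method):
    """Send one hand-written HTTP request and return the raw reply bytes."""
    request = "%s / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n" % method
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:              # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

head_reply = raw_request(port, "HEAD")
get_reply = raw_request(port, "GET")
server.shutdown()
server.server_close()

print(head_reply.decode("ascii"))
```

The HEAD reply ends at the blank line that closes the headers, whereas the GET reply carries the HTML after it.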
I don’t want to leave you stranded in telnet. To close telnet, press Ctrl+] to reach the telnet> prompt, and then type the following:
q
In Python 2, web client and server modules were a bit scattered. One of the Python 3 goals was to bundle these modules into two packages (remember from Chapter 5 that a package is just a directory containing module files):
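http manages all the client-server HTTP details:
http.client does the client-side work
http.server helps you write Python web servers
http.cookies and http.cookiejar manage cookies, which save data between site visits
urllib runs on top of http:
urllib.request handles the client request
urllib.response handles the server response
urllib.parse cracks the parts of a URL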
Let’s use the standard library to get something from a website. The URL in the following example returns information about movies from the IMDB movie database:
>>> import urllib.request as ur
>>> from urllib.parse import quote
>>> import json
>>>
>>> title = input('Type a movie title: ')
Type a movie title: eegah
>>> url = 'http://www.omdbapi.com/?t=%s' % quote(title)
>>> conn = ur.urlopen(url)
For our example, we typed the movie title eegah; as you’ll soon see, there was a movie with that name. But first, some web computing details. This little chunk of Python opened a TCP/IP connection to the remote movie server, made an HTTP request, and received an HTTP response. The response contained more than just the page data (the movie info). In the official documentation, we find that conn is an HTTPResponse object with a number of methods and attributes. One of the most important parts of the response is the HTTP status code:
>>> print(conn.status)
200
A 200 means that everything was peachy. There are dozens of HTTP status codes, grouped into five ranges by their first (hundreds) digit:
1xx (information): The server received the request but has some extra information for the client.
2xx (success): It worked; every success code other than 200 conveys extra details.
3xx (redirection): The resource moved, so the response returns the new URL to the client.
4xx (client error): Some problem from the client side, such as the famous 404 (not found). 418 (I’m a teapot) was an April Fool’s joke.
5xx (server error): 500 is the generic whoops; you might see a 502 (bad gateway) if there’s some disconnect between a web server and a backend application server.
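The standard library knows these codes by name. Here is a small sketch that uses http.HTTPStatus to look up the official phrase for a code; the status_class() helper and its labels are made up for this example and simply mirror the five ranges above.

```python
from http import HTTPStatus

def status_class(code):
    """Map a status code to its range by the hundreds digit."""
    classes = {
        1: "information",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes[code // 100]

for code in (100, 200, 301, 404, 502):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```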
To get the actual data contents from the web page, use the read() method of the conn variable:
>>> data = conn.read()
>>> print(data)
b'{"Title":"Eegah","Year":"1962","Rated":"UNRATED","Released":"01 Apr 1965","Runtime":"90 min","Genre":"Comedy","Director":"Arch Hall Sr.","Writer":"Bob Wehling (screenplay), Arch Hall Sr. (original story)","Actors":"Arch Hall Jr., Marilyn Manning, Richard Kiel, Arch Hall Sr.","Plot":"Teenagers stumble across a prehistoric caveman, who goes on a rampage.","Language":"English","Country":"USA","Awards":"N/A", "Poster":"http://ia.media-imdb.com/images/M/MV5BMTY4MDE3NDQ1MF5BMl5BanB nXkFtZTcwODI3MDQyMQ@@._V1_SX300.jpg","Metascore":"N/A","imdbRating":"2.2", "imdbVotes":"4,387","imdbID":"tt0055946","Type":"movie","Response":"True"}'
That didn’t look like plain text or HTML. Web servers can send data back to you in any format they like. The data format is specified by the HTTP response header value with the name Content-Type, which we also saw in our google.com example:
>>> print(conn.getheader('Content-Type'))
application/json; charset=utf-8
That application/json string is a MIME type, and it means JSON format, not plain text or HTML. The MIME type for HTML, which the google.com example sent, is text/html. I’ll show you more MIME types in this chapter.
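A client can use the Content-Type value to decide how to treat the bytes it received. The sketch below borrows the standard library's email.message.Message class, which understands this header syntax, to split the value into a MIME type and a character set. The parse_content_type() helper and the shortened sample body are assumptions made for the example.

```python
import json
from email.message import Message

def parse_content_type(header_value):
    """Split a Content-Type header value into (MIME type, charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()

# Sample values modeled on the movie response (body shortened).
header = "application/json; charset=utf-8"
body = b'{"Title": "Eegah", "Year": "1962"}'

mime_type, charset = parse_content_type(header)
if mime_type == "application/json":
    # Decode the bytes with the advertised charset, then parse the JSON.
    movie = json.loads(body.decode(charset or "utf-8"))
    print(movie["Title"])
```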
Now that we know it’s JSON, we can convert it into Python data structures and print the ones we want:
>>> try:
...     str_data = data.decode('utf8')
...     js_data = json.loads(str_data)
...     print('title:', js_data['Title'])
...     print('plot:', js_data['Plot'])
... except (ValueError, KeyError):
...     print('Sorry, no match for', title)
...
title: Eegah
plot: Teenagers stumble across a prehistoric caveman, who goes on a rampage.
In this example, the returned JSON string was converted to a Python dictionary, and we printed the two elements with the string keys Title and Plot.
Out of sheer curiosity, what other HTTP headers were sent back to us?
>>> for key, value in conn.getheaders():
...     print(key, value)
...
Date Tue, 09 Feb 2016 02:57:47 GMT
Content-Type application/json; charset=utf-8
Content-Length 627
Connection close
Set-Cookie __cfduid=dc4315212f945a15f879910e5f92c79651454986667; expires=Wed, 08-Feb-17 02:57:47 GMT; path=/; domain=.omdbapi.com; HttpOnly
Cache-Control public, max-age=14400
Expires Tue, 09 Feb 2016 06:57:47 GMT
Last-Modified Tue, 09 Feb 2016 01:43:08 GMT
Vary Accept-Encoding
X-AspNet-Version 4.0.30319
X-Powered-By ASP.NET
Access-Control-Allow-Origin *
CF-Cache-Status HIT
Server cloudflare-nginx
CF-RAY 271c4f0fe6da109f-ORD
Remember that telnet example a little earlier? Now, our Python library is parsing all those HTTP response headers and providing them as a list of (name, value) pairs. Date and Server seem straightforward; some of the others, less so. It’s helpful to know that HTTP has a set of standard headers such as Content-Type, and many optional ones.
At the beginning of Chapter 1, there’s a program that accessed a Wayback Machine API by using the standard libraries urllib.request and json. Following that example is a version that uses the third-party module requests. The requests version is shorter and easier to understand.
For most purposes, I think web client development with requests is easier. You can browse the documentation (which is pretty good) for full details. I’ll show the basics of requests in this section and use it throughout this book for web client tasks.
First, install the requests library into your Python environment. From a terminal window (Windows users, type cmd to make one), type the following command to make the Python package installer pip download the latest version of the requests package and install it:
$ pip install requests
If you have trouble, read Appendix D for details on how to install and use pip.
Let’s redo our previous call to the movie service with requests. This time, just for cinematic variety, we’ll input the name of a different wretched movie from days gone by:
>>> import requests
>>> import json
>>>
>>> url = 'http://www.omdbapi.com'
>>> title = input('Type a movie title: ')
Type a movie title: from hell it came
>>> args = {'t': title}
>>> resp = requests.get(url, params=args)
>>> resp
<Response [200]>
>>> js_data = resp.json()
>>> try:
...     print('title:', js_data['Title'])
...     print('plot:', js_data['Plot'])
... except KeyError:
...     print('Sorry, no match for', title)
...
title: From Hell It Came
plot: A wrongfully accused South Seas prince is executed, and returns as a walking tree stump.
It isn’t that different from using urllib.request.urlopen, but I think it’s a little more convenient and less wordy.
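Part of that convenience is that passing a dictionary of parameters spares you from encoding the query string yourself. As a rough standard-library comparison (not a description of how requests works internally), here are two ways to build the same kind of URL by hand with urllib.parse:

```python
from urllib.parse import quote, urlencode

title = 'from hell it came'

# Encode one value yourself and splice it into the URL.
by_hand = 'http://www.omdbapi.com/?t=%s' % quote(title)

# Or let urlencode() build the whole query string from a dictionary.
from_dict = 'http://www.omdbapi.com/?' + urlencode({'t': title})

print(by_hand)    # spaces become %20
print(from_dict)  # spaces become +
```

Both spellings of a space are accepted in a query string, so the two URLs ask for the same movie.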
Web developers have found Python to be an excellent language for writing web servers and server-side programs. This has led to such a variety of Python-based web frameworks that it can be hard to navigate among them and make choices—not to mention deciding what deserves to go into a book.
A web framework provides features with which you can build websites, so it does more than a simple web (HTTP) server. You’ll see features such as routing (URL to server function), templates (HTML with dynamic inclusions), debugging, and more.
I’m not going to cover all of the frameworks here—just those that I’ve found to be relatively simple to use and suitable for real websites. I’ll also show how to run the dynamic parts of a website with Python and other parts with a traditional web server.
You can run a simple web server by typing just one line of Python:
$ python -m http.server
This implements a bare-bones Python HTTP server. If there are no problems, this will print an initial status message:
Serving HTTP on 0.0.0.0 port 8000 ...
That 0.0.0.0 means any TCP address, so web clients can access it no matter what address the server has. There are more low-level details on TCP and other network plumbing for you to read about in Chapter 11.
You can now request files, with paths relative to your current directory, and they will be returned. If you type http://localhost:8000 in your web browser, you should see a directory listing there, and the server will print access log lines such as this:
127.0.0.1 - - [20/Feb/2013 22:02:37] "GET / HTTP/1.1" 200 -
localhost and 127.0.0.1 are TCP synonyms for your local computer, so this works regardless of whether you’re connected to the Internet. You can interpret this line as follows:
127.0.0.1 is the client’s IP address
The first "-" is the remote username, if found
The second "-" is the login username, if required
[20/Feb/2013 22:02:37] is the access date and time
"GET / HTTP/1.1" is the command sent to the web server:
The HTTP method (GET)
The resource requested (/, the top)
The HTTP version (HTTP/1.1)
The final 200 is the HTTP status code returned by the web server
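If you ever need to pick such log lines apart in a program, a regular expression with named groups is one way to do it. This is a sketch rather than a complete log parser, and the group names are my own. The last group, which I have called size, captures the trailing "-" that follows the status code and normally holds the response size.

```python
import re

# One named group per field in the access log line.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) (?P<remote_user>\S+) (?P<login_user>\S+) '
    r'\[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

line = '127.0.0.1 - - [20/Feb/2013 22:02:37] "GET / HTTP/1.1" 200 -'
fields = LOG_PATTERN.match(line).groupdict()

for name, value in fields.items():
    print(name, '=', value)
```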
Click any file. If your browser can recognize the format (HTML, PNG, GIF, JPEG, and so on) it should display it, and the server will log the request. For instance, if you have the file oreilly.png in your current directory, a request for http://localhost:8000/oreilly.png should return the image of the unsettling fellow in Figure 7-1, and the log should show something such as this:
127.0.0.1 - - [20/Feb/2013 22:03:48] "GET /oreilly.png HTTP/1.1" 200 -
If you have other files in the same directory on your computer, they should show up in a listing on your display, and you can click any one to download it. If your browser is configured to display that file’s format, you’ll see the results on your screen; otherwise, your browser will ask you if you want to download and save the file.
The default port number used is 8000, but you can specify another:
$ python -m http.server 9999
You should see this:
Serving HTTP on 0.0.0.0 port 9999 ...
This Python-only server is best suited for quick tests. You can stop it by killing its process; in most terminals, press Ctrl+C.
You should not use this basic server for a busy production website. Traditional web servers such as Apache and Nginx are much faster for serving static files. In addition, this simple server has no way to handle dynamic content, which more extensive servers can do by accepting parameters.
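For quick tests, you can also start the same kind of file server from inside a Python program. The following sketch makes a few assumptions for the sake of being self-contained: it serves a temporary directory containing a made-up file called hello.txt, asks the operating system for a free port, and fetches the file with http.client before shutting the server down.

```python
import functools
import http.client
import tempfile
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # A throwaway file to serve.
    Path(folder, "hello.txt").write_bytes(b"hello from http.server\n")

    # Serve files from the temporary directory instead of the current one.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=folder)
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Act as the web client.
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    conn.request("GET", "/hello.txt")
    resp = conn.getresponse()
    status = resp.status
    text = resp.read().decode("utf-8")
    conn.close()

    server.shutdown()
    server.server_close()

print(status, repr(text))
```

While it runs, the server writes an access log line for the request to standard error, in the same format that you saw earlier.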
All too soon, the allure of serving simple files wears off, and we want a web server that can also run programs dynamically. In the early days of the Web, the Common Gateway Interface (CGI) was designed for clients to make web servers run external programs and return the results. CGI also handled getting input arguments from the client through the server to the external programs. However, the programs were started anew for each client access. This could not scale well, because even small programs have appreciable startup time.
To avoid this startup delay, people began merging the language interpreter into the web server. Apache ran PHP within its mod_php module, Perl in mod_perl, and Python in mod_python. Then, code in these dynamic languages could be executed within the long-running Apache process itself rather than in external programs.
An alternative method was to run the dynamic language within a separate long-running program and have it communicate with the web server. FastCGI and SCGI are examples.
Python web development made a leap with the definition of Web Server Gateway Interface (WSGI), a universal API between Python web applications and web servers. All of the Python web frameworks and web servers in the rest of this chapter use WSGI. You don’t normally need to know how WSGI works (there really isn’t much to it), but it helps to know what some of the parts under the hood are called.
Web servers handle the HTTP and WSGI details, but you use web frameworks to actually write the Python code that powers the site. So, we’ll talk about frameworks for a while and then get back to alternative ways of actually serving sites that use them.
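To give a feel for how little there is to WSGI, here is a minimal sketch of an application and a hand-rolled stand-in for the server side. The application is just a callable that receives the request details in a dictionary (environ) and a start_response function, and returns an iterable of bytes. The path /echo/Mothra and the captured dictionary are only there for the demonstration.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """A WSGI app: report which path the client asked for."""
    body = ("You asked for %s" % environ["PATH_INFO"]).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

# Play the web server's role by hand: build an environ dictionary,
# call the application, and capture what it reports back.
environ = {}
setup_testing_defaults(environ)      # fills in the required WSGI keys
environ["PATH_INFO"] = "/echo/Mothra"

captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

result = b"".join(application(environ, start_response))
print(captured["status"], result)
```

A real WSGI server does the same job for every incoming HTTP request and sends the status, headers, and body back to the client.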
If you want to write a website in Python, there are many Python web frameworks (some might say too many). A web framework handles, at a minimum, client requests and server responses. It might provide some or all of these features:
Interpret URLs and find the corresponding server files or Python server code
Merge server-side data into pages of HTML
Handle usernames, passwords, permissions
Maintain transient data storage during a user’s visit to the website
In the coming sections, we’ll write example code for two frameworks (bottle and flask). Then, we’ll talk about alternatives, especially for database-backed websites. You can find a Python framework to power any site that you can think of.
Bottle consists of a single Python file, so it’s very easy to try out, and it’s easy to deploy later. Bottle isn’t part of standard Python, so to install it, type the following command:
$ pip install bottle
Here’s code that will run a test web server and return a line of text when your browser accesses the URL http://localhost:9999/. Save it as bottle1.py:
from bottle import route, run

@route('/')
def home():
    return "It isn't fancy, but it's my home page"

run(host='localhost', port=9999)
Bottle uses the route decorator to associate a URL with the following function; in this case, / (the home page) is handled by the home() function.
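If decorators that register functions are new to you, this toy version might help. It is not Bottle's implementation, just a sketch of the general idea: the routes dictionary and the dispatch() function are hypothetical stand-ins for the bookkeeping that a framework does for you.

```python
routes = {}  # hypothetical registry: URL path -> handler function

def route(path):
    """Return a decorator that records a function as the handler for path."""
    def register(func):
        routes[path] = func
        return func
    return register

@route('/')
def home():
    return "It isn't fancy, but it's my home page"

def dispatch(path):
    """Find the handler for a requested path and call it."""
    handler = routes.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()

print(dispatch('/'))
print(dispatch('/missing'))
```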
Make Python run this server script by typing this:
$ python bottle1.py
You should see this on your browser when you access http://localhost:9999:
It isn't fancy, but it's my home page
The run() function executes bottle’s built-in Python test web server. You don’t need to use this for bottle programs, but it’s useful for initial development and testing.
Now, instead of creating text for the home page in code, let’s make a separate HTML file called index.html that contains this line of text:
My <b>new</b> and <i>improved</i> home page!!!
Make bottle return the contents of this file when the home page is requested. Save this script as bottle2.py:
from bottle import route, run, static_file

@route('/')
def main():
    return static_file('index.html', root='.')

run(host='localhost', port=9999)
In the call to static_file(), we want the file index.html in the directory indicated by root (in this case, '.', the current directory). If your previous server example code was still running, stop it. Now, run the new server:
If your previous server example code was still running, stop it.
Now, run the new server:
$ python bottle2.py
When you ask your browser to get http://localhost:9999/, you should see:
My new and improved home page!!!
Let’s add one last example that shows how to pass arguments to a URL and use them. Of course, this will be bottle3.py:
from bottle import route, run, static_file

@route('/')
def home():
    return static_file('index.html', root='.')

@route('/echo/<thing>')
def echo(thing):
    return "Say hello to my little friend: %s!" % thing

run(host='localhost', port=9999)
We have a new function called echo() and want to pass it a string argument in a URL. That’s what the line @route('/echo/<thing>') in the preceding example does. That <thing> in the route means that whatever was in the URL after /echo/ is assigned to the string argument thing, which is then passed to the echo function.
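One way to picture how a pattern such as /echo/<thing> could be matched against a real URL is to translate it into a regular expression with named groups. Again, this is a sketch of the idea and not how Bottle itself does it. compile_route() and match_route() are invented names, and the sketch ignores details such as escaping special characters in the pattern.

```python
import re

def compile_route(pattern):
    """Turn '/echo/<thing>' into a regex with one named group per <name>."""
    regex = re.sub(r'<(\w+)>', r'(?P<\1>[^/]+)', pattern)
    return re.compile('^' + regex + '$')

def match_route(pattern, path):
    """Return the captured URL arguments as a dict, or None if no match."""
    found = compile_route(pattern).match(path)
    return found.groupdict() if found else None

print(match_route('/echo/<thing>', '/echo/Mothra'))
print(match_route('/echo/<thing>', '/somewhere/else'))
```

The resulting dictionary is exactly what a framework needs in order to call echo(thing='Mothra').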
To see what happens, stop the old server if it’s still running, and start it with the new code:
$ python bottle3.py
Then, access http://localhost:9999/echo/Mothra in your web browser. You should see the following:
Say hello to my little friend: Mothra!
Now, leave bottle3.py running for a minute so that we can try something else.
You’ve been verifying that these examples work by typing URLs into your browser and looking at the displayed pages. You can also use client libraries such as requests to do your work for you. Save this as bottle_test.py:
import requests

resp = requests.get('http://localhost:9999/echo/Mothra')
if (resp.status_code == 200 and
        resp.text == 'Say hello to my little friend: Mothra!'):
    print('It worked! That almost never happens!')
else:
    print('Argh, got this:', resp.text)
Great! Now, run it:
$ python bottle_test.py
You should see this in your terminal:
It worked! That almost never happens!
This is a little example of a unit test. Chapter 8 provides more details on why tests are good and how to write them in Python.
There’s more to bottle than I’ve shown here. In particular, you can try adding these arguments when you call run():
debug=True creates a debugging page if you get an HTTP error;
reloader=True restarts the server if you change any of the Python code.
It’s well documented at the developer site.
Bottle is a good initial web framework. If you need a few more cowbells and whistles, try Flask. It started in 2010 as an April Fools’ joke, but enthusiastic response encouraged the author, Armin Ronacher, to make it a real framework. He named the result Flask as a wordplay on bottle.
Flask is about as simple to use as Bottle, but it supports many extensions that are useful in professional web development, such as Facebook authentication and database integration. It’s my personal favorite among Python web frameworks because it balances ease of use with a rich feature set.
The Flask package includes the werkzeug WSGI library and the jinja2 template library. You can install it from a terminal:
$ pip install flask
Let’s replicate the final bottle example code in flask. First, though, we need to make a few changes:
Flask’s default directory home for static files is static, and URLs for files there also begin with /static. We change the folder to '.' (current directory) and the URL prefix to '' (empty) to allow the URL / to map to the file index.html.
In the run() function, setting debug=True also activates the automatic reloader; bottle used separate arguments for debugging and reloading.
Save this file to flask1.py:
from flask import Flask

app = Flask(__name__, static_folder='.', static_url_path='')

@app.route('/')
def home():
    return app.send_static_file('index.html')

@app.route('/echo/<thing>')
def echo(thing):
    return "Say hello to my little friend: %s" % thing

app.run(port=9999, debug=True)
Then, run the server from a terminal or window:
$ python flask1.py
Test the home page by typing this URL into your browser:
http://localhost:9999/
You should see the following (as you did for bottle):
My new and improved home page!!!
Try the /echo endpoint:
http://localhost:9999/echo/Godzilla
You should see this:
Say hello to my little friend: Godzilla
There’s another benefit to setting debug to True when calling run. If an exception occurs in the server code, Flask returns a specially formatted page with useful details about what went wrong, and where. Even better, you can type some commands to see the values of variables in the server program.
Do not set debug = True in production web servers. It exposes too much information about your server to potential intruders.
So far, the Flask example just replicates what we did with bottle. What can Flask do that bottle can’t? Flask includes jinja2, a more extensive templating system. Here’s a tiny example of how to use jinja2 and flask together.
Create a directory called templates, and a file within it called flask2.html:
<html>
<head>
<title>Flask2 Example</title>
</head>
<body>
Say hello to my little friend: {{ thing }}
</body>
</html>
Next, we’ll write the server code to grab this template, fill in the value of thing that we passed it, and render it as HTML (I’m dropping the home() function here to save space). Save this as flask2.py:
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/echo/<thing>')
def echo(thing):
    return render_template('flask2.html', thing=thing)

app.run(port=9999, debug=True)
That thing = thing argument means to pass a variable named thing to the template, with the value of the string thing.
Ensure that flask1.py isn’t still running, and start flask2.py:
$ python flask2.py
Now, type this URL:
http://localhost:9999/echo/Gamera
You should see the following:
Say hello to my little friend: Gamera
Let’s modify our template and save it in the templates directory as flask3.html:
<html>
<head>
<title>Flask3 Example</title>
</head>
<body>
Say hello to my little friend: {{ thing }}.
Alas, it just destroyed {{ place }}!
</body>
</html>
You can pass this second argument to the echo URL in many ways. Using this method, you simply extend the URL itself (save this as flask3a.py):
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/echo/<thing>/<place>')
def echo(thing, place):
    return render_template('flask3.html', thing=thing, place=place)

app.run(port=9999, debug=True)
As usual, stop the previous test server script if it’s still running and then try this new one:
$ python flask3a.py
The URL would look like this:
http://localhost:9999/echo/Rodan/McKeesport
And you should see the following:
Say hello to my little friend: Rodan. Alas, it just destroyed McKeesport!
Or, you can provide the arguments as GET parameters (save this as flask3b.py):
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/echo/')
def echo():
    thing = request.args.get('thing')
    place = request.args.get('place')
    return render_template('flask3.html', thing=thing, place=place)

app.run(port=9999, debug=True)
Run the new server script:
$ python flask3b.py
This time, use this URL:
http://localhost:9999/echo?thing=Gorgo&place=Wilmerding
You should get back what you see here:
Say hello to my little friend: Gorgo. Alas, it just destroyed Wilmerding!
When a GET command is used for a URL, any arguments are passed after a ? in the form key1=val1&key2=val2&...
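You can take such a URL apart yourself with the standard library's urllib.parse module, which is a handy way to check what a server will receive. This sketch reuses the example URL from above.

```python
from urllib.parse import urlsplit, parse_qs

url = 'http://localhost:9999/echo?thing=Gorgo&place=Wilmerding'

parts = urlsplit(url)
print(parts.path)    # the resource being requested
print(parts.query)   # everything after the ?

# parse_qs returns a dict of lists, because a key may appear more than once.
args = parse_qs(parts.query)
print(args['thing'][0], args['place'][0])
```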
You can also use the dictionary ** operator to pass multiple arguments to a template from a single dictionary (call this flask3c.py):
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/echo/')
def echo():
    kwargs = {}
    kwargs['thing'] = request.args.get('thing')
    kwargs['place'] = request.args.get('place')
    return render_template('flask3.html', **kwargs)

app.run(port=9999, debug=True)
That **kwargs acts like thing=thing, place=place. It saves some typing if there are a lot of input arguments.
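You can convince yourself of this equivalence without a web server. In the sketch below, the render() function is a made-up stand-in for a template renderer, built on the ordinary str.format() method rather than jinja2.

```python
def render(template, **context):
    """Hypothetical stand-in for a template renderer."""
    return template.format(**context)

template = "Say hello to my little friend: {thing}. Alas, it just destroyed {place}!"

# Spelling out each keyword argument...
spelled_out = render(template, thing='Gorgo', place='Wilmerding')

# ...is the same as unpacking a dictionary with **.
kwargs = {'thing': 'Gorgo', 'place': 'Wilmerding'}
unpacked = render(template, **kwargs)

print(unpacked)
print(spelled_out == unpacked)
```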
The jinja2 templating language does a lot more than this. If you’ve programmed in PHP, you’ll see many similarities.
So far, the web servers we’ve used have been simple: the standard library’s http.server or the debugging servers in Bottle and Flask. In production, you’ll want to run Python with a faster web server. The usual choices are the following:
apache with the mod_wsgi module
nginx with the uWSGI app server
Both work well; apache is probably the most popular, and nginx has a reputation for stability and lower memory use.
The apache web server’s best WSGI module is mod_wsgi. This can run Python code within the Apache process or in separate processes that communicate with Apache.
You should already have apache if your system is Linux or OS X. For Windows, you’ll need to install apache.
Finally, install your preferred WSGI-based Python web framework. Let’s try bottle here. Almost all of the work involves configuring Apache, which can be a dark art.
Create this test file and save it as /var/www/test/home.wsgi:
import bottle

application = bottle.default_app()

@bottle.route('/')
def home():
    return "apache and wsgi, sitting in a tree"
Do not call run() this time, because that starts the built-in Python web server. We need to assign to the variable application because that’s what mod_wsgi looks for to marry the web server and the Python code.
If apache and its mod_wsgi module are working correctly, we just need to connect them to our Python script. We want to add one line to the file that defines the default website for this apache server, but finding that file is a task in and of itself. It could be /etc/apache2/httpd.conf, or /etc/apache2/sites-available/default, or the Latin name of someone’s pet salamander.
Let’s assume for now that you understand apache and found that file. Add this line inside the <VirtualHost> section that governs the default website:
WSGIScriptAlias / /var/www/test/home.wsgi
That section might then look like this:
<VirtualHost *:80>
    DocumentRoot /var/www
    WSGIScriptAlias / /var/www/test/home.wsgi
    <Directory /var/www/test>
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
Start apache, or restart it if it was running to make it use this new configuration.
If you then browse to http://localhost/, you should see:
apache and wsgi, sitting in a tree
This runs mod_wsgi in embedded mode, as part of apache itself. You can also run it in daemon mode: as one or more processes, separate from apache. To do this, add two new directive lines to your apache config file:
WSGIDaemonProcess domain-name user=user-name group=group-name threads=25
WSGIProcessGroup domain-name
In the preceding example, user-name and group-name are the operating system user and group names, and the domain-name is the name of your Internet domain. A minimal apache config might look like this:
<VirtualHost *:80>
    DocumentRoot /var/www
    WSGIScriptAlias / /var/www/test/home.wsgi
    WSGIDaemonProcess mydomain.com user=myuser group=mygroup threads=25
    WSGIProcessGroup mydomain.com
    <Directory /var/www/test>
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
The nginx web server does not have an embedded Python module. Instead, it communicates by using a separate WSGI server such as uWSGI. Together they make a very fast and configurable platform for Python web development.
You can install nginx from its website. You also need to install uWSGI. uWSGI is a large system, with many levers and knobs to adjust. A short documentation page gives you instructions on how to combine Flask, nginx, and uWSGI.
Websites and databases are like peanut butter and jelly—you see them together a lot.
The smaller frameworks such as bottle and flask do not include direct support for databases, although some of their contributed add-ons do.
If you need to crank out database-backed websites, and the database design doesn’t change very often, it might be worth the effort to try one of the larger Python web frameworks. The current main contenders include:
django
This is the most popular, especially for large sites. It’s worth learning for many reasons, among them the frequent requests for django experience in Python job ads. It includes ORM code (we talked about ORMs in “The Object-Relational Mapper”) to create automatic web pages for the typical database CRUD functions (create, read, update, delete) that I discussed in “SQL”. You don’t have to use django’s ORM if you prefer another, such as SQLAlchemy, or direct SQL queries.
web2py
This covers much the same ground as django, with a different style.
pyramid
This grew from the earlier pylons project, and is similar to django in scope.
turbogears
This framework supports an ORM, many databases, and multiple template languages.
wheezy.web
This is a newer framework optimized for performance. It was faster than the others in a recent test.
You can compare the frameworks by viewing this online table.
If you want to build a website backed by a relational database, you don’t necessarily need one of these larger frameworks. You can use bottle, flask, and others directly with relational database modules, or use SQLAlchemy to help gloss over the differences. Then, you’re writing generic SQL instead of specific ORM code, and more developers know SQL than any particular ORM’s syntax.
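As a rough sketch of the direct-SQL route, the following uses the standard library’s sqlite3 module with an in-memory database. The critters table, its columns, and the find_critters() helper are made up for illustration; in a bottle or flask site, a route handler could call a function like this and format the rows it gets back.

```python
import sqlite3

def make_db():
    """Create a throwaway in-memory database with a hypothetical table."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE critters (name TEXT PRIMARY KEY, legs INTEGER)')
    conn.executemany('INSERT INTO critters (name, legs) VALUES (?, ?)',
                     [('duck', 2), ('bear', 4), ('spider', 8)])
    conn.commit()
    return conn

def find_critters(conn, min_legs):
    """Run plain SQL; the ? placeholder keeps user input out of the query text."""
    cursor = conn.execute(
        'SELECT name, legs FROM critters WHERE legs >= ? ORDER BY legs',
        (min_legs,))
    return cursor.fetchall()

conn = make_db()
print(find_critters(conn, 4))   # [('bear', 4), ('spider', 8)]
```

The SQL here is ordinary enough that it should carry over to other relational databases with few changes, although the connection call and placeholder style differ between database modules.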
Also, there’s nothing written in stone demanding that your database must be a relational one. If your data schema varies significantly—columns that differ markedly across rows—it might be worthwhile to consider a schemaless database, such as one of the NoSQL databases discussed in “NoSQL Data Stores”. I once worked on a website that initially stored its data in a NoSQL database, switched to a relational one, on to another relational one, to a different NoSQL one, and then finally back to one of the relational ones.
Following are some of the independent Python-based WSGI servers that work like apache or nginx, using multiple processes and/or threads (see “Concurrency”) to handle simultaneous requests:
Here are some event-based servers, which use a single process but avoid blocking on any single request:
I have more to say about events in the discussion about concurrency in Chapter 11.
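Until then, the event-based idea can be sketched with the standard asyncio module. This is a toy illustration, not how any particular server is implemented: the request names and delays are invented, and asyncio.sleep() stands in for waiting on a slow client or disk.

```python
import asyncio

async def handle_request(name, delay, finished):
    # await hands control back to the event loop instead of blocking it
    await asyncio.sleep(delay)      # stands in for slow network or disk I/O
    finished.append(name)
    return name + ' done'

async def serve_all():
    finished = []
    # All three "requests" are in flight at once, in one process and thread
    results = await asyncio.gather(
        handle_request('slow', 0.15, finished),
        handle_request('fast', 0.05, finished),
        handle_request('medium', 0.10, finished),
    )
    return results, finished

results, finished = asyncio.run(serve_all())
print(finished)   # order of completion, not order of arrival
```

The slow request arrives first but finishes last, because the other two are not stuck waiting behind it.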
We’ve just looked at traditional web client and server applications, consuming and generating HTML pages. Yet the Web has turned out to be a powerful way to glue applications and data in many more formats than HTML.
Let’s begin with a little surprise. Start a Python session in a terminal window and type the following:
>>> import antigravity
This secretly calls the standard library’s webbrowser module and directs your browser to an enlightening Python link.
You can use this module directly. This program loads the main Python site’s page in your browser:
>>> import webbrowser
>>> url = 'http://www.python.org/'
>>> webbrowser.open(url)
True
This opens it in a new window:
>>> webbrowser.open_new(url)
True
And this opens it in a new tab, if your browser supports tabs:
>>> webbrowser.open_new_tab('http://www.python.org/')
True
The webbrowser module makes your browser do all the work.
Often, data is only available within web pages. If you want to access it, you need to access the pages through a web browser and read it. If the authors of the website made any changes since the last time you visited, the location and style of the data might have changed.
Instead of publishing web pages, you can provide data through a web application programming interface (API). Clients access your service by making requests to URLs and getting back responses containing status and data. Instead of HTML pages, the data is in formats that are easier for programs to consume, such as JSON or XML (refer to Chapter 8 for more about these formats).
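Here is a minimal sketch of that exchange, with no network involved. Everything in it is hypothetical: the MOVIES data (including the rating), the movie_api() function, and the idea that it answers a URL such as /movies/casablanca. It only shows the shape of the conversation: a status plus a JSON body that a program can decode directly.

```python
import json

# Made-up data that a service might expose
MOVIES = {'casablanca': {'year': 1942, 'rating': 8.5}}

def movie_api(title):
    """Return (HTTP status, JSON text) for a hypothetical /movies/<title> URL."""
    movie = MOVIES.get(title)
    if movie is None:
        return 404, json.dumps({'error': 'no such movie'})
    return 200, json.dumps({'title': title, **movie})

# A client gets back a status and a JSON body, then decodes the body
status, body = movie_api('casablanca')
data = json.loads(body)
print(status, data['year'])   # 200 1942
```

Compare this with digging the same year out of an HTML page: there is no markup to wade through, and a redesign of the site’s pages would not change the result.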
Representational State Transfer (REST) was defined by Roy Fielding in his doctoral thesis. Many products claim to have a REST interface or a RESTful interface. In practice, this often only means that they have a web interface—definitions of URLs to access a web service.
A RESTful service uses the HTTP verbs in specific ways, as is described here:
HEAD
Gets information about the resource, but not its data.
GET
As its name implies, GET retrieves the resource’s data from the server. This is the standard method used by your browser. Any time you see a URL with a question mark (?) followed by a bunch of arguments, that’s usually a GET request. GET should not be used to create, change, or delete data.
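POST
This verb sends data to the server for processing, often to create a new resource or update an existing one. It’s often used by HTML forms and web APIs.
PUT
This verb creates a resource at the given URL, or replaces it if it already exists.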
DELETE
This one speaks for itself: DELETE deletes. Truth in advertising!
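From the client side, picking a verb is a small matter. The following sketch builds requests with the standard urllib modules but never sends them, so it runs without a network. The example.com URL and the critters resource are placeholders, not a real service.

```python
from urllib.parse import urlencode
from urllib.request import Request

base = 'http://example.com/api/critters'   # hypothetical service URL

# GET arguments travel in the URL itself, after the question mark
query = urlencode({'legs': 4, 'color': 'brown'})
get_req = Request(base + '?' + query)       # GET is the default verb

# Other verbs are chosen with the method argument
head_req = Request(base + '/bear', method='HEAD')
delete_req = Request(base + '/bear', method='DELETE')

for req in (get_req, head_req, delete_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen() would actually send it; what happens next is up to the server.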
A RESTful client can also request one or more content types from the server by using HTTP request headers. For example, a complex service with a REST interface might prefer its input and output to be JSON strings.
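As a small sketch of how a client might say so, the request below carries an Accept header naming the format it wants back and a Content-Type header describing what it is sending. The URL and the payload are invented, and the request is only built, not sent.

```python
import json
from urllib.request import Request

url = 'http://example.com/api/critters'    # hypothetical service URL
payload = json.dumps({'name': 'bear', 'legs': 4}).encode('utf-8')

req = Request(
    url,
    data=payload,                            # request body, as bytes
    method='POST',
    headers={
        'Accept': 'application/json',        # the format we'd like back
        'Content-Type': 'application/json',  # the format we're sending
    },
)
print(req.get_method(), req.get_header('Accept'))
```

Whether the server honors the Accept header is its own decision; a service that only speaks XML is free to ignore the hint or refuse the request.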
Sometimes, you might want a little bit of information—a movie rating, stock price, or product availability—but the information is available only in HTML pages, surrounded by ads and extraneous content.
You could extract what you’re looking for manually by doing the following:
Type the URL into your browser.
Wait for the remote page to load.
Look through the displayed page for the information you want.
Write it down somewhere.
Possibly repeat the process for related URLs.
However, it’s much more satisfying to automate some or all of these steps. An automated web fetcher is called a crawler or spider (unappealing terms to arachnophobes). After the contents have been retrieved from the remote web servers, a scraper parses it to find the needle in the haystack.
If you need an industrial-strength combined crawler and scraper, Scrapy is worth downloading:
$ pip install scrapy
Scrapy is a framework, not a module such as BeautifulSoup. It does more, but it’s more complex to set up. To learn more about Scrapy, read the documentation or the online introduction.
If you already have the HTML data from a website and just want to extract data from it, BeautifulSoup is a good choice. HTML parsing is harder than it sounds. This is because much of the HTML on public web pages is technically invalid: unclosed tags, incorrect nesting, and other complications. If you try to write your own HTML parser by using regular expressions (discussed in Chapter 7), you’ll soon encounter these messes.
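To see the problem in miniature, here is a sketch that runs a naive regular expression and a real parser over the same sloppy HTML. It uses the standard library’s html.parser module as a stand-in for a forgiving parser, not BeautifulSoup itself, and the HTML snippet and LinkCollector class are made up for the occasion.

```python
import re
from html.parser import HTMLParser

# Deliberately sloppy HTML: an unclosed <p>, an uppercase tag with
# single quotes and no closing </a>, and an unquoted attribute value
html = """<p>Unclosed paragraph
<a href="one.html">one</a>
<A HREF='two.html'>two
<a class="nav" href=three.html>three</a>
"""

# A naive regular expression only matches the one form it was written for
naive = re.findall(r'<a href="([^"]+)">', html)

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, values unquoted
        if tag == 'a':
            self.links.extend(value for name, value in attrs if name == 'href')

collector = LinkCollector()
collector.feed(html)
print(naive)             # ['one.html']
print(collector.links)   # ['one.html', 'two.html', 'three.html']
```

You could keep patching the regular expression for each new variation, but the variations on real pages never seem to run out.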
To install BeautifulSoup, type the following command (don’t forget the final 4, or pip will try to install an older version and probably fail):
$ pip install beautifulsoup4
Now, let’s use it to get all the links from a web page.
The HTML a element represents a link, and href is its attribute representing the link destination. In the following example, we’ll define the function get_links() to do the grunt work, and a main program to get one or more URLs as command-line arguments:
def get_links(url):
    """Return the href of every link (a element) in the page at url."""
    import requests
    from bs4 import BeautifulSoup as soup
    result = requests.get(url)
    page = result.text
    # Name the parser explicitly so BeautifulSoup doesn't have to guess
    doc = soup(page, 'html.parser')
    links = [element.get('href') for element in doc.find_all('a')]
    return links

if __name__ == '__main__':
    import sys
    for url in sys.argv[1:]:
        print('Links in', url)
        for num, link in enumerate(get_links(url), start=1):
            print(num, link)
        print()
I saved this program as links.py and then ran this command:
$ python links.py http://boingboing.net
Here are the first few lines that it printed:
Links in http://boingboing.net/
1 http://boingboing.net/suggest.html
2 http://boingboing.net/category/feature/
3 http://boingboing.net/category/review/
4 http://boingboing.net/category/podcasts
5 http://boingboing.net/category/video/
6 http://bbs.boingboing.net/
7 javascript:void(0)
8 http://shop.boingboing.net/
9 http://boingboing.net/about
10 http://boingboing.net/contact
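Not everything in that list is a page you could fetch next: item 7 is a scrap of JavaScript, and an a element with no href at all would show up as None. If you wanted to tidy the results, one possible approach is sketched below. The clean_links() helper is hypothetical, and the sample list is hand-built from a few of the links above plus two invented entries.

```python
from urllib.parse import urljoin, urlparse

def clean_links(base_url, links):
    """Hypothetical helper: keep only web links, made absolute."""
    cleaned = []
    for link in links:
        if not link:                      # a elements without an href give None
            continue
        absolute = urljoin(base_url, link)
        if urlparse(absolute).scheme in ('http', 'https'):
            cleaned.append(absolute)
    return cleaned

raw = ['http://boingboing.net/about', 'javascript:void(0)', None, '/contact']
print(clean_links('http://boingboing.net/', raw))
# ['http://boingboing.net/about', 'http://boingboing.net/contact']
```

The urljoin() call also turns relative links such as /contact into full URLs, which matters if you plan to feed the results back into get_links().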
9.1. If you haven’t installed flask yet, do so now. This will also install werkzeug, jinja2, and possibly other packages.
9.2. Build a skeleton website, using Flask’s debug/reload development web server. Ensure that the server starts up for hostname localhost on default port 5000. If your computer is already using port 5000 for something else, use another port number.
9.3. Add a home() function to handle requests for the home page. Set it up to return the string It's alive!.
9.4. Create a Jinja2 template file called home.html with the following contents:
<html>
<head>
<title>It's alive!</title>
</head>
<body>
I'm of course referring to {{thing}}, which is {{height}} feet tall and {{color}}.
</body>
</html>
9.5. Modify your server’s home() function to use the home.html template. Provide it with three GET parameters: thing, height, and color.