Hosting Python web applications

As we discussed at the beginning of this chapter, in order to run a Python web application, we need a web server to host it. There are many web servers in existence today, and you will very likely have heard of several. Popular examples are Apache, nginx (pronounced engine-x), lighttpd (pronounced lighty), and Microsoft's Internet Information Services (IIS).

There is a lot of terminology around web servers and various mechanisms they can use to invoke Python web applications. We're going to take a very brief tour of the history of web applications to help explain some of these concepts.

CGI

In the early days of the Web, web servers were mostly required only to send clients HTML pages, or the occasional image file. As in the earlier figure of an HTTP request journey, these static resources would live on the hard disk of the server, and the web server's main task would be to accept socket connections from clients, map the URL of a request to a local file, and send the file back over the socket as an HTTP response.

However, with the rise of the need for dynamic content, web servers were given the ability to generate pages by invoking external programs and scripts, which we today call web applications. Web applications originally took the form of scripts or compiled executables that lived on disk next to the regular static content as part of the published web tree. The web server would be configured so that when a client requested these web application files, instead of just reading the file and returning it, the web server would launch a new operating system process and execute the file, returning the result as the requested HTML web page.

If we update our HTTP request's journey from our earlier image, our request's journey would now look something like this:

[Figure: HTTP request journey with CGI]

There obviously needs to be some kind of protocol for the web server and the web application to pass the HTTP request and the returned HTML page between them. The earliest mechanism for this was called the Common Gateway Interface (CGI). The web server would decompose the request into environment variables, which it would add to the environment of the handler program when it was invoked, and pass the body of the request, if there was one, to the program via its standard input. The program would then simply pipe the HTTP response it generated to its standard output, which the web server would catch and return to the client.
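To make the mechanism concrete, here is a minimal sketch of what a CGI handler does with the pieces the web server hands it. The function name `handle_cgi_request` and the page it produces are our own illustration; `REQUEST_METHOD`, `QUERY_STRING`, and `CONTENT_LENGTH` are standard CGI variable names. The environment, input, and output are passed in as parameters so the logic can be exercised without a web server.

```python
import html


def handle_cgi_request(environ, stdin, stdout):
    """Handle one CGI request, writing the HTTP response to stdout."""
    # The web server exposes request metadata as environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")

    # Any request body arrives on standard input. CONTENT_LENGTH says how
    # much to read (treated as a character count here for simplicity).
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else ""

    # The response goes to standard output: headers, a blank line,
    # then the generated page.
    stdout.write("Content-Type: text/html\r\n\r\n")
    stdout.write(
        "<html><body>"
        f"<p>Method: {html.escape(method)}</p>"
        f"<p>Query: {html.escape(query)}</p>"
        f"<p>Body: {html.escape(body)}</p>"
        "</body></html>\n"
    )
```

In a real CGI script, the function would be called once per process as `handle_cgi_request(os.environ, sys.stdin, sys.stdout)`, after importing `os` and `sys`.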

Because of its performance problems, however, CGI is falling out of favor, and writing a Python CGI application should be avoided if at all possible.

Recycling for a better world

CGI works, but the major drawback is that a new process has to be launched for each request. Launching processes is expensive in terms of operating system resources, and so this approach is very inefficient. Alternatives have been developed.

Two approaches became common. The first was to make web servers launch and maintain multiple processes at startup, ready to accept new connections, a technique known as pre-forking. With this technique, there is still a one-process-per-client relationship, but the processes are already created when a new client connects, improving response time. The processes can also be reused instead of being created anew for each connection.
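The pre-forking idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: it relies on `os.fork()`, so it runs on POSIX systems only, the function names `worker_loop` and `prefork` are hypothetical, and a production server would also supervise and restart its workers.

```python
import os


def worker_loop(listener):
    """Accept and answer connections forever inside one worker process."""
    while True:
        conn, _addr = listener.accept()
        with conn:
            conn.recv(65536)  # read (and ignore) the request
            body = f"Handled by worker {os.getpid()}\n".encode()
            conn.sendall(
                b"HTTP/1.0 200 OK\r\n"
                b"Content-Type: text/plain\r\n"
                b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n"
                + body
            )


def prefork(listener, num_workers):
    """Fork num_workers children up front and return their process IDs."""
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child process: serve requests on the shared listening socket
            # and never return to the caller's code.
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        pids.append(pid)  # parent process: remember the child
    return pids
```

The parent creates and binds the listening socket once, calls `prefork(listener, 4)`, and the workers then share the socket. Each incoming connection is picked up by whichever worker is waiting in `accept()`, and that worker is reused for later connections.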

Alongside this, web servers were made extensible and bindings were created to different languages so that the web application could be embedded within the web server processes themselves. The most commonly seen examples of these are the various language modules for the Apache web server for languages such as PHP and Perl.

With pre-forking and web application embedding, our request's journey might look like this:

[Figure: HTTP request journey with pre-forking and an embedded web application]

Here, the request is transformed by the language binding code, and the request our web application sees depends on the design of the binding itself. This approach to managing a web application works fairly well for general web loads, and remains a popular way to host web applications today. Modern web servers usually also offer multithreaded variants, where each process can handle requests using multiple threads, one for each client connection, further improving efficiency.

The second approach to solving CGI's performance problems was to hand off the management of the web application processes completely to a separate system. The separate system would pre-fork and maintain a pool of processes running the web application code. Like web server pre-forking, these could be reused for each client connection. New protocols were developed to allow the web server to pass requests to the external processes, the most notable being FastCGI and SCGI. In this situation, our journey would be:

[Figure: HTTP request journey with a separately managed pool of web application processes]

Again, how the request is transformed and presented to the web application depends on the protocol used.
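As an illustration of such a protocol, SCGI sends the request headers as a single length-prefixed block of NUL-terminated names and values, followed by a comma and then the request body. The sketch below shows both sides: what a web server would send and how an application process might decode it. The helper names `encode_scgi_request` and `parse_scgi_request` are hypothetical, and the parser assumes the whole request has already been read into memory.

```python
def encode_scgi_request(headers, body=b""):
    """Build the bytes a web server would send for one SCGI request."""
    # CONTENT_LENGTH must come first, and SCGI=1 must be present.
    fields = {"CONTENT_LENGTH": str(len(body)), "SCGI": "1", **headers}
    block = b"".join(
        name.encode("latin-1") + b"\x00" + value.encode("latin-1") + b"\x00"
        for name, value in fields.items()
    )
    return str(len(block)).encode() + b":" + block + b"," + body


def parse_scgi_request(data):
    """Split a complete SCGI request into (headers, body)."""
    length_text, sep, rest = data.partition(b":")
    if not sep:
        raise ValueError("missing length prefix")
    header_len = int(length_text)
    if rest[header_len:header_len + 1] != b",":
        raise ValueError("missing comma after header block")

    # Names and values alternate, each terminated by a NUL byte.
    fields = rest[:header_len].split(b"\x00")[:-1]
    if len(fields) % 2:
        raise ValueError("header name without a value")
    headers = {
        name.decode("latin-1"): value.decode("latin-1")
        for name, value in zip(fields[0::2], fields[1::2])
    }

    body_start = header_len + 1
    content_length = int(headers["CONTENT_LENGTH"])
    return headers, rest[body_start:body_start + content_length]
```

The headers the application receives are essentially the same variables CGI would have placed in the process environment, which is why porting between these protocols mostly affects the glue code rather than the application logic.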

Although in practice this is somewhat more complex to configure, it has advantages over embedding a copy of the application code in pre-forked web server processes. Primarily, the web application process pool can be managed independently of the web server process pool, allowing more efficient tuning of both.

Event-driven servers

Web client numbers continued to grow, though, and the need arose for servers to handle very large numbers of simultaneous client connections, numbers that proved problematic for the multiprocess approaches. This spurred the development of event-driven web servers, such as nginx and lighttpd, which can handle many thousands of simultaneous connections in a single process. These servers also use pre-forking, maintaining a number of event-driven processes in line with the number of CPU cores in the machine, so that the server's resources are fully utilized while still gaining the benefits of the event-driven architecture.
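The event-driven model can be illustrated with Python's standard `asyncio` library. This is not how nginx or lighttpd are implemented. It is only a small sketch of the same idea, in which one process interleaves many connections by switching between them whenever one is waiting on the network. The names `handle_client` and `serve` and the port number are our own choices.

```python
import asyncio


async def handle_client(reader, writer):
    """Serve one connection; each await lets other connections progress."""
    # Read the request line and headers up to the blank line (or EOF).
    while (await reader.readline()) not in (b"\r\n", b"\n", b""):
        pass
    body = b"Hello from a single process\n"
    writer.write(
        b"HTTP/1.0 200 OK\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n"
        + body
    )
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def serve(host="127.0.0.1", port=8000):
    """Run the event loop's server until the process is interrupted."""
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

Calling `asyncio.run(serve())` starts the server. No process or thread is created per client. Instead, the event loop keeps track of every open socket and resumes the matching handler when data arrives.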

WSGI

Python web applications were originally written against these early integration protocols: CGI, FastCGI, and the now mostly defunct mod_python Apache module. This proved troublesome, though, since Python web applications were tied to the protocol or server they had been written for. Moving them to a different server or protocol required some reworking of the application code.

This problem was solved with PEP 333, which defined the Web Server Gateway Interface (WSGI) protocol. This established a common calling convention for web servers to invoke web application code, similar to CGI. When web servers and web applications both support WSGI, servers and applications can be exchanged with ease. WSGI support has been added to many modern web servers and is nowadays the main method of hosting Python applications on the Web. It was updated for Python 3 in PEP 3333.
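The calling convention itself is small. A WSGI application is a callable that receives the request as a dictionary (much like CGI's environment variables) plus a `start_response` function, and returns an iterable of byte strings. A minimal sketch, in which the name `application` and the response text are our own choices, might look like this:

```python
def application(environ, start_response):
    """A minimal WSGI application, called by the server once per request."""
    # Request details arrive in the environ dictionary, using CGI-style keys.
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")

    # The status line and headers are handed back through start_response.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])

    # The response body is returned as an iterable of byte strings.
    return [body]
```

Because the application only depends on this convention, any WSGI-capable server can call it without changes to the code.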

Many of the web frameworks we discussed earlier support WSGI behind the scenes to communicate with their hosting web servers, Flask and Django included. This is another big benefit of using such a framework: you get full WSGI compatibility for free.

There are two ways a web server can use WSGI to host a web application. First, it can directly support hosting WSGI applications. Pure Python servers such as Gunicorn follow this approach, and they make serving Python web applications very easy. This is becoming a very popular way to host Python web applications.

The second approach is for a non-Python server to use an adapter plugin, such as Apache's mod_wsgi, or the mod_wsgi plugin for nginx.
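The first approach, direct hosting, can be tried without installing anything by using `wsgiref`, the reference WSGI server in the Python standard library. It stands in here for a production server such as Gunicorn and is intended for development only. The names `hello_app` and `make_dev_server`, and the default port, are our own choices for this sketch.

```python
from wsgiref.simple_server import make_server


def hello_app(environ, start_response):
    """A tiny WSGI application to have something to serve."""
    body = b"Served directly over WSGI\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def make_dev_server(app, host="127.0.0.1", port=8000):
    """Wrap a WSGI app in the standard library's reference server."""
    return make_server(host, port, app)
```

Calling `make_dev_server(hello_app).serve_forever()` serves the application until interrupted. With Gunicorn installed, the same application would typically be started from the command line by naming the module and callable, for example `gunicorn mymodule:hello_app`, where `mymodule` is a placeholder for wherever the code lives.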

The exception to the WSGI revolution is event-driven servers. WSGI doesn't include a mechanism to allow a web application to pass control back to the calling process. Hence there is no benefit to using an event-driven server with a blocking I/O style WSGI web application: as soon as the application blocks, for example, for database access, it will block the whole web server process.

Hence, most event-driven frameworks include a production-ready web server. Making the web application itself event-driven and embedding it in the web server process is really the only way to host it. To host web applications with these frameworks, check out the framework's documentation.
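The blocking problem described above can be demonstrated with a small experiment. The sketch below uses `asyncio` as a stand-in for an event-driven server process and `time.sleep()` as a stand-in for a blocking database call. All of the function names and the 0.2 second delay are our own choices.

```python
import asyncio
import time


async def blocking_handler():
    # A blocking call never yields, so the whole event loop stalls.
    time.sleep(0.2)


async def cooperative_handler():
    # Awaiting hands control back to the event loop while we wait.
    await asyncio.sleep(0.2)


async def time_concurrent(handler, clients=3):
    """Run `clients` handlers concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(clients)))
    return time.perf_counter() - start
```

Running `asyncio.run(time_concurrent(blocking_handler))` takes roughly the sum of the three delays, because each handler holds the process until it finishes. The cooperative version takes roughly the length of a single delay, since the waits overlap.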
