Optimizing individual upstreams

You may remember from previous chapters that Nginx has two main methods of generating a response to a request: one very specific (reading a static file from the filesystem) and the other comprising a whole family of so-called upstream modules. An upstream is an external server to which Nginx proxies a request. The most popular upstream module is ngx_proxy; others include ngx_fastcgi, ngx_memcached, ngx_scgi, and so on. Because serving only static files is rarely enough for a modern website, upstreams are an essential part of any comprehensive setup. As we mentioned at the beginning of this chapter, the upstreams themselves are usually the reason your website has performance trouble: this is where all the web application processing happens, and your developers are responsible for that part. In the following sections, we briefly describe the major stacks and platforms used to implement business logic on the upstream behind Nginx and the directions you should at least look in for clues about what to optimize.

Optimizing static files

Any web application will contain static resources that do not change and do not depend on the user currently using the application. Those are usually known as static files in webmaster parlance and consist of all the static images, CSS, JavaScript, and some extra data, for example, the crossdomain.xml files used by the cross-domain access control policies of browser plugins. Serving this data directly from the application is usually supported to facilitate simple setups without any frontend, intermediate, accelerating server such as Nginx. Nginx's built-in HTTP proxy will happily pass such responses through and, with local caching, may even do so without any noticeable performance loss. However, such a setup is not recommended as a long-term solution if you strive for maximum performance.

One universal step that we feel the need to recommend (or remind you of) is moving as much of the static data as possible from the upstream to the direct control of Nginx. It will make your application more fragmented, but it is also a very good performance optimization, trumping many other potential and much harder to implement methods. If your upstreams serve static files, then you need to make them available as files to Nginx and serve them directly. This might be the first thing you do when you receive a new legacy upstream to optimize. It is also a very easy task to accomplish yourself or to implement as a part of the whole deployment process.
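As a minimal sketch, assuming a conventional layout in which the application's assets have been copied under /var/www/static and the dynamic upstream listens on port 8080 (both placeholders for your setup), the split might look like this:

```nginx
server {
    listen 80;
    server_name example.com;

    # Static assets are served straight from disk by Nginx itself.
    location ~* \.(?:css|js|png|jpe?g|gif|ico|xml)$ {
        root /var/www/static;
        expires 30d;          # let browsers cache assets that rarely change
        access_log off;       # optional: reduce disk I/O on hot files
    }

    # Everything else still goes to the dynamic upstream.
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}
```

The regular-expression location keeps the change minimal; once the asset tree is stable, prefix locations such as /css/ or /img/ are cheaper to match.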

Optimizing PHP backends

For several years, the modern way to run PHP applications behind an Nginx frontend has been PHP-FPM, the FastCGI Process Manager. As the name suggests, it uses the FastCGI protocol and requires the FastCGI upstream module (ngx_fastcgi) in Nginx. However, when dealing with inherited legacy PHP websites, you may still meet older ways of running the code, and those will be your first candidates for optimization.

There is the official Apache way using the mod_php Apache module. This module embeds the PHP interpreter directly into (each and every!) Apache process. Most of the time, Apache websites you inherit will be configured to run this way. The main upside of embedded interpreters is well known: code may be kept in an intermediate form between requests instead of being reinterpreted every time. The mod_php module does that wonderfully, and some people call it the single reason why PHP gained popularity on the Web in the first place. Still, the way to deal with mod_php in 2016 is to get rid of it, together with Apache.

Many PHP codebases can be moved from mod_php to PHP-FPM almost effortlessly. After this, you will change your main Nginx upstream from HTTP proxying to speaking the FastCGI protocol directly to your PHP processes, which are kept running and ready by the FPM.
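A sketch of the switch, assuming PHP-FPM listens on a local Unix socket (the socket path varies between distributions):

```nginx
# Before: proxy_pass http://127.0.0.1:8080;  (Apache + mod_php)
# After: talk FastCGI to PHP-FPM directly.
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_pass unix:/var/run/php-fpm.sock;   # or 127.0.0.1:9000
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}
```

Note that SCRIPT_FILENAME must point at the real script on disk; getting this parameter wrong is the most common cause of blank pages after such a migration.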

Sometimes, your developers will need to invest some resources, mostly into restructuring and refactoring the code, so that it can run in a separate process without any help from Apache. One particularly difficult case is code that relies heavily on calling into Apache internals. Fortunately, this is not nearly as common in PHP codebases as it was in mod_perl codebases. We will discuss dealing with Perl-based websites later.

Another really old (and odd) way to run PHP is via CGI scripts. Every web administrator has written, or will write, a fair number of temporary CGI scripts: the kind of temporary scripts that live on through generations of hardware, managers, and business models. They rarely power user-facing parts of production. CGI was never popular with PHP because of the ubiquity and rather good quality of mod_php and Apache. Nevertheless, you may have some in your legacy, especially if that code had, or has, a chance of running on Windows.

CGI scripts are executed as separate processes for each request/response pair and are therefore prohibitively expensive. The only upsides of CGI are increased compatibility and another degree of privilege separation, and those are trumped by the performance penalty in all but the most exotic scenarios. Nginx will make a CGI-powered portion of your website behave significantly better by buffering the output and releasing backend resources earlier. You still have to plan the rewrite of those parts to run as FastCGI under FPM as soon as possible.
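As an interim measure while that rewrite is planned, legacy CGI scripts can be run behind Nginx through a small FastCGI-to-CGI wrapper such as fcgiwrap, with Nginx buffering the output as described. A sketch, with the socket and document paths as assumptions for your system:

```nginx
location /cgi-bin/ {
    root /var/www;                                # assumed document layout
    include fastcgi_params;
    fastcgi_pass unix:/var/run/fcgiwrap.socket;   # assumed fcgiwrap socket
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}
```

This keeps the scripts working unchanged, but each request still forks a process on the backend, so it buys time rather than performance.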

PHP-FPM uses the same prefork model as Apache 1.x, which puts some familiar knobs under your control. For example, you may configure the number of worker processes FPM starts, the upper limit on requests processed by a single child, and the size of the pool of available child processes. All of these parameters may be set via the php-fpm.conf file, which is usually installed directly in /etc and, following a good convention, includes /etc/php-fpm.d/*.conf.
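A sketch of such a pool definition (typically one of the /etc/php-fpm.d/*.conf files); the numbers are purely illustrative and should be sized to your RAM and the memory footprint of one request:

```ini
[www]
listen = /var/run/php-fpm.sock

pm = dynamic               ; grow and shrink the pool with load
pm.max_children = 50       ; hard cap on worker processes
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 15
pm.max_requests = 500      ; recycle a child after this many requests
```

The pm.max_requests recycling limit is the usual safety valve against slow memory leaks in application code.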

Optimizing Java backends

The Java ecosystem is so huge that there is a whole bookshelf devoted solely to different Java web servers, and we cannot delve deeply into the topic here. If you as an administrator have never had any experience with Java web applications, you will be happy to know that most of the time, those apps run their own web servers that do not depend on Apache. Popular Java web servers that you may encounter inside your upstreams include Apache Tomcat, Jetty, and JBoss/WildFly. Java applications are usually built on top of huge, comprehensive frameworks that employ a web server as one of their components. Your Nginx web accelerator will talk to a Java upstream over normal HTTP using the ngx_proxy module, so all the usual ngx_proxy optimizations apply. See the note on caching later in this chapter for examples.
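As a sketch, assuming the Java server listens on the conventional port 8080, plain HTTP proxying with upstream keepalive (which avoids a TCP handshake per request) looks like this:

```nginx
upstream java_app {
    server 127.0.0.1:8080;
    keepalive 32;                    # pool of idle connections to the backend
}

server {
    listen 80;

    location / {
        proxy_pass http://java_app;
        proxy_http_version 1.1;      # keepalive requires HTTP/1.1 ...
        proxy_set_header Connection ""; # ... and a cleared Connection header
        proxy_set_header Host $host;
    }
}
```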

There is little you can do to make a Java application perform better without getting your hands dirty deep inside the code. Some of the steps available from the level of system administration are:

  1. Choosing the right JVM. Many Java web servers support several different Java Virtual Machine implementations. The HotSpot JVM from Oracle (Sun) is considered one of the best, and you will probably start with that. But there are others, some of them commercially available, for example, the Azul Zing VM. They might provide a small performance boost. Unfortunately, changing the JVM vendor is a huge step prone to incompatibility problems.
  2. Tuning threading parameters. Java code is traditionally written using threads, which are a native and natural feature of the language. JVMs are free to implement threads using whatever resources they have: historically, there was a choice between operating system-level threads and the so-called "green threads" implemented in userland, each with its own advantages and disadvantages, although modern mainstream JVMs use OS threads. Threads are usually grouped into pools, which are preforked in a fashion very similar to what Apache 1.x does with processes. Thread pools use a number of models to optimize both memory and performance, and you, as the administrator, will be able to tune them.

Optimizing Python and Ruby backends

Python and Ruby both built their strength as more open and clear alternatives to Perl and PHP, in an age when web applications were already one of the dominant ways to deploy business logic. They both started late and with a clear goal of being very web-friendly. There were both mod_python and mod_ruby Apache modules that embedded the interpreters into Apache web server processes, but they quickly went out of fashion. The Python community developed the Web Server Gateway Interface (WSGI) as a generic interface for writing web applications regardless of deployment options. This allowed free innovation in the actual web server space, which mostly converged on a couple of standalone WSGI servers or containers (such as Gunicorn and uWSGI) and the mod_wsgi Apache module. All of them may be used to run a Python web application without changing any code.

Nginx, in turn, ships the ngx_http_uwsgi_module, which speaks the native binary protocol of the uWSGI server, so a uWSGI-hosted application can be plugged in without an extra HTTP hop. The actual migration path may be a little more complex. If the backend application used to run under Apache + mod_wsgi, then, by all means, move it to a standalone WSGI server behind Nginx and ditch Apache. For the sake of smoothness and stability, you may start with a simpler ngx_proxy configuration (both Gunicorn and uWSGI can speak HTTP) and then switch to the uwsgi protocol.
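A sketch of the final uwsgi-protocol configuration, with the socket path as an assumption matching whatever the uWSGI server is configured to bind:

```nginx
location / {
    include uwsgi_params;                      # standard CGI-style variables
    uwsgi_pass unix:/var/run/uwsgi/app.sock;   # assumed uWSGI socket path
}
```

The interim ngx_proxy variant is the same proxy_pass configuration used for any HTTP backend, which is what makes the two-step migration low-risk.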

You may also encounter an application that uses long-polling (sometimes named Comet) or WebSockets and runs on a special web server, for example, Tornado (of FriendFeed fame). These are a problem mostly because long-lived, synchronous communication between the web server and the clients defeats the main advantage of Nginx as an accelerating reverse proxy: the part of the server that processes a request won't be freed quickly for another request by handing the byte pushing off to the Nginx frontend. Modern Nginx supports proxying both Comet requests and WebSockets, of course, but without the acceleration that you may have gotten used to.
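Proxying such connections requires passing the HTTP Upgrade handshake through. A sketch, with the backend address and path as placeholders:

```nginx
location /ws/ {
    proxy_pass http://127.0.0.1:8888;            # assumed Tornado-style backend
    proxy_http_version 1.1;                      # Upgrade needs HTTP/1.1
    proxy_set_header Upgrade $http_upgrade;      # pass the handshake through
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 1h;   # long-lived connections need long timeouts
}
```

Without the lengthened proxy_read_timeout, Nginx will close an idle WebSocket after its default 60-second read timeout.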

The Ruby ecosystem went a slightly different way because there was (and still is) a so-called killer app for Ruby, namely the Ruby on Rails web application framework. Most Ruby web applications are built on Ruby on Rails, and there was even a joke that it was high time to rename the whole language Ruby on Rails because nobody used Ruby without the Rails. It is a wonderfully designed and executed web framework with many revolutionary ideas that inspired a whole wave of rapid application development techniques throughout the industry. It also decoupled application developers from the problems of deploying their work by providing a web server that could be exposed to the Internet right away.

The current preferred Ruby on Rails deployment options are either Phusion Passenger or a cluster of Unicorn web servers. Both options are fine for your task of migrating to Nginx. Phusion Passenger is a mature example of providing its own in-process code: it contains modules for both the Apache and Nginx web servers. So, if you are lucky, you will switch from one to the other effortlessly. Passenger still runs worker processes outside of your main Nginx workers, but the module lets Nginx communicate with them freely; it is a good example of a custom upstream module. See the Passenger guide at https://www.phusionpassenger.com/library/deploy/nginx/deploy/ruby/ for the actual instructions. Passenger may also run in standalone mode, exposing HTTP to the world. That is also how Unicorn deploys Ruby applications. You know how to deal with that: the universal helper, ngx_proxy.
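A sketch of fronting a Unicorn cluster; Unicorn is conventionally bound to a Unix socket, whose path here is an assumption:

```nginx
upstream unicorn {
    # fail_timeout=0 retries the socket immediately while workers restart
    server unix:/var/run/unicorn.sock fail_timeout=0;
}

server {
    listen 80;

    location / {
        proxy_pass http://unicorn;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The same proxy block works unchanged for Passenger running in standalone HTTP mode.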

Optimizing Perl backends

Perl was the first widely used server-side programming language for the Web. We may say that it was Perl that popularized the notion of dynamically generated web pages and paved the way for the galore of web applications we experience today. There are still plenty of Perl-powered web businesses of various sizes, from behemoths such as https://www.booking.com to smaller, feisty, ambitious startups such as DuckDuckGo. You might also have seen a couple of MovableType-powered blogs; this is a professional blogging platform developed by Six Apart and then resold several times.

Perl is also the most popular language for writing CGI scripts, and that is the single reason it is considered slow. CGI is a simple interface for running external programs from inside a web server. It is rather inefficient because it usually involves forking an operating system-level process and then shutting it down after a single request. This model, plus the interpreted nature of Perl, means that Perl CGI scripts are so suboptimal that they serve as the textbook example of an inefficient web development platform.

If you have a user-facing, dynamic web page generated by a CGI script run from Apache, you have to get rid of it. See below for details.

There are a number of more advanced ways to run Perl code in production. Partly inspired by mod_php's success, there is a long-running project named mod_perl, an Apache module embedding the Perl interpreter into Apache processes. It is also highly successful because it is stable and robust, and it powers a lot of heavily loaded websites. Alas, it is also rather complex, both for the developer and the administrator. Another difference from the mod_php Apache module is that mod_perl failed to provide strong separation of environments, which is vital for virtual hosting businesses.

Anyway, if you have inherited a website based on mod_perl, you have several options. First, there might be a cheap way to move to the PSGI or FastCGI models, which will allow you to get rid of Apache; the use of the Apache::Registry module, which emulates a CGI environment inside mod_perl, may be a good sign of such a situation. Second, the code may be written in a way that couples it tightly to Apache: mod_perl provides an interface for hooking deeply into Apache's internals, which, while offering several interesting capabilities to the developer, also makes migration much harder. The developers will have to investigate the methods used in the software and make the final decision. They may decide to leave Apache + mod_perl alone and continue to use it as a heavy, over-capable process manager.

Moving CGI to mod_perl is nowadays never a good way forward; we do not recommend it.

There are a number of FastCGI managers for Perl similar to the PHP-FPM described earlier. They are all fortunate options for you as the Nginx administrator because most of the time the migration will be smooth and easy.

One of the more interesting recent ways to run Perl code in web servers is the so-called PSGI (Perl Web Server Gateway Interface). It is more or less a direct port of the Rack architecture from the Ruby stack to Perl. Interestingly, PSGI was invented and implemented in a world where Nginx was already popular; therefore, if you have a web application that uses PSGI, it was most probably developed and tested behind Nginx. No need to port anything. PSGI might be the most important target architecture when upgrading CGI or mod_perl applications.

Bigger Perl web frameworks usually offer a number of ways to run the applications. For example, both Dancer and the older Catalyst provide glue scripts to run the same application as a separate web server (which you might expose to the world with the help of the Nginx ngx_proxy upstream), as a mod_perl application, or even as a CGI script. Not all of those methods are suitable for production, but they will definitely help during migration. Never accept "we should rewrite everything from scratch" as a recommendation from the developers before weighing the other options. If the application was written during the last 3 to 4 years, it should have PSGI implemented directly or via its framework.

PSGI applications are run with the help of special PSGI servers, such as Starman or Starlet, that speak simple HTTP to the outside world. Nginx will use the ngx_proxy upstream for such applications.
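A sketch of fronting such a PSGI server; port 5000 is Starman's default, and the other values are placeholders:

```nginx
location / {
    proxy_pass http://127.0.0.1:5000;   # Starman's default HTTP port
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```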
