Chapter 4. Optimizing Website Performance

One of the most popular reasons to migrate to Nginx is the pursuit of better performance. Over the years, Nginx has acquired a reputation as a silver bullet, a speed beast. Sometimes this reputation harms the project, but it is definitely earned. In many situations, that is exactly what happens: you add Nginx to a website setup as if it were a magic ingredient, and the website becomes faster. We will not explain the basics of how to set up Nginx because you probably know them well already. In this chapter, we are going to delve into why this speedup happens and into the lesser-known options that will help you squeeze more out of your website.

We will cover the following topics in this chapter:

  • How Nginx processes requests
  • Nginx caching subsystems
  • Optimizing the upstreams
  • Some new Nginx features such as thread pools
  • Other performance issues

The overwhelming majority of all performance problems people have with Nginx-powered websites are actually in the upstreams. We will mention some of the methods you may use to tackle the challenge of optimizing your upstream application servers, but we will mostly concentrate on Nginx itself. You will have to understand the inner workings of Nginx and of reverse proxying in general, so we devote a good part of the chapter to explaining the principles that let Nginx run circles around older web servers in terms of performance.

The bad news is that you probably won't be able to optimize Nginx itself very much. If you embarked on a project of making your website significantly faster and started by inserting Nginx between the application and the users, then you have probably already taken the most important step towards your goal. Nginx is extremely good at avoiding extra, unneeded work, and that is the core of any optimization.

Still, some of the configuration defaults may be too conservative for the sake of compatibility, and we will talk about those as well.
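To give a taste of what we mean, here is a short sketch of a few directives whose built-in defaults lean towards compatibility. The values shown are illustrative assumptions, not universal recommendations:

    # A hedged example of defaults worth revisiting; tune to your workload
    http {
        sendfile          on;    # off by default; serves files with sendfile(2),
                                 # skipping an extra copy through userspace
        tcp_nopush        on;    # off by default; only takes effect with sendfile
        keepalive_timeout 30s;   # the default of 75s keeps idle connections
                                 # around longer than most sites need
        gzip              on;    # off by default; trades some CPU for bandwidth
    }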

Why is Nginx so fast?

The question is intentionally formulated in an oversimplified way. This is what you might hear from your boss or client: let us migrate from old technologies to Nginx because it will make our website faster and its users happier. The migration process is described in thousands of online articles and even some books, and we will not write about it here. Many of our readers have probably gone down that path several times and know the facts: first, it is usually true that websites get faster, and second, it is not usually a full migration. You will rarely dispose of Apache completely and plug Nginx in its place. Although this "total conversion" happens too, most of the time you start by inserting Nginx between Apache and the Internet. To understand why this is okay, why it helps at all, and how to move forward from there, read on.
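As a reminder of what such an insertion looks like in practice, here is a minimal, hypothetical sketch: Nginx takes over port 80, while Apache is assumed to have been moved to 127.0.0.1:8080 (the names and addresses are placeholders):

    server {
        listen      80;
        server_name example.com;                 # placeholder domain

        location / {
            proxy_pass http://127.0.0.1:8080;    # Apache, now listening locally
            proxy_set_header Host             $host;
            proxy_set_header X-Real-IP        $remote_addr;
            proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
        }
    }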

To describe the main conceptual change implemented by using Nginx as a reverse proxy, we will use, for simplicity, the processing model of Apache 1.x, a very old piece of software written in pre-multithreading traditions. The current Apache version, 2.x, may use a slightly more efficient model based on threads instead of processes, but compared to Nginx, the two models look very similar, and the older one is easier to explain.

This is a simple diagram of how one HTTP request-response pair is processed:

[Diagram: the processing of a single HTTP request-response pair]

Here is an explanation of the diagram:

  1. A user's browser opens a connection to your server using TCP.
  2. The web server software that runs on your server and listens on a particular set of TCP ports accepts the connection, dedicates a part of itself to processing this connection, and returns to listening for and accepting other incoming connections. In the case of the Apache 1.x model, the dedicated part is a child process that was forked beforehand and has been waiting in the pool.
  3. There are usually limits on how many concurrent connections may be processed, and they are enforced at this step. It is very important to understand that this is the step where scaling happens.
  4. The dedicated part of the web server software reads the actual request URI, interprets it, and finds the relevant file or some other way to generate the response; it may even be an error message, it doesn't matter. It starts sending this response into the connection.
  5. The user's browser receives the bytes of the response one by one and turns them into pixels on the user's screen. This is real work, and a long job at that. Data travels over hundreds of kilometers of wire and optical fiber, is emitted into the air as electromagnetic waves, and is then "condensed" by induction into current again. From the viewpoint of your server, most of your users are on excruciatingly slow networks. The web server is literally feeding those browsers large amounts of data through a straw.

There is nothing you can do about step 5. The last mile will always be the slowest link in the chain between your server and the user. Nginx makes a conceptual optimization at step 2 and scales much better as a result. Let us explain that at greater length.

Because of slow client connections, a snapshot of any popular website's server software at any particular moment looks like this: a handful of requests that are actually being processed, in the sense that the CPU, memory, and disks are doing real work on them, and a couple of thousand requests for which all processing is done, whose responses are already generated and are very slowly, piece by piece, being fed into the narrow connections to the users' browsers. Again, this is a simplified model, but it is quite adequate to explain what actually happens.

To implement scaling at step 2, the original Apache 1.x uses a mechanism that is very natural for all UNIX-based systems: it forks. There are some optimizations, for example, a pool of processes forked beforehand (hence, the "prefork" model), and Apache 2.x may use threads instead of processes (also with pregenerated pools), but the idea is the same: scaling is achieved by handing individual requests to a group of OS-level entities, each of which works on a request and then sends the data to the client. The problem is that those entities are rather big; you don't need just a group, but more like a horde of them, and most of the time they do a very simple thing: they push bytes from a buffer into a TCP connection.

Nginx and other state machine-based servers significantly optimize step 2 by not making big, complex OS-level processes or threads do a simple job while hogging memory at the same time. This is the essence of why Nginx suddenly makes your website faster: it manages to slowly feed all those thousands of very bandwidth-limited client connections while using very little memory.
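To make this less abstract, here is a sketch of how compactly the event model is expressed in the configuration. The numbers are assumptions for illustration, not tuned values:

    worker_processes auto;          # one single-threaded worker per CPU core

    events {
        use epoll;                  # efficient event notification on Linux;
                                    # Nginx normally picks this automatically
        worker_connections 8192;    # each worker multiplexes thousands of
                                    # client sockets inside one event loop
    }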

An inquisitive reader may ask why adding Nginx as a reverse proxy, without removing Apache, still saves memory and speeds up websites. We believe you should already have all the knowledge to come up with the correct answer. We will mention the most important part as a hint: the horde of Apaches is not needed anymore, because Apache now only does the response generation, the smartest and hardest part of the job, while offloading the dumb job of pushing bytes into thousands of slow connections. The reverse proxy acts as a client on behalf of all the users' browsers, with one very important distinction: this client sits very close to the server and is capable of receiving the bytes of the response lightning fast.
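The mechanism that makes this offloading work is response buffering, which is enabled by default. Here is a hedged sketch of the relevant directives; the backend address and the sizes are illustrative assumptions:

    location / {
        proxy_pass http://127.0.0.1:8080;  # the same assumed Apache backend
        proxy_buffering on;                # the default, shown for emphasis:
                                           # read the response quickly, free the
                                           # backend, then drip-feed the client
        proxy_buffer_size 8k;              # buffer for the response headers
        proxy_buffers 16 8k;               # buffers for the response body
        proxy_max_temp_file_size 64m;      # spill oversized responses to disk
    }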

So, the secret sauce of Nginx's performance is not magical code quality (although it is written very well) but the fact that it saves system resources, mostly memory, by not making huge copies of data for each individual request it processes. Interestingly enough, modern operating systems all have different low-level mechanisms to avoid excessive copying of data. Long gone are the times when fork() literally created a whole copy of all code and data. As virtual memory and network subsystems grow more and more sophisticated, we may someday end up with systems where the state machine, as a model for coding tight event-processing loops, is no longer needed. For now, it still brings noticeable improvements.
