Solving problems with cache

We spent a lot of time on providing good caching, that is, saving intermediate results and serving the saved copies instead of recalculating them from scratch for equal requests. This works perfectly only in a perfect world (for example, a purely functional world where functions and, by extension, GET/HEAD HTTP requests have no side effects). In the real world, two equal requests may sometimes lead to different responses. There are two basic reasons for this: the earlier-mentioned side effects, which change the state despite the perceived idempotence of GET/HEAD, or a flawed equality relation between requests. A good example of the latter is ignoring wall time when the response depends on it.

Such problems usually manifest themselves as complaints about seeing stale versions of some pages on your website or seeing pages that belong to other users. Although you can tolerate the first type to some extent (for example, as a compromise for performance), the second type is a major offense and a blocker for the operation of your business.

The hunt for misbehaving caches is a process that involves the same two sides that we discussed in the previous chapter. Caching may happen both inside Nginx, as an effect of the upstream caching directives, and on the side closer to the client: either the very browser that initiated the request or one of the intermediate caching proxies. The effect of client-side caches is usually smaller these days, so it is safer to start by switching them off first. You need to have this directive in all scopes:

expires -1;

Any negative value will work. This instructs Nginx to emit a Cache-Control: no-cache HTTP response header alongside the content, which effectively disables client-side caching, with a couple of caveats. First, we have no direct control over those caches, and they are free to deviate from the standards of the modern web. For example, they may be configured to ignore no-cache in an ill-advised attempt to save on traffic. The authors personally debugged a couple of cases of such overzealous frugality, and it was a nightmare. Second, even fully compliant caches may lag, because to receive the no-cache instruction they need to reach the origin server, and actively avoiding exactly that is the whole point of caching.
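As a sketch of what "all scopes" may mean in practice (the server and location blocks here are assumptions for illustration), note that expires is inherited downwards, so a single http-level declaration is enough unless a nested block sets its own:

http {
    expires -1;     # inherited by all server/location blocks below

    server {
        location /static/ {
            # a pre-existing expires directive here would override the
            # http-level one, so repeat it during troubleshooting
            expires -1;
        }
    }
}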

The second step in this troubleshooting process is switching off the caching inside Nginx itself. As was explained in the previous chapter, each Nginx upstream module has a family of directives that configure caching for that particular upstream connection. The main switch for the whole mechanism is the *_cache directive. In the case of the FastCGI upstream (ngx_http_fastcgi_module), the directive looks like this:

fastcgi_cache zone;

Here, zone is an identifier of the so-called cache zone, which is basically a named collection of caching configuration parameters, a caching profile of sorts. To switch caching off, you use the special zone name off.
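As a minimal sketch (the location pattern and socket path are assumptions; use whatever block currently carries your fastcgi_cache directive), the switch looks like this:

location ~ \.php$ {
    fastcgi_cache off;     # no cache lookups, no storing in this scope
    fastcgi_pass unix:/run/php-fpm.sock;
    include fastcgi_params;
}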

It will take immediate effect (the usual cycle of nginx -t and then sudo service nginx reload, or the analog for your distribution, should be second nature by this time), but it may also overwhelm your actual application upstream by significantly increasing the incoming request rate. Be aware of that. You may take smaller steps in troubleshooting the cache by using the *_cache_bypass or *_cache_valid directives in a smart way. The first one provides a way to skip the cache for some of the responses altogether, and the second is a quick-and-dirty way to limit the age of the entries in the cache.
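For instance, here is a sketch of the first approach (the nocache query argument name is an arbitrary choice for illustration): bypass the cache only for requests that carry a marker you control, so regular traffic is still served from the cache:

fastcgi_cache_bypass $arg_nocache;
fastcgi_no_cache     $arg_nocache;

The first directive skips the cache lookup when the argument is present and non-empty; the companion fastcgi_no_cache directive also prevents the freshly fetched response from being stored, so your test requests do not pollute the cache.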

The *_cache_valid directive does not override the expiration parameters set via HTTP response headers from the upstream application. So for it to be effective, you will also need to remove those headers with a *_ignore_headers directive first.

Again, the asterisk here stands for the actual upstream type; in the case of a FastCGI upstream, you will use the fastcgi_cache_valid and fastcgi_ignore_headers directives. A simple example looks like this:

fastcgi_ignore_headers "X-Accel-Expires" "Expires" "Cache-Control";
fastcgi_cache_valid 1m;

It will force caching of the responses for 1 minute (note that with no response codes listed, fastcgi_cache_valid applies only to 200, 301, and 302 responses by default). Unfortunately, it will also cache responses that the upstream does not intend to be cached, because Nginx will now ignore Cache-Control: no-cache as well. Be careful not to leave this troubleshooting configuration in production.
