Chapter 10. Apache Proxy and Caching Support

Understanding the Need for Caching and Proxies

HTTP is a very simple, yet powerful, protocol. This chapter explains how to take advantage of caching and proxying features that allow you to implement scalable, flexible architectures. Caching reduces the load on your servers and speeds up access to your site by quickly returning frequently requested content. Proxying creates a control point for HTTP requests, which can be used to unify content from various backend servers as well as to improve performance.

Understanding Forward and Reverse Proxies

There are different kinds of web proxies. A traditional HTTP proxy, also called a forward proxy, accepts requests from clients (usually web browsers), contacts the remote server, and returns the responses.

A reverse proxy is a web server that is placed in front of other servers, providing a unified frontend and acting as a gateway. As far as the web browsers are concerned, the reverse proxy is the “real” server, as that is the only one they interact with. The reverse proxy relays requests as necessary to the backend servers.

Differences Between Apache 1.3, 2.0, and 2.2

In Apache 1.3, caching support was part of mod_proxy and could not be used separately. In 2.0, the functionality was split into two separate modules, although the resulting code was considered experimental. This changed in 2.1 and 2.2, where the functionality is finally considered mature.

Enabling mod_proxy Support

Apache 1.3

--enable-module=proxy

Apache 2

--enable-proxy
--enable-proxy-connect
--enable-proxy-ftp
--enable-proxy-http
--enable-proxy-balancer (Apache 2.1 and later)
--enable-proxy-ajp (Apache 2.1 and later)

To enable proxy support in Apache, you need to enable the main proxy module and some or all of the three supported backends: HTTP, CONNECT, and FTP. The CONNECT option allows SSL connections to pass untouched via the proxy. The FTP backend allows the proxy server to act as a gateway to access remote FTP servers via a normal HTTP browser. Apache 2.1 and later versions include two additional proxy modules: balancer, which provides load-balancing support, and ajp, which provides support for the AJP protocol, commonly used to communicate with Tomcat and other servlet engines.

Enabling Forward Proxy Support

ProxyRequests on
<Proxy *>
Order deny,allow
Deny from all
Allow from 10.0.0.0/255.255.255.0
</Proxy>

Forward proxies were popular in the early days of the Internet, as they allowed several machines to easily share a connection to the outside world. Most proxy servers also include caching features, which prove useful when sharing slow connections, as well as offering isolation from the outside world. Fast connections and built-in NAT (Network Address Translation) in most gateway devices have significantly reduced the need for forward proxies. Nowadays, they are most commonly implemented when organizations need to control their employees’ browsing, using the proxy to log, filter, and authorize access to websites. This is starting to change: as spyware and viruses become more common, organizations are implementing filtering proxies that remove these threats before they reach the user’s desktop. Proxies have also found new life in the wireless network world as gateways.

You can enable forward proxy functionality using ProxyRequests On, as shown in the example. It is a good idea to restrict proxy support to only authorized clients, for the reasons explained in Chapter 6, “Security and Access Control.” You can do so using the <Proxy> container directive. The example shows how to restrict proxy access to a specific network space.
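As a minimal sketch, the same restriction can also be written in CIDR notation, which Allow from accepts as well; 10.0.0.0/24 is equivalent to the 10.0.0.0/255.255.255.0 mask used in the example. The ProxyVia line is an optional addition for illustration and is not part of the original example.

```apache
# Forward proxy limited to one internal network (Apache 2.0/2.2 access-control syntax)
ProxyRequests On
# Optional: add a Via: header so that proxied requests can be traced
ProxyVia On
<Proxy *>
    Order deny,allow
    Deny from all
    # CIDR form of 10.0.0.0/255.255.255.0
    Allow from 10.0.0.0/24
</Proxy>
```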

Using a Reverse Proxy to Unify Your URL Space

ProxyPass /crm http://crm.example.com/
ProxyPass /bugzilla http://backend.example.com/bugzilla

A reverse proxy can provide a unified frontend to a number of backend resources, associating certain URLs on the frontend machine to specific backend web servers. For example, you may have one server running a CRM application and another one running a bug tracking tool. Whenever your users need to use one application, they need to type a different address. You could integrate these services with your main site using ProxyPass, as shown in the example.

Now, when the reverse proxy machine receives a request for http://www.example.com/crm/login/index.html, it will request http://crm.example.com/login/index.html from the backend server and return the document to the browser.

The ProxyPass directive can be used standalone or inside a <Location> container, as in the following example:

<Location /crm>
ProxyPass http://crm.example.com/
</Location>

Finally, you probably want to use ProxyPass together with ProxyPassReverse, described in the following section.
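The mappings discussed in this section might be collected in a single virtual host, roughly as sketched here. The sketch assumes that www.example.com is the frontend name, and it writes both ProxyPass arguments with matching trailing slashes, as the mod_proxy documentation advises, so that paths are joined without doubled or missing slashes. ProxyRequests Off is the default and appears only to make clear that forward proxying is not needed for a reverse proxy.

```apache
<VirtualHost *:80>
    ServerName www.example.com

    # A reverse proxy does not need forward proxying
    ProxyRequests Off

    # /crm/login/index.html      -> http://crm.example.com/login/index.html
    ProxyPass /crm/ http://crm.example.com/
    # /bugzilla/show_bug.cgi     -> http://backend.example.com/bugzilla/show_bug.cgi
    ProxyPass /bugzilla/ http://backend.example.com/bugzilla/
</VirtualHost>
```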

Hiding the Backend Servers

ProxyPass /crm http://crm.example.com
ProxyPassReverse /crm http://crm.example.com
ProxyErrorOverride On

During the process described in the previous section, the client has only contacted the reverse proxy server and is unaware of the existence of the backend servers. Sometimes, however, the backend server will issue redirects or error pages that contain references to itself, for example in the Location: header.

The ProxyPassReverse directive will intercept these headers and rewrite them so that they include a reference to the reverse proxy (www.example.com) instead. The ProxyPassReverseCookiePath and ProxyPassReverseCookieDomain directives operate similarly, but on the path and domain strings in Set-Cookie: headers.

Additionally, ProxyErrorOverride, which is only available in Apache 2, will allow you to display the error pages of the proxy server, replacing the error pages received from the backend. This enables you to further hide the existence of that backend server and provide a consistent frontend, even for error messages.
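The directives described in this section could be combined roughly as follows. This is a sketch under stated assumptions: the backend is taken to set cookies for the domain crm.example.com with a path of /, and the two error document paths are hypothetical files served by the proxy itself.

```apache
ProxyPass        /crm/ http://crm.example.com/
ProxyPassReverse /crm/ http://crm.example.com/

# Rewrite the domain and path in Set-Cookie: headers issued by the backend
# (first argument: value used by the backend; second: value shown to clients)
ProxyPassReverseCookieDomain crm.example.com www.example.com
ProxyPassReverseCookiePath   /               /crm/

# Replace backend error pages with documents served by the proxy
ProxyErrorOverride On
ErrorDocument 404 /errors/not-found.html
ErrorDocument 500 /errors/server-error.html
```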

Note

Note that the ProxyPassReverse directive operates only at the HTTP header level. It will not inspect or rewrite links inside HTML documents. For that purpose, you can use mod_proxy_html, an Apache 2 module that allows you to parse the documents being served through the proxy and rewrite the HTML on the fly. You can download it from http://apache.webthing.com/mod_proxy_html/.

Preventing URLs from Being Reverse Proxied

ProxyPass /images/ !
ProxyPass / http://crm.example.com

It is possible to prevent certain URLs from being proxied by specifying an exclamation mark (!) as the remote site URL in ProxyPass directives. It is important that those directives be placed before other ProxyPass directives. For example, this configuration will pass all requests to a backend site, except requests for images, which will be served locally.

Improving Performance

ProxyIOBufferSize 1024000

Reverse proxies can also be useful when you have complex, overloaded web and application servers. Slow clients over modem lines, buggy browsers, and big multimedia files can tie up valuable resources in the servers creating the content. If a client requests a big static file and downloads it slowly, an Apache child process or thread will be busy serving it until the download completes. A similar scenario occurs when some buggy TCP/IP implementations fail to properly close a connection to the server after the transmission has finished. This is called the “lingering close” problem and will cause resources to be tied up until the connection is closed because of a timeout.

While these issues are hardly avoidable, the real problem occurs when you are using process-based MPMs (such as the prefork MPM). For example, if you are running mod_perl in Apache 1.3 with multiple other Perl modules loaded and some cached data, the resulting Apache children will likely be several megabytes in size. Whenever one of them is “wasting time” serving static files or waiting for a connection to close, there are fewer system resources available to serve the remaining requests.

A reverse proxy can help here. You can have one or several threaded, lightweight Apache frontends that serve your static content and take care of slow and buggy clients, while full-featured backend servers do the dynamic content generation. You can tune ProxyIOBufferSize so big files are transferred to the reverse proxy quickly and the connection to the backend server is freed as soon as possible. This reduces the load on the backend server, though it increases the memory consumption in the proxy machine.

Recent MPMs in Apache 2.1 allow the same Apache child to manage multiple connections, including having a dedicated thread whose task is to wait for connections to close. These MPMs, as they mature, will allow Apache to scale much better in a number of situations.

Offloading SSL Processing

As seen in Chapter 7, “SSL/TLS,” the computations required make SSL a resource-intensive protocol. This may impact the performance of your backend servers in a similar way to what was described in the previous section. One way to solve this issue is by having dedicated, optimized boxes running a reverse proxy with SSL support. The reverse proxy does all the heavy lifting, processing the SSL requests, maybe doing certificate-based authentication, and passing the requests as plain HTTP to the backend servers. The content is generated and returned to the reverse proxy, which performs the resource-intensive task of encrypting it. Since the SSL end-point is the reverse proxy, some information, such as certificate-related information, is lost and does not reach the backend server. The next couple of sections describe how to pass that information on to the backend.
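An SSL-terminating reverse proxy could look roughly like the following sketch. The certificate paths and the backend hostname backend.example.com are placeholders chosen for illustration, and the server is assumed to be listening on port 443 already (Listen 443) with mod_ssl loaded.

```apache
# Hypothetical SSL-terminating frontend; paths and hostnames are placeholders
<VirtualHost *:443>
    ServerName www.example.com

    SSLEngine on
    SSLCertificateFile    /usr/local/apache/conf/ssl/server.crt
    SSLCertificateKeyFile /usr/local/apache/conf/ssl/server.key

    # Decrypted requests travel to the backend as plain HTTP
    ProxyRequests Off
    ProxyPass        / http://backend.example.com/
    ProxyPassReverse / http://backend.example.com/
</VirtualHost>
```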

Passing Proxy Information in Headers

ProxyPreserveHost on

When Apache is acting as a reverse proxy, the Host: header is modified in the proxy request to match the hostname specified in the ProxyPass directive. The original Host: header is placed in another header, X-Forwarded-Host. In certain situations, it is desirable to preserve the original value of the header. This can be done by setting ProxyPreserveHost on in the configuration file.

Certain information about the request gets lost when a reverse proxy is in place. The reverse proxy records some of that information in new headers that are added to the request to the backend server:

  • X-Forwarded-For: IP address or hostname of the client

  • X-Forwarded-Host: Original host requested

  • X-Forwarded-Server: Hostname for the proxy server

You can pass additional information using the Header and RequestHeader directives, as shown in the next section.

Manipulating Headers

Header set header-name header-value

You can add this kind of information with the Header and RequestHeader directives, both provided by the mod_headers module. This module can be used to add and remove arbitrary headers in HTTP requests and responses.

You can add a response HTTP header, replacing any existing headers with the same name, by using Header set, as shown in the example. If you want to add a new header instead of replacing an existing one, you can use Header add instead of Header set. If you want to append the value to an existing header, remove certain headers, or copy a request header into the response, you can use append, unset, and echo respectively.

You can modify the request headers passed on to the backend server by using RequestHeader instead of Header. You can add the content of environment variables to the header-value argument by using the format string %{variablename}e. This is similar to how the LogFormat directive works, as explained in Chapter 3, “Logs and Monitoring.” For example, you can use this to pass information about an SSL connection and certificates to the backend server. For that, you will first need to tell mod_ssl to store this information in environment variables with SSLOptions +StdEnvVars. Starting with Apache 2.1, you can avoid that step and access SSL environment variables directly with %{variable-name}s.
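The following sketch illustrates the actions described in this section. All header names (X-Frontend, X-Request-Id, X-SSL-Cipher, and so on) are made up for the example; SSL_CIPHER and SSL_CLIENT_S_DN are standard mod_ssl variables.

```apache
# Response headers, sent back to the client.
# set replaces any header with the same name; add creates an additional one
Header set X-Frontend "www.example.com"
Header add X-Frontend-Note "served through the reverse proxy"
# unset removes a header from the response
Header unset X-Powered-By
# echo copies a matching request header (here a hypothetical X-Request-Id)
# into the response
Header echo X-Request-Id

# Request headers, passed on to the backend server.
# Export SSL details as environment variables, then reference them with %{...}e
SSLOptions +StdEnvVars
RequestHeader set X-SSL-Cipher "%{SSL_CIPHER}e"
# Apache 2.1 and later: read SSL variables directly with %{...}s
RequestHeader set X-SSL-Client-DN "%{SSL_CLIENT_S_DN}s"
```

The backend application can then read these headers like any other request header, for example to tell whether the original connection used SSL.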

Implementing a Caching Proxy

CacheRoot /usr/local/apache/cache
CacheSize 500000
CacheGcInterval 6
CacheMaxExpire 12

One of the advantages of a proxy is that it can cache the information that it serves. The next time that the same content is requested, the proxy can check whether it is already present in the cache and, if so, serve it directly from there. In Apache 1.3, the caching functionality is implemented as part of the mod_proxy module. These directives represent a sample configuration. CacheRoot allows you to specify the location of the cached files and CacheSize allows you to set the overall size in kilobytes of the cache. There are a number of other configuration directives that you can use to tweak the caching behavior. CacheGcInterval allows you to specify the frequency in hours that the cache will be periodically “purged” to comply with the CacheSize setting. CacheMaxExpire specifies the maximum amount of time a document can remain in the cache and still be considered valid without having to check with the original source.
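A slightly fuller Apache 1.3 configuration might look like the sketch below. The additional directives are standard Apache 1.3 mod_proxy directives, but the values and the internal.example.com hostname are illustrative assumptions, not recommendations.

```apache
# Apache 1.3 mod_proxy cache; all values are illustrative
CacheRoot /usr/local/apache/cache
# Total cache size, in kilobytes
CacheSize 500000
# Run garbage collection every 6 hours
CacheGcInterval 6
# Never treat a cached document as fresh for more than 12 hours
CacheMaxExpire 12
# Without an Expires header, estimate freshness as 10% of the time
# elapsed since the document was last modified
CacheLastModifiedFactor 0.1
# Fallback expiry, in hours, when it cannot be determined otherwise
CacheDefaultExpire 1
# Never cache documents from this (hypothetical) host
NoCache internal.example.com
```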

Caching in Apache 2

CacheEnable disk /
CacheRoot /usr/local/apache/cache

The caching and proxying functionality in Apache was split into separate modules starting with Apache 2. While the caching functionality is considered experimental in Apache 2.0, it is considered production quality in Apache 2.1/2.2.

In Apache 2, the main caching functionality is implemented by mod_cache, which in turn has two backends: mod_mem_cache, which stores cached resources directly in memory, and mod_disk_cache, which uses the file system. The CacheEnable directive takes a caching backend (mem or disk) parameter and a URL prefix. Requests that contain the URL prefix will be cached by the specified backend. You can use CacheDisable to disable caching for specific URLs. You can use the htcacheclean command-line utility to prune the cache at predefined intervals when using the disk backend.
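As a sketch of how these pieces fit together, the configuration below enables the disk backend for the whole site while excluding one area. The /private prefix and all numeric values are assumptions for illustration. Note that expiry times are given in seconds in Apache 2, unlike the hours used by the Apache 1.3 directives.

```apache
# Cache everything under / on disk, except a hypothetical /private area
CacheEnable  disk /
CacheDisable /private
CacheRoot    /usr/local/apache/cache

# Layout of the on-disk cache directory tree
CacheDirLevels 2
CacheDirLength 1

# Expiry times in seconds: one hour by default, twelve hours at most
CacheDefaultExpire 3600
CacheMaxExpire     43200

# The cache could then be pruned by running htcacheclean as a daemon,
# for example every 30 minutes with a 100 MB limit:
#   htcacheclean -d30 -n -t -p/usr/local/apache/cache -l100M
```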

Alternatively, if you have frequently requested files that you know will not change during the life of the server, you can use mod_file_cache to tell Apache to map specific files into memory or cache file handles:

CacheFile /usr/local/apache/htdocs/navigationbar.gif
MMapFile /usr/local/apache/htdocs/button_left.png

If you modify any of the static files, you will need to restart the server for the changes to take effect.

Load Balancing

<Location /balancer-manager>
SetHandler balancer-manager
Order deny,allow
Deny from all
Allow from localhost
</Location>
<Proxy balancer://balancer/ stickysession=PHPSESSIONID>
BalancerMember http://www1.example.com/
BalancerMember http://www2.example.com/
BalancerMember http://www3.example.com/
</Proxy>
ProxyPass /content balancer://balancer/

Starting with Apache 2.2, mod_proxy includes a new backend that enables load-balancing capabilities. The load-balancing code is generic and allows you to balance multiple other protocols in addition to HTTP. To configure load balancing, first you need to define a group of backend servers with a <Proxy balancer://...> section, as shown here. Once defined, you can use the balancer ID with a regular ProxyPass directive. Each balancer ID and balancer member can take options to specify balancing strategies (for example, based on traffic), failover, connection pooling, and session support.

Finally, you can check the status of your load-balancing setup with the regular status handler and you can manipulate it with the balancer-manager handler.
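The balancer and member options mentioned above might be used as in the following sketch, which assumes Apache 2.2 with mod_proxy_balancer loaded. The weights and the retry interval are arbitrary values chosen for illustration.

```apache
<Proxy balancer://balancer>
    # loadfactor weights a member: www1 receives roughly twice the share
    # of each of the others
    BalancerMember http://www1.example.com loadfactor=2
    BalancerMember http://www2.example.com loadfactor=1
    # retry: seconds to wait before trying again a member marked as failed
    BalancerMember http://www3.example.com loadfactor=1 retry=60
</Proxy>

# lbmethod=bytraffic balances on bytes transferred rather than request counts
ProxyPass /content balancer://balancer lbmethod=bytraffic
```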

Connecting to Tomcat

ProxyPass /myapp ajp://127.0.0.1:8009/myapp
ProxyPassReverse /myapp ajp://127.0.0.1:8009/myapp

Starting with Apache 2.1, mod_proxy includes an AJP protocol backend. The AJP protocol is commonly used by another Apache module, mod_jk, to communicate with application servers and servlet engines such as Tomcat and Jetty. It is now possible to replace mod_jk with the mod_proxy and mod_proxy_ajp modules, taking advantage of some of the newer functionality in mod_proxy such as load balancing. As shown in the example, configuring AJP support in mod_proxy is as easy as replacing http:// with ajp:// in your proxy configurations (including load-balancing setups).
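Combining AJP with load balancing could look like the sketch below. The hostnames tomcat1.example.com and tomcat2.example.com and the route names are hypothetical. Each route value is assumed to match the jvmRoute setting of the corresponding Tomcat instance, so that sessions tracked through the JSESSIONID cookie return to the node that created them.

```apache
<Proxy balancer://tomcats>
    # route must match the jvmRoute configured in each Tomcat instance
    BalancerMember ajp://tomcat1.example.com:8009 route=node1
    BalancerMember ajp://tomcat2.example.com:8009 route=node2
</Proxy>

# Keep each session on the Tomcat node that created it
ProxyPass /myapp balancer://tomcats/myapp stickysession=JSESSIONID
```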

Alternate Proxies

Squid http://www.squid-cache.org/
Pound http://www.apsis.ch/pound/

As explained in Chapter 9, “Performance and Scalability,” Apache may not be the best choice for all scenarios. There are a number of other specialized proxy servers that may perform better than Apache, depending on your requirements. Two of the most popular open-source proxy servers are Pound and Squid. Squid has been around about as long as Apache, is highly configurable, and excels in its caching abilities. Pound is a lightweight proxy server that is often used as an SSL reverse proxy.

Transparent HTTP Proxies

As mentioned earlier, a forward caching proxy requires that each client be properly configured. It is also possible to implement so-called transparent proxies. These machines intercept HTTP requests at the network layer and “transparently” serve them through a proxy server without the end user noticing. Transparent proxies are still popular with ISPs that want to cut down on bandwidth costs or control the surfing habits of their customers. Some organizations also use transparent proxies to filter spyware and viruses, as mentioned earlier in the “Enabling Forward Proxy Support” section. A typical transparent proxy setup involves using a server that is aware of transparent proxying, such as Squid, and modifying your operating system’s packet forwarding rules. You can learn more about setting up transparent HTTP proxies in the following Linux Documentation Project HOWTO:

http://www.tldp.org/HOWTO/TransparentProxy.html
