Chapter 9. Performance and Scalability

Tuning Apache

This chapter explains which configuration options can affect performance and scalability in Apache, and how to tune them. The good news is that, in most cases, it will not be necessary. Most scalability and speed issues are likely to come from the dynamic content generation engine and database layer, not from the Apache web server. Some of the problems and solutions discussed in this chapter are generic enough that they apply to most server software, while others are Apache-specific.

Understanding Performance and Scalability

Improving the performance and scalability of any computer system involves a mixture of experience, profiling work, and understanding of the server’s inner workings. This chapter provides a number of bite-sized suggestions and ideas that will help you get started. For the sake of simplicity, performance refers to serving requests faster and scalability refers to being able to serve a great number of requests simultaneously.

Tuning Your Hardware

vmstat

Likely, the single most important action that you can take to improve the performance of your server is to increase the amount of RAM. That extra RAM will allow the operating system to cache frequently accessed disk files, as well as to support multiple Apache children running simultaneously.

The second aspect to consider is disk speed. Fast disks with large amounts of onboard cache can significantly improve performance, especially under heavy load. You may also want to modify different drive parameters, such as enabling Direct Memory Access (DMA) support for your drive. Under Linux, you can achieve this with the hdparm utility.

vmstat is a useful Unix tool for finding bottlenecks. This tool reports information about processes, memory, paging, block IO, traps, and CPU activity.
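For example, the following command prints a new report every five seconds on Linux. Consistently high values in the si and so (swap-in and swap-out) columns indicate that the machine is short on RAM:

vmstat 5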

If you are using SSL in your server and need to support many simultaneous users, that can require a lot of CPU resources. A faster processor or a dedicated crypto card will help in this situation. Please refer to Chapter 7 and to the improving SSL performance section in Chapter 10 for additional settings that can help. Finally, machines with multiple CPUs and/or multicore CPUs greatly increase the scalability of process-based web servers and are recommended for medium- and heavy-duty hosting.

Increasing OS Limits

ulimit

Several operating system factors can prevent Apache from scaling. These factors are related to process creation, memory limits, and the maximum number of simultaneously open files or connections.

The Unix ulimit command enables you to set several of the limits covered in the next few sections on a per-process basis. Please refer to your operating system documentation for details on ulimit’s syntax.
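For example, with the bash shell, the following commands display and raise the limit on open file descriptors for the current shell and the processes started from it; the value shown is just an illustration:

ulimit -n          # display the current per-process limit on open file descriptors
ulimit -n 8192     # raise the limit to 8192 for this shell and its children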

Increasing OS Limits on Processes

Apache provides settings for preventing the number of server processes and threads from exceeding certain limits. These settings affect scalability because they limit the number of simultaneous connections to the web server, which in turn affects the number of visitors that you can service simultaneously. These settings vary from MPM to MPM and are described in detail in Chapter 11.

The Apache MPM settings are in turn constrained by OS settings limiting the number of processes and threads. The steps needed to change the limits vary from operating system to operating system. In Linux 2.4 and 2.6 kernels, the limit can be accessed and set at runtime by editing the /proc/sys/kernel/threads-max file. You can read the contents of the file with

cat /proc/sys/kernel/threads-max

and write to it using

echo value > /proc/sys/kernel/threads-max

In Linux (unlike most other Unix versions), there is a mapping between threads and processes, and they are similar from the point of view of the OS, so this limit covers both. In Solaris, those parameters can be changed in the /etc/system file. Such changes don’t require rebuilding the kernel, but do require a reboot to take effect. You can change the total number of processes by changing the max_nprocs entry and the number of processes allowed for a given user with maxuprc.
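As a sketch, the relevant /etc/system entries might look like the following; the values are illustrative only:

* System-wide limit on the number of processes
set max_nprocs = 2048
* Maximum number of processes for a single user
set maxuprc = 1024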

Increasing OS File Descriptors

Whenever a process opens a file (or a socket), a structure called a file descriptor is assigned until the file is closed. The OS limits the number of file descriptors that a given process can open, thus limiting the number of simultaneous connections the web server can have. How those settings are changed depends on the operating system. On Linux systems, you can read or modify /proc/sys/fs/file-max, which controls the system-wide limit (using echo and cat as explained in the previous section); the per-process limit can be raised with ulimit. On Solaris systems, you must edit the value for rlim_fd_max in the /etc/system file. This change will require a reboot to take effect.
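For example, on Linux (the value is an arbitrary illustration and must be written as root):

cat /proc/sys/fs/file-max             # read the current system-wide limit
echo 65536 > /proc/sys/fs/file-max    # set a new system-wide limit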

You can find additional information at http://httpd.apache.org/docs/misc/descriptors.html.

Controlling External Processes

RLimitCPU
RLimitMem
RLimitNProc

Apache provides several directives to control the amount of resources that external processes use. This applies to CGI scripts spawned from the server and programs executed via Server Side Includes. Support for the following directives is available only on Unix and varies from system to system:

  • RLimitCPU: Accepts two parameters, the soft limit and the hard limit, for the amount of CPU time in seconds that a process is allowed. If the max keyword is used, it indicates the maximum setting allowed by the operating system. The hard limit is optional. The soft limit can be changed between restarts, and the hard limit specifies the maximum allowed value for that setting. If you are confused, check Chapter 11 for a similar discussion with ServerLimit and MaxClients.

  • RLimitMem: The syntax is identical to RLimitCPU, but this directive specifies the maximum amount of memory (in bytes) used per process.

  • RLimitNProc: The syntax is identical to RLimitCPU, but this directive specifies the maximum number of simultaneous processes per user.

These three directives are useful to prevent malicious or poorly written programs from running out of control.
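As an illustration, the following configuration keeps runaway CGI scripts in check; the limits are arbitrary examples, not recommendations:

# CPU time: 60 seconds soft limit, 120 seconds hard limit
RLimitCPU 60 120
# Memory: 50 MB soft limit, hard limit set to the OS maximum
RLimitMem 52428800 max
# At most 25 simultaneous processes per user
RLimitNProc 25 max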

Improving File System Performance

Accessing the disk is an expensive operation in terms of resources and is one of the slowing factors for any server. If you can cut the number of times Apache or the operating system needs to read from or write to disk, performance can be improved significantly. The following sections discuss some of the parameters you can fine-tune to achieve this. In addition, most modern operating systems are very efficient with filesystem caching, so ensuring that enough RAM is available can also dramatically improve file access speed for commonly accessed files.

Mounting File Systems with noatime Option

Many Linux file systems can be mounted with the noatime option. This means that the operating system will not record the last time a file was accessed when reading it, though it will still keep track of the last time it was written to. This can provide significant speed improvements, especially in heavily loaded servers. The following line shows a sample /etc/fstab entry:

/dev/hda3    /www    ext2   defaults,noatime    1  1

Handling Symbolic Links

Options FollowSymLinks

In Unix, a symbolic link (or symlink) is a special kind of file that points to another file. It is created with the Unix ln command and is useful for making a certain file appear in different places.

Two of the parameters that the Options directive allows are FollowSymLinks and SymLinksIfOwnerMatch. By default, Apache won’t follow symbolic links because they can be used to bypass security settings and provide unwanted access to parts of your filesystem. For example, someone could create a symbolic link from a public part of the website to a restricted file or directory not otherwise accessible via the Web. To enforce this, Apache must check that each file it serves isn’t a symbolic link. If SymLinksIfOwnerMatch is present, it will follow a symbolic link if the target file is owned by the same user as the link itself. Because those tests must be performed for every path element and for every path that refers to a file system object, they can be expensive. If you control the content creation, you should add an Options +FollowSymLinks directive to your configuration and avoid the SymLinksIfOwnerMatch argument. That way, the tests won’t take place and performance isn’t affected.
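For example, assuming your content lives under a hypothetical /www/htdocs directory:

<Directory /www/htdocs>
    Options +FollowSymLinks
</Directory>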

Disabling Per-directory Configuration Files

<Directory />
AllowOverride none
</Directory>

As explained in previous chapters, per-directory configuration files provide a convenient way of configuring the server and allow for some degree of delegated administration. However, if this feature is enabled, Apache has to look for these files in each directory in the path leading to the document being served. You can disable this feature by adding AllowOverride none to your configuration.

Configuring Content Negotiation

As explained in the “Configuring Content Negotiation” section in Chapter 4, Apache can serve different versions of a file depending on client language or preferences. This can be accomplished with file extensions, but for every request, Apache must access the file system repeatedly looking for files with appropriate extensions. If you need to use content negotiation, make sure that you at least use a type-map file, minimizing accesses to disk.
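A type map explicitly lists the available variants of a resource, so Apache doesn’t have to scan the directory on every request. As a minimal sketch, first map the .var extension to the type-map handler:

AddHandler type-map .var

A hypothetical index.html.var file would then describe each variant, with the records separated by blank lines:

URI: index.html.en
Content-language: en
Content-type: text/html

URI: index.html.es
Content-language: es
Content-type: text/html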

Disabling or Minimizing Logging

BufferedLogs On

In heavily loaded websites, logging can slow down the server significantly. You can minimize its impact by not logging hits to all or certain images (such as navigational buttons). Additionally, you can buffer logs before they are written to disk using the BufferedLogs directive included in mod_log_config in Apache 2 and later. Finally, you can decide to use modules such as mod_log_spread that allow you to log to the network instead of to local disk, improving performance. You can download this module from http://www.backhand.org/mod_log_spread.
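For example, the following sketch uses SetEnvIf to mark requests for common image types and then excludes them from the access log; adjust the pattern to match your own content:

SetEnvIf Request_URI "\.(gif|jpg|png)$" image
CustomLog logs/access_log combined env=!image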

Tuning Network and Status Settings

A number of network-related Apache settings can degrade performance. The following sections discuss some of the most relevant.

HostnameLookups

HostnameLookups off

When HostnameLookups is set to on or double, Apache will perform a DNS lookup to capture the hostname of the client, introducing a delay in the response to the client. The default setting is HostnameLookups off. If you need to use the hostnames, you can always process the request logs with a log resolver later, as explained in Chapter 3.

Certain other settings can trigger a DNS lookup, even if HostnameLookups is set to off, such as when a hostname is used in Allow or Deny rules, as covered in Chapter 6.

Request Accept Mechanism

Apache can use different mechanisms to control how its children arbitrate incoming requests. The optimal mechanism depends on the specific platform and number of processors. Additional information can be found at http://httpd.apache.org/docs/2.0/misc/perf-tuning.html.
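In Apache 1.3 and 2.0, the mechanism can be selected with the AcceptMutex directive (replaced by the more general Mutex directive in Apache 2.4). Which methods are available depends on the platform; a sketch:

# Use the mechanism chosen at compile time; alternatives include
# flock, fcntl, sysvsem, and pthread, depending on the platform
AcceptMutex default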

mod_status

This module collects statistics about the server, connections, and requests. Although this can be useful to troubleshoot Apache, it can also slow down the server. For optimal performance, disable this module, or at least make sure that ExtendedStatus is set to off, which is the default.
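If you keep mod_status enabled, make sure extended statistics are off and access to the handler is restricted. A sketch, assuming you only browse the status page from the server itself:

ExtendedStatus Off
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>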

AcceptFilter

AcceptFilter http data
AcceptFilter https data

A number of operating systems, such as Linux and FreeBSD, allow you to mark certain listening sockets as handling specific protocols. Thus, it is possible to ask the kernel to only pass a request to Apache once all the content of the HTTP request has been received, improving performance. This capability is only implemented in Apache 2.1 and later, although there is an earlier, BSD-specific version of the AcceptFilter directive present in Apache 1.3.22 and later. You can find more in-depth documentation for socket configuration in the AcceptFilter manual page.

KeepAlives

KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 500

HTTP 1.1 allows multiple requests to be served over a single connection. HTTP 1.0 allows the same thing with keep-alive extensions. The KeepAliveTimeout directive enables you to specify the maximum time in seconds that the server will wait before closing an inactive connection. Increasing the timeout increases the chance of the connection being reused. On the other hand, it also ties up the connection and the Apache process during the waiting time, which can limit scalability, as discussed earlier. The MaxKeepAliveRequests directive allows you to specify the maximum number of times the connection will be reused.

Preventing Abuse

TimeOut
LimitRequestBody
LimitRequestFields
LimitRequestFieldSize
LimitRequestLine
LimitXMLRequestBody

Denial of service (DoS) attacks work by swamping your site with a great number of simultaneous requests, slowing down the server or preventing access altogether to legitimate clients. DoS attacks are difficult to prevent in general, and usually the most effective way to address them is at the network or operating system level. One example is blocking specific addresses from making requests to the server; although you can block those addresses at the web server level, it is more efficient to block them at the network firewall/router or with the operating system network filters.

Other kinds of abuse include sending extremely large requests or opening a great number of simultaneous connections. You can limit request sizes and tighten timeouts to minimize the effect of such attacks. The default request timeout is 300 seconds, but you can change it with the TimeOut directive. A number of directives enable you to control the size of the request body and headers: LimitRequestBody, LimitRequestFields, LimitRequestFieldSize, LimitRequestLine, and LimitXMLRequestBody.
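The following sketch tightens several of these settings; the values are examples only and should be sized to the largest legitimate requests your site expects:

# Close connections that stay idle for more than 60 seconds
TimeOut 60
# Reject request bodies larger than 1 MB
LimitRequestBody 1048576
# Allow at most 50 request header fields
LimitRequestFields 50
# Limit each header field to roughly 4 KB
LimitRequestFieldSize 4094
# Limit the request line (method, URI, protocol) to roughly 4 KB
LimitRequestLine 4094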

Limiting Connections and Bandwidth

If you are providing hosting services for several clients, you may face the situation where one of your clients’ websites is degrading the performance of the service as a whole. This may be because the website was linked from a high-traffic news page (the so-called “Slashdot effect”) or because it is hosting a popular set of music or video files. There are a number of Apache modules that allow you to measure and control bandwidth and the number of connections, to make sure the impact on other customers and the server as a whole is kept to a minimum. Throttling in this context usually means slowing down the delivery of content based on the file requested, a specific client IP address, the number of simultaneous requests, and so on.

The mod_bandwidth Apache 1.3 module enables the setting of server-wide or per-connection bandwidth limits, based on the specific directory, size of files, and remote IP/domain.

http://www.cohprog.com/mod_bandwidth.html

The bandwidth share module provides bandwidth throttling and balancing by client IP address. It supports Apache 1.3 and earlier versions of Apache 2.

http://www.topology.org/src/bwshare/README.html

The mod_throttle module throttles bandwidth per virtual host or user, for Apache 1.3.

http://www.snert.com/Software/mod_throttle/index.shtml

The Robotcop module helps you prevent spiders from accessing the parts of your site that you have marked off limits.

http://www.robotcop.org/

mod_require_host allows you to deny access to clients (such as many IIS worms) that do not provide a Host: header and simply try to connect to your IP address.

http://www.snert.com/Software/mod_require_host/index.shtml

mod_choke is an Apache module that limits the number of concurrent connections per IP address and the rate at which Apache sends data to each client.

http://os.cyberheatinc.com/modules.php?name=Content&pa=showpage&pid=7

mod_tsunami allows you to limit the number of Apache children per virtual host.

http://sourceforge.net/projects/mod-tsunami/

Dealing with Robots

http://www.robotstxt.org/

Robots, web spiders, and web crawlers are names that define a category of programs that download pages from your website, recursively following your site’s links. Web search engines use these programs to scan the Internet for web servers, download their content, and index it. Ordinary users use them to download an entire website, or a portion of one, for later offline browsing. Normally these programs are well behaved, but sometimes they can be very aggressive and swamp your website with too many simultaneous connections or become caught in cyclic loops.

Well-behaved spiders will request a special file, called robots.txt, that contains instructions about how to access your website and which parts of the website won’t be available to them.

The syntax for the file can be found at http://www.robotstxt.org.
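A minimal robots.txt, placed at the top level of your website, might look like the following; the /private/ directory is just a placeholder:

User-agent: *
Disallow: /private/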

Sometimes, however, web spiders don’t honor the robots.txt file. In those cases, you can block their requests at the router or operating system level, or you can use the Robotcop Apache module mentioned in the previous section, which enables you to stop misbehaving robots.

Reverse Proxies and Load Balancers

mod_proxy_http
mod_backhand http://www.backhand.org/mod_backhand/

So far we have covered vertical scalability, which deals with how to improve the performance of a single server. Distributing the load across multiple web servers is known as horizontal scalability. With this set of architectures, you can expand your capacity by simply adding new machines, increasing the amount of traffic you can serve as well as the reliability of your setup.

Chapter 10 deals with using Apache as a reverse proxy. In this setup, one or several lightweight front-end servers serve static content and handle SSL requests and slow connections, while relaying requests for specific URLs to specialized back-end servers. A number of companies provide commercial products that implement this functionality using hardware appliances.
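A minimal mod_proxy sketch, assuming a hypothetical back-end application server at backend.example.com, would relay requests under /app while serving everything else locally:

# Requires mod_proxy and mod_proxy_http
ProxyPass /app http://backend.example.com/app
# Rewrite Location headers in back-end responses so redirects work
ProxyPassReverse /app http://backend.example.com/app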

Finally, mod_backhand is an Apache 1.3 module that provides dynamic redirection of HTTP requests within a cluster of machines, based on available resources.

Caching and Compression

The fastest way to serve content is to not serve it! This can be achieved by using appropriate HTTP headers that tell clients and proxies how long the requested resources remain valid. In this way, resources that appear in multiple pages but don’t change frequently, such as logos or navigation buttons, are transmitted only once for a certain period of time.
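For example, mod_expires can add such headers automatically; the one-month lifetime below is an arbitrary choice for images that rarely change:

ExpiresActive On
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"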

Additionally, you can use mod_cache (described in Chapter 10) to cache dynamic content so that it doesn’t need to be created for every request. This is potentially a big performance boost because dynamic content usually requires accessing databases, processing templates, and so on, which can take significant resources.
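A minimal sketch using the disk-based cache back end (mod_disk_cache in Apache 2.0 and 2.2); the cache location is illustrative:

# Requires mod_cache and mod_disk_cache
CacheEnable disk /
CacheRoot /var/cache/apache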

Another way to reduce the load on the servers is to reduce the amount of data being transferred to the client. This, in turn, makes your clients’ website access faster, especially for those connecting over slow links. To help with this, you can reduce the number and size of your images. You can automate part of this process using the ImageMagick command-line tools (http://www.imagemagick.org). Additionally, you can compress big downloadable files or even static HTML files and use content negotiation, as described in previous chapters. Chapter 11 explains how to use the mod_deflate filtering module to compress HTML content. This can be useful if CPU power is available and clients are connecting over slow links. The content will be delivered faster, and the process serving the request will be free sooner to answer additional requests.
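As a taste of what Chapter 11 covers, a single directive enables mod_deflate compression for HTML and plain-text responses:

AddOutputFilterByType DEFLATE text/html text/plain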

Module-specific Optimizations

As mentioned at the beginning of the chapter, most bottlenecks occur at the content-generation and database access layers. A number of modules can help there.

For example, FastCGI and mod_perl can be used to speed up CGI script execution, as explained in “Improving CGI Script Performance” in Chapter 4, and a number of encoders and optimizers exist for PHP, the most popular web development language that runs on Apache, as explained in Chapter 12.

Alternatives to Apache

Apache is a portable, secure, extremely flexible web server. Precisely because of that, it is not necessarily the best solution for all scenarios. The servers mentioned here are optimized, lightweight web servers that often perform or scale better than Apache for certain tasks. For example, some popular websites such as Slashdot use Apache running mod_perl to generate content and a different server such as Boa to serve static image files. This is easily accomplished by serving the images from a different domain, such as images.slashdot.org.

Some of the projects also include other popular Apache features, such as URL rewriting and PHP support.
