This chapter explains how to configure Apache to map requests to files and directories or redirect them to specific pages or servers. This knowledge comes in handy to solve common problems such as maintaining working URLs when the site structure changes, dealing with case-sensitive websites, supporting multiple languages, and so on. It also explains how to use the CGI and server side include functionality present in Apache to provide dynamically generated content.
The structure of your website does not need to match the layout of your files on disk. You can use the Alias
directive to map directories on disk to specific URLs. For example, an Alias that maps /icons to /usr/local/apache2/icons will cause a request for http://www.example.com/icons/image.gif to make Apache look for the file in /usr/local/apache2/icons/image.gif instead of under the default document root, in /usr/local/apache2/htdocs/icons/image.gif.
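A minimal sketch of such a mapping is shown here. The filesystem paths assume the default /usr/local/apache2 installation prefix used in the text; adjust them for your own layout.

```apache
# Serve http://www.example.com/icons/... from a directory
# outside the document root (paths are assumptions)
Alias /icons /usr/local/apache2/icons
```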
The trailing slashes in the Alias
directive are significant. If you include them, the client request must include the slash as well or the Alias
directive won’t take effect. For example, if you use the following directive
Alias /icons/ /usr/local/apache2/icons/
and request http://www.example.com/icons
, the server will return a 404 Document Not Found error response.
The AliasMatch
directive provides a similar behavior to Alias
, but enables you to specify a regular expression for the URL. The matches can be substituted in the destination path. For example, an AliasMatch directive can map any URL under /help or /docs to filesystem paths under the manual directory. Regular expressions are strings that describe or match a set of strings, according to certain syntax rules. You can learn more about regular expressions at http://en.wikipedia.org/wiki/Regular_expression.
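One way the /help and /docs mapping could be written is sketched here. The location of the manual directory is an assumption based on the default installation prefix.

```apache
# $2 holds whatever follows /docs or /help in the request,
# so /help/index.html maps to .../manual/index.html
AliasMatch ^/(docs|help)(.*) /usr/local/apache2/manual$2
```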
The structure of a typical website changes over time, and you can’t control how other sites link to you, such as search engines with stale links. To avoid errors when people access your website through old links, you can configure Apache with the Redirect
directive to redirect those requests to the correct resource, whether it is on the current server or a different one. Although the Redirect directive can take optional arguments indicating the type of redirect (such as temporary or permanent), the most commonly used syntax is to provide an origin URL path and a destination URL. The destination URL can be on the same web server or can point to a different web server altogether. For example, a request for http://www.example.com/news/today/index.html can be redirected to http://news.example.com/today/index.html.
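A sketch of the redirect just described might look like the following. The second, commented-out line illustrates the optional status argument and is not required.

```apache
# Send requests for the old page to the news server
Redirect /news/today/index.html http://news.example.com/today/index.html

# The same redirect, explicitly marked as permanent (HTTP 301)
# Redirect permanent /news/today/index.html http://news.example.com/today/index.html
```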
The RedirectMatch
directive is similar to Redirect
, but allows the origin URL path to be a regular expression. This allows a great amount of flexibility. For example, imagine you are a software company that distributes downloads from its website and releases new versions of a particular product over time. You may find that a certain percentage of your users are still downloading older versions of your software through third-party websites that have not yet updated their links. Using RedirectMatch, users who request old versions of the file can be easily redirected to the latest version. For example, suppose the name of the latest version of your downloadable file is myapp-3.0. A RedirectMatch directive can redirect requests for http://www.example.com/myapp-2.5.1-demo.tgz to http://www.example.com/myapp-3.0-demo.tgz and requests for http://www.example.com/myapp-1.2-manual.pdf to http://www.example.com/myapp-3.0-manual.pdf.
In such a directive, the first three groups of the regular expression match a major and minor number and an optional patch number, which are replaced by 3.0. The remaining part of the filename is captured in the final regular expression group and substituted into the destination URL.
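A directive along these lines would produce the redirects described. This is a sketch, not necessarily the exact expression the author had in mind. It assumes that old releases have a major version of 1 or 2, which also keeps requests for the 3.0 files from matching and redirecting to themselves.

```apache
# Groups 1-3: major, minor, and optional patch number.
# Group 4: the rest of the filename, reused as $4.
RedirectMatch ^/myapp-(1|2)\.([0-9]+)(\.[0-9]+)?-(.*)$ http://www.example.com/myapp-3.0-$4
```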
If you maintain a popular or complex website, no matter how careful you are, you will receive a number of requests for invalid URLs or documents that no longer exist. Though many of them can be addressed with proper use of Redirect directives, there will always be a number of requests that end up with the dreaded 404 Document Not Found response. For that reason, it may be desirable to use the ErrorDocument directive to replace the default Apache error page and direct your users to a special location in your website, such as a search page or site map that can help your visitors find the resource they were looking for. On a related note, Chapter 6 provides additional information on customizing access denied pages.
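A minimal sketch of a custom 404 page follows. The /search.html page is a hypothetical name; any local URL path on your site will do.

```apache
# Show a search page instead of the default error page
# (/search.html is an assumed, illustrative path)
ErrorDocument 404 /search.html
```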
AddHandler cgi-script .pl .cgi
<Location "/cgi-bin/*.pl">
    Options +ExecCGI
    SetHandler cgi-script
</Location>
Handlers are the way Apache determines which actions to perform on the requested content. Modules provide handlers, and you configure Apache to associate certain content with specific handlers. This functionality is commonly used with content-generation modules such as PHP and mod_cgi. The example shows how to associate the cgi-script handler with the files you want to run as CGIs.
The AddHandler
directive associates a certain handler with filename extensions. RemoveHandler
can be used to remove previous associations. In the example, AddHandler
tells Apache to treat all documents with cgi
or pl
extensions as CGI scripts.
The SetHandler
directive enables you to associate a handler with all files in a particular directory or location. The Action
directive, described later in this chapter, enables you to associate a particular MIME type or handler with a CGI script.
MIME (Multipurpose Internet Mail Extensions) is a set of standards that defines, among other things, a way to indicate the content type of a document. Examples of MIME types are text/html
and audio/mpeg
. The first component of the MIME type is the main category of the content (text, audio, image, video) and the second component is the specific type.
Apache uses MIME types to determine which modules or filters will process certain content, and to add HTTP headers to the response to identify its content type. These headers will be used by the client application to identify and correctly display the contents to the end user.
AddType text/xml .xml .schema
<Location /xml-schemas/>
    ForceType text/xml
</Location>
As with content handlers, you can associate MIME types with specific file extensions or URLs. This example shows how to associate the text/xml
MIME type with files ending in .xml
and .schema
and with all the content under the /xml-schemas/
URL. By default, Apache bundles a mime.types
file that includes the most common MIME types and their associated extensions.
CGI stands for Common Gateway Interface. It is a standard protocol used by web servers to communicate with external programs. The web server provides all the necessary information about the request to an external program, which processes it and returns a response. The response is then transmitted back to the client. CGIs were the original mechanism to generate unique content for every request on-the-fly (“dynamic content”) and are supported by nearly every web server. Apache provides support for CGIs using the mod_cgi
Apache module (mod_cgid
when running a threaded Apache server).
Poorly written or sample CGI programs can be a security risk, so if you are not using this functionality, you may want to disable it altogether, as described in Chapter 6.
This section shows a number of ways to tell Apache that the target file for a particular request is a CGI script. This is necessary so Apache will not serve the contents of the file directly to the client, but rather return the results of executing it.
The ScriptAlias
directive is similar to the Alias
directive described earlier in this chapter, but with the difference that Apache will treat every file in the target directory as a CGI script. Alternatively, you can use any <Files>
, <Location>
, and <Directory>
sections in combination with the SetHandler
directive to tell Apache that the contents of these sections are CGI programs. In this case, you will also need to provide an Options +ExecCGI
directive to tell Apache that CGI execution is allowed. The following example tells Apache to treat all URLs ending with a .pl
file extension as CGI scripts.
<Location "/cgi-bin/*.pl">
    Options +ExecCGI
    SetHandler cgi-script
</Location>
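For comparison, the ScriptAlias approach mentioned earlier in this section could be sketched as follows. The cgi-bin directory path is an assumption based on the default installation prefix.

```apache
# Every file under this directory is executed as a CGI script;
# no separate Options +ExecCGI is needed for it
ScriptAlias /cgi-bin/ /usr/local/apache2/cgi-bin/
```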
# Processing all GIF images through a CGI script
# before serving them
Action image/gif /cgi-bin/filter.cgi

# Associating specific HTTP methods with a CGI
# script
Script PUT /cgi-bin/upload.cgi
In addition to the directives mentioned in the previous section, Apache provides directives that simplify associating specific MIME types, file extensions, or even specific HTTP methods with a particular CGI. The mod_actions
module, included in the base distribution and compiled by default, provides the Action
and Script
directives, shown in this example:
The information about the original requested document is passed to the CGI via the PATH_INFO
(document URL) and PATH_TRANSLATED
(document path) environment variables.
As with the example from the previous section, the directory containing the destination CGI must be marked as allowing CGI execution with either a ScriptAlias
directive or the ExecCGI
parameter to the Options
directive.
In addition to the modules and techniques explained in Chapter 2 and Chapter 3, the mod_cgi
module provides the ScriptLog
directive to aid in the debugging of CGI scripts. If enabled, it will store information for each failed CGI execution, including HTTP headers, POST variables, and so on. This file can grow quickly, so you can limit its growth with the ScriptLogBuffer
and ScriptLogLength
directives.
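A possible configuration is sketched here. The log file name is an assumption, and the two limits shown are the documented default values, written out only to make them visible.

```apache
# Record details of failed CGI executions (server-wide setting)
ScriptLog logs/cgi_log
# Stop logging once the file reaches this size, in bytes
ScriptLogLength 10385760
# Maximum size, in bytes, of a PUT or POST body written to the log
ScriptLogBuffer 1024
```

Because the log is only meant as a debugging aid, you would normally remove these directives on a production server.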
One of the main drawbacks of CGI development is the performance impact associated with the requirement to start and stop a program for every request.
mod_perl
and FastCGI provide two solutions for this problem. Both require careful examination of existing code because you can no longer assume in your CGIs that all resources will be automatically freed by the operating system after the request is served.
mod_perl
is a module available for Apache 1.3 and 2.0 that embeds a Perl interpreter inside the Apache web server. In addition to a powerful API to Apache internals, mod_perl
includes a CGI compatibility mode that provides an environment that allows existing Perl CGIs to run with little or no modification. Since the scripts are run inside a persistent, in-process interpreter, there is no startup penalty.
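As a rough sketch, the CGI compatibility mode can be enabled as follows. This assumes mod_perl 2.0 running on Apache 2.0, where the compatibility layer is the ModPerl::Registry handler; the /perl/ URL and the directory path are illustrative assumptions.

```apache
LoadModule perl_module modules/mod_perl.so

# Existing Perl CGI scripts are placed in this directory (assumed path)
Alias /perl/ /usr/local/apache2/perl/
<Location /perl/>
    SetHandler perl-script
    PerlResponseHandler ModPerl::Registry
    # Let scripts print their own HTTP headers, as CGIs do
    PerlOptions +ParseHeaders
    Options +ExecCGI
</Location>
```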
FastCGI is a standard that allows the same instance of a CGI program to answer several requests over time. You can read the specs and download modules for Apache 1.3 and 2.x from http://www.fastcgi.com. FastCGI has regained some popularity through its use by web development frameworks such as Ruby on Rails.
Document on disk:
This document, <!--#echo var="DOCUMENT_NAME" -->, was last modified <!--#echo var="LAST_MODIFIED" -->

Content received by the browser:
This document, sample.shtml, was last modified Sunday, 14-Sep-2005 12:03:20 PST
SSI is a simple, “old school” web technology and a predecessor to other HTML embedded languages such as PHP. SSI provides a simple and effective mechanism for adding simple pieces of dynamic content with very little overhead; for example, a common footer for each page that includes the date and time the page was served. As another example, the Apache 2.0 distribution uses SSI to provide a custom look and feel for error messages. It works by embedding special processing instructions inside web pages and evaluating them before the content is returned to the client. You can learn more about Apache SSI support at http://httpd.apache.org/docs/2.0/howto/ssi.html.
Server side includes functionality is provided by the mod_include
module, distributed with Apache. The simplest way to configure it is to associate an extension with the server-parsed
content handler, as shown in the example.
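A minimal sketch of that configuration follows. The .shtml extension and the directory path are assumptions. Note that the directory also needs the Includes option for SSI processing to be allowed.

```apache
# Treat .shtml files as HTML and parse them for SSI instructions
AddType text/html .shtml
AddHandler server-parsed .shtml

<Directory "/usr/local/apache2/htdocs">
    Options +Includes
</Directory>
```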
Environment variables are variables that can be shared between modules and that are also available to external processes such as CGIs and server side include (SSI) documents. Environment variables also can be used for intermodule communication and to flag certain requests for special processing.
You can set environment variables with the SetEnv
directive. This variable will be available to CGI scripts and SSI pages, and can be logged or added to a header. For example
SetEnv foo bar
will create the environment variable foo
and assign it the value bar
.
Conversely, you can remove specific variables using the UnsetEnv
directive.
Finally, the PassEnv
directive enables you to expose variables from the server process environment. For example
PassEnv LD_LIBRARY_PATH
will make the environment variable LD_LIBRARY_PATH
available to CGI scripts and SSI pages. This variable contains the path to loadable dynamic libraries in some Unix systems, such as Linux. You can get a listing of standard environment variables in the appendix.
SetEnvIf User-Agent MSIE iexplorer
SetEnvIf User-Agent MSIE iexplorer=foo
SetEnvIf User-Agent MSIE !javascript
The SetEnvIf
directive enables you to set environment variables based on request information, such as the username, the file being requested, or a specific HTTP header value.
This directive takes a request parameter, a regular expression, and a set of variables that will be modified if the parameter matches the expression. This example matches Microsoft Internet Explorer browsers and shows how you can just set a variable, assign it an arbitrary value such as foo, or even unset a variable by prefixing its name with an exclamation mark (!javascript).
Later, you can check the existence and value of this variable to perform a variety of actions such as logging a specific request or serving different content based on the type of browser. For example, you could provide simplified HTML pages for text browsers such as Lynx, or for PDA and cell phone browsers.
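As one illustration of acting on such a variable, the following sketch logs only the requests that were flagged. The log file name and the variable name are assumptions made for the example.

```apache
# Flag requests from Internet Explorer...
SetEnvIf User-Agent MSIE iexplorer
# ...and write only those requests to a separate log
CustomLog logs/iexplorer_log "%h %l %u %t \"%r\" %>s %b" env=iexplorer
```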
In fact, checking for the client user agent is so common that mod_setenvif
provides the BrowserMatch
directive, allowing you to simply write
BrowserMatch MSIE iexplorer=1
Apache provides a set of special environment variables. If one of those variables is set, Apache will modify its behavior. They are commonly used to work around buggy clients. For example, the nokeepalive
variable disables keepalive support in Apache. This reduces performance on the server, since multiple requests cannot be transmitted over the same connection. Hence, it should only be set when the request is made by a client that does not correctly support this functionality, typically using a BrowserMatch
or SetEnvIf
directive, as shown in the example.
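A typical use might look like this. The user agent string is taken from the kind of entry commonly found in the default Apache configuration file and is shown only as an illustration.

```apache
# Disable keepalive for an old client that handles it incorrectly
BrowserMatch "Mozilla/2" nokeepalive
```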
In the appendix you can find a list of all the special environment variables. Chapters 7 and 8 include examples of special variables used to work around issues with SSL and DAV implementations.
AddCharset UTF-8 .utf8
AddLanguage en .en
AddEncoding gzip .gzip .gz
The HTTP protocol provides mechanisms that enable you to maintain different versions of a certain resource and return the appropriate content based on the capabilities and preferences of the client. For example, a client may inform the server that it is able to accept compressed content and that, while its preferred language is English, it will also understand pages written in Spanish. The three main aspects that are negotiated are
Encoding: This is the format in which a resource is stored or represented, and can usually be determined from the file extension. For example, the file listing.txt.gz
has a MIME type of text/plain
and a gzip
encoding. The encoding of the resource will be appended to the Content-Encoding:
header of the response.
Character Set: This property describes the particular character set used by a document. The character set of the resource will be appended to the Content-Type:
header of the response, together with the MIME type.
Language: You can provide different versions of the same resource. For example, the Apache documentation provides index.html.en
, index.html.es
, index.html.de
, and so on. The language of the resource will be appended to the Content-Language:
header of the response.
The example explains how you can associate charsets, languages, and encodings with particular file extensions.
There are two primary ways of configuring content negotiation in Apache: multiviews and type maps.
Multiviews can be enabled by adding an Options +Multiviews
directive to your configuration. This method is not recommended (except for simple websites) because it is not very efficient: For every request, it scans the directory containing the file, looking for similar documents with additional extensions. It will then construct a list of such files and use the extensions to determine content encoding and character set, and return the appropriate content.
It is recommended that you use type maps instead, because they save filesystem lookups. These are special files that map filenames and information (metadata) about them. You can configure a type map for a certain resource by creating a file with the same name and the .var
extension, and adding an AddHandler
directive, as shown in the sample configuration.
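The handler association could be sketched as follows.

```apache
# Treat files ending in .var as type maps
AddHandler type-map .var
```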
The file can contain several entries. Each entry starts with a URI:
that is the name of the document, followed by several attributes such as Content-Type:
, Content-Language:
, and Content-Encoding:
. The following listing shows a sample type map file.
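A sketch of what such a file might contain is shown here, using a hypothetical index.html.var that offers English and Spanish variants. The file names are assumptions chosen to match the language example earlier in this section.

```text
URI: index.html

URI: index.html.en
Content-Type: text/html
Content-Language: en

URI: index.html.es
Content-Type: text/html
Content-Language: es
```

Entries are separated by blank lines, and each variant lists only the attributes that distinguish it.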
You can specify a default character set for documents that do not already have one associated by using the AddDefaultCharset directive, as shown in the example. Another option is to specify AddDefaultCharset Off to disable adding a character set for documents without one associated.
You can specify a default language with the DefaultLanguage
directive. For a website in English, that would be en
, as shown in the example.
Finally, if the client does not provide a language preference, you can use LanguagePriority
to determine the preferred language order. For example, with a priority order of en, es, and de, a document in English will be served if one is found. Otherwise, Apache will look for a document in Spanish, and if that is not found, Apache will look for a document in German. You can learn more about this topic at http://httpd.apache.org/docs/2.0/mod/mod_negotiation.html and http://httpd.apache.org/docs/2.0/mod/mod_mime.html.
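The three defaults discussed here could be combined as in the following sketch. The UTF-8 character set is an assumption; the language values follow the English, Spanish, German order described in the text.

```apache
# Character set for documents that do not declare one
AddDefaultCharset UTF-8
# Language assumed for documents without a language extension
DefaultLanguage en
# Order of preference when the client expresses none
LanguagePriority en es de
```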
Apache provides a very powerful module, mod_rewrite
, that allows virtually unlimited URL manipulation capabilities using regular expressions. Due to its complexity, it is outside the scope of this book other than specific references or examples in other chapters. It is mentioned here so you are aware of its existence if you reach the limits of what Redirect
, ErrorDocument
, and Alias
directives can do.
You can learn more about mod_rewrite at http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html.
Sometimes, certain URLs work only if they have a “/” at the end. This is likely because you have either not loaded mod_dir
into the server or because the redirections made by mod_dir
are not working correctly with the value specified in the ServerName
directive, as explained in the “Redirections Do Not Work” section in Chapter 2.
When accessing certain URLs that map into directories, it is necessary to add a trailing slash (“/”) to the end of the URL to correctly access the content of the directory, which can be either an index file or a directory index. Forgetting to add this trailing slash is a common mistake, so when mod_dir
realizes that may be happening, it issues the appropriate redirection.
For example, if mod_dir
is enabled on the server, and you have a directory named foo
under the document root, a request for http://example.com/foo
will be redirected to http://example.com/foo/
.
This is the default behavior in both Apache 1.3 and 2.0 when mod_dir
is loaded into the server. In Apache 2, you can disable such redirections using a DirectorySlash
directive:
DirectorySlash Off
mod_speling
is a useful Apache module that recognizes misspelled URLs and redirects the user to the correct location for the document. mod_speling
is able to correct URLs with the wrong capitalization or with one letter missing or incorrect. This is most common when users misspell the URL while typing it in the browser.
For example, if a user requests the file file.html
and it is not present, mod_speling
will search for a similar document such as FILE.HTML
, file.htm
, and so on, and if it finds one, will return it. This has a performance impact, but can be quite useful and avoid unnecessary support requests due to broken links.
To enable spelling checks, you can add CheckSpelling on
to your Apache configuration, as shown in the example.
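A minimal sketch follows. It assumes mod_speling was built as a loadable module in the default modules directory.

```apache
LoadModule speling_module modules/mod_speling.so
# Try to correct misspelled or wrongly capitalized URLs
CheckSpelling on
```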
Windows has a case-insensitive file system, while Unix file systems are case sensitive. This usually creates problems when migrating websites from Windows to Unix servers. All of a sudden, URLs such as http://www.example.com/images/icon.PNG that used to work fine on Windows start failing with Document Not Found errors, because the file on disk is named icon.png, which on Unix is not equivalent to the icon.PNG file requested. This issue can be solved by manually checking and rewriting every link or by enabling the mod_speling module as described in the previous section.
There is also an alternative, single-purpose module that can be used to this end: mod_nocase. This module, originally based on mod_speling, makes GET requests for URLs case insensitive. It checks for an exact URL match and, if it does not find one, tries a case-insensitive match. If multiple files match the case-insensitive search, the first one will automatically be selected. To enable mod_nocase, you should load it into the server and include a NoCase directive in your Apache configuration file, as shown in the example.
You can download mod_nocase
from http://www.misterblue.com/Software/mod_nocase.htm.
Remember that enabling either mod_speling
or mod_nocase
has an impact on the performance of the server.
Independently of whether you have dynamically generated or hand-coded your HTML pages, if they contain markup errors, they may not display correctly in all browsers. Tidy is a useful command-line tool that is able to process malformed HTML and XML, correct many common mistakes, and produce standards-compliant output. You can download it from http://tidy.sourceforge.net/.
You can run Tidy from the command line over static files or, thanks to mod_tidy
and the Apache 2 filter architecture, process content being served on-the-fly. This example shows how to use the SetFilter
directive to associate a Tidy filter with XML and HTML files and how to use TidyOption
to configure the behavior of the Tidy engine. Apache filter architecture and configuration are described in Chapter 11. You can download mod_tidy from http://home.snafu.de/tusk/mod_tidy/.
A related Apache 2 module is mod_validator
, which can be downloaded from
3.141.25.41