Most modules need to offer system administrators and users some means of configuring and controlling them. In some cases, this may even be the primary purpose of a module.
System administrators configure Apache using httpd.conf, while end users have more limited control through .htaccess files. Modules give control to both parties by implementing configuration directives that can be used in these files.
This chapter discusses how to implement configuration directives in a module and how to work with directives implemented by other modules.
From the system administrator’s point of view, several kinds of directives exist. These can be broadly classified according to their scope and validity in the configuration files. In other words, some directives are valid for the server as a whole, whereas others apply within a scope such as <VirtualHost> or <Directory>.
Conflicting directives may override each other on the basis of order and specificity. For example, where there is a conflict, a directive in a .htaccess file overrides one set in the same scope in httpd.conf (provided the system administrator has enabled .htaccess). In most cases, this applies recursively, although this is controlled by individual modules whose behavior may differ.
Apache supports the following standard contexts:
- Main configuration: Directives appearing in httpd.conf but not inside any container apply globally, except where overridden. This context is appropriate for setting system defaults such as MIME types, and for once-only setup such as loading modules. Most directives can be used here.
- Virtual host: Each virtual host has its (virtual) server-wide configuration set within a <VirtualHost> container. Most directives that are valid in the main configuration are also valid in a virtual host, and vice versa.
- Directory: The <Directory>, <Files>, and <Location> containers define a hierarchy within which configuration can be set and overridden at any level. This is the most usual form of configuration, and is orthogonal to the virtual hosts. In the interests of brevity, we’ll refer to this configuration collectively as the directory hierarchy.
- .htaccess: .htaccess files are an extension of the directory hierarchy that enables users to set directives for themselves, subject to permissions (the AllowOverride directive) set up by the server administrator. The .htaccess files also differ from normal configuration in that, when enabled, they are reread by Apache for every request. This scheme serves two purposes: Users don’t have to bug the administrator to restart the server, and it avoids potential security issues of processing user inputs while Apache has root privilege. Setting AllowOverride to enable .htaccess files is always a compromise: It imposes a significant performance overhead and loses the security enjoyed by a tightly controlled configuration, but it empowers users who are not permitted to manipulate httpd.conf.
Additionally, modules may themselves implement their own containers. For example, mod_proxy implements <Proxy>, and mod_perl implements <Perl>.
As noted in Section 9.1, there are two orthogonal hierarchies of configuration directives: (virtual) hosts and directories. Internally, this dual hierarchy is based on having two different data structs: the per-server configuration and the per-directory configuration. In fact, every module has its own pointers for implementing each of these structs, although either or both can be unused (NULL), and it is unusual for a module to use both of them.
The per-server configuration is kept on the server_rec, of which there is one for each virtual host, created at server start-up. The per-directory configuration is exposed to modules via the request_rec and may be computed using the merge function for every request.
The configuration structs are instances of configuration vectors, as seen in Chapter 4. Those discussed in this chapter are used for configuration that is initialized at server start-up and should be accessed as read-only thereafter.
No fewer than five of the six (usable) elements of the Apache module struct are concerned with configuration:
It is up to each module whether and how to define each configuration struct. Whenever a struct is defined, the module must implement an appropriate create function to allocate and (usually) initialize it:
At this point, just allocating and returning a struct of the right size is often sufficient: Apache uses the return value. Now these values can be accessed at any time a server_rec or request_rec, respectively, is available:
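The create-then-look-up pattern can be shown without the httpd headers. What follows is a minimal standalone sketch, not Apache's own code: toy_module, toy_server, and toy_get_config are hypothetical stand-ins for the module struct, the server_rec, and ap_get_module_config, and calloc stands in for pool allocation.

```c
#include <stdlib.h>

#define TOY_MAX_MODULES 8

/* Hypothetical per-server configuration for one module. */
typedef struct {
    int timeout;
    const char *greeting;
} toy_svr_cfg;

/* Stand-in for the module struct: an index plus a create function. */
typedef struct {
    int module_index;
    void *(*create_server_config)(void);
} toy_module;

/* Stand-in for server_rec: one configuration slot per module. */
typedef struct {
    void *module_config[TOY_MAX_MODULES];
} toy_server;

/* Create function: allocate the struct and set defaults. */
static void *toy_create_svr_cfg(void)
{
    toy_svr_cfg *cfg = calloc(1, sizeof(*cfg));
    if (cfg != NULL) {
        cfg->timeout = 30;
        cfg->greeting = "hello";
    }
    return cfg;
}

static toy_module toy_mod = { 3, toy_create_svr_cfg };

/* What the core does at start-up: keep whatever create returns. */
static void toy_server_init(toy_server *s, const toy_module *m)
{
    s->module_config[m->module_index] = m->create_server_config();
}

/* What a module does later: index the vector by its own module. */
static void *toy_get_config(const toy_server *s, const toy_module *m)
{
    return s->module_config[m->module_index];
}
```

In a real module the equivalent lookup is ap_get_module_config, applied to s->module_config for the per-server struct or to r->per_dir_config for the per-directory one.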
So why does Apache have two separate configurations, how are they related, and which should your module use?
Most directives work in the directory hierarchy; for example, all the directives from our mod_choices and mod_txt modules in Chapters 6 and 8 do so. This approach offers the greatest flexibility to system administrators who want to control the configuration and to deploy different configurations in different areas of their server, with <Directory>, <Files>, <Location>, and pattern-matching versions such as <DirectoryMatch>, and, subject to AllowOverride settings, .htaccess. When in doubt, implementing a directive in the directory configuration is unlikely to be wrong!
The server hierarchy is simpler. There is no nesting, and only two levels are available: top level or inside a <VirtualHost>. This approach is appropriate in the following cases:
- post_config or child_init hook
There is a subtle “gotcha” with directory configuration. When a directive is allowed to appear at the top level in httpd.conf (i.e., outside any <Directory> or similar container), it is also syntactically valid inside a <VirtualHost>. But the <VirtualHost> container has no meaning in the directory hierarchy. Thus setting per-directory configuration in a virtual host requires a <Directory> or similar container, in addition to the <VirtualHost>. That’s why, for example, most access and authentication control directives are disallowed at the top level.
Configuration directives on the server hierarchy can, and should, address this issue simply by making themselves syntactically invalid in a <Directory> context.
On a related theme, it is important not to confuse the two hierarchies. The ProxyPassReverse directive in early releases of Apache 2.0 mod_proxy offers a cautionary lesson. ProxyPassReverse directives were valid in a <Location> context, but were held in the per-server configuration. As a consequence, if multiple ProxyPassReverse directives appeared in different <Location> contexts, they would overwrite each other and only the last one would work.
The my_cmds field of the module struct mentioned earlier is a null-terminated array containing the commands implemented by the module. Normally, these commands are declared using macros from http_config.h. For example:
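AP_INIT_TAKE1 is one of many such macros, all having the same prototype (more on that later). It has the following arguments:
- the name of the directive
- the function implementing the directive
- a data pointer, passed to the function as cmd->info (often NULL)
- the contexts in which the directive is permitted
- a brief help message for the directive
An essential component of every directive is the function implementing it. Normally, the function serves to set some data field(s) in one of the configuration structs. The function prototype for AP_INIT_TAKE1 is the same, regardless of whether we’re setting per-server or per-directory configuration: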
cmd is a cmd_parms_struct comprising a number of fields used internally by Apache and available to modules. The following fields are most likely to be of interest in command functions:
- void* info: contains my_ptr from the command declaration
- apr_pool_t* pool: pool for permanent resource allocation
- apr_pool_t* temp_pool: pool for temporary resource allocation
- server_rec* server: the server_rec
Other fields are more commonly accessed through accessor functions on the rare occasions when a command function needs to be context sensitive. Here is the full declaration:
cfg is the directory configuration rec, and arg is an argument to the directive set in the configuration file we are processing. Because we specified AP_INIT_TAKE1, there is exactly one argument. Thus, if we are setting per-directory configuration, we just cast the cfg argument. If we are setting per-server configuration, we need to retrieve the configuration from the server_rec object in cmd_parms instead.
We can now look at a simple example. Our mod_txt in Chapter 8 needs a user-defined header and footer, each of which is a separate file. Let’s go ahead and implement the configuration for it. We would like to be able to specify different headers and footers at will, so that a user can apply different looks-and-feels to different areas of a site. Thus we need to implement these directives in the directory hierarchy.
Now we need to implement the functions to set the header and footer. Just for the moment, we’ll simply set them and ignore checking that they’re really files, they’re accessible to the server, and displaying them in a webpage won’t be a security risk.
In the preceding example, we implemented two essentially identical functions to set different fields of the configuration. We can consolidate these functions into a single function by passing it a context variable in cmd->info. Apache (APR) provides a handy macro for passing a pointer to individual fields of a configuration struct, so we can just set its contents:
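The idea of carrying a field offset in cmd->info can be sketched in plain C. In this standalone illustration, toy_txt_cfg, toy_cmd, and toy_set_string are hypothetical names, and the standard offsetof macro plays the role that APR_OFFSETOF plays in a real module.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-directory config for a mod_txt-like module. */
typedef struct {
    const char *header;
    const char *footer;
} toy_txt_cfg;

/* Stand-in for the one part of cmd_parms we need: the info pointer. */
typedef struct {
    void *info;
} toy_cmd;

/* One setter for any string field: info carries the field's offset
 * within the configuration struct. */
static const char *toy_set_string(toy_cmd *cmd, void *cfg, const char *arg)
{
    size_t offset = (size_t)(uintptr_t)cmd->info;

    *(const char **)((char *)cfg + offset) = arg;
    return NULL;    /* NULL means "no error" */
}
```

Two command declarations can then share the function and differ only in the offset they pass, for example (void *)offsetof(toy_txt_cfg, header) for one and (void *)offsetof(toy_txt_cfg, footer) for the other.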
In general, we write our own function to implement a directive. This step is not always necessary, however. When a directive simply sets a field in the directory configuration, we can use one of the prepackaged functions to set a field, based on the type of the field to be set: ap_set_string_slot, ap_set_string_slot_lower, ap_set_int_slot, ap_set_flag_slot, or ap_set_file_slot.
Our earlier function txt_set_var is, in fact, an exact copy of ap_set_string_slot. Since the fields we are setting are actually filenames, we should instead use ap_set_file_slot. This means that the user can specify either absolute or relative pathnames for the file, and Apache will resolve them correctly according to the underlying filesystem and server root. So we can reduce our mod_txt configuration to the following code:
We’ve improved our configuration without writing any configuration functions at all!
These functions are provided for directives in the directory hierarchy. There are no equivalent functions for implementing configuration directives in the server hierarchy, so we always have to write our own.
The preceding example used OR_ALL to say that TxtHeader/TxtFooter can be used anywhere in httpd.conf or in any .htaccess file (provided .htaccess is enabled on the server). We could instead have used any of these options:
- RSRC_CONF: httpd.conf at top level or in a VirtualHost context. All directives using server configuration should use this option, as other contexts are meaningless for a server configuration.
- ACCESS_CONF: httpd.conf in a directory context. This option is appropriate to per-directory configuration directives for a server administrator only. It is often combined (using OR) with RSRC_CONF to allow its use anywhere within httpd.conf, giving rise to the “gotcha” mentioned earlier related to directives appearing in ambiguous contexts.
- OR_LIMIT, OR_OPTIONS, OR_FILEINFO, OR_AUTHCFG, OR_INDEXES: extend ACCESS_CONF to allow use of the directive in .htaccess, where permitted by the AllowOverride setting.
An additional value, EXEC_ON_READ, can be ORed with any of the preceding options to take control of parsing httpd.conf into a module. We can use this to implement containers in configuration, as described in Section 9.7.
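Because these options are bit flags, deciding whether a directive may appear in a given place comes down to a mask test. The sketch below is a simplified standalone model of that idea rather than the core's exact logic: the TOY_* values and toy_check_where are hypothetical, and the real flag definitions live in http_config.h.

```c
#include <stddef.h>

/* Stand-in flags, one bit per kind of permitted context. */
enum {
    TOY_OR_LIMIT    = 1,
    TOY_OR_OPTIONS  = 2,
    TOY_OR_FILEINFO = 4,
    TOY_OR_AUTHCFG  = 8,
    TOY_OR_INDEXES  = 16,
    TOY_ACCESS_CONF = 64,
    TOY_RSRC_CONF   = 128
};

/* allowed: the flags a directive was declared with.
 * context: the flags describing where the directive was found, e.g.
 *          the categories that AllowOverride enables for a .htaccess.
 * Returns NULL if the directive is acceptable, else an error message. */
static const char *toy_check_where(int allowed, int context)
{
    return (allowed & context) ? NULL : "directive not allowed here";
}
```

Under this model, a directive declared with only TOY_RSRC_CONF is rejected in every .htaccess file, while one declared with TOY_OR_FILEINFO is accepted wherever the administrator has enabled that override category.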
The preceding example used the AP_INIT_TAKE1 macro, which defines a function having a single string argument. This is one of several such macros defined in http_config.h:
- AP_INIT_NO_ARGS: no arguments
- AP_INIT_FLAG: a single On/Off argument
- AP_INIT_TAKE1: a single string, file or numeric argument
- AP_INIT_TAKE2, AP_INIT_TAKE3: two/three arguments
- AP_INIT_TAKE12, and so on: directives taking variable numbers of arguments
- AP_INIT_ITERATE: function will be called repeatedly with each of an unspecified number of arguments
- AP_INIT_ITERATE2: function will be called repeatedly with two arguments
- AP_INIT_RAW_ARGS: function will be called with arguments unprocessed
Let’s look at some examples. We’ve already seen a TAKE1 case. The other AP_INIT_TAKE* functions are similar but have different numbers of arguments (those with variable numbers of arguments simply work by passing NULL values where no argument was specified in the configuration).
In the directory hierarchy, an AP_INIT_FLAG directive can generally be dealt with using ap_set_flag_slot. For example, in our mod_choices module from Chapter 6, we need to implement the directive Choices On|Off. Recall that we have a per-directory configuration record:
All we need to implement the directive is
In the server hierarchy, you would have to supply a function to set the configuration value, as in an AP_INIT_TAKE1.
With AP_INIT_ITERATE, the function is called once for each argument, so it is suitable for directives with variable arguments, all of which have the same semantics.
There are several examples in mod_proxy, where you can supply a list of addresses or ports to which a proxy is or isn’t allowed to connect. For example:
AllowConnect 21 80 443 8000 8080
This is declared as iterating over the arguments:
Here’s the function: It’s very simple because it has to deal with only one argument at a time. Note that this function is also an example of a directive in the server hierarchy, where we have to look up the server_rec object from the cmd_parms supplied.
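The division of labor in the iterating style can be modeled in standalone C. This is a sketch under stated assumptions, not mod_proxy's code: toy_proxy_cfg, toy_allow_port, and toy_iterate are hypothetical, with toy_iterate standing in for the core's job of splitting the line and calling the handler once per word.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_PORTS 16

/* Hypothetical per-server config holding the allowed ports. */
typedef struct {
    int ports[TOY_MAX_PORTS];
    int nports;
} toy_proxy_cfg;

/* Handler in the iterating style: sees exactly one argument per call. */
static const char *toy_allow_port(toy_proxy_cfg *cfg, const char *arg)
{
    char *end;
    long port = strtol(arg, &end, 10);

    if (*arg == '\0' || *end != '\0' || port < 1 || port > 65535)
        return "port must be a number between 1 and 65535";
    if (cfg->nports == TOY_MAX_PORTS)
        return "too many ports";
    cfg->ports[cfg->nports++] = (int)port;
    return NULL;
}

/* Stand-in for the core: split the line on whitespace, call the
 * handler once per word, and stop at the first error. */
static const char *toy_iterate(toy_proxy_cfg *cfg, const char *line)
{
    char buf[256];
    char *word;
    const char *err = NULL;

    if (strlen(line) >= sizeof(buf))
        return "line too long";
    strcpy(buf, line);
    for (word = strtok(buf, " \t"); word != NULL && err == NULL;
         word = strtok(NULL, " \t")) {
        err = toy_allow_port(cfg, word);
    }
    return err;
}
```

Passing the argument string "21 80 443 8000 8080" to toy_iterate results in five handler calls, one per port, which is why the handler itself never needs a loop.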
AP_INIT_ITERATE2 is similar to AP_INIT_ITERATE. It is a two-argument form that passes the first argument from the configuration on every call, while iterating over the remaining arguments.
An example is mod_proxy_html (version 3). The primary purpose of this output filter is to rewrite HTML links into a reverse proxy’s address space. Thus the module needs to know which markup attributes are links and may, therefore, need to be rewritten.
Originally, mod_proxy_html supported HTML4 and XHTML1, with knowledge of the markup taken directly from the authoritative DTDs (published by the World Wide Web Consortium) and embedded in the module. As its popularity grew beyond those able to adapt it themselves, a frequently requested feature was to support proprietary extensions to HTML. Version 3 accommodates this request by removing the knowledge of HTML from the module and delegating it to the configuration. A configuration directive ProxyHTMLLinks reads the specification to find out which attributes need to be processed.
A configuration excerpt is bundled with mod_proxy_html and duplicates the knowledge that was hard-coded into earlier versions:
The arguments to ProxyHTMLLinks consist of an HTML element followed by a variable number of attributes. We implement this using an ITERATE2 function:
The underlying representation of HTML links used here is an APR hash table of elements, each having an APR array of attributes to be processed. The function first looks up the hash entry for the element (first argument), creates one if none is found, and then appends the attribute (second argument) to the attributes array.
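The look-up, create-if-missing, append sequence can be illustrated without APR. In this standalone sketch, fixed-size arrays stand in for the hash table and the attribute arrays; toy_links, toy_link_entry, and toy_set_link are hypothetical names and this is not the module's actual code.

```c
#include <string.h>

#define TOY_MAX_ELTS  8
#define TOY_MAX_ATTRS 8

/* One element and the attributes of it that hold links. */
typedef struct {
    const char *element;
    const char *attrs[TOY_MAX_ATTRS];
    int nattrs;
} toy_link_entry;

/* Stand-in for the hash of elements. */
typedef struct {
    toy_link_entry entries[TOY_MAX_ELTS];
    int nentries;
} toy_links;

/* Handler in the two-argument iterating style: called once per
 * (element, attribute) pair, with the element repeated each time. */
static const char *toy_set_link(toy_links *links, const char *elt,
                                const char *attr)
{
    toy_link_entry *e = NULL;
    int i;

    for (i = 0; i < links->nentries; i++) {
        if (strcmp(links->entries[i].element, elt) == 0) {
            e = &links->entries[i];
            break;
        }
    }
    if (e == NULL) {            /* first sighting: create the entry */
        if (links->nentries == TOY_MAX_ELTS)
            return "too many elements";
        e = &links->entries[links->nentries++];
        e->element = elt;
        e->nattrs = 0;
    }
    if (e->nattrs == TOY_MAX_ATTRS)
        return "too many attributes";
    e->attrs[e->nattrs++] = attr;
    return NULL;
}
```

A directive line naming the element img followed by the attributes src and longdesc would reach this handler as two calls, (img, src) and then (img, longdesc), leaving one entry with two attributes.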
Raw arguments are needed where a directive’s syntax is highly variable and needs to be fully parsed in the configuration function. Such functions by their nature are often long and complex. Instead of giving a real-life example here, we’ll show how to reimplement the previously mentioned set_links function using raw arguments. The key to this approach is a utility function ap_getword_conf, which deals with the complexities of parsing arguments that may include whitespace, escape characters, and quotes.
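To give a feel for what such a word-splitting utility has to handle, here is a simplified standalone tokenizer. It is a hypothetical stand-in, toy_next_word, and not the implementation of ap_getword_conf: it copies into a caller-supplied buffer instead of allocating from a pool, and it understands only double quotes and backslash escapes inside them.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy the next word of *line into buf and advance *line past it.
 * A word is a run of non-space characters or a double-quoted string,
 * inside which a backslash escapes the following character.
 * Returns 1 if a word was found, 0 when none remain. */
static int toy_next_word(const char **line, char *buf, size_t buflen)
{
    const char *p = *line;
    size_t n = 0;

    if (buflen == 0)
        return 0;
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *line = p;
        return 0;
    }
    if (*p == '"') {
        p++;                              /* skip the opening quote */
        while (*p != '\0' && *p != '"') {
            if (*p == '\\' && p[1] != '\0')
                p++;                      /* take the escaped char */
            if (n + 1 < buflen)
                buf[n++] = *p;
            p++;
        }
        if (*p == '"')
            p++;                          /* skip the closing quote */
    }
    else {
        while (*p != '\0' && !isspace((unsigned char)*p)) {
            if (n + 1 < buflen)
                buf[n++] = *p;
            p++;
        }
    }
    buf[n] = '\0';
    *line = p;
    return 1;
}
```

A raw-arguments handler built on this would call toy_next_word once to get the element, then loop until it returns 0, treating each further word as an attribute.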
The next topic we need to deal with is managing the configuration hierarchy: how directives set at different levels interact with each other. This is the purpose of the merge functions in the module struct.
A merge function is called whenever directives appear in more than one container in httpd.conf. It resolves conflicts between directives in the various containers that may be applicable.
Normal behavior in the directory hierarchy meets the following criteria:
- <Directory> or <Location> overrides a configuration that isn’t in any container.
- A .htaccess file (if enabled) overrides httpd.conf for the same directory.
- <Location> overrides <Files>, which overrides <Directory> and .htaccess.
- <Location> and <Files> containers override each other based not on specificity, but rather on the order in which they appear in httpd.conf.
- mod_alias hooking a relevant function before the map_to_storage phase.
It is strongly recommended that <Directory> and <Location> containers never have an overlapping scope: That way confusion lies! But that’s an issue for system administrators to manage.
Normal behavior in the server hierarchy is simpler: We just need to merge <VirtualHost> containers with the top-level configuration.
Consider the following example:
with directives to set a, b, and c, and used with the configuration
We normally want a request to /somewhere/else/again/ to have the following behavior:
- a is set to 123 and c is set to 321; b is unset.
- b to 456. Because a and c are not set (overridden) at this level, the previous values are inherited in the merge.
- /var/www/somewhere/else/, so this level simply inherits from the parent without any need for a merge.
- c by overriding the previous setting, while inheriting the previous values of a and b. Now we have a = 123, b = 456, and c = 789.
If we use <Location> instead of <Directory>, then the precedence changes, and the last <Location> overrides the earlier ones despite being less specific.
Because only the module itself knows the semantics of its own configuration directives, only the module itself can actually implement this behavior. This task is the business of a merge_config function, which Apache will call whenever directives applicable to the module appear in more than one container. If no such function is provided by the module, configuration cannot be inherited. Thus, in the preceding example, c is set to 789 at /var/www/somewhere/else/again/ but a and b are unset.
The merge function follows this generic form:
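A minimal standalone sketch of that form, using the a, b, and c values from the example above. The names toy_dir_cfg, toy_create_cfg, toy_merge_cfg, and TOY_UNSET are hypothetical, and malloc stands in for the pool that a real merge function is handed by Apache.

```c
#include <stdlib.h>

#define TOY_UNSET (-1)

typedef struct {
    int a;
    int b;
    int c;
} toy_dir_cfg;

/* Create function: every field starts out unset. */
static toy_dir_cfg *toy_create_cfg(void)
{
    toy_dir_cfg *cfg = malloc(sizeof(*cfg));

    if (cfg != NULL)
        cfg->a = cfg->b = cfg->c = TOY_UNSET;
    return cfg;
}

/* Merge function: for each field, the more specific container (add)
 * wins if it set a value; otherwise the value is inherited from the
 * parent (base). Neither input is modified. */
static toy_dir_cfg *toy_merge_cfg(const toy_dir_cfg *base,
                                  const toy_dir_cfg *add)
{
    toy_dir_cfg *conf = toy_create_cfg();

    if (conf == NULL)
        return NULL;
    conf->a = (add->a == TOY_UNSET) ? base->a : add->a;
    conf->b = (add->b == TOY_UNSET) ? base->b : add->b;
    conf->c = (add->c == TOY_UNSET) ? base->c : add->c;
    return conf;
}
```

Merging a top level that sets a = 123 and c = 321 with a level that sets only b = 456, and then with a level that sets only c = 789, yields a = 123, b = 456, and c = 789. The scheme depends on TOY_UNSET being a value that no directive can legitimately set.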
Often we may need to do something a little more complex: for example, merge nontrivial structures, or deal with cases where there is no meaningful UNSET value to test. When merging structures that involve pointers, take care when modifying the originals: It’s safer to make a copy unless you’re using a standard APR data type and its merge functions. You’ll have to make this decision for each case based on its merits.
The next example demonstrates the potential pitfalls in merging structures. Consider a module that supports an unlimited number of some kind of rule in its configuration, and uses a linked list in the configuration struct to represent them:
The configuration function for setting a myrule is simple enough: We append the new rule at the end of the list, to ensure the rules are applied in the same order as they appear in httpd.conf:
When we perform the merge, we want the add rules to take precedence, so we put them first. But there’s a pitfall awaiting us if we try to merge using pointers without copying:
This code fails in the general case, because when we appended the base list to conf, we actually modified the add list itself. Add a nontrivial configuration into the mix, and we could easily end up appending add to itself, leading to a circular list and causing Apache to spin as soon as it applies the rules in processing a request with the merged list.
To avoid this risk, our merge function needs to copy the entire list:
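A copying merge of this kind can be sketched in standalone C. The toy_rule type and both functions are hypothetical, the id field stands in for whatever a rule really contains, and malloc stands in for the pool; in a module, the copies would be allocated from the pool passed to the merge function and freed with it.

```c
#include <stdlib.h>

typedef struct toy_rule {
    int id;                     /* stands in for the rule's contents */
    struct toy_rule *next;
} toy_rule;

/* Append copies of every rule in src at *tail. Returns the new tail
 * position, or NULL if an allocation fails. */
static toy_rule **toy_copy_rules(toy_rule **tail, const toy_rule *src)
{
    for (; src != NULL; src = src->next) {
        toy_rule *copy = malloc(sizeof(*copy));

        if (copy == NULL)
            return NULL;
        copy->id = src->id;
        copy->next = NULL;
        *tail = copy;
        tail = &copy->next;
    }
    return tail;
}

/* Merge: rules from add come first, then those from base. Neither
 * input list is touched, so repeated merges cannot create a cycle. */
static toy_rule *toy_merge_rules(const toy_rule *base, const toy_rule *add)
{
    toy_rule *merged = NULL;
    toy_rule **tail = toy_copy_rules(&merged, add);

    if (tail != NULL)
        toy_copy_rules(tail, base);
    return merged;
}
```

Because the inputs are only read, the result of one merge can safely be used as the base of the next, which is exactly the situation in which the pointer-sharing version goes wrong.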
Note that we could have simplified this example by using appropriate APR types. In this case, we could have used the APR array type apr_array_header_t in place of our linked list, and we could have then used apr_array_append in our merge function:
For most purposes, the configuration we’ve introduced here offers ample control. Configuration directives don’t care where they appear so long as they are syntactically correct and follow the rules of the appropriate hierarchy (directory or server). Apache itself will manage the hierarchy, and all the module should do is provide a merge function. But occasionally a directive might care where it appears. For example, if it concerns support for a virtual filesystem, it might want to know if it’s within the filesystem in question. And what is the effect of a directive appearing in a context such as <Limit> that is not part of either hierarchy?
If a configuration function needs to know its context, the information is available in the cmd_parms struct. The most useful way to access this information, however, is through the function ap_check_cmd_context from http_config.h. It provides us with the promised workaround for directory-hierarchy directives appearing misleadingly in a <VirtualHost> container: We can permit our directive to appear at the top level with RSRC_CONF or OR_ALL, yet generate a syntax error if our directive appears in a <VirtualHost>:
NOT_IN_VIRTUALHOST is one of several macros we can test in this manner. Others include NOT_IN_LIMIT, NOT_IN_DIRECTORY, NOT_IN_LOCATION, NOT_IN_FILES, NOT_IN_DIR_LOC_FILE, and GLOBAL_ONLY. These macros can be combined with a bitwise OR, and ap_check_cmd_context will return NULL if and only if the conditions are satisfied.
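The way such ORed flags are tested can be modeled in a few lines of standalone C. Everything here is a hypothetical stand-in: the TOY_NOT_IN_* flags mirror the names above but not necessarily their values, toy_context is an invented description of where a directive was found, and toy_check_context is a simplified model, not the core's ap_check_cmd_context.

```c
#include <stddef.h>

/* Stand-in flags, one bit per forbidden context. */
enum {
    TOY_NOT_IN_VIRTUALHOST = 0x01,
    TOY_NOT_IN_LIMIT       = 0x02,
    TOY_NOT_IN_DIRECTORY   = 0x04,
    TOY_NOT_IN_LOCATION    = 0x08,
    TOY_NOT_IN_FILES       = 0x10
};

/* Hypothetical record of the containers enclosing a directive. */
typedef struct {
    int in_virtualhost;
    int in_limit;
    int in_directory;
    int in_location;
    int in_files;
} toy_context;

/* Returns NULL if none of the forbidden contexts applies, else a
 * message a configuration function could return as its error. */
static const char *toy_check_context(const toy_context *ctx, int forbidden)
{
    if ((forbidden & TOY_NOT_IN_VIRTUALHOST) && ctx->in_virtualhost)
        return "not allowed inside <VirtualHost>";
    if ((forbidden & TOY_NOT_IN_LIMIT) && ctx->in_limit)
        return "not allowed inside <Limit>";
    if ((forbidden & TOY_NOT_IN_DIRECTORY) && ctx->in_directory)
        return "not allowed inside <Directory>";
    if ((forbidden & TOY_NOT_IN_LOCATION) && ctx->in_location)
        return "not allowed inside <Location>";
    if ((forbidden & TOY_NOT_IN_FILES) && ctx->in_files)
        return "not allowed inside <Files>";
    return NULL;
}
```

In a real configuration function the usual idiom is to call ap_check_cmd_context with the cmd_parms and the ORed flags at the top of the function, and to return its result immediately if it is not NULL.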
<Limit> is traditionally associated with authentication and access control. After an example was published, it became cargo-cult knowledge, and even today some sources imply that it is an integral part of authentication. In fact, <Limit> is rarely useful in a regular webserver and, in the context of security, it can be dangerous. Examples of good <Limit> usage can be found in DAV and Subversion.
The <Limit> and <LimitExcept> containers provide a context in which directives may or may not apply, depending on the HTTP method used. Unlike with the standard hierarchy containers, this usage is not automatic, but rather requires cooperation from modules.
Configuration functions can find out if they are in a <Limit> section by checking the “limited” field of the cmd_parms: It is set to -1 when not in a <Limit> or <LimitExcept>, or to a bit field of <Limit>ed method numbers. You might wish to use this approach when a directive is applicable only to certain methods, to generate a syntax error if the directive is <Limit>ed to inappropriate methods:
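A standalone sketch of such a check follows, for an imaginary directive that only makes sense for the methods that write to the server. The TOY_M_* numbers, TOY_BIT, and toy_check_limited are hypothetical stand-ins for the core's method numbers and for a test on the limited field.

```c
#include <stddef.h>

/* Method numbers used as bit positions (stand-ins for M_GET etc.). */
enum { TOY_M_GET = 0, TOY_M_PUT = 1, TOY_M_POST = 2, TOY_M_DELETE = 3 };

#define TOY_BIT(m) (1LL << (m))

/* limited is -1 outside any <Limit>, otherwise a mask of the methods
 * the enclosing section applies to. Returns NULL if acceptable. */
static const char *toy_check_limited(long long limited)
{
    const long long ok = TOY_BIT(TOY_M_PUT) | TOY_BIT(TOY_M_DELETE);

    if (limited == -1)
        return NULL;            /* not inside <Limit> at all */
    if (limited & ~ok)
        return "directive only makes sense for PUT and DELETE";
    return NULL;
}
```

A configuration function would return the message as its error string, which Apache reports as a syntax error in the configuration file.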
Alternatively, a directive may unconditionally refuse to work in a <Limit> by using ap_check_cmd_context with NOT_IN_LIMIT.
Modules more commonly want to know whether they are <Limit>ed later, when processing a request. At this point, there is an actual request, and hence a method to check against the <Limit>.
The most common example of a directive that works with <Limit> is Require. It is implemented by the core, and accessed by authorization modules (Chapter 7). First, the configuration function records any <Limit>:
Second, the authorization handlers check the request method against the limit mask of the Require directive:
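The request-time half of this arrangement is a single mask test, sketched here in standalone C. The toy_require struct, the method numbers, and toy_applies are hypothetical stand-ins; in particular, the mask is assumed to have been copied from the limited value at configuration time, so -1 means the line applies to every method.

```c
/* Method numbers used as bit positions (hypothetical stand-ins). */
enum { TOY_GET = 0, TOY_PUT = 1, TOY_POST = 2 };

/* One recorded Require line and the methods it is limited to. */
typedef struct {
    long long method_mask;      /* -1: applies to every method */
    const char *requirement;
} toy_require;

/* Does this Require line apply to a request using method_number? */
static int toy_applies(const toy_require *req, int method_number)
{
    return (req->method_mask & (1LL << method_number)) != 0;
}
```

An authorization handler would loop over the recorded lines, skip those for which the test is false, and evaluate only the remainder against the authenticated user.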
So far, we have discussed the standard configuration containers that define the two hierarchies. But an httpd.conf may contain other sections as well:
In terms of its implementation, a container is simply an extended form of a directive. We can process its entire contents with AP_INIT_RAW_ARGS, setting the EXEC_ON_READ flag to indicate that we will do something other than just passively consume the line.
The simplest example of a container is <Comment>, from mod_comment:[1]
This container is implemented as a directive <Comment. Note that the directive here includes the opening angle bracket, but not the closing one: This is because arguments to a container directive will precede the closing bracket.
Now, of course, the start_comment function applies to the opening <Comment. But instead of consuming a single line, it takes over processing the input, returning control to the caller only when it reaches the closing </Comment>.
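The control flow of such a handler can be modeled in standalone C. In this sketch an array of strings stands in for the configuration file being read; toy_config_src and toy_start_comment are hypothetical names, and the closing tag is matched exactly, which is simpler than a real module would need to be.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the configuration being read. */
typedef struct {
    const char **lines;
    size_t nlines;
    size_t next;            /* index of the next unread line */
} toy_config_src;

/* Handler for the opening line of the container: swallow everything
 * up to and including the closing tag, then hand control back.
 * Returns NULL on success or an error message if the tag is missing. */
static const char *toy_start_comment(toy_config_src *src)
{
    while (src->next < src->nlines) {
        const char *line = src->lines[src->next++];

        if (strcmp(line, "</Comment>") == 0)
            return NULL;
    }
    return "<Comment> section was never closed";
}
```

When the function returns NULL, the reader is positioned on the first line after the closing tag, so ordinary directive processing resumes as if the commented block had never been there.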
A particularly interesting example of a container is <Macro>, from mod_macro.[2] It introduces macros into Apache configuration. A complementary Use directive instantiates the macro with arguments matching the <Macro> template. For example, if we have lots of virtual hosts with similar configurations, we could save ourselves from a lot of repetition by making the basic virtual host skeleton into a macro.
First, we define vhost as a macro:
Next, we use it to declare virtual hosts using just a single line per host:
To implement this, mod_macro defines a macro_t type:
The handler for <Macro creates and populates a macro_t structure, while Use activates the macro’s contents with the arguments supplied. The module is too complex to include here in detail (and to do so would also require its license to be reproduced in full), so we’ll just look at it in outline form.
The function implementing the <Macro> container is macro_section. The function implementing the Use directive is use_macro. Both functions are declared as AP_INIT_RAW_ARGS with EXEC_ON_READ.
The prototype for ap_pcfg_open_custom is
The preparation for it is omitted here for the sake of brevity, but is based on passing the contents in the param argument, and supplying functions to read from those contents.
Whereas single-line directives and (occasionally) containers serve well for most modules, sometimes we may want to use more complex forms of configuration, or read the configuration from a standard format such as SQL or XML that may be well suited to a particular module’s requirements. A simple way to take advantage of different formats is to use a configuration directive that takes the name of a configuration file as an argument. The configuration function then reads the file. Variants on this approach include querying a database or running an XPath query on an XML module-configuration file.
Modules can also rely on variables in the Apache core for configuration. For example:
- r->handler field to determine whether to accept a request (as we saw in Chapter 5), so we never need to implement our own directive for this purpose.
- mod_deflate reads environment variables such as no-gzip to determine whether to compress a document when the compression filter is active. This approach delegates configuration to modules that set environment variables, and enables configuration using directives such as BrowserMatch.
Configuration is a basic topic, and one that is essential to nearly every module and application. Apache’s configuration is largely straightforward once you appreciate how the hierarchies work and how they relate to one another. Implementing the configuration directives for your modules is usually simple. Although this holds some subtleties (such as <Limit> sections), these exist largely to maintain backward compatibility among the standard modules, and can usually be ignored by applications.
Specific topics we have looked at in this chapter include the following:
- command_rec, and defining and implementing commands
This chapter complements the discussion of HTTP request processing in Chapters 5–8, and concludes our presentation of core topics. In the next chapters, we move on to more advanced topics that may be of interest to many, but not all, applications developers: providing a new API or service for other modules, and working with an SQL database.