Chapter 9. Configuration for Modules

Most modules need to offer system administrators and users some means of configuring and controlling them. In some cases, this may even be the primary purpose of a module.

System administrators configure Apache using httpd.conf, while end users have more limited control through .htaccess files. Modules give control to both parties by implementing configuration directives that can be used in these files.

This chapter discusses how to implement configuration directives in a module and how to work with directives implemented by other modules.

9.1. Configuration Basics

From the system administrator’s point of view, several kinds of directives exist. These can be broadly classified according to their scope and validity in the configuration files. In other words, some directives are valid for the server as a whole, whereas others apply within a scope such as <VirtualHost> or <Directory>.

Conflicting directives may override each other on the basis of order and specificity. For example, where there is a conflict, a directive in a .htaccess file overrides one set in the same scope in httpd.conf (provided the system administrator has enabled .htaccess). In most cases, this applies recursively, although this is controlled by individual modules whose behavior may differ.

Apache supports the following standard contexts:

Main Configuration

Directives appearing in httpd.conf but not inside any container apply globally, except where overridden. This context is appropriate for setting system defaults such as MIME types, and for once-only setup such as loading modules. Most directives can be used here.

Virtual Host

Each virtual host has its (virtual) server-wide configuration set within a <VirtualHost> container. Most directives that are valid in the main configuration are also valid in a virtual host, and vice versa.

Directory

The <Directory>, <Files> and <Location> containers define a hierarchy within which configuration can be set and overridden at any level. This is the most usual form of configuration, and is orthogonal to the virtual hosts. In the interests of brevity, we’ll refer to this configuration collectively as the directory hierarchy.

.htaccess

.htaccess files are an extension of the directory hierarchy that enables users to set directives for themselves, subject to permissions (the AllowOverride directive) set up by the server administrator. The .htaccess files also differ from normal configuration in that, when enabled, they are reread by Apache for every request. This scheme serves two purposes: Users don’t have to bug the administrator to restart the server, and it avoids potential security issues of processing user inputs while Apache has root privilege. Setting AllowOverride to enable .htaccess files is always a compromise: It imposes a significant performance overhead and loses the security enjoyed by a tightly controlled configuration, but it empowers users who are not permitted to manipulate httpd.conf.

Additionally, modules may themselves implement their own containers. For example, mod_proxy implements <Proxy>, and mod_perl implements <Perl>.

9.2. Configuration Data Structs

As noted in Section 9.1, there are two orthogonal hierarchies of configuration directives: (virtual) hosts and directories. Internally, this dual hierarchy is based on having two different data structs: the per-server configuration and the per-directory configuration. In fact, every module has its own pointers for implementing each of these structs, although either or both can be unused (NULL), and it is unusual for a module to use both of them.

The per-server configuration is kept on the server_rec, of which there is one for each virtual host, created at server start-up. The per-directory configuration is exposed to modules via the request_rec and may be computed using the merge function for every request.

The configuration structs are instances of configuration vectors, as seen in Chapter 4. Those discussed in this chapter are used for configuration that is initialized at server start-up and should be accessed as read-only thereafter.

9.3. Managing a Module Configuration

9.3.1. Module Configuration

No fewer than five of the six (usable) elements of the Apache module struct are concerned with configuration:

image

It is up to each module whether and how to define each configuration struct. Whenever a struct is defined, the module must implement an appropriate create function to allocate and (usually) initialize it:

image

At this point, just allocating and returning a struct of the right size is often sufficient: Apache uses the return value. Now these values can be accessed at any time a server_rec or request_rec, respectively, is available:

image
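To make the mechanism concrete, here is a minimal model in plain C. It is a sketch, not the Apache API: `demo_module`, `demo_txt_cfg`, `demo_create_dir_config`, and `demo_get_module_config` are hypothetical names, and `malloc` stands in for allocation from a configuration pool. It assumes, as Apache does, that every module owns one slot in a vector of `void *` pointers, indexed by a per-module number; in a real module the lookup is done with `ap_get_module_config`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the module struct: only the slot index. */
typedef struct {
    int module_index;
} demo_module;

/* A module's private per-directory configuration. */
typedef struct {
    const char *header;
    const char *footer;
} demo_txt_cfg;

/* Create function: allocate the struct and initialize every field.
 * malloc stands in for allocation from a configuration pool. */
static void *demo_create_dir_config(void)
{
    demo_txt_cfg *cfg = malloc(sizeof(*cfg));
    if (cfg != NULL) {
        cfg->header = NULL;
        cfg->footer = NULL;
    }
    return cfg;   /* the caller stores this pointer in the vector */
}

/* Look up a module's slot in a configuration vector. */
static void *demo_get_module_config(void **vector, const demo_module *m)
{
    return vector[m->module_index];
}
```

The create function never sees the vector: it only returns the new struct, and the caller decides where to file it. That is why returning a correctly sized, initialized struct is all a create function usually has to do.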

9.3.2. Server and Directory Configuration

So why does Apache have two separate configurations, how are they related, and which should your module use?

Most directives work in the directory hierarchy—for example, all the directives from our mod_choices and mod_txt modules in Chapters 6 and 8 do so. This approach offers the greatest flexibility to system administrators who want to control the configuration and to deploy different configurations in different areas of their server, with <Directory>, <Files>, <Location>, and pattern-matching versions such as <DirectoryMatch>, and, subject to AllowOverride settings, .htaccess. When in doubt, implementing a directive in the directory configuration is unlikely to be wrong!

The server hierarchy is simpler. There is no nesting, and only two levels are available: top level or inside a <VirtualHost>. This approach is appropriate in the following cases:

  • Any configuration that needs to be accessed outside the scope of processing a request—for example, in a post_config or child_init hook
  • Directives explicitly concerned with virtual host configuration
  • Situations where the directory hierarchy is meaningless or irrelevant, such as in a forward proxy configuration
  • Managing a persistent resource such as a database connection pool or a cache

Gotcha!

There is a subtle “gotcha” with directory configuration. When a directive is allowed to appear at the top level in httpd.conf (i.e., outside any <Directory> or similar container), it is also syntactically valid inside a <VirtualHost>. But the <VirtualHost> container has no meaning in the directory hierarchy. Thus setting per-directory configuration in a virtual host requires a <Directory> or similar container, in addition to the <VirtualHost>. That’s why, for example, most access and authentication control directives are disallowed at the top level.

Configuration directives on the server hierarchy can, and should, address this issue simply by making themselves syntactically invalid in a <Directory> context.

On a related theme, it is important not to confuse the two hierarchies. The ProxyPassReverse directive in early releases of Apache 2.0 mod_proxy offers a cautionary lesson. ProxyPassReverse directives were valid in a <Location> context, but were held in the per-server configuration. As a consequence, if multiple ProxyPassReverse directives appeared in different <Location> contexts, they would overwrite each other and only the last one would work.

9.4. Implementing Configuration Directives

The my_cmds field of the module struct mentioned earlier is a null-terminated array containing the commands implemented by the module. Normally, these commands are defined using macros defined in http_config.h. For example:

image

AP_INIT_TAKE1 is one of many such macros, all having the same prototype (more on that later). It has the following arguments:

  1. Directive name
  2. Function implementing the directive
  3. Data pointer (often NULL)
  4. Context in which this directive is allowed
  5. A brief help message for the directive

9.4.1. Configuration Functions

An essential component of every directive is the function implementing it. Normally, the function serves to set some data field(s) in one of the configuration structs. The function prototype for AP_INIT_TAKE1 is the same, regardless of whether we’re setting per-server or per-directory configuration:

image

cmd is a cmd_parms_struct comprising a number of fields used internally by Apache and available to modules. The following fields are most likely to be of interest in command functions:

  • void* info: contains my_ptr from the command declaration
  • apr_pool_t* pool: pool for permanent resource allocation
  • apr_pool_t* temp_pool: pool for temporary resource allocation
  • server_rec* server: the server_rec

Other fields are more commonly accessed through accessor functions on the rare occasions when a command function needs to be context sensitive. Here is the full declaration:

image

cfg is the directory configuration rec, and arg is an argument to the directive set in the configuration file we are processing. Because we specified AP_INIT_TAKE1, there is exactly one argument. Thus, if we are setting per-directory configuration, we just cast the cfg argument. If we are setting per-server configuration, we instead need to retrieve the configuration struct from the server_rec object in cmd_parms (cmd->server), using ap_get_module_config.

9.4.2. Example

We can now look at a simple example. Our mod_txt in Chapter 8 needs a user-defined header and footer, each of which is a separate file. Let’s go ahead and implement the configuration for it. We would like to be able to specify different headers and footers at will, so that a user can apply different looks-and-feels to different areas of a site. Thus we need to implement these directives in the directory hierarchy.

image

Now we need to implement the functions to set the header and footer. Just for the moment, we’ll simply set them and ignore checking that they’re really files, they’re accessible to the server, and displaying them in a webpage won’t be a security risk.

image

9.4.3. User Data in Configuration Functions

In the preceding example, we implemented two essentially identical functions to set different fields of the configuration. We can consolidate these functions into a single function by passing it a context variable in cmd->info. APR provides a handy macro, APR_OFFSETOF, that yields the offset of an individual field within a configuration struct; we pass that offset in cmd->info, so that the function can locate the field and set its contents:

image
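The offset idiom can be sketched in standard C, with `offsetof` from `<stddef.h>` playing the role of APR's `APR_OFFSETOF`. The names `demo_txt_cfg` and `demo_set_var` are hypothetical, and the function takes the `info` pointer directly rather than a full `cmd_parms`; treat it as an illustration of the technique under those assumptions, not as the module's actual code.

```c
#include <stddef.h>

typedef struct {
    const char *header;
    const char *footer;
} demo_txt_cfg;

/* Generic setter: `info` carries the byte offset of the target field,
 * passed through a void pointer as in the cmd->info idiom. */
static const char *demo_set_var(void *info, void *cfg, const char *arg)
{
    size_t offset = (size_t)info;

    /* Step from the start of the struct to the chosen field and store
     * the argument there. */
    *(const char **)((char *)cfg + offset) = arg;
    return NULL;   /* NULL signals success; a string would be an error */
}
```

A single function now serves both directives: one is declared with `(void *)offsetof(demo_txt_cfg, header)` as its data pointer and the other with `(void *)offsetof(demo_txt_cfg, footer)`.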

9.4.4. Prepackaged Configuration Functions

In general, we write our own function to implement a directive. This step is not always necessary, however. When a directive simply sets a field in the directory configuration, we can use one of the prepackaged functions to set a field, based on the type of the field to be set: ap_set_string_slot, ap_set_string_slot_lower, ap_set_int_slot, ap_set_flag_slot, or ap_set_file_slot.

Our earlier function txt_set_var is, in fact, an exact copy of ap_set_string_slot. Since the fields we are setting are actually filenames, we should instead use ap_set_file_slot. This means that the user can specify either absolute or relative pathnames for the file, and Apache will resolve them correctly according to the underlying filesystem and server root. So we can reduce our mod_txt configuration to the following code:

image

We’ve improved our configuration without writing any configuration functions at all!

These functions are provided for directives in the directory hierarchy. There are no equivalent functions for implementing configuration directives in the server hierarchy, so we always have to write our own.

9.4.5. Scope of Configuration

The preceding example used OR_ALL to say that TxtHeader/TxtFooter can be used anywhere in httpd.conf or in any .htaccess file (provided .htaccess is enabled on the server). We could instead have used any of these options:

  • RSRC_CONF: httpd.conf at top level or in a VirtualHost context. All directives using server configuration should use this option, as other contexts are meaningless for a server configuration.
  • ACCESS_CONF: httpd.conf in a directory context. This option is appropriate to per-directory configuration directives for a server administrator only. It is often combined (using OR) with RSRC_CONF to allow its use anywhere within httpd.conf, giving rise to the “gotcha” mentioned earlier related to directives appearing in ambiguous contexts.
  • OR_LIMIT, OR_OPTIONS, OR_FILEINFO, OR_AUTHCFG, OR_INDEXES: extend ACCESS_CONF to allow use of the directive in .htaccess, where permitted by the AllowOverride setting.

An additional value, EXEC_ON_READ, can be ORed with any of the preceding options to take control of parsing httpd.conf into a module. We can use this to implement containers in configuration, as described in Section 9.7.

9.4.6. Configuration Function Types

The preceding example used the AP_INIT_TAKE1 macro, which defines a function having a single string argument. This is one of several such macros defined in http_config.h:

  • AP_INIT_NO_ARGS: no arguments
  • AP_INIT_FLAG: a single On/Off argument
  • AP_INIT_TAKE1: a single string, file or numeric argument
  • AP_INIT_TAKE2, AP_INIT_TAKE3: two/three arguments
  • AP_INIT_TAKE12, and so on: directives taking variable numbers of arguments
  • AP_INIT_ITERATE: function will be called repeatedly with each of an unspecified number of arguments
  • AP_INIT_ITERATE2: function will be called repeatedly with two arguments
  • AP_INIT_RAW_ARGS: function will be called with arguments unprocessed

Let’s look at some examples. We’ve already seen a TAKE1 case. The other AP_INIT_TAKE* functions are similar but have different numbers of arguments (those with variable numbers of arguments simply work by passing NULL values where no argument was specified in the configuration).

AP_INIT_FLAG

In the directory hierarchy, this function can generally be dealt with using ap_set_flag_slot. For example, in our mod_choices module from Chapter 6, we need to implement the directive Choices On|Off. Recall that we have a per-directory configuration record:

image

All we need to implement the directive is

image

In the server hierarchy, you would have to supply a function to set the configuration value, as in an AP_INIT_TAKE1.

AP_INIT_ITERATE

The function is called once for each argument, so it is suitable for directives with variable arguments, all of which have the same semantics.

There are several examples in mod_proxy, where you can supply a list of addresses or ports to which a proxy is or isn’t allowed to connect. For example:

AllowConnect 21 80 443 8000 8080

This is declared as iterating over the arguments:

image

Here’s the function: It’s very simple because it has to deal with only one argument at a time. Note that this function is also an example of a directive in the server hierarchy, where we have to look up the server_rec object from the cmd_parms supplied.

image
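The division of labor in an iterating directive can be modeled in plain C. In this sketch, `demo_proxy_cfg`, `demo_set_allowed_port`, and `demo_iterate` are hypothetical names; the fixed-size array and the port-range check are assumptions made for illustration and are not taken from mod_proxy. The loop in `demo_iterate` stands in for what the server does on the module's behalf when a directive is declared as iterating.

```c
#include <stdlib.h>

#define DEMO_MAX_PORTS 16

/* Hypothetical per-server configuration: a fixed-size list of ports. */
typedef struct {
    int ports[DEMO_MAX_PORTS];
    int nports;
} demo_proxy_cfg;

/* Handler for one argument; returns NULL on success or an error message. */
static const char *demo_set_allowed_port(demo_proxy_cfg *cfg, const char *arg)
{
    char *end;
    long port = strtol(arg, &end, 10);

    if (end == arg || *end != '\0' || port < 1 || port > 65535)
        return "AllowConnect: port must be a number between 1 and 65535";
    if (cfg->nports == DEMO_MAX_PORTS)
        return "AllowConnect: too many ports";
    cfg->ports[cfg->nports++] = (int)port;
    return NULL;
}

/* What the iterating dispatch amounts to: one call per argument,
 * stopping at the first error. */
static const char *demo_iterate(demo_proxy_cfg *cfg,
                                const char **args, int nargs)
{
    for (int i = 0; i < nargs; i++) {
        const char *err = demo_set_allowed_port(cfg, args[i]);
        if (err != NULL)
            return err;
    }
    return NULL;
}
```

Because the handler sees one argument at a time, validation and error reporting stay local to a single value, which is what keeps such functions short.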

AP_INIT_ITERATE2

This is similar to AP_INIT_ITERATE. It is a two-argument form that takes the first argument from the configuration every time, while iterating over the remaining arguments.

An example is mod_proxy_html (version 3). The primary purpose of this output filter is to rewrite HTML links into a reverse proxy’s address space. Thus the module needs to know which markup attributes are links and may, therefore, need to be rewritten.

Originally, mod_proxy_html supported HTML4 and XHTML1, with knowledge of the markup taken directly from the authoritative DTDs (published by the World Wide Web Consortium) and embedded in the module. As its popularity grew beyond those able to adapt it themselves, a frequently requested feature was to support proprietary extensions to HTML. Version 3 accommodates this request by removing the knowledge of HTML from the module and delegating it to the configuration. A configuration directive ProxyHTMLLinks reads the specification to find out which attributes need to be processed.

A configuration excerpt is bundled with mod_proxy_html and duplicates the knowledge that was hard-coded into earlier versions:

image

The arguments to ProxyHTMLLinks consist of an HTML element followed by a variable number of attributes. We implement this using an ITERATE2 function:

image

The underlying representation of HTML links used here is an APR hash table of elements, each having an APR array of attributes to be processed. The function first looks up the hash entry for the element (first argument), creates one if none is found, and then appends the attribute (second argument) to the attributes array.
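The lookup-then-append logic just described can be sketched without APR. This model replaces the hash table and APR arrays with small fixed-size arrays, which is an assumption made purely to keep the example self-contained; `demo_links`, `demo_link_entry`, and `demo_set_links` are hypothetical names rather than the module's own.

```c
#include <string.h>

#define DEMO_MAX_ELTS  8
#define DEMO_MAX_ATTRS 8

/* One element and the attributes of it that hold links. */
typedef struct {
    const char *element;
    const char *attrs[DEMO_MAX_ATTRS];
    int nattrs;
} demo_link_entry;

typedef struct {
    demo_link_entry entries[DEMO_MAX_ELTS];
    int nentries;
} demo_links;

/* ITERATE2-style handler: called once per (element, attribute) pair. */
static const char *demo_set_links(demo_links *links,
                                  const char *elt, const char *att)
{
    demo_link_entry *e = NULL;

    /* Look up the entry for this element. */
    for (int i = 0; i < links->nentries; i++) {
        if (strcmp(links->entries[i].element, elt) == 0) {
            e = &links->entries[i];
            break;
        }
    }
    /* Create one if none was found. */
    if (e == NULL) {
        if (links->nentries == DEMO_MAX_ELTS)
            return "too many elements";
        e = &links->entries[links->nentries++];
        e->element = elt;
        e->nattrs = 0;
    }
    /* Append the attribute to the element's list. */
    if (e->nattrs == DEMO_MAX_ATTRS)
        return "too many attributes";
    e->attrs[e->nattrs++] = att;
    return NULL;
}
```

For a directive line naming one element and three attributes, the handler would be called three times, each time with the same first argument.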

AP_INIT_RAW_ARGS

Raw arguments are needed where a directive’s syntax is highly variable and needs to be fully parsed in the configuration function. Such functions by their nature are often long and complex. Instead of giving a real-life example here, we’ll show how to reimplement the previously mentioned set_links function using raw arguments. The key to this approach is a utility function ap_getword_conf, which deals with the complexities of parsing arguments that may include whitespace, escape characters, and quotes.

image
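To show what a word parser has to do with a raw argument string, here is a deliberately simplified stand-in written in plain C. `demo_getword` is a hypothetical function, not `ap_getword_conf`: it assumes whitespace separators and double-quoted words only, copies into a caller-supplied buffer instead of allocating from a pool, and ignores backslash escapes.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy the next word from *line into buf (at most bufsize-1 characters;
 * bufsize must be at least 1) and advance *line past it.
 * Returns 1 if a word was read, 0 when the line is exhausted. */
static int demo_getword(const char **line, char *buf, size_t bufsize)
{
    const char *p = *line;
    size_t n = 0;

    /* Skip leading whitespace. */
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *line = p;
        return 0;
    }
    if (*p == '"') {
        /* Quoted word: everything up to the closing quote. */
        p++;
        while (*p != '\0' && *p != '"') {
            if (n + 1 < bufsize)
                buf[n++] = *p;
            p++;
        }
        if (*p == '"')
            p++;
    } else {
        /* Bare word: everything up to the next whitespace. */
        while (*p != '\0' && !isspace((unsigned char)*p)) {
            if (n + 1 < bufsize)
                buf[n++] = *p;
            p++;
        }
    }
    buf[n] = '\0';
    *line = p;
    return 1;
}
```

A raw-arguments handler built on such a parser would read the first word as the element and then loop, treating each further word as an attribute until the parser reports that nothing is left.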

9.5. The Configuration Hierarchy

The next topic we need to deal with is managing the configuration hierarchy: how directives set at different levels interact with each other. This is the purpose of the merge functions in the module struct.

A merge function is called whenever directives appear in more than one container in httpd.conf. It resolves conflicts between directives in the various containers that may be applicable.

Normal behavior in the directory hierarchy meets the following criteria:

  1. Any applicable <Directory> or <Location> overrides a configuration that isn’t in any container.
  2. A .htaccess file (if enabled) overrides httpd.conf for the same directory.
  3. A directory’s configuration overrides a parent directory’s configuration.
  4. Any applicable <Location> overrides <Files>, which overrides <Directory> and .htaccess.
  5. The <Location> and <Files> containers override each other based not on specificity, but rather on the order in which they appear in httpd.conf.
  6. Where configuration values are not explicitly set, they are inherited rather than overridden.
  7. These relationships may be influenced by a module such as mod_alias hooking a relevant function before the map_to_storage phase.

Note

It is strongly recommended that <Directory> and <Location> containers should never have an overlapping scope: That way confusion lies! But that’s an issue for system administrators to manage.

Normal behavior in the server hierarchy is simpler: We just need to merge <VirtualHost> containers with the top-level configuration.

Consider the following example:

image

with directives to set a, b, and c, and used with the configuration

image

We normally want a request to /somewhere/else/again/ to have the following behavior:

  1. At the top level, a is set to 123 and c is set to 321; b is unset.
  2. The first merge sets b to 456. Because a and c are not set (overridden) at this level, the previous values are inherited in the merge.
  3. There are no configuration directives at /var/www/somewhere/else/, so this level simply inherits from the parent without any need for a merge.
  4. The second merge sets the value of c by overriding the previous setting, while inheriting the previous values of a and b. Now we have a = 123, b = 456, and c = 789.

If we use <Location> instead of <Directory>, then the precedence changes, and the last <Location> overrides the earlier ones despite being less specific.

Because only the module itself knows the semantics of its own configuration directives, only the module itself can actually implement this behavior. This task is the business of a merge_config function, which Apache will call whenever directives applicable to the module appear in more than one container. If no such function is provided by the module, configuration cannot be inherited. Thus, in the preceding example, c is set to 789 at /var/www/somewhere/else/again/ but a and b are unset.

The merge function follows this generic form:

image
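The override-or-inherit rule can be tried out on the a, b, c example in plain C. This is a sketch under stated assumptions: `demo_cfg`, `demo_create`, `demo_merge`, and the `DEMO_UNSET` sentinel are hypothetical names, and the structs are returned by value, whereas a real merge function allocates the merged struct from a pool and returns it as a `void *`.

```c
#define DEMO_UNSET (-1)

typedef struct {
    int a;
    int b;
    int c;
} demo_cfg;

/* Create function: every field starts out as "not set". */
static demo_cfg demo_create(void)
{
    demo_cfg cfg = { DEMO_UNSET, DEMO_UNSET, DEMO_UNSET };
    return cfg;
}

/* Merge function: a value set in `add` (the more specific level)
 * overrides; otherwise the value is inherited from `base`. */
static demo_cfg demo_merge(const demo_cfg *base, const demo_cfg *add)
{
    demo_cfg conf;

    conf.a = (add->a == DEMO_UNSET) ? base->a : add->a;
    conf.b = (add->b == DEMO_UNSET) ? base->b : add->b;
    conf.c = (add->c == DEMO_UNSET) ? base->c : add->c;
    return conf;
}
```

Merging the top level (a = 123, c = 321) with a level that sets only b = 456, and then merging that result with a level that sets only c = 789, gives a = 123, b = 456, and c = 789. The technique depends on the create function initializing every field to the sentinel, so it works only when the sentinel is never a legitimate value.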

Often we may need to do something a little more complex—for example, merge nontrivial structures, or deal with cases where there is no meaningful UNSET value to test. When merging structures that involve pointers, take care when modifying the originals: It’s safer to make a copy unless you’re using a standard APR data type and its merge functions. You’ll have to make this decision for each case based on its merits.

The next example demonstrates the potential pitfalls in merging structures. Consider a module that supports an unlimited number of some kind of rule in its configuration, and uses a linked list in the configuration struct to represent them:

image

The configuration function for setting a myrule is simple enough: We append the new rule at the end of the list, to ensure the rules are applied in the same order as they appear in httpd.conf:

image

When we perform the merge, we want the add rules to take precedence, so we put them first. But there’s a pitfall awaiting us if we try to merge using pointers without copying:

image

This code fails in the general case, because when we appended the base list to conf, we actually modified the add list itself. Add a nontrivial configuration into the mix, and we could easily end up appending add to itself, leading to a circular list and causing Apache to spin as soon as it applies the rules in processing a request with the merged list.

To avoid this risk, our merge function needs to copy the entire list:

image
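A copying merge for a linked list might look like the following plain C model. `demo_rule`, `demo_copy_rules`, and `demo_merge_rules` are hypothetical names, the integer `id` stands in for whatever a rule really contains, and `malloc` replaces pool allocation (so, unlike pool memory, a partially built list is not reclaimed automatically if allocation fails).

```c
#include <stdlib.h>

typedef struct demo_rule {
    int id;                    /* stands in for the rule's real content */
    struct demo_rule *next;
} demo_rule;

/* Append copies of every rule in `src` to the list ending at *tail.
 * Returns the new tail pointer, or NULL if allocation fails. */
static demo_rule **demo_copy_rules(demo_rule **tail, const demo_rule *src)
{
    for (; src != NULL; src = src->next) {
        demo_rule *copy = malloc(sizeof(*copy));
        if (copy == NULL)
            return NULL;
        copy->id = src->id;
        copy->next = NULL;
        *tail = copy;
        tail = &copy->next;
    }
    return tail;
}

/* Merge: rules from `add` first, then rules from `base`.
 * Neither input list is modified. */
static demo_rule *demo_merge_rules(const demo_rule *base, const demo_rule *add)
{
    demo_rule *merged = NULL;
    demo_rule **tail = &merged;

    tail = demo_copy_rules(tail, add);
    if (tail != NULL)
        tail = demo_copy_rules(tail, base);
    return merged;
}
```

Because both inputs are taken as `const` and only fresh nodes are linked together, the same `add` list can take part in any number of merges without a cycle ever forming.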

Note that we could have simplified this example by using appropriate APR types. In this case, we could have used the APR array type apr_array_header_t in place of our linked list, and we could have then used apr_array_append in our merge function:

image

9.6. Context in Configuration Functions

For most purposes, the configuration we’ve introduced here offers ample control. Configuration directives don’t care where they appear so long as they are syntactically correct and follow the rules of the appropriate hierarchy (directory or server). Apache itself will manage the hierarchy, and all the module should do is provide a merge function. But occasionally a directive might care where it appears. For example, if it concerns support for a virtual filesystem, it might want to know if it’s within the filesystem in question. And what is the effect of a directive appearing in a context such as <Limit> that is not part of either hierarchy?

9.6.1. Context Checking

If a configuration function needs to know its context, the information is available in the cmd_parms struct. The most useful way to access this information, however, is through the function ap_check_cmd_context from http_config.h. It provides us with the promised workaround for directory-hierarchy directives appearing misleadingly in a <VirtualHost> container: We can permit our directive to appear at the top level with RSRC_CONF or OR_ALL, yet generate a syntax error if our directive appears in a <VirtualHost>:

image

NOT_IN_VIRTUALHOST is one of several macros we can test in this manner. Others include NOT_IN_LIMIT, NOT_IN_DIRECTORY, NOT_IN_LOCATION, NOT_IN_FILES, NOT_IN_DIR_LOC_FILE, and GLOBAL_ONLY. These macros can be used with a logical OR, and ap_check_cmd_context will return NULL if and only if the conditions are satisfied.
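The contract (a bit mask of forbidden contexts in, NULL or an error message out) can be modeled in a few lines of plain C. Everything here is hypothetical: the `DEMO_` flag values, the `demo_context` struct, and `demo_check_context` are illustrative stand-ins and do not reproduce the definitions in http_config.h or the server's actual error texts.

```c
#include <stddef.h>

/* Hypothetical flag values; the real macros live in http_config.h. */
#define DEMO_NOT_IN_VIRTUALHOST 0x01
#define DEMO_NOT_IN_LIMIT       0x02
#define DEMO_NOT_IN_DIRECTORY   0x04

/* Where the directive currently being parsed was found. */
typedef struct {
    int in_virtualhost;
    int in_limit;
    int in_directory;
} demo_context;

/* Returns NULL if the context is acceptable, else an error message
 * that the configuration function can return to report a syntax error. */
static const char *demo_check_context(const demo_context *ctx,
                                      unsigned forbidden)
{
    if ((forbidden & DEMO_NOT_IN_VIRTUALHOST) && ctx->in_virtualhost)
        return "directive cannot occur within <VirtualHost> section";
    if ((forbidden & DEMO_NOT_IN_LIMIT) && ctx->in_limit)
        return "directive cannot occur within <Limit> section";
    if ((forbidden & DEMO_NOT_IN_DIRECTORY) && ctx->in_directory)
        return "directive cannot occur within <Directory> section";
    return NULL;
}
```

A configuration function would typically call the check first and return its result unchanged whenever it is not NULL.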

9.6.2. Method and <Limit>

Caution

<Limit> is traditionally associated with authentication and access control. After an example was published, it became cargo-cult knowledge, and even today some sources imply that it is an integral part of authentication. In fact, <Limit> is rarely useful in a regular webserver and, in the context of security, it can be dangerous. Examples of good <Limit> usage can be found in DAV and Subversion.

The <Limit> and <LimitExcept> containers provide a context in which directives may or may not apply, depending on the HTTP method used. Unlike with the standard hierarchy containers, this usage is not automatic, but rather requires cooperation from modules.

Configuration functions can find out if they are in a <Limit> section by checking the “limited” field of the cmd_parms: It is set to –1 when not in a <Limit> or <LimitExcept>, or to a bit field of <Limit>ed method numbers. You might wish to use this approach when a directive is applicable only to certain methods, to generate a syntax error if the directive is <Limit>ed to inappropriate methods:

image

Alternatively, a directive may unconditionally refuse to work in a <Limit> by using ap_check_cmd_context with NOT_IN_LIMIT.

Modules more commonly want to know whether they are <Limit>ed later, when processing a request. At this point, there is an actual request, and hence a method to check against the <Limit>.

The most common example of a directive that works with <Limit> is Require. It is implemented by the core, and accessed by authorization modules (Chapter 7). First, the configuration function records any <Limit>:

image

Second, the authorization handlers check the request method against the limit mask of the Require directive:

image
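The two halves, recording the mask at configuration time and testing it at request time, can be modeled in plain C. The method numbers, `DEMO_METHOD_BIT`, `demo_require`, and `demo_applies` are hypothetical, and the sketch assumes a 64-bit mask in which -1 (all bits set) means the directive was not limited to particular methods.

```c
#include <stdint.h>

/* Hypothetical method numbers; Apache defines its own M_* constants. */
enum { DEMO_M_GET = 0, DEMO_M_PUT = 1, DEMO_M_POST = 2, DEMO_M_DELETE = 3 };

#define DEMO_METHOD_BIT(m) ((int64_t)1 << (m))

/* A requirement as recorded at configuration time: the mask is copied
 * from the "limited" field, so -1 (all bits set) means "every method". */
typedef struct {
    int64_t method_mask;
    const char *requirement;
} demo_require;

/* At request time: does this requirement apply to the request's method? */
static int demo_applies(const demo_require *req, int method_number)
{
    return (req->method_mask & DEMO_METHOD_BIT(method_number)) != 0;
}
```

Representing "not limited" as all bits set means the request-time test needs no special case: an unlimited requirement matches every method number.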

9.7. Custom Configuration Containers

So far, we have discussed the standard configuration containers that define the two hierarchies. But an httpd.conf may contain other sections as well:

image

In terms of its implementation, a container is simply an extended form of a directive. We can process its entire contents with AP_INIT_RAW_ARGS, setting the EXEC_ON_READ flag to indicate that we will do something other than just passively consume the line.

The simplest example of a container is <Comment>, from mod_comment:[1]

image

This container is implemented as a directive <Comment. Note that the directive here includes the opening angle bracket, but not the closing one: This is because arguments to a container directive will precede the closing bracket.

image

Now, of course, the start_comment function applies to the opening <Comment. But instead of consuming a single line, it takes over processing the input, returning control to the caller only when it reaches the closing </Comment>.

image
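The control flow of a container handler that swallows its contents can be modeled in plain C over an array of lines. `demo_skip_container` is a hypothetical function: a real handler would read successive lines from the configuration file handle available through `cmd_parms` rather than index into an array, and would likely match the closing tag case-insensitively.

```c
#include <string.h>

/* Consume lines starting at index `start` until the closing tag is seen.
 * Returns the index of the first line after the container, or -1 if the
 * input ends first (an unterminated container is a configuration error). */
static int demo_skip_container(const char **lines, int nlines,
                               int start, const char *close_tag)
{
    for (int i = start; i < nlines; i++) {
        const char *p = lines[i];

        /* Ignore leading indentation before comparing. */
        while (*p == ' ' || *p == '\t')
            p++;
        if (strncmp(p, close_tag, strlen(close_tag)) == 0)
            return i + 1;
    }
    return -1;
}
```

The return value marks the point at which normal directive processing resumes; everything between the opening line and the closing tag has been read and discarded by the handler itself.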

A particularly interesting example of a container is <Macro>, from mod_macro.[2] It introduces macros into Apache configuration. A complementary Use directive instantiates the macro with arguments matching the <Macro> template. For example, if we have lots of virtual hosts with similar configurations, we could save ourselves from a lot of repetition by making the basic virtual host skeleton into a macro.

First, we define vhost as a macro:

image

Next, we use it to declare virtual hosts using just a single line per host:

image

To implement this, mod_macro defines a macro_t type:

image

The handler for <Macro creates and populates a macro_t structure, while Use activates the macro’s contents with the arguments supplied. The module is too complex to include here in detail (and to do so would also require its license to be reproduced in full), so we’ll just look at it in outline form.

The function implementing the <Macro> container is macro_section. The function implementing the Use directive is use_macro. Both functions are declared as AP_INIT_RAW_ARGS with EXEC_ON_READ.

image

image

The prototype for ap_pcfg_open_custom is

image

The preparation for it is omitted here for the sake of brevity, but is based on passing the contents in the param argument, and supplying functions to read from those contents.

9.8. Alternative Configuration Methods

Whereas single-line directives and (occasionally) containers serve well for most modules, sometimes we may want to use more complex forms of configuration, or read the configuration from a standard format such as SQL or XML that may be well suited to a particular module’s requirements. A simple way to take advantage of different formats is to use a configuration directive that takes the name of a configuration file as an argument. The configuration function then reads the file. Variants on this approach include querying a database or running an XPath query on an XML module-configuration file.

Modules can also rely on variables in the Apache core for configuration. For example:

  • Content generators check the r->handler field to determine whether to accept a request (as we saw in Chapter 5), so we never need to implement our own directive for this purpose.
  • mod_deflate reads environment variables such as nogzip to determine whether to compress a document when the compression filter is active. This approach delegates configuration to modules that set environment variables, and enables configuration using directives such as BrowserMatch.

9.9. Summary

Configuration is a basic topic, and one that is essential to nearly every module and application. Apache’s configuration is largely straightforward once you appreciate how the hierarchies work and how they relate to one another. Implementing the configuration directives for your modules is usually simple. Although this holds some subtleties (such as <Limit> sections), these exist largely to maintain backward compatibility among the standard modules, and can usually be ignored by applications.

Specific topics we have looked at in this chapter include the following:

  • The directory and server configurations
  • Configuration data structs
  • The command_rec, and defining and implementing commands
  • Configuration macros and function prototypes
  • Custom and prepackaged configuration functions
  • The configuration hierarchy and merge functions
  • Context, scope, limitations, and availability of configuration records
  • Configuration containers

This chapter complements the discussion of HTTP request processing in Chapters 5 through 8, and concludes our presentation of core topics. In the next chapters, we move on to more advanced topics that may be of interest to many, but not all, applications developers: providing a new API or service for other modules, and working with an SQL database.
