11.9. Launching Subprocesses

The last topic we discuss is the API for launching subprocesses. While we don't like to encourage the creation of subprocesses because of the load they impose on a server, there are certain modules that need to do so. In fact, for certain modules, such as mod_cgi, launching subprocesses is their entire raison d'être.

Because Apache is a complex beast, calling fork() to spawn a new process within a server process is not something to be done lightly. There are a variety of issues to contend with, including, but not limited to, signal handlers, alarms, pending I/O, and listening sockets. For this reason, you should use Apache's published API to implement fork and exec, rather than trying to roll your own with the standard C functions.

In addition to discussing the subprocess API, this section covers a number of function calls that help in launching CGI scripts and setting up the environment for subprocesses.


void ap_add_cgi_vars (request_rec *r)


void ap_add_common_vars (request_rec *r)

(Declared in the header file util_script.h.) By convention, modules that need to launch subprocesses copy the contents of the current request record's subprocess_env table into the child process's environment first. This table starts out empty, but modules are free to add to it. For example, mod_env responds to the PassEnv, SetEnv, and UnsetEnv directives by setting or unsetting variables in an internal table. Then, during the request fixup phase, it copies these values into subprocess_env so that the variables are exposed to the environment by any content handler that launches a subprocess.

These two routines are called by mod_cgi to fill up the subprocess_env table with the standard CGI environment variables in preparation for launching a CGI script. You may want to use one or both yourself in order to initialize the environment to a standard state.

ap_add_cgi_vars() sets up the environment variables that are specifically called for by the CGI/1.1 protocol. This includes GATEWAY_INTERFACE, QUERY_STRING, REQUEST_METHOD, PATH_INFO, and PATH_TRANSLATED, among others.

ap_add_common_vars() adds other common CGI environment variables to subprocess_env. This includes various HTTP_ variables that hold incoming HTTP headers from the request such as HTTP_USER_AGENT and HTTP_REFERER, as well as such useful variables as PATH, SERVER_NAME, SERVER_PORT, SERVER_ROOT, and SCRIPT_FILENAME.
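The HTTP_ variable names are derived mechanically from the request header names. The following standalone sketch shows that naming rule. The helper header_to_cgi_var() is hypothetical, not an Apache function, and it assumes the usual CGI convention: prefix HTTP_, upper-case letters and digits, and turn every other character into an underscore.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Apache API: build the CGI variable
 * name for an incoming request header, e.g. "User-Agent" -> "HTTP_USER_AGENT".
 * Letters are upper-cased and every non-alphanumeric character becomes '_'.
 * The result is truncated if it does not fit in outlen bytes. */
static void header_to_cgi_var(const char *header, char *out, size_t outlen)
{
    const char *prefix = "HTTP_";
    size_t i = 0;

    if (outlen == 0)
        return;
    while (*prefix && i + 1 < outlen)      /* copy the "HTTP_" prefix */
        out[i++] = *prefix++;
    for (; *header && i + 1 < outlen; header++) {
        unsigned char c = (unsigned char)*header;
        out[i++] = isalnum(c) ? (char)toupper(c) : '_';
    }
    out[i] = '\0';
}
```

Apache applies additional rules of its own (for example, it does not pass every header through), so treat this only as an illustration of the name mapping.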


char **ap_create_environment (pool *p, table *t)

(Declared in the header file util_script.h.) Among the arguments you need when execing a program with the ap_call_exec() function is an environment array. This function takes the key/value pairs contained in an Apache table and turns them into a suitable array. Usually you'll want to use the subprocess_env table for this purpose in order to be compatible with mod_cgi and mod_env.

char **env = ap_create_environment(r->pool, r->subprocess_env);
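To make the shape of the result concrete, here is a hypothetical, malloc()-based stand-in named make_env(). It assumes the table is supplied as two parallel arrays of keys and values, whereas ap_create_environment() walks a real Apache table and allocates from a pool. What the two share is the output format: a NULL-terminated array of KEY=VALUE strings, which is what execve() expects.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ap_create_environment(): turn parallel arrays of
 * keys and values into a NULL-terminated array of "KEY=VALUE" strings.
 * Uses malloc() instead of a pool; the caller frees the result.
 * Returns NULL on allocation failure. */
static char **make_env(const char *keys[], const char *vals[], size_t n)
{
    char **env = malloc((n + 1) * sizeof *env);

    if (env == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        size_t klen = strlen(keys[i]), vlen = strlen(vals[i]);

        env[i] = malloc(klen + vlen + 2);      /* key + '=' + value + NUL */
        if (env[i] == NULL) {
            while (i > 0)                      /* undo earlier allocations */
                free(env[--i]);
            free(env);
            return NULL;
        }
        memcpy(env[i], keys[i], klen);
        env[i][klen] = '=';
        memcpy(env[i] + klen + 1, vals[i], vlen + 1);  /* copies the NUL too */
    }
    env[n] = NULL;                             /* execve() needs the terminator */
    return env;
}
```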


int ap_can_exec (const struct stat*)

(Declared in the header file httpd.h.) This utility routine checks whether a file is executable by the current process's user and/or group ID. You pass it a pointer to a stat structure, often the finfo field of the current request record. It returns a true value if the file is executable, false otherwise:

if (!ap_can_exec(&r->finfo)) {
    ap_log_rerror(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, r,
                  "file permissions deny server execution: %s", r->filename);
    return HTTP_FORBIDDEN;
}


int ap_bspawn_child (pool *p, int (*)(void *, child_info *), void *data, enum kill_conditions, BUFF **pipe_in, BUFF **pipe_out, BUFF **pipe_err)

(Declared in the header file buff.h.) The ap_bspawn_child() function is a mixture of the Unix fork() and popen() calls. It can be used to open up a pipe to a child process or just to fork off a child process to execute in the background.

This function has many arguments. The first argument, p, is a pool pointer. The current request's resource pool is the usual choice. The second argument is a function pointer with the following prototype:

int child_routine (void *data, child_info *pinfo);

After forking, Apache will immediately call child_routine() with a generic data pointer (copied from the third argument to ap_bspawn_child(), which we discuss next) and a child_info pointer, a data type needed for the Win32 port. For all intents and purposes, the child_info argument is an opaque pointer that you pass to ap_call_exec(). It has no other use at present. The child routine should return a nonzero value on success or a zero value on failure.

The third argument to ap_bspawn_child() is data, a generic void pointer. Whatever you use for this argument will be passed to the child routine, and it is a simple way to pass information from the parent process to the child process. Since the child process usually requires access to the current request, it is common to pass a pointer to the current request_rec in this field.

The fourth argument is kill_conditions, an enumerated data type that affects what Apache does with the spawned child when the server is terminating or restarting. The possibilities, which are defined in alloc.h, are kill_never, to never send a signal to the child; kill_always, to send the child a SIGKILL signal; kill_after_timeout, to send the child a SIGTERM, wait 3 seconds, and then send a SIGKILL; just_wait, to wait forever for the child to complete; and kill_only_once, to send a SIGTERM and wait for the child to complete. The usual value is kill_after_timeout, which is the same scheme that Apache uses for the listening servers it spawns.
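The kill_after_timeout policy is easy to picture in plain POSIX terms. The sketch below is a hypothetical helper, terminate_after_timeout(), that assumes you already hold the child's process ID. It illustrates the SIGTERM, grace period, SIGKILL sequence and is not Apache's own implementation; the grace period is a parameter rather than a fixed 3 seconds.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGTERM, poll for up to `grace` seconds, then send SIGKILL.
 * Returns the wait status of the reaped child, or -1 on error. */
static int terminate_after_timeout(pid_t pid, unsigned grace)
{
    int status;

    if (kill(pid, SIGTERM) == -1)
        return -1;
    for (unsigned waited = 0; waited < grace; waited++) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status;          /* child exited after SIGTERM */
        if (r == -1)
            return -1;
        sleep(1);                   /* give it a little longer */
    }
    kill(pid, SIGKILL);             /* still running: force it */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

A child that ignores SIGTERM ends up reaped with WTERMSIG(status) == SIGKILL, which is exactly the case this policy exists for.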

The last three arguments are pipe_in, pipe_out, and pipe_err. If they are non-NULL, ap_bspawn_child() fills them in with BUFF pointers attached to the standard input, output, and error of the spawned child process. By writing to pipe_in, the parent process will be able to send data to the standard input of the spawned process. By reading from pipe_out and pipe_err, you can retrieve data that the child has written to its standard output and error. Pass NULL for any or all of these arguments if you are not interested in talking to the child.


int ap_spawn_child (pool *p, int (*)(void *, child_info *), void *data, enum kill_conditions, FILE **pipe_in, FILE **pipe_out, FILE **pipe_err)

(Declared in the header file alloc.h.) This function works exactly like ap_bspawn_child() but uses more familiar FILE streams rather than BUFF streams for the I/O connection between the parent and the child. This function is rarely a good choice, however, because it is not compatible with the Win32 port, whereas ap_bspawn_child() is.


void ap_error_log2stderr (server_rec *s)

Once inside a spawned child, this function will rehook the standard error file descriptor back to the server's error log. You may want to do this after calling ap_bspawn_child() and before calling ap_call_exec() so that any error messages produced by the subprocess show up in the server error log:

ap_error_log2stderr(r->server);
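The effect is a file-descriptor redirection: descriptor 2 is made to refer to the log. The following is a minimal sketch of that idea with dup2(). The path parameter of redirect_stderr_to() is a hypothetical stand-in, since Apache reuses the server's already-open error log rather than opening a file by name.

```c
#include <fcntl.h>
#include <unistd.h>

/* Make stderr (fd 2) point at log_path so that anything the child, or the
 * program it later execs, writes to stderr lands in that file.
 * Returns 0 on success, -1 on error. */
static int redirect_stderr_to(const char *log_path)
{
    int fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd == -1)
        return -1;
    if (fd != STDERR_FILENO) {
        if (dup2(fd, STDERR_FILENO) == -1) {
            close(fd);
            return -1;
        }
        close(fd);                  /* fd 2 now holds the open log */
    }
    return 0;
}
```

Because exec preserves open descriptors that are not marked close-on-exec, the redirected stderr survives into the program launched by ap_call_exec().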


void ap_cleanup_for_exec (void)

(Declared in the header file alloc.h.) You should call this function just before invoking ap_call_exec(). Its main duty is to run all the cleanup handlers for all the main resource pools and all subpools.


int ap_call_exec (request_rec *r, child_info *pinfo, char *argv0, char **env, int shellcmd)

(Declared in the header file util_script.h.) After calling ap_bspawn_child() or ap_spawn_child(), your program will most probably call ap_call_exec() to replace the current process with a new one. The name of the command to run is specified in the request record's filename field, and its command-line arguments, if any, are specified in its args field. If successful, the new command is run and the call never returns. If preceded by ap_bspawn_child(), the new process's standard input, output, and error will be attached to the BUFF*s created by that call.

This function takes five arguments. The first, r, is the current request record. It is used to set up the argument list for the command. The second, pinfo, is the child_info pointer that ap_bspawn_child() passed to your child routine.

argv0 is the command name that will appear as the first item in the launched command's argv[] array. Although this argument is usually the same as the path of the command to run, this is not a necessary condition. It is sometimes useful to lie to a command about its name, particularly when dealing with oddball programs that behave differently depending on how they're invoked.

The fourth argument, env, is a pointer to an environment array. This is typically the pointer returned by ap_create_environment(). The last argument, shellcmd, is a flag indicating whether Apache should pass any arguments to the command. If shellcmd is true, then Apache will not pass any arguments to the command (this is counterintuitive). If shellcmd is false, then Apache will use the value of r->args to set up the arguments passed to the command. The contents of r->args must be in the old-fashioned CGI argument form, in which individual arguments are separated by the + symbol and other special characters are escaped as %XX hex escape sequences. r->args may not contain the unescaped = or & symbols. If it does, Apache will interpret it as a new-style CGI query string and refuse to pass it to the command. We'll see a concrete example of setting up the arguments for an external command shortly.
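Decoding that old-style form into an argv[] vector amounts to splitting on + and then undoing the %XX escapes in each field. The sketch below uses two hypothetical helpers, split_cgi_args() and unescape_in_place(), to show the transformation. It deliberately omits the query-string check and any other processing Apache may apply, so read it as an approximation of the idea rather than Apache's code.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Decode %XX escapes in place. Returns 0 on success, -1 on a bad escape. */
static int unescape_in_place(char *s)
{
    char *out = s;

    for (; *s; s++) {
        if (*s == '%') {
            if (!isxdigit((unsigned char)s[1]) || !isxdigit((unsigned char)s[2]))
                return -1;
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 2;
        } else {
            *out++ = *s;
        }
    }
    *out = '\0';
    return 0;
}

/* Split a "+"-separated, %XX-escaped string into argv[], with argv0 as
 * argv[0] and a terminating NULL. args is modified in place; argv must have
 * room for max pointers. Returns argc, or -1 on error. */
static int split_cgi_args(char *args, const char *argv0, char **argv, int max)
{
    int argc = 0;
    char *field = args;

    if (max < 2)
        return -1;
    argv[argc++] = (char *)argv0;
    for (;;) {
        char *plus = strchr(field, '+');
        if (plus)
            *plus = '\0';           /* end this field at the separator */
        if (argc + 1 >= max || unescape_in_place(field) == -1)
            return -1;
        argv[argc++] = field;
        if (!plus)
            break;
        field = plus + 1;
    }
    argv[argc] = NULL;
    return argc;
}
```

With the string -w80+Goodbye%20World, this yields the two arguments -w80 and Goodbye World after the command name.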

There are a few other precautionary steps ap_call_exec() will take. If SUEXEC is enabled, the program will be run through the setuid wrapper. If any of the RLimitCPU, RLimitMEM, or RLimitNPROC directives are enabled, setrlimit will be called underneath to limit the given resource to the configured value.

Finally, for convenience, under OS/2 and Win32 systems ap_call_exec() will implement the "shebang" Unix shell-ism. That is, if the first line of the requested file begins with the #! sequence, the remainder of that line is assumed to name the program interpreter that will execute the script.

On Unix platforms, successful calls to ap_call_exec() will not return because the current process has been terminated and replaced by the command. On failure, ap_call_exec() will return -1 and errno will be set.[4] On Win32 platforms, successful calls to ap_call_exec() will return the process ID of the launched process and not terminate the current code. The upcoming example shows how to deal with this.
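The Unix half of this contract is simply the behavior of the underlying exec system call. The small hypothetical wrapper try_exec() below uses execve() directly to show it: the line after execve() runs only if the exec failed, and errno then explains why.

```c
#include <errno.h>
#include <unistd.h>

/* On success execve() replaces the process and never returns. On failure it
 * returns -1 and sets errno; this wrapper hands that errno back so the
 * caller can log it and exit, as the example in Section 11.9.1 does. */
static int try_exec(const char *path, char *const argv[], char *const envp[])
{
    execve(path, argv, envp);
    return errno;               /* only reached if execve() failed */
}
```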

[4] Note that the source code for ap_call_exec() refers to the return value as the "pid." This is misleading.


void ap_child_terminate (request_rec *r)

If for some reason you need to terminate the current child (perhaps because an attempt to exec a new program has failed), this function causes the child server process to terminate cleanly after the current request. It does this by setting the child's MaxRequestsPerChild limit to 1 and clearing the keepalive flag so that the current connection is broken after the request is serviced.

ap_child_terminate(r);


int ap_scan_script_header_err_buff (request_rec *r, BUFF *fb, char *buffer)

This function is useful when launching CGI scripts. It will scan the BUFF* stream fb for HTTP headers. Typically the BUFF* is the pipe_out pointer returned from a previous call to ap_bspawn_child(). Provided that the launched script outputs a valid header format, the headers will be added to the request record's headers_out table.

The same special actions are taken on certain headers as were discussed in Chapter 9, when we covered the Perl cgi_header_out() method (see Section 9.1.2). If the headers were properly formatted and parsed, the return value will be OK. Otherwise, HTTP_INTERNAL_SERVER_ERROR or some other error code will be returned. In addition, the function will log errors to the error log.

The buffer argument should be an empty character array allocated to MAX_STRING_LEN or longer. If an error occurs during processing, this buffer will be set to contain the portion of the incoming data that generated the error. This may be useful for logging.

char buffer[MAX_STRING_LEN];
int rc = ap_scan_script_header_err_buff(r, fb, buffer);
if (rc != OK) {
    /* buffer holds the offending data; log it for extra context */
    ap_log_rerror(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, r,
                  "bad header from script %s: %s", r->filename, buffer);
    return rc;
}


int ap_scan_script_header_err (request_rec *r, FILE *f, char *buffer)

This function does exactly the same as ap_scan_script_header_err_buff(), except that it reads from a FILE* stream rather than a BUFF* stream. You would use this with the pipe_out FILE* returned by ap_spawn_child().


int ap_scan_script_header_err_core (request_rec *r, char *buffer, int (*getsfunc) (char *, int, void *), void *getsfunc_data)

The tongue-twisting ap_scan_script_header_err_core() function is the underlying routine which implements ap_scan_script_header_err() and ap_scan_script_header_err_buff(). The key component here is the function pointer, getsfunc(), which is called upon to return a line of data in the same way that the standard fgets() function does. For example, here's how ap_scan_script_header_err() works, using the standard fgets() function:

static int getsfunc_FILE(char *buf, int len, void *f)
{
   return fgets(buf, len, (FILE *) f) != NULL;
}

API_EXPORT(int) ap_scan_script_header_err(request_rec *r, FILE *f,
                                         char *buffer)
{
   return ap_scan_script_header_err_core(r, buffer, getsfunc_FILE, f);
}

Your module could replace getsfunc_FILE() with an implementation to read from a string or other resource.
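As a sketch of that idea, here is a hypothetical getsfunc_string() that honors the same contract as getsfunc_FILE() but reads from an in-memory string through a small cursor struct. Because the real ap_scan_script_header_err_core() needs a live request_rec, the example pairs it with count_headers(), a toy scanner that is also hypothetical and is meant only to exercise the callback.

```c
#include <string.h>

/* Hypothetical cursor over an in-memory buffer. */
struct str_cursor {
    const char *pos;
};

/* Same contract as getsfunc_FILE(): copy at most len-1 bytes of the next
 * line (including its '\n') into buf, NUL-terminate it, and return nonzero,
 * or return 0 when the data is exhausted. */
static int getsfunc_string(char *buf, int len, void *data)
{
    struct str_cursor *c = data;
    int n = 0;

    if (len < 2 || *c->pos == '\0')
        return 0;
    while (n < len - 1 && c->pos[n] != '\0') {
        buf[n] = c->pos[n];
        if (buf[n++] == '\n')
            break;                  /* stop after the end of the line */
    }
    buf[n] = '\0';
    c->pos += n;
    return 1;
}

/* Toy header scanner: read lines until a blank one and return the number of
 * "Name: value" lines, or -1 on a malformed line or early end of data. */
static int count_headers(int (*getsfunc)(char *, int, void *), void *data)
{
    char line[256];
    int count = 0;

    while (getsfunc(line, sizeof line, data)) {
        line[strcspn(line, "\r\n")] = '\0';
        if (line[0] == '\0')
            return count;           /* blank line ends the header block */
        if (strchr(line, ':') == NULL)
            return -1;              /* not a header line */
        count++;
    }
    return -1;                      /* data ended before the blank line */
}
```

After the scan, the cursor points at the first byte of the body, which is where a caller would continue reading.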

11.9.1. A Practical Example

We are going to say "Goodbye World" now but this time in a very big way. We will add a "goodbye-banner" handler to mod_hello. This handler will run the Unix banner command to print out a large, vertically oriented "Goodbye World" message. Although this is a very simple example compared to what happens inside mod_cgi, it does show you everything you need to write basic fork/exec code. For advanced tricks and subtleties, we recommend you peruse the source code for mod_cgi and mod_include.

The additions to mod_hello.c are shown in Example 11.6. At the top, we add util_script.h to the list of included files and hardcode the absolute path to the banner program in the #define BANNER_PGM.

Example 11.6. Additions to mod_hello.c to Launch a Child Process
#include "util_script.h"
#define BANNER_PGM "/usr/bin/banner"

/* Forward declaration so that ap_get_module_config() can find us. */
module hello_module;

static int banner_child(void *rp, child_info *pinfo)
{
   char **env;
   int child_pid;
   request_rec *r = (request_rec *)rp;

   env = ap_create_environment(r->pool, r->subprocess_env);
   ap_error_log2stderr(r->server);
   r->filename = BANNER_PGM;
   r->args = "-w80+Goodbye%20World";
   ap_cleanup_for_exec();
   child_pid = ap_call_exec(r, pinfo, r->filename, env, 0);
#ifdef WIN32
   return(child_pid);
#else
   ap_log_error(APLOG_MARK, APLOG_ERR, NULL, "exec of %s failed", r->filename);
   exit(0);
   /*NOT REACHED*/
   return(0);
#endif
}

static int goodbye_banner_handler(request_rec *r)
{
   BUFF *pipe_output;
   if (!ap_bspawn_child(r->pool, banner_child,
                        (void *) r, kill_after_timeout,
                        NULL, &pipe_output, NULL)) {
       ap_log_error(APLOG_MARK, APLOG_ERR, r->server,
                    "couldn't spawn child process: %s", BANNER_PGM);
       return HTTP_INTERNAL_SERVER_ERROR;
   }
   r->content_type = "text/plain";
   ap_send_http_header(r);
   ap_send_fb(pipe_output, r);
   ap_bclose(pipe_output);
   return OK;
}

static handler_rec hello_handlers[] =
{
   {"hello-handler", hello_handler},
   {"goodbye-banner-handler", goodbye_banner_handler},
   {NULL}
};

Skipping over the definition of banner_child() for now, look at goodbye_banner_handler(). This is the content handler for the request. We are going to access the output of the banner command, so we declare a BUFF pointer for its standard output. Now we attempt to fork by calling ap_bspawn_child(). We pass the request record's resource pool as the first argument and the address of the banner_child() subroutine as the second. For the third argument, we pass a pointer to the request record, cast to a void*. We use kill_after_timeout for the kill conditions argument, which is the usual choice. We don't care about the banner program's standard input or standard error, so we pass NULL for the fifth and seventh arguments, but we do want to recover the program's output, so we pass the address of the pipe_output BUFF* for the sixth argument.

If ap_bspawn_child() succeeds, there will now be two processes. In the child process, ap_bspawn_child() immediately invokes the banner_child() function, which we will examine momentarily. In the parent process, ap_bspawn_child() returns the process ID of the child. If it encounters an error it will return 0, and the parent logs an error and returns HTTP_INTERNAL_SERVER_ERROR.

The remainder of what we have to do in the handler is simple. We set the outgoing response's content type to text/plain and send the HTTP header with ap_send_http_header(). Next we forward the child process's output to the browser by calling ap_send_fb(), which reads from the child and sends to the client in a single step. When this is done, we clean up by closing pipe_output and return OK.

The banner_child() function is called within the child spawned by ap_bspawn_child(). We're going to set up the environment, do a little cleanup, and then replace the process with the banner program. We begin by recovering the request record and passing its pool and subprocess_env fields to ap_create_environment(), obtaining an environment pointer. We then redirect the child's standard error stream to the error log by invoking ap_error_log2stderr().

We want to call banner as if it had been invoked by this command at the shell:

% banner -w80 "Goodbye World"

This specifies a banner 80 characters wide with a message of "Goodbye World". To do this, we place the command's full path in the request record's filename field, and set the args field to contain the string -w80+Goodbye%20World. Individual command arguments are separated by + symbols, and any character that would have special meaning to the shell, such as the space character, is replaced with a URL hex escape.
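Going the other way, from an argument list to the string placed in r->args, is a small encoding step. The hypothetical encode_cgi_args() below joins arguments with + and writes every character other than letters, digits, -, . and _ as a %XX escape. That set of safe characters is an assumption chosen to keep the output conservative. It also means that = and & can never appear unescaped in the result.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Join argv[0..argc-1] into old-style CGI argument form: '+' between
 * arguments, %XX escapes for anything outside [A-Za-z0-9._-].
 * Returns 0 on success, -1 if out (outlen bytes) is too small. */
static int encode_cgi_args(const char *argv[], int argc, char *out, size_t outlen)
{
    static const char safe[] = "-._";
    size_t n = 0;

    if (outlen == 0)
        return -1;
    for (int i = 0; i < argc; i++) {
        if (i > 0) {
            if (n + 1 >= outlen)
                return -1;
            out[n++] = '+';                 /* argument separator */
        }
        for (const char *p = argv[i]; *p; p++) {
            unsigned char ch = (unsigned char)*p;
            if (isalnum(ch) || strchr(safe, ch)) {
                if (n + 1 >= outlen)
                    return -1;
                out[n++] = (char)ch;
            } else {
                if (n + 3 >= outlen)
                    return -1;
                snprintf(out + n, outlen - n, "%%%02X", ch);  /* e.g. ' ' -> %20 */
                n += 3;
            }
        }
    }
    out[n] = '\0';
    return 0;
}
```

For the arguments -w80 and Goodbye World, this produces -w80+Goodbye%20World, the same string the example assigns to r->args.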

Before we launch banner, we should run the cleanup handlers that have been registered with the server's resource pools, including those of the current request. We do so by calling ap_cleanup_for_exec(). Now we call ap_call_exec() to run banner, passing the routine the request record, the pinfo pointer passed to the routine by Apache, the name of the banner program, and the environment array created by ap_create_environment(). We want Apache to pass arguments to banner, so we specify a shellcmd argument of false.

If all goes well, the next line is never reached on Unix platforms. But if for some reason Apache couldn't exec the banner program, we log an error and immediately exit. The return statement at the end of the routine is never reached but is there to keep the C compiler from generating a warning. As noted above, ap_call_exec() behaves differently on Win32 platforms because the function launches a new process rather than overlaying the current one. We handle this difference with conditional compilation: if WIN32 is defined, banner_child() returns the process ID generated by ap_call_exec(). We do this even though it isn't likely that the banner program will ever be ported to Windows platforms!

There's only one thing more to do to make the goodbye_banner_handler() available for use, which is to add it and a symbolic handler name to the hello_handlers[] array. We chose "goodbye-banner-handler" for this purpose. Now, by creating a <Location> section like this one, you can give the handler a whirl:

<Location /goodbye>
  SetHandler goodbye-banner-handler
</Location>

Figure 11.1 shows our handler in action, and this seems to be a good place to say goodbye as well.

Figure 11.1. "goodbye-banner-handler" re-creates a burst page from a circa-1960 line printer.
