Chapter 14. Web Serving

Various Web servers and extensions for Linux

This chapter explains how to install and configure Apache-derived Web servers under Linux, covers popular modules, extensions, and key configuration directives, and gives some examples of their use.

With the phenomenal rise in popularity of the World Wide Web (WWW), everyone wants a Web page, be it a small, personal home page where you can pull out your soapbox or a large corporate Web site where a company can distribute information (everything from press releases to hardware driver updates to troubleshooting guides), market themselves, or sell their products online. Linux has proven itself to be a robust, high-performance Web server on everything from a 386 to a DEC Alpha. Still, as with almost any application, there are hardware and software issues to be considered.

A Note on Other Information Services

The Web's popularity has forced many other information services into the background or, in some cases, into obsolescence. Notables include Gopher, WAIS, and Archie. Consequently, we make no effort to cover them in this book and refer the curious to either the Web or any of a number of texts covering Internet information services.

Web Server Software

There are a number of Web servers (HTTP daemons) available for Linux, both commercial and free- or shareware. Additionally, there is the choice of using a secure server versus a non-secure server. The difference between the two is the use of SSL (Secure Sockets Layer, a protocol for encrypting and decrypting data sent via a network socket): secure servers use it; non-secure ones don't. Typically, non-SSL servers are free- or shareware, while those employing SSL are commercially developed and cost several hundred dollars.

In addition, a Web server can be forking or non-forking. Traditionally, HTTP daemons fork off child processes to handle incoming requests. Each child can typically handle a hundred or so requests, but it handles them serially; this means that heavily loaded Web servers may need to have a large number of child processes, each of which takes up additional RAM and CPU. In a server with a large httpd binary (especially a secure server), this can cause problems. The alternative is to have a server that multiplexes incoming connections internally. Typically, this approach has yielded servers with impressive performance.
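To illustrate what "multiplexing connections internally" means, here is a minimal Python sketch in which a single process answers several connections by asking the operating system which sockets are ready. It is not how Boa or any other particular server is implemented; the function name serve_multiplexed and the canned RESPONSE are our own, and a real server would also accept new connections, parse requests, and handle partial reads and writes.

```python
import selectors
import socket

# A fixed reply, standing in for a real page.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\n"


def serve_multiplexed(connections):
    """Answer one request on each connection from a single process."""
    sel = selectors.DefaultSelector()
    for conn in connections:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)

    served = 0
    # Keep going until every connection has been answered and closed.
    while sel.get_map():
        # select() blocks until at least one socket has data waiting,
        # so no child process sits idle per client.
        for key, _events in sel.select():
            conn = key.fileobj
            request = conn.recv(4096)
            if request:
                conn.sendall(RESPONSE)
                served += 1
            sel.unregister(conn)
            conn.close()
    sel.close()
    return served
```

A forking server would instead hand each of these connections to its own child process, which is where the extra RAM per client comes from.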

RAM has been and looks to remain inexpensive, and CPU performance has been increasing by leaps and bounds, so one may be inclined to question whether a non-forking server is worth the effort. However, it has certainly been one of the main aims of Linux to get as much out of a computer's hardware as possible, which obviously means writing efficient software. This said, there are no non-forking servers that equal the extensibility of the best of the forking servers. It seems likely, though, that soon the extensibility of servers like Apache and the multiplexing of servers like Boa will be combined.

Encryption, the Web, and Uncle Sam

Until recently, there were no Web browsers that supported encryption stronger than 40 bits, mainly due to U.S. export restrictions (based on the rather silly idea that no one outside the U.S. could make stronger encryption). With these restrictions gradually loosening and strong (128-bit) encryption now available on Web servers sold inside and outside the U.S., browsers are appearing which support stronger encryption. Within the U.S., it is possible to get enhanced versions of Netscape's browser, which use 128-bit encryption. For more information on encryption and security, see the chapter on security.

Below is a sampling of secure and unsecure httpd servers.

Non-SSL Servers

  • Apache (http://www.apache.org). Currently the most popular Web server, hands down. Loaded with features, it is also fast, free, and extensible.

  • Boa (http://www.boa.org/). A very new, very fast Web server, freely available with source code. Unlike Apache, it does not fork off child processes to handle additional clients; it multiplexes all connections internally.

  • W3C httpd, formerly known as CERN httpd (http://www.w3.org/pub/WWW/Daemon/). The first Web server, ever. No longer in production, it is included for the sake of completeness.

  • NCSA (http://hoohoo.ncsa.uiuc.edu/). The successor to CERN and later the basis for Apache, which has largely supplanted it.

  • Roxen (http://www.roxen.com/). A commercial Web and FTP server. It also uses an extension to HTML called RXML (developed by Roxen) that allows pages to be modified at download time. It is available for free, or you can purchase it, along with commercial support. Source code is available as well. Additionally, an SSL version is in progress and beta versions of it are freely available.

  • WN (http://hopf.math.nwu.edu/). Another freeware Web server. Not as extensible as Apache, but it has some nice features for modifying pages as they are served. (Also available paired with a Gopher server under the name GN. See http://hopf.math.nwu.edu:70/ for more information.)

  • Zeus (http://www.zeus.co.uk). Another non-forking server (like Boa). It is commercial and somewhat pricey, but then you get commercial support. Source code is unavailable.

SSL Servers

  • Stronghold (http://www.c2.net). As its alternate name suggests, this is Apache with SSL built in. It is unlike most commercial software in that it comes with source code, and thus, none of Apache's extensibility is lost. Unlike Apache, Stronghold is not the most popular server in its class; it is the second most popular secure server.

  • Apache-SSL (http://www.algroup.co.uk/Apache-SSL/). This is freeware Apache with SSL support. RSA claims that the SSL libraries it uses are covered by RSA patents, so its legality for commercial use within the U.S. is in question. It is essentially the same as Stronghold as far as the actual code is concerned. Both it and Stronghold support 128-bit encryption and have done so for some time.

  • Roxen (http://www.roxen.com/). Roxen also comes in an SSL flavor.

  • Zeus (http://www.zeus.co.uk). Similar to Roxen, Zeus also comes in a secure version.

Hardware Issues

One of the primary considerations if you are using a forking server or lots of interpreted CGI is memory. Each server child takes up at least as much memory as the size of the executable, and each CGI script requires the loading of its respective interpreter. Dynamic linking of the executables can alleviate the problem to some extent. There are four subsystems that you need to consider beyond the choice of a good Peripheral Component Interconnect (PCI) motherboard: memory, CPU, hard disk, and network. The PCI bus has a throughput of 133 MB/s, which is several times faster than even an ultra SCSI 3 adapter, and is thus well-equipped to handle fast peripherals.

However, there is simply no substitute for more RAM—not swap space (your hard disk is pitifully slow compared to on-board memory), not CPU (if you are swapping to disk, your disk is what's slowing you down), and not a faster network card. RAM is cheap; buy a lot of it.

After RAM, the next most important subsystem is either disk or CPU, depending on the peculiarities of your site. If you make extensive use of CGI, SSI, or database transactions, you will probably want to put a little more CPU in your system. A note here: Don't bother with MMX CPUs. There's really no gain to be had in this situation; just get a faster CPU for the same price, or put the money into RAM or the hard disk.

In a situation where you are simply serving large numbers of static pages, a good, fast disk is probably a good investment. Spend some money to go from a fast or ultra SCSI 2 to ultra SCSI 3. We're assuming here that you have the good sense not to use IDE (or even EIDE) drives in a server.

If you need higher network throughput and are saturating your T1 (i.e., you have no LAN congestion issues), you may want to look into having your server cohosted at an ISP that has T3 access to the Internet backbone, and consider moving from 10 Mb/s Ethernet to 100 Mb/s (fast) Ethernet. A properly configured Linux Web (and/or FTP) server can saturate a fast Ethernet card and thus multiple T3s.
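The arithmetic behind that last claim can be checked with a short Python sketch. The line rates used are the standard nominal figures (1.544 Mb/s for a T1 and 44.736 Mb/s for a T3); the function name links_filled is our own, and protocol overhead is ignored.

```python
T1_MBPS = 1.544    # nominal T1 line rate in Mb/s
T3_MBPS = 44.736   # nominal T3 line rate in Mb/s


def links_filled(source_mbps, link_mbps):
    """Return how many links of link_mbps a source of source_mbps could keep full."""
    return source_mbps / link_mbps


# A saturated fast Ethernet card (100 Mb/s) measured in T3s,
# and a saturated 10 Mb/s Ethernet card measured in T1s.
fast_ethernet_in_t3s = links_filled(100, T3_MBPS)
ethernet_in_t1s = links_filled(10, T1_MBPS)
```

By this rough measure, a saturated fast Ethernet card produces a little more than two T3s' worth of traffic, and even 10 Mb/s Ethernet can fill more than six T1s.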

Apache and Apache-SSL/Stronghold

Because of its popularity, rich feature set, extensibility, and power, we will concentrate on Apache, although much of the information here will apply to any Web server.

As mentioned above, Apache is freely available with source code. Apache also provides an API. These two features allow for a great deal of customization and extensibility. Additionally, Apache is HTTP 1.1-compliant. This latest revision of HTTP introduces many performance enhancements to the protocol.

Getting Started

Since Apache is freely available, a number of Linux distributions are shipped with it, Red Hat included. If Apache came with your distribution, there are three questions to answer before going further:

  1. Where is the document root for the server? In other words, where is the directory that contains the root of the directory tree that is served when someone hits your server? It is the directory from which files are served if no path information, aside from a filename, is given (for example, http://www.ratatosk.org/davinci.html versus http://www.ratatosk.org/artists/surreal/dali.html). This is typically something like /home/httpd/html or /home/httpd/htdocs (so the second URL above would retrieve /home/httpd/html/artists/surreal/dali.html).

  2. Where are the configuration files? Typically, /etc/httpd/conf, /home/httpd/conf/, /usr/local/etc/httpd/conf, or something similar is the directory used for these files.

  3. Is the server running? Depending on how the server was configured when Linux was installed, the HTTP daemon may start automatically at boot time. You can use the ps command to check if httpd is running as follows: ps -waux | grep httpd. If it is not, you need to invoke it as root and tell it where to find the directory containing the configuration files. For example, if your configuration directory is /etc/httpd/conf, you would type something like httpd -d /etc/httpd. If the directory containing your httpd binary is not in your path, you will need to use the absolute path to invoke it. Typically, httpd lives in one of the sbin directories such as /usr/local/sbin, /usr/sbin, or on rare occasions, /sbin.
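To make the document root mapping from the first question concrete, here is a minimal Python sketch of how the path part of a URL could be turned into a filename. The function name url_path_to_file is our own invention, and a real server such as Apache also applies aliases, user directories, and access checks that are ignored here.

```python
import posixpath


def url_path_to_file(document_root, url_path):
    """Map the path part of a URL onto a file beneath document_root."""
    # Resolve "." and ".." segments first, so a request cannot climb
    # above the document root.
    normalized = posixpath.normpath("/" + url_path.lstrip("/"))
    # Drop the leading "/" so join() appends to the root instead of
    # replacing it.
    return posixpath.join(document_root, normalized.lstrip("/"))
```

With a document root of /home/httpd/html, the path /artists/surreal/dali.html maps to /home/httpd/html/artists/surreal/dali.html, matching the example in the first question.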

Aside from the various modules that can be used to extend Apache at compile time, there are many items that can be specified at run-time in the .conf files, httpd.conf and srm.conf. These two files control parameters, such as aliases for directories, aliases for icons, images, CGI scripts/programs, virtual host directives, the document root directory, the file to be served from a directory if no other file is specified in the URL, and various access control and authentication directives.

<Directory> and .htaccess

These two tools are used to modify access of all types for directory trees. <Directory> directives typically reside in the srm.conf or httpd.conf file. They can also occur in the <VirtualHost> directive (more on this later). The .htaccess file resides somewhere in a directory tree of HTML documents. In both instances, the directives apply recursively to all the files and subdirectories in the directory concerned. For example, to enforce basic authorization for a directory and all its contents, you could put the following in a .htaccess file in that directory (say /home/web):

AuthType Basic
AuthUserFile /home/cary/etc/passwd
AuthGroupFile /dev/null
AuthName Realm of the Kazoos
<LIMIT GET POST PUT>
require valid-user
</LIMIT>

or in a <Directory> directive:

<Directory /home/web>
AuthType Basic
AuthUserFile /home/cary/etc/passwd
AuthGroupFile /dev/null
AuthName Realm of the Kazoos
<LIMIT GET POST PUT>
require valid-user
</LIMIT>
</Directory>

The <LIMIT> directive tells the server which HTTP methods require a valid user, in this case, GET, POST, and PUT (PUT is used for file uploading).
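It helps to know what the browser actually sends once the server demands Basic authentication: the user name and password joined by a colon and base64-encoded, which is an encoding and not encryption. The Python sketch below shows both directions; the function names are ours, and the credentials are the well-known example from the HTTP authentication specification, not real ones.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header a browser sends for Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")


def decode_basic_credentials(header_value):
    """Recover (user, password) from a header value such as 'Basic QWxh...'."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    # Anyone who can see the header can do this, with no key required.
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

Because decoding needs no secret, Basic authentication over a non-secure server protects a directory from casual visitors but not from anyone watching the network.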

CGI and SSI

The Common Gateway Interface (CGI) and Server Side Includes (SSI) are the two most common ways to execute external programs. CGI is designed to both receive and send information from and to the server. SSI is more one-way; mainly it sends information. SSI can do more than simply call external programs. It can also include files, echo some simple system information such as the local time, and even execute CGI.
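As a rough picture of what the server does with an SSI echo element, the following Python sketch replaces each <!--#echo var="NAME" --> element in a page with a value. This is a toy, not Apache's parser: it handles only echo, the function name expand_echo is ours, and the "(none)" placeholder for unknown variables is a choice made for this example.

```python
import re

# Matches elements such as <!--#echo var="DATE_LOCAL" -->
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_echo(page, variables):
    """Replace each echo element in page with the named variable's value."""
    def substitute(match):
        # Unknown variables get a visible placeholder instead of an error.
        return variables.get(match.group(1), "(none)")

    return ECHO_DIRECTIVE.sub(substitute, page)
```

The browser never sees the element itself, only the text that replaced it.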

Perl is the preeminent language for CGI programming. This is something of a double-edged sword. On the one hand, large collections of "canned" scripts for various common CGI tasks exist. It also has very powerful features for processing strings, including a large set of regular expression atoms and operators. Since CGI frequently involves large amounts of text processing, this helps to fuel Perl's popularity. On the other hand, Perl is not the easiest language to learn, nor is its syntax particularly easy to read (there are obfuscated Perl contests!).

After Perl, there are several languages that are popular for use in CGI programs: Python, Tcl, C, and Java. Additionally, there are languages that can be embedded in HTML and parsed later by a CGI program; PHP (discussed later) falls into this category. The PHP parser can also be embedded into Apache and servers based on it.

The number of books published on CGI is considerable and they cover the topic much more thoroughly than is suitable for us to do here. One criticism of most of the literature available is that it tends to focus nearly entirely on Perl, ignoring, for the most part, C, Python, and other languages. C, in particular, is very useful since, as a compiled language, CGI written in it is faster and lighter weight. Of course, development times in C are much longer.

At this point in time, all of the languages mentioned have more or less equivalent extensions/modules/libraries for dealing with the peculiarities of CGI and interfacing with everything from TCP/IP sockets to the operating system to a database. As such, the choice of a language for CGI is largely up to you. All of the above are shipped with most Linux distributions and are available for free from their respective authors.
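Whatever the language, a CGI program follows the same pattern: the server passes request details in environment variables such as QUERY_STRING, and the program writes a header block, a blank line, and then the body to standard output. Here is a minimal sketch in Python, one of the languages mentioned above; the name parameter and the function render_response are made up for this example.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def render_response(environ):
    """Build a complete CGI response from the request's environment."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    # Escape user input before echoing it back into HTML.
    body = "<html><body><h1>Hello, {}!</h1></body></html>".format(escape(name))
    # Headers first, then a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(render_response(os.environ))
```

Keeping the logic in a function that takes the environment as an argument makes the script easy to try out from the command line before it ever goes near the server.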

Web Pages for Your Users

In a situation where you have multiple users on your machine, be it at an ISP or at a publisher, some of them will likely be interested in setting up a home page for themselves. This is not something that will typically consume a large amount of resources on your server, unless you have a large number of home pages or have very popular users. Additionally, serving static HTML poses no security concerns beyond those of running the server itself. All that aside, there will almost certainly be users who will want to use CGI, SSI, or some other server-parsed content that will add a dynamic element to their Web pages. Now, you have a security issue.

The various solutions range from simply allowing users full access to tools like SSI and CGI (in a situation where you trust all your users) to some sort of limited SSI/CGI to completely forbidding the use of them. You can also allow or disallow the use of them on a user-by-user basis.

In a small business, or possibly on a corporate LAN, you will likely choose the former route, while at an ISP, where you know few of the users personally, you will likely choose one of the latter two routes.

Fortunately, Apache makes controlling access to these and other tools fairly easy via the .htaccess file and <Directory> directive.

Restricting the Use of CGI and SSI

The <Directory> directive is your friend. It is the way you specify what can and can't be done in which directories. Couple this with Linux user and group permissions, and you can exercise very fine control over who can do what in which areas of your Web space.

In general, unless the server is already so insecure it doesn't matter, or you can trust your users to not inadvertently or purposefully write CGI that compromises your server in some way, you will not want CGI to be executable from arbitrary directories. Of course, if the server is accessible only to you and possibly a few others, you can leave things fairly open and not worry about it.

Additionally, you will likely want to prohibit browsers from roaming through the filesystem, be it a particular user's or the root filesystem. In other words, if a URL points to a directory that does not contain an index file, the server should return a 403 (Forbidden) error instead of displaying the directory's listing.

To make the entire filesystem off-limits to the server, a directive like this would be placed in the access.conf:

<Directory />
 AllowOverride None
 Order deny,allow
 Deny from all
 Options None
</Directory>

"Wait a second! I want to serve Web pages!" Ahh... then you need to tell Apache what directories it can serve pages from. A statement like this:

<Directory /home/*/public_html>
 Order allow,deny
 Allow from all
 Options IncludesNOEXEC
</Directory>

lets you serve pages from your users' public_html directories (this is the default directory from which files are served when a request that ends in ~username is received) and lets them use SSI that doesn't execute external programs or CGI scripts, and:

<Directory /home/httpd/htdocs>
Order deny,allow
Allow from all
AllowOverride All
</Directory>

lets you serve pages from what is commonly the document root for the server and allows all of the restrictions to be overridden. Now you can set up CGI directories in this tree. You can do this with additional <Directory> directives or in an .htaccess file. If you choose the latter route, be sure to make the .htaccess file not writable by the user, unless they can be trusted not to abuse the privilege. However, if you plan to have all CGI pass by you, or perhaps someone else, you may want to set up one directory that is writable by root or httpd only. In any event, to turn on CGI in a directory using .htaccess, assuming the option can be overridden, you need the following:

Options ExecCGI

There are many other options you can turn on and off, including symbolic links, indexing, and multiviews. For details on them, look at the Apache documentation.

Other Useful Modules and Directives

Whole books have been written on Apache, and since this is only one chapter, we're only going to touch on the more interesting and useful features. We aren't going to talk about most of the directives that appear in the configuration files since it's fairly easy to understand their syntax from the usage, and they are explained fairly well in the Apache documentation.

We've already met some of them: .htaccess, <Directory>, Options, AllowOverride, Allow, and Deny, and we've given a few examples of their use. Now we'll discuss the other most common and useful directives.

  • User and Group. These set the UID and GID under which the server will run.

  • ServerRoot. This is the root of the server's directory tree. It is typically the directory where the log file, .conf file, and document root subdirectories are. The locations of these can be overridden on the command line or in the configuration files.

  • DocumentRoot. This is the base of the document tree.

  • UserDir. This is the subdirectory within a user's home directory, which is the root of that user's Web space.

  • ServerType. This specifies how the server is run, either standalone or inetd. Unless you want inetd to launch the server for each request, this should be standalone.

  • Port. This is the TCP/IP port at which the server will listen. Port 80 is the standard; for an HTTP request to be heard on another port, it will have to be specified in the URL like so: http://www.wombat.net:81/. The server must be started as root to use ports 1023 and lower.

  • Listen. If you want the server to listen at an additional port, use this. It takes a port number, optionally preceded by an IP address and a colon.

  • ScriptAlias. Often, for security reasons, you will want to have the directories from which CGI programs can be executed outside of the document tree. Or, you may simply want to provide a shortcut for referencing a directory containing CGI programs. It takes two arguments: the alias and the path to the directory. The path can either be relative to the ServerRoot or an absolute path.

  • Alias. Similar to ScriptAlias, but only non-CGI files can be served from the directory.

  • VirtualHost. If you want requests that come in on a different IP, server name, or port to be served from a different document root, a different server name, or with aliases pointing to different directories, then this is the directive for you. DocumentRoot, Directory, ScriptAlias, Alias, User, Group, UserDir, or ServerName can be used as directives inside it. A couple of examples should help to demonstrate the power of the VirtualHost directive.

A server for http://www.dognails.com on port 80 with its own cgi-bin and document root looks like this:

<VirtualHost www.dognails.com>
ServerName www.dognails.com
DocumentRoot /web/dognails/www
ScriptAlias /cgi-bin/ /web/dognails/cgi-bin/
</VirtualHost>

Serving all requests on port 1200 from a separate DocumentRoot looks like this:

Listen 1200
<VirtualHost *:1200>
DocumentRoot /usr/local/www1200
</VirtualHost>

Extensions for Apache

One of the more complicated extensions with a difficult setup process, suEXEC is a wrapper that allows CGI programs to be executed under a different UID than that under which the server is running. This is often used by ISPs that let their users maintain their own CGIs; this forces the scripts to execute as those users. This obviously strongly encourages the user to create good, secure CGI scripts. It also allows access control similar to that allowed by the password file.

Because suEXEC can open some serious security holes, its security model imposes several additional restrictions. The target program must reside in the Apache Web space and thus, the path to it cannot start with a / or have back references (..). Obviously, the target user and group must be valid and their IDs must be above a minimum (typically 100 or 500), which will prohibit execution as the root user or group. The target program cannot be setuid or setgid, and the final, non-trivial restrictions are that the directory containing the program to be executed and the program itself can only be writable by the target user and they must belong to the target user and target group.
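The restrictions just listed can be read as a checklist. The Python sketch below restates them as code purely for illustration; it is not suEXEC's source, the Target record and the violations function are hypothetical names, and the default minimum ID of 100 is just one of the typical values mentioned above.

```python
import stat
from dataclasses import dataclass


@dataclass
class Target:
    """Facts about a CGI program that a wrapper would look up (hypothetical)."""
    rel_path: str    # path relative to the Web space, e.g. "cgi-bin/form.cgi"
    run_uid: int     # user the program should run as
    run_gid: int     # group the program should run as
    owner_uid: int   # owner of the program and its directory
    owner_gid: int   # group of the program and its directory
    file_mode: int   # permission bits of the program
    dir_mode: int    # permission bits of the containing directory


def violations(t, min_id=100):
    """Return a list of broken rules; an empty list means every check passed."""
    problems = []
    if t.rel_path.startswith("/") or ".." in t.rel_path.split("/"):
        problems.append("path must be relative with no back references")
    if t.run_uid < min_id or t.run_gid < min_id:
        problems.append("target user or group ID is below the minimum")
    if t.file_mode & (stat.S_ISUID | stat.S_ISGID):
        problems.append("program is setuid or setgid")
    # Group- or world-writable means someone besides the owner can change it.
    if (t.file_mode | t.dir_mode) & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("program or directory is writable by someone else")
    if (t.owner_uid, t.owner_gid) != (t.run_uid, t.run_gid):
        problems.append("program is not owned by the target user and group")
    return problems
```

A program at cgi-bin/form.cgi with mode 755, owned by and run as UID and GID 500, passes every check in this sketch; a setuid copy reached through ../ would not.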

suEXEC comes in a separate C source file and accompanying header file. Before compiling suEXEC, the header file needs to be edited to reflect the particulars of your system. The following may need to be changed: HTTPD_USER, typically nobody, or maybe a "Web" or "WWW" user; LOG_EXEC, the log file for suEXEC transactions; DOC_ROOT, the root of the Apache Web space; and SAFE_PATH, the PATH environment variable for suEXEC. Then you need to compile suEXEC (gcc -o suexec suexec.c), chown it to root, set the setuid bit (chmod 4711 suexec), and copy it to its final destination.

Apache must now be recompiled to use the suEXEC wrapper. In the src/httpd.h file, you will need to add or edit a line like this:

#define SUEXEC_BIN "/usr/sbin/suexec"

to reflect where you installed the suid root suexec binary. When you start your new httpd, you should see this message:

Configuring Apache for use with the suexec wrapper.

To disable suEXEC, you can remove the binary, change its ownership from root, or unset the setuid bit.

User and Group directives in VirtualHost directives can be used to tell suEXEC the target user and group for executing CGI. Another way is to name the target user in HTTP requests to user directories. For example, given the URL http://thppt.org/~dweezle/somescript.cgi, somescript.cgi would be executed as the user dweezle. If neither of these conditions applies, the script will be executed as the main server user and group.

PHP

One of the most useful extensions for Apache is the PHP program. PHP is actually usable as CGI, FastCGI, or compiled into Apache (linked either statically or dynamically). Which option to employ is a matter of taste and use. If you plan on using it lightly, employing it as CGI or FastCGI is fine. For more extensive use, you will likely want to compile it into Apache, although this will raise your memory usage noticeably.

PHP is a scripting language with syntax similar to Perl and C. It is embedded within HTML, set off by a variation on the comment tag: <?>. The code within the tag is parsed (either by the server itself or the CGI script, depending on your setup), and the output, if any, replaces the tag. This is very similar to how server-side includes function. Typically, a special extension (.php) is used for PHP files to tell Apache to parse the file before sending it to the client. If you plan to use PHP in most or all of your files, you may want to tell Apache to parse all HTML files.

PHP boasts many functions for connecting to and extracting information from various databases, including mSQL, PostgreSQL, MySQL, Solid, Sybase, and Oracle. See the database chapter for more information on the above database systems. Additionally, it has very powerful support for Netscape cookies, file upload, and the GD library. GD is a C library for PNG creation. PHP also has additional functionality for setting arbitrary HTTP headers (besides that which sets cookies).

SSL (Secure Sockets Layer)

As mentioned above, SSL is a protocol for sending encrypted data via a network socket. This allows for some peace of mind when transmitting sensitive data over the Web. There are two implementations of Apache with SSL: Stronghold and Apache-SSL. The legality of Apache-SSL is somewhat in question for commercial use in the U.S., as RSA claims the SSL libraries it uses are covered by patents owned by RSA. So, for commercial use within the U.S., you will likely want Stronghold. In either case, full source code is available.

Since Stronghold is well-documented and comes with commercial support, we'll go through the setup of Apache-SSL instead.

To use SSL with Apache, you must retrieve two things: SSLeay and the SSL patch for Apache. These will have names like SSLeay-0.8.0.tar.gz and apache_1.2.0+ssl_1.8.tar.gz, respectively. Make sure the patch you get matches the major and minor versions of Apache that you are using. You will also need at least version 2.1 of the patch program.

SSLeay can be obtained at ftp://ftp.psy.uq.oz.au/pub/Crypto/SSL/ and Apache-SSL can be found at ftp://ftp.ox.ac.uk/pub/crypto/SSL.

Unpack SSLeay, cd into the source directory, run ./Configure linux-elf (or linux-aout if you have an older a.out system), make, make test, and finally make install. It should compile with some warnings, but pass the make test.

Unpack Apache-SSL in the root of the Apache distribution (not in the src subdirectory). Apply the patch: patch < SSLpatch. If this step produces an error, it is probably because you have an older version of the patch program. Get a new version from your Linux distribution's Web site or any GNU mirror. Edit src/Configuration as you would normally, and change the SSL-related directives to reflect any peculiarities on your system. If you are using a fresh copy of Apache, remember to add any extras you need for other Apache modules you might be using, such as PHP.

./Configure
make
						

You should now have a shiny new httpsd!

If you get symbol errors, you may need to fiddle with the order of libraries in the link stage of the make, as well as verify you are using the correct library paths.

The httpd.conf

This will likely seem a little bizarre to anyone used to Apache's normal run-time .conf file setup, but httpd.conf is the only .conf file used; srm.conf and access.conf are both empty. An example httpd.conf is included with the SSL patch distribution. There are a few notes on setting up your httpd.conf that we want to give you. These guidelines should work for Stronghold as well, since it is so similar to Apache-SSL.

Edit httpd.conf to reflect your setup and execute httpsd. Try connecting to the SSL server: https://your.host.com/ (note the "s" after the "http"). If you can't connect, check the error_log, the ssl_log, and your .conf file. Fix any problems and restart the server with kill -1 pid (where pid is the server's process ID), assuming it started on the first try, or just try to start it again.

Unless you want to maintain two sets of .conf files, you will want to run both the non-secure and secure servers from the same binary and .conf file. The easiest way to do this is to set up httpd.conf just as you would normally, except that you will need to add the contents of your srm.conf and access.conf files as well, and then use the Listen and VirtualHost directives to run the secure server on a separate port. The default port for HTTPS is 443, so you will probably want to use that. Then, simply move the SSLflag on directive into the VirtualHost directive.

Start Apache-SSL and make sure you can connect to both the non-secure server (http://your.host.com/) and the secure one (https://your.host.com/).

By default, Stronghold and Apache-SSL use version 3 of the SSL protocol. If you require SSL v2, add this line to the base level (it can't be in VirtualHost or any other directive) of the httpd.conf:

SSLProtocol SSLv2

Logging

As you might expect, Apache supports the Common Log Format (CLF). It can also write its logs in the format used by NCSA's server; this is nice if you have homegrown log analysis tools developed for NCSA httpd.
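If you do end up writing your own analysis tools, the Common Log Format is easy to pick apart: each line holds the client host, two identity fields, a bracketed timestamp, the quoted request line, the status code, and the number of bytes sent. The following Python sketch parses one such line; the pattern and the function name parse_clf_line are ours, and it makes no attempt to handle extended or custom log formats.

```python
import re

# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return the fields of one Common Log Format line as a dict, or None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the size field means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry
```

From here, counting hits per page or bytes per host is a matter of looping over the log file and tallying the fields you care about.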

The number of tools for analyzing httpd access logs is bewildering. Many are free, but there are commercial tools out there as well, though few run on Linux. On the other hand, more of the free ones run on Linux than on Mac or Windows machines.

Which one you choose to use is a matter of taste. We will describe a couple below, but you will likely just have to try some and see if you like them.

The log analysis tool we have found most useful is http-analyze. It is free and information on it can be found at http://www.netstore.de/Supply/http-analyze/index.html.

One particular tool deserves special note: 3Dstats. It formats its output in VRML, which you can view with a VRML scene viewer like vrweb. 3Dstats's home page is http://www.netstore.de/Supply/3Dstats/ and vrweb's is http://www.iicm.edu/vrweb.

To investigate other analysis tools for your httpd logs, a Web search on http log analysis will produce more choices than you will have time to try out.

Databases and Web Servers

With the desire for dynamic content in the presentation of large amounts of information, it is natural that databases would enter the Web equation. They are the best and fastest way (not the most space-efficient, though!) to organize and retrieve data. There is support for many free and commercial databases in the popular languages used for CGI, including Python, Perl, and C.

The choice of which database to use is dependent on many factors. Among the most important is easy access via CGI or PHP if you choose to employ one of them. For a list of database servers available for Linux, see the chapter on applications.

Setting Up a Killer Web Server

This section will walk you through setting up a "complete" Web server: Apache with PHP compiled in and integrated support for MySQL, a freeware (for most purposes) database server. While this is not a typical setup, it is an extremely powerful one that is becoming more common.

In this setup, Apache depends on PHP, which in turn depends on MySQL. Consequently, you will need to install MySQL first. It should be noted that PHP does not require any database support at all, but the idea behind this exercise is to build a Web server with very fast CGI-like features tied in with a very fast database server.

MySQL

If possible, when you are anticipating large amounts of database activity, MySQL should be installed on a separate physical disk from the document root of your Web server. The MySQL source is available from the MySQL home page at http://www.tcx.se/, as well as from mirror sites in the U.S. If you prefer, you can retrieve a precompiled binary distribution. There is also an active mailing list for MySQL. Information on joining it can be found on the MySQL home page. Documentation is available there as well.

The most recent versions of MySQL require recent versions of libc. If you don't have a new enough libc and are skittish about upgrading it, just install a binary version.

If you install a binary distribution, it will create a directory named mysql-<version> with various subdirectories for libraries, header files, and data files. If you compile and install from the source, mysql subdirectories will be created in /usr/local/lib and /usr/local/include. The data directories and files will be created in /usr/local/var.

Note the prefix path (/usr/local by default); it will be needed when you compile PHP and, later, Apache.

MySQL uses GNU autoconf. There are a few configuration options to be especially aware of:

--prefix                        Installation directory prefix (/usr/local
                                by default)
--enable-thread-safe-client     Make thread safe client library
--without-debug                 Compile without extra debugging code
--without-server                Only compile client library and programs 
--without-perl                  Don't build and install the Perl interface
--enable-shared                 Build a shared client library

After the configuration finishes, you can run make and then make install. If this is your first time installing MySQL, you will also need to install the grants database. Chapter 12 has more details on installing and setting up MySQL. If you are just upgrading MySQL, you need only stop and restart the server.
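
The build sequence above can be sketched as a short shell helper. This is only an illustration under stated assumptions: the function name mysql_build_steps is made up for this example, the options shown are simply three picked from the list above, and the helper prints the commands instead of running them so that you can review them first.

```shell
#!/bin/sh
# Sketch: print the MySQL source build steps for a given prefix.
# mysql_build_steps is a hypothetical helper name, not part of MySQL.
# The configure options are taken from the list above; adjust to taste.

PREFIX=${PREFIX:-/usr/local}    # remember this path for PHP and Apache

mysql_build_steps() {
    # $1: installation directory prefix
    echo "./configure --prefix=$1 --without-debug --enable-shared"
    echo "make"
    echo "make install"
}

# Print the steps for review; nothing is executed here.
mysql_build_steps "$PREFIX"
```

To run the steps, change to the top of the unpacked MySQL source tree, save the output to a file (for example, mysql_build_steps /usr/local > build.sh), and run it with sh -e build.sh so that the build stops at the first failing command. The make install step normally needs write access to the prefix.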

Okay, this is the paragraph where we tell you that you need to know SQL to start using your new Web server, that this isn't a book on SQL, and that you will need to either search out tutorials on the Web or buy a book on SQL. The MySQL home page has some examples in its online documentation, and, more importantly, a separately maintained manual describes the additional features MySQL has beyond the standard ones. A good SQL tutorial can be found at http://w3.one.net/~jhoffman/sqltut.htm and there are others to be had, as well as numerous books.

PHP

PHP is the next item to compile. You will, however, need to have Apache unpacked since at the end of the compile, the PHP module and libphp will be copied into the Apache source directory. The PHP home page is at http://www.php.net/; very complete documentation, information on the PHP mailing list, the source code, and other related items can be found there.

Since you are building only the library and not the CGI version of PHP, compilation should be easy; there is no linking step, so there are no linking errors. All the potentially sticky linking will come when you compile and link Apache. You will need to know the MySQL install root and the location of the Apache source code. PHP, as you will notice when you run the install script (something of a misnomer since it does not install anything), supports a large number of databases and enhanced file uploading, logging, and access control. Depending on your system's resources and your Web site's style, you may want to enable some or all of these.

Obviously you will want to answer Yes to MySQL support. If you have any of the other supported databases, you may want to enable them as well. PHP also supports the GD graphics library. If you wish to enable this, make sure libgd and its associated header file are in /usr/lib and /usr/include, or /usr/local/lib and /usr/local/include, or provide the directories in the list of additional directories to search for libraries and header files. It is fine to let PHP use the Linux system regex library, and it should find the gdbm library and header that come with Linux as well. Change directories to the src directory and run make.

At the end of the make, directions for editing the Apache configuration file will be printed. You may want to copy them into a file or editor to keep them for the Apache compilation.
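
One way to keep those directions is to capture the output of make in a file while still watching it scroll by. The sketch below uses the standard tee utility; the helper name run_logged and the log file name are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: run a command, show its output, and keep a copy in a log file.
# run_logged is a hypothetical helper name used only for this example.

run_logged() {
    # $1: log file; remaining arguments: the command to run
    log=$1
    shift
    # Merge stderr into stdout so warnings land in the log as well.
    "$@" 2>&1 | tee "$log"
}

# Example (run from the PHP src directory):
#   run_logged php-make.log make
#   tail -n 40 php-make.log    # the Apache directions are printed at the end
```

Because the command runs in a pipeline, the helper returns the exit status of tee rather than that of make, so check the log itself for errors.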

Apache

The Apache configuration and compilation are straightforward. Move into the source directory, edit the file Configuration, and make the changes indicated at the end of the PHP make. The Configuration file is loaded with various modules you can disable or enable by commenting or uncommenting the appropriate lines in the file. In general, the slimmer you can make the server, the better. Comment out what you obviously don't need. If you're not sure, leave the module in. If you anticipate having large numbers of users to authenticate, you will probably want to enable a database-based authentication scheme using (g)dbm, MySQL, or mSQL. A complete list of modules and their descriptions can be found at http://www.apache.org/.

After you have finished editing, run the Configure script and then make. If you have linking errors, 90 percent of the time they can be solved by rearranging the list of libraries in LFLAGS in the Configuration file. Most of the rest of the time it is simply a forgotten library or library path. Persevering through a little trial and error should result in an httpd binary. After testing the server a little, you will likely want to strip it to save on memory requirements.

If you still can't get your server to link, the best forums for help will likely be the PHP and/or MySQL mailing lists, unless you believe the problem to be unrelated to the addition of these packages.

Now check httpd.conf and srm.conf.

Under Linux, Apache is commonly set up as follows:

  • The httpd binary lives in /usr/sbin.

  • The logs are kept in /var/log/http.

  • The configuration files are in /etc/httpd/conf.

  • The document root is /home/httpd/htdocs.

All of these are alterable. Obviously, you can keep the server binary wherever you choose. The configuration files' location is settable on the command line, and the log location and document root are settable in the configuration files.

To tell Apache to parse .php3 files with the embedded PHP parser, add a line like this to your srm.conf:

AddType application/x-httpd-php .php3

If you would rather have every page parsed and keep the .html extension on all of your files, change ".php3" to ".html" in the statement.

Now start up the server. With the above setup, you would enter httpd -f /etc/httpd/conf/httpd.conf -d /home/httpd. Make or edit (if it already exists) an index.html in the document root (or a .php3 file, if you kept the .php3 mapping). Somewhere in the document, add a line like <? phpinfo(); ?>, and then request the page with a browser.
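
Creating the test page can be sketched in shell as follows. The helper name write_phpinfo_page and the file name in the example call are assumptions; the file's extension has to be one that your AddType line tells Apache to parse.

```shell
#!/bin/sh
# Sketch: write a minimal page that calls phpinfo().
# write_phpinfo_page is a hypothetical helper name for this example.

write_phpinfo_page() {
    # $1: document root directory; $2: file name to create
    cat > "$1/$2" <<'EOF'
<html>
<body>
<? phpinfo(); ?>
</body>
</html>
EOF
}

# Example, assuming the document root from the list above:
#   write_phpinfo_page /home/httpd/htdocs info.php3
```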

The phpinfo function should spit out a few screenfuls of CGI environment variables and configuration information. If it doesn't, check to make sure you added the correct line to your srm.conf or httpd.conf to tell Apache to parse .php3 or .html files, and then make sure you have the right extension on your file and mod_php enabled in your Apache source Configuration file.
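
The first of those checks can be automated with a few lines of shell. This sketch assumes the MIME type is application/x-httpd-php and that the directive name is written exactly as AddType; the helper name has_php_mapping is made up for this example.

```shell
#!/bin/sh
# Sketch: test whether a config file maps an extension to the PHP parser.
# has_php_mapping is a hypothetical helper name for this example.

has_php_mapping() {
    # $1: configuration file (srm.conf or httpd.conf); $2: extension, e.g. .php3
    # Commented-out lines are skipped because their first field is not "AddType".
    awk -v ext="$2" '
        $1 == "AddType" && $2 == "application/x-httpd-php" {
            for (i = 3; i <= NF; i++)
                if ($i == ext) found = 1
        }
        END { exit !found }
    ' "$1"
}

# Example, assuming the configuration directory used above:
#   has_php_mapping /etc/httpd/conf/srm.conf .php3 && echo "mapping present"
```

The function succeeds (exit status 0) only when an uncommented AddType line lists the given extension, so it can be used directly in an if statement.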

The PHP documentation describes how to connect to MySQL databases either locally or remotely, as well as the usage of the various functions for performing queries and database and table administration.

There are other goodies you may want to consider adding to your server. If you use lots of Python or Perl CGI, you can embed the interpreters for these into Apache. Respectively, these modules are PyApache and mod_perl. If you wish to enhance security, you can use digest authentication instead of basic authentication. You can also add SSL support, as mentioned earlier in this chapter. Finally, there are modules for powerful URL rewriting using regex pattern-matching and replacement, correcting misspelled URIs, and tracking users through the site using cookies. New modules are being added all the time and not all are shipped with the source. Check the Apache Web site for the most up-to-date list.

Streaming Audio and Video

Streaming is an alternate way to deliver content. Rather than waiting for an entire movie file to download before starting it, the data are displayed as they are sent (more or less; there is some buffering). This, of course, means that the client can start displaying the data much sooner. Additionally, it is then easy to send output streams generated from streaming (live) input rather than from a static file.

With the rising demand for multimedia content on the Web, sooner or later (probably sooner!) you will need to be able to serve streaming audio and video from your Web server. Luckily for Linux users, RealNetworks makes a version of its RealServer for Linux.

The basic version is free and intended for personal and single-site use (i.e., ISPs do not get to use the free version). Installation is as simple as uncompressing the distribution file, running the setup script, and answering some configuration questions.

RealServer will deliver both live and on-demand (stored in a file) streams. The default port for the server is 7070.

The Plus version, which is not free and not yet available for Linux, can be used by a hosting service. Additionally, the Plus version has enhanced performance, a printed manual, GUI, and, as you might expect, technical and upgrade support.

The default installation directory is /usr/local/pnserver. After the installation script completes, change to the install directory and start the server:

# bin/pnserver server.cfg

You will be prompted to register the server. You can skip this step if you wish without any effect on the functionality or performance of the server. Point your Web browser at the machine you installed the server on, port 7070. You will be prompted to enter the username and password you provided during setup. This will bring up a page with links to some samples, the server status page, and links to some areas on RealNetworks' Web site.

Test the server by trying some of the sample files to be sure it's running. RealMedia files can be placed in /usr/local/pnserver/content and accessed via URLs like http://www.foo.net/ramgen/audio.rm. You can also set up aliases or subdirectories for human users to place their own files in.
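
The mapping from content files to URLs can be sketched in shell. The helper name list_ramgen_urls is an assumption for illustration, www.foo.net is the placeholder host used above, and only .rm files at the top level of the content directory are listed.

```shell
#!/bin/sh
# Sketch: print the ramgen URL for each .rm file in the content directory.
# list_ramgen_urls is a hypothetical helper; www.foo.net is the placeholder
# host used in the text.

list_ramgen_urls() {
    # $1: host name; $2: content directory
    for f in "$2"/*.rm; do
        [ -e "$f" ] || continue      # glob matched nothing
        echo "http://$1/ramgen/${f##*/}"
    done
}

# Example:
#   list_ramgen_urls www.foo.net /usr/local/pnserver/content
```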

Generating Content Files

Various encoders exist for creating RealAudio and RealVideo content. Unfortunately, only an audio encoder exists for Linux. To encode video or audio/video, you will need an MS Win32 machine or a Macintosh.

The media encoder, rmenc, for Linux is free as well and can be used to convert .wav and .au files into RealMedia files. Additionally, if you have the OSS sound drivers, it can listen to your soundcard's output and record that.

If you wish to record individual tracks from your CD-ROM, you can use cdda2wav to make .wav files and convert them using rmenc.

Summary

This chapter scratched the surface of what has become an incredibly huge and complicated subject in just a few short years. We discussed two types of servers, secure and insecure, as well as two methods of implementing them, the common forking server and the newer and less common multiplexing server.

Because of its popularity, extensibility, and power, we went into detail about Apache and its two secure derivatives: Apache-SSL, a freeware secure server of questionable legality in the U.S.; and Stronghold, a commercially supported and legal secure server available in and outside the U.S.

We also discussed various methods for interacting with the system the server is running on: CGI and SSI, as well as PHP, a scripting language designed to be embedded into HTML and then parsed by the PHP interpreter running as CGI or FastCGI or compiled into Apache.

Setup and configuration of a server for streaming audio and video were examined. Additionally, tools for creating content files were also discussed.

Lastly, we went through the compilation and setup of a non-secure Apache server with PHP compiled into it. In turn, PHP was configured with MySQL support so it could access MySQL databases locally or remotely on other machines.
