Chapter 1. Getting Started

When you connect to the URL of someone's home page—say the notional http://www.butterthlies.com/ we shall meet later on—you send a message across the Internet to the machine at that address. That machine, you hope, is up and running, its Internet connection is working, and it is ready to receive and act on your message.

URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/ comes in three parts:

         <method>://<host>/<absolute path URL (apURL)>
      

So, in our example, <method> is http, meaning that the browser should use HTTP (Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <apURL> is "/", meaning the top directory of the host. Using HTTP/1.1, your browser might send the following request:

GET / HTTP/1.1
Host: www.butterthlies.com

The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message is again in three parts: a method (an HTTP method, not a URL method), which in this case is GET but could equally be PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI) "/"; and the version of the protocol we are using. It is then up to the web server running on that host to make something of this message.
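As a rough illustration of the two three-part splits just described, the following Python sketch takes the example URL apart with the standard library's urllib.parse, builds the request shown above, and then splits its first line. The function names split_url, build_request, and parse_request_line are our own, chosen for illustration; they are not part of Apache or of any HTTP library.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into (method, host, apURL), using this chapter's terms."""
    parts = urlsplit(url)
    # An empty path is treated as the top directory, "/".
    return parts.scheme, parts.hostname, parts.path or "/"


def build_request(host, path):
    """Build a minimal HTTP/1.1 GET request like the one shown above."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def parse_request_line(request):
    """Split the first line of a request into (method, URI, version)."""
    first_line = request.split("\r\n", 1)[0]
    method, uri, version = first_line.split(" ")
    return method, uri, version


method, host, path = split_url("http://www.butterthlies.com/")
request = build_request(host, path)
print(method, host, path)
print(parse_request_line(request))
```

A real server has to cope with malformed request lines and many more headers; this sketch assumes a well-formed request.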

It is worth saying here—and we will say it again—that the whole business of a web server is to translate a URL either into a filename, and then send that file back over the Internet, or into a program name, and then run that program and send its output back. That is the meat of what it does: all the rest is trimming.
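To make the first half of that translation concrete, here is a minimal Python sketch of turning a URI into a filename. It is not how Apache does it internally. The directory /usr/www/site and the default file index.html are assumptions made for the example, and url_to_filename is a hypothetical helper name.

```python
import posixpath
from urllib.parse import unquote

DOCUMENT_ROOT = "/usr/www/site"  # hypothetical directory holding the site's files
INDEX_FILE = "index.html"        # assumed default file for a directory request


def url_to_filename(uri, root=DOCUMENT_ROOT):
    """Translate the URI of a request into a filename under root."""
    # Drop any query string and decode %xx escapes.
    path = unquote(uri.split("?", 1)[0])
    wants_directory = path.endswith("/")
    # Collapse "." and ".." segments. Anchoring the path at "/" means
    # ".." can never climb above the top directory.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    relative = clean.lstrip("/")
    if wants_directory or relative == "":
        relative = posixpath.join(relative, INDEX_FILE)
    return posixpath.join(root, relative)


print(url_to_filename("/"))
print(url_to_filename("/catalog/cards.html"))
print(url_to_filename("/../../etc/passwd"))
```

The last call shows why the normalization step matters: a request that tries to climb out of the site with ".." is still mapped to a name inside the document root.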

The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom, or a humble PC. In either case, it had better be running a web server, a program that listens to the network and accepts and acts on this sort of message.

What do we want a web server to do? It should:

  • Run fast, so it can cope with a lot of inquiries using a minimum of hardware.

  • Be multitasking, so it can deal with more than one inquiry at once.

  • Be multitasking, so that the person running it can maintain the data it hands out without having to shut the service down. Multitasking is hard to arrange within a program: the only way to do it properly is to run the server on a multitasking operating system. In Apache's case, this is some flavor of Unix (or Unix-like system), Win32, or OS/2.

  • Authenticate inquirers: some may be entitled to more services than others. When we come to virtual cash, this feature (see Chapter 13) becomes essential.

  • Respond to errors in the messages it gets with answers that make sense in the context of what is going on. For instance, if a client requests a page that the server cannot find, the server should respond with a "404" error, which the HTTP specification defines as "Not Found": the server has nothing that matches the requested URI.

  • Negotiate a style and language of response with the inquirer. For instance, it should—if the people running the server can rise to the challenge—be able to respond in the language of the inquirer's choice. This ability, of course, can open up your site to a lot more action. And there are parts of the world where a response in the wrong language can be a bad thing. If you were operating in Canada, where the English/French divide arouses bitter feelings, or in Belgium, where the French/Flemish split is as bad, this feature could make or break your business.

  • Offer different formats. On a more technical level, a user might want JPEG image files rather than GIF, or TIFF rather than either of the former. He or she might want text in DVI format rather than PostScript.

  • Run as a proxy server. A proxy server accepts requests for clients, forwards them to the real servers, and then sends the real servers' responses back to the clients. There are two reasons why you might want a proxy server:

    • The proxy might be running on the far side of a firewall (see Chapter 13), giving its users access to the Internet.

    • The proxy might cache popular pages to save reaccessing them.

  • Be secure. The Internet world is like the real world, peopled by a lot of lambs and a few wolves.[*] The wolves like to get into the lambs' folds (of which your computer is one) and, when there, raven and tear in the usual wolfish way. The aim of a good server is to prevent this happening. The subject of security is so important that we will come back to it several times before we are through.

      [*] We generally follow the convention of calling these people the Bad Guys. This avoids debate about "hackers," which, to many people, simply refers to good programmers, but to some means Bad Guys. We discover from the French edition of this book that in France they are Sales Types—dirty fellows.
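The language negotiation mentioned in the list above can be sketched in a few lines of Python. A browser states its preferences in the HTTP Accept-Language header, optionally weighting each language with a quality value (q). The sketch below is a simplified illustration, not Apache's own negotiation algorithm: the names parse_accept_language and choose_language are hypothetical, the "*" wildcard is ignored, and English is assumed as the fallback.

```python
def parse_accept_language(header):
    """Return (language, quality) pairs, most preferred first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        quality = 1.0  # a language with no q value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unreadable weight as "not wanted"
        prefs.append((lang.strip().lower(), quality))
    # The sort is stable, so equal weights keep the order of the header.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_language(header, available, default="en"):
    """Pick the best language the server can offer for this header."""
    for lang, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if lang in available:
            return lang
        # Fall back from a regional variant such as "fr-ca" to "fr".
        base = lang.split("-", 1)[0]
        if base in available:
            return base
    return default


print(choose_language("fr-CA, fr;q=0.8, en;q=0.5", {"en", "fr"}))
```

A Canadian visitor whose browser sends the header above would get the French version of the page, provided the site has one.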

These are services that the developers of Apache think a server should offer. There are people who have other ideas, and, as with all software development, there are lots of features that might be nice—features someone might use one day, or that might, if put into the code, actually make it work better instead of fouling up something else that has, until then, worked fine. Unless developers are careful, good software attracts so many improvements that it eventually rolls over and sinks like a ship caught in an Arctic ice storm.

Some ideas are in progress: in particular, various proposals for Apache 2.0 are being kicked around. The main features Apache 2.0 is supposed to have are multithreading (on platforms that support it), layered I/O, and a rationalized API.

If you have bugs to report or more ideas for development, look at http://www.apache.org/bug_report.html. You can also try news:comp.infosystems.www.servers.unix, where some of the Apache team lurk, along with many other knowledgeable people, and news:comp.infosystems.www.servers.ms-windows.
