URIs and URLs

As shown in Figure 16.3, the Internet is a gigantic mass of client systems requesting resources and server systems providing resources. If you look closer at the process, though, you’ll realize that the protocol addressing rules discussed earlier in this book are not enough to support the rich array of services available on the Internet. The IP address or domain name can locate a host. The port number can point to a service running on the host. But what is the client requesting? What is the server supposed to do? Is there input for which the client is requesting output?

Experts have long understood the importance of providing a standard format for requesting Internet resources. Some have argued, in fact, that the presence of a unified request format is another reason why the Internet seems like a single, cohesive entity rather than just a jumble of computers.

The request format most familiar to Internet users is what is commonly called a Uniform Resource Locator (URL). The URL is best known for the classic web address format: http://www.mercurial.org. URLs are so common now that they appear with little or no explanation on TV commercials and bubble gum wrappers. What we think of as a URL is actually a special case of a more general format known as a Uniform Resource Identifier (URI). The two acronyms are sometimes used interchangeably, but the distinction is important. Recent Internet documents have attempted to converge the terms. RFC 3986, “Uniform Resource Identifier (URI): Generic Syntax,” states that future documents should use the more general term URI instead of URL. The term Identifier is better than Locator for the general case because not every request actually points to a location.

The specification for the structure of a URI is over 60 pages, but the basic format is as follows:

scheme://authority/path?query#fragment
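To make the five components concrete, the following minimal sketch splits a sample URI with Python's standard urllib.parse module. The host and path are borrowed from the examples in this hour; the query (page=2) and fragment (care) are hypothetical additions, included only so that every component has a value.

```python
from urllib.parse import urlsplit

# Host and path come from this hour's examples; the query and fragment
# are hypothetical, added so that all five components are present.
uri = "http://www.bonzai.com/trees/LittleTrees.pdf?page=2#care"

parts = urlsplit(uri)

print("scheme:   ", parts.scheme)    # http
print("authority:", parts.netloc)    # www.bonzai.com
print("path:     ", parts.path)      # /trees/LittleTrees.pdf
print("query:    ", parts.query)     # page=2
print("fragment: ", parts.fragment)  # care
```

Note that urllib.parse calls the authority component netloc (network location); the double slash, question mark, and pound sign are delimiters and are not included in the component values.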

The scheme identifies a system for interpreting the request. The scheme field is often associated with a protocol. Table 16.1 shows some of the schemes used on the Internet today. The classic http scheme is used with web addresses. Although alternative schemes such as gopher are less important than they once were, others, such as ftp, are still in common usage.

Table 16.1. URI Schemes
Scheme    Description                              Reference
file      A file on the host system                RFC 1738
ftp       File Transfer Protocol                   RFC 1738
gopher    The Gopher protocol                      RFC 4266
http      Hypertext Transfer Protocol              RFC 2616
https     Hypertext Transfer Protocol Secure       RFC 2818
im        Instant Messaging                        RFC 3860
ldap      Lightweight Directory Access Protocol    RFC 4516
mailto    Electronic mail address                  RFC 2368
nfs       Network File System protocol             RFC 2224
pop       Post Office Protocol v3                  RFC 2384
telnet    Telnet interactive session               RFC 4248

The authority, which begins with a double slash (//), defines the user, host, and port associated with the request. A full expression of the authority component might look like:


As you learned in Hour 6, a default port number is often associated with the protocol, so the port number is typically omitted. The username is only necessary if the user must provide credentials to access the resource, which is uncommon for the web but more common with a protocol like FTP.
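The way a client might pull the user, host, and port out of the authority, and fall back to a default port when none is given, can be sketched as follows. This is only an illustration: the username bob, the host ftp.bonzai.com, and port 2121 are hypothetical, and the helper describe_authority and its small table of defaults are assumptions made for this example rather than part of any standard library.

```python
from urllib.parse import urlsplit

# Well-known default ports for a few schemes (illustrative subset).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def describe_authority(uri):
    """Return (user, host, port), using the scheme's default port if none is given."""
    parts = urlsplit(uri)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.username, parts.hostname, port

# Hypothetical FTP URI with an explicit user and a nonstandard port.
print(describe_authority("ftp://bob@ftp.bonzai.com:2121/pub/trees.txt"))
# ('bob', 'ftp.bonzai.com', 2121)

# Typical web address: no user, and the port defaults to 80.
print(describe_authority("http://www.bonzai.com"))
# (None, 'www.bonzai.com', 80)
```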

By the Way

Even if the user is required to provide credentials, you still might not need to specify a user in the URI. Many services prompt for a user ID and password after the initial request.


Without the user and the port, the authority field looks more like the basic web address we all appreciate:

//www.bonzai.com

or coupled with the scheme component:

http://www.bonzai.com

In this example, the host is expressed as a DNS domain name, but you can also refer to a host by its IP address.

The path component points down through a hierarchy of directories to a file that is the subject of the request. In the case of http, if the path is omitted, the request points to a default web page for the domain (the home page). Most users by now are familiar with the need to type in additional directory names and filenames after the domain name:

http://www.bonzai.com/trees/LittleTrees.pdf

The query and fragment components of the URI are rarely typed or interpreted by humans. The precise meaning of these components can vary depending on the scheme, and some schemes don’t even support the query and fragment components. The easiest way to observe the query field in the wild is to type a search request into a search engine like Google and then examine the URI that appears in the address bar.
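As a rough illustration of what a search request's query might contain, the sketch below decodes a made-up search URI. The host www.example.com and the parameter names q and page are assumptions chosen for the example; real search engines define their own parameter names.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical search URI; the host and parameter names are illustrative only.
uri = "https://www.example.com/search?q=bonsai+trees&page=2#results"

parts = urlsplit(uri)

# parse_qs turns the query string into a dictionary of name -> list of values
# and decodes escapes such as "+" (a space) along the way.
params = parse_qs(parts.query)

print(params)          # {'q': ['bonsai trees'], 'page': ['2']}
print(parts.fragment)  # results
```

Each value is a list because a name may legally appear more than once in a query. For http, the fragment is interpreted by the browser, typically to jump to a location within the page, and is not sent to the server.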

The preceding example considers the URI in the context of the hugely popular HTTP used on the World Wide Web. (You’ll learn more about HTTP and its companion markup language HTML in Hour 17.) Keep in mind, though, that each of the different scheme specifications can define how to interpret the information in the URI. The generic URI specification is intentionally kept separate from the details defined in the specifications for each of the schemes so that the schemes can evolve without requiring a change to the basic format. Table 16.1 also lists the RFCs associated with each scheme.
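To see how the same generic format accommodates different schemes, the following sketch splits three URIs using schemes from Table 16.1. The mailbox webmaster@bonzai.com and the file path /etc/hosts are hypothetical values chosen for illustration.

```python
from urllib.parse import urlsplit

# One URI per scheme; the mailto address and file path are hypothetical.
uris = [
    "http://www.bonzai.com/trees/LittleTrees.pdf",
    "mailto:webmaster@bonzai.com",
    "file:///etc/hosts",
]

results = []
for uri in uris:
    parts = urlsplit(uri)
    results.append((parts.scheme, parts.netloc, parts.path))
    print(f"{parts.scheme:7} authority={parts.netloc!r:18} path={parts.path!r}")
```

The mailto URI has no authority at all (there is no double slash), so the whole address lands in the path, and the file URI has an empty authority followed by a local path. The generic parser splits the string the same way in every case; what the pieces mean is left to each scheme's own specification.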
