What's in a URL

Uniform Resource Locators (URLs), also known as web addresses, provide a convenient way to specify particular web resources. You can navigate to a URL by typing it into your web browser's address bar. Alternatively, if you're browsing a web page and click on a link, that link's destination is specified by a URL.

Consider the URL http://www.example.com:80/res/page1.php?user=bob#account. Visually, it can be broken down like this:

  http://   www.example.com   :80    /res/page1.php?user=bob   #account
  protocol  hostname          port   document path             hash

The URL can indicate the protocol, the host, the port number, the document path, and the hash. However, the host is the only required part; the other parts can be implied.

We can walk through each part of the example URL:

  • http://: The part before the first :// indicates the protocol. In this example, the protocol is http, but it could be a different protocol such as ftp or https. If the protocol is omitted, the application will generally make an assumption. For example, web browsers have traditionally assumed the protocol to be http.
  • www.example.com: This specifies the hostname. It is used to resolve an IP address that the HTTP client can connect to. This hostname must also appear in the HTTP request Host header field. This is required since multiple hostnames can resolve to the same IP address. This part can also be an IP address instead of a name. IPv4 addresses are used directly (http://192.168.50.1/), but IPv6 addresses must be enclosed in square brackets (http://[::1]/).
  • :80: The port number can be specified explicitly by using a colon after the hostname. If the port number is not specified, then the client uses the default port number for the given protocol. The default port number for http is 80, and the default port number for https is 443. Non-standard port numbers are common for testing and development.
  • /res/page1.php?user=bob: This specifies the document path. The HTTP server usually makes a distinction between the part before and after the question mark, but the HTTP client should not assign significance to this. The part after the question mark is often called the query string.
  • #account: This is called the hash. The hash specifies a position within a document, and the hash is not sent to the HTTP server. It instead allows a browser to scroll to a particular part of a document after the entire document is received from the HTTP server.
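To make the roles of these parts concrete, here is a minimal Python sketch that uses the standard library's urllib.parse.urlsplit to pick the URL apart and then shows where each piece ends up in a simple HTTP request. The function name build_request_head and the DEFAULT_PORTS table are illustrative assumptions, and only http and https are handled.

```python
from urllib.parse import urlsplit

# Default ports for the two protocols discussed above (illustrative table).
DEFAULT_PORTS = {"http": 80, "https": 443}


def build_request_head(url):
    """Return (hostname, port, request_head) for a simple GET of url.

    Hypothetical helper for illustration; it raises KeyError for
    protocols other than http and https.
    """
    parts = urlsplit(url)
    default_port = DEFAULT_PORTS[parts.scheme]
    port = parts.port if parts.port is not None else default_port

    # The path and query string go on the request line.
    # The hash (parts.fragment) is deliberately left out: it is never sent.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    # urlsplit strips the brackets from IPv6 literals, so restore them
    # for the Host header, and append the port only when it is non-default.
    host = parts.hostname
    host_header = f"[{host}]" if ":" in host else host
    if port != default_port:
        host_header += f":{port}"

    head = f"GET {target} HTTP/1.1\r\nHost: {host_header}\r\n\r\n"
    return host, port, head


host, port, head = build_request_head(
    "http://www.example.com:80/res/page1.php?user=bob#account")
print(host, port)
print(head)
```

The client connects to the returned hostname and port, while the server only ever sees the request line and the Host header.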

Now that we have a basic understanding of URLs, let's write code to parse them.
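As one possible starting point, the following Python sketch parses a URL by hand, following the rules described above: protocol before the first ://, hash after the #, then hostname, optional port, and document path. The function name parse_url and the keys of the returned dictionary are hypothetical, and the sketch assumes well-formed input rather than validating it.

```python
def parse_url(url):
    """Split a URL into protocol, hostname, port, path, and hash.

    Hypothetical helper: the name and the returned keys are illustrative.
    Omitted parts are filled in with the defaults described above.
    """
    # Protocol: everything before the first "://"; assume http when omitted.
    protocol, sep, rest = url.partition("://")
    if not sep:
        protocol, rest = "http", url

    # Hash: everything after the first "#". It is never sent to the server.
    rest, _, hash_part = rest.partition("#")

    # The host and port end at the first "/" (or "?" if the path is omitted).
    end = len(rest)
    for ch in "/?":
        i = rest.find(ch)
        if i != -1:
            end = min(end, i)
    authority, path = rest[:end], rest[end:]
    if not path.startswith("/"):
        path = "/" + path  # an empty path is treated as "/"

    # IPv6 literals are bracketed, so look for the port after the "]".
    if authority.startswith("["):
        close = authority.index("]")
        hostname = authority[1:close]
        port_text = authority[close + 1:].lstrip(":")
    else:
        hostname, _, port_text = authority.partition(":")

    # Fall back to the protocol's default port when none is given.
    default_port = {"http": 80, "https": 443}.get(protocol)
    port = int(port_text) if port_text else default_port

    return {"protocol": protocol, "hostname": hostname, "port": port,
            "path": path, "hash": hash_part}


print(parse_url("http://www.example.com:80/res/page1.php?user=bob#account"))
```

The query string is kept as part of the path here, matching the point above that the client should not assign significance to the question mark.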
