Unit 1.1. Internet and Web Basics

The Internet

Ask the next person who says Internet what they mean by the term, and you may get a surprisingly vague reply. Simply put, the Internet is a decentralized, worldwide network of interconnected computers that communicate with each other in a standardized way. The computers are connected using modems and phone lines, cable lines, ISDN lines, Ethernet cards, and fiber-optic cables. To provide convenient wireless communication, infrared and radio waves are being used increasingly as well.

Physical connectivity is not enough to define the Internet. Communication would not be possible if these interconnected computers did not use a shared protocol for transmitting information. A protocol is a set of common standards and software for communication. You can think of it as a common language utilized for communication between all of the different types of computers and operating systems across the Internet. If an English-speaking person in London calls a French-speaking person in Paris, the telephone network connects the two parties, but that does not guarantee that they understand each other. To send information such as an email from one computer to another, both computers, as well as any intermediate computers the message must travel through, must be able to read instructions about the type of message, its source and destination, and how it should be handled, so that it can be successfully transmitted to and read by the intended recipient.

The protocol of the Internet is TCP/IP, which stands for Transmission Control Protocol/Internet Protocol. TCP/IP actually consists of two protocols working together, TCP and IP, though they are commonly referred to as a single protocol, TCP/IP. Messages sent according to this protocol contain a header, a block of control information that travels in front of the data being sent and is separate from it. This header provides instructions for transmitting the data. In ground mail, letters are enclosed in envelopes that contain addresses in a standard format, as well as conventional phrases for special handling such as “Air Mail,” “Poste Restante,” or in less fortunate cases, “Return to Sender.” Similarly, when information is sent across the Internet, it is sent with a TCP/IP “envelope” that indicates who originated the transmission, who should receive it, and other instructions to ensure the information is sent and received correctly.

To make it possible to find a machine on the Internet, each Internet location has been assigned its own address, called an IP address, or Internet Protocol address. The IP address is used in TCP/IP when specifying the source and destination for the transmission. An IP address is a 32-bit number. Rather than being written as a cumbersome string of 32 ones and zeros, an IP address is expressed as four decimal numbers separated by periods. An example of an IP address is 192.239.92.40. Each number can range from 0 to 255, which gives 256 possibilities (2 to the power of 8) and therefore takes 8 bits to store, so each number can be referred to as an “octet.”
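The relationship between the four-number notation and the underlying 32-bit number can be illustrated with a short Python sketch. The function names here are made up for this example; only the sample address 192.239.92.40 comes from the text above.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Combine the four octets of a dotted address into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid dotted IP address: {address!r}")
    value = 0
    for octet in octets:
        # Shift the bits collected so far left by 8 and append the next octet.
        value = (value << 8) | octet
    return value


def int_to_binary(value: int) -> str:
    """Render the integer as the 32 ones and zeros it really is."""
    return format(value, "032b")


number = dotted_to_int("192.239.92.40")
print(int_to_binary(number))
# Cross-check against the standard library's own parser.
print(number == int(ipaddress.IPv4Address("192.239.92.40")))
```

Each octet contributes exactly 8 of the 32 bits, which is why the dotted form is so much easier to read than the binary string it stands for.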

Sometimes information being transmitted via TCP/IP has its own internal protocol as well. For example, Web pages are transmitted via TCP/IP, but because they are Web pages, they are also sent with an additional header that is compliant with the protocol HTTP, or Hypertext Transfer Protocol. This protocol, HTTP, is a familiar beginning, or prefix, for Web page addresses.

What is the Web?

The World Wide Web, the Web, or the W3, was started by scientists working at CERN, the European Organization for Nuclear Research, near Geneva, Switzerland. They needed a convenient way to collaborate and share technical information with their colleagues. By publishing documents on the Internet, they could access colleagues’ technical papers and related information easily. Tim Berners-Lee is credited with being the "Father of the Web" for his vision on how to accomplish this. Standards for the Web are currently maintained by the W3C, or World Wide Web Consortium, an international organization. Its Web site is http://www.w3.org.

As explained earlier in this chapter, there is a separate protocol, HTTP, designed specifically for handling Web pages. Web pages themselves are constructed primarily in HTML. HTML, or Hypertext Markup Language, started as a way to add formatting commands, or “markup” in publishing terms, to a technical paper. The “hypertext” part of the name refers to text that can be selected to provide a link to another document. This makes information retrieval easy, and it is one of the hallmarks of the Web; for example, a technical paper that refers to previously published results can include a link to that previous paper, and the user can click back and forth between documents. HTML is covered in further detail in Chapter 5, “Introduction to HTML: Basic Tags, Tables, and Frames,” and Chapter 6, “Advanced HTML: Forms, Nested Tables, and Nested Frames.”

For a preview of what HTML looks like, go to a Web page, right-click in the browser window, and choose “View Source.” This text is what your browser is reading and displaying in the browser window.



Usage of the Web quickly spread beyond the scientific community. With the advent of browsers such as Netscape and Internet Explorer, many users could easily publish and access material on the Internet. The Web was born from a need to exchange ideas freely, and that spirit is a large part of the success of the Web.

How Do Web Browsers Work?

The Web browser’s basic role is to take a request for a Web page from a user, and then display that page to the user. Displaying the page is referred to as “rendering” it.

Typing the address of a Web page into a browser to retrieve a Web page is an increasingly familiar activity. What happens behind the scenes?

When a user requests a Web page, the browser locates the Web page on the Internet, retrieves it, and displays it in the browser’s window.

Someone who is enthused about U.S. tax returns might enter the following address:

http://www.irs.gov/search/srchelp.html

This address is also known as a URL. A URL, or Uniform Resource Locator, is the complete address for locating a specific page, or “resource,” on the Internet. Each Web page has a unique URL, and to view a Web page, you must know its URL. You may also see the term URI, or Uniform Resource Identifier. Strictly speaking, a URL is one kind of URI, but in everyday usage the two terms are treated as interchangeable.

The first step for the browser is to identify who has the file being requested by the user, using the URL. The browser divides a URL such as the one displayed above into three parts. The first part of the URL is the protocol. In this case, the protocol is “http://”, which indicates that the file should be retrieved using HTTP, or Hypertext Transfer Protocol. Another protocol you may see often, particularly if you conduct any business on the Web, is HTTPS. This indicates that HTTP is layered on top of Secure Sockets Layer, or SSL (or its successor, TLS), which encrypts the exchange for additional security. Whether the URL specifies HTTP or HTTPS, communication still takes place using TCP/IP. TCP/IP governs the connection and transmission of the information, whereas HTTP is concerned with the Web page itself. Because the protocol is identified as HTTP in the URL, the browser knows how to structure its request for the Web page.

The second part of the URL is the domain. In this example, it is “www.irs.gov”. The domain identifies a specific server machine on the Internet. A domain ends with characters that give a general indication of the site’s purpose, such as .com, .org, .net, .gov, .mil, and .edu, as well as many country suffixes such as .uk for United Kingdom, .de for Germany (Deutschland), and .fi for Finland. The middle portion of the domain provides specific information on the site being accessed. In “www.irs.gov”, it is “irs”, which happens to be the United States Internal Revenue Service, or IRS. For name recognition, most organizations use their own name, or a familiar abbreviation of it, for this part of a URL, unless the preferred URL is already claimed by another organization.

The leftmost part of the domain is the actual machine name, or hostname. Many organizations use “www” as a convention, instead of a specific machine on their network. When a request arrives for a page on the “www” machine, the company’s network has configuration files that redirect such requests to an appropriate machine. Using “www” instead of a specific machine name makes it easier for a company to move pages from one machine to another, or to use multiple machines to handle requests, because the user is not specifying a machine name.

As discussed earlier in this chapter, machines on the Internet are identified by an IP address: four numbers separated by periods, such as 192.239.92.40. Yet the domain listed above, “www.irs.gov”, is alphanumeric. Which is the correct syntax to identify a machine on the Internet? In fact, both methods of identifying a machine on the Internet are correct. One advantage of using a name like “www.irs.gov” is that it is easier to remember than a numeric IP address, to the relief of U.S. taxpayers on April 14. Another advantage is that the systems administrator can associate an alphanumeric domain with another IP address if need be. If you use the IP address for a site that has been moved to a new location on the Internet, you may have problems getting to the site.

If you know the IP address of the site you wish to visit, then you can use that IP address in place of the domain name, and a browser can understand the request. In fact, you eliminate an extra step for the browser. When an alphanumeric domain is used, the browser has to translate that domain name into a numeric IP address before it can request the Web page. To resolve domains, the browser contacts a specific type of server known as a Domain Name System server, or DNS server. The browser may need to contact several DNS servers to translate the domain into an IP address. Companies may also have an internal DNS server, which must be contacted to provide further translation of domains within a company. Eventually, the browser resolves the domain name into a valid IP address, or returns an error.
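The translation from a name to a numeric address can be tried out with Python's standard socket module. This is only a sketch: the function name resolve is invented for this example, and the code asks the operating system's resolver to do the lookup (which in turn consults DNS servers as needed) rather than contacting a DNS server directly.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses the system resolver finds for hostname.

    Raises socket.gaierror if the name cannot be resolved, which corresponds
    to the error case described in the text.
    """
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is an (address, port) pair for IPv4
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# A numeric address needs no translation, so it comes back unchanged.
print(resolve("127.0.0.1"))
```

On a machine with network access, calling resolve("www.irs.gov") would return whatever addresses are currently registered for that name; the result is not shown here because it can change over time.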

The third part of the URL, to the right of the domain, is optional. It may be a file name, perhaps with a path. This part refers to the specific resource being requested. In this example, it is “/search/srchelp.html”. The user is requesting the resource called “srchelp.html”, which is located in a directory called “search.” If the URL does not contain this third part, then a default page is displayed. Most Web sites have an initial page with an address that contains only the domain name, for ease in remembering the URL.
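The three-part division described above can be reproduced with Python's standard urllib.parse module. This is a minimal sketch; the function name split_url and the dictionary keys are chosen for this example, and substituting "/" for a missing path is an assumption that mirrors the "default page" behavior.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Divide a URL into the three parts discussed in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # e.g. "http"
        "domain": parts.hostname,    # e.g. "www.irs.gov"
        # An absent third part means the site's default page is wanted.
        "path": parts.path or "/",
    }


print(split_url("http://www.irs.gov/search/srchelp.html"))
print(split_url("http://www.irs.gov"))
```

The second call shows the case where the optional third part is missing and only the domain is given.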

Once the browser reads the URL and knows the protocol to use, the IP address of the machine to contact, and the specific path and file being requested, if any, it initiates a request. It connects to the IP address using a specific point of connection on the target server called a port. Ports are numbered, and by convention, the overwhelming majority of systems administrators configure their machines to use port 80 as the default for HTTP Web documents. For this reason, the port number is not usually included in a URL, but the port can be included after the domain if needed. The syntax is a colon followed by the port number. The IRS example could also have been written:

http://www.irs.gov:80/search/srchelp.html 

to retrieve the same Web page as:

http://www.irs.gov/search/srchelp.html 
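The way a missing port falls back to the conventional default can be sketched in Python as follows. The function name effective_port and the DEFAULT_PORTS table are assumptions made for this example; the text above only mentions port 80 for HTTP, and the entry for HTTPS (443) is added here for completeness.

```python
from urllib.parse import urlsplit

# Conventional default ports. Only the HTTP entry comes from the text.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a browser would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit ":port" in the URL wins
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.irs.gov:80/search/srchelp.html"))
print(effective_port("http://www.irs.gov/search/srchelp.html"))
```

Both calls yield the same port number, which is why the two forms of the URL retrieve the same page.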

Through the connection it has established, the browser sends a request, called a GET request, for the Web page. The server processes the request by looking for the directory and file requested. Sometimes the request is for a page that exists on the server already; this type of page is a static page, and the server can transmit the page to the browser that requested it. In other cases, the page has to be generated. The URL invokes scripts that produce the page dynamically, and then transmit the page. This is how you create the application in this book—using PL/SQL to dynamically generate Web pages that contain database content.

If the request is successful, the server returns the requested page to the browser. If it cannot find the resource being requested, it returns an error message to the browser.
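To make the exchange more concrete, the following Python sketch builds the text of a simple GET request and reads the first line of a server's reply. It is an illustration under simplifying assumptions, not a full HTTP client: the function names are invented, query strings and non-default ports are ignored, and the sample reply in the usage lines is made up.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.1",     # which resource is wanted
        f"Host: {parts.hostname}",  # which site the request is meant for
        "Connection: close",
        "",                         # a blank line ends the request header
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response_head: str) -> tuple[int, str]:
    """Extract the numeric status code and reason from the reply's first line."""
    status_line = response_head.split("\r\n", 1)[0]
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


print(build_get_request("http://www.irs.gov/search/srchelp.html"))
# A made-up reply for a resource the server could not find.
print(parse_status_line("HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n"))
```

A status code of 200 indicates success, while 404 is the familiar "not found" error that the server returns when the requested resource does not exist.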

Although a user apparently requests a single file to view a Web page, that page may in turn spark additional requests for files to complete the Web page. When rendering a Web page, the browser may find that the page contains several images, which need to be retrieved in turn. Images on a Web page are themselves separate files, as are audio and video files. If you request a Web page with six separate images, the browser will need to retrieve seven documents to display the full page. Thankfully, the browser figures out which additional files are needed, requests each of them, and assembles them for you, sparing you what would otherwise be a tedious task.
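How a browser discovers the extra files can be sketched with Python's standard html.parser module. This example looks only for image references; the class and function names, and the sample page in the usage lines, are invented for illustration, and a real browser also follows references to many other kinds of files.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_sources(html: str) -> list[str]:
    """Return the image files a page refers to, in document order."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# A made-up page containing six images.
page = "<html><body>" + "".join(
    f'<img src="photo{i}.gif">' for i in range(1, 7)
) + "</body></html>"
images = image_sources(page)
print(images)
print("files to retrieve:", 1 + len(images))  # the page itself plus its images
```

The final count matches the arithmetic in the text: one HTML document plus six images makes seven files.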

This transparent operation allows users to think of visiting a single Web page, even though that one page may be a combination of several files, or the result of a script that created the page upon request.

Unit 1.1 Exercises

a) What is the protocol of the Internet?
b) What is the main protocol of the Web?
c) Will the two addresses below retrieve different pages? Why or why not?
http://www.bbc.co.uk:80 
http://www.bbc.co.uk 

Unit 1.1 Exercise Answers

a) What is the protocol of the Internet?
Answer: TCP/IP, which contains instructions for how to handle a message being transmitted from one location to another across the Internet.
b) What is the main protocol of the Web?
Answer: HTTP. If a message contained within a TCP/IP transmission has an HTTP header, this indicates that the message happens to be a Web transmission; for example, a Web page or a request for a Web page.
c) Will the two addresses below retrieve different pages? Why or why not?
http://www.bbc.co.uk:80
http://www.bbc.co.uk

Answer: They retrieve the same page. The default port for HTTP is 80, so including the port number 80 in the URL is not necessary.