Origins of the Internet

An extended description of the history and operation of the Internet would take more space than is available. If you've read this far, you know the general technological principles. This section aims only to describe those elements of the Internet that are unique or unusual by comparison to basic network operations.

From DARPA to ARPA

The Internet was created by the Defense Advanced Research Projects Agency (DARPA) all the way back in 1969. It was among the first implementations of a packet-switched network, and DARPA funded it both for basic research purposes and as a way of connecting universities and government laboratories that were doing other work it was paying for. The connection to universities was important because the presence of the network on campus got a lot of bright young people involved in network R&D. You will recall that it was a team at Berkeley that integrated the Internet's TCP/IP protocol into Unix, thus creating an extraordinarily productive relationship. Though DARPA continued to provide funding for many years, a great deal of the development of the Internet was provided free by creative university faculty and students at such places as Stanford, Berkeley, MIT, and Carnegie Mellon. Later, most of the major research universities in the U.S. were involved in one way or another in extending and/or scaling the Internet.

Early Uses

The Internet grew as it did because TCP/IP was cheap and effective, and because it integrated smoothly with Unix. The latter point is important—every Unix machine, off-the-shelf, could function as a router. So when a college or university wanted to connect to the Internet, all it had to do was run a line (i.e., a leased analog telephone line or maybe just a dial-up connection) to the nearest university or institution that was already connected to the Internet backbone. Of course the backbone-connected institution had to agree, but acquiescence wasn't hard to get. Part of the reason is the culture of universities in general and scientists in particular; cooperative work and resource sharing are a part of their way of life (and nationalism isn't—the Internet swiftly expanded to include universities and laboratories around the world). Another reason is that costs to the backbone-connected institution weren't high, at least not in those early days. The feds were paying for the backbone, so there weren't additional costs there. Nor were local costs a big deal in the earliest days. A server functioning as a router and linking another server to the Internet typically did only some simple stuff like forwarding e-mail and brokering occasional file transfers and remote terminal connections. For these chores, adding a little memory was usually enough that the server's owners never noticed the difference.

Internet Scale-up

So most of the cost of expanding the Internet went to those who were joining. This led to an interesting dynamic. If the original interest in the Internet came from the physics department at Solid State U, it would be their budget that would bear the entire cost of the line to the connected institution. Obviously, it was in the interests of the physicists to find some other departments that wanted to connect as well. That way, costs could be shared. So the electrical engineers and the computer scientists signed up. Soon, everybody wanted a faster line. One way to pay for one was to get their friends at the state college down the line to join them. The state college folks were asked to pay for their line to Solid State and a share of the cost of Solid State's line to the institution that was in turn connected to the Internet backbone. Solid State thus became an Internet Service Provider. Internet growth was facilitated as informal arrangements grew into formal consortia; for example, Solid State and its partners could form an organization and share the costs of getting their own direct backbone connection. The National Science Foundation (NSF) accelerated all this with programs that provided grants to connect smaller colleges and universities.

The Internet scaled well both politically and technically. As traffic increased, the software architecture allowed vendors to design special purpose computers that did nothing but handle the movement of packets. These machines, which became what we now call routers, relieved the local servers of network responsibilities. Because routers could deal with multiple alternate paths through the network, Internet users could build multiple links to the backbone, so Solid State would lease a second line and connect to the backbone through a different point. This made it possible to balance the traffic load (e.g., splitting eastbound and westbound flows) and also provided for redundancy in case there were problems with lines or routers. Routing did become more complex, as we've discussed earlier, but problems were dealt with by standards organizations (IETF) in which key vendors like Cisco (an early and eventually dominant maker of commercial routers) were active participants.

Tech Talk

IETF: The Internet Engineering Task Force (IETF) is responsible for setting many Internet-related standards, for example, the new version of the Internet Protocol, IPv6.


The NSF, which took over management of the backbone from the Department of Defense, over time spun it off to the long-distance carriers from whom the lines were leased. The Internet backbone today is provided predominantly by WorldCom (including the former UUnet and CompuServe networks), Sprint, GTE, and Cable and Wireless (most of the former MCI system).

The Internet was principally a higher education/federal government service until the late 1980s. Large corporations at that time used private networks, based usually on IBM or DEC software, that provided e-mail and remote terminal connections. There was a growing cadre of individuals and small organizations that used proprietary carriers like CompuServe and America Online (AOL) for e-mail and information services. The explosion of the installed base of PCs and LANs drove increased demand for wide area connectivity. Services like CompuServe, which had per-minute connect time charges that could quickly add up to big dollars, were too expensive for most organizations. The Internet, on the other hand, was essentially free. We need a short digression to explain why this is so.

The Internet vs. Proprietary Networks

Proprietary networks like those pioneered by CompuServe (now a division of AOL) were built on a star topology. Before its remaking as an Internet Service Provider (ISP), CompuServe's only computers were in Columbus, Ohio. The company provided national access by leasing long-distance lines that connected Columbus with local telephone company exchanges. At each exchange, CompuServe had dedicated local telephone numbers that connected to these leased lines. In effect, this was a private long-distance telephone system. There are economies of scale in this—CompuServe offered connections that were cheaper to the user than either of the other options: a long-distance call billed directly to the customer or access via a toll-free number. Also, national usage was more attractive since CompuServe averaged the network costs. This meant that a user in Los Angeles paid the same amount for connect time as one in Columbus. In any case, though, there were long-distance charges and they had to be billed back to the user. In addition, CompuServe had to size its computers proportionally (we're talking mainframes here) to accommodate simultaneous users and information stored on its disk drives. Then there was the cost of billing and administration. Finally, of course, they had to add profit. Add this up and the user had to pay for every minute of connection time.

The Internet is very different. First, there is no center, no equivalent of CompuServe's Columbus headquarters. Among other things, this means there is no central point that can handle billing. Indeed, as you think about how the Internet is organized, you will understand that it would take a total reorganization and consolidation to have any centralized, distance-based billing. Instead, we have flat-rate access costs that are charged at the network periphery by the Internet Service Provider (ISP). Originally, these were just the costs of the physical connection, including a line or lines and a router. However, as traffic increased and NSF got out of the backbone business, there was also a need for some kind of charge-back process. As the backbone moved from T-1 to T-3 to OC-3 and up, and as more and more powerful routers were needed, someone had to pay for it. The result has been a cascading economic model. Those connected directly to the backbone are charged for access by the backbone provider. They then charge those who connect through them, and so on. So, as Solid State had to pay more to MCI, it increased the charges both to its departments and to the universities and other organizations that used Solid State as their access point. This was OK for everyone because, although they were paying more, they were also using the network more.

Tech Talk

ISP: An Internet Service Provider (ISP) provides the link from a residence or business to the regional or backbone levels of the Internet. ISPs range from small mom-and-pop operations to giant companies like AT&T.


Organization

The Internet has no definable physical topology. Indeed, no one knows how many servers, let alone how many nodes, are connected and active at any given time. As noted, the Internet was managed for a time by DARPA, and later by the National Science Foundation. But "manage," in this context, is deceptive. Once the Internet expanded beyond the initial handful of nodes, there was no way to control its far-flung activities. Instead, DARPA, and later NSF, took responsibility for the backbone. Initially, this meant ensuring capacity and paying for it. Later, as growth made it all but impossible to distinguish NSF's part of the backbone from the amazing proliferation of other high-speed links, and as the ascent of commercial use made subsidization inappropriate, the Internet was left to its own devices. Two organizations have a role in keeping it going. One, the Internet Engineering Task Force, an international standards group, deals with advances in technology, most importantly, addressing and routing. Another, Network Solutions, Inc., was initially responsible for the allocation of domain names (this function has since been privatized). Various academic groups, working with NSF and other government agencies, are actively developing the next-generation Internet, variously known as Internet2 and the NGI (Next Generation Internet). The basic idea behind Internet2 is to build a separate network, employing the same protocols but using very high-speed lines and routers or switches.

Current Architecture

There is a standard architectural description of the Internet that uses four levels. This discussion approaches these from the bottom up, beginning with Level 4. We'll describe each level as a discrete entity, but remember that real-world companies span these categories in a variety of ways. Some companies, like Sprint, exist at all four levels. Others, like some of the old telephone companies, span two or three of the levels. In fact, there has been considerable consolidation and this trend is likely to accelerate.

Level 4: Internet Service Providers • Level 4 is the Internet Service Provider, usually called the ISP. An ISP is the business that connects directly with the user. ISPs come in a variety of sizes, from a couple of teenagers with some gear in the garage to huge, multinational corporations. There are about 4,500 in the U.S. at the moment, with a typical size being about 3,000 subscribers. We'll focus on the small side for our example. Think of Pete's Texaco, Video Rental, and Internet Service and his 500 customers (there really are a lot of companies of this kind still out there). This typical small ISP will have a telephone number that its customers use. The customers' modems dial the ISP through the telephone network as if making a regular call. At the ISP's place of business, the call is answered by a modem. The modem is connected not to a computer, but to a router (remember that a router is a special kind of computer). The router's very straightforward job is to take the IP packets and, except for the rare case when they are directed to one of Pete's other customers, to put them on an outgoing line to the next layer of the Internet, Level 3, or the Regional Network (see Figure 14.1). The router doesn't have to be running a very sophisticated routing protocol, nor does it need a very fast CPU or a lot of memory.
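To make the router's "very straightforward job" concrete, here is a minimal sketch in Python of a static forwarding decision at a small ISP. The addresses and port names are hypothetical, not drawn from the text.

```python
# A minimal sketch of static forwarding at a small ISP.
# All addresses, names, and port labels are hypothetical.

LOCAL_CUSTOMERS = {
    "203.0.113.10": "modem-port-3",   # one of Pete's dial-up customers
    "203.0.113.11": "modem-port-7",
}

UPSTREAM_PORT = "t1-to-regional-network"  # the leased line up to Level 3

def forward(destination_ip: str) -> str:
    """Return the outgoing port for a packet's destination address."""
    # Rare case: the packet is for another of Pete's own customers.
    if destination_ip in LOCAL_CUSTOMERS:
        return LOCAL_CUSTOMERS[destination_ip]
    # Usual case: send it upstream and let the regional network worry about it.
    return UPSTREAM_PORT

print(forward("203.0.113.11"))   # modem-port-7
print(forward("198.51.100.25"))  # t1-to-regional-network
```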

Before we go on to Level 3, some notes about the ISP. Access doesn't have to be exclusively by regular phone line and modem. In some cases, the customer will have an ISDN or Frame Relay connection, or a leased line through the telephone network, to the ISP. Also, the ISP may make an arrangement with the phone company (or companies) so that a call from the customer to the ISP appears to the caller to be a free local call, even if the ISP is in a different telephone service area.

The ISP's office may also have a number of arrangements to handle incoming calls more effectively. It likely has set up a "hunt group" with the phone company. This simply means that a call to the main number, for example 481-3425, is automatically rolled over to the ISP's other lines if that number is busy. This way, the ISP can give out one number, but have a dozen or more actual connections to the telephone system. The ISP will also likely need a terminal server once the number of modems exceeds two or three. A terminal server is a device that connects an array of serial devices (modems in this case) to the router. The terminal server multiplexes the incoming lines into one high-speed connection. There are a variety of other electronic devices and software systems that an ISP will need as business grows. For example, there must be a way to manage billing and other business functions.

Level 3: Regional Networks • The connection from our small ISP at Level 4 to the Regional Network at Level 3 (see Figure 14.2) is almost certainly a leased line, probably one or more T-1s. T-3s are pretty much limited to the big boys. Let's digress (again) to talk about the size of this line. You'll recall that a T-1 equals 24 digital phone lines, so you would think that our ISP will need about 20 T-1s for his 500 customers. No way. First, all of them won't be using the net at the same time, and the first limiting factor is the number of lines and modems for incoming calls. If there are 48 modems for 500 customers (about the 1 to 10 ratio that is the ISP rule of thumb), then only two T-1s would be needed to accommodate all at once. In fact, even that would be too much. It's very unlikely that all 48 modems could saturate one outgoing T-1. The reason is that computer traffic is bursty and that the connection is packet-switched. If 48 connections are active, it's still probable that fewer than half are then actually sending or receiving packets at any particular second. And, even for that unusual fraction of a second when the circuit is oversubscribed, it's in the nature of packet-switching to just put the packets in buffers and wait until the capacity becomes available. As we've mentioned on a number of occasions, this waiting for a second or so can be a big problem for voice or video, but doesn't trouble normal data transfers like requesting Web pages or other data. Needless to say, ISPs have to think constantly about how to size their systems, especially about such key factors as how many incoming lines and modems, how much router processing power and memory, and how much bandwidth to the Regional Network are needed.
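The sizing arithmetic above can be laid out as a quick back-of-the-envelope calculation. The sketch below restates the paragraph's own numbers (48 modems, 24 channels per T-1); the fraction of modems actually sending at any instant is an illustrative assumption, not a measured figure.

```python
# Back-of-the-envelope ISP sizing, restating the example numbers in the text.
customers = 500
modems = 48                      # roughly the 1-to-10 rule of thumb
channels_per_t1 = 24             # a T-1 carries 24 digital phone circuits

t1s_to_carry_every_modem = modems // channels_per_t1   # 2 T-1s

# Traffic is bursty: even with every modem connected, only a fraction are
# moving packets in any given second.  The 0.5 here is purely illustrative.
active_fraction = 0.5
modems_sending_right_now = modems * active_fraction    # about 24

print(t1s_to_carry_every_modem, modems_sending_right_now)
```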

Figure 14.1. The Internet service provider level.
1. In the first stage of our packet's journey across the Internet, it gets sent from the user's modem to the ISP. The packet is sent under TCP/IP, but the protocol is not used for this leg. The link here is through the public switched telephone network (PSTN), which means that it is circuit-switched (and also that it is slow).
2. When the packet reaches the ISP, a modem answers the call and sends the packet to the terminal server, which interleaves it with packets coming in from other modems and sends it to the router. The server manages accounts, billing, etc.
3. Since Pete's ISP is at the bottom tier of the Internet, the router, which works only with TCP/IP, has a small, straightforward routing table. Packets coming in from the terminal server are almost always sent out to the regional network provider, which means that they are addressed for the router's T-1 port. For packets arriving from the Internet, the number of choices is equal to the direct subscribers to Pete's ISP. Thus, both incoming and outgoing addresses can be entered manually into the routing table; there is no need for dynamic routing.


Figure 14.2. From the ISP to the regional backbone.
1. When our packet leaves Pete's place, it goes to the local telephone company central office on the T-1 he has leased. Unfortunately, since Pete's regional network provider doesn't have a point of presence (POP) at this office, Pete has to pay for the T-1 to continue from there to another, more distant central office that is co-located with a POP (or has one very close). Remember that the T-1 is a point-to-point circuit-switched link; it goes through the central office, but is not switched there. POPs are small, perhaps a corner of a switching office or maybe a small building nearby. One sine qua non of a POP is an independent, backup power supply, likely provided by a diesel generator.
2. At the POP, our packet is routed onto the regional network provider's T-3. The router here is heavy-duty in the sense that it handles a lot of packets. It has only one upstream connection (to the T-3), but there are multiple routes for downstream packets coming in from the Internet to the ISPs and businesses that are connected through this network. Still, there are a fixed number of destinations and paths in both directions, so static routing is adequate.
3. Note that the regional network provider has some customers who don't go through Level 4 ISPs like Pete. Shown here is a large business that has a T-1 from its LAN directly to the regional network's POP. Another example is a small business that has an ISDN connection through the local central office to the POP (which has ports for ISDN as well as T-1 lines).


The Regional Network is owned by a good-sized business or organization. No more amateurs. The Regional Network is a collection of leased lines and routers/switches (see Figure 14.3). On one end, they connect to ISPs; on the other they are attached, at one or more points, to the Internet Backbone. The Regional Network could be a telephone company, it could be one of the university consortia that developed in the NSF days, or it could be a specialized business. In any case, the lines are almost certainly leased—from one or more telephone companies, from one of the emerging nontelephone fiber optic bandwidth providers, or from some combination of both. The Regional Network doesn't actually use the switched part of the public telephone system, however. Instead, it exists as leased lines tied together with IP routers or (perhaps) ATM switches. The physical place at which an external entity like an ISP connects to this private network is known as a POP (for point of presence), a term that comes from telephone company jargon. In the telco world, a POP is the point where a local telephone system connects to a long-distance carrier.

Tech Talk

Point of Presence (POP): A point of presence (POP) is the location where one network touches another. The term is borrowed from telephone company jargon, in which a POP is the place where the local network connects to the long-distance system.


The Regional Network normally bills its customers (the ISPs) according to the speed (capacity) of their connections to it. Often, however, the simple size-of-the-pipe charge is modified by some traffic-based considerations. For example, a carrier may offer T-3 service, then sample the traffic at regular intervals (e.g., every 15 minutes). The average of these samples would then be the basis for the monthly charge. Fractional T-1 and T-3 connections are also offered, but remember that the available bandwidth depends on the electronics. If the router at the network POP can't handle more than, for example, half of a T-3, it doesn't matter that the line itself will accommodate T-3. To get the higher bandwidth, you have to upgrade the electronics.
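As a rough illustration of billing on sampled traffic, the sketch below averages a set of 15-minute utilization samples and applies a flat rate to the result. Both the sample values and the per-Mbps rate are invented for the example; real carriers each have their own formulas.

```python
# Billing on sampled utilization: average the readings taken every
# 15 minutes, then charge for that average capacity.
# The sample values and the per-Mbps rate are invented for illustration.

samples_mbps = [4.2, 11.8, 9.5, 3.1, 15.0, 7.7]   # one reading per 15 minutes
rate_per_mbps = 250.00                             # hypothetical monthly $/Mbps

average_mbps = sum(samples_mbps) / len(samples_mbps)
monthly_charge = average_mbps * rate_per_mbps

print(f"average utilization: {average_mbps:.1f} Mbps")
print(f"monthly charge: ${monthly_charge:,.2f}")
```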

Figure 14.3. A regional Internet network provider.
1. The regional network is a collection of routers and high speed lines. It has its own backbone, comprising T-3s and a few T-1s as tributaries. Each router represents a city, with T-3s in the large ones and T-1s in smaller centers. Of course, there might be cities with more than one router.
2. In this example the backbone is meshed, but not fully meshed. This means that there are multiple connections, but that there isn't a direct path from every node to every other node.
3. A regional network provider probably leases the circuits from a telco (either a local or long-distance carrier, or both), but provides its own routers. More likely than not, the T-3s are not individual lines, but circuits on a SONET ring.
4. Our regional network has two connections to the Internet backbone, using two different providers. This allows for redundancy and load balancing.
5. At this level of the network, it is easy to see that there are many potential paths that our packet might take, and that the routers have to be able to both evaluate options very quickly and handle a heavy packet load. Thus, the routers use dynamic protocols based on continuous router-to-router communications.


The table below shows the charges from Sprint to its customers as of 1997. You'll see that there are substantial discounts for volume, but that none of the connections are cheap. Note that these rates are for access to the network at one of Sprint's POPs. The cost of a line from the customer's place of business to the POP is additional. Connections could be directly from end users as well as from ISPs, since a company like Sprint is at once an ISP, a regional network provider, and a backbone carrier.

The Cost of Connecting to the Backbone

Size of pipe                        Monthly charge
T-3                                 $20,620
12 Mbps (8xT-1; Fractional T-3)     $13,839
T-1                                 $2,216
128 Kbps (Fractional T-1)           $986

Source: Boardwatch Magazine Web site.

Routers in Regional Networks don't have the simple gateway responsibility that an ISP's routers have. At this level, thousands of user connections are aggregated. Traffic can go in a number of different directions, and the routers have to use the more sophisticated exterior protocols, such as Border Gateway Protocol, to manage links.

Regional Networks obviously need very fast connections to Level 2, the Internet Backbone. As the Internet has grown in size and complexity, it is common for these Level 3-to-Level 2 connections to exist at more than one point. The reasons for multiplicity, called multihoming, include redundancy in the event of outages as well as the ability to balance traffic—something that itself helps to prevent Internet outages or their increasingly common little brothers, "Internet brownouts."

Tech Talk

Multihoming: A network or computer that has two or more connections to the Internet or other wide area network is said to be multihomed.


Level 2: The Backbone • Internet Backbone providers are all large businesses. MCI, now part of WorldCom, is the long-time leader. Others are Sprint and GTE. A company like MCI/WorldCom is a full service provider; it owns not only switches and routers, but also the fiber that connects them. In other cases, companies own routers but lease fiber, or vice versa. The highest speed backbone links, which are typically ATM over SONET, have now reached 2.48 Gbps. They will quickly scale beyond that. Figure 14.4 illustrates a backbone.

Level 1: The Network Access Point • What could be higher than the backbone? Well, actually nothing, but remember that there is now more than one backbone provider. If packets are to cross from one to another of these super-capacity lines, they have to connect. This happens at Level One, Network Access Points (NAPs). These were initially set up on a contract basis by NSF. The locations of the original four are San Francisco, Chicago, New York, and Washington, D.C. It's not that simple, of course. There were already other such connection points when NSF launched the NAPs, and these continue to exist. One example is the so-called MAEs, Metropolitan Area Ethernets. These are links provided by telephone companies or Competitive Access Providers (CAPs) to serve high-volume data needs for businesses. Backbone providers realized that these Ethernets would provide a cheap and easy way of exchanging traffic; instead of sending packets to Chicago, you could swap them right there in Houston. In the grand tradition of cutesy naming, these MAEs have handles like MAE West. In fact, most of the exchange of backbone data now occurs at private peering points, which are NAP-like, large-scale, high-speed interchanges. In addition, since Regional Networks often have links to two or more of the backbone networks, routes through these networks provide numerous de facto alternatives to the NAPs. When you put all the possibilities together, you will understand why routing at the backbone and regional levels is a daunting task.

Other Architectural Issues • Before we move on from Internet architecture, we need to mention two issues: peering and the idea of a Level 5. First, peering. Back when NSF ran the backbone, it was all one happy family and packets flowed without passports. Now, with businesses operating backbones for profit, the question of who pays for what becomes of acute importance to the bottom line. Backbone providers charge Regional Networks the same way the regionals charge ISPs—according to the speed of the connection. But this logic loses its value at the backbone level. If MCI and Sprint connect to each other at 622 Mbps at a NAP, do they exchange bills for the service? No. The obvious answer is to call it a wash; that's what peering means.

The problem comes when a littler guy wants to connect under the same rules. MCI and Sprint can reasonably wonder if they are really peers with the small fry. Probably not—the company with the greater capacity is likely giving something away to the one with less bandwidth. The idea that someone would refuse to peer is shocking to collegial traditionalists in the Internet world, but it has happened and it likely presages some very complicated commercial arrangements in the future. In our fiercely competitive business world, we might worry that the top-level backbone providers would fall into conflict and that Internet users would be confronted by systems refusing to carry each other's traffic. Fortunately, this kind of warfare isn't technically feasible; if vendors engaged in this kind of fighting, they would render their product useless.

Figure 14.4. An Internet backbone.
1. This Internet backbone provider uses a mesh of ATM switches for its core. TCP/IP packets enter from routers, are converted to ATM cells at the switch, and are then sent through the ATM system (on a permanent virtual circuit) to a switch at the appropriate exit point. Finally, they are reassembled as TCP/IP packets and sent on to a router. There are, of course, far more connections to the core than are shown here.
2. The backbone provider may be a long distance telephone company that owns the fiber links, or it may be an independent company that leases them, or some combination of both. Switches are in major metropolitan areas. Router-based POPs that feed into the ATM mesh are distributed around the world.
3. The backbone is fully meshed. The connections needn't be all at the same speed, but are likely OC-48 or so (2.48 Gbps).
4. This backbone is connected to others at two different kinds of peering points—the official ones, the Network Access Points (NAPs), and also at some Metropolitan Area Ethernets (MAEs).


The idea of an Internet Level 5 is based on the fact that many of the users who connect to ISPs are actually organizations that in turn provide access to many users, often installing routers of their own, etc. Whether or not you agree to call these local networks part of the Internet, the fact is that they are a critically important part of the overall phenomenon.

Addressing • Internet addresses are IP numbers. These are 32-bit numbers that can be organized in four different ways for different classes (kinds) of networks. IP classes are an issue of interest to technical managers only, so we won't go into it. The length of the address is a more general concern, however. As you know, 2^32 yields around 4 billion addresses, which seems like a lot but isn't, as the Internet expands. It's probable, for example, that not only computers as we think of them, but also television set-top boxes, cell phones, cars, and such, will all want their own unique address on the Internet before long. Thus, the IETF is preparing to release IPv6, which will support a heady 128-bit address space. This will provide a number big enough to handle the needs of this and a few other worlds. IPv6 will also make available ATM-like QoS for IP networks. Another feature will deal with a very important weakness in IP—variable packet size; IPv6 networks will be able to establish end-to-end connections to determine the optimum packet size for a given path. Once this is done, packets will no longer need to undergo the slow process of being sawn apart and then stitched back together, as occurs when ATM is used in the "cloud;" this should reduce latency and thereby improve the quality of voice and video transmissions considerably.
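The difference in address space is easy to see with Python's standard ipaddress module. The sketch below simply computes 2^32 and 2^128 and parses one sample address of each kind; both addresses come from documentation-reserved ranges.

```python
import ipaddress

# IPv4: 32-bit addresses, about 4.3 billion possibilities.
print(2 ** 32)     # 4294967296

# IPv6: 128-bit addresses, roughly 3.4 x 10**38 possibilities.
print(2 ** 128)

# The standard library parses both forms.
v4 = ipaddress.ip_address("192.0.2.1")      # documentation-range IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")    # documentation-range IPv6 address
print(v4.version, v6.version)               # 4 6
```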

Tech Talk

IPv6: The current version of the Internet Protocol (IP) is version 4, written IPv4. The next generation, now being introduced, is IPv6. IPv6 has many advantages, including the capacity for far more separate numbers and the ability to maintain quality of service (QoS).


The existing IP protocol and packet structure are flexible and can accommodate changes; there is even a field in the packet for version number. Still, it won't be easy to phase in such a big change as the one to IPv6. The most likely scenario is that once everything is agreed on, vendors will slowly make their new product lines v6-compliant. The two systems will coexist for a while. After perhaps five years or so, when it will be clear that all regional and backbone routers, switches, and stacks are compliant, the changeover at the ISP and user levels should begin to happen very quickly.

Another important element to understand in addressing is domain name service (DNS). This simply refers to software (a database) that translates a Web or e-mail address into the numbers that are used to route packets through the Internet. DNS software is normally located on servers at the periphery of the network; for example, an ISP will likely maintain a DNS server. There is a tradeoff here. On the positive side, distributing DNS servers widely minimizes the amount of traffic needed for this simple service. On the other hand, distribution means that there must be an effectively implemented process for updating these remote servers; at least one major Internet problem has been a consequence of flawed DNS tables being circulated from server to server.
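A DNS lookup of the sort described above can be tried from any networked machine; the snippet below asks the locally configured resolver (often the ISP's DNS server) to translate a name into an IP number. The host name is just the example used elsewhere in this chapter, and the call assumes a live network connection.

```python
import socket

# Ask the configured resolver (often the ISP's DNS server) to translate
# a human-readable name into the IP number used to route packets.
# The name below is just an example; any valid host name will do.
address = socket.gethostbyname("www.prenhall.com")
print(address)
```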

The Web

The Internet, for all its success, represented a significant discontinuity in computing. While microcomputers began the shift to graphical operating systems in 1984, and completed it about 1991, the Internet continued to be a character-based, colorless communications channel. People who were used to writing documents with a variety of fonts, and who built their businesses on spreadsheets that included sophisticated charts and graphs, loved e-mail but couldn't understand why they weren't able easily to exchange these documents with their friends and colleagues in distant locations. They also chafed at the Internet's interface, which was in almost all cases the same as that of terminals from the 1970s. People were thus more than ready for the appearance of graphical network software.

The World Wide Web, which was based on software developed by Tim Berners-Lee of CERN (Conseil Européen pour la Recherche Nucléaire, known in English as the European Laboratory for Particle Physics), exploded onto the software scene in the spring of 1993. While the advent of the Web represents a critical, seminal event in computer and communications history, it was no technical feat at all. What Berners-Lee did was entirely within the capability of existing operating systems, applications software, and networks. The tools were all there, lying around on the ground. But creativity does not depend on something new; Berners-Lee used the tools in a way that was both elegant and practicable.

The essence of the Web is twofold: 1) software that is optimized to exchange binary information across networks using TCP/IP, and 2) software that can assemble and reassemble this binary information into documents at either end of the network link. It doesn't sound like much, and it really isn't, but it created a revolution.

The software that exchanges binary information on top of TCP/IP is called HTTP, for Hypertext Transfer Protocol. HTTP has several elements. One is an addressing capability. TCP/IP finds the computer and its communications socket, and HTTP is responsible for moving data back and forth to the appropriate applications software—usually the Web browser.
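To show HTTP riding on top of TCP/IP, here is a minimal request and response using Python's standard http.client module. The host name is a placeholder, and the call assumes a live network connection.

```python
import http.client

# HTTP rides on a TCP connection: open the connection, send a GET request
# for a document, and read the response the server returns.
# The host name here is a placeholder.
conn = http.client.HTTPConnection("www.example.com", 80)
conn.request("GET", "/")
response = conn.getresponse()
print(response.status, response.reason)   # e.g., 200 OK
page = response.read()                    # the HTML document itself
conn.close()
```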

Tech Talk

HTTP: The Hypertext Transfer Protocol (HTTP) is responsible for moving documents across the Web. The Uniform Resource Locator (URL) in your Web browser, e.g., www.prenhall.com, is part of HTTP.


HTML

In addition to a transfer protocol, the Web uses a standard document formatting language—HTML (Hypertext Markup Language). HTML is a subset of an incredibly complex language created by the publishing industry—SGML (Standard Generalized Markup Language). HTML is much less sophisticated, which is a good thing since ordinary people would go crazy trying to understand SGML. HTML's formatting is, in fact, fairly primitive. It includes commands that tell the browser how to organize the page—for example, where text is vis-à-vis other elements such as margins. It also has an easy-to-use mechanism for embedding graphics.
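A sense of how simple HTML's tagging is can be had by feeding a small page to Python's built-in parser. The page below is made up for the example; the parser just reports each tag it encounters.

```python
from html.parser import HTMLParser

# A tiny, made-up HTML page: a heading, a paragraph, and an embedded image.
page = """
<html><body>
  <h1>Solid State U Physics</h1>
  <p>Welcome to our <b>new</b> Web server.</p>
  <img src="cyclotron.gif">
</body></html>
"""

class TagLister(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("tag:", tag, attrs)

TagLister().feed(page)   # prints each tag as the parser encounters it
```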

Tech Talk

HTML: The Hypertext Markup Language (HTML) is the markup language used to create Web documents. HTML uses a system of tags (visible in most browsers by clicking on something like View Page Source on the menu bar) that define a page's layout.


XML

The World Wide Web Consortium (www.w3.org) has developed a specification for a new approach called XML. This new language is also a dumbed-down version of SGML, but is much smarter than HTML. XML has most of SGML's flexibility and power, but leaves out its most arcane elements. The key to XML is that it is a metalanguage; this means that it is a language that can be used to describe other languages. For example, the sender of an XML document provides the receiver with a new set of document formatting information (a DTD or Document Type Definition). These use commands drawn from SGML. An XML DTD has many of the characteristics of a program; in addition to telling the receiver how to display the information, it can include commands for manipulation—for example, linking to a database. The hope is that with XML, those who want to do really complicated documents will be able to do so, but those who only want to publish basic information will not have a more complicated task than they now have with HTML. This is a logical approach. Together with revised versions of HTML proper, and such extensions as Dynamic HTML (see below), XML should allow a more responsive and flexible Web.
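The idea of describing your own tag set can be illustrated with a small invented example. The sketch below parses a made-up catalog "dialect" with Python's standard xml.etree module; note that this module reads the tags but does not validate them against a DTD.

```python
import xml.etree.ElementTree as ET

# A made-up XML "dialect" for a music catalog.  The tag names are invented;
# a real exchange would also supply a DTD describing them, which this
# sketch does not validate.
document = """
<catalog>
  <cd><title>Kind of Blue</title><price>11.99</price></cd>
  <cd><title>Abbey Road</title><price>13.49</price></cd>
</catalog>
"""

root = ET.fromstring(document)
for cd in root.findall("cd"):
    print(cd.findtext("title"), cd.findtext("price"))
```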

Tech Talk

XML: The eXtensible Markup Language (XML) offers the possibility of creating Web documents that are far more powerful and flexible than HTML. XML documents can provide specialized commands that ease the manipulation of data sets.


Tech Talk

DTD: XML uses a Document Type Definition (DTD) to allow one Web source to tell another how to read and work with specialized information.


HTML Compared to XML

The difference between these two seems subtle to most people, and to some extent it is, but the impact of implementing XML will be far from trivial.

Think of HTML as a fairly simple language that everyone has learned—in this case, everyone refers to Web browsing software. If you, as the originator of an HTML document, restrict yourself to the commands of that language, you know that the machine at the other end will be able to understand it. The problem is that HTML is simple. It doesn't, for example, describe how to manage a document that in turn connects to a database—a key function of e-commerce. How to solve this? One approach would be to replace HTML with a huge, complex language that could satisfy everyone's needs. This presents several problems, however. One is that the new language would cause the browser to use enormous amounts of storage and memory, making it difficult to deploy on the increasingly popular smaller devices. Second, no matter how big the vocabulary became, there would always be additions as systems evolved. Keeping everything up-to-date would be a tremendous challenge.

Essentially, what XML does is equip every browser with the ability to learn not so much a new language as a new dialect that is based on some general rules. The new dialect, which is communicated in the Document Type Definition (DTD) when a Web page is accessed, then becomes the basis for efficient, specialized communication. When the new dialect is no longer needed on the receiving computer, it can be dumped (or stored if it is required frequently). Java can do the same things as XML, but the consensus at the moment is that XML is more efficient for e-commerce kinds of applications.


Making the Web Go Faster

The accelerating popularity of the Web has resulted in cynical comments about the "world wide wait." Now that the Web is perceived to be the mechanism on which an entire new generation of entertainment and commercial activities will be built, speeding it up has become a major societal undertaking. The most important factor in a faster Web, of course, is faster communications links. At the top end of the Net, a continuing stream of improvements has allowed systems to stay up with, and in some cases move ahead of, the increasing traffic. Technologies such as ATM switches, ASIC-based routers, and dense and ultra-dense wave division multiplexing are driving enormous increases in backbone and regional network capacity. Cable modems, xDSL loops, and various wireless strategies offer the promise of better home and business access to this speedy network core. Unfortunately, most of these improvements are irrelevant to the average user with a 33.6 Kbps dial-up modem. For the vast majority of Web surfers, this will be the fastest they can go for at least the next few years. What can we do about making bits move faster over a slow line?

One way to speed things up, of course, is to improve compression so that less is actually transmitted. Web browsers have been pretty good at compression from the beginning, but more can be done. Currently, HTML graphics are usually in GIF (Graphics Interchange Format) format, a bitmap structure that provides an excellent compression ratio. But, because the average computer now has processing power to spare, a move to vector images should occur quickly. These will not only be smaller and therefore faster to transmit, they will also allow for better quality at varying resolutions. Newer compression techniques, such as wavelets, will allow even better graphics in less time. There are some significant barriers here, though. Since wavelets are very computationally intensive even for decompressing, better algorithms and faster hardware will be important.
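The basic payoff of compression, fewer bits crossing the slow line, can be demonstrated with Python's standard zlib module. This is a general-purpose text compressor, not the GIF or wavelet schemes discussed above, and the sample data is invented.

```python
import zlib

# General-purpose compression of a repetitive chunk of page text.
# zlib is not GIF or wavelet compression; it simply shows that fewer
# bytes need to cross the slow link.
text = ("<tr><td>widget</td><td>$9.95</td></tr>\n" * 200).encode("ascii")

compressed = zlib.compress(text)
print(len(text), "bytes before,", len(compressed), "bytes after")

assert zlib.decompress(compressed) == text   # nothing is lost
```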

Tech Talk

Wavelets: A very powerful approach to compressing graphical images uses a technique called wavelets. Wavelet-based compression builds on a complex mathematical technique originally applied to the analysis of music.


An especially promising way of making the Web go faster in the short term is the use of more intelligent software methods. The current HTML page over HTTP approach is amenable to lots of improvements. First, consider that getting one Web page often requires more than one TCP connection (end-to-end negotiation). The page itself needs a connection, as does each graphic embedded in the page. This makes for a lot of network overhead. In the long run, improvements to the network architecture, such as label switching, will minimize this busy work. In the interim, though, simple caching can provide tremendous benefits. In Web caching, your ISP will retain many frequently used pages in the disk cache of its local servers. If you use Netscape as your home page (millions do, and Netscape and its owner, AOL, reap big advertising bucks as a result), you will get the page not from Netscape but from your ISP's server. The local server, called a proxy server, gets updates from Netscape as needed, so what you see will (almost) always be current. The proxy can also do the kind of read-ahead caching that CPUs and other hardware systems do. Given the appropriate software at both ends, the proxy can guess what a client's next request will be and pull it into its cache while the previous page is being read.
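Stripped to its essentials, a caching proxy is a lookup table keyed by URL: serve the stored copy if there is one, otherwise fetch, store, and serve. The sketch below shows only that logic; fetch_from_origin is a stand-in for a real HTTP request, and real proxies also track expiration so pages stay (almost) current.

```python
# The core logic of a caching proxy, reduced to a dictionary keyed by URL.
# fetch_from_origin() is a stand-in for a real HTTP request, and real
# proxies also track expiration times, which this sketch omits.

cache: dict[str, bytes] = {}

def fetch_from_origin(url: str) -> bytes:
    # Placeholder for an actual HTTP GET to the origin server.
    return f"<html>page for {url}</html>".encode()

def get_page(url: str) -> bytes:
    if url in cache:                      # cache hit: no trip across the backbone
        return cache[url]
    page = fetch_from_origin(url)         # cache miss: go get it once...
    cache[url] = page                     # ...and keep a copy for the next client
    return page

get_page("http://home.netscape.com/")     # first request fills the cache
get_page("http://home.netscape.com/")     # second request is served locally
```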

Tech Talk

Proxy server: A server that functions in place of another is called a proxy server. Proxy servers can have many functions, including security and serving as a network cache.


Even better than compression and caching is not to send information at all. This idea animates much of the popularity of Java applets, ActiveX controls, XML, and similar extensions to HTML (sometimes generically referred to as Dynamic HTML, or DHTML). To illustrate, consider Web servers used for database access. In the early days of the Web, when you looked up something on a Web database, for example, a catalog of music CDs, this is what happened when you sent a query to the server: First, the Web server software intercepted the query and sent it to a program using a scripting language running on the server (e.g., Perl). This program then used a protocol, called the Common Gateway Interface, to send a request to the database program. Once the database had looked up the information and returned it to the program, a new Web page was created by the Web server software and sent to the client. This process was slow on the server (it created lots of additional processes and/or threads) and resulted in a great deal of HTTP and TCP/IP traffic.
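The old query flow just described can be sketched as a CGI-style script: the Web server hands the program the query string, the program consults the database, and a complete new HTML page is written back. The field name and the tiny "database" below are invented for illustration.

```python
# A sketch of the old CGI-style flow: the Web server passes the query
# string to this program, the program consults the database, and a whole
# new HTML page is generated for the server to send back.
# The field name and catalog contents are invented for illustration.
import os
from urllib.parse import parse_qs

CATALOG = {"miles": "Kind of Blue - $11.99", "beatles": "Abbey Road - $13.49"}

def handle_request() -> str:
    query = parse_qs(os.environ.get("QUERY_STRING", ""))
    artist = query.get("artist", ["(none)"])[0]
    result = CATALOG.get(artist, "not found")
    # A brand-new page is built for every single query.
    return f"<html><body><p>{artist}: {result}</p></body></html>"

os.environ["QUERY_STRING"] = "artist=miles"   # simulate the server's hand-off
print("Content-Type: text/html\n")
print(handle_request())
```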

Tech Talk

Dynamic HTML: A variation of HTML, Dynamic HTML, allows designers considerable additional flexibility in page layout as well as the opportunity to add things like animation.


In its simplest form, Dynamic HTML is a way of getting around the limitations of HTML. Let's say you have a Web site and want to do something cool, like animation on your customers' computers. Doing it with HTML means that you will need to send a long string of verbose instructions requiring a lot of TCP/IP activity. DHTML, on the other hand, is similar to XML in that you now send a compact set of instructions to your DHTML-aware browser and it uses these to do the animation.

The new approach works by sending a Java applet (or an ActiveX control) to the client when a database page is first requested. This makes the beginning of the session slower as the extra code is pulled over the wire. Once downloaded, however, things proceed differently. A database request now goes first to the client side of the Java applet, which formats it for greatest efficiency. Once on the server, the server side of the Java application uses a driver to talk directly to the database. The report is then sent out to the client. It isn't necessary to create an entirely new page on the server and send it through a bunch of HTTP and TCP/IP connections. Instead, the client side of the Java applet takes the incoming data and generates or revises the page as needed. This process can be even more powerful with distributed objects. The client side of the Java applet can grow and extend itself by simply pulling in a few remote objects (Java "beans"). If the Java applet is cached on the client hard disk, then using the same or similar database again later would mean that a substantial amount of information could be retrieved with very little use of network bandwidth.

Virtual Private Networks, Firewalls, and the Concept of an Intranet

Two terms that have cropped up since the Internet explosion to add to the general linguistic fog are intranet and virtual private network. The first simply refers to a network, normally one used by a business, that employs all of the Internet standards (TCP/IP), but that either is not physically a part of the Internet or is isolated from it in some way. A linked collection of TCP/IP-based LANs on a corporate campus would be an intranet. Given the explosion of inexpensive and well understood TCP/IP-based software and hardware that has followed the success of the Internet, it makes a lot more sense for a business to organize its network in this way rather than with proprietary protocols like SNA, DECnet, or Novell's IPX/SPX.

Tech Talk

Intranet: A private network, separate from the Internet, but that uses the Internet's TCP/IP structure, is called an intranet. This term is used very loosely.


One approach to keeping an intranet separate from the Internet is to use a firewall. A firewall is simply a router that will only pass certain kinds of packets. The most common use of a firewall is to hide IP addresses. For example, when a node on a LAN sends a request for a file across the Internet, the firewall intercepts the request and replaces the node's IP address with its own. From the outside, the only IP address that is visible is that of the firewall; the nodes on the LAN don't exist. Hiding the individual nodes and directing all traffic to the well-protected firewall significantly improves security. The chances of damage from a virus or similarly destructive piece of alien software are therefore greatly reduced.
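The address hiding described above boils down to a translation table: outbound packets get the firewall's address plus a port that maps back to the original node, and inbound packets are admitted only if they match an entry. The sketch below is a deliberately simplified illustration; the addresses are from documentation ranges, and real firewalls also track protocols and connection state.

```python
# A much-simplified sketch of the address hiding a firewall performs.
# Addresses come from documentation ranges; real firewalls also track
# ports, protocols, and connection state in far more detail.

FIREWALL_ADDRESS = "198.51.100.1"
translation_table: dict[int, str] = {}      # external port -> internal node
next_port = 40000

def outbound(internal_ip: str) -> tuple[str, int]:
    """Rewrite an outgoing packet so only the firewall's address is visible."""
    global next_port
    next_port += 1
    translation_table[next_port] = internal_ip
    return FIREWALL_ADDRESS, next_port

def inbound(port: int) -> str | None:
    """Admit a reply only if it matches a connection we initiated."""
    return translation_table.get(port)       # None means drop the packet

source = outbound("192.168.1.25")            # a LAN node requests a file
print(source)                                # ('198.51.100.1', 40001)
print(inbound(40001))                        # 192.168.1.25 (the reply gets in)
print(inbound(55555))                        # None (unsolicited packet dropped)
```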

Tech Talk

Firewall: A firewall is a server that sits at the interface of a local and a wide area network in order to provide some security function. Typically, a firewall hides the computers on a LAN so that outsiders cannot see and attack them.


Another way to use the Internet without exposing yourself to its security dangers is to create a virtual private network (VPN). In a VPN, two physically separated corporate sites would use the Internet as a link, but all packets exchanged between the two would be encrypted, and only encrypted packets (see the next section) of a specific kind could enter either of the sites. In this way, for a modest investment in encryption software, a business can take advantage of the Internet's ubiquity and low cost without the danger of having its security breached. Many Internet carriers, such as MCI, Sprint, and GTE, offer VPN services to customers. Packets leave a business site on a leased line. At the carrier's POP, they are encrypted before going onto the Internet. Decryption is accomplished by the carrier on the other end. Purchasing VPN services from a carrier allows medium and small businesses to have a high level of security without investing in the expertise needed to support the ciphering systems. The appeal of VPNs is strong—the cost of T-1 access to the Internet is a fraction of a coast-to-coast leased T-1, and the Internet offers a lot more flexibility.
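In spirit, a VPN encrypts each packet's payload before it crosses the public Internet and decrypts it at the far site. The sketch below uses the third-party cryptography package's Fernet recipe purely as an illustration; real VPN services use standardized protocols such as IPsec, and the shared key here would in practice be negotiated rather than hard-coded.

```python
# Illustrative only: encrypt a payload before it crosses the public Internet,
# then decrypt it at the other site.  Uses the third-party "cryptography"
# package (pip install cryptography); real VPNs rely on standardized
# protocols such as IPsec, and keys are negotiated, not shared by hand.
from cryptography.fernet import Fernet

shared_key = Fernet.generate_key()        # both sites must hold the same key
site_a = Fernet(shared_key)
site_b = Fernet(shared_key)

payload = b"quarterly sales figures"
ciphertext = site_a.encrypt(payload)      # what actually travels the Internet
print(ciphertext[:20], b"...")            # unreadable to anyone in between

print(site_b.decrypt(ciphertext))         # b'quarterly sales figures'
```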

Tech Talk

Virtual private network: A virtual private network (VPN) is one that uses the Internet protocols and connections, but encrypts packets at the entry and exit points to provide increased security.

