The Layers of a Network

Sending data across a network is a complex operation that must be carefully tuned to the physical characteristics of the network as well as the logical character of the data being sent. Software that sends data across a network must understand how to avoid collisions between packets, how to convert digital data to analog signals, how to detect and correct errors, how to route packets from one host to another, and more. The process becomes even more complicated when the requirement to support multiple operating systems and heterogeneous network cabling is added.

To make this complexity manageable and to hide most of it from the application developer and end user, the different aspects of network communication are separated into multiple layers. Each layer represents a different level of abstraction between the physical hardware (e.g., wires and electricity) and the information being transmitted. Each layer has a strictly limited function. For instance, one layer may be responsible for routing packets, while the layer above it is responsible for detecting and requesting retransmission of corrupted packets. In theory, each layer talks only to the layers immediately above and immediately below it. Separating the network into layers lets you modify or even replace one layer without affecting the others as long as the interfaces between the layers stay the same.

There are several different layer models, each organized to fit the needs of a particular kind of network. This book uses the standard TCP/IP four-layer model appropriate for the Internet, shown in Figure 2.1. In this model, applications such as Netscape Navigator and Eudora run in the application layer and talk only to the transport layer. The transport layer talks only to the application layer and the internet layer. The internet layer in turn talks only to the host-to-network layer and the transport layer, never directly to the application layer. The host-to-network layer moves the data across the wires, fiber-optic cables, or other medium to the host-to-network layer on the remote system, which then moves the data up the layers to the application on the remote system.

The layers of a network

Figure 2-1. The layers of a network

For example, when a web browser sends a request to a web server to retrieve a page, it’s actually talking only to the transport layer on the local client machine. The transport layer breaks up the request into TCP segments, adds some sequence numbers and checksums to the data, and then passes the request to the local internet layer. The internet layer fragments the segments into IP datagrams of the necessary size for the local network and passes them to the host-to-network layer for actual transmission onto the wire. The host-to-network layer encodes the digital data as analog signals appropriate for the particular physical medium and sends the request out the wire, where it will be read by the host-to-network layer of the remote system to which it’s addressed.

The host-to-network layer on the remote system decodes the analog signals into digital data, then passes the resulting IP datagrams to the server’s internet layer. The internet layer does some simple checks to see that the IP datagrams aren’t corrupt, reassembles them if they’ve been fragmented, and passes them to the server’s transport layer. The server’s transport layer checks to see that all the data has arrived and requests retransmission of any missing or corrupt pieces. (This request actually goes back down through the server’s internet layer, through the server’s host-to-network layer, and back to the client system, where it bubbles up to the client’s transport layer, which retransmits the missing data back down through the layers. This is all transparent to the application layer.) Once the datagrams composing all or part of the request have been received by the server’s transport layer, it reassembles them into a stream and passes that stream up to the web server running in the server application layer. The server responds to the request and sends its response back down through the layers on the server system for transmission back across the Internet and delivery to the web client.

As you can guess, the real details are much more elaborate. The host-to-network layer is by far the most complex, and much has been deliberately hidden. For example, it’s entirely possible that data sent across the Internet will actually be passed through various routers and their layers before reaching its final destination. However, 90% of the time your Java code will work in the application layer and will need to talk only to the transport layer. The other 10% of the time you’ll be in the transport layer and talking to the application layer or the internet layer. The complexity of the host-to-network layer is hidden from you; that’s the point of the layer model.

Note

If you read the network literature, you’re also likely to encounter an alternative seven-layer model called the Open Systems Interconnection (OSI) Reference Model. For network programs in Java, the OSI model is overkill. The biggest difference between the OSI model and the TCP/IP model used in this book is that the OSI model splits the host-to-network layer into data link and physical layers and inserts presentation and session layers in between the application and transport layers. The OSI model is more general and better suited for non-TCP/IP networks, though most of the time it’s still overly complex. In any case, Java’s network classes work on only TCP/IP networks and always in the application or transport layers, so for purposes of this book, nothing is gained by using the more complicated OSI model.

To the application layer, it seems as if it is talking directly to the application layer on the other system; the network creates a logical path between the two application layers. It’s easy to understand the logical path if you think about an IRC chat session. Most participants in an IRC chat would say that they’re talking to another person. If you really push them, they might say that they’re talking to the computer, (really the application layer), which is talking to the other person’s computer which is talking to the other person. Everything more than one layer deep is effectively invisible, and that is exactly the way it should be. Let’s consider each layer in more detail.

The Host-to-Network Layer

As a Java programmer, you’re fairly high up in the network food chain. A lot happens below your radar. In the standard reference model for IP-based Internets (the only kind of network Java really understands), the hidden parts of the network belong to the host-to-network layer (also known as the link layer, data link layer, or network-interface layer). The host-to-network layer defines how a particular network interface, such as an Ethernet card or a PPP connection, sends IP datagrams over its physical connection to the local network and the world.

The part of the host-to-network layer made up of the hardware used to connect different computers (wires, fiber-optic cables, microwave relays, or smoke signals) is sometimes called the physical layer of the network. As a Java programmer you don’t need to worry about this layer unless something goes wrong with it—the plug falls out of the back of your computer, or someone drops a backhoe through the T-1 line between you and the rest of the world. In other words, Java never sees the physical layer.

For computers to communicate with each other, it isn’t sufficient to run wires between them and send electrical signals back and forth. The computers have to agree on certain standards for how those signals are interpreted. The first step is to determine how the packets of electricity or light or smoke map into bits and bytes of data. Since the physical layer is analog, and bits and bytes are digital, this involves a digital-to-analog conversion on the sending end and an analog-to-digital conversion on the receiving end.

Since all real analog systems have noise, error correction and redundancy need to be built into the way data is translated into electricity. This is done in the data link layer. The most common data link layer is Ethernet. Other popular data link layers include TokenRing and LocalTalk. A specific data link layer requires specialized hardware. Ethernet cards won’t communicate on a TokenRing network, for example. Special devices called gateways convert information from one type of data link layer such as Ethernet to another such as LocalTalk. The data link layer does not affect you directly as a Java programmer. However, you can sometimes optimize the data you send in the application layer to match the native packet size of a particular data link layer, which can have some affect on performance. This is similar to matching disk reads and writes to the native block size of the disk. Whatever size you choose, the program will still run, but some sizes let the program run more efficiently than others, and which sizes these are can vary from one computer to the next.

The Internet Layer

The next layer of the network, and the first that you need to concern yourself with, is the internet layer. In the OSI model, the internet layer goes by the more generic name network layer. A network layer protocol defines how bits and bytes of data are organized into larger groups called packets, and the addressing scheme by which different machines find each other. The Internet Protocol is the most widely used network layer protocol in the world and the only network layer protocol Java understands. IP is almost exclusively the focus of this book. IPX is the second most popular protocol in the world and is used mostly by machines on NetWare networks. AppleTalk is a protocol used mostly by Macintoshes. NetBEUI is a Microsoft protocol used by Windows for Workgroups and Windows NT. Each network layer protocol is independent of the lower layers. AppleTalk, IP, IPX, and NetBEUI can each be used on Ethernet, TokenRing, and other data link layer protocol networks, each of which can themselves run across different kinds of physical layers.

Data is sent across the internet layer in packets called datagrams. Each IP datagram contains a header from 20 to 60 bytes long and a payload that contains up to 65,515 bytes of data. (In practice most IP datagrams are much smaller, ranging from a few dozen bytes to a little more than eight kilobytes.) The header of each IP datagram contains these 13 items in this order:

4-bit version number

Always 0100 (decimal 4) for current IP; will be changed to 0110 (decimal 6) for IPv6, but the entire header format will also change in IPv6.

4-bit header length

An unsigned integer between and 15 specifying the number of 4-byte words in the header; since the maximum value of the header length field is 1111 (decimal 15), an IP header can be at most 60 bytes long.

1-byte type of service

A 3-bit precedence field that is no longer used, 4 type-of-service bits (minimize delay, maximize throughput, maximize reliability, minimize monetary cost), and a bit. Not all service types are compatible. Many computers and routers simply ignore these bits.

2-byte datagram length

An unsigned integer specifying the length of the entire datagram, including both header and payload.

2-byte identification number

A unique identifier for each datagram sent by a host; allows duplicate datagrams to be detected and thrown away.

3-bit flags

The first bit is 0; second bit is if this datagram may be fragmented, 1 if it may not be; third bit is if this is the last fragment of the datagram, 1 if there are more fragments.

13-bit fragment offset

In the event that the original IP datagram is fragmented into multiple pieces, it identifies the position of this fragment in the original datagram.

1-byte time-to-live (TTL)

Number of nodes through which the datagram can pass before being discarded; used to avoid infinite loops.

1-byte protocol

Six for TCP, 17 for UDP, or a different number between and 255 for each of more than one hundred different protocols (some quite obscure); see http://www.iana.org/assignments/protocol-numbers for the complete current list.

2-byte header checksum

A checksum of the header only (not the entire datagram) calculated using a 16-bit one’s complement sum.

4-byte source address

The IP address of the sending node.

4-byte destination address

The IP address of the destination node.

In addition, an IP datagram header may contain from to 40 bytes of optional information used for security options, routing records, timestamps, and other features Java does not support. Consequently, we will not discuss these here. The interested reader is referred to TCP/IP Illustrated, Volume 1, by W. Richard Stevens for more details on these fields. Figure 2.2 shows how these different quantities are arranged in an IP datagram. All bits and bytes are big-endian, from most significant to least significant from left to right.

The structure of an IPv4 datagram

Figure 2-2. The structure of an IPv4 datagram

The Transport Layer

Raw datagrams have some drawbacks. Most notably, there’s no guarantee that they will be delivered. Furthermore, even if they are delivered, they may have been corrupted in transit. The header checksum can detect corruption only in the header, not in the data portion of a datagram. Finally, even if the datagrams arrive uncorrupted, they do not necessarily arrive in the order in which they were sent. Individual datagrams may follow different routes from source to destination. Just because datagram A is sent before datagram B does not mean that datagram A will arrive before datagram B.

The transport layer is responsible for ensuring that packets are received in the order they were sent and making sure that no data is lost or corrupted. If a packet is lost, then the transport layer can ask the sender to retransmit the packet. IP networks implement this by adding an additional header to each datagram that contains more information. There are two primary protocols at this level. The first, the Transmission Control Protocol (TCP), is a high-overhead protocol that allows for retransmission of lost or corrupted data and delivery of bytes in the order they were sent. The second protocol, the User Datagram Protocol (UDP), allows the receiver to detect corrupted packets but does not guarantee that packets are delivered in the correct order (or at all). However, UDP is often much faster than TCP. TCP is called a reliable protocol; UDP is an unreliable protocol. Later we’ll see that unreliable protocols are much more useful than they sound.

The Application Layer

The layer that delivers data to the user is called the application layer. The three lower layers all work together to define how data is transferred from one computer to another. The application layer decides what to do with that data after it’s transferred. For example, an application protocol such as HTTP (for the World Wide Web) makes sure that your web browser knows to display a graphic image as a picture, not a long stream of numbers. The application layer is where most of the network parts of your programs spend their time. There is an entire alphabet soup of application layer protocols; in addition to HTTP for the Web, there are SMTP, POP, and IMAP for email; FTP, FSP, and TFTP for file transfer; NFS for file access; NNTP for news transfer; and many, many more. In addition, your programs can define their own application layer protocols as necessary.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.166.7