25. Networking in Java

Chapter 25. Networking in Java

Everything You Need To Know about TCP/IP but Failed to Learn in Kindergarten
A Client Socket in Java
Sending Email by Java
A Server Socket in Java
HTTP and Web Browsing: Retrieving HTTP Pages
A Multithreaded HTTP Server
Further Reading
Exercises
Some Light Relief—500 Mile Limit on Email

	“If a packet hits a pocket on a socket on a port, and the bus is interrupted and the interrupt's not caught, then the socket packet pocket has an error to report.”
	-- Programmer's traditional nursery rhyme

The biggest barrier to understanding Java networking features is getting familiar with network terms and techniques. If you speak French, it doesn't mean that you can understand an article from a French medical journal.

Similarly, when you learn Java, you also need to have an understanding of the network services and terminology before you can write Internet code. So this chapter starts with the basics of TCP/IP networking in the section Everything You Need To Know about TCP/IP but Failed to Learn in Kindergarten, followed by a description of Java support, starting with the section A Client Socket in Java.

There is a lot of knowledge in this chapter. After the TCP/IP basics, we'll develop some socket examples. We'll see how a client gets services from a remote server using sockets. Then we will look at server sockets to see how incoming connections are accepted. Our first example will merely print HTTP headers. We will add to it little by little until it is a complete HTTP web server.

Everything You Need To Know about TCP/IP but Failed to Learn in Kindergarten

Networking at heart is about shifting bits from point A to point B. We bundle the data bits into a packet, and add some more bits to say where they are to go. That, in a nutshell, is the Internet Protocol or IP. If we want to send more bits than will fit into a single packet, we can divide the bits into groups and send them in several successive packets. The units that we send are called user datagrams or packets. Packet is the more common term these days.

User datagrams can be sent across the Internet using the User Datagram Protocol (UDP), which relies on the Internet Protocol for addressing and routing. UDP is like going to the post office, sticking on a stamp, and dropping off the packet. IP is what the Postal Service does to sort, route and deliver the packet. Two common applications that use the UDP are: SNMP, the Simple Network Management Protocol, and TFTP, the Trivial File Transfer Protocol. See Figure 25-1.

Figure 25-1. IP and UDP (datagram sockets)

When we send several pieces of postal mail to the same address, the packages might arrive in any order. Some of them might even be delayed, or even on occasion lost altogether. This is true for UDP too; you wave goodbye to the bits as they leave your workstation, and you have no idea when they will arrive where you sent them, or even if they did.

Uncertain delivery is equally undesirable for postal mail and for network bit streams. We deal with the problem in the postal mail world (when the importance warrants the cost) by paying an extra fee to register the mail and have the mail carrier collect and bring back a signature acknowledging delivery. A similar protocol is used in the network world to guarantee reliable delivery in the order in which the packets were sent. This protocol is known as Transmission Control Protocol or TCP. Some applications that run on top of (i.e., use) TCP are: FTP (the File Transfer Protocol), SMTP (sending email), POP3 (downloading email from a server), IMAP (manipulation of email on the server), HTTP (requests from a browser and fulfillment by a web server), and Telnet.

What is your IP address?

On Unix workstations including MacOS and Linux, you can run the “ifconfig” (interface configuration) program to find out your IP address.

On Windows 2K and XP, you can run “ipconfig” to get some of the information. Type this in a command tool:

c:> ipconfig/all

On Windows 9x, the command is “winipcfg”. It will pop up a window that lists the host name, IP address, subnet mask, gateway, and even the MAC address of your network card. The MAC (Media Access Control) address is the address burned into ROM on your network interface card. It is not used for routing in TCP/IP because, unlike IP addresses, it does not have a hierarchy of Internet/WAN/LAN/switch/host. To route packets using MAC addresses, each router would need a list of every MAC address in the world. Only the very last hop of a packet (from a switch to a computer) is addressed to the computer's MAC address.

An IPv4 address looks like:	207.142.131.236 - 32 bits in 4 bytes separated by periods
An IPv6 address looks like:	1080:0:0:0:0:800:0:417A - 128 bits in eight groups of 4 hex digits
A MAC address looks like:	E0:0A:42:F3:56:25 - 48 bits, in six pairs of hex digits
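The three address formats can also be inspected from inside a Java program. The following is a minimal sketch, assuming JDK 6 or later; the class name AddressFormats and its helper methods are invented for illustration. Parsing a literal address checks only its format and does not involve a name lookup.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;

final class AddressFormats {

    /** Returns the size in bits of a literal IP address (32 for IPv4, 128 for IPv6). */
    static int addressBits(String literal) throws UnknownHostException {
        // For a literal address, getByName only validates the text; no lookup occurs.
        return InetAddress.getByName(literal).getAddress().length * 8;
    }

    /** Formats a hardware (MAC) address as colon-separated pairs of hex digits. */
    static String formatMac(byte[] mac) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mac.length; i++) {
            if (i > 0) {
                sb.append(':');
            }
            sb.append(String.format("%02X", mac[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws SocketException, UnknownHostException {
        System.out.println(addressBits("207.142.131.236") + " bits");
        System.out.println(addressBits("1080:0:0:0:0:800:0:417A") + " bits");

        // List each network interface on this machine with its MAC and IP addresses.
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            byte[] mac = nic.getHardwareAddress();   // null for loopback and some virtual interfaces
            System.out.println(nic.getName() + " MAC=" + (mac == null ? "none" : formatMac(mac)));
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                System.out.println("    " + addr.getHostAddress());
            }
        }
    }
}
```

The interface listing depends entirely on the machine it runs on, so the output will differ from what ifconfig or ipconfig prints only in layout.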

TCP uses IP as its underlying protocol (just as UDP does) for routing and delivering the bits to the correct address. The “correct address” means the IP address; every computer on the Internet has an IP address. However, TCP is more like a phone call than a registered mail delivery in that it supports an end-to-end connection for the duration of the transmission session. It takes a while to set up this stream connection, and it costs more to assure reliable sequenced delivery, but the cost is usually justified. See Figure 25-2.

Figure 25-2. TCP/IP (stream sockets)

The access device at each endpoint of a phone conversation is a telephone. The access object at each endpoint of a TCP/IP session is a socket. Sockets were developed in Berkeley, Calif., in the early 1980s for Berkeley Unix. Today, every operating system has adopted IP and Berkeley Unix sockets. Sockets are connection endpoints between processes on (usually) different machines connected by a TCP/IP network.

Please do not teach students poor acronyms

There is an architectural model of networking, known as the OSI seven-layer model (OSI stands for Open Systems Interconnection; the model was standardized by ISO). It doesn't exactly match any real network, but it's a good tool for understanding networking. The seven-layer model says that there are seven layers in a network connection. Each layer only talks to the layer immediately above or below it, but its communications are directed to the same layer on the remote computer. Together they form what the marketing droids call an “IP stack” though it's really a FIFO-queue of course. The layers are shown in Table 25-1. Read it from the bottom up.

Table 25-1. The ISO seven-layer model for networks

Layer	Layer name	Description
7	Application	This layer defines protocols for programs to communicate. HTTP is an application layer protocol.
6	Presentation	This layer does any necessary character set conversion (such as Unicode to ASCII) so two systems can talk.
5	Session	This is the layer that sets up, maintains, then tears down the active connection between two users. The connection stays in place even if not continuously sending data. Each endpoint of a session is a socket.
4	Transport	It sets up a logical connection with the remote host, and sends data over it. It manages the flow of data so neither end is overrun.
3	Network	The most complex layer. It maintains a connection between two endpoints. This layer handles IP addresses and packets, and does addressing and routing.
2	Datalink	This layer splits a transmission into frames with a MAC address. It provides device independence and the appearance of error-free transmission to the layers above.
1	Physical	An example of a physical layer is the 100 base T ethernet wiring standard.

People use the acronym All People Seem To Need Data Processing to remember the seven layers, but I think it makes more sense to consider the stack from the bottom up, so Please Do Not Teach Students Poor Acronyms.

The TCP/IP four-layer reality

TCP/IP is actually built from four layers:

4 The application layer (the networking protocols like HTTP, POP3 and IMAP).
3 The transport layer (TCP and UDP).
2 The network layer (packets and IP addresses).
1 The datalink or link layer (frames and MAC addresses).

Protocols built on IP

There are three big protocols built on top of IP. Both ends of the protocol have an OS data structure called a socket. A socket does the same job for an IP connection that a telephone handset does for a phone conversation: it is an object that makes it convenient to send and receive. In the case of a phone handset we send and receive noises. In the case of an IP connection, we send and receive bytes.

IP supports the following protocols, using socket connections:

Slower, reliable delivery using TCP (this is termed a stream socket).
Faster but unguaranteed delivery using UDP (this is a datagram socket).
Fast raw bits using ICMP (Internet Control Message Protocol) datagrams. They are not delivered to the application layer at all. Their purpose is to ask one of the lower layers at the remote end to do something or respond in some way.

ICMP is a low-level protocol for message control and error reporting. It uses IP packets, but its messages are directed at the IP software itself and don't come through to the application layer. Java doesn't support ICMP and we won't say anything more about it.

Client versus server sockets

A client socket is different from a server socket. The client socket is good at asking for something, while the server socket is good at listening for requests. In the phone world, a server socket is the equivalent of a call center that only takes calls and never initiates outgoing calls. A client socket is the equivalent of someone who dials into the call center for support. The phone world is a great analogy for understanding many things about networked communication (because the phone world is networked communication, but of a type where everyone is familiar with the end point features).

Note that the number of socket writes is not at all synchronized with the number or timing of socket reads. A packet may be broken into smaller packets as it is sent across the network, so your code should never assume that a read will get the same number of bytes that were just written into the socket.
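Because one read may return fewer bytes than were written, code that expects a fixed number of bytes has to loop. The following is a minimal sketch of that loop; the class ReadExactly and the trickle() helper (a stand-in for a socket stream that hands out at most three bytes per read) are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class ReadExactly {

    /** Keeps reading until exactly count bytes have arrived; fails if the stream ends first. */
    static byte[] readExactly(InputStream in, int count) throws IOException {
        byte[] buffer = new byte[count];
        int filled = 0;
        while (filled < count) {
            // A single read may deliver fewer bytes than requested.
            int n = in.read(buffer, filled, count - filled);
            if (n < 0) {
                throw new EOFException("stream ended after " + filled + " of " + count + " bytes");
            }
            filled += n;
        }
        return buffer;
    }

    /** Hypothetical stand-in for a socket stream: returns at most 3 bytes per read. */
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] message = "ten bytes!".getBytes("US-ASCII");
        byte[] received = readExactly(trickle(message), message.length);
        System.out.println(new String(received, "US-ASCII"));
    }
}
```

The standard library offers the same behavior through DataInputStream.readFully(), which is usually the simpler choice in real code.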

IPv4 versus IPv6

The most widely used version of IP today is Internet Protocol Version 4 (IPv4). However, IP Version 6 (IPv6 or IPng) is also beginning to enter the market. IPv6 uses 128-bit addresses, not 32-bit, and so allows many more Internet users. IPv6 packets are not directly interoperable with IPv4, but the two were designed to coexist (a host can run both, and IPv6 packets can be tunneled across IPv4 networks), and it will take a long time before IPv4 is displaced by v6. IPv4 is supported with hardware-based routing at wire speed on 2.5Gb links. IPv6 currently uses software routing.

An IPv4 feature called “Network Address Translation” (NAT) has greatly reduced the pressure to move to v6. A few years ago, it looked like we were going to run out of IP addresses. Today NAT lets your big site have just one assigned address, which you use for the computer with the internet connection. You use any IP address you like for the computers on your side of the firewall. You may be duplicating numbers that someone else uses behind their firewall, but the two systems don't interfere with each other. When you access the internet, NAT updates your packets dynamically. It rewrites your internal IP address in packets changing it into the externally visible one, and rewrites IP addresses in incoming packets so they'll go to your host. A NAT server keeps track of who's doing what, so it knows who should get which packets from the outside. From the outside, it looks like all your traffic is coming from your server computer that runs the NAT service.

Common network hardware

Here are some common pieces of hardware in the network world. When we talk about “layer 3” etc. in the following definitions, it is with reference to the OSI 7-layer model.

Router

A router is a computer with at least two interfaces, connecting it to two different networks. It looks at IP addresses and moves packets from one of these networks to the other (that is the definition of routing) when necessary. Since it looks at IP addresses, by definition it is a layer 3 device. Routers look at the IP headers and consult their forwarding tables to determine the best path for forwarding the packets. They use protocols such as ICMP to communicate with each other and work out the best next router to hand a packet to on its way from source to destination. As the traceroute example later in this section shows, a packet might be forwarded between 20 routers, with each hop bringing it closer to its destination.

Companies also sell products called “layer 3 switches”, or routing switches. They are actually routers rather than switches; the name means that the router also does some layer 2 switching functions.

Switch

A switch is like a telephone exchange. In fact, switch is the modern name for both a telephone exchange and the computer inside it. Instead of phone numbers, a network switch uses MAC addresses. Years ago, we connected a computer to the ethernet by physically attaching the computer to a piece of ether cable that also ran to 20 other computers. Today, each computer has a dedicated line to a switch, and runs at the full speed, e.g. Gb ether, on every line. The switch filters and forwards packets either to the final destination, or to a router to move it nearer the final destination. When a frame comes in, the switch looks at its MAC address, and sends that frame to the one port that is connected to that MAC address. A switch is the modern incarnation of a bridge.

Looking at a packet traveling over the Net

Packets are moved along by routers, which are special-purpose computers that connect networks. Every IP packet that leaves your computer goes to a nearby router which will move the packet to another router closer to the destination. This transfer continues until finally the packet is brought to a router that is directly connected to the subnet serving the destination computer.

Routers maintain large configuration tables of what addresses are served by what routers, what the priorities are, and what rules they should use for security and load balancing. These tables can be updated dynamically as the network runs.

Windows has a program that lets you trace a packet's movement between routers. Here's the output from a sample run, tracing the route between my PC and java.sun.com. Unix has a similar program, called “traceroute.”

c:> tracert java.sun.com
Tracing route to java.sun.com [192.18.97.71] over a maximum of 30 hops:
...
 7    15 ms    16 ms    15 ms  28.ge13-0.mpr2.pao1.us.above.net [64.125.12.61]
 8    15 ms    17 ms    16 ms  so-6-1-0.mpr4.sjc2.us.above.net [64.125.29.126]
 9    16 ms    17 ms    16 ms  so-0-0-0.cr2.sjc3.us.above.net [64.125.29.138]
10    16 ms    16 ms    16 ms  pos1-0.er2a.sjc3.us.above.net [64.125.28.198]
11    17 ms    17 ms    16 ms  64.124.81.56.sun.com [64.124.81.56]
Trace complete.

Traceroute can be problematic if you are behind a firewall. You may have to use the “-I” option, as in “traceroute -I java.sun.com”. Traceroute is good for troubleshooting network connectivity. Here it tells us that, overall, packets travel from me to Java-World HQ (ten miles) in under a fiftieth of a second.

Hub

A hub is a frame repeater. It doesn't read addresses. It just copies each incoming frame to every connection it has. Computers discard packets that are not addressed to them, so you can use a hub to share a network connection between several computers. The drawback is that everyone is delayed by everyone else's traffic. A hub is a layer 1 device, and is also known as a multi-port repeater.

These days, switches have become so cheap that the default is to buy a switch instead of a hub; switches lead to less congestion on a LAN since each host only sees its own traffic.

DNS server

There will be a local server known as the Domain Name Server (usually one per subnet, per campus, or per company) that resolves a symbolic name into an IP address. That allows people to deal with names. Programs will make calls to the DNS server to convert IP addresses to names and vice versa. The premier DNS software is the Berkeley Internet Name Domain (BIND).
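In Java, the calls to the DNS server are hidden behind the InetAddress class. The following is a minimal sketch of a lookup in both directions; the class name Lookup and its method names are invented for illustration, and the default host "localhost" is just a placeholder.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class Lookup {

    /** Returns the textual IP address for a host name (or for a literal address). */
    static String toAddress(String host) throws UnknownHostException {
        return InetAddress.getByName(host).getHostAddress();
    }

    /** Returns the name registered for an address, or the address text if none is found. */
    static String toName(String address) throws UnknownHostException {
        return InetAddress.getByName(address).getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        String address = toAddress(host);
        System.out.println(host + " -> " + address);
        System.out.println(address + " -> " + toName(address));
    }
}
```

What the reverse lookup prints depends on how the local resolver is configured, so no particular output should be expected.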

Firewall

A firewall can be hardware or software or a combination. Its purpose is to discard unauthorized packets, and stop them coming from the Internet into your private network. There are several different techniques for this. Packet filtering looks at each individual packet and accepts or rejects it based on configurable rules such as “deny all ftp requests except to host foo”.

Proxy server

A proxy server sits between a client application, such as a Web browser, and a real server. It intercepts all requests to the real server, and may be able to fulfill the requests itself, e.g. it may be able to retrieve a web page from a local cache. If not, it makes the request to the real server. A proxy server can be used to deny clients access to a set of websites or network services. Finally, and most importantly, a proxy server hides the details of your internal network from the outside world. Proxy servers are often part of a firewall server.

If you use a proxy server you will need to tell Java the details in order to access hosts outside the firewall. You do this by defining properties, perhaps when starting the code:

java -Dhttp.proxyHost=SOMEHOST -Dhttp.proxyPort=SOMENUM code.java

Without this, you'll get an UnknownHostException. At work, your systems administrator will know the values. At home, you won't be using a proxy server unless you set it up yourself.
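The proxy details can also be supplied from inside the program. The sketch below shows two ways, assuming JDK 5 or later: setting the standard http.proxyHost and http.proxyPort system properties, and building a java.net.Proxy object. The host proxy.example.com, port 8080, and the class name ProxySetup are placeholders, not values from this chapter.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

final class ProxySetup {

    /** Builds a description of an HTTP proxy without contacting it. */
    static Proxy httpProxy(String host, int port) {
        // createUnresolved avoids a name lookup when the object is built.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        // Option 1: set the properties that -D would set, before any connection is opened.
        System.setProperty("http.proxyHost", "proxy.example.com");   // placeholder host
        System.setProperty("http.proxyPort", "8080");                // placeholder port

        // Option 2: a Proxy object, which can be passed to URL.openConnection(Proxy)
        // so that only that one connection goes through the proxy.
        Proxy proxy = httpProxy("proxy.example.com", 8080);
        System.out.println(proxy.type() + " " + proxy.address());
    }
}
```

The property approach affects every HTTP connection in the JVM, while the Proxy object applies per connection; which one suits you depends on whether the whole program sits behind the firewall.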

There! Now you know everything you need to use the Java networking features.

What's in the networking library?

If you browse the network library API, you'll find the following classes. There are a few other classes, but these are the key ones.

• `Socket`	This is the client Socket class. It lets you open a connection to another machine, anywhere on the Internet (anywhere that the remote end permits, that is).
• `ServerSocket`	This is the server Socket class. ServerSocket lets an application accept TCP connections from other systems and exchange I/O with them.
• `URL`	The class represents a Uniform Resource Locator—a reference to an object on the web. You can create a URL reference with this class.
• `URLConnection`	You can open a URL and retrieve the contents, or write to it, using this class.
• `HttpURLConnection`	The class extends URLConnection and supports functions specific to HTTP, like GET, POST, PUT, HEAD, TRACE, and OPTIONS.
• `URLEncoder/URLDecoder`	These two classes have static methods to allow you to convert a String to and from MIME x-www-form-urlencoded form. This is convenient for posting data to servlets or CGI scripts.
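As a quick taste of two of these classes, the sketch below takes a URL apart and converts a string to and from x-www-form-urlencoded form. No network connection is made. The class name UrlBasics, the example.com URL, and the sample text are invented for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class UrlBasics {

    /** Parses the text of a URL; nothing is fetched. */
    static URL parse(String text) throws MalformedURLException {
        return URI.create(text).toURL();
    }

    /** Converts text to x-www-form-urlencoded form using UTF-8. */
    static String encode(String text) throws UnsupportedEncodingException {
        return URLEncoder.encode(text, "UTF-8");
    }

    /** Reverses encode(). */
    static String decode(String text) throws UnsupportedEncodingException {
        return URLDecoder.decode(text, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        URL url = parse("http://www.example.com:8080/docs/index.html?q=java");
        System.out.println("protocol " + url.getProtocol());
        System.out.println("host     " + url.getHost());
        System.out.println("port     " + url.getPort());
        System.out.println("path     " + url.getPath());
        System.out.println("query    " + url.getQuery());

        String encoded = encode("Duke & friends");
        System.out.println(encoded);            // prints Duke+%26+friends
        System.out.println(decode(encoded));    // prints Duke & friends
    }
}
```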

The class DatagramSocket supports the use of UDP packets. We don't deal with UDP here because it is less widely used than TCP. Most people want the reliability feature that TCP offers. Ironically, the widespread use of subnets using directly connected switches (instead of shared ethernet segments) has made UDP much more reliable, to the point where people are using it on LANs instead of TCP, and getting performance and reliability.

Let me try that last sentence again. When we started extensive networking in the late 1970s, ethernet was the medium of choice. You strung a single ethernet cable down a corridor and workstations physically attached to the net by tapping into the cable. That meant that all the network traffic was visible to all the workstations that used that cable. It was electronically noisy and slow. Today, nearly everyone uses 10baseT or 100baseT wiring. The number is the speed in Megabits, and the “T” part means “Twisted pair.” There is a twisted pair wire from your workstation directly to the switch that controls your subnet. No other workstation shares your twisted pair wiring. Result: faster performance, less electronic noise, and more reliable subnets, leading to greater confidence using UDP.
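Although the rest of the chapter sticks to TCP, a single datagram exchange is short enough to show. The following is a minimal sketch, assuming JDK 7 or later, that sends one UDP packet to a receiver on the same machine; the class name DatagramDemo and the 512-byte buffer size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

final class DatagramDemo {

    /** Sends one datagram to a receiver on this machine and returns the text that arrived. */
    static String sendToSelf(String text) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);   // port 0: any free port
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so never wait forever for a packet.
            receiver.setSoTimeout(2000);

            byte[] out = text.getBytes("UTF-8");
            sender.send(new DatagramPacket(out, out.length, loopback, receiver.getLocalPort()));

            byte[] in = new byte[512];
            DatagramPacket packet = new DatagramPacket(in, in.length);
            receiver.receive(packet);   // throws SocketTimeoutException if nothing arrives in time
            return new String(packet.getData(), 0, packet.getLength(), "UTF-8");
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendToSelf("hello, datagram"));
    }
}
```

Note that there is no connection setup and no acknowledgment; the sender simply addresses each packet and lets it go.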

TCP/IP client/server model

Before we look at actual Java code, a diagram is in order showing how a client and server typically communicate over a TCP/IP network connection. Figure 25-3 shows the way processes contact each other by using IP address and a port number. The IP address identifies a unique computer on the Internet. The port number is a simple data structure that the OS maintains to direct an incoming network connection to a specific process.

Figure 25-3. Client and server communication using a TCP/IP connection

An IP address is like a telephone number, and a port number is like an extension at that number. Together they specify a computer and a service request. The combination of an IP address plus a port number is the definition of a socket. To talk to each other, the client and server must open a dialog using the same port number.

For simplicity, Java makes network socket connections look like I/O streams. You simply read and write data using the usual stream methods (all socket communication is in 8-bit bytes), and it automagically appears at the other end. Unlike a stream, a socket supports two-way communication. There is a method to get the input stream of a socket, and another method to get the output stream. This allows the client and server to talk back and forth.
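To make the two-way stream idea concrete, here is a minimal sketch, assuming JDK 8 or later, in which a client and a one-shot echo server talk over a socket on the same machine. The class name EchoOnce is invented for illustration, and the server side is deliberately simplistic (one connection, one line).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class EchoOnce {

    /** Starts a one-shot echo server on a free local port, sends it a line, returns the reply. */
    static String roundTrip(String line) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(peer.getInputStream(), "UTF-8"));
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(peer.getOutputStream(), "UTF-8"), true)) {
                    out.println(in.readLine());   // send back whatever line arrived
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            echo.start();

            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), "UTF-8"), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), "UTF-8"))) {
                out.println(line);                // write to the socket's output stream
                String reply = in.readLine();     // read the answer from its input stream
                echo.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello, socket"));
    }
}
```

Each end wraps the socket's raw byte streams in ordinary reader and writer classes, which is exactly what you would do with a file.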

Almost all Internet programs work as client/server pairs. The server is on a host system somewhere in cyberspace, and the client is a program running on your local system. When the client wants an Internet service (such as retrieving a web page from an HTTP server), it issues a request, usually to a symbolic address such as www.sun.com rather than to an IP address (though that works just fine, too).

The bits forming the request are assembled into a packet and routed to the server. The server reads the incoming packet, notes what the request is, where it came from, and then tries to respond to it. It does so by providing either the service (web page, shell account, file contents, etc.) or a sensible error message. The response is sent back across the Internet to the client.

All the standard Internet utilities (telnet, rdist, FTP, ping, rcp, NFS, and so on) operate in client/server mode connected by a TCP or UDP socket. Programs that send mail don't really know how to send mail—they just know how to take it to the Post Office port. In this case, mail has a socket connection and talks to a demon at the other end with a fairly simple protocol. The standard mail demon knows how to accept text and addresses from clients and transmit it for delivery. If you can talk to the mail demon, you can send mail. There is little else to it.

Many of the Internet services are actually quite simple. But often considerable frustration comes in doing the socket programming in C and in learning the correct protocol. The socket programming API presented to C is quite low level and all too easy to screw up. Errors are poorly handled and diagnosed. As a result, many programmers naturally conclude that sockets are brittle and hard to use. Sockets aren't hard to use. The C socket API is hard to use.

Port = service; socket = IP address + port

An IP address says which computer you are trying to reach. You still need to tell that computer what you want from it. That is done with a port number.

A port number is an integer under 65,536 (16 bits). A large number of Internet services are at predefined port numbers. If you want to ask for this kind of service from a specific computer, you would send a request to this port number at its IP address:

port number	service
20	ftp data
21	ftp control
22	secure shell remote login protocol
23	telnet
25	Simple Mail Transfer Protocol
37	time service
80	http connection
110	Post Office Protocol 3
135	The dcom rpc server port that Microsoft left open on Windows, allowing the Blaster worm to penetrate millions of PCs from August 2003.
194	Internet Relay Chat
445	The “Local Security Authority Subsystem Service” that Microsoft left open on Windows, allowing the Sasser worm to penetrate hundreds of thousands of PCs in May 2004.
458	Apple QuickTime
1080	Alternative port for http
5190	Alternative port for SMTP

A firewall works in part by looking at the port number in incoming packets and throwing away ones that ask for a service you don't want to offer. On Unix, only the superuser can listen on port numbers under 1024, so http and some other services are sometimes bumped up, e.g. to port 1080. A socket is defined as an IP address plus a port on that computer.
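Java has a class for exactly this pairing of an address and a port, java.net.InetSocketAddress. The following is a minimal sketch, assuming JDK 7 or later; the class name Endpoint and the isValidPort() helper are invented for illustration.

```java
import java.net.InetSocketAddress;

final class Endpoint {

    /** Returns true if the number fits in the 16-bit port field. */
    static boolean isValidPort(int port) {
        try {
            new InetSocketAddress(port);   // rejects anything outside 0..65535
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // createUnresolved pairs a host name with a port without doing a name lookup.
        InetSocketAddress web = InetSocketAddress.createUnresolved("www.sun.com", 80);
        System.out.println(web.getHostString() + " port " + web.getPort());

        System.out.println(isValidPort(65535));   // prints true
        System.out.println(isValidPort(65536));   // prints false
    }
}
```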

Don't believe me? Take a look. The C code to establish a socket connection is:

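  /* Horrid C / C++ Sockets */
  #include <strings.h>                       /* bzero */
  #include <unistd.h>                        /* gethostname, close */
  #include <netdb.h>                         /* gethostbyname */
  #include <sys/types.h>
  #include <sys/socket.h>                    /* socket, bind, listen */
  #include <netinet/in.h>                    /* struct sockaddr_in, htons */

  #define MAXHOSTNAME 255                    /* assumed value */

  int set_up_socket(u_short port) {
    char   myname[MAXHOSTNAME+1];
    int    s;
    struct sockaddr_in sa;
    struct hostent *he;

    bzero(&sa,sizeof(struct sockaddr_in));   /* clear the address */
    gethostname(myname,MAXHOSTNAME);         /* establish identity */
    he= gethostbyname(myname);               /* get our address  */
    if (he == NULL)                          /* if addr not found... */
        return(-1);
    sa.sin_family= he->h_addrtype;           /* host address */
    sa.sin_port= htons(port);                /* port number */

    if ((s= socket(AF_INET,SOCK_STREAM,0)) <0)   /* finally, create socket */
        return(-1);
    /* bind address to socket; bind() takes three arguments */
    if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(s);
        return(-1);
    }

    listen(s, 3);                            /* max queued connections */
    return(s);
  }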

By way of contrast, the equivalent Java code is:

ServerSocket servsock = new ServerSocket(port, 3);

That's it! Just one line of Java code to do all the things the C code does.
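If you ever need the individual steps, Java lets you spell them out. The sketch below creates the listening socket in stages that roughly mirror the C calls; the class name ListenSteps and the listenOn() helper are invented for illustration, and setting the reuse-address option is just one example of something you might do between the steps.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class ListenSteps {

    /** Creates a listening socket in separate steps, roughly like socket(), bind(), listen(). */
    static ServerSocket listenOn(int port, int backlog) throws IOException {
        ServerSocket server = new ServerSocket();                // unbound, like socket()
        try {
            server.setReuseAddress(true);                        // must be set before binding
            server.bind(new InetSocketAddress(port), backlog);   // bind() and listen() together
            return server;
        } catch (IOException e) {
            server.close();                                      // don't leak the socket on failure
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the operating system to pick any free port.
        try (ServerSocket server = listenOn(0, 3)) {
            System.out.println("listening on port " + server.getLocalPort());
        }
    }
}
```

The one-line constructor remains the better choice unless you need to set an option before the socket is bound.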

Java handles all that socket complexity “under the covers” for you. It doesn't expose the full range of socket possibilities, so Java avoids the novice socketeer choosing contradictory options. On the other hand, a few obscure sockety things cannot be done in Java. You cannot create a raw socket in Java, and hence cannot write a ping program that relies on raw sockets (you can do something just as good, though). The benefit is overwhelming: You can open sockets and start writing to another system just as easily as you open a file and start writing to hard disk.

A “ping program,” in case you're wondering, is a program that sends ICMP control packets over to another machine anywhere on the Internet. This action is called “pinging” the remote system, rather like the sonar in a ship “pings” for submarines or schools of fish. The control packets aren't passed up to the application layer, but tell the TCP/IP library at the remote end to send back a reply. The reply lets the pinger calculate how quickly data can pass between the two systems.
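One candidate for the "something just as good" is InetAddress.isReachable(), available from JDK 5. According to its documentation, a typical implementation uses ICMP echo requests if it can obtain the privilege, and otherwise tries a TCP connection to port 7 of the destination. The following is a minimal sketch; the class name Reach and the probeMillis() helper are invented for illustration, and the measured time is only a rough figure.

```java
import java.io.IOException;
import java.net.InetAddress;

final class Reach {

    /** Returns the elapsed milliseconds if the host answered within the timeout, or -1 if not. */
    static long probeMillis(InetAddress host, int timeoutMillis) throws IOException {
        long start = System.nanoTime();
        boolean reachable = host.isReachable(timeoutMillis);
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        return reachable ? elapsed : -1;
    }

    public static void main(String[] args) throws IOException {
        InetAddress host = args.length > 0
                ? InetAddress.getByName(args[0])
                : InetAddress.getLoopbackAddress();
        long millis = probeMillis(host, 2000);
        System.out.println(millis < 0 ? host + " did not answer" : host + " answered in " + millis + " ms");
    }
}
```

A firewall that drops both ICMP and port 7 traffic will make a perfectly healthy host look unreachable, so treat a negative result with caution.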

The story about ping

If you want to know how quickly your packets can reach a system, use ping.

c:> ping java.sun.com
Pinging java.sun.com [64.124.81.57] with 32 bytes of data:
Reply from 64.124.81.57: bytes=32 time=17ms TTL=245
Reply from 64.124.81.57: bytes=32 time=17ms TTL=245
Ping statistics for 64.124.81.57:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:
Minimum = 17ms, Maximum = 22ms, Average = 18ms

This confirms that the round-trip time for a packet to hustle over from Mountain View to Santa Clara and back is about 0.018 seconds on this particular day and time. “TTL” is “Time To Live.” To prevent infinite loops, each router hop decrements this field in a packet, and if it reaches zero, the packet just expires where it is.

I can't resist mentioning that a book review at Amazon.com for The Story About Ping is refreshing—especially the review by John E. Fracisco. Check it out.

The most-used methods in the API for the client end of a socket are:

public class Socket extends Object {
    public Socket();
    public Socket(String,int) throws UnknownHostException,
                                     java.io.IOException;
    public Socket(InetAddress,int) throws java.io.IOException;

    public java.nio.channels.SocketChannel getChannel();
    public InputStream getInputStream() throws IOException;
    public OutputStream getOutputStream() throws IOException;

    public synchronized void setSoTimeout(int) throws SocketException;
    public synchronized void close() throws IOException;

    public boolean isConnected();
    public boolean isBound();
    public boolean isClosed();
    public boolean isInputShutdown();
    public boolean isOutputShutdown();

    public void shutdownOutput() throws IOException;
    public void shutdownInput() throws IOException;
    public static synchronized void setSocketImplFactory(
                           SocketImplFactory fac) throws IOException;
}

The constructor with no arguments creates an unconnected socket. You can optionally bind() it to a local address and port, and then connect() it to the remote host and port you specify. It's easier just to do all this by specifying the remote host and port in the constructor, if you know them at that point.
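One reason to use the no-argument constructor anyway is that connect() accepts a timeout, which the constructors shown above do not. The following is a minimal sketch; the class name ConnectLater and the open() helper are invented for illustration, and the demonstration connects to a throwaway server on the same machine.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class ConnectLater {

    /** Connects an initially unconnected socket, giving up after timeoutMillis. */
    static Socket open(String host, int port, int timeoutMillis) throws IOException {
        Socket socket = new Socket();             // not yet connected to anything
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close();                       // don't leak the socket on failure
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // A throwaway local server gives the client something to connect to.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"));
             Socket client = open("127.0.0.1", server.getLocalPort(), 1000)) {
            System.out.println("connected: " + client.isConnected());
        }
    }
}
```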

The setSoTimeout(int ms) method sets a timeout on the socket of ms milliseconds. When this is a non-zero amount, a read call on the input stream will block for only this amount of time. Then it will break out of the read by throwing a java.net.SocketTimeoutException, leaving the socket still valid for further use.
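The timeout behavior is easy to observe with a peer that never sends anything. The following is a minimal sketch, assuming JDK 7 or later; the class name ReadTimeout and the timesOut() helper are invented for illustration, and both ends of the connection live in the same program.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeout {

    /** Returns true if a read on a silent connection times out and the socket stays open. */
    static boolean timesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {       // the peer never writes anything
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read();     // blocks until data arrives or time runs out
                return false;                       // data or end-of-stream arrived unexpectedly
            } catch (SocketTimeoutException expected) {
                return !client.isClosed();          // the socket is still usable after the timeout
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out and still open: " + timesOut(200));
    }
}
```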

The setSocketImplFactory() method is a hook for those sites that want to provide their own implementation of sockets, usually to deal with firewall or proxy issues. If this is done, it will be done on a site-wide basis, and individual programmers won't have to worry about it.

The socket API has one or two dozen other get/set methods for TCP socket options. Most of the time you don't need these and can ignore them.
