In this chapter, we will learn primitives provided in the Java programming language for building distributed applications. We will see primarily two programming styles: sockets and remote method invocations. Sockets provide a lower-level interface for building distributed programs but are more efficient and flexible. Remote method invocations (RMI) are easier to use.
In this chapter we first describe the class InetAddress
, which is useful for network programming no matter which style of primitives are used. Then we discuss primitives for programming using sockets. These sockets may use either the Universal Datagram Protocol (UDP), or the Transmission Control Protocol (TCP). We give an example of an echo server using sockets based on the UDP protocol and a simple name server using sockets based on the TCP protocol. Finally, we discuss programming using remote method invocations.
For any kind of distributed application, we need the notion of an Internet address. Any computer connected to the Internet (called a host) can be uniquely identified by an address called an IP address. Since addresses are difficult to remember, each host also has a hostname. It is the task of a domain name system (DNS) server to provide the mapping from a hostname to its address. Java provides a class Java.net
. Inetaddress
, which can be used for this translation. The relevant methods for the class InetAddress are given below:
public byte[] getAddress()
Returns the raw IP address of this InetAddress object.
public static InetAddress getByName(String)
Determines the IP address of a host, given the host’s name.
public String getHostAddress()
Returns the IP address string "%d.%d.%d.%d"
public String getHostName()
Returns the fully qualified host name for this address.
public static InetAddress getLocalHost()
Returns the local host.
Sockets are useful in writing programs based on communication using messages. A Socket is an object that can be used to send and receive messages. There are primarily two protocols used for sending and receiving messages: Universal Datagram Protocol (UDP) and Transmission Control Protocol (TCP). The UDP provides a low-level connectionless protocol. This means that packets sent using UDP are not guaranteed to be received in the order sent. In fact, the UDP protocol does not even guarantee reliability, that is, packets may get lost. The protocol does not use any handshaking mechanisms (such as acknowledgments) to detect loss of packets. Why is UDP useful, then? Because, even though UDP may lose packets, in practice, this is rarely the case. Since there are no overheads associated with error checking, UDP is an extremely efficient protocol.
The TCP protocol is a reliable connection-oriented protocol. It also guarantees ordered delivery of packets. Needless to say, TCP is not as efficient as UDP.
The first class that we use is DatagramSocket
which is based on the UDP protocol. This class represents a socket for sending and receiving datagram packets. A datagram socket is the sending or receiving point for a connectionless packet delivery service. Each packet sent or received on a datagram socket is individually addressed and routed. Multiple packets sent from a machine to another may be routed differently, and may arrive in any order. This class provides a very low level interface for sending and receiving messages. There are few guarantees associated with datagram sockets. An advantage of datagram sockets is that it allows fast data transmission.
The details for the methods in this class are given below. To construct a DatagramSocket, we can use one of the following constructors:
public DatagramSocket()
public DatagramSocket(int port)
public DatagramSocket(int port, InetAddress laddr)
The first constructor constructs a datagram socket and binds it to any available port on the local host machine. Optionally, a port may be specified as in the second constructor. The last constructor creates a datagram socket, bound to the specified local address. These constructors throw SocketException
if the socket could not be opened, or if the socket could not bind the specified local port.
The other important methods of this class are as follows:
1. public void close():
This method closes a datagram socket.
2. public int getLocalPort():
To get the information about the socket, one can use this method, which returns the port number on the local host to which this socket is bound.
3. public InetAddress getLocalAddress():
This method gets the local address to which the socket is bound.
4. public void receive(DatagramPacket p):
This method receive
receives a datagram packet from this socket. When this method returns, the Data-gramPacket’s buffer is filled with the data received. The datagram packet also contains the sender’s IP address and the port number on the sender’s machine. Note that this method blocks until a datagram is received. The length field of the datagram packet object contains the length of the received message. If the message is longer than the buffer length, the message is truncated. It throws IOException if an I/O error occurs. The blocking can be avoided by setting the timeout.
5. public void send(DatagramPacket p):
This method sends a datagram packet from this socket. The DatagramPacket includes information indicating the data to be sent, its length, the IP address of the remote host, and the port number on the remote host.
The DatagramSocket
class required data to be sent as datagram packets. The class java.net.DatagramPacket
is used for that. Its definition is given below.
public final class java.net.DatagramPacket
extends java.lang.Object {
public DatagramPacket(byte ibuf[], int ilength);
public DatagramPacket(byte ibuf[], int ilength,
InetAddress iaddr, int iport);
public InetAddress getAddress();
public byte[] getData() ;
public int getLength();
public int getPort();
public void setAddress(InetAddress)
public void setData(byte [])
public void setLength(int)
public void setPort(int)
}
The first constructor
public DatagramPacket(byte ibuf[], int ilength)
constructs a DatagramPacket for receiving packets of length ilength.
The parameter ibuf
is the buffer for holding the incoming datagram, and ilength
is the number of bytes to read.
The constructor for creating a packet to be sent is
public DatagramPacket(byte ibuf[], int ilength, InetAddress iaddr, intiport)
It constructs a DatagramPacket for sending packets of length ilength
to the specified port number on the specified host. The parameters iaddr
and iport
are used for the destination address and the destination port number, respectively. The method getAddress
returns the IP address of the machine to which this datagram is being sent, or from which the datagram was received. The method getData
returns the data received, or the data to be sent. The method getLength
returns the length of the data to be sent, or the length of the data received. Similarly, the method getPort
returns the port number on the remote host to which this datagram is being sent, or from which the datagram was received. The set
methods are used to set the IP address, port number, and other elements appropriately.
We give a simple example of a program that uses datagrams. This example consists of two processes—a server and a client. The client reads input from the user and sends it to the server. The server receives the datagram packet and then echoes back the same data. The program for the server is given in Figure 6.1.
Figure 6.1: A datagram server
The client process reads a line of input from System.in.
It then creates a datagram packet and sends it to the server. On receiving a response from the server it displays the message received. The program for the client is given in Figure 6.2.
The second style of interprocess communication is based on the notion of streams. In this style, a connection is set up between the sender and the receiver. This style allows better error recovery and guarantees on the delivery of packets. Thus, in a stream the packets are received in the order they are sent.
The socket
class in Java extends the Object
class. We will give only a subset of constructors and methods available for Socket.
The constructor public Socket (String host, int port)
creates a stream socket and connects it to the specified port number on the named host. It throws UnknownHostException
, and IOException.
Here we have used the name of the host. Alternatively, IP address can be used in the form of the class InetAddress as below:
public Socket(InetAddress address, int port)
The methods for the socket are
• public InetAddress getInetAddress()
, which returns the remote IP address to which this socket is connected.
• public InetAddress getLocalAddress()
, which returns the local address to which the socket is bound.
• public int getPort()
, which returns the remote port to which this socket is connected.
• public InputStream getInputStream()
, which returns an input stream for reading bytes from this socket.
• public OutputStream getOutputStream()
, which returns an output stream for writing bytes to this socket.
• public synchronized void close()
, which closes this socket.
Note that many of these methods throw IOException
if an I/O error occurs when applying the method to the socket.
Figure 6.2: A datagram client
On the server side the class that is used is called ServerSocket.
A way to create a server socket is public ServerSocket(int port)
This call creates a server socket on a specified port. Various methods on a server socket are as follows:
• public InetAddress getInetAddress()
, which returns the address to which this socket is connected, or null if the socket is not yet connected.
• public int getLocalPort()
, which returns the port on which this socket is listening.
• public Socket accept()
, which listens for a connection to be made to this socket and accepts it. The method blocks until a connection is made.
• public void close()
, which closes this socket.
We now give a simple name server implemented using server sockets. The name server maintains a table of (name, hostName, portNumber)
to give a mapping from a process name
to the host and the port number. For simplicity, we assume that the maximum size of the table is 100 and that there are only two operations on the table: insert
and search.
This table is
kept by the object NameTable
shown in Figure 6.3.
Now let us look at the name server. The name server creates a server socket with the specified port. It then listens to any incoming connections by the method accept.
The accept
method returns the socket whenever a connection is made. It then handles the request that arrives on that socket by the method handleclient.
We call getInputStream
and getOutputStream
to get input and output streams associated with the socket. Now we can simply use all methods associated for reading and writing input streams to read and write data from the socket.
In our implementation of the name server shown in Figure 6.4, at most one client is handled at a time. Once a request is handled, the main loop of the name server accepts another connection. For many applications this may be unacceptable if the procedure to handle a request takes a long time. For these applications, it is quite common for the server to be multithreaded. The server accepts a connection and then spawns a thread to handle the request. However, it must be observed that since the data for the server is shared among multiple threads, it is the responsibility of the programmer to ensure that the data is accessed in a safe manner (for example, by using synchronized
methods).
The client program in Figure 6.5 can be used to test this name server.
Figure 6.3: Simple name table
Figure 6.4: Name server
Figure 6.5: A client for name server
We now show a java class Linker that allows us to link a given set of processes with each other. Assume that we want to start n processes P1, P2,. . . , Pn in a distributed system and establish connections between them such that any of the process can send and receive messages with any other process. We would like to support direct naming to send and receive messages; that is, processes are unaware of the host addresses and port numbers. They simply use process identifiers (1 . . . n} to send and receive messages.
We first read the topology of the underlying network. This is done by the method readNeighbors
in the class Topology
shown in Figure 6.6. The list of neighbors of P1 are assumed to be enumerated in the file “topologyi.” If such a file is not found, then it is assumed that all other processes are neighbors.
Figure 6.6: Topology class
Now we discuss the Connecter
class, which establishes connections between processes. Since processes may start at different times and at different locations, we use the NameServer
to help processes locate each other. Any process Pi that starts up first creates a ServerSocket
for itself. It uses the ServerSocket
to listen for incoming requests for communication with all small numbered processes. It then contacts the NameServer
and inserts its entry in that table. All the smaller numbered processes wait for the entry of Pi to appear in the NameServer.
When they get the port number from the NameServer
, they use it to connect it to Pi. Once Pi has established a TCP connection with all smaller number processes, it tries to connect with higher-number processes. This class is shown in Figure 6.7. For simplicity, it is assumed that the underlying topology is completely connected.
Once all the connections are established, the Linker
provides methods to send and receive messages from process Pi to Pj. We will require each message to contain at least four fields: source identifier, destination identifier, message type (or the message tag), and actual message. We implement this in the Java class shown in Figure 6.8.
The Linker class is shown in Figure 6.9. It provides methods to send and receive messages based on process identifiers. Different send
methods have been provided to facilitate sending messages of different types. Every message is assumed to have a field tag
that corresponds to the message tag (or the message type).
A popular way of developing distributed applications is based on the concept of remote procedure calls (RPCs) or remote method invocations (RMIs). Here the main idea is that a process can make calls to methods of a remote object as if it were on the same machine. The process making the call is called a client and the process that serves the request is called the server. In RMI, the client may not even know the location of the remote object. This provides location transparency to the client. In Java, for example, the remote object may be located using rmiregistry.
Alternatively, references to remote objects may be passed around by the application as references to local objects.
A call to a method may have some arguments, and the execution of the method may return some value. The arguments to the method when the object is remote are sent via a message. Similarly, the return value is transmitted to the caller via a message. All this message passing is hidden from the programmer, and therefore RMI can be viewed as a higher-level programming construct than sending or receiving of messages.
Although the idea behind RMI is quite simple, certain issues need to be tackled in implementing and using RMI. Since we are passing arguments to the method, we have to understand the semantics of the parameter passing. Another issue is that of a failure. What happens when the messages get lost? We will look at such issues in this section.
Figure 6.7: Connector class
Figure 6.8: Message class
Figure 6.9: Linker class
An RMI is implemented as follows. With each remote object there is an associated object at the client side and an object at the server side. An invocation to a method to a remote object is managed by using a local surrogate object at the client called the stub object. An invocation of a method results in packing the method name and the arguments in a message and shipping it to the server side. This is called parameter marshaling. This message is received on the server side by the server skeleton object. The skeleton object is responsible for receiving the message, reconstructing the arguments, and then finally calling the method. Note that a RMI class requires compilation by a RMI compiler to generate the stub and the skeleton routines.
An object is called remote object if its methods can be invoked from another Java virtual machine running on the same host or a different host. Such an object is described using a remote
interface. An interface is remote if it extends java.rmi.Remote. The remote interface serves to identify all remote objects. Any object that is a remote object must directly or indirectly implement this interface. Only those methods specified in a remote interface are available remotely. Figure 6.10 gives a remote interface for a name service.
Figure 6.10: Remote interface
Any object that implements a remote interface and extends UnicastRemoteObject
is a remote object. Remote method invocation corresponds to invocation of one of the methods on a remote object. We can now provide a class that implements the NameService
as shown in Figure 6.11.
To install our server, we first compile the file NameServiceImpl.java.
Then, we need to invoke the RMI compiler to generate the stub and skeleton associated with the server. On a UNIX machine, one may use the following commands to carry out these steps:
Figure 6.11: A name service implementation
> javac NameServiceImpl.java
> rmic NameServiceImpl
> rmiregistry &
Now assuming that the rmiregistry
service is running on the machine, we can start our server. There is just one last thing that we need to take care of: security. We need to specify who can connect to the server. This specification is done by a security policy file. For example, consider a file called policy
as follows:
grant {
permission java.net.SocketPermission "*:1024-65535",
"connect,accept";
permission java.net.SocketPermission "*:80", "connect";
};
This policy allows downloaded code, from any code base, to do two things: (1) connect to or accept connections on unprivileged ports (ports greater than 1024) on any host, or (2) connect to port 80 [the port for HTTP(Hypertext Transfer Protocol)].
Now we can start the NameServiceImpl server as follows:
> java-Djava.security.policy=policy NameServiceImpl
If a local object is passed as an argument to a local method on a local object, then in Java we simply pass the reference to the object. However, if the method is to a remote object, then reference to a local object is useless at the other side. Therefore, arguments to remote methods are handled differently.
There are three ways of passing arguments (and returning results) in remote method invocations. The primitive types in Java (e.g., int
and boolean)
are passed by values.
Objects that are not remote are passed by value using object serialization, which refers to the process of converting the object state into a stream of bytes. Any object that implements the interface Serializable
can be communicated over the Internet using serialization. The object is written into a stream of bytes at one end (“serialized”) and at the other end it is reconstructed from the stream of bytes received (“deserialized”). An interesting question is what happens if the object has references to other objects. In this case, those objects also need to be serialized; otherwise references will be meaningless at the other side. Thus, all objects that are reachable from that object get serialized. The same mechanism works when a nonremote object is returned from a remote method invocation. Java supports referential integrity, that is, if multiple references to the same object are passed from one Java Virtual Machine (JVM) to the other, then those references will refer to a single copy of the object in the receiving JVM.
Finally, references to objects that implement remote
interface are passed as remote references. In this case, the stub for the remote object is passed.
One difference between invoking a local method and a remote method is that more things can go wrong when a remote method is invoked. The machine that contains the remote object may be down, the connection to that machine be down, or the message sent may get corrupted or lost. In spite of all these possible problems, Java system guarantees at-most-once semantics for a remote method invocation: any invocation will result in execution of the remote method at most once.
The client program first needs to obtain a reference for the remote object. The java.rmi.Naming
class provides methods to do so. It is a mechanism for obtaining references to remote objects based on Uniform Resource Locator (URL) syntax. The URL for a remote object is specified using the usual host, port, and name:
rmi://host:port/name
where host
is the host name of registry (defaults to current host), port
is the port number of registry (defaults to the registry port number), and name
is the name for the remote object.
The key methods in this class are
bind(String, Remote)
Binds the name to the specified remote object.
list(String)
Returns an array of strings of the URLs in the registry.
lookup(String)
Returns the remote object for the URL.
rebind(String, Remote)
Rebind the name to a new object; replaces any existing binding.
unbind(String)
Unbind the name.
We now show how a client can use lookup
to get a reference of the remote object and then invoke methods on it (see the program in Figure 6.12).
Figure 6.12: A RMI client program
In this chapter, we have focused on classes that allow you to write distributed programs. For cases when a process simply needs data from a remote location, Java provides the Uniform Resource Locator (URL) class. A URL consists of six parts: protocol, hostname, port, path, filename, and document section. An example of a URL is
http://www.ece.utexas.edu:80/classes.html#distributed
The java.net.URL class allows the programmer to read data from a URL by methods such as
public final InputStream openStream()
This method returns a InputStream
from which one can read the data. For different types of data such as images and audio clips there are methods such as public Image getImage(URL u, String filename)
and
public void play(URL u).
We will not concern ourselves with these classes and methods.
6.1. Make the NameServer
class fault-tolerant by keeping two copies of the server process at all times. Assume that the client chooses a server at random. If that server is down (i.e., after the timeout), the client contacts the other server. You may assume that at most one server goes down. When the server comes up again, it would need to synchronize with the other server to ensure consistency.
6.2. Message passing can also be employed for communication and synchronization among threads. Implement a Java monitor library that provides message passing primitives for threads in a single Java Virtual Machine (JVM).
6.3. Develop a Linker
class that provides synchronous messages. A message is synchronous if the sender of the message blocks until the message is received by the receiver.
6.4. Give advantages and disadvantages of using synchronous messages (see Problem 6.3) over asynchronous messages for developing distributed applications.
6.5. Write a Java program to maintain a large linked list on multiple computers connected by a message passing system. Each computer maintains a part of the linked list.
6.6. List all the differences between a local method invocation and a remote method invocation.
6.7. How will you provide semaphores in a distributed environment?
6.8. Solve the producer consumer problem discussed in Chapter 3 using messages.
6.9. Give advantages and disadvantages of using RMI over TCP sockets for developing distributed applications.
Details on the Transmission Control Protocol can be found in the book by Comer [ComOO]. Remote procedure calls were first implemented by Birrell and Nelson [BN84].
3.145.81.98