Chapter 2. Characteristics of RMI

In this chapter

  • Syntax

  • Semantics

  • Local method invocation

  • Remote method invocation

  • Summary

  • Exercises

There are fundamental differences between programming on a single machine and distributed programming. Calling a remote method via remote method invocation (RMI) is not quite the same as calling a method in a local object, even though it uses the same syntax.

This chapter defines the semantics of RMI and makes a detailed comparison between “local method invocation” and remote method invocation. The issues we raise are serious concerns, which should have a considerable impact on how a distributed system is designed and implemented.

Syntax

Syntax is the set of rules governing the arrangement of words. Semantics is the set of rules concerned with their meaning. In computer languages, the semantics of a statement define (a) additional compilation rules not forming part of the syntax and (b) how a statement is executed.

As we saw in the previous chapter, the syntax of a remote method invocation is identical to the syntax of a local method invocation:

try
{
      result = remoteInterface.invoke(arguments);
}
catch (RemoteException ex)
{
      // handle a remote exception ...
}

Semantics

The semantics of local (normal) method invocation are as follows:

  • arguments of the proper number and type must be provided so that exactly one matching method can be found in the class of the object reference[1]

  • a method may or may not be declared to throw an exception

  • the caller must catch any checked exception declared to be thrown by the method, except those which the caller is itself declared to throw

  • arguments of object type are passed by reference; arguments of primitive type are passed by value

  • any result of object type is returned by reference; any result of primitive type is returned by value

  • any exception thrown is returned by reference

  • if the method returns normally, the method has been invoked exactly once

  • if the method throws an uncaught exception to the caller, the method has been invoked exactly once up to the point where the exception was thrown

  • local objects are subject to local garbage-collection via a technique which detects cycles of garbage.[2]

Remote method invocation has the same semantics except as follows:

  • a method can only be invoked as a remote method via a remote interface which declares it

  • a remote method must be declared to throw a remote exception

  • clients of remote methods must catch and deal with remote exceptions

  • arguments of object type to a remote method invocation are passed by deep copy, not by reference

  • any result of object type of a remote method invocation is returned by deep copy, not by reference

  • any exception thrown by a remote method invocation is returned by deep copy, not by reference

  • an exported remote object is passed or returned by remote reference, not by deep copy

  • the semantics of java.lang.Object are specialized for remote objects and remote references to them

  • the RMI system assures that when a remote invocation returns (normally or via an exception), the remote method has been invoked “at most once”

  • remote objects are subject to distributed garbage-collection via a reference-counting technique, prior to local garbage-collection.[3]

Remote objects

Repeating the rules we gave in the Introduction, to be accessible via RMI a remote object must:

  • implement a remote interface

  • be exported to the RMI system.
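
The two rules above can be sketched as follows. This is a minimal single-JVM illustration, not a deployment recipe: the names ExportSketch, Greeter, and GreeterImpl are hypothetical, and pinning java.rmi.server.hostname to 127.0.0.1 is an assumption made only so that the stub connects over loopback on one machine.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical names throughout: ExportSketch, Greeter, GreeterImpl.
class ExportSketch {

    // Rule 1: a remote interface extends java.rmi.Remote, and every
    // method in it is declared to throw RemoteException.
    public interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    // The remote object implements the remote interface.
    public static class GreeterImpl implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Exports a GreeterImpl, invokes it through its stub, then unexports it.
    static String exportAndGreet(String name) {
        // Assumption: advertise the loopback address for a one-machine run.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        GreeterImpl impl = new GreeterImpl();
        try {
            // Rule 2: export the object; port 0 requests an anonymous port.
            Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
            try {
                // The call goes through the stub, so it is a remote invocation.
                return stub.greet(name);
            } finally {
                // Unexport so that the RMI runtime does not keep the JVM alive.
                UnicastRemoteObject.unexportObject(impl, true);
            }
        } catch (RemoteException ex) {
            throw new IllegalStateException("remote call failed", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(exportAndGreet("world"));
    }
}
```

Exporting on port 0 lets the RMI runtime choose the port. A real server would normally also make the stub available to clients, for example by binding it in a registry.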

Arguments and results

A method in a local object can modify objects passed as parameters, or it can modify some other object to which both it and the caller have access (e.g. static data). In either case, the modifications are visible to the caller.

A remote method cannot use this technique for communicating with its caller. A remote object may still modify its parameters or other objects, but any such modifications are not visible to the caller. A remote object can only communicate with its caller via return values or exceptions. This has important consequences for the design of remote methods.

The reason for this restriction is that RMI's argument- and result-passing mechanism is different from Java's normal argument-passing mechanism. This is not a trivial observation.

Java supports two methods of passing arguments and returning results: “by value” and “by reference”.

When passing by value, data is copied. The sender and receiver have different instances of the data. If the called method modifies a parameter, the caller will not see any such change.

When passing by reference, a reference to the original value is passed—in some languages, a pointer. The sender and receiver both refer to the same data. If the called method modifies a parameter, the change in its value will be seen by the caller.

When invoking local methods, Java passes primitive types by value, and object types by reference.

When invoking remote methods, RMI passes primitive types and object types by value—except for exported remote objects, which are passed by reference. “Pass by value” for values of type object is implemented as deep copy. “Pass by reference” for exported remote objects is implemented by remote references—remote stubs.

The reason for this is that, like pointers in other programming languages, references to Java objects are only valid within the memory in which they were created—the “address space” of the virtual machine. They have no meaning in the virtual machine at the other end of a remote method invocation.

A shallow copy is a bitwise copy. The object itself is copied, but all object references in the object are unchanged, and continue to refer to the same objects as the original. A deep copy is a copy in which the deep-copied object itself is copied, and all objects referenced by the deep-copied object are themselves deep-copied. A shallow copy yields the same object graph with a different root; a deep copy yields a new object graph equal to the original.
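
The difference between the two kinds of copy can be sketched in plain Java, without any RMI machinery. The class names CopySketch and Order are hypothetical. The deep copy here is made by a serialization round trip, which is the same general mechanism RMI uses to marshal object arguments; the shallow copy uses Object.clone().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: CopySketch, Order.
class CopySketch {

    // A small object graph: an Order that refers to a mutable list.
    public static class Order implements Serializable, Cloneable {
        private static final long serialVersionUID = 1L;
        final List<String> items = new ArrayList<>();

        // Shallow copy: Object.clone() copies the fields only, so the
        // copy's 'items' field refers to the same list as the original.
        Order shallowCopy() {
            try {
                return (Order) super.clone();
            } catch (CloneNotSupportedException ex) {
                throw new AssertionError(ex); // Order is Cloneable
            }
        }
    }

    // Deep copy: serialize the whole object graph and read it back.
    static <T extends Serializable> T deepCopy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("copy failed", ex);
        }
    }

    public static void main(String[] args) {
        Order original = new Order();
        original.items.add("book");
        Order shallow = original.shallowCopy();
        Order deep = deepCopy(original);

        original.items.add("pen"); // modify the original after copying

        // The shallow copy shares the list; the deep copy has its own.
        System.out.println("shallow sees " + shallow.items.size() + " items");
        System.out.println("deep sees " + deep.items.size() + " items");
    }
}
```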

This is summarized in Table 2.1.

Table 2.1. Argument-passing

Type                     Local method    Remote method
Primitive types          by value        by value
Object                   by reference    by value (deep copy)
Exported remote object   by reference    by remote reference
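
The “Object” row of the table can be observed directly. The following sketch, with hypothetical names PassingSketch, Appender, and AppenderImpl, calls the same method once locally and once through an RMI stub in a single JVM. Setting java.rmi.server.hostname to 127.0.0.1 is an assumption made so that the example runs over loopback on one machine.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: PassingSketch, Appender, AppenderImpl.
class PassingSketch {

    // A remote interface whose single method modifies its argument.
    public interface Appender extends Remote {
        int append(List<String> target) throws RemoteException;
    }

    public static class AppenderImpl implements Appender {
        @Override
        public int append(List<String> target) {
            target.add("added by callee");
            return target.size();
        }
    }

    // Returns the size of the caller's list after a local call (index 0)
    // and after a remote call through the stub (index 1).
    static int[] sizesAfterLocalAndRemoteCall() {
        // Assumption: advertise the loopback address for a one-machine run.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        AppenderImpl impl = new AppenderImpl();
        try {
            Appender stub = (Appender) UnicastRemoteObject.exportObject(impl, 0);
            try {
                List<String> localArg = new ArrayList<>();
                impl.append(localArg);  // local call: the list is passed by reference

                List<String> remoteArg = new ArrayList<>();
                stub.append(remoteArg); // remote call: the callee works on a copy

                return new int[] { localArg.size(), remoteArg.size() };
            } finally {
                UnicastRemoteObject.unexportObject(impl, true);
            }
        } catch (RemoteException ex) {
            throw new IllegalStateException("remote call failed", ex);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterLocalAndRemoteCall();
        System.out.println("after local call:  " + sizes[0]);
        System.out.println("after remote call: " + sizes[1]);
    }
}
```

Only the int return value travels back from the remote call, so a remote method that needs to report changes must return them explicitly.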

“At most once” semantics

Depending largely on the underlying communications, remote procedure call mechanisms like RMI generally implement either “at least once” or “at most once” semantics. An “at least once” system guarantees that the remote procedure has executed at least once, possibly more than once. An “at most once” system guarantees that the remote procedure has executed at most once, possibly never.

RMI implements “at most once” semantics. A remote method which returns normally is guaranteed to have executed exactly once; a remote method which throws a remote exception may or may not have executed at all.

A remote method which returns normally does not need to be retried, and cannot be retried while preserving “at most once” semantics. Whether a remote method which throws an exception may be retried while preserving “at most once” semantics depends on the exception. For application-defined exceptions, this is a matter for the application designer. For remote exceptions, the matter is more complex.

After a remote exception, RMI clients cannot always determine whether the remote method has executed or not. For example, an UnmarshalException may be thrown either while the server is receiving the arguments or while the client is receiving the result of the completed method. An RMI client cannot necessarily distinguish these two situations, and therefore cannot know whether or not to retry the call. In this circumstance, often the only sensible thing to do is to retry the call and have the server deal with duplicate transmissions as best it may.

It is sometimes stated that remote methods which may be retried—deliberately, accidentally, or unavoidably due to the operation of one or more lower-level software or protocol layers—must be idempotent. This term means that the call may be repeated without affecting the overall result of the computation. The property of idempotence is very important in transactional systems.

Consider the example of transmitting a transaction which credits an amount to a bank account. Such a transaction is not idempotent: every time you apply it, the bank balance increases. However, if we can somehow arrange to discard transactions which have already been applied, e.g. by using unique sequence numbers, the transaction would be idempotent: it could be transmitted as often as you like without crediting the account more than once.
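
A minimal sketch of the sequence-number idea follows. The names IdempotentAccountSketch and Account are hypothetical, and keeping every applied sequence number in an unbounded in-memory set is a simplifying assumption; a real system would need to persist and prune this record.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical names throughout: IdempotentAccountSketch, Account.
class IdempotentAccountSketch {

    public static class Account {
        private long balance;
        // Sequence numbers of the credits that have already been applied.
        private final Set<Long> appliedSequenceNumbers = new HashSet<>();

        // Applies the credit unless this sequence number was seen before.
        // Returns true if the balance changed, false for a duplicate.
        public synchronized boolean credit(long sequenceNumber, long amount) {
            if (!appliedSequenceNumbers.add(sequenceNumber)) {
                return false; // duplicate transmission: discard it
            }
            balance += amount;
            return true;
        }

        public synchronized long balance() {
            return balance;
        }
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.credit(1L, 500L);
        account.credit(1L, 500L); // retransmission of the same transaction
        account.credit(2L, 250L);
        System.out.println("balance = " + account.balance());
    }
}
```

The caller, not the server, must assign the sequence number, so that a retransmission carries the same number as the original.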

All this leaves the designer of remote methods and their callers with two options: (1) if you can, design all your remote methods to be idempotent; or (2) don't retry a remote method unless you are assured that it hasn't already been executed. Better still, do both.
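
On the client side, a retry policy for idempotent methods might be sketched as below. RetrySketch, RemoteCall, callWithRetry, and attemptsNeeded are hypothetical names, and the failing call is simulated rather than made over a network. The helper retries only after a RemoteException, on the assumption that the method being called is idempotent.

```java
import java.rmi.RemoteException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names throughout: RetrySketch, RemoteCall.
class RetrySketch {

    // Stands in for any remote method call.
    @FunctionalInterface
    public interface RemoteCall<T> {
        T invoke() throws RemoteException;
    }

    // Invokes 'call' up to maxAttempts times, retrying only after a
    // RemoteException. Safe only if the remote method is idempotent.
    static <T> T callWithRetry(RemoteCall<T> call, int maxAttempts)
            throws RemoteException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RemoteException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.invoke();
            } catch (RemoteException ex) {
                last = ex; // the method may or may not have executed
            }
        }
        throw last; // every attempt failed
    }

    // Demo: a simulated call that fails 'failures' times, then succeeds.
    // Returns the number of attempts made, or -1 if all attempts failed.
    static int attemptsNeeded(int failures, int maxAttempts) {
        AtomicInteger attempts = new AtomicInteger();
        try {
            callWithRetry(() -> {
                if (attempts.incrementAndGet() <= failures) {
                    throw new RemoteException("simulated transport failure");
                }
                return "ok";
            }, maxAttempts);
            return attempts.get();
        } catch (RemoteException ex) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(attemptsNeeded(2, 3));
        System.out.println(attemptsNeeded(5, 3));
    }
}
```

Application-defined exceptions are deliberately not retried here: as noted above, whether to retry after them is a matter for the application designer.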

Semantics of local method invocation

To understand why RMI's semantics are the way they are, let us consider the semantics of normal method invocation in Java—“local method invocation”.

Local method invocation executes a method in an object via JVM instructions. These instructions are generated by a reliable compiler, are themselves reliable, and are ultimately executed by a reliable processor. To quantify this reliability, the total mechanism underlying local method invocation is capable of being executed by currently available computer hardware at a rate of hundreds of millions of times per second, for years on end. The entire invocation mechanism (argument marshalling, dispatch, parameter unmarshalling, return value marshalling, return, and return value unmarshalling) is so close to being 100% reliable that its potential fallibility during a single invocation is not a practical concern. If anything does go wrong, the JVM, or possibly the entire computer, will crash. In either case, both the caller and the called method fail simultaneously: there is no question of the caller continuing after a failure in the called method.

Execution of a local method is synchronous. The same processor executes the caller and the called method. The processor takes care of all synchronization between the caller and the called method by simply switching between the two instruction streams. The caller doesn't have to do anything special to wait for the called method to return; once execution resumes at the caller, the method has returned.

Semantics of remote method invocation

Remote method invocation replaces this mechanism, which is practically 100% reliable, by a software implementation of argument and return value marshalling and unmarshalling. This implementation—RMI—is written in special-purpose Java code, and uses a communications network to transmit this data to and from the remote object. Execution of a remote method is asynchronous, and uses software and network facilities to control synchronization between the caller and the called method.

Partial failure

Communications networks are many orders of magnitude less reliable than the local method invocation (“LMI”) mechanism described above, and intentionally so. Communications networks are required to exhibit extremely high availability; in other words they are required never to crash. They must instead be fault-tolerant; in particular they are entitled by design to discard data packets in order to ease congestion in the network or a memory shortage in any given network node (computer, router, bridge, and so on). Of course upper-level communications protocols such as TCP are designed to overcome such data losses. Nevertheless, use of a network inherently introduces many possible types of failure, not all of which can be dealt with automatically by the network protocol software.

Apart from the trivial case where both the client and the remote object are actually executing inside the same computer, a remote method invocation normally involves two or more computers. This raises the possibility that one or more may fail while one or more does not—a partial failure.

The links between the computers can also fail. Further, they can fail in ways which are indistinguishable from a failure in one of the computers, leading one or both computers to believe falsely that the other has failed.

Memory access

A remote method invocation normally involves two or more separate JVMs. Each JVM has its own distinct memory space and cannot directly access the memory of another. This is the reason why RMI arguments and results are passed by “deep copy” rather than by reference.

The inaccessibility of the remote memory space in effect narrows the communication channel back from the invoked method to the invoker, so that it can only contain a return value or an exception, thrown by either the invoked method or the RMI system itself. There is simply no way for any other state information to be conveyed. This severely restricts the information available to the client: in particular it can make it difficult or impossible for the client to distinguish between failure in transmission to the server, failure in the server (i.e. in the remote method proper), and failure in the transmission back to the client.

In other words, it can be difficult or impossible for the client to tell whether or not the server has completely executed the remote method.

It is up to the application to determine, from the exception thrown or the method result alone, the success or failure of a call. If an exception is thrown, it is not in general possible to determine the point of failure—whether the call has been executed by the server, and if so how far it proceeded. In a few specific RMI cases, it is assured that the call has not been executed at all.

The presence of two or more processors also introduces the possibility that software versions in the two processors may not be identical, in ways which may affect their ability to “understand” each other's communications. For our purposes, this issue principally concerns the versions of Java itself and of the application proper.

Networks

When programming network communications, a number of conditions may arise that don't normally have to be dealt with when programming entirely locally. Data transmission over computer networks is several orders of magnitude slower than data transfer within the memory of a single computer. Due to network failure, data may never arrive at all. TCP/IP connections can fail in ways that cannot be detected by either party to the connection. For these reasons, a “wait with timeout” operation should be performed at the client after dispatching a remote method invocation, rather than an indefinite wait.
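
One way to arrange a “wait with timeout” in application code is to dispatch the call on a worker thread and wait for its result with a bounded Future.get. The sketch below is an illustration under stated assumptions, not the only approach: TimeoutSketch and callWithTimeout are hypothetical names, the slow call is simulated with Thread.sleep, and returning a fallback value on failure is a simplification.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical names throughout: TimeoutSketch, callWithTimeout.
class TimeoutSketch {

    // Runs 'call' on a worker thread and waits at most timeoutMillis.
    // Returns the result, or 'fallback' if the wait timed out, the call
    // threw an exception, or the waiting thread was interrupted.
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis, T fallback) {
        ExecutorService worker = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "remote-call-worker");
            thread.setDaemon(true); // a hung call must not keep the JVM alive
            return thread;
        });
        Future<T> pending = worker.submit(call);
        try {
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException ex) {
            // Interrupt the worker. This abandons the wait; it does not
            // tell us whether the remote method has executed.
            pending.cancel(true);
            return fallback;
        } catch (ExecutionException ex) {
            return fallback; // the call itself threw an exception
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            pending.cancel(true);
            return fallback;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String fast = callWithTimeout(() -> "done", 2000, "timed out");
        String slow = callWithTimeout(() -> {
            Thread.sleep(5000); // simulates a call that never seems to return
            return "done";
        }, 100, "timed out");
        System.out.println(fast);
        System.out.println(slow);
    }
}
```

A timeout tells the client only that no reply arrived in time. As discussed under “at most once” semantics, the remote method may still have executed, so the decision to retry remains with the application.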

Summary

We have seen that although local method calls and remote method calls share the same syntax, they cannot share the same semantics, and that the differences have a great impact on how RMI systems must be designed and built.

Exercises

1:

Is it possible to detect at runtime whether method arguments are passed by value or by reference?



[1] The Java Programming Language, § 6.9.1.

[2] ibid., § 12.

[3] RMI specification, § 3.3.
