Distributed Data vs. Distributed Processing

The examples we have shown so far are mostly of distributed processing rather than distributed data. The data in these examples (e.g., information on flights and seats available) sits in one place and is accessed by many users simultaneously. You can well imagine that maintaining a single repository of data with a huge number of concurrent users, but also strict limits on access, is very difficult to do. IBM, which developed the first airline reservation system, Sabre, for American Airlines, would agree, and expects to be well compensated for its hardware, software, and organizational expertise. Of course, the Sabre system, which is as complex as anything around, isn't just one machine; there are local and remote backup sites with machines in "failover" mode (if the primary machine fails, they automatically take over).

One very expensive aspect of centralized data systems is that they have to have very complex, very high-capacity communications links. Actually, they have to have at least two of them for redundancy, since construction crews and others who like to dig have a tendency to cut even deeply buried cables. Couldn't you simplify this situation by distributing the data as well as the processing? The answer is yes, depending on what kind of data you have. If, like the airlines, you have a database with multiple seekers of a unique item, you have to have a single data source or else you could sell the same thing twice (aargh! the airline example fails again!).

A Traveling Salesman Joke

There was this traveling salesman, and he had only one laptop. One day... Sorry, no prurience here. The point of this sidebar is cautionary. Having owned a lot of laptops, I can tell you that I would want at least one backup all the time if my job depended on it. To say they are fragile is an understatement. Of the seven or so I've had, every one has had some kind of major mechanical problem.


A feasible sort of database for distributed data would be one in which you have multiple copies of an item being sought, and while you don't expect to run out of it, you want to keep a pretty tight grip on demand and supply so you can reorder when needed. In the case of a tool salesman traveling with his laptop to hardware stores, you would have a local database on the hard drive showing inventory. As you made a sale, you would link to a network and log the sale to the regional database. It would then download any changes to inventory that had occurred since you last connected (and perhaps also some sales advisory information, including new pricing). The regional database would give you the national picture, but because it would connect to the national system only every hour or so, it would not necessarily be valid as of the moment you got it. Instead of real-time validity, the regional database in this system would be updated upward (information about new sales) and downward (revised inventory) at regular intervals.

The process of moving data records to synchronize disparate databases is called replication. Replication will work well in our hardware sales example: only rarely will the salesman mislead a customer about supply because his local copy has gone stale. In return for tolerating this occasional error, the tool company can have a much simpler computer and communications system. The central computer doesn't need as much transaction processing capability, and, most important, the communication links don't require permanent, dedicated bandwidth. Instead, the company can use switched virtual circuits for a huge cost savings.
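To make this concrete, here is a minimal sketch of interval-based replication in Python, with SQLite standing in for both the laptop's database and the regional one. The table names, the record_sale and replicate functions, and the drill-bit item are all invented for illustration; a real replication product would work from change logs and conflict-resolution rules rather than this simple exchange.

```python
# Interval-based replication sketch: sales flow upward, inventory
# flows downward, only when the laptop connects.
import sqlite3

def record_sale(local, item, qty):
    # Between connections, sales simply accumulate on the laptop.
    local.execute("INSERT INTO pending_sales (item, qty) VALUES (?, ?)",
                  (item, qty))

def replicate(local, regional):
    """One periodic sync: push sales upward, pull inventory downward."""
    # Upward: report every sale logged since the last connection.
    for item, qty in local.execute(
            "SELECT item, qty FROM pending_sales").fetchall():
        regional.execute(
            "UPDATE inventory SET on_hand = on_hand - ? WHERE item = ?",
            (qty, item))
    local.execute("DELETE FROM pending_sales")
    # Downward: take the regional picture as the new local snapshot.
    local.execute("DELETE FROM inventory")
    local.executemany(
        "INSERT INTO inventory VALUES (?, ?)",
        regional.execute("SELECT item, on_hand FROM inventory").fetchall())

if __name__ == "__main__":
    local = sqlite3.connect(":memory:")
    regional = sqlite3.connect(":memory:")
    local.execute("CREATE TABLE pending_sales (item TEXT, qty INTEGER)")
    local.execute("CREATE TABLE inventory (item TEXT, on_hand INTEGER)")
    regional.execute("CREATE TABLE inventory (item TEXT, on_hand INTEGER)")
    regional.execute("INSERT INTO inventory VALUES ('3/8-in drill', 40)")
    replicate(local, regional)             # initial download
    record_sale(local, "3/8-in drill", 5)  # sale made between connections
    replicate(local, regional)             # the hourly sync
    print(regional.execute("SELECT * FROM inventory").fetchall())
```

Notice that between calls to replicate, the laptop's inventory snapshot is frozen; that window of staleness is exactly the occasional error the tool company accepts in exchange for cheaper communications.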

Distributed Data Structures

We've discussed some of the general issues in distributed data; now let's consider the kinds of systems that are currently in use. Mostly, these consist of new layers of software that bind the front and back ends of the system together, allowing it to function seamlessly and with greater speed.

Middleware

Middleware, a loosely used term, refers to software that sits between the client's presentation software and the server's (or host's) data system. It may or may not reside on a separate machine. In some versions, middleware simply performs translations. These could be from one data format to another, from one protocol to another, or any number of other conversions or combinations of conversions. In other approaches, middleware serves a more active role. For example, it may consolidate database requests in a way that allows the server to launch the minimum number of threads. This function, a higher-level version of what cluster controllers do, can sharply increase throughput in high-demand transaction processing database systems. In these environments, middleware can also have a data integrity function; for example, some middleware products are responsible for "two-phase commit." This means making sure that a transaction is successfully recorded in all host locations before the client is told that it is completed; if this can't be done, the middleware takes responsibility for rolling the transaction back.
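Here is what that two-phase pattern looks like in a minimal Python sketch. The Participant class stands in for a host database and the two_phase_commit function for the middleware's coordinator role; both names are invented for illustration, and real products add durable logging, timeouts, and crash recovery, all omitted here.

```python
# Two-phase commit sketch: the client hears "completed" only after
# every host has voted yes and been told to commit.

class Participant:
    """One host database that can tentatively apply a transaction."""

    def __init__(self, name):
        self.name = name
        self.pending = None

    def prepare(self, txn):
        # Phase 1: tentatively record the transaction and vote.
        # A real participant would write a durable "prepared" log entry.
        self.pending = txn
        return True  # vote yes; returning False would force a rollback

    def commit(self):
        # Phase 2a: make the tentative transaction permanent.
        print(f"{self.name}: committed {self.pending}")
        self.pending = None

    def rollback(self):
        # Phase 2b: discard the tentative transaction.
        print(f"{self.name}: rolled back {self.pending}")
        self.pending = None

def two_phase_commit(participants, txn):
    """Coordinator role (the middleware's job in our discussion)."""
    votes = [p.prepare(txn) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True   # safe to tell the client it is completed
    for p in participants:
        p.rollback()
    return False      # middleware takes responsibility for undoing it

if __name__ == "__main__":
    hosts = [Participant("mainframe"), Participant("regional-server")]
    ok = two_phase_commit(hosts, "debit $100 / credit $100")
    print("client notified:", "completed" if ok else "failed")
```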

Object-based Approaches

We described object software in some depth in Chapter 8 and won't go over it again. You will recall, though, that considerable productivity gains can be obtained when a piece of software is reused within a program. Obviously, the benefits continue to accrue if code can be reused across multiple programs. Microsoft facilitates this by providing an object library, the Microsoft Foundation Classes, to developers of Windows and Windows NT/2000 programs. These objects speed program development and, not incidentally, help make applications and the OS look and act in similar ways, something that users greatly appreciate. Wouldn't it be nice if you could extend this object approach from programs on one computer to programs on multiple computers on a network?

The Challenge of Distributed Objects • The problem of distributed objects can be illustrated with a simple analogy. Suppose you were a member of a club or organization, and someone asked you to write a set of rules for its meetings and activities. This would likely take some time, but it wouldn't be a great intellectual challenge. You would get basic information from Robert's Rules of Order and perhaps other sources, then adapt them to the special activities of the Olde Neighborhood Knitting Group. Let's say, though, that you find yourself having to share the recreation center with the Balkan Paintball Society. It would be a lot harder to extend these rules to cover both groups; providing continuous translation among three or four languages would be the least of your problems. Scale this kind of problem further and you can appreciate what software developers face when they try to use objects across disparate networks with varying protocols, operating systems, and applications.

Object Request Brokers (ORBs) • The distributed object problem is one that has natural appeal to standards-setting groups. It's hard enough for one group of programmers to set rules for applications and environments they don't understand well; it's even more of a challenge if people from the other side don't like your rules. So the Object Management Group (OMG), an international object standards group, has created the Common Object Request Broker Architecture (CORBA). You won't be surprised to find that the solution to the problem of distributed objects is to create a common layer to which all connect. In fact, in this case layering is essential. It's too late to mandate standardized objects, and even if this were feasible, there would still be the problem of progress; unless you make the unreasonable assumption that everyone updates their software on the same schedule, at any given time there will be objects written to an old standard needing to communicate with cousins written to a new one. Some translating will always be needed, and the CORBA layer is the right place to do it. CORBA provides four essential functions: 1) a common interface through which all objects communicate; 2) software that translates between different kinds of objects; 3) a database function that registers objects so that others can find them; and 4) a means of making sure that objects in shared use continue to be available even if their host system is no longer connected (this is called persistence).
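The following Python sketch illustrates the broker idea by analogy only; it is not CORBA code (real CORBA interfaces are defined in a separate Interface Definition Language, and vendors generate language-specific stubs from it). The Broker class and its register and call methods are invented for this illustration, and the comments mark where each of the four functions would live.

```python
# Conceptual ORB sketch: clients find and call objects by name
# through one common layer, never directly.

class Broker:
    """Stand-in for an ORB's registry and common call interface."""

    def __init__(self):
        self._registry = {}  # function 3: objects register here so
                             # that others can find them

    def register(self, name, obj):
        self._registry[name] = obj

    def call(self, name, method, *args):
        # Function 1: every request passes through this one interface.
        # Function 2 (translating between different kinds of objects)
        # would happen here; function 4 (persistence) would re-host
        # the object if its original server disconnected. Both are
        # omitted from this sketch.
        target = self._registry[name]
        return getattr(target, method)(*args)

class Inventory:
    """A registered object; it could live anywhere on the network."""
    def on_hand(self, item):
        return {"3/8-in drill": 35}.get(item, 0)

broker = Broker()
broker.register("Inventory", Inventory())
print(broker.call("Inventory", "on_hand", "3/8-in drill"))  # -> 35
```

A client that knows only the name "Inventory" and the common call interface can use the object without knowing where it runs, which is exactly the location transparency Gates calls for in the sidebar below.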

Distributed Computing Environment (DCE)

The Open Software Foundation (OSF), parent of the standard Unix, has taken the lead in developing a series of standards that make possible a multivendor database environment. Using DCE technologies, a business could, for example, integrate Microsoft desktop software with workgroup servers running Oracle and an enterprise server (mainframe) using DB2. In addition to specifying how data is organized and translated, DCE includes security services, directory services (a database of users and resources), and such mundane but nevertheless critical things as time services (if clocks aren't synchronized throughout a transaction processing network, and if time/date information isn't used in a consistent way, serious confusion results). Unlike the situation with OMG's CORBA, Microsoft is an active participant in DCE, along with nearly everyone who is anyone in the database/transaction processing world.
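As a toy illustration of the time services point, the following Python sketch (the server names and the 90-second skew are invented) shows how merging two servers' transaction logs by timestamp reverses the real order of events when one clock runs slow.

```python
# Why unsynchronized clocks confuse transaction processing:
# a credit that really happened second appears to happen first.
from datetime import datetime, timedelta

real_time = datetime(2000, 3, 1, 12, 0, 0)

# Step 1: the debit posts first, on server A, whose clock is accurate.
log = [("A: debit posted", real_time)]

# Step 2: the credit posts 30 seconds later on server B,
# whose clock runs 90 seconds slow.
skew = timedelta(seconds=90)
log.append(("B: credit posted", real_time + timedelta(seconds=30) - skew))

# Sorting the merged log by timestamp puts the credit before the
# debit it depends on.
for event, stamp in sorted(log, key=lambda entry: entry[1]):
    print(stamp.time(), event)
```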


An Authoritative View of Distributed Objects

"In the future, a component's location on the network must be as irrelevant to the developer as its source language is today."

Bill Gates, in Byte (March 1998)


CORBA has been widely adopted, which is not to say that it is widely used. True object-based applications and systems are relatively rare, and distributed ones are rarer still. What "adoption" means in this case is that a lot of software companies, including powerhouses like IBM, say that they will make their systems CORBA-compliant. On the other hand, just because IBM says it will support something doesn't mean that its customers will use it (unlike the good old days). Further, Microsoft is not a supporter of CORBA. Microsoft has created its own distributed object standards, COM (Component Object Model) and DCOM (Distributed COM), and shows little interest in connecting these to CORBA. Microsoft's rationale, as in the case of a standardized Java, is simple: everyone should be using Microsoft OSs and applications, so there isn't any need to provide connections to anything else. Microsoft isn't (yet) much of a player in the high-end enterprise server world where common objects are of greatest interest. But as NT/2000 continues its climb from workgroup to enterprise server, and Microsoft's applications on the desktop achieve near ubiquity, the opportunity for a coherent extended object model fades. Adding another layer, or layer element, that translates between CORBA and DCOM will likely be the response for at least the short term.
