Preface

This Preface and the entire book are a little bit different, and that is by design. Both authors wrote this book understanding that our target audience often does not have time to read a whole book, or the Oracle documentation, from cover to cover. As such, we wrote this book with the idea that the table of contents and headings should tell you exactly what is being covered. Bullet lists are used to quickly highlight key points where appropriate. Where concepts need to be explained in more detail, a supporting narrative is supplied. Another difference is that we make multiple references to Oracle documentation rather than attempting to rewrite everything. This is also by design. Having seen Oracle documentation evolve over the years, both authors, and our publisher, recognize the intrinsic value of getting specific detailed information straight from the "horse's mouth". To promote the development of overall expertise, we focus on helping our readers effectively use all the tools available. The Oracle documentation is one of your most valuable tools. At times, Oracle documentation can be difficult to follow or to search, but once you develop expertise in using the documentation, expertise in the functionality is not far behind. The focus of this book is not to replace the Oracle documentation, but rather to be a quick reference companion to it.

Replication in general

The concept of replication is simply to duplicate. Birds do it, bees do it, and even cells do it. However, replication is not limited to the biological world. Accurately duplicating data from which information is derived is the foundation of human communication. Whether that data is the words or gestures used to convey a story that is handed down from generation to generation, the numbers used to quantify the quantifiable, or the grouping of on/off bits stored in a computer file, humans have been replicating data since they discovered the need to communicate.

Now that we have evolved into the wonderful age of computerized technology, we recognize the limitless advantages of sharing data, and the need to accurately and efficiently duplicate and distribute that data.

Distributed database systems

We all know that a database is a collection of data objects that are typically accessed through a client/server architecture, and where the database is the server.

We also know that client/server architecture uses a network communication channel that allows the client to send or get data to/from the database. The client can be local (on the same computer as the database), or it can be remote (on a different computer than the database). Either way, the client uses some type of network connection to access the database.

The sharing of data between two or more databases constitutes a distributed database system (even if the databases reside on the same computer). Distributed database systems can be homogeneous (all on the same platform, such as Oracle) or heterogeneous (two or more platforms, such as Oracle, MS SQL Server, Sybase, and so on). These systems can utilize a number of data distribution methods (unidirectional, bidirectional, read-only, synchronous, and asynchronous). The glue that holds this all together is the network and the database links between the various databases.

A database link is a one-way communication channel from one database (source) to another database (target) that allows the source database to access the objects in the target database.
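The one-way nature of a database link can be pictured with a small sketch. This is only an illustrative model, not Oracle syntax: the site names SALES and HQ and the helper can_access are hypothetical, and each link is represented as a simple (source, target) pair.

```python
# Each database link is modeled as a directed (source, target) pair.
# The site names and this helper are hypothetical, for illustration only.
links = {("SALES", "HQ")}


def can_access(source, target, links):
    """Return True if `source` holds a link that reaches `target`."""
    return (source, target) in links


# SALES can reach HQ, but the link does not work in reverse.
print(can_access("SALES", "HQ", links))
print(can_access("HQ", "SALES", links))

# Two-way access needs a second link in the opposite direction.
links.add(("HQ", "SALES"))
print(can_access("HQ", "SALES", links))
```

The point of the model is that access in both directions requires two links, one defined at each site.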

Key terms that have been discussed and should be understood here are: database link, communication channel, and network connections. These all work to provide connectivity in a distributed system. It is very important to understand that network connectivity makes or breaks a distributed system. No network connection means no data distribution. An unstable network means unstable data distribution.

Now that you have a distributed database system, add client applications that access one or more databases in that distributed database system, and voila! You have a full-blown distributed system.

What is Data Replication?

Data Replication is literally the act of accomplishing data object changes throughout a distributed system. Period. Replication can be manual, or it can be automated. Automated is the preferred mode of 3 out of 4 DBAs surveyed (we do not really count the 4th; he's semi-retired and has nothing better to do).

How do "Replication" and "Distributed Systems" interact?

Replication makes data located in different databases available to all databases within the distributed system. So replication is the method behind a distributed system. It moves the data around to different sites.

Tip

Databases within a distributed system are often referred to as sites. As mentioned earlier, databases can be physically co-located on the same computer, but the databases themselves could still be referred to as separate sites. The term 'site' is more of a logical distinction than a physical distinction.

Why would we want to replicate?

There are a number of reasons to replicate data, but it is a good bet that they all boil down to increased availability. This means that the same data is available at different sites, and the flow of data between those sites is automated. Replication supports increased availability by providing the following:

  • Change consistency: Ensures that all sites get the same change.
  • Mass deployment/disconnected computing: Data can be sent to secondary computers (laptops, desktops) so that it is available when these devices might be offline.
  • Faster access: Load balancing is the art of distributing client connections over multiple databases. This comes in really handy when the system has a large number of users, and even more so if those users are geographically separated from the system databases. The user just connects to the geographically closest database. Network load can also be reduced by directing traffic over different routers for different database sites.
  • Survivability: Data is still accessible if one site fails.

Tip

When not to use replication for survivability purposes

If the need is to only support survivability and data changes made at a single site, there are better tools to use to support survivability that require a little less configuration, maintenance, and monitoring. For example: Data Guard!

Replication architecture

Replication architecture refers to the overall structure of the replicated environment. This includes what is replicated between the sites and the role of each site. The following terms are used to make these distinctions:

Master table/object: A table or object that is replicated to another database. A replicated table can be a master table for a snapshot/materialized view, or a table that is duplicated at a remote site. For tables, both the structure and the data are replicated. For non-table objects, the object definition is replicated.

Master/Source site: A database which hosts master tables/objects. The tables can be a master table for a snapshot/materialized view, or a table that is replicated to a remote master site.

Secondary/Target site: A database which hosts replicated objects to which changes are sent by a master site. This can be another master site, or a materialized view site. The expectation of a secondary site is that if a data conflict occurs when attempting to apply the change from the sending master site, the conflicting secondary site data is always replaced by the values from the sending master site.

Replication methods

A replication method describes how data is replicated between sites. This can be broken down into commit synchronization and directional flows.

Commit synchronization flow refers to when changes are committed at and between sites. There are two methods of commit synchronization: synchronous and asynchronous.

Synchronous replication requires that all sites be able to commit the change before it is committed at the originating site. If any site is not able to commit the change, the change is rolled back at all sites, including the originating site. This requires all database sites in the distributed system to be writable over network connections. The nature of synchronous replication keeps the data at all sites synchronized, thus (at least theoretically) eliminating the need for conflict resolution. Synchronous replication is used for real-time, mission-critical replication.
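The all-or-nothing behavior of synchronous replication can be sketched as follows. This is a simplified in-memory model, not how Oracle implements it: the Site class, its reachable flag, and replicate_synchronously are hypothetical names. The sketch also assumes that a site's ability to commit can be checked up front, so an unreachable site means nothing is applied anywhere rather than something being applied and then rolled back.

```python
class Site:
    """A hypothetical in-memory stand-in for one database site."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.data = {}

    def can_commit(self):
        # Assumption: a site that cannot be reached cannot commit.
        return self.reachable


def replicate_synchronously(origin, targets, key, value):
    """Apply the change at every site, or at none of them."""
    sites = [origin, *targets]
    # Step 1: every site, including the originating site, must be able to commit.
    if not all(site.can_commit() for site in sites):
        return False  # nothing was applied, so there is nothing to undo
    # Step 2: commit the change at all sites.
    for site in sites:
        site.data[key] = value
    return True


a, b, c = Site("A"), Site("B"), Site("C", reachable=False)
print(replicate_synchronously(a, [b, c], "price", 10))  # site C is unreachable
c.reachable = True
print(replicate_synchronously(a, [b, c], "price", 10))  # all sites can commit
```

Notice that one unavailable site blocks the change at the originating site too, which is the price paid for keeping every site identical.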

Asynchronous replication allows the transaction to be committed at the originating site regardless of whether it is successfully committed at the other target sites in the distributed system. In this method, if the commit is successful at the originating site, appropriate deferred transactions for each target site are created and stored to be propagated and applied at a later time (keep in mind "a later time" can be as little as a few seconds). This allows work to continue at the originating site even if the changes cannot be applied to the other sites within the distributed system immediately. This does, however, open up the possibility of data divergence, and requires some form of conflict resolution (manual or automated) to be implemented should divergence occur.
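The deferred-transaction idea can be sketched in the same spirit. Again, this is an illustrative in-memory model under stated assumptions, not Oracle's mechanism: Site, AsyncReplicator, and the per-target queues are hypothetical, and a change is reduced to a single key/value pair.

```python
from collections import deque


class Site:
    """A hypothetical in-memory stand-in for one database site."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.data = {}


class AsyncReplicator:
    """Commits locally, then queues one deferred change per target site."""

    def __init__(self, origin, targets):
        self.origin = origin
        self.targets = {t.name: t for t in targets}
        self.deferred = {t.name: deque() for t in targets}

    def commit(self, key, value):
        # The originating site commits without waiting for any target.
        self.origin.data[key] = value
        for queue in self.deferred.values():
            queue.append((key, value))

    def propagate(self):
        # Apply queued changes in order; unreachable targets keep their queue.
        for name, queue in self.deferred.items():
            target = self.targets[name]
            while queue and target.reachable:
                key, value = queue.popleft()
                target.data[key] = value


a, b, c = Site("A"), Site("B"), Site("C", reachable=False)
rep = AsyncReplicator(a, [b, c])
rep.commit("price", 10)   # succeeds even though site C is down
rep.propagate()           # B catches up; C's change stays queued
c.reachable = True
rep.propagate()           # C catches up once it is reachable again
```

Between commit and propagate the sites hold different data. That window is where divergence, and therefore the need for conflict resolution, comes from.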

Replication from one site to another can only be synchronous or asynchronous. It cannot be both (in other words, the two methods are mutually exclusive).

Directional flow refers to the direction in which changes are passed between two sites.

Unidirectional means that data changes only flow one way. In this case, changes are made at a primary master and are sent to a secondary site. Direct changes made at secondary sites are either not allowed, or not sent to the primary master site. If changes made at a secondary site cause data divergence from the primary master database, subsequent changes from the primary master will either fail due to the data differences, or overwrite those changes if conflict resolution mechanisms are in place. Read-only snapshots are an example of unidirectional replication.
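The two outcomes described above (fail, or overwrite) can be sketched as follows. This is a minimal model under an assumption of our own: the change sent by the primary master carries the old value it expects to find, and divergence is detected by comparing that old value with the secondary's current value. The names apply_master_change, ConflictError, and the overwrite flag are hypothetical.

```python
class ConflictError(Exception):
    """Raised when the secondary's value is not what the master expected."""


def apply_master_change(secondary, key, old_value, new_value, overwrite=False):
    """Apply one change from the primary master to a secondary site's data."""
    current = secondary.get(key)
    if current != old_value and not overwrite:
        # No conflict resolution in place: the change fails.
        raise ConflictError(
            f"{key!r}: expected {old_value!r}, found {current!r}"
        )
    # Either the data matched, or the master's value is allowed to win.
    secondary[key] = new_value


secondary = {"price": 10}
apply_master_change(secondary, "price", 10, 12)   # data matches, change applies

secondary["price"] = 99                           # direct change at the secondary
try:
    apply_master_change(secondary, "price", 12, 15)
except ConflictError as err:
    print("change failed:", err)

# With an overwrite rule in place, the master's value replaces the divergent one.
apply_master_change(secondary, "price", 12, 15, overwrite=True)
```

The overwrite rule here is the simplest possible form of conflict resolution: the sending master site always wins.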

Bidirectional (N-Way) replication means that data changes can flow to and from sites within a distributed system. Changes can be made at any master or updateable snapshot site. These changes are then propagated to all other sites. If the bidirectional replication is asynchronous, it can lead to data divergence, and requires some form of conflict resolution (manual or automated) to be implemented, should divergence occur. Master-to-Master and Updateable Snapshots are examples of bidirectional replication.

Replication of an object between two sites can only be unidirectional or bidirectional. It cannot be both (again, the two are mutually exclusive).

A commit synchronization method can be applied to either directional flow method, and vice versa.

Replication configurations

Now that you understand replication architecture and methods, these can be combined to create a replication configuration. A replication configuration can also be referred to as a replication environment. The following define the different replication configurations that you can implement:

N-Way/Master-to-Master/Multi-Source: A distributed environment that has two or more change source sites. These source sites push changes to other change source sites and receive changes from other change source sites.

Uni-directional/Master-to-Secondary/Single-Source: A distributed environment where one site is the (change) source site (primary/master). It, in turn, pushes changes to other sites (secondary). If data changes directly at a secondary site, this could result in data divergence and must be addressed through conflict resolution methods.

Hybrid: A distributed environment that has a combination of multi-source and single-source configurations.
