Other factors to consider

Now that you have looked at the core factors of your design, you need to turn your attention to the additional factors that will have an impact on your environment. Some of these factors will play a major role in determining your design, while others will play a minor one. Either way, include these factors in your design considerations, and you will be glad you did.

Network capabilities

You always want to know what your network capabilities will be from the start. Network bandwidth and stability (we just can't say it enough) are key. If you have low network bandwidth, the connection may bog down or collapse when a large number of changes is pushed. Unstable networks mean unstable propagation and a higher risk of data loss. Also, look at network traffic peaks and lulls. Can you relieve network workload by propagating during off-peak hours?
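If your analysis shows a clear off-peak window, the propagation schedule can be shifted into it. The following is a minimal sketch using DBMS_AQADM.ALTER_PROPAGATION_SCHEDULE; the queue name, database link, and window values are hypothetical examples, and queue-to-queue propagations also accept a destination_queue parameter:

    -- Sketch: propagate during a nightly window instead of continuously.
    -- Queue name, destination link, and window values are hypothetical.
    BEGIN
      DBMS_AQADM.ALTER_PROPAGATION_SCHEDULE(
        queue_name  => 'STRMADMIN.CAPTURE_Q',
        destination => 'STRM2',        -- database link to the remote site
        next_time   => 'SYSDATE + 1',  -- next window starts 24 hours later
        duration    => 4 * 60 * 60,    -- keep the window open 4 hours (seconds)
        latency     => 60);            -- seconds to wait on an idle queue
    END;
    /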

Another factor is site or data location and availability. Once again, network limitations come into play. Look at the geographic location of each site and the associated WAN/LAN performance. Are there any "window" limits on when and for how long connections can be made? If a Propagation cannot complete within the scheduled window, repeated failures will disable the Propagation altogether.

Avoid scheduling Propagation to sites that have limited or intermittent connectivity (laptops, mobile users). This can cause queue backup and slow propagation to other sites scheduled to propagate from the same queue. Consider creating separate capture queue(s) for sites with limited or intermittent connectivity, or consider materialized views as part of your distributed environment. Materialized views are alive and well in 11g and can greatly enhance the flexibility of your streamed environment by accommodating those "hard to reach places".
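Creating such a dedicated queue is straightforward with DBMS_STREAMS_ADM.SET_UP_QUEUE. A minimal sketch, with hypothetical owner and queue names:

    -- Sketch: a dedicated capture queue for intermittently connected sites,
    -- so their backlog does not delay propagation to well-connected sites.
    BEGIN
      DBMS_STREAMS_ADM.SET_UP_QUEUE(
        queue_table => 'STRMADMIN.MOBILE_CAPTURE_QT',
        queue_name  => 'STRMADMIN.MOBILE_CAPTURE_Q',
        queue_user  => 'STRMADMIN');
    END;
    /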

Transaction sizes

Be cognizant of transaction sizes (the LCR message count). Remember that basic Streams replication operates at the row level and generates a message for each row changed. If your transactions are large bulk updates, consider user-enqueued procedure calls as an alternative to row-level LCRs. This may require more application coding, but it avoids the issues encountered when a transaction generates a very large number of LCR messages. You will need to include conflict detection and resolution in the procedure code, as well as tags to avoid change cycling.
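The tag portion of that approach might look like the following sketch. The procedure and table names are hypothetical; the idea is that a non-NULL Streams tag is set before the bulk DML so a Capture process configured to ignore tagged LCRs does not capture it row by row, and the same procedure call is then executed (or user-enqueued) at the remote sites:

    -- Sketch: run a bulk update under a non-NULL tag so it is not captured
    -- as row-level LCRs. Conflict detection/resolution for the remote
    -- execution would still need to be coded around this DML.
    CREATE OR REPLACE PROCEDURE bulk_price_adjust(p_pct IN NUMBER) IS
    BEGIN
      DBMS_STREAMS.SET_TAG(tag => HEXTORAW('11'));  -- mark this session's changes
      UPDATE products
         SET price = price * (1 + p_pct / 100);
      COMMIT;
      DBMS_STREAMS.SET_TAG(tag => NULL);            -- restore normal capture
    END;
    /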

Potential queue growth

Be aware of the rate at which LCRs are enqueued and dequeued. Queue growth is the biggest factor here. If the Propagation and Apply process dequeues cannot keep up with the enqueuing, the queues will keep growing. This can cause buffered queues to spill to disk, and persistent queues to require additional extents. Network speed and stability play a large role in allowing queue operations and queue size to remain consistent.
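Buffered queue growth and spill can be watched with a simple query against the V$BUFFERED_QUEUES view; climbing counts are an early warning that dequeue is falling behind:

    -- Sketch: watch buffered queue depth and spill-to-disk counts.
    SELECT queue_schema,
           queue_name,
           num_msgs,    -- messages currently buffered in memory
           spill_msgs   -- messages that have spilled to disk
      FROM v$buffered_queues
     ORDER BY spill_msgs DESC;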

Additional hardware resource requirements

You also want to understand your hardware system requirements when planning your streamed environment. Streams replication requires additional resources such as:

  • System O/S resources: More memory (shared pool) and O/S processes/threads
  • Disk space: You will need more storage for structures (queues, streams metadata, additional rollback, and so on)
  • CPU time: Streams generates additional background processes that require CPU. Keep the following in mind:
    • Propagation is CPU intensive
    • Near-real-time replication constantly uses CPU
    • If you plan to use parallelism, multiple processes use CPU, as does the associated dependency tracking
    • Heavy DML means larger data transfers, which use more CPU
  • Things to watch out for that cause CPU and other resource contention:
    • Avoid conflicts with other CPU intensive operations
    • Avoid propagation during hot backups
  • Every system is different; testing is the best way to determine the impact of Streams on CPU resources

Administration and maintenance costs

Don't underestimate your administration costs. Enterprise Manager offers a number of features to help reduce the time and effort of implementing and maintaining a streamed environment. However, it is only a tool. The blueprint comes from the design and planning stage. This is, and should be, where the bulk of your time is spent.

The activities to include in your plan are:

  • Implementation:

    Design: Global and single site designs must be developed to meet the goals of the Streamed environment.

    Setup: Each site must be configured for its role in the environment: master, secondary, pass-through, and so on.

  • Backup and recovery:

    Design: The backup plans need to be designed to allow for the most efficient and effective recovery for foreseeable failure scenarios.

    Test: The backup and recovery procedures should be tested to ensure restorable backups and successful resynchronization.

  • Schema/application changes:

    Design: The schema and application changes will have global and single site impacts. These impacts must be studied and incorporated in the design.

    Test: All changes should be tested globally and singularly to avoid implementation problems.

    Setup: Changes must be implemented at all required sites in such a way that structure and data divergence are kept to a minimum.

  • Resynchronization of Diverged Data: Propagation failures can cause data divergence, and manual review, recovery, and resynchronization may be required. Additional administration is required for continual monitoring of the apply error queues. The errored LCRs need to be reviewed to determine the cause of each error; the cause must be corrected and the LCR reapplied. This can be very time consuming, especially if conflict resolution has not been adequately defined (a monitoring sketch follows this list).
  • Database Tuning: Streams introduces additional processes and resource consumption. Tuning may need to be revisited after instantiating Streams and as the environment matures.
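A sketch of that review-and-reapply cycle follows, using the DBA_APPLY_ERROR view and DBMS_APPLY_ADM; the transaction ID shown is a placeholder taken from the query output:

    -- Sketch: review apply errors at the destination site.
    SELECT apply_name,
           source_database,
           local_transaction_id,
           error_message
      FROM dba_apply_error
     ORDER BY source_commit_scn;

    -- After correcting the underlying cause, re-execute the errored
    -- transaction (the ID below is a placeholder from the query above).
    BEGIN
      DBMS_APPLY_ADM.EXECUTE_ERROR(local_transaction_id => '5.20.1634');
    END;
    /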

The administration of a streamed environment requires extra hours from the DBAs to monitor the system and streamed sites. These DBAs will require specialized training to understand how to keep the system and Streams healthy.

Third party application requirements

In the event that your distributed environment is configured as part of a third-party technology stack, make sure you understand the full scope and design of that implementation. Review your third-party application documentation to identify how Streams replication is implemented and used by the application, what database user accounts and schemas are created, and what privileges are granted to which accounts.

Security

Database security requirements should be reviewed. Certain database users must be created to administer the streamed environment and to capture, send, receive, and execute the LCRs. These users require a higher level of database privileges than the normal application end user. This must be taken into consideration along with the environment's security requirements. The basic security models that can be implemented are Trusted and Untrusted (a setup sketch follows the list):

  • Trusted: This model implements lower security measures. One user is usually configured to administer the environment as well as capture, propagate, and apply the LCRs. This user has access to all Streams queues at all sites, and is able to execute any procedure. This results in decreased security, but allows for increased flexibility.
  • Untrusted: This model separates the administration, capture, send, receive, and execute roles and privileges between different users. This allows each user to be granted only those privileges needed to accomplish their particular role. This results in higher security, but less flexibility in the overall Streams flow. It also requires additional implementation and design coordination.
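For the trusted model, that one user is typically created and granted its privileges through DBMS_STREAMS_AUTH. A minimal sketch; the username, password, and tablespace are hypothetical:

    -- Sketch: create a single trusted Streams administrator.
    CREATE USER strmadmin IDENTIFIED BY strmadmin
      DEFAULT TABLESPACE streams_tbs
      QUOTA UNLIMITED ON streams_tbs;

    GRANT DBA TO strmadmin;  -- common in the trusted model

    BEGIN
      DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(grantee => 'strmadmin');
    END;
    /

In the untrusted model, you would instead create separate users for the capture, propagation, and apply roles and grant each only the privileges its role requires.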

Keep in mind that the replication of data within a Streams distributed environment is at the database level. It is highly recommended that the user accounts configured to support Streams replication be specifically and exclusively assigned for the task and separate from all other database and third party application database user accounts. This is due to the level of access to database objects that must be granted to the Streams users. This access level, if granted to application user accounts, could result in unexpected and unknown security loopholes that can be exploited.

Change auditing

When using Oracle Streams, regular Oracle auditing will not carry the change audit information from the originating site to the apply site. You can expect that change audit information will be specific to the Apply process that is applying the change at the destination site. In cases where you need to track change audit information throughout a streamed environment, you can do so by including the change audit information as columns in the replicated tables, associated with the changed data. This can be accomplished in a number of ways. The main point is this: if the change audit data needs to be persisted throughout the distributed environment, the structure of the change audit data should be included in the replicated data model, and the collection of the required data values should be supported at the application level and persisted with the associated changed data. In Oracle 11gR2, you can also use the DBMS_STREAMS_ADM.MAINTAIN_CHANGE_TABLE feature, as discussed in Chapter 1.
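As a rough illustration, a MAINTAIN_CHANGE_TABLE call might look like the sketch below. The object and column names are hypothetical, and the parameter list is trimmed to the essentials; consult the 11gR2 documentation for the full (and lengthy) signature, which also covers naming the capture/apply components and generating scripts:

    -- Sketch: build a change-tracking table for HR.EMPLOYEES recording
    -- both old and new values. Names and columns are hypothetical.
    BEGIN
      DBMS_STREAMS_ADM.MAINTAIN_CHANGE_TABLE(
        change_table_name => 'HR.EMPLOYEES_CHANGES',
        source_table_name => 'HR.EMPLOYEES',
        column_type_list  => 'EMPLOYEE_ID NUMBER(6), SALARY NUMBER(8,2)',
        capture_values    => 'both');  -- record old and new column values
    END;
    /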

Platform and version compatibility

Database platform compatibility and interoperability will have an impact on your design. If your system is homogeneous (All Oracle), it is preferable, and highly recommended, to have all Oracle databases on the same release and patch level. Overall capability of the Streamed environment is limited to the capabilities of the lowest Oracle version.

If your system is heterogeneous (Oracle and non-Oracle), you will need to accommodate special configuration requirements. Streaming between Oracle and non-Oracle platforms is supported via Oracle Gateway and user-enqueued messages. The capture and apply queues and processes will always be on the Oracle database. Be cognizant of data type conversions and special handling that may be required when using Oracle Gateway connectivity.

KISS

Keep it simple and sweet. The greatest advantage that Streams has over Oracle's Advanced Replication is its flexibility. The golden rule of functionality is that with flexibility comes complexity. Flexibility is not a bad thing; you just need to plan for the additional complexity that implementation and maintenance will incur. One way to reduce complexity is to use the most efficient tool for the job. Do your research and choose your foundation and tools wisely (no doubt, the tower builders in Pisa, Italy would tell you the same).

A special note on Streams versus Data Guard

In the Why? section, we discussed understanding why you would choose one tool over another tool to accomplish your goal. The key word we used in that discussion was 'efficiencies'. While you may be able to use either tool to accomplish your goal, one tool may be more efficient than the other.

Here is a note about using Streams versus Data Guard to support a redundant failover system. A redundant failover system is, by design, a single-site-ownership configuration with unidirectional change flow.

If your main purpose is to provide a redundant failover system, do yourself a favor and avoid making mortal enemies of the production DBAs: use Data Guard's physical standby database. If you also wish or need to access the standby for queries, use Data Guard's logical standby database; it is built on the same apply technology as Streams (changes are sent via remote procedure calls rather than advanced queues). However, Data Guard enforces single-site ownership without you having to define the conflict resolution and error handling rules that you must define with Streams. This recommendation is not made lightly. Oracle highly recommends Data Guard over Streams for the purpose of a redundant failover site, and has done so since the inception of both technologies. Data Guard physical and logical standby functionality is included in the Enterprise License just as Streams is, so there is no extra cost. Enterprise Manager Grid Control provides setup and maintenance support for Data Guard just as it does for Streams. Since logical standby and Streams use the same underlying technology, your data type and DDL limitations are about the same, as are your network dependencies. The main difference is that Data Guard takes less work to set up and maintain, and the DBAs will have less work monitoring the system. They may even thank you for the automated switchover/failover and database recovery capabilities that Data Guard provides (I know I would). Data Guard is designed to focus on zero or minimal data loss in the case of switchover/failover; this is not the main focus of Streams.

If you use Streams, you will need to incorporate the development and implementation of conflict resolution rules for each piece of data that is replicated. In some cases, one rule may be sufficient for all data. In other cases, you will need to differentiate how change conflicts are resolved based on data ownership requirements.

Streams is not a "set and forget" environment. It must be continually monitored to ensure transactions are successfully captured, propagated, and applied at all sites.
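That monitoring can start with the Streams dictionary views. A quick health-check sketch for any one site:

    -- Sketch: one-glance status of all Streams processes on this site.
    SELECT 'CAPTURE' AS process_type, capture_name AS name, status
      FROM dba_capture
    UNION ALL
    SELECT 'PROPAGATION', propagation_name, status
      FROM dba_propagation
    UNION ALL
    SELECT 'APPLY', apply_name, status
      FROM dba_apply;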

Recovery and resynchronization of a failed or lost site may require a full re-instantiation of the site as well as its data. Remember, Streams does not guarantee zero or minimal data loss in the case of failure. Depending on the number of objects replicated, the capture, propagation, and apply rules defined, and the amount of data, this could take anywhere from minutes to days. Special consideration and procedures must be followed if a point-in-time recovery is necessary.

Think of Streams versus Data Guard this way: yes, you can use Streams to provide a redundant failover system. But would you use a butter knife to cut a steak? Yes, you could do it, but you'd have to work a lot harder than if you used the steak knife that is in the same drawer (over to the left, next to the dinner forks).

Never say never: the above recommendation is not intended to imply that you cannot or should not use a streamed database that has been configured to support shared ownership as a fail-to instance should a primary site fail. If it is there, and it maintains data consistency at the required level, go ahead and use it. Every business circumstance is different. In some circumstances you may decide you want to use Streams to support a redundant failover system. Just think twice about implementing Streams specifically for a redundant failover system rather than Data Guard.

Design aid: Streams site matrix

If a picture is worth a thousand words, then a matrix should be worth at least 500. This section discusses the use of a handy-dandy little tool that we like to call the Streams Site Matrix. This matrix provides a visual summary of who sends what where, and whose changes get applied where and how. Overall, this matrix, when combined with the design considerations, helps you more accurately predict the level of effort to implement Streams; identify, early on, potential data and design conflicts; and identify resource requirements and potential limitations. For instance, from this matrix you can accurately provide: information concerning necessary hardware, Oracle licensing, and network connection needs to your system architects; the necessary user application data values to support the distributed data model, and optimal DML transaction behavior, to your software architects; and the additional data fields necessary to support conflict resolution and change auditing throughout the distributed environment to your data architects. The Matrix also provides a succinct, organized list of the Capture, Propagation, and Apply processes and queues, and the advanced functionality (conflict resolution, transformations, and so on) needed at each site.

The Matrix template

First, let's talk about the matrix template. The template shown here provides a starting point. It highlights the main components of our intended Streamed environment. As you work through your own designs, feel free to modify and expand the template to reflect information that makes the most sense to you and your business. Add more detail as you develop your design. You will find that doing so produces an excellent overview to include with your environment documentation.

[Figure: the Streams Site Matrix template]

Our template begins by listing information in the first couple of rows that helps us understand the type of replication environment we are designing: adding Single-Source schema-level replication for the HR schema, where STRM1 is the master site and STRM2 is the secondary site. Artistic license is allowed; modify this section as needed to show information that is helpful to you and your business, but DO show it.

Streams Site Matrix for:

| Replication Level: | Schema                                          | Color Key: | Existing   |
| Name:              | HR                                              |            | Add/Remove |
| Add/Remove:        | Add                                             |            |            |
| Comment:           | Single Source (STRM1 master to STRM2 secondary) |            |            |

Then we begin our matrix.

In column A, we list each site that needs to be configured for Streams, listing the three Streams processes under each site section. Column B lists specific information that we want to identify for each Streams process at that local site: for instance, process and queue names, whether or not the process includes DDL, whether it has associated advanced components such as transformation rules and tags (covered in Chapter 6, Get Fancy with Streams Advanced Configurations), and conflict resolution (for Apply processes only; covered in Chapter 5, N-Way Replication). If you are replicating tables rather than schemas, you may want to include a link to a list of tables associated with each process. Again, show information that is helpful to you and your business.

Local Sites

STRM1

    Capture
        Process Name:
        Queue Name:
        DDL (Y/N):
        Transformations:
        Table list:

    Propagate
        Process Name:
        From Queue Name:
        To Queue Name:
        DDL (Y/N):
        Transformations:

    Apply
        Process Name:
        Queue Name:
        DDL (Y/N):
        Transformations:
        Conflict Resolution:

STRM2

    Capture
        Process Name:
        Queue Name:
        DDL (Y/N):
        Transformations:

    Propagate
        Process Name:
        From Queue Name:
        To Queue Name:
        DDL (Y/N):
        Transformations:

    Apply
        Process Name:
        Queue Name:
        DDL (Y/N):
        Transformations:
        Conflict Resolution:

One variation here would be that you have multiple Capture processes. To show this, add a sub-section for each Capture process, like so:

STRM1

    Capture
        Process Name:
        Queue Name:
        DDL (Y/N):
        Transformations:

        Process Name:
        Queue Name:
        DDL (Y/N):
        Transformations:

    Propagate
        Process Name:
        From Queue Name:
        To Queue Name:
        DDL (Y/N):
        Transformations:

    Apply
        Process Name:
        Queue Name:
        DDL (Y/N):
        Transformations:
        Conflict Resolution:

In the subsequent columns, list each database site that is a member of the distributed system.

Remote Sites

| STRM1 | STRM2 |
As we move across the columns, we record what needs to be created at the local site for each of the processes, in the appropriate column for the remote site. If the local site does not interact with a remote site, leave that remote site's column blank (showing which sites do not interact is just as important as showing which sites do interact). If the "local" site is the same as the "remote" site, don't worry that it is not technically "remote" (if you are that detail-oriented, a.k.a. "anal", might we recommend taking up a right-brained hobby such as Yoga?). This is a great section to record the Capture information for the local site. Think of it as "we are setting up capture AT this site, FOR this site".

Let's define our Single-Source design (see Chapter 4, Single-Source Configuration, for the actual implementation). As mentioned, this design is just a simple two-site, Single-Source environment. STRM1 is the site where changes are made. STRM2 is the secondary site to which we will replicate those changes. This means we need a Capture process and queue at STRM1 for the HR schema, and a Propagation process at STRM1 (to STRM2) that will be queue-to-queue. We also need an Apply process and queue at STRM2. Since this is a Single-Source configuration, we do not need to create any Capture or Propagation processes on STRM2, or an Apply process on STRM1. This not only reduces the internal operations for replication, it protects the data at STRM1 from potential changes made at STRM2. We will call our Capture process SCHEMA_HR_CAPTURE and assign it to the SCHEMA_HR_CAPTURE_Q queue. We will call our Apply process SCHEMA_HR_APPLY and assign it to the SCHEMA_HR_APPLY_Q queue. Our Propagation process name will be SCHEMA_HR_PROPAGATION, and it will propagate from the SCHEMA_HR_CAPTURE_Q queue at STRM1 to the SCHEMA_HR_APPLY_Q queue at STRM2. We want to include DDL changes in this stream, so we will show this for all processes as well.
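Chapter 4 walks through the actual implementation, but to show how this design maps to API calls, here is an abbreviated sketch run as the Streams administrator. The database link STRM2 is assumed, and instantiation, supplemental logging, and grants are omitted:

    -- Sketch (at STRM1): capture queue, schema capture rules, and the
    -- queue-to-queue propagation to STRM2. DML and DDL are included.
    BEGIN
      DBMS_STREAMS_ADM.SET_UP_QUEUE(queue_name => 'SCHEMA_HR_CAPTURE_Q');
      DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
        schema_name  => 'HR',
        streams_type => 'capture',
        streams_name => 'SCHEMA_HR_CAPTURE',
        queue_name   => 'SCHEMA_HR_CAPTURE_Q',
        include_dml  => TRUE,
        include_ddl  => TRUE);
      DBMS_STREAMS_ADM.ADD_SCHEMA_PROPAGATION_RULES(
        schema_name            => 'HR',
        streams_name           => 'SCHEMA_HR_PROPAGATION',
        source_queue_name      => 'SCHEMA_HR_CAPTURE_Q',
        destination_queue_name => 'SCHEMA_HR_APPLY_Q@STRM2',
        include_dml            => TRUE,
        include_ddl            => TRUE,
        queue_to_queue         => TRUE);
    END;
    /

    -- Sketch (at STRM2): apply queue and schema apply rules for changes
    -- originating at STRM1.
    BEGIN
      DBMS_STREAMS_ADM.SET_UP_QUEUE(queue_name => 'SCHEMA_HR_APPLY_Q');
      DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
        schema_name     => 'HR',
        streams_type    => 'apply',
        streams_name    => 'SCHEMA_HR_APPLY',
        queue_name      => 'SCHEMA_HR_APPLY_Q',
        include_dml     => TRUE,
        include_ddl     => TRUE,
        source_database => 'STRM1');
    END;
    /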

Based on this, let's fill in our Site Matrix:

First, the Capture at STRM1:

  

| Local Sites |                  | Remote Sites        |       |
|             |                  | STRM1               | STRM2 |
| STRM1       | Capture          |                     |       |
|             | Process Name:    | SCHEMA_HR_CAPTURE   |       |
|             | Queue Name:      | SCHEMA_HR_CAPTURE_Q |       |
|             | DDL (Y/N):       | Y                   |       |
|             | Transformations: |                     |       |

Next, the Propagation at STRM1 to STRM2:

  

| Local Sites |                  | Remote Sites |                       |
|             |                  | STRM1        | STRM2                 |
| STRM1       | Propagate        |              |                       |
|             | Process Name:    |              | SCHEMA_HR_PROPAGATION |
|             | From Queue Name: |              | SCHEMA_HR_CAPTURE_Q   |
|             | To Queue Name:   |              | SCHEMA_HR_APPLY_Q     |
|             | DDL (Y/N):       |              | Y                     |
|             | Transformations: |              |                       |

And finally, the Apply at STRM2:

  

| Local Sites |                      | Remote Sites      |       |
|             |                      | STRM1             | STRM2 |
| STRM2       | Apply                |                   |       |
|             | Process Name:        | SCHEMA_HR_APPLY   |       |
|             | Queue Name:          | SCHEMA_HR_APPLY_Q |       |
|             | DDL (Y/N):           | Y                 |       |
|             | Transformations:     |                   |       |
|             | Conflict Resolution: |                   |       |

The completed Streams Site Matrix appears as follows:

[Figure: the completed Streams Site Matrix for the Single-Source configuration]

Even though we will not be sending changes from STRM2 to STRM1, we still need to consider that changes can be made (whether intended or not) directly at STRM2 that can result in data conflicts and apply errors for changes from STRM1. While we don't address it in this example, you would want to implement some form of conflict resolution to avoid having to continually monitor for these conflicts (the OVERWRITE method would be recommended here). See Chapter 5 for discussion on conflict resolution.

Let's now expand this concept to show an N-Way Replication configuration (see Chapter 5 for the actual implementation). In this configuration, we are going to replicate a single table, EMPLOYEE, in the LEARNING schema. DML and DDL changes to the table can be made at either STRM1 or STRM2 and replicated to the other master site. Since data changes can be made at either site, as good DBAs we know we need conflict resolution defined for our Apply processes. Our business rules tell us that the latest change should "win" in the case of a data conflict, so we will implement maximum-time resolution (a.k.a. latest timestamp resolution for all you Advanced Replication buffs).
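That maximum-time resolution would be registered at each apply site with DBMS_APPLY_ADM.SET_UPDATE_CONFLICT_HANDLER. A sketch, assuming the EMPLOYEE table carries a TIME_STAMP column maintained with each change (the column names here are hypothetical):

    -- Sketch: latest-timestamp ("MAXIMUM") resolution for LEARNING.EMPLOYEE.
    -- Run at both STRM1 and STRM2; TIME_STAMP must be kept current by the
    -- application or a trigger for the comparison to be meaningful.
    DECLARE
      cols DBMS_UTILITY.NAME_ARRAY;
    BEGIN
      cols(1) := 'FIRST_NAME';
      cols(2) := 'LAST_NAME';
      cols(3) := 'TIME_STAMP';
      DBMS_APPLY_ADM.SET_UPDATE_CONFLICT_HANDLER(
        object_name       => 'LEARNING.EMPLOYEE',
        method_name       => 'MAXIMUM',
        resolution_column => 'TIME_STAMP',
        column_list       => cols);
    END;
    /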

Our Streams Site Matrix appears as follows:

[Figure: the completed Streams Site Matrix for the N-Way Replication configuration]

The Streams Site Matrix provides a simple yet elegant visual aid. From here you can develop your level-of-effort and resource requirements, detail-level blueprints, and an understanding of what you need to implement your streamed environment.
