Monitoring Replication

After replication is up and running, it is important for you to monitor the replication and see how things are running. You can do this in several ways, including SQL statements, SQL Enterprise Monitor, and Windows Performance Monitor. You are interested in the agent's successes and failures, the speed at which replication is done, and the synchronization state of tables involved in replication. Other things to be watched for are the sizes of the distribution database, the growth of the subscriber databases, and the available space on the distribution server's snapshot working directory.

SQL Statements

One way to look at the replication configuration and do things like validate row counts is to use various replication stored procedures.

These include the following:

  • sp_helppublication— Info on the publication server

  • sp_helparticle— Article definition information

  • sp_helpdistributor— Distributor information

  • sp_helpsubscriberinfo— Subscriber server information

  • sp_helpsubscription— The subscription information

  • sp_replcounters— Shows the activity of this replication session. You can see the volume of traffic and the throughput here

    exec sp_replcounters
    go
    

    yields

    database repl_trans rate trans/sec latency (sec) etc.
    Northwind 110    71.428574        2.1830001
    

For actual row count validation:

  • sp_publication_validation— Goes through and checks the row counts of the publication and subscribers

exec sp_publication_validation @publication = N'Northwind2Northwind'
go

yields

Generated expected rowcount value of 53 for Territories.
Generated expected rowcount value of 58 for Suppliers.
Generated expected rowcount value of 6 for Shippers.
Generated expected rowcount value of 4 for Region.
Generated expected rowcount value of 154 for Products.
Generated expected rowcount value of 1690 for Orders.
Generated expected rowcount value of 2155 for Order Details.
Generated expected rowcount value of 49 for EmployeeTerritories.
Generated expected rowcount value of 18 for Employees.
Generated expected rowcount value of 95 for Customers.
Generated expected rowcount value of 0 for CustomerDemographics.
Generated expected rowcount value of 0 for CustomerCustomerDemo.
Generated expected rowcount value of 16 for Categories.

Another way to monitor replication is to look at the actual data that is being replicated. To do this, first run a SELECT count (*) FROM tblname statement against the table where data is being replicated. Then verify directly if the most current data available is in the database. If you make a change to the data in the published table, do the changes show up in the replicated tables? If not, you might need to investigate how replication was configured on the server.

SQL Enterprise Manager

As you have seen, Enterprise Manager provides considerable information about the status of replication. Most of this is available via the Replication Monitor branch. In Replication Monitor, you can see the activity for publishers, agents, and the ability to configure alerts:

  • Publishers— This folder contains information about publishers on the machine. By selecting any publisher on the machine, you can view information about any computers that have subscribed to the publication. This will tell you the current status and the last action taken by the subscriber.

  • Agents— The Agents folder contains information about the different agents on the machine. By choosing any Agents folder, you can see the current status of that agent. Selecting an agent and double-clicking it will display the history of that agent.

  • Replication Alerts— The Replication Alerts folder allows you to configure alerts to fire in response to events that occur during replication. These can activate when errors occur, or in response to success messages.

Through Enterprise Manager and Replication Monitor, you also can invoke the validate subscriptions processing to see whether replication is in sync. This is only available from the Distributor Server branch in Enterprise Manager. Under the Publishers branch of Replication Monitor, simply right-click on the publication you wish to validate. You will see the menu option to Validate Subscriptions. You can validate all subscriptions or just a particular one, as shown in Figure 7.41. After this has been invoked, the validation results can be viewed via the distribution agent history (also shown in Figure 7.41).

Figure 7.41. Validate subscriptions to ensure that publisher and subscriber are in sync.


The Performance Monitor

You also can use Windows NT Performance Monitor to monitor the health of your replication scenario. Installing SQL Server adds several new objects and counters to Performance Monitor:

  • SQLServer:Replication Agents— This object contains counters used to monitor the status of all replication agents, including the total number running.

  • SQLServer:Replication Dist— This object contains counters used to monitor the status of the distribution agents, including the latency and the number of transactions transferred per second.

  • SQLServer:Replication Logreader— This object contains counters used to monitor the status of the log reader agent, including the latency and the number of transactions transferred per second.

  • SQLServer:Replication Merge— This object contains counters used to monitor the status of the merge agents, including the number of transactions and the number of conflicts per second.

  • SQLServer:Replication Snapshot— This object contains counters used to monitor the status of the snapshot agents, including the number of transactions per second.

Backup and Recovery in a Replication Configuration

Something that will reap major benefits for you after you have implemented a data replication configuration is a replication-oriented backup strategy. You must realize that the scope of data and what you must back up together has changed. In addition, you must be aware of what the recovery timeframe is and plan your backup/recovery strategy for this. You might not have multiple hours available to you to recover an entire replication topology. You now have databases that are conceptually joined, and you might need to back them up together as one synchronized backup.

When backing up environments, back up the following at each site:

  • Publisher (published db, msdb, and master)

  • Distributor (distribution db, msdb, and master)

  • Subscribers (Optional subscriber db)

Maintaining a regular backup of the publisher databases and leveraging the SQL Server replication's built-in ability to reinitialize one or more subscriptions on-demand provides a simple recovery strategy.

You could further limit regular backups to your publication data bases and rely on SQL Server replication scripting to provide a method for reestablishing replication if you need to restore the entire replication environment.

Another strategy includes backing up only the publisher and the distributor as long as the publisher and distributor are synchronized. This strategy allows you to restore a replication environment completely. Backing up a subscriber is optional but can reduce the time it takes to recover from a failure of the subscriber.

Always make copies of your replication scripts and keep them handy. At a very minimum, keep copies at the publisher and distributor and one more location, such as at one of your subscribers. You will end up using these for recovery someday.

Don't forget to back up master and msdb when any new replication object is created, updated, or deleted.

Back up the publication database after

  • Creating new publications.

  • Altering any publication property including filtering.

  • Adding articles to an existing publication.

  • Performing a publication-wide re-initialization of subscriptions.

  • Altering any published table using a replication schema change.

  • Performing on-demand script replication.

  • Cleaning up merge metadata (running sp_ mergecleanupmetadata).

  • Changing any article property including changing the selected article resolver.

  • Dropping any publications.

  • Dropping any articles.

  • Disabling replication.

Back up the distribution database after

  • Creating or modifying replication agent profiles.

  • Modifying replication agent profile parameters.

  • Changing the replication agent properties (including schedules) for any push subscriptions.

Back up the subscription database after

  • Changing any subscription property.

  • Changing the priority for a subscription at the publisher.

  • Dropping any subscriptions.

  • Disabling replication.

Back up the msdb system database after

  • Enabling or disabling replication.

  • Adding or dropping a distribution database (at the distributor).

  • Enabling or disabling a database for publishing (at the publisher).

  • Creating or modifying replication agent profiles (at the distributor).

  • Modifying any replication agent profile parameters (at the distributor).

  • Changing the replication agent properties (including schedules) for any push subscriptions (at the distributor).

  • Changing the replication agent properties (including schedules) for any pull subscriptions (at the subscriber).

Back up the master database after

  • Enabling or disabling replication.

  • Adding or dropping a distribution database (at the distributor).

  • Enabling or disabling a database for publishing (at the publisher).

  • Adding the first or dropping the last publication in any database (at the publisher).

  • Adding the first or dropping the last subscription in any database (at the subscriber).

  • Enabling or disabling a publisher at a distribution publisher (at the publisher and distributor).

  • Enabling or disabling a subscriber at a distribution publisher (at the subscriber and distributor).

In general, you will find that even when you walk up and pull the plug on your distribution server, publication server, or any subscribers, automatic recovery works well to get you back online and replicating quickly without human intervention.

Alternate Synchronization Partners

Similar to log shipping, using alternate synchronization partners during merge replication is an option that supports continuous synchronization in the event of a failure of the primary publisher. Specifying an alternate synchronization partner for publications defined at a publisher provides a method to synchronize data changes to replicated tables with servers other than the publisher where a subscription originated. Synchronizing with alternate synchronization partners provides the ability for a subscriber to synchronize data even if the primary publisher is unavailable and increases availability greatly.

The following are requirements when using alternate synchronization partners:

  • The feature is available only with merge replication.

  • The alternate synchronization partner must have the data and schema required by the subscription.

  • It is recommended that the publication created on the alternate server be an exact copy of the publication created on the original publisher.

  • The publication properties must specify that subscribers can synchronize with other publishers.

  • The subscriber must be enabled at the alternate synchronization partner so that the subscriber can synchronize data with that publisher.

  • A subscription with the same attributes as the subscription at the primary publisher will be added automatically at the alternate synchronization partner.

This approach is mentioned because of its high availability capabilities. However, in practical experience, most companies have found that the added complexity of merge replication is more self-defeating than practical.

Worldwide Sales and Marketing—Scenario #2 with Data Replication

As we defined in Chapter 1, “Essential Elements of High Availability,” this common business scenario is about a major chip manufacturer that has created a highly successful promotion and branding program, which results in billions of dollars in advertising dollars being rebated back to their worldwide sales channel partners. These sales channel partners must enter in their complete advertisements (newspaper, radio, TV, other) and be measured against ad compliance and logo usage and placements. If a sales channel partner is in compliance, they will receive up to 50% of the cost of their advertisement back from this chip manufacturer. There are three major advertising regions: Far East, Europe, and North America. Each region produces a huge daily influx of new advertisement information that is processed on the primary server in North America. Then for the rest of each day, the regions review compliance results and run other types of major sales reports for their region only. As you might also recall, application mix is approximately 75% online entry of advertisement events and 25% regional management and compliance reporting.

Availability:

  • 24 hours per day

  • 7 days a week

  • 365 days a year

Planned Downtime: 3%

Unplanned Downtime: 2% will be tolerable

Turns out that the regional reporting and query processing is the most important (most critical) part of this application. Each region must be able to query the compliance and advertisement information in this database as they interact with their regional channel partners and provide compliance status and rebate information rapidly (including dollar figures). This often requires specialized reports that span numerous advertisement events and impact very large amounts of money. The online data entry of the advertisement information is done around the clock by third-party data entry companies and must be done directly on the central database (behind extensive firewalls and security). Performance of the OLTP portion of this application must not be sacrificed in any way.

As you can see in Figure 7.42, an optimal HA solution was chosen that basically consists of five servers servicing a global set of users. Each separate server will have basic hardware/disk redundancy, one SQL Server instance, and will be configured with SQL Server's robust “transactional” data replication implementation. This will create three regional reporting images of the primary marketing database (MktgDB). These distributed copies will alleviate the major reporting burden against the OLTP (primary database) and any one of them can serve as a warm standby copy of the database in the event of a major database problem at headquarters. Overall, this distributed architecture is easy to maintain and keep in sync, and is highly scalable. To date, there has never been a major failure that required a complete switch over to one of the subscribers. But, they are in position to do this if ever required. In addition, the performance of each reporting server has been so outstanding that each region has brought in business objects and built their own unique reporting front-end to this data.

Figure 7.42. Scenario #2 HA solution using transactional data replication.


From an ROI point of view, the Sales and Marketing group calculated theirs as follows:

We had previously estimated the total incremental costs to be in the range of between $10K and $100K, which included

  • Three new two-way servers (with 4GB RAM, local SCSI disk system RAID 10—15 new drives total) at $10K per server (one for the North American Reporting/spare server, one for Europe, and one for the Far East).

  • Three MS Windows 2000 Server licenses ~ $1.5K per server.

  • Two days of additional training costs for system admin personnel ~ $5K.

  • Four new SQL Server licenses (SQL Server 2000—the remote distributor and three new subscribers) at $5K per server.

There are no special hardware or SCSI controllers needed to implement data replication. The total incremental cost to build this high availability solution is approximately $89,500 (total costs—as follows).

Now, let's work through the complete ROI calculation for these incremental costs along with the cost of downtime:

  1. Maintenance cost (for a one-year period):

    • $5K (estimate)— Yearly system admin personnel cost (additional time for training of these personnel)

    • $24.5K (estimate)— Recurring software licensing cost (of additional HA components; [3] OS + [4] SQL Server 2000)

  2. Hardware cost:

    • $30K hardware cost (of the additional HW in the new HA solution)

  3. Deployment/assessment cost:

    • $20K deployment cost (develop, test, QA, production implementation of the solution)

    • $10K HA assessment cost

  4. Downtime cost (for a one-year period):

    • If you kept track of last year's downtime record, use this number; otherwise produce an estimate of planned and unplanned downtime for this calculation. We estimated the cost of downtime/hour to be $5K/Hour.

    • Planned downtime cost (revenue loss cost) = Planned downtime hours × cost of hourly downtime to the company:

      1. 3% (estimate of planned downtime percentage in one year) × 8760 hours in a year = 262.8 hours of planned downtime

      2. 262.8 hours × $5K/hr (hourly cost of downtime) = $1,314,000/year cost of planned downtime.

    • Unplanned downtime cost (revenue loss cost) = Unplanned downtime hours × cost of hourly downtime to the company:

      1. 2% (estimate of unplanned downtime percentage in one year) × 8760 hours in a year = 175.2 hours of unplanned downtime

      2. 175.2 hours × $5K/hr (hourly cost of downtime) = $876,000/year cost of unplanned downtime.

ROI totals:

  • Total costs to get on this HA solution = $89,500 (for the initial year and roughly $24.5K/year for subsequent years)

  • Total of downtime cost = $2,190,000 (for the year)

    The incremental cost is about 4% of the downtime cost for one year. In other words, the investment of this particular HA solution will pay for itself in 18.9 hours! This is a huge ROI in a very short amount of time. And provides a great scalable and flexible platform to grow on.

After building this HA solution, the uptime goal was achieved easily. Occasionally, there were some delays in resyncing the data at each regional site (subscribers). But, overall, the users were extremely happy with performance and availability. This is a great example of knowing what your HA options are and how to minimize hardware, software, and maintenance costs. Exceptional!

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.216.118.40