11.7. Best practice considerations: high availability

As with performance tuning where there will always be a bottleneck somewhere, each high-availability plan typically has its own weak points. On its own, database mirroring won't protect against poor maintenance practices and corrupted backups; it should therefore be seen as one component of a much broader high-availability plan.

  • Service level agreements should be considered your binding contract. You wouldn't pay a builder to build your house without specific instructions and contract conditions; likewise, you can't effectively design and administer a highly available database system without a set of service level agreements in place. Know what they are and have confidence that they're achievable through well-thought-out and simulated disaster-recovery plans.

  • During the process of developing service level agreements, prepare option papers that clearly list alternate options, costs, and corresponding service levels that can be achieved. Such papers assist the decision-making and budgeting process, and set expectations at the appropriate level for both customers and management.

  • To prevent unexpected problems leading to system outages (therefore impacting availability targets), ensure appropriate development and test systems exist to assess the impact of database changes before they reach production; by the time a change makes its way to production, everyone responsible for the change should have complete confidence that it won't have any adverse effects.

  • Database object definition and modification scripts should be stored in a source control tool in order to support reproducing a database at a defined version with options to roll forward changes as required to achieve a desired state.

  • Aside from resource capacity such as the amount of RAM or CPU power, development, test, and production environments should be as identical as possible to minimize unexpected problems. Items that should be the same include collation, CPU platform, SQL version/edition, and service pack/hotfix level.

  • For each production environment, a corresponding load-testing environment should be available and ideally configured identically to production, or at a scale such that the performance differences are well understood. Such environments, in combination with a load-testing tool, are critical in ensuring changes will work under production load.

  • Consider the importance of accurate test data in development/test environments. If possible, provide complete copies of production data, or obfuscate where appropriate for security. Consider cloning a database's histograms and statistics to enable more accurate query execution plans for development databases if production-sized databases can't be used.

  • Use a schema comparison tool such as Visual Studio Team System Database Edition (aka Data Dude) or a third-party tool. These tools are invaluable in troubleshooting schema difference problems.

  • Document each database change, including details such as the date and time of the action, the location of the script, who performed the change, change output or log files, and so forth, and where possible, have undo scripts for each change. Use database backups or snapshots before significant changes if undo scripts aren't feasible.

  • Unless the organization has invested in infrastructure capable of achieving five nines (99.999%) availability, it's crucial that planned outages be scheduled on a regular basis in order to manage user (and management) expectations and plan for critical maintenance operations such as service pack upgrades.

  • Consider combining high-availability solutions to minimize the limitations of each. For example, use clustering to protect instances and databases as a group together with mirroring or log shipping critical databases offsite for geographical protection.

  • Use an independent monitoring instance in a log shipping configuration—that is, don't use the primary or secondary instance as the monitor.

  • If using transaction log shipping on a cluster, set up the file share for the transaction logs as a clustered resource. This will ensure the file share survives a cluster node failover, enabling log shipping to continue running.

  • The default mirroring configuration is high safety with automatic failover (if a witness server is included). Prior to production implementation, simulate various network latencies in a test lab to measure the impact on transaction performance and the chances of unwanted failovers. Starting with high performance (asynchronous) mirroring is perhaps the best approach until the environment-specific nuances of mirroring are well understood.

  • Allow plenty of free disk space for transaction logs; in situations where the mirror database is unavailable, transactions can't be removed from the principal's transaction log until delivered to the mirror.

  • Like a clustering implementation, ensure both mirroring partners are configured identically (hardware capacity, SQL Server edition and version, collation, and so forth), and make sure the additional load from a failover condition can be handled by the mirror partner.

  • To enable applications to continue working after a mirroring failover, create the same logins on both the principal and mirror instance and use the same SID values (via the SID = clause of the create login statement).

  • Avoid using a witness instance for high performance (asynchronous) mirroring. Unlike high-safety (synchronous) mirroring, the witness instance can't initiate automatic failure in high-performance mode, which means the major benefit of using a witness is unavailable. On the downside, a witness can introduce a "false positive," whereby the database is taken offline if the principal loses network contact with the mirror and witness. In essence, you get no benefit while keeping the downside.

  • Before commissioning a production database mirroring solution, observe application connection behavior with test failovers to ensure logins (and other aspects of system behavior) continue working on the mirror instance.

  • Avoid reconfiguring mirroring endpoints while in use. Doing so may introduce an unwanted failover.

  • Consider pausing high-safety (synchronous) mirroring during periods of high activity such as large bulk loads and index maintenance. Doing so will speed up the processing time of these actions (at the price of running exposed). Before doing so, ensure enough transaction log disk space is available on the partner servers.

  • In a two-site high-safety (synchronous) mirroring solution used for disaster-recovery purposes, place the witness instance on the side of the principal instance. This avoids a common situation whereby the link between the sites goes down, and the principal instance loses quorum (in other words, loses contact with both the mirror and witness). In this case, the principal instance will take its database offline. By placing the witness with the principal, losing the network link will enable the principal to continue running, albeit in an exposed mode.

  • On a server containing multiple mirrored databases, ensuring all of the databases fail to the same mirror server will simplify application connection processes. Further, ensuring all mirrored databases on a server are either mirrors or principals will enable rolling upgrades (manual failovers) to proceed in a smooth manner.

  • Carefully monitor the CPU usage on servers participating in high safety with automatic failover mirroring. Servers with very high CPU usage may fail to respond to ping requests within the nominated timeout period, and may be subjected to unwanted failover events.

  • The minimum timeout period for a mirroring session should be 10 seconds or greater. A value less than this may lead to false positive failover conditions during temporary network or server conditions.

  • If a mirroring principal is set up as a clustered instance, consider adjusting the mirroring session timeout value to greater than the default 10 seconds. Doing so will prevent mirroring failover during a clustering failover. Typical cluster failovers take up to 90 seconds to complete, so adjusting the mirroring timeout to at least this value is recommended. Use ALTER DATABASE x SET PARTNER TIMEOUT 90 or something similar.

  • If combining mirroring and log shipping together in one solution, set up mirroring first, and make sure both mirror servers are configured to use the same share for transaction log backups.

  • Review the index maintenance strategy when using database mirroring. Given the requirement for a mirrored database to be in full recovery mode, running unnecessary full index rebuilds may saturate the network link between the mirroring partners for little overall performance gain. While log stream compression helps alleviate the transaction load, using a lighter index maintenance technique, covered in chapter 13, will reduce the network usage in a mirrored solution.

  • Use snapshots on the mirror database with caution. The additional load of the copy on write process may lead to a significant redo queue, lengthening failover time. If the mirror database needs to be available for queries, consider other high-availability solutions such as transaction log shipping with standby restore mode, peer-to-peer transactional replication, or shared scalable databases.

  • Finally, if using synchronous mirroring with automatic failover, consider the ability of the client connections to reconnect to the mirror. If you're not using the SQL Server Native Client (SNAC), custom reconnection logic is required. If the client can't detect connection failure and reconnect, the value of mirroring as a high-availability solution is significantly diminished.

Additional information on the best practices covered in this chapter can be found online at http://www.sqlCrunch.com/HA.

One of the benefits of database mirroring in SQL Server 2008 is the ability of mirror partners to exchange pages to recover from corruption, a topic we'll focus on in the next chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.103.204