C H A P T E R  14

Fault Tolerance

bwin Interactive Entertainment AG is a company that operates platforms for sports betting, casino games, and games of skill. The sports betting infrastructure supports about a million bets per day and hosts 100 terabytes of information across 100 instances of SQL Server. During peak loads, SQL Server handles more than 30,000 database transactions per second. Imagine how many angry customers there would be if the system went down after they placed bets. Such an outage would ruin the company’s credibility in the industry. Designing fault tolerance into your environment is critical no matter what database platform you support or what industry you work in. There is no doubt that bwin’s solution involved highly available, fault-tolerant technologies. You will learn about these technologies in this chapter. If you are interested in reading more about the bwin case study, check out the following URL: www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000001470.

Uptime is defined as the amount of time a computer system runs without failure. For mission-critical database applications, keeping the systems running without interruption is critical. As DBAs, we are always planning for disaster, because over time, everything fails. For example, hard disks have mechanical parts and will eventually fail. Not too long ago, hard disk manufacturers proudly wrote the MTBF (mean time between failures) on their hard drives to distinguish the more expensive models. To mitigate the hard drive failure problem, we can add more hard drives and employ RAID techniques like disk mirroring or disk striping. If a particular disk fails, it can be swapped out, and our data is still available.

The mechanical spin of a disk is not the only thing to be concerned with in a datacenter. Over time, network cards, routers, switches, and anything electronic can fail for many reasons. For example, a poor solder joint on a board can loosen because of the natural heating and cooling of the environment. Although most datacenters are climate controlled, one more risk still trumps our fine mitigation work: tornadoes, floods, earthquakes, and other natural disasters can instantly destroy a datacenter. For this reason, datacenters are usually replicated out to remote sites geographically far from the main datacenter. This geographical separation creates more challenges for DBAs and system administrators. SQL Server can provide fault tolerance with remote datacenters by leveraging Windows Server Failover Clusters. These kinds of remote datacenter clusters are called geoclusters or stretch clusters and will be discussed in more detail later in this chapter.

Defining a Service Level Agreement (SLA)

If you randomly ask DBAs at a few companies how they manage their databases, chances are you will get a variety of answers in return. Some have the DBAs solely focus on the actual database and its performance; they are hands off of the backend storage. Some DBAs are very much involved in the SAN architecture and understand its limitations. Despite these slight variations on the role of DBA, one concept is common among everyone in the IT world—uptime. The IT organization commits to the business a guarantee for the business applications (and databases) to be available, or up. This commitment is usually referred to as a service level agreement (SLA).

The “Nines”

When referring to the amount of uptime, or downtime, people refer to the nines. For example, if your customer would only allow about 8 hours of downtime a year, this would be considered three nines (99.9%). In other words, this application has to be up 99.9% of 365 days. Table 14-1 shows a list of common nines and their corresponding allowable downtimes per year.

Table 14-1. Common Nines and Their Corresponding Allowable Downtime per Year

Uptime                  Allowable Downtime per Year
99% (two nines)         3.65 days
99.9% (three nines)     8.76 hours
99.99% (four nines)     52.56 minutes
99.999% (five nines)    5.26 minutes

In general, the more nines you are asked to provide, the more expensive the solution will be. Most organizations I have encountered settle on three nines, because the cost from that point upward is substantial.

Other Metrics

When thinking about SLAs, there are other discussions to have and decisions to make. First, you should think through the current size of your databases, their expected growth patterns, and the workloads for these databases. The workload analysis should include how much transaction log is created during peak times. The Performance Data Collector feature within SQL Server can help you capture historical server performance. You can read more about this feature in Chapter 13.

Within the topic of uptime, you may hear two additional metrics used to define SLAs. The first, recovery point objective (RPO), specifies how much data can be lost. In the database world, think of a solution where your disaster recovery plan is to perform log shipping to a remote server. If you log ship every 5 minutes and your RPO is 5 minutes, your log shipping will not satisfy this requirement, because you must also account for the time it takes to copy the log backup file to the remote server. Thus, the RPO would have to be more than 5 minutes for this solution to satisfy it.

The next metric is recovery time objective (RTO), which defines how much time is allowed for restoring the database in the event of a complete failure. “Complete failure,” in this case, means recovering from a backup.

When databases go down, the outage can be planned or unplanned. Planned outages are usually known in advance, and the end users’ expectations are set accordingly. These outages can occur because of deployment of operating system or database system patches, upgrades, or migrations. An unplanned outage is a failure that occurs generally without warning. It can occur because of hardware or software failure or user error. Users can cause outages to database applications by accidentally deleting data needed by the application to function properly. It is a best practice to grant users the fewest privileges possible. If a user needs to perform a specific action that requires elevated privileges, consider writing the functionality within a stored procedure, executing the stored procedure as an administrator, and granting the user only EXECUTE permission on that stored procedure.
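This least-privilege pattern can be sketched in T-SQL as follows. The procedure name DeleteStaleOrders is a hypothetical example; the Sales.Orders table and the user Bob reuse the sample objects created later in this chapter.

```sql
-- Runs with the owner's permissions, so the caller does not
-- need DELETE permission on the table itself
CREATE PROCEDURE Sales.DeleteStaleOrders
    @cutoff DATETIME
WITH EXECUTE AS OWNER
AS
BEGIN
    DELETE FROM Sales.Orders WHERE order_date < @cutoff;
END
GO

-- The user is granted only the ability to run the procedure
GRANT EXECUTE ON Sales.DeleteStaleOrders TO Bob;
```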

Planning an effective recovery plan ahead of time eases the work necessary to recover from failure. SQL Server has a number of features that help you architect a highly available and fault-tolerant solution.

High Availability Features in SQL Server

Architecting a highly available SQL Server is not a simple click of a check box in SSMS. There are a number of different features and each one has its own pros and cons for a given situation. In the following sections, we will examine each of these major features.

Backup and Restore

Backup and Restore has been around since the first versions of SQL Server. As a DBA, being able to backup and restore your database is a key skill to master. From a high availability standpoint, database backups are usually the last line of defense against total data loss. It is critical that they are a part of your disaster recovery plan. This book devotes a chapter to each of these topics: Chapter 8 for backup and Chapter 9 for restore. Please refer to these chapters for an in-depth discussion of these topics.

Log Shipping

There are three types of database backups that you may incorporate into your disaster recovery plan: full, differential, and transaction log. As you learned in Chapter 8, a full database backup is a point-in-time backup of all objects and data that reside in the database. A differential backup is a backup of the changes made to the database since the last full database backup. In both of these backup types, it is important to note that, even though we are backing up the contents of the database within the data files, the information that is stored in the transaction log of the database is not backed up. The third backup type is the transaction log backup. A transaction log backup does what its name implies; it backs up the transaction log of a particular database. Having transaction log backups in addition to full or differential backups gives us the most granular point-in-time restoration option available.

Log shipping is the automated process of continually backing up the transaction log from the primary server, copying the backup to a secondary or remote server, and restoring the database in either STANDBY or NORECOVERY mode. Using NORECOVERY puts the secondary servers in a state where no users can connect to that specific database. The STANDBY mode allows users connections, but these connections will be dropped when a new transaction log is applied to the database. You can configure this mode when you configure log shipping in SSMS.
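As a sketch of the restore step on the secondary (the file paths here are assumptions for illustration), the two modes map to the NORECOVERY and STANDBY options of the RESTORE LOG statement:

```sql
-- NORECOVERY: the database stays unavailable, ready for the next log
RESTORE LOG UsedCars
FROM DISK = 'C:\Backup\UsedCars.trn'
WITH NORECOVERY;

-- STANDBY: read-only access between restores; undo information
-- is kept in a separate file so further restores can continue
RESTORE LOG UsedCars
FROM DISK = 'C:\Backup\UsedCars.trn'
WITH STANDBY = 'C:\Backup\UsedCars_undo.dat';
```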

Configuration

In this example we have two instances of SQL Server installed: ROB-DENALI-1 and ROB-DENALI-1\INST2. We have a database called UsedCars that we will create and use to configure log shipping. First, let’s connect to the first instance and create our database.

To create the UsedCars database connect to the default instance via SSMS and open a new query editor window. Type the following code:

USE MASTER
GO
CREATE LOGIN BobLogin WITH PASSWORD='pass@word1'
GO
CREATE DATABASE UsedCars
GO
USE UsedCars
GO
CREATE USER Bob FOR LOGIN BobLogin
GO
CREATE SCHEMA Sales
AUTHORIZATION Bob
GO
CREATE SCHEMA Product
AUTHORIZATION Bob
GO
CREATE TABLE Product.Inventory
(car_id INT NOT NULL PRIMARY KEY,
car_make VARCHAR(50) NOT NULL,
car_model VARCHAR(50) NOT NULL,
car_year SMALLINT NOT NULL)
GO
CREATE TABLE Sales.Orders
(order_id INT NOT NULL PRIMARY KEY,
order_date DATETIME NOT NULL,
order_carsold INT REFERENCES Product.Inventory(car_id),
order_saleprice SMALLMONEY NOT NULL)
GO
INSERT INTO Product.Inventory VALUES (1,'Saab','9-3',1999),
(2,'Ford','Mustang',2003),(3,'Nissan','Pathfinder',2005)
GO

Note: This is the same code used to create the UsedCars database in Chapter 4, so if you went through the example in that chapter, you may already have this database created.

To configure log shipping in SSMS, right-click the UsedCars database, and select Ship Transaction Logs from the Tasks context menu. This will launch the UsedCars Database Properties dialog with the Transaction Log Shipping panel open. This context menu is just a shortcut to the panel in the Database Properties dialog, shown in Figure 14-1.


Figure 14-1. Transaction Log Shipping panel in Database Properties dialog

When the “Enable this as a primary database in a log shipping configuration” check box is checked, the Backup Settings button will be enabled. There are not a lot of default settings, so to enable transaction log shipping, you will have to supply a lot of answers to the dialog. The Backup Settings dialog is shown in Figure 14-2.


Figure 14-2. Transaction Log Backup Settings dialog

This dialog requires the following information:

  • “Network path to backup folder”: This location needs to be a network share so that the secondary servers can connect and copy the transaction log backup. You need to make sure the SQL Server Agent account on the secondary machine has read permissions on this share. Alternatively, you can define a proxy account on the secondary server so that you do not have to use the SQL Server Agent account. For this particular configuration, we are log shipping between two instances on the same machine, and our SQL Server Agent account is running as the Network Service account, so we do not have to worry about any additional permissions. To create a share, you can use Windows Explorer or open a command prompt and type NET SHARE Backup=C:\Backup.
  • “Local path to the backup folder”: If the backup folder is on the same server as the primary server, type in the path. In our case, it’s C:\Backup.

As soon as those two pieces of information are entered, the OK button is enabled. There are default settings for the other options in this dialog that are important to note. Backing up the transaction log, in most cases, doesn’t take much disk space. However, over time, accumulated transaction log backups can consume a lot of space. By default, a SQL Server Agent job will delete transaction log backup files older than 72 hours. Also, you will be alerted by default if, for any reason, a transaction log backup fails to happen within an hour.

The Schedule button will launch the schedule dialog of the SQL Server Agent. By default, the job will execute every 15 minutes.

Starting in SQL Server 2008, database backups can be compressed. The Transaction Log shipping feature leverages the same BACKUP DATABASE statement and can thus compress the transaction logs before they are written to the file share. The “Set backup compression” drop-down will allow you to either force compression or just do whatever is already defined at the server instance.
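In effect, the log shipping backup job issues an ordinary BACKUP LOG statement. A sketch of what forcing compression looks like (the path is an assumption):

```sql
-- Compress this log backup regardless of the instance-level
-- "backup compression default" setting
BACKUP LOG UsedCars
TO DISK = 'C:\Backup\UsedCars.trn'
WITH COMPRESSION;
```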

Clicking OK on this dialog will enable us to add secondary servers. We can add a secondary server by clicking the Add button under the “Secondary server and instances” grid control. Adding a secondary server launches the dialog shown in Figure 14-3.


Figure 14-3. The Secondary Database Settings dialog’s initialize panel

To add a secondary database, connect to the SQL Server instance by clicking the Connect button. This will launch a connection dialog. Once connected to the secondary server, we can enter a name for the database in the Secondary Server combo box.

The Secondary Database Settings dialog has three panels: Initialize Secondary Database, Copy Files, and Restore Transaction Log. In the Initialize Secondary Database panel, shown in Figure 14-3, you can tell SSMS to take a full backup now or use an existing backup. In order for transaction log shipping to start, a full backup of the database needs to be restored on the secondary.

Transaction Log Shipping is a three-step process: backup, file copy, and restore. Details about the File Copy process are defined in the Copy Files panel shown in Figure 14-4.


Figure 14-4. Secondary Database Settings dialog’s Copy Files panel

In this panel, we define the destination folder where the secondary server will restore the transaction logs. We can also specify the time to automatically delete files and schedule the job that performs the file copy itself.

In the Restore Transaction Log panel, shown in Figure 14-5, we can specify how we want the secondary database to act with regard to clients.


Figure 14-5. Secondary Database Settings dialog’s Restore Transaction Log panel

If we specify “No recovery mode”, the database will not be available to clients on the secondary server. If we select “Standby mode”, users can make read-only connections to the database on the secondary server.

We can make one more configuration decision regarding transaction log shipping, and that is whether we want to have a monitor server. By default, SQL Server Agent job information is stored on the primary and secondary servers. A monitor server stores the same information in a central place for easy reporting. In the scenario where we have one secondary, it’s not that useful. However, if we have multiple secondaries and multiple transaction log shipping sessions, a monitor server is useful. To configure a monitor server, check the “Use a monitor server instance” check box, and click the Connect button to configure the monitor server. The monitor server can be any SQL Server instance; it does not have to be a dedicated installation of SQL Server.

Now that we have answered all the questions, we can go ahead and start transaction log shipping by clicking the OK button.

A few tables and stored procedures provide status and configuration information on your log shipping configuration. More information can be found in the article, “Monitoring Log Shipping” in SQL Server Books Online located at http://msdn.microsoft.com/en-us/library/ms190224.aspx.
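For example, the system stored procedure sp_help_log_shipping_monitor and the monitoring tables in msdb can be queried directly:

```sql
-- Summarized status of every log shipping database on this server
EXEC msdb.dbo.sp_help_log_shipping_monitor;

-- Backup history for the primary database
SELECT primary_database, last_backup_file, last_backup_date
FROM msdb.dbo.log_shipping_monitor_primary;
```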

SQL Server also has an out-of-the-box report available to view the status of your transaction log shipping jobs. To access this report, select the Transaction Log Shipping Status report from the Standard Reports menu of the Reports menu of the SQL Server instance context menu. A sample report is shown in Figure 14-6.


Figure 14-6. Sample Transaction Log Shipping Status report showing an alert

Pros and Cons

One of the key advantages of log shipping is that you can have multiple secondary servers. Before SQL Server 2012 and AlwaysOn Availability Groups, there wasn’t an easy way to replicate across more than one secondary. The performance impact is relatively minor, because it’s only that of a database backup on the production server.

There are a few cons with transaction log shipping. Since we are shipping a single database, it does not automatically include server-level objects like SQL Server logins and SQL Server Agent jobs that may be related to the database being shipped. For applications that leverage multiple databases, this solution may not be the best in a failover situation. Also, there is no concept of automatic failover, since all we are really doing is creating SQL Server Agent jobs to do a database backup, file copy, and database restore.
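Because failover is manual, bringing a secondary online amounts to a final restore. A sketch, assuming all available log backups have already been applied to the secondary (the tail-log backup path is a hypothetical example):

```sql
-- On the primary, if it is still reachable, capture the tail of the log
BACKUP LOG UsedCars
TO DISK = 'C:\Backup\UsedCars_tail.trn'
WITH NORECOVERY;

-- On the secondary, after restoring that tail, recover the database
-- and open it to users
RESTORE DATABASE UsedCars WITH RECOVERY;
```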

Replication

When the scope of the data that we want to replicate is almost the size of the database, we use features like log shipping or database mirroring to ensure we have two or more copies of the data. If what you are looking for is a subset of the data to be replicated, you could use one of the three types of replication: snapshot, merge, and transactional. In a high availability scenario, we could replicate only those tables and objects that are absolutely necessary for our applications to work. Replication usually isn’t the main solution for high availability, but it is important to have a basic understanding of this technology as it is widely used for other scenarios. For an in-depth review of replication, check out the Books Online article, “SQL Server Replication” located at http://msdn.microsoft.com/en-us/library/ms151198(v=SQL.110).aspx.

In a replication solution, there are three common components: a publisher, a distributor, and a subscriber. A publisher is the source of the replicated data, and it defines one or more publications. Each publication contains one or more articles. These articles define which particular data and objects, such as stored procedures, views, and user-defined functions, should be replicated. A subscriber uses a subscription that either pushes or pulls the source data. The distributor is a database instance that keeps track of the subscriptions for various publishers. In some cases, the subscriber gets its data directly from the publisher, and in other configurations, it gets its data from the distributor. When a single instance acts as both the publisher and the distributor, it is known as a local distributor; this is a common configuration. If the distributor is on a separate database server instance, it is known as a remote distributor.

Snapshot Replication

Snapshot replication is used when refreshing the entire dataset is easier than replicating portions of it. Consider the scenario where we have an online product catalog. The database that the web page uses to get the information might be getting this data from another database instance in the manufacturing group. Since the web pages will never update the product catalog, they can receive a fresh copy of the catalog when needed.

Snapshot replication is also used as an initial step in other forms of replication. Just as a full database backup is restored on the secondary before log shipping can begin, snapshot replication can be used as the initial dataset for transactional replication.

Merge Replication

Merge replication will send only the changes made to the data at scheduled intervals. Consider the scenario where we have sales people connecting to our corporate network periodically to update the Sales database. The orders that the sales people enter are unique, so updating these within the same database doesn’t lead to many conflicts. The new updates and inserts done by the sales people are merged in with the data created by the other sales people. If a conflict does occur, there are ways to detect and resolve conflicts automatically.

Transactional Replication

Out of the three core replication topologies, transactional replication is the most popular one to implement from a high-availability standpoint. Transactional replication replicates data from the publisher to the subscriber each time a transaction completes at the publisher.

Note: If you consider transactional replication as part of your high availability story, it is important to note that there is no automatic failover of clients using any replication topology. Setup and troubleshooting of replication are also more difficult than with other technologies like database mirroring.

Database Mirroring

Database mirroring takes the concept of log shipping and builds it inside the SQL Server engine. Instead of periodically packaging up the transaction log and sending it to a secondary server, database mirroring opens a connection to the remote server and sends the transactions themselves. Since it’s built within the SQL Server engine, database mirroring can perform synchronous two-phase commits with a secondary server. Database mirroring also supports automatic failover via passing a failover partner parameter in the connection string.
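Client libraries such as ADO.NET and SQL Server Native Client expose this through the Failover Partner connection string keyword; if the principal cannot be reached, the provider retries against the mirror. A sketch using this chapter’s example instances:

```
Server=ROB-DENALI-1;Failover Partner=ROB-DENALI-1\INST2;Database=UsedCars;Integrated Security=True;
```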

Motivation and Benefits

Database mirroring resolves some of the downfalls of log shipping. The idea behind log shipping is to periodically send the transaction log out to secondary servers, so those servers have a copy of the primary database. Given the time it takes to back up the transaction log, copy the file to the secondary server, and restore the backup there, you can imagine that there is no way for us to have an exact replica on the secondary. This is one of the downfalls of log shipping; it is not possible to guarantee a transactionally consistent secondary server. In addition, it is not possible to perform an automatic failover. Upon a failure of the primary, someone would have to tell the secondary that it’s now the primary. Furthermore, applications that connect to the primary server would have to know where the secondary server is and reconnect to it. It is out of these deficiencies with log shipping that database mirroring was born.

Note: Database mirroring, similar to log shipping, provides protection at the database level. The database that is the source of the mirror is called the principal. The destination of the mirror is called the mirror. Together, this setup is called the database mirroring session. With database mirroring, you can have only one mirror in the database mirroring session. This is one of the more significant limitations of database mirroring. Some DBAs use log shipping instead when they want more than one remote copy of the database.

Modes of Operation

Database mirroring has two modes of operation: high safety and high performance. You can think of high-safety mode as synchronous and high-performance mode as asynchronous. In high-safety mode, transactions are committed on both the principal and mirror servers, which decreases the overall performance but ensures that you will always have two copies of the same data. In high-performance mode, the mirror database makes a best effort to keep up with the transactions sent from the principal. There can be a lag in this configuration. However, the performance improvements with asynchronous communication will most likely outweigh the risk of a small data loss in the event of a total failure.
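Once a mirroring session is established, you can switch between these modes on the principal with ALTER DATABASE:

```sql
-- High-safety (synchronous) mode
ALTER DATABASE UsedCars SET PARTNER SAFETY FULL;

-- High-performance (asynchronous) mode
ALTER DATABASE UsedCars SET PARTNER SAFETY OFF;
```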

There is another mode of operation that is a variation of high safety called high safety with automatic failover. This configuration is the same as high safety, except that there is another separate SQL Server instance called the witness server. The witness server’s job is to know whether the principal database is offline. It will initiate an automatic failover if it cannot connect to the principal but still has a connection to the mirror. The SQL Server instance that acts as the witness does not have to be dedicated. You can easily leverage an existing SQL Server instance for this role.

Configuring Database Mirroring

Similar to when you use log shipping, you must perform a full database backup on the principal, copy the backup to the mirror server instance, and restore the database using the NORECOVERY option of the RESTORE DATABASE statement. Once the database is restored to the mirror server, you can configure mirroring for the database by selecting Mirror from the Tasks menu of the specific database node in Object Explorer. For this example, we will mirror the UsedCars database from the default instance on ROB-DENALI-1 to the named instance, ROB-DENALI-1\INST2. On our test machine, we created a folder C:\backupdm to hold the database backup. Next, we connected to the default instance, ROB-DENALI-1, and issued the following statement using the Query Editor in SSMS:

BACKUP DATABASE UsedCars TO DISK='C:\backupdm\UsedCars.bak'

If, at this point, the SQL Server instance we are mirroring to was on a remote server, we would copy the backup file to that server. In this example, the SQL Server named instance resides on the same physical machine, so we do not need to copy the backup file.

Connect to the INST2 named instance, ROB-DENALI-1\INST2, using SSMS, and restore the UsedCars database with the following statement. Before you execute the statement, make sure that the directory C:\data exists. Create that directory if necessary.

RESTORE DATABASE UsedCars FROM DISK='C:\backupdm\UsedCars.bak'
WITH MOVE 'UsedCars' TO 'C:\data\UsedCarsInst2.mdf',
MOVE 'UsedCars_log' TO 'C:\data\UsedCarsInst2.ldf',
NORECOVERY;

If you refresh Object Explorer, you can see that the UsedCars database node is shown as “UsedCars (Restoring . . .)”. At this point, we can connect back to the default instance and select Mirror. This will launch the Database Properties dialog with the Mirroring page preselected. This dialog is shown in Figure 14-7.


Figure 14-7. Mirroring panel of the Database Properties dialog

The first thing that needs to be done is to configure security. Once you’ve done that, the other sections including the server network address and operating mode will be enabled and available for editing.

Clicking the Configure Security button will launch the Configure Database Mirroring Security Wizard. The first page will ask you if you wish to include a witness server to enable the synchronous mode with automatic failover. In this example, we will configure asynchronous mirroring, so a witness is not needed and we will select No.

The next page in the wizard, shown in Figure 14-8, will define information about our primary instance such as the TCP/IP listener port that will be used.


Figure 14-8. Configure Principal Server Instance

The default listener port is 5022 and can be changed. Database mirroring uses a feature called endpoints within SQL Server. At a high level, endpoints define an entry point into SQL Server. An endpoint defines a protocol, such as shared memory, TCP/IP, or named pipes, and a payload, such as TDS. Once an endpoint is defined, you can lock down who specifically can access that endpoint. In database mirroring, an endpoint is created, and only the SQL Server instances involved with the database mirroring session can connect to it. In this example mirroring configuration, the endpoint is given the generic name Mirroring. This can be changed to something more descriptive.
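Behind the scenes, the wizard issues something like the following on each instance (the login granted access is a hypothetical example; it would be the service account of the other partner):

```sql
-- Create a mirroring endpoint listening on TCP port 5022
CREATE ENDPOINT Mirroring
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = PARTNER);

-- Allow only the partner instance's service account to connect
GRANT CONNECT ON ENDPOINT::Mirroring TO [DOMAIN\SqlServiceAccount];
```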

The next page in the wizard will configure the mirror server instance. Similar to the way you defined the Principal Server instance in Figure 14-8, you will define a listener port on that server instance and provide an endpoint name.

Since database mirroring will define endpoints on both the principal server and mirror server instance, the wizard will next ask you for the service accounts for these instances. This wizard page is shown in Figure 14-9.


Figure 14-9. Specifying service accounts wizard page

After setting the accounts, the wizard has enough information to create the mirror. After completing the wizard, you will be prompted with a dialog box asking if you wish to start the mirroring session. If you elect to do so, you will see a dialog indicating that the mirrors are being synchronized.

Alternatively, you could script this whole process. A good article on scripting a database mirroring creation is located at www.sqlservercentral.com/articles/Database+Mirroring/72009/.
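The heart of such a script is the ALTER DATABASE ... SET PARTNER statement, run first on the mirror and then on the principal. A sketch against this chapter’s instances (the second port number is an assumption, since two instances on one machine need distinct endpoint ports):

```sql
-- On the mirror (ROB-DENALI-1\INST2): point at the principal's endpoint
ALTER DATABASE UsedCars
    SET PARTNER = 'TCP://ROB-DENALI-1:5022';

-- Then on the principal (ROB-DENALI-1): point at the mirror's endpoint
ALTER DATABASE UsedCars
    SET PARTNER = 'TCP://ROB-DENALI-1:5023';
```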

At this point, the database mirroring session should be enabled and running. There is a Database Mirroring Monitor available to see the health of your mirror. It can be accessed via the Launch Database Mirroring Monitor command under the Tasks context menu of the database node in Object Explorer.
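You can also check the session from T-SQL through the sys.database_mirroring catalog view:

```sql
-- Role, state, and safety level of each mirrored database
SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_safety_level_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```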

Note: There are a few moving parts to setting up database mirroring. If you have been following along and run into problems, they are most likely caused by security settings. You should use a domain account for the principal and mirror service accounts. Also, if you are having connectivity issues, check out the SQL Server Books Online topic “Use Certificates for a Database Mirroring Endpoint” located at the following URL: http://msdn.microsoft.com/en-us/library/ms191477(v=SQL.110).aspx. For more information on database mirroring in general, check out the Books Online section on database mirroring at the following URL: http://msdn.microsoft.com/en-us/library/ms365599.aspx.

AlwaysOn Failover Clustering

SQL Server 2012 introduces a new technology called AlwaysOn Failover Clustering. Log shipping and database mirroring work at the database level. With those technologies, you can fail over a specific database and have the other databases still online on the primary server. One of the biggest differences between those two technologies and failover clustering is that failover clustering protects the SQL Server instance itself. If a failure occurs on a clustered SQL Server, the whole instance fails over. SQL Server leverages Windows Server Failover Clustering to enable this capability, and the resulting feature set is termed AlwaysOn Failover Clustering.

In the clustering world there are active and passive nodes. Active nodes are servers that have active connections and are doing work. Passive nodes are not handling any active user workload. On an active node, the SQL Server service is started, and there are active user connections. On a passive node, the SQL Server service is stopped, and the server is waiting for a failover. A configuration with two servers, one active and the other passive, is said to be an active/passive configuration. Upon a failure of the active node, the passive node will start the SQL Server instance and allow users to make connections. This is possible because both servers share a common disk. A common disk also means a single point of failure, so in most clustering designs, the disk architecture has redundancy and fault tolerance. This shared storage must be in the form of a storage area network (SAN), iSCSI targets, or both.

The example where we have one active node and one passive node is known as a two-node cluster. SQL Server supports as many cluster nodes as Windows Server supports; with Windows Server 2008, SQL Server supports up to 16 nodes. Typically, most enterprises have between two and four nodes in a cluster. When a cluster has more than two nodes, the additional nodes remain passive and take over when the other nodes are not available.

You may be thinking that it would be a waste of electricity to keep multiple passive nodes running and essentially not doing any work. This is the case, and some IT departments end up creating active/active clusters instead. In this configuration, you may have server A with databases 1, 2, and 3 fail over to server B, and server B with databases 4, 5, and 6 fail over to server A. The important thing to note is that in the event of a failure, server A or B needs to assume the workload of all the databases, so we need to make sure that the hardware can support this load.

Typically, in our scenario, servers A and B are located within the same datacenter. In geoclustering (sometimes called stretch clustering), your cluster nodes can be spread across different subnets. This means that you could fail over to another geographical location in the world. SQL Server 2012 supports geoclusters via Windows Server 2008 R2.

From a performance perspective, enabling clustering doesn't impact SQL Server performance as much as it did in earlier versions. Prior to SQL Server 2012, tempdb had to be placed on the shared drive as well, which caused performance issues if your applications heavily used tempdb. This is no longer the case, and even in a failover cluster configuration, tempdb can be located on a drive local to the node.
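For example, after installing a clustered instance, you could move tempdb to a local drive with a sketch like the following, assuming an L:\TempDB folder exists on every node (the drive letter and paths here are illustrative). The change takes effect the next time the SQL Server service starts:

USE MASTER
GO
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'L:\TempDB\tempdb.mdf')
GO
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'L:\TempDB\templog.ldf')
GO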

Failover clustering is one of the most popular high-availability features used within IT departments today. Properly setting up clustering depends on a variety of factors, including the shared disk infrastructure, networking configuration, and Windows Server versions. Diving in and walking through a failover clustering setup is beyond the scope of this book. However, if you are interested in learning more, check out the SQL Server Books Online topic “Create a New SQL Server Failover Cluster” at the following URL: http://msdn.microsoft.com/en-us/library/ms179530(v=SQL.110).aspx.

AlwaysOn Availability Groups

The AlwaysOn Availability Group feature has evolved from database mirroring. While the plumbing is much different with Availability Groups, the idea is similar to database mirroring. With AlwaysOn Availability Groups, we can specify one or more databases to fail over as a group. When a failover occurs, or we manually issue a failover, all the databases defined within the Availability Group fail over together. This failover can occur to a failover partner known as an availability replica server, just like in database mirroring; however, unlike database mirroring, AlwaysOn Availability Groups allow you to have multiple secondary availability replicas, up to four.

Data can be sent to availability replicas asynchronously or synchronously. The primary role availability replica hosts the read-write copy of the database. A secondary role availability replica can support active user connections and read-only queries. One of these availability replicas can be defined as the failover partner. A great scenario for connecting to a secondary role availability replica is reporting and database backups. Since these connections must be read-only, if a user submits a write query, the query will simply fail.

Configuring an AlwaysOn Availability Group

Before you can configure an AlwaysOn Availability Group, there are a few prerequisites. First, the servers that host the SQL Server instances must be clustered together. Instead of using a witness server as with the database mirroring feature, AlwaysOn Availability Groups leverage the internode health and failure detection capabilities native to the Windows Server failover clustering feature. One significant difference between setting up a cluster for AlwaysOn Availability Groups and for a failover cluster instance is that AlwaysOn Availability Groups have no requirement for a shared disk. Thus, the setup involved in creating the cluster of Windows servers is simple.

In the example to follow, we have four servers. The first is called DENALI-DC, and it's a domain controller of the contoso.lab domain. The second, third, and fourth are SQL-EAST, SQL-WEST, and SQL-SOUTH, respectively. Our objective is to create an Availability Group of three databases, DatabaseA, DatabaseB, and DatabaseC, with SQL-WEST as the primary role availability replica and SQL-EAST and SQL-SOUTH as secondary role availability replicas.

There is a domain user called CONTOSO\SQLService. The SQL Server instances on all three SQL Server servers use this domain account as their SQL Server service account.

Creating a Windows Server Failover Cluster

The first step is creating a Windows Server failover cluster with SQL-EAST, SQL-SOUTH, and SQL-WEST. This step doesn't have anything to do with the SQL Server instances; rather, we are creating the Windows Server failover cluster containing these three Windows servers.

To create a Windows Server failover cluster, each Windows server needs to have the Failover Clustering feature installed. You can install this via Server Manager. Once this feature is installed, you can create a cluster by using the Create a Cluster wizard. This wizard can be launched in Server Manager by selecting Failover Cluster Manager under the Server Manager node and then clicking the Create a Cluster hyperlink in the Actions pane.

The second page in the wizard is the Select Servers page. On this page, we add all the servers that are to be a part of this cluster—in our example, SQL-EAST, SQL-SOUTH, and SQL-WEST. This page is shown in Figure 14-10.

images

Figure 14-10. Select Server page in the Create Cluster Wizard

The next page in the wizard asks you about validating the cluster. Prior to Windows Server 2008, if you wanted to set up a failover cluster, you had to run it on specific, certified hardware. All that has changed, and you can now set up a cluster on almost any hardware. For this example, you can skip the validation test and choose No.

A virtual IP address is created for administering the cluster. The next page asks you for this virtual access point name and is shown in Figure 14-11.

images

Figure 14-11. Access point for cluster administration

After the confirmation page, the wizard forms the cluster. Upon successful completion, the Summary shown in Figure 14-12 is displayed.

images

Figure 14-12. Successful cluster failover page

Configuring the SQL Server Instance

Now that we have all three Windows servers in a failover cluster, we need to enable the AlwaysOn Availability Groups feature within each SQL Server instance. To do this, we connect to each server: SQL-EAST, SQL-WEST, and SQL-SOUTH, and launch SQL Server Configuration Manager, found under the Configuration Tools folder within Microsoft SQL Server 2012. When SQL Server Configuration Manager launches, click the SQL Server Services node, and then select Properties from the context menu of the SQL Server service. Click the AlwaysOn High Availability tab, and check the Enable AlwaysOn Availability Groups check box, as shown in Figure 14-13.

images

Figure 14-13. Enabled AlwaysOn Availability Groups check box

Restart the SQL Server service, and now, your SQL Server instance is ready for AlwaysOn Availability Groups.
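To confirm that an instance is ready, you can query the IsHadrEnabled server property, which returns 1 when AlwaysOn Availability Groups is enabled and 0 when it is not:

SELECT SERVERPROPERTY('IsHadrEnabled') AS IsHadrEnabled
GO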

Creating an Availability Group

Connect to the SQL-WEST instance using SSMS and create the sample databases as follows:

USE MASTER
GO
CREATE DATABASE [DatabaseA]
GO
CREATE DATABASE [DatabaseB]
GO
CREATE DATABASE [DatabaseC]
GO
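Databases in an availability group must use the full recovery model. Newly created databases inherit the recovery model of the model database, so if your model database is set to simple recovery, switch the new databases to full recovery before backing them up. A quick sketch:

USE MASTER
GO
ALTER DATABASE [DatabaseA] SET RECOVERY FULL
GO
ALTER DATABASE [DatabaseB] SET RECOVERY FULL
GO
ALTER DATABASE [DatabaseC] SET RECOVERY FULL
GO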

Next, we need to take a full database backup of these databases. If we do not do this first, the Availability Group Wizard will not allow us to create the availability group with these databases. On the SQL-WEST server, back up all three databases. For simplicity, we can create a folder, C:\Backup, and issue the following statements within SSMS:

USE MASTER
GO
BACKUP DATABASE DatabaseA TO DISK='C:\Backup\DatabaseA.bak'
GO
BACKUP DATABASE DatabaseB TO DISK='C:\Backup\DatabaseB.bak'
GO
BACKUP DATABASE DatabaseC TO DISK='C:\Backup\DatabaseC.bak'
GO

Next, launch the New Availability Group wizard by selecting it from the context menu of the Availability Groups node within the Management node in Object Explorer. The first question the wizard will ask is for an Availability Group name. Type TestAG, and click Next.

The Select Databases page shown in Figure 14-14 is where we can select one or more databases to be part of the Availability Group. Select DatabaseA, DatabaseB, and DatabaseC, and click Next.

images

Figure 14-14. Select Databases page

The next page is where we define the replicas. This page is shown in Figure 14-15, and here, we can specify each secondary availability replica and whether the data will flow synchronously (i.e., high safety) or asynchronously (i.e., high performance). Click the Add Replica button, and add SQL-EAST and SQL-SOUTH. Next, change the Replica Mode for SQL-SOUTH to “High performance” by clicking the Replica Mode drop-down for SQL-SOUTH. Notice that there are three options for Replica Mode: Automatic Failover, High performance, and High safety. Only two server instances can be involved with automatic failover; in this example, those would be SQL-WEST, the primary, and SQL-EAST, the secondary automatic failover partner.

images

Figure 14-15. Specify Replicas page

The fourth column, “Connection Mode in Secondary Role,” determines whether you want to give users the ability to connect directly to the secondary availability replica. “Disallow connections” is the default and does not allow any active connections to that database on the secondary server. Note that this applies to the database itself, not to the SQL Server instance that is hosting the database. Users can still connect to the SQL Server instance; they just can't use the database that is part of this availability group. The next option is “Allow all connections,” which, as the name implies, allows users to connect to and use the database. However, if a user issues a write query, the query will fail. To mitigate end users' frustration with their application periodically erroring out, there is another option called “Allow only read-intent connections.” When this option is selected, an extra parameter is required on the connection string, signaling that the connection will issue only read-only queries. This is mainly to force users to acknowledge the expected behavior.
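For example, a client declares read intent by adding the ApplicationIntent=ReadOnly keyword to its connection string. The following sketch assumes a client library that supports this keyword, such as SQL Server Native Client 11.0; the server and database names are illustrative:

Server=tcp:SQL-SOUTH,1433;Database=DatabaseA;Integrated Security=SSPI;ApplicationIntent=ReadOnly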

For this example, we will select “Allow all connections” for the SQL-SOUTH server instance, and click Next.

From a client perspective, it's easy to enable automatic failover. With database mirroring, we had to specify the failover partner in the connection string. In this case, you can define an availability group listener, which is essentially a virtual network name and IP address that users connect to. In the event of a failure, users still connect to this virtual access point, but now they are being served by the secondary instance. The clients do not have to change or add anything in the connection string to achieve this behavior. The next page in the wizard, shown in Figure 14-16, allows you to specify this availability group listener.
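If you prefer T-SQL, a listener can also be added after the availability group exists. A sketch follows; the listener name, IP address, subnet mask, and port are placeholders you would replace with values appropriate for your own network:

ALTER AVAILABILITY GROUP [TestAG]
ADD LISTENER N'TestAGListen' (WITH IP ((N'192.168.1.50', N'255.255.255.0')), PORT = 1433)
GO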

images

Figure 14-16. Specify the Availability Group Listener dialog

The wizard can also handle the initial data synchronization for you. The next page in the wizard, shown in Figure 14-17, allows you to specify a network share on SQL-WEST to obtain the backup from or to skip the initial data synchronization altogether.

images

Figure 14-17. Data Synchronization page

The wizard will then perform a validation check to see if it can access the network share, if there is enough free disk space, and so on. Once this validation checks out, the wizard will provide you a summary of your selections. When you click Finish, the wizard will create your availability group.

If you go back to Object Explorer, you will see a new node under the Availability Groups node within the Management node of Object Explorer. By right-clicking and selecting Show Dashboard, you can see the overall health of the group including synchronization status and any issues that need to be addressed.
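The same information that the dashboard displays can also be queried directly from the AlwaysOn dynamic management views. A hedged sketch showing the role and synchronization health of every replica in every availability group:

SELECT ag.name AS availability_group,
       ar.replica_server_name,
       rs.role_desc,
       rs.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS rs
JOIN sys.availability_replicas AS ar ON rs.replica_id = ar.replica_id
JOIN sys.availability_groups AS ag ON rs.group_id = ag.group_id
GO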

Summary

This chapter introduced the concept of the SLA and the importance of fault tolerance within your environment. There are many different solutions to help with fault tolerance: some are hardware based, and some are software based. It would take many books to cover all the possibilities, and there are books on the market that dive deep into the topic of high availability. To read more about this topic, check out Allan Hirt's books Pro SQL Server 2005 High Availability and Pro SQL Server 2008 Failover Clustering, available from Apress. Allan may be updating the high availability book for SQL Server 2012, but if this doesn't happen, the SQL Server 2005 book is a great primer on the topic. In SQL Server 2012, there are key advancements in high availability, including AlwaysOn Availability Groups and AlwaysOn Failover Clustering. Look for IT departments to heavily leverage these features in their SQL Server high availability solutions.
