CHAPTER 5

image

Implementing AlwaysOn Availability Groups

AlwaysOn Availability Groups provide a flexible option for achieving high availability, recovering from disasters, and scaling out read-only workloads. The technology synchronizes data at the database level, but health monitoring and quorum are provided by a Windows cluster.

This chapter demonstrates how to build and configure availability groups for both high availability (HA) and disaster recovery (DR). We also discuss aspects such as performance considerations and maintenance. We also discuss using availability groups to scale out read-only workloads.

Image Note  For the demonstrations in this chapter, we use a domain that contains a domain controller and a two-node cluster. These servers are in a site called Site1. A second site, called Site2, contains a third server. During the course of the chapter, we add this as a node to the cluster. The cluster has no shared storage for data and there is no AlwaysOn failover clustered instance. Each node has a stand-alone instance of SQL Server installed on it named ClusterNode1PrimaryReplica, ClusterNode2SyncHA, and ClusterNode3AsyncDR, respectively.

Implementing High Availability with AlwaysOn Availability Groups

Before implementing AlwaysOn Availability Groups, we first create three databases, which we will use during the demonstrations in this chapter. Two of the databases relate to the fictional application, App1, and the third database relates to the fictional application, App2. Each contains a single table, which we populate with data. Each database is configured with Recovery mode set to FULL. This is a hard requirement for a database to use AlwaysOn Availability Groups because data is synchronized via a log stream. The script in Listing 5-1 creates these databases.

Configuring SQL Server

The first step in configuring AlwaysOn Availability Groups is enabling this feature on the SQL Server service. To enable the feature from the GUI, we open SQL Server Configuration Manager, drill through SQL Server Services and select Properties from the context menu of the SQL Server service. When we do this, the service properties display and we navigate to the AlwaysOn High Availability tab, shown in Figure 5-1.

9781484217627_Fig05-01.jpg

Figure 5-1. The AlwaysOn High Avaiability tab

On this tab, we check the Enable AlwaysOn Availability Groups box and ensure that the cluster name displayed in the Windows Failover Cluster Name box is correct. We then need to restart the SQL Server service. Because AlwaysOn Availability Groups uses stand-alone instances, which are installed locally on each cluster node, as opposed to a failover clustered instance, which spans multiple nodes, we need to repeat these steps for each stand-alone instance hosted on the cluster.

We can also use PowerShell to enable AlwaysOn Availability Groups. To do this, we use the PowerShell command in Listing 5-2. The script assumes that CLUSTERNODE1 is the name of the server and that PRIMARYREPLICA is the name of the SQL Server instance.

The next step is to take a full backup of all databases that will be part of the availability group. We create separate availability groups for App1 and App2, respectively, so to create an availability group for App1, we need to back up the Chapter5App1Customers and Chapter5App1Sales databases. We do this by running the script in Listing 5-3.

Creating the Availability Group

You can create an availability group topology in SQL Server in several ways. It can be created manually, predominantly through dialog boxes, via T-SQL, or through a wizard. The following sections explore each of these options.

Using the New Availability Group Wizard

When the backups complete successfully, we invoke the New Availability Group wizard by drilling through AlwaysOn High Availability in Object Explorer and selecting the New Availability Group wizard from the context menu of the Availability Groups folder. The Introduction page of the wizard, displayed in Figure 5-2, now displays, giving us an overview of the steps that we need to undertake.

9781484217627_Fig05-02.jpg

Figure 5-2. The Introduction page

On the Specify Name page (see Figure 5-3), we are prompted to enter a name for our availability group.

9781484217627_Fig05-03.jpg

Figure 5-3. The Specify Name page

On the Select Databases page, we are prompted to select the database(s) that we wish to participate in the availability group, as illustrated in Figure 5-4. On this screen, notice that we cannot select the Chapter5App2Customers database, because we have not yet taken a full backup of the database.

9781484217627_Fig05-04.jpg

Figure 5-4. The Select Database page

The Specify Replicas page consists of four tabs. We use the first tab, Replicas, to add the secondary replicas to the topology. Checking the Synchronous Commit option causes data to be committed on the secondary replica before it is committed on the primary replica. (This is also referred to as hardening the log on the secondary before the primary.) This means that, in the event of a failover, data loss is not possible, meaning that we can meet an SLA (service level agreement) with an RPO (recovery point objective) of 0 (zero). It also means that there is a performance impediment, however. If we choose not to check the option for Synchronous Commit, then the replica operates in Asynchronous Commit mode. This means that data is committed on the primary replica before being committed on the secondary replica. This stops us from suffering a performance impediment, but it also means that, in the event of failover, the RPO is nondeterministic. Performance considerations for synchronous replicas are discussed later in this chapter.

When we check the Automatic Failover option, the Synchronous Commit option is also selected automatically if we have not already selected it. This is because automatic failover is only possible in Synchronous Commit mode. We can set the Readable Secondary drop-down to No, Yes, or Read-intent. When we set it to No, the database is not accessible on replicas that are in a secondary role. When we set it to read-intent, the Availability Group Listener is able to redirect read-only workloads to this secondary replica, but only if the application has specified Application Intent=Read-only in the connection string. Setting it to Yes enables the listener to redirect read-only traffic, regardless of whether the Application Intent parameter is present in the application’s connection string. Although we can change the value of Readable Secondary through the GUI while at the same time configuring a replica for automatic failover without error, this is simply a quirk of the wizard. In fact, the replica is not accessible, since active secondaries are not supported when configured for automatic failover. The Replicas tab is illustrated in Figure 5-5.

9781484217627_Fig05-05.jpg

Figure 5-5. The Replicas tab

Image Note  Using secondary replicas for read-only workloads is discussed in more depth in the Adding AlwaysOn Readable Secondary Replicas section of this chapter.

On the Endpoints tab of the Specify Replicas page, illustrated in Figure 5-6, we specify the port number for each endpoint. The default port is 5022, but we can specify a different port if we need to. On this tab, we also specify if data should be encrypted when it is sent between the endpoints. It is usually a good idea to check this option, and if we do, then AES (Advanced Encryption Standard) is used as the encryption algorithm.

9781484217627_Fig05-06.jpg

Figure 5-6. The Endpoints tab

Optionally, you can also change the name of the endpoint that is created. Because only one database mirroring endpoint is allowed per instance, however, and because the default name is fairly descriptive, there is not always a reason to change it. Some DBAs choose to rename it to include the name of the instance, since this can simplify the management of multiple servers. This is a good idea if your enterprise has many availability group clusters.

The service account each instance uses is displayed for informational purposes. It simplifies security administration if you ensure that the same service account is used by both instances. If you fail to do this, you will need to grant each instance permissions to each service account. This means that instead of reducing the security footprint of each service account by using it for one instance only, you simply push the footprint up to the SQL Server level instead of the Operating System level.

The endpoint URL specifies the URL of the endpoint that availability groups will use to communicate. The format of the URL is [Transport Protocol]://[Path]:[Port]. The transport protocol for a database mirroring endpoint is always TCP (Transmission Control Protocol). The path can either be the fully qualified domain name (FQDN) of the server, the server name on its own, or an IP address, which is unique across the network. I recommend using the FQDN of the server, because this is always guaranteed to work. It is also the default value populated. The port should match the port number that you specify for the endpoint.

Image Note  Availability groups communicate with a database mirroring endpoint. Although database mirroring is deprecated, the endpoints are not.

On the Backup Preferences tab (see Figure 5-7), we can specify the replica on which automated backups will be taken. One of the big advantages of AlwaysOn Availability Groups is that when you use them, you can scale out maintenance tasks, such as backups, to secondary servers. Therefore, automated backups can seamlessly be directed to active secondaries. The possible options are Prefer Secondary, Secondary Only, Primary, or Any Replica. It is also possible to set priorities for each replica. When determining which replica to run the backup job against, SQL Server evaluates the backup priorities of each node and is more likely to choose the replica with the highest priority.

9781484217627_Fig05-07.jpg

Figure 5-7. The Backup Preferences tab

Although the advantages of reducing IO on the primary replica are obvious, I, somewhat controversially, recommend against scaling automated backups to secondary replicas in many cases. This is especially the case when RTO (recovery time objective) is a priority for the application because of operational supportability issues. Imagine a scenario in which backups are being taken against a secondary replica and a user calls to say that they have accidently deleted all data from a critical table. You now need to restore a copy of the database and repopulate the table. The backup files, however, sit on the secondary replica. As a result, you need to copy the backup files over to the primary replica before you can begin to restore the database (or perform the restore over the network). This instantly increases your RTO.

Also, when configured to allow backups against multiple servers, SQL Server still only maintains the backup history on the instance where the backup was taken. This means that you may be scrambling between servers, trying to retrieve all of your backup files, not knowing where each one resides. This becomes even worse if one of the servers has a complete system outage. You can find yourself in a scenario in which you have a broken log chain.

The workaround for most of the issues that I just mentioned is to use a share on a file server and configure each instance to back up to the same share. The problem with this, however, is that by setting things up in this manner, you are now sending all of your backups across the network rather than backing them up locally. This can increase the duration of your backups as well as increase network traffic.

On the Listener tab, shown in Figure 5-8, we choose if we want to create an availability group listener or if we want to defer this task until later. If we choose to create the listener, then we need to specify the listener’s name, the port that it should listen on, and the IP address(es) that it should use. We specify one address for each subnet, in multi-subnet clusters. The details provided here are used to create the client access point resource in the availability group’s cluster role. You may notice that we have specified port 1433 for the listener, although our instance is also running on port 1433. This is a valid configuration, because the listener is configured on a different IP address than the SQL Server instance. It is also not mandatory to use the same port number, but it can be beneficial, if you are implementing AlwaysOn Availability Groups on an existing instance because applications that specify the port number to connect may need fewer application changes. Remember that the server name will still be different, however, because applications will be connecting to the virtual name of the listener, as opposed to the name of the physical serverinstance. In our example, applications connect to APP1LISTENPRIMARYREPLICA instead of CLUSTERNODE1PRIMARYREPLICA. Although connections via CLUSTERNODE1 are still permitted, they do not benefit from high availability or scale our reporting.

9781484217627_Fig05-08.jpg

Figure 5-8. The Listener tab

Image Tip  If you do not have Create Computer Objects permission within the OU, then the listener’s VCO (virtual computer object) must be prestaged in AD and you must be assigned Full Control permissions on the object.

On the Select Initial Data Synchronization screen, shown in Figure 5-9, we choose how the initial data synchronization of the replicas is performed. If you choose Full, then each database that participates in the availability group is subject to a full backup, followed by a log backup. The backup files are backed up to a share, which you specify, before they are restored to the secondary servers. After the restore is complete, data synchronization, via log stream, commences.

9781484217627_Fig05-09.jpg

Figure 5-9. The Select Data Synchronization page

If you have already backed up your databases and restored them onto the secondaries, then you can select the Join Only option. This starts the data synchronization, via log stream, on the databases within the availability group. Selecting Skip Initial Data Synchronization allows you to back up and restore the databases yourself after you complete the setup.

Image Tip  If your availability group will contain many databases, then it may be best to perform the backup/restore yourself. This is because the inbuilt utility will perform the actions sequentially, and therefore, it may take a long time to complete.

On the Validation page, rules that may cause the setup to fail are checked, as illustrated in Figure 5-10. If any of the results come back as Failed, then you need to resolve them before you attempt to continue.

9781484217627_Fig05-10.jpg

Figure 5-10. The Validation page

Once validation tests are complete and we move to the Summary page, we are presented with a list of the tasks that are to be carried out during the setup. As setup progresses, the results of each configuration task display on the Results page. If any errors occur on this page, be sure to investigated them, but this does not necessarily mean that the entire availability group needs to be reconfigured. For example, if the creation of the availability group listener fails because the VCO had not been prestaged in AD, then you can re-create the listener without needing to re-create the entire availability group.

As an alternative to using the New Availability Group wizard, you can perform the configuration of the availability group using the New Availability Group dialog box, followed by the Add Listener dialog box. This method of creating an availability group is examined later in this chapter.

Scripting the Availability Group

We can also script the activity by using the script in Listing 5-4. This script connects to both instances within the cluster, meaning that it can only be run in SQLCMD mode. First, the script creates a login for the service account on each instance. It then creates the TCP endpoint, assigns the connect permission to the service account, and starts the health trace for AlwaysOn Availability Groups (which we discuss later in this chapter). The script then creates the availability group on the primary and joins the secondary to the group. Next, we perform a full and log backup and a restore of each database that will participate in the availability group before we add the databases to the group. Note that the databases are backed up, restored, and added to the group in a serial manner. If you have many databases, then you may want to parallelize this process.

Creating the availability group via T-SQL gives you the most flexibility in terms of configuration. Table 5-1 contains a complete list of arguments, along with their explanation.

Table 5-1. The CREATE AVAIABILITY GROUP Arguments

Argument

Description

Acceptable Values

AUTOMATED_BACKUP_PREFERENCE

Defines where backups run from automated jobs should be taken.

PRIMARY

SECONDARY_ONLY

SECONDARY

NONE

FAILURE_CONDITION_LEVEL

Specifies how sensitive the failover will be. Further details in Table 5-2.

1 through 5

HEALTH_CHECK_TIMEOUT

Configures the amount of time, in milliseconds, that SQL Server has to return health check information to the cluster before the cluster assumes that the instance is not responding, which triggers a failover when FAILOVER_MODE is set to AUTOMATIC.

15000ms through 4294967295ms

DATABASE

A comma-separated list of databases that will join the availability group.

REPLICA ON

A comma-separated list of serverinstance names that will be replicas within the group. The following arguments in this table form the WITH clause of the REPLICA ON argument.

ENDPOINT_URL

The URL of the TCP endpoint that the replica will use to communicate.

AVAILABILITY_MODE

Determines if the replica operates in synchronous or asynchronous mode.

SYNCHRONOUS_COMMIT

ASYNCHRONOUS_COMMIT

FAILOVER_MODE

When the AVAILABILITY_MODE is set to synchronous, determines if automatic failover should be allowed.

AUTOMATIC

MANUAL

BACKUP_PRIORITY

Gives the replica a weight when SQL Server is deciding where an automated backup job should run.

0 through 100

SECONDARY_ROLE

Specifies properties that only apply to the replica when it is in a secondary role. ALLOW_CONNECTIONS specifies if the replica is readable, and if so, by all read_only connections or only those that specify read-intent in the connection string. READ_ONLY_ROUTING_URL specifies the URL for applications to connect to it, for read only operations, in the format: TCP://ServerName:Port.

-

PRIMARY_ROLE

Specifies properties that only apply to the replica when it is in the primary role. ALLOW_CONNECTIONS can be configured as All to allow any connection, or Read_Write, to disallow read-only connections. READ_ONLY_ROUTING_LIST is a comma-separated list of serverinstance names that have been configured as read-only replicas.

-

SESSION_TIMEOUT

Specifies how long replicas can survive without receiving a ping before they enter the DISCONNECTED state.

5 to 2147483647 seconds

The FAILOVER_CONDITION_LEVEL argument determines the group’s sensitivity to failover. Table 5-2 provides a description of each of the five levels.

Table 5-2. The FAILOVER_CONDITION_LEVEL Arguement

Level

Failover Triggered By

1

Instance down.

AOAG lease expires.

2

Conditions of level 1 plus:

HEALTH_CHECK_TIMEOUT is exceeded.

The replica has a state of FAILED.

3 (Default)

Conditions of level 2 plus:

SQL Server experiences critical internal errors.

4

Conditions of level 3 plus:

SQL Server experiences moderate internal errors.

5

Failover initialed on any qualifying condition.

Using the New Availability Group Dialog Box

Now that we have successfully created our first availability group, let’s create a second availability group for App2. This time, we use the New Availability Group and Add Listener Dialog boxes. We begin this process by backing up the Chapter5App2Customers database. Just like when we created the App1 availability group, the databases are not selectable until we perform the backup. Unlike when we used the wizard, however, we have no way to make SQL Server perform the initial database synchronization for us. Therefore, we back up the database to the share that we created during the previous demonstration and then restore the backup, along with a transaction log backup, to the secondary instance. We do this by using the script in Listing 5-5, which must be run in SQLCMD mode for it to work. This is because it connects to both instances.

If we had not already created an availability group, then our next job would be to create a TCP endpoint so the instances could communicate. We would then need to create a login for the service account on each instance and grant it the connect permissions on the endpoints. Because we can only ever have one database mirroring endpoint per instance, however, we are not required to create a new one, and obviously we have no reason to grant the service account additional privileges. Therefore, we continue by creating the availability group. To do this, we drill through AlwaysOn High Availability in Object Explorer and select New Availability Group from the context menu of availability groups.

This causes the General tab of the New Availability Group dialog box to display, as illustrated in Figure 5-11. On this screen, we type the name of the availability group in the first field. Then we click the Add button under the Availability Databases window before we type the name of the database that we wish to add to the group. We then need to click the Add button under the Availability Replicas window before we type the serverinstance name of the secondary replica in the new row.

9781484217627_Fig05-11.jpg

Figure 5-11. The New Availability Group dialog box

Now we can begin to set the replica properties. We discussed the Role, Availability Mode, Failover Mode, Readable Secondary, and Endpoint URL properties when we created the App1 availability group. The Connection In Primary Role property defines what connections can be made to the replica if the replica is in the primary role. You can configure this as either Allow All Connections, or allow Read/Write connections. When Read/Write is specified, applications using the Application Intent = Read only parameter in their connection string will not be able to connect to the replica.

The Session Timeout property sets how long the replicas can go without receiving a ping from one another before they enter the DISCONNECTED state and the session ends. Although it is possible to set this value to as low as 5 seconds, it is usually a good idea to keep the setting at or above 10 seconds, otherwise you run the risk of a false positive response, resulting in unnecessary failover. If a replica times out, it needs to be resynchronized, since transactions on the primary will no longer wait for the secondary, even if the secondary is running in Synchronous Commit mode.

Image Note  You may have noticed that we have configured the replica in Asynchronous Commit mode. This is for the benefit of a later demonstration. For HA, we would always configure Synchronous Commit mode, since otherwise, automatic failover is not possible.

On the Backup Preferences tab of the dialog box, we define the preferred replica to use for automated backup jobs, as shown in Figure 5-12. Just like when using the wizard, we can specify Primary, or we can choose between enforcing and preferring backups to occur on a secondary replica. We can also configure a weight, between 0 and 100 for each replica, and use the Exclude Replica check box to avoid backups being taken on a specific node.

9781484217627_Fig05-12.jpg

Figure 5-12. The Backup Preferences tab

Once we have created the availability group, we need to create the availability group listener. To do this, we select New Listener from the context menu of the App2 availability group, which should now be visible in Object Explorer. This invokes the New Availability Group Listener dialog box, which can be seen in Figure 5-13.

9781484217627_Fig05-13.jpg

Figure 5-13. The New Availability Group Listener dialog box

In this dialog box, we start by entering the virtual name for the listener. We then define the port that it will listen on and the IP address that will be assigned to it. We are able to use the same port for both of the listeners, as well as the SQL Server instance, because all three use different IP addresses.

Performance Considerations for Synchronous Commit Mode

Unlike traditional clustering, Availability Group topology does not have any shared disk resources. Therefore, data must be replicated on two devices, which of course, has an overhead. This overhead varies depending on various aspects of your environment, such as network latency and disk performance, as well as the application profile. However, the script in Listing 5-6 runs some write-intensive tests against the Chapter5App2Customers database (which is in Asynchronous Commit mode) and then against the Chapter5App1Customers database (which is Synchronous Commit mode). This indicates the overhead that you can expect to witness.

Image Tip  It is important to remember that there is no overhead on read performance. Also, despite the overhead associated with writes, some of this is offset by distributing read-only workloads if you implement readable secondary replicas.

The relevant parts of the results of this query are displayed in Listing 5-7. You can see that the index rebuild was almost three times slower when the Availability Group was operating in Synchronous Commit mode, the insert was over six times slower, and the delete was also marginally slower. In productions environments in which I have implemented availability groups in Synchronous Commit mode, they have generally been around two times slower than in Asynchronous Commit mode.

Because of the performance challenges associated with Synchronous Commit mode, many DBAs decide to implement high availability and disaster recovery by using a three-node cluster, with two nodes in the primary data center and one node in the DR data center. Instead of having two synchronous replicas within the primary data center, however, they stretch the primary replica across a failover clustered instance and configure the cluster to only be able to host the instance on these two nodes, and not on the third node in the DR data center. This is important, because it means that we don’t need to implement SAN replication between the data centers. The DR node is synchronized using availability groups in Asynchronous Commit mode. If you combine an AlwaysOn failover clustered instance with AlwaysOn Availability Groups in this way, then automatic failover is not supported between the clustered instance and the replica. It can only be configured for manual failover. There is also no need for availability groups to fail over between the two nodes hosting the clustered instance, because this failover is managed by the cluster service. This configuration can prove to be a highly powerful and flexible way to achieve your continuity requirements.

Implementing Disaster Recovery with Availability Group

Now that we have successfully implemented high availability for the Chapter5App1Customers and Chapter5App1Sales databases through the App1 availability group, we need to implement disaster recovery for these databases. To do this, we first need to build out a new server in our second site and install a stand-alone instance of SQL Server. Because the cluster now spans two sites, we need to reconfigure it as a multi-subnet cluster. We also need to reconfigure the quorum model to remove its dependency on the shared storage, which we currently have for the quorum. Once this is complete, we are able to add the instance on the new node to our availability group. The following sections assume that you have already built out a third server with a SQL Server instance called CLUSTERNODE3ASYNCDR, and they demonstrate how to reconfigure the cluster as well as the availability group.

Configuring the Cluster

We need to perform several cluster configuration steps before we begin to alter our availability group. These include adding the new node, reconfiguring the quorum, and adding a new IP to the cluster’s client access point.

Adding a Node

The first task in adding DR capability to our availability group is to add the third node to the cluster. To do this, we select Add Node from the context menu of nodes in Failover Cluster Manager. This causes the Add Node Wizard to be invoked. After passing through the Before You Begin page of this wizard, you are presented with the Select Servers page, which is illustrated in Figure 5-14. On this page, you need to enter the server name of the node that you plan to add to the cluster.

9781484217627_Fig05-14.jpg

Figure 5-14. The Select Servers page

On the Validation Warning page, you are invited to run the Cluster Validation Wizard. You should always run this wizard in a production environment when making changes of this nature; otherwise, you will not be able to receive support from Microsoft for the cluster. Details of running the Cluster Validation wizard can be found in Chapter 3. Running the Cluster Validation wizard in our scenario is likely to throw up some warnings, which are detailed in Table 5-3.

Table 5-3. Cluster Validation Warnings

Warning

Reason

Resolution

This resource does not have all the nodes of the cluster listed as Possible Owners. The clustered role that this resource is a member of will not be able to start on any node that is not listed as a Possible Owner.

This warning has been displayed because we have not yet configured our availability group to use the new node.

Configuring the availability group to use the new node is discussed later in this chapter.

The RegisterAllProvidersIP property for network name ’Name: App1Listen’ is set to 1. For the current cluster configuration this value should be set to 0.

Setting the RegisterAllProvidersIP to 1 will cause all IP addresses to be registered, regardless of whether they are online or not. When we created the Availability Group Listener through SSMS, this setting was automatically configured to allow clients to fail over faster, and this warning should always be ignored. If we had created the Listener through Failover Cluster Manager, the property would have been set to 0 by default.

No resolution is required, but RegisterAllProvidersIP is discussed in more detail later in this chapter.

On the Confirmation page, we are given a summary of the tasks that will be performed. On this page, we deselect the option to add eligible storage, since one of our aims is to remove the dependency on shared storage. The Confirmation page is displayed in Figure 5-15.

9781484217627_Fig05-15.jpg

Figure 5-15. Confirmation page

On the Configure The Cluster page, the progress on the tasks displays until it is complete. The Summary page then displays, giving an overview of the actions and their success.

Modifying the Quorum

Our next step in configuring the cluster will be to modify the quorum. As mentioned earlier, we would like to remove our current dependency on shared storage. Therefore we need to make a choice. Since we now have three nodes in the cluster, one possibility is to remove the disk witness and form a node majority quorum. The issue with this is that one of our nodes is in a different location. Therefore, if we lose network connectivity between the two sites for an extended period, then we have no fault tolerance in our primary site. If one of the nodes goes down, we lose quorum and the cluster goes offline. On the other hand, if we have an additional witness in the primary location, then we are not maintaining best practice, since there are an even number of votes. Again, if we lose one voting member, we lose resilience.

Therefore, the approach that we take is to replace the disk witness with a file share witness, thus removing the shared disk dependency. We then remove the vote from the node in the DR site. This means that we have three voting members of the quorum, and all of them are within the same site. This mitigates the risk of an inter-site network issue causing loss of redundancy in our HA solution.

In order to invoke the Configure Cluster Quorum Wizard, we select Configure Cluster Quorum from the More Actions submenu within the context menu of our cluster in Failover Cluster Manager. After moving through the Before You Begin page of this wizard, you are asked to select the configuration that you wish to make on the Select Quorum Configuration Option page, which is shown in Figure 5-16. We select the Advanced Quorum Configuration option.

9781484217627_Fig05-16.jpg

Figure 5-16. The Select Quorum Configuration Option page

On the Select Voting Configuration page, we choose to select nodes and remove the vote from CLUSTERNODE3. This is demonstrated in Figure 5-17.

9781484217627_Fig05-17.jpg

Figure 5-17. The Select Voting Configuration page

On the Select Quorum Witness page, we choose the Configure A File Share Witness option, as shown in Figure 5-18.

9781484217627_Fig05-18.jpg

Figure 5-18. The Select Quorum Witness page

On the Configure File Share Witness page, which is illustrated in Figure 5-19, we enter the UNC of the share that we will use for the quorum. This file share must reside outside of the cluster and must be an SMB file share on a machine running Windows Server.

9781484217627_Fig05-19.jpg

Figure 5-19. The Configure File Share Witness page

Image Tip  Although many non-Windows–based NAS (network-attached storage) devices have support for SMB 3, I have experienced real-world implementations of a file share quorum on a NAS device work only intermittently, without resolution by either vendor.

Image Caution  Remember that using a file share witness, with only two other voting nodes, can lead to a partition-in-time scenario. For more information, please refer to Chapter 3.

On the Confirmation page of the wizard, you are given a summary of the configuration changes that will be made, as shown in Figure 5-20.

9781484217627_Fig05-20.jpg

Figure 5-20. The Confirmation page

On the Configure Cluster Quorum Settings page, a progress bar displays. Once the configuration is complete, the Summary page appears. This page provides a summary of the configuration changes and a link to the report.

Adding an IP Address

Our next task is to add a second IP address to the cluster’s client access point. We do not add an extra IP address for the Availability Group Listener yet. We perform this task in the “Configuring the Availability Group” section later in this chapter.

In order to add the second IP address, we select Properties from the context menu of Server Name in the Core Cluster Resources window of Failover Cluster Manager. On the General tab of the Cluster Properties dialog box, we add the IP address for administrative clients, following failover to DR, as illustrated in Figure 5-21.

9781484217627_Fig05-21.jpg

Figure 5-21. The General tab

When we apply the change, we receive a warning saying that administrative clients will temporarily be disconnected from the cluster. This does not include any clients connected to our availability group. If we choose to proceed, we then navigate to the Dependencies tab of the dialog box and ensure that an OR dependency has been created between our two IP addresses, as shown in Figure 5-22.

9781484217627_Fig05-22.jpg

Figure 5-22. The Dependencies tab

After this process is complete, the second IP address resource in the Cluster Core Resources group shows up as offline. This is normal. In the event of failover to the server in the second subnet, this IP address comes online, and the IP address of the subnet in the primary site goes offline. This is why the OR dependency (as opposed to an AND dependency) is critical. Without it, the Server Name resource could never be online.

Configuring the Availability Group

To configure the availability group, we first have to add the new node as a replica and configure its properties. We then add a new IP address to our listener for the second subnet. Finally, we look at improving the connection times for clients.

Adding and Configuring a Replica

In SQL Server Management Studio (SSMS), on the primary replica, we drill through Availability Groups | App1 and select Add Replica from the context menu of the Availability Replicas node. This causes the Add Replica To Availability Group wizard to be displayed. After passing through the Introduction page of the wizard, you see the Connect To Replicas page, as displayed in Figure 5-23. On this page, you are invited to connect to the other replicas in the availability group.

9781484217627_Fig05-23.jpg

Figure 5-23. The Connect To Replicas page

On the Replicas tab of the Specify Replicas page, shown in Figure 5-24, we first use the Add Replica button to connect to the DR instance. After we have connected to the new replica, we specify the properties for that replica. In this case, we leave them as-is because it will be a DR replica. Therefore, we want it to be asynchronous and we do not want it to be readable.

9781484217627_Fig05-24.jpg

Figure 5-24. The Replicas tab

On the Endpoints tab, shown in Figure 5-25, we ensure that the default settings are correct and acceptable.

9781484217627_Fig05-25.jpg

Figure 5-25. The Endpoints tab

On the Backup Preferences tab, the option for specifying the preferred backup replica is read-only. We are, however, able to specifically exclude our new replica as a candidate for backups or change its backup priority. This page is displayed in Figure 5-26.

9781484217627_Fig05-26.jpg

Figure 5-26. The Backup Preferences tab

On the Listener tab, illustrated in Figure 5-27, we can decide if we will create a new listener. This is a strange option, since SQL Server only allows us to create a single listener for an availability group, and we already have one. Therefore, we leave the default choice of Do Not Create An Availability Group Listener selected. It is possible to create a second listener, directly from Failover Cluster Manager, but you would only want a second listener for the same availability group in very rare, special cases, which we discuss later in this chapter.

9781484217627_Fig05-27.jpg

Figure 5-27. The Listener tab

On the Select Data Synchronization page, we choose how we want to perform the initial synchronization of the replica. The options are the same as they were when we created the availability group, except that the file share will be prepopulated, assuming that we choose the Full synchronization when creating the availability group. This screen is shown in Figure 5-28.

9781484217627_Fig05-28.jpg

Figure 5-28. The Select Data Synchronization page

On the Validation page, which is illustrated in Figure 5-29, we should review any warnings or errors and resolve them before continuing.

9781484217627_Fig05-29.jpg

Figure 5-29. The Validation page

On the Summary page, displayed in Figure 5-30, we are presented with a summary of the configurations that will be carried out.

9781484217627_Fig05-30.jpg

Figure 5-30. The Summary page

After the reconfiguration completes, our new replica is added to the cluster. We should then review the results and respond to any warnings or errors. We could also have used T-SQL to add the replica to the availability group. The script in Listing 5-8 performs the same actions just demonstrated. You must run this script in SQLCMD Mode since it connects to multiple instances.

Add an IP Address

Even though the replica has been added to the availability group and we are able to fail over to this replica, our clients are still not able to connect to it in the DR site using the Availability Group Listener. This is because we need to add an IP address resource, which resides in the second subnet. To do this, we can select Properties from the context menu of App1Listen in Object Explorer, which causes the Availability Group Listener Properties dialog box to be displayed, as in Figure 5-31. Here, we add the listener’s second IP address.

9781484217627_Fig05-31.jpg

Figure 5-31. The Availability Group Listener Properties

We can also achieve this through T-SQL by running the script in Listing 5-9.

SQL Server now adds the IP address as a resource in the App1 role, and also configures the OR dependency on the Name resource. You can view this by running the dependency report against the Name resource in Failover Cluster Manager, as illustrated in Figure 5-32.

9781484217627_Fig05-32.jpg

Figure 5-32. The dependency report

Improving Connection Times

Clients using .NET 4 or higher are able to specify the new MultiSubnetFailover=True property in their connecting strings when connecting to an AlwaysOn Availability Group. This improves connection times by retrying TCP connections more aggressively. If clients are using older versions of .NET, however, then there is a high risk of their connections timing out.

There are two workarounds for this issue. The first is to set the RegisterAllProvidersIP property to 0. This is the recommended approach, but the problem with it is that failover to the DR site can take up to 15 minutes. This is because the IP address resource for the second subnet is offline until failover occurs. It can then take up to 15 minutes for the PTR record to be published. In order to reduce this risk, it is recommended that you also lower the HostRecordTTL. This property defines how often the resource records for the cluster name are published.

The script in Listing 5-10 demonstrates how to disable RegisterAllProvidersIP and then reduce the HostRecordTTL to 300 seconds.

The alternative workaround is to simply increase the time-out value for connections to 30 seconds. However, this solution accepts that a large volume of connections will take up to 30 seconds. This may not be acceptable to the business.

Adding AlwaysOn Readable Secondary Replicas

It can be very useful to add readable secondary replicas to an AlwaysOn Availability Group topology in order to implement vertically scaled reporting. When you use this strategy, the databases are kept synchronized, with variable, but typically low latency, using log streaming. The additional advantage of readable secondary replicas from SQL Server 2014 onward is that they stay online, even if the primary replica is offline. The limitation here, however, is that users must connect directly to the instance, as opposed to the Availability Group Listener.

You can further improve read performance in readable secondary replicas by using temporary statistics, which you can also use to optimize read-only workloads. Also, snapshot isolation is also used exclusively on readable secondary replicas, even if other isolation levels or locking hints are explicitly requested. This helps avoid contention, but it also means that TempDB should be suitably scaled and on a fast disk array.

The main risk of using readable secondary replicas is that implementing snapshot isolation on the secondary replica can actually cause deleted records not to be cleaned up on the primary replica. This is because the ghost record cleanup task only remove rows from the primary once they are no longer required at the secondary. In this scenario, log truncation is also delayed on the primary replica. This means that you potentially risk having to kill long-running queries that are being satisfied against the readable secondary. This issue can also occur if the secondary replica becomes disconnected from the primary. Therefore, there is a risk that you may need to remove the secondary replica from the Availability Group and subsequently re-add it.

To make a secondary replica readable, you need to perform three tasks. First configure the secondary replica to allow read-only connections. Second, specify a read-only URL for reporting. The Availability Group Listener then directs appropriate traffic to this URL. The final task is to update the read-only routing list on the primary replica. These tasks are performed by the script in Listing 5-11.

Summary

AlwaysOn Availability Groups can be implemented with up to eight secondary replicas, combining both Synchronous and Asynchronous Commit modes. When implementing high availability with availability groups, you always use Synchronous Commit mode, because Asynchronous Commit mode does not support automatic failover. When implementing Synchronous Commit mode, however, you must be aware of the associated performance penalty caused by committing the transaction on the secondary replica before it is committed on the primary replica. For disaster recovery, you will normally choose to implement Asynchronous Commit mode.

The availability group can be created via the New Availability Group Wizard, though dialog boxes, through T-SQL, or even through PowerShell. If you create an availability group using dialog boxes, then some aspects, such as the endpoint and associated permissions, must be scripted using T-SQL or PowerShell.

If you implement disaster recovery with availability groups, then you need to configure a multi-subnet cluster. This does not mean that you must have SAN replication between the sites, however, since availability groups do not rely on shared storage. What you do need to do is add additional IP addresses for the administrative cluster access point and also for the Availability Group Listener. You also need to pay attention to the properties of the cluster that support client reconnection to ensure that clients do not experience a high number of timeouts.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.74.211