Horizontal scaling
or sharding refers to partitioning the data from one single big table in a database, across multiple independent databases based on
a sharding or partitioning key. For example, a customer table is partitioned across multiple independent databases on
CustomerID
. Each independent database stores data for one or more customers.
Horizontal scaling can be helpful when:
Sharding is supported natively in Azure SQL Databases, so we don't have to implement the sharding mechanism from scratch. However, we do need to create and manage shards. This can be easily done using the Elastic Database Tools.
This diagram represents a generic sharded environment. The customer table is horizontally partitioned across three shards, Shard 1, Shard 2, and Shard 3. Let's understand each of these components in detail:
CustomerID
is the sharding key. Each shard stores data for a different customer ID. You can also define a composite sharding key.
Key (Sharding Key - CustomerID) |
Value (Shard/database) |
---|---|
1 |
Shard 1 |
2 |
Shard 2 |
3 |
Shard 3 |
4 |
Shard 1 |
Key (Sharding Key - CustomerID) |
Value (Shard/database) |
---|---|
1 - 100 |
Shard 1 |
100 - 200 |
Shard 2 |
200 - 300 |
Shard 3 |
300 - 400 |
Shard 1 |
_ShardManagement
schema in the Shard Map Manager database._ShardManagement
schema.As discussed
earlier, a Shard Map Manager is a special database that maintains the global mapping information of a shard set. The mappings are maintained in tables that are automatically created under the
_ShardManagement
schema:
The global shard maps are maintained in three tables, as shown in the preceding diagram:
RangeShardMap
.Shardmaps_global
table. In our example, the RangeShardMap
has two shards, Shard 1 and Shard 2. The table also stores the Server Name the shard belongs to. This information is used when connecting to the shards.The information in the Shard Map Manager is used by the client application to redirect requests to the correct SQL Database based on the sharding key.
Data dependent routing refers to routing the query to the correct database (shard) based on the sharding key specified in the query. This is the fundamental way of querying a sharded environment. The application doesn't maintain connection strings to the different shards. The application doesn't even implement the logic of selecting the shards based on the sharding key. This is done natively by using the functions provided in the elastic client library.
The application defines a single connection using the
OpenConnectionForKey
method defined in the Elastic Database Client Library. The syntax for
OpenConnectionForKey
is given in the following snippet:
public SqlConnection OpenConnectionForKey<TKey>( TKey key, string connectionString, ConnectionOptions options )
It accepts three parameters, which are:
TKey
: This is the sharding key used to determine which shard or SQL Database in a shard set the query is to be made on.connectionString
: The connection string only contains the credentials. The database and the server name are decided from the Shard Map Manager System tables, based on the sharding key.ConnectionOptions
: A connection option can be either none or validate. When it's set to validate, it queries the local shard map or the shardlet to validate that the shard key exists in the databases specified in the cached maps (in the application). This is important in an environment where shard maps change frequently.If the validation fails, then the shard map manager queries the global shard maps for the correct values and updates the application cache.
If the parameters specified are correct,
OpenConnectionForKey
returns a database connection that can be used to query the correct shard.
Multi-tenant data models refer to how the tenants are placed in a sharded environment. There are two distinct models for placing the tenants, database-per-tenant (single tenant model) and shared database – sharded (multi-tenant model):
As the name suggests, each tenant gets its own database. The tenant-specific data is limited to the tenant's database and isolated from other tenants and their data.
As the name suggests, a single shard or database is shared among multiple tenants. The tenants can either be mapped to shards or databases by using either Range or List mappings, as discussed earlier.
Choosing between the two models depends on the following factors:
In this section, we’ll learn how to create an Azure SQL Database Alert. Consider the following scenario from ToyStore Ltd.: Mike has an Azure SQL Database in the Basic Service Tier and has been asked to configure auto scaling to change the Service Tier to Standard S0 when the DTU is greater than 70%. For this purpose, he needs to first create an Azure SQL Database Alert, which is triggered when the DTU is greater than 70%. Let’s see how this can be done.
Create Azure Automation Account and Configure Runbook:
Azure Automation is an Azure Service that allows you to automate Azure Management tasks through Runbooks.
A Runbook is a job that accepts PowerShell or Python code and executes it as and when scheduled, or when invoked from an external program through a webhook:
toystore
Click Create to provision the Automation Account:
toystorejobs
Automation Account. Locate and select Runbooks under the Process Automation section:C:CodeLesson6VerticalScaling
and select the Set-AzureSqldatabaseEdition.ps1
file.Click Create to import the PowerShell runbook:
Once the Runbook is imported, it'll be listed on the
toystore
jobs – Runbooks pane.
toystore
jobs – Runbooks pane, find and select Set-AzureSqldatabaseEdition, as shown in the following screenshot:The Edit pane has the option to further edit the workflow. The PowerShell script is wrapped in a workflow tag specifying that it's a PowerShell Runbook workflow:
The
PowerShell script connects to the given Azure SQL Database and uses the
Set-AzureSqldatabase
command to change the database edition.
Once you are familiar with the script, select Publish in the top menu to publish the Runbook.
Navigate to the
toystorejobs Runbook
pane, and find and select
Credentials under the
Shared Resources section:
Click Create to create the credentials:
The credentials are shared assets and can be used in multiple runbooks.
webhook
for this runbook. Navigate to the Set-AzureSqldatabaseEdition
Runbook overview pane and select the Webhook:Set-AzureSqldatabaseEdition
PowerShell runbook.Click OK to continue. On the Add Webhook pane, select Create to create the webhook and set the parameter values:
Now you have created and configured a PowerShell runbook, which runs a PowerShell command when triggered by a webhook.
The next step is to create an Azure SQL Database alert that is triggered when the DTU percentage is greater than 70%. The alert, when triggered, will call the webhook created earlier.
toystore
Azure SQL Database. In the overview pane, find and click on Alert rules, then select Add alert:In the Add an alert rule pane:
Click OK to create the Alert.
This completes the auto scale setup. The next step is to run the workload and see auto scaling in action:
C:CodeLesson6VerticalScaling
, right-click the Start-Workload.ps1
folder and select Run with PowerShell:A new PowerShell console window will open. Provide the following parameters to the script:
Press Enter to continue. You should see similar output to that shown here:
The scripts start an instance of the ostress utility. The ostress utility runs 25 threads in parallel, executing the
workload.sql
file 30 times each.
toystore
overview page in the Azure portal:The alert status is in the Alert rules page under the Monitoring section on the
toystore
database pane. Once the alert is triggered, the Last Active column value will be changed to Just Now:
Click Completed to further check the job status:
You can verify the parameters passed to the job by clicking on Input, and review the output from the PowerShell script by selecting the Output box.
The status indicates that the job has run successfully.
Switch to the
toystore
overview page and observe that the database edition is now Standard (S0).
This completes the activity.
In this activity, we’ll discuss how to shard our
toystore
database. Consider the following scenario: Mike has been asked to implement sharding to improve the application performance of the
toystore
database. For this purpose, he can shard the
Sales.Customers
and
Sales.Orders
tables into two shards,
toystore_1_100
(with values of customerid from 1-100) and
toystore_200
(with values from 100-200). The steps below describe how this can be done:
toystore_SMM
.toystore
database to toystore_shard_1_100
.toystore_shard_200
Azure SQL Database.toystore_SMM
to Shard Map Manager. This will create the shard management tables and procedures in the toystore_SMM
database: https://www.amazon.in/b?ie=UTF8&node=14072630031&pf_rd_p=bb15c10e-5e2c-47e5-848f-52131f35146f&pf_rd_r=CQGPKFAXBS7QKABXWGMP.Elastic DB Tool Scripts
The Elastic DB Tool scripts are a set of PowerShell modules and scripts provided by Microsoft to easily create and manage Azure SQL Database shards. They use the functions exposed by the Elastic database Client library to provide helper PowerShell cmdlets to easily create and manage shards.
The Elastic DB Tool scripts
are available at
C:codeLesson06 Elastic DB tool scripts.
You can download the latest version from here: https://gallery.technet.microsoft.com/scriptcenter/Azure-SQL-DB-Elastic-731883db
Navigate to
C:CodeLesson06Elastic DB tool scriptsShardManagement
and open
ShardManagement.psm1
.
ShardManagement.psm1
contains functions such as
New-ShardMapManager
,
Get-ShardMapManager
,
New-RangeShardMap
, and
Add-Shard
. Each function has a Synopsis section that briefly describes the function's purpose.
We won't use all of the functions listed in
ShardManagement.psm1
. However, you are free to explore them once you complete the activity.
When you first import the
ShardManagement
module, it checks for the Elastic Client libraries' DLLs (in the folder from where the PowerShell script is executed), and downloads and registers them if not found.
Save Azure Profile Details to a File
Saving your Azure profile details to a file enables you to log in to your Azure account from PowerShell using the saved profile information. Otherwise, you would have to provide your Azure credentials in the authentication dialog box every time you run the Azure command from PowerShell.
powershell
and hit Enter. This will open a new PowerShell console window:Add-AzureRmAccount
You'll have to enter your Azure credentials into the pop-up dialog box. After a successful login, control will return to the PowerShell window.
Run the following command to save the profile details to a file:
Save-AzureRmProfile -Path C:codeMyAzureProfile.json
MyAzureProfile.json
file in JSON format.If you wish to explore the
profile.json
file, you can open it in any editor to review its contents:
Sharding
Toystore
database
We'll now learn
to write PowerShell commands using the Elastic DB Tool scripts to shard the existing
toystore
database.
PowerShell_ISE.exe
in the run command window and hit Enter. This will open a new PowerShell ISE editor window. This is where you'll write the PowerShell commands:C:CodeLesson06Sharding
directory. In the file name textbox, type Shard-toystore.ps1
and press Save to save the file:toystore.ps1
file one after another to implement sharding. The code explanation, wherever required, is given in the following code snippet and in the comments within the code snippet.param ( [parameter(Mandatory=$true)] [String] $ResourceGroup, [parameter(Mandatory=$true)] [String] $SqlServer, [parameter(Mandatory=$true)] [String] $UserName, [parameter(Mandatory=$true)] [String] $Password, [parameter(Mandatory=$true)] [String] $ShardMapManagerdatabase, [parameter(Mandatory=$true)] [String] $databaseToShard, [parameter(Mandatory=$false)] [String] $AzureProfileFilePath )
The script accepts seven parameters:
ResourceGroup
: This is the Azure resource group that contains the Azure SQL Server and the database. This should be the same as the one you provided when creating the toystore
database in Chapter 1, Microsoft Azure SQL Database Primer.SqlServer
: This is the logical Azure SQL Server name that hosts the toystore
database.UserName
and Password
: The Azure SQL Server admin username and password.ShardMapManagerdatabase
: The name of the Shard Map Manager database. Prefix _SMM against the toystore
database to name the shard map manager database.databaseToShard
: The database you wish to shard; toystore
in our case.AzureProfileFilePath
: The path of the JSON file that contains your Azure profile details. If not yet created, follow the steps in the Save Azure Profile Details to a File section to create one.# log the execution of the script
Start-Transcript -Path ".LogShard-
toystore
.txt" -Append
…
…
}
#Login to Azure Account
if((Test-Path -Path $AzureProfileFilePath))
{
$profile = Select-AzureRmProfile -Path $AzureProfileFilePath
$SubscriptionID = $profile.Context.Subscription.SubscriptionId
…
…
# Provide your Azure Credentials in the login dialog box
$profile = Login-AzureRmAccount
$SubscriptionID = $profile.Context.Subscription.SubscriptionId
…
…
#Set the Azure Context
Set-AzureRmContext -SubscriptionId $SubscriptionID | Out-Null
This script does the following things:
C:CodeLesson06ShardingLog
folder.AzureProfileFilePath
parameter to the Azure Profile JSON file if the path isn't provided as the parameter.Login-AzureRmAccount
command. In this case, you will have to provide your Azure subscription username and password in the pop-up windows.Set-AzureRmContext cmdlet
. This tells PowerShell to create and manage objects in your Azure Profile.Press Ctrl + S to save your work before moving further.
shardmanagement
module. This will allow us to use the functions in shardmanagement.psm1
in our PowerShell script:# Import the ShardManagement module Import-Module '..Elastic DB tool scriptsShardManagementShardManagement.psm1'
Observe the
shardmanagement.psm1
file path. It is relative to the directory from where you are executing the
shard-toystore.ps1
script. Therefore, you will need to make sure that the Elastic DB tool scripts folder is available in the
~Lesson06Sharding
folder.
SQLServerFQDN
variable:$SQLServerFQDN = "$SqlServer.database.windows.net"
The
SQLServerFQDN
variable has a fully qualified name logical Azure SQL Server Name. This is required later in the script.
# Provision a new Azure SQL database # call ProvisionAzureSQLdatabase.ps1 created in chapter 1 to create a new Azure SQL database to act as Shard Map Manager $command = "....Lesson01ProvisionAzureSQLdatabase.ps1 -ResourceGroup $ResourceGroup -SQLServer $SqlServer -UserName $UserName -Password $Password -SQLdatabase $ShardMapManagerdatabase -Edition Standard" Invoke-Expression -Command $command
The command variable specifies the
ProvisionAzureSQLdatabase.ps1
file and the required parameters. You can check
Chapter 1,
Microsof Azure SQL Database Primer to find out how to run the
ProvisionAzureSQLdatabase.ps1
PowerShell script.
Observe the relative path of
ProvisionAzureSQLdatabase.ps1
. You will have to change the path if
Lesson01
and
Lesson06
are not in the same parent directory.
The
Invoke-Expression cmdlet
runs the command specified in the command variable.
# Setup the shards # Rename existingtoystore
database totoystore
_shard1 $Shard1 = $databaseToShard + "_Shard_1_100" $Shard2 = $databaseToShard + "_Shard_200" # Establish credentials for Azure SQL database server $SqlServercredential = new-object System.Management.Automation.PSCredential($UserName, ($Password | ConvertTo-SecureString -asPlainText -Force)) … … # Create tables to be sharded in Shard2 $files = Get-ChildItem -Path ".TableScripts" ForEach($file in $files) { Write-Host "Creating table $file in $shard2" -ForegroundColor Green Invoke-Sqlcmd -ServerInstance $SQLServerFQDN -Username $UserName -Password $Password -database $shard2 -InputFile $file.FullName | out-null }
The preceding code does the following things:
databaseToShard
variable is toystore
, then Shard1 = toystore_Shard_1_100
and Shard2 = toystore_Shard_200.
toystore
database to Shard1
, that is, toystore_Shard_1_100
. The Set-AzureSqldatabase cmdlet
is used to rename the database.toystore
_Shard_200. It uses ProvisionAzureSQLdatabase.ps1
as described in step 8 to provision a new database.Sales.Customer
, Sales.Orders
, and Application.Countries
, in the newly provisioned shard2
database.C:CodeLesson06ShardingTableScripts
. The Get-ChildItem cmdlet
gets all of the files present in the TableScripts
directory.Invoke-Sqlcmd cmdlet
executes the scripts file on the Shard2 database.# Register the database created previously as the Shard Map Manager Write-host "Configuring database $ShardMapManagerdatabase as Shard Map Manager" -ForegroundColor Green $ShardMapManager = New-ShardMapManager -UserName $UserName -Password $Password -SqlServerName $SQLServerFQDN -SqldatabaseName $ShardMapManagerdatabase -ReplaceExisting $true
This code uses the
New-ShardMapManager cmdlet
from the
ShardManagement.psm1
module to register the newly created database in step 8 as the Shard Map Manager.
This creates the database objects required for shard management in the shard map manager database under the
ShardManagement
schema.
# Create Shard Map for Range Mapping
$ShardMapName = "
toystore
rangemap"
$ShardMap = New-RangeShardMap -KeyType $([int]) -ShardMapManager $ShardMapManager -RangeShardMapName $ShardMapName
This code uses the
New-RangeShardMap
function from the
ShardManagement
module to create a new range shard map in the shard map manager database.
The
keytype
parameter defines the data type of the
sharding
key. In our case, the
sharding
key is
customerid
, which is of the integer data type.
ShardMapManager
is the shard map manager object assigned to the
$ShardMapManager
variable in step 10. This tells the function to create the shard map in this particular shard map manager.
The
RangeShardMapName
variable is the name of the shard map,
toystorerangemap
.
# Add shards (databases) to shard maps Write-host "Adding $Shard1 and $Shard2 to the Shard Map $ShardMapName" -ForegroundColor Green $Shards = "$Shard1","$shard2" foreach ($Shard in $Shards) { Add-Shard -ShardMap $ShardMap -SqlServerName $SQLServerFQDN -SqldatabaseName $Shard }
This code uses the
Add-Shard
function from the
ShardManagement
module, and adds the individual shards Shard1 (toystore_Shard_1_100
) and Shard2 (toystore_Shard_200
) to the
toystorerangemap
created in step 11.
ShardMap
is the shard map object assigned to the
$ShardMap
variable in step 11. This tells the function the shard map in which the shards are to be added.
SqlServerName
and
SqldatabaseName
are the logical server name and the database name of the shards to be added to the shard map.
This step will create the local shard management objects in the individual shards under the
__ShardManagement
database.
toystore_Shard_1_100
):# Add Range Key Mapping on the first Shard # Mapping is only required on first shard currently it has all the data. $LowKey = 0 $HighKey = 200 Write-host "Add range keys to $Shard1 (Shard1)" -ForegroundColor Green Add-RangeMapping -KeyType $([int]) -RangeShardMap $ShardMap -RangeLow $LowKey -RangeHigh $HighKey -SqlServerName $SQLServerFQDN -SqldatabaseName $Shard1
This code uses the
Add-RangeMapping
function from the
ShardManagement
module to specify the key range for the first shard only. It takes the following parameters:
Keytype
: The data type of the sharding key column. It is Integer in our case.RangeShardMap
: The range shard map object. This is assigned to the $ShardMap
variable in step 11.RangeLow
: The lower boundary of the range mapping. 0 in our case.RangeHigh
: The higher boundary of the range mapping. 200 in our case.SqlServerName
: The logical Azure SQL Server name that hosts the shards.SqldatabaseName
: The name of the shard.Mappings are added only to the first shard because it has all of the customer records (200 customers) at the moment.
In the next activity, you'll split the records between the shards using the Split-Merge utility.
# Add Schema Mappings to the $shardMap # This is where you define the sharded and the reference tables Write-host "Adding schema mappings to the Shard Map Manager database" -ForegroundColor Green $ShardingKey = "Customerid" $ShardedTableName = "Customers","Orders" $ReferenceTableName = "Countries" … … # Get the schema info collection for the shard map manager $SchemaInfoCollection = $ShardMapManager GetSchemaInfoCollection() # Add the SchemaInfo for this Shard Map to the Schema Info Collection if ($($SchemaInfoCollection | Where Key -eq $ShardMapName) -eq $null) { $SchemaInfoCollection.Add($ShardMapName, $SchemaInfo) } else { $SchemaInfoCollection.Replace($ShardMapName, $SchemaInfo) } Write-host "$databaseToShard is now Sharded." -ForegroundColor Green
This code adds the schema information of the sharded and reference table in the
shard
map manager database
. The schema information includes schema name, table name, and key column.
This is done by initializing a schema info object of type
Microsoft.Azure.Sqldatabase.ElasticScale.ShardManagement.Schema.SchemaInfo
, and then adding the table details to this object using the
Add
function.
The
Schemainfo.Add
function takes three arguments, schema name, table name, and key column name.
The
SchemaInfoCollection
variable gets the shard map manager schema info collection object.
The schema is then added to the shard map manager by a
SchemaInfoCollection.Add
function call that takes two arguments, the shard map to add the schema details to and the schema details as defined in the schema info object.
toystore
database, you can restore it using the bacpac file provide with the code files: C:Code _databaseBackup oystore.bacpac
.Executing PowerShell Script
To
execute the
shard-toystore.ps1
:
shard-toystore.ps1
script. For example, if the script is in the C:CodeLesson06Sharding
directory, then run the following command to switch to this directory:cd C:CodeLesson06Sharding
C:CodeLesson06Executions.txt
file:.shard-toystore
.ps1 -ResourceGrouptoystore
-SqlServer toyfactory -UserName sqladmin -Password Packt@pub2 -ShardMapManagerdatabasetoystore
_SMM -databaseToShardtoystore
-AzureProfileFilePath C:CodeMyAzureProfile.json
You may get the following warning during script execution. Ignore such warnings:
WARNING: Could not obtain SQL Server Service information. An attempt to connect to WMI on 'Microsoft.WindowsAzure.Commands.Sqldatabase.dll' failed with the following error: The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)
Once you have changed the parameter values, hit Enter to run the command. This command will:
toystore.txt
file in the Log folder. Use this file for troubleshooting script errors.toystore_SMM
toystore
database to toystore_Shard_1_100 (shard1)
toystore
_Shard_200 (shard2)toystore_SMM
database under the __ShardManagement
schematoystorerangemap
toystore_Shard_1_100 (shard1)
and toystore_Shard_200 (shard2)
to the range shard maptoystore_Shard_1_100 (shard1)
Sales.Customers
, Sales.Orders
, and Application.Countries
in the shard map manager databaseYou should get the following output after successful execution of the script:
Configuring databasetoystore
_SMM as Shard Map Manager Addingtoystore
_Shard_1_100 to the Shard Maptoystore
rangemap Addingtoystore
_Shard_200 to the Shard Maptoystore
rangemap Add range keys totoystore
_Shard_1_100 (Shard1) Adding schema mappings to the Shard Map Manager databasetoystore
is now Sharded.
Reviewing the Shard Configuration
You'll now review the shard configuration the PowerShell script performed:
toyfactory
Azure SQL Server.
toystore_Shard_1_100
is the
toystore
database renamed.
toystore_Shard_200
is the new Shard 2 database.
toystore_SMM
is the Shard Map Manager database.
toystore_SMM
and select New Query from the context menu.SELECT st.Name As ShardTables FROM sys.tables st JOIN sys.schemas ss on st.schema_id=ss.schema_id WHERE ss.Name='__ShardManagement'
You should get the following output:
Observe that six tables are added to the
toystore_SMM
database.
ShardMapsGlobal
table:SELECT * FROM __ShardManagement.ShardMapsGlobal
You should see the following output:
The
ShardMapsGlobal
table will have one row for each shard map you create. Observe that it contains the
toystorerangemap
that was created by the
Shard-toystore.ps1
script. Each shard map is assigned a unique
ShardMapId
.
ShardsGlobal
table:SELECT ShardId,ShardMapId,ServerName,databaseName FROM __ShardManagement.ShardsGlobal
You should get the following output:
The
ShardsGlobal
table contains one row for each shard in the sharded environment. Observe that it has two rows, one for each shard,
toystore
_
shard_1_100 and toystore_Shard_200
.
The
ShardMapId
column is used to map the shard with its corresponding Shard Map in the
ShardsMapGlobal
table.
The table also stores the
ServerName
for each of the shards (databases). This table is used to route the requests to the correct shard based on the sharding key when a request is received from an application.
ShardMappingsGlobal
table:SELECT MappingId,ShardId,ShardMapId,MinValue,MaxValue FROM __ShardManagement.ShardMappingsGlobal
You should get the following output:
ShardMappingsGlobal
stores the low and high key-value mapping for each shard in the
ShardsGlobal
table.
The
ShardId
and
ShardMapId
columns map the rows with their corresponding Shards and Shard Map in the
ShardsGlobal
and
ShardMapsGlobal
tables respectively.
ShardeddatabaseSchemaInfosGlobal
table:select * from __ShardManagement.ShardeddatabaseSchemaInfosGlobal
You should get the following output:
The
ShardeddatabaseSchemaInfosGlobal
table stores the schema info for each shard map defined in the
ShardsMapGlobal
table.
These are the same schema details as provided in step 14 of the
Sharding toystore database
section in the
shard-toystore.ps1
script.
In
the results pane in SSMS, click the XML under the
Schemainfo
column. You should see the following XML:
<Schema xmlns:i="http://www.w3.org/2001/XMLSchema-instance"> <ReferenceTableSet i:type="ArrayOfReferenceTableInfo"> <ReferenceTableInfo> <SchemaName>Application</SchemaName> <TableName>Countries</TableName> </ReferenceTableInfo> </ReferenceTableSet> <ShardedTableSet i:type="ArrayOfShardedTableInfo"> <ShardedTableInfo> <SchemaName>Sales</SchemaName> <TableName>Customers</TableName> <KeyColumnName>Customerid</KeyColumnName> </ShardedTableInfo> <ShardedTableInfo> <SchemaName>Sales</SchemaName> <TableName>Orders</TableName> <KeyColumnName>Customerid</KeyColumnName> </ShardedTableInfo>
Observe that it contains the schema, table, and sharding key column values for the
Sales.Customer
,
Sales.Orders
, and
Application.Countries
tables.
toystore_Shard_1_100
and then expand Tables:Observe that as
toystore_SMM
has global shard management tables,
toystore_Shard_1_100
has local shard map management tables.
toystore_shard_200
will also have local shard management tables.
The local shard management tables store shard metadata specific to the particular shard. You can query the tables to review the data for a better understanding.
This completes the activity.
In the previous activity, Mike created two shards,
toystore_Shard_1_100
and
toystore_Shard_200
. However, all of the data is available in the
toystore_Shard_1_100
database and he has been requested to split the data between
toystore_Shard_1_100
and
toystore_Shard_200
. Therefore, he can use the split service to split the data.
In this activity, you'll use the split-merge service to split the data between
toystore_Shard_1_100
and
toystore_Shard_200
.
The split-merge tool is an Azure Web service deployed to your Azure environment. Once deployed, you can either invoke the web service from the web service URL or from PowerShell.
To deploy the split-merge cloud service in your Azure environment and then call the cloud service function to split the data, follow these steps:
Deploying Split-Merge Cloud Service:
Follow the instructions listed at this URL to deploy the split-merge service.
In addition to the steps mentioned in the URL, make the following additional changes before deploying the web service:
ServiceConfiguration.cscfg
file, set the value of the following settings to false
:<Setting name="SetupWebAppForClientCertificates" value="false" /> <Setting name="SetupWebserverForClientCertificates" value="false" />
C:CodeLesson06Splitting
folder for the sample files:Serviceconfigurtion.cscfg
: C:CodeLesson06SplittingSplitMergeLibrariesMicrosoft.Azure.Sqldatabase.ElasticScale.Service.SplitMerge.1.2.0contentsplitmergeservice
SplitMergeService.cspkg
: C:CodeLesson06SplittingSplitMergeLibrariesMicrosoft.Azure.Sqldatabase.ElasticScale.Service.SplitMerge.1.2.0contentsplitmergeservice
C:CodeLesson06SplittingCertificate
.toyfactory.cer
and select Install to install the certificate on your local machine.toyfactory.pfx
file on Azure Cloud as per instructions at the URL given previously.toyfactory
Azure SQL Server firewall to allow connections from services within Azure.Once your web service is deployed, you should see this output in the Azure portal cloud service overview section:
You can split the data by either filling out the web form or by calling the web service using PowerShell.
Call Split-Merge Cloud Service Using PowerShell
PowerShell_ISE.exe
in the run command window and hit Enter
. This will open a new PowerShell ISE editor window. This is where you'll write the PowerShell commands:In the PowerShell ISE, select
File from the top menu and click
Save. Alternatively, you
can press
Ctrl +
S to save the file. In the File Save Dialog box, browse to the
C:CodeLesson06Splitting
directory. In the File name text box, type
Split-toystore-shard.ps1
and click
Save to save the file:
Split-toystore-shard.ps1
file one after another to implement the split operation. The code explanation, wherever required, is given in the following code snippet and in the comments within the code snippet.param ( [parameter(Mandatory=$true)] [String] $ResourceGroup, [parameter(Mandatory=$true)] [String] $SqlServer, [parameter(Mandatory=$true)] [String] $UserName, [parameter(Mandatory=$true)] [String] $Password, … … [String] $SplitRangeHigh, [parameter(Mandatory=$true)] [String] $SplitValue, [bool] $CreateSplitMergedatabase = $false )
Most of the parameters have been described in the previous activity. The following are the descriptions of the additional parameters:
SplitMergedatabase
: This is the split-merge database you will have created as part of the split-merge cloud service deployment.SplitMergeServiceEndpoint
: The split-merge cloud service URL copied in step 2 of the Deploy Split-Merge Cloud Service sectionShardMapManagerdatabaseName
: The shard map manager database you created in a previous Activity.toystore_Shard_200
) you created in a previous Activity.ShardMapName
: The shard map name (toystorerangemap
) you created in a previous Activity. SplitRangeLow
: The lower value for the range mapping. This is 0 in our case.SplitRangeHigh
: The higher value for the range mapping. This is 200 in our case.SplitValue
: The value at which the split will take place. This is 100 in our case.CreateSplitMergedatabase
: This is a Boolean value which, when set to true, will provision a new database to be used as the split-merge database. You can use this to provision the database if you haven't created it yet.Start-Transcript -Path "$ScriptPathLogSplit
ToyStore
Shard.txt" -Append
$CertificateThumbprint = $null
# Get the parent directory of the script.
$ScriptPath = split-path -parent $MyInvocation.MyCommand.Definition
# set the AzureProfileFilePath
$AzureProfileFilePath = "....MyAzureProfile.json"
#Login to Azure Account
if((Test-Path -Path $AzureProfileFilePath))
{
$profile = Select-AzureRmProfile -Path $AzureProfileFilePath
$SubscriptionID = $profile.Context.Subscription.SubscriptionId
}
…
…
#Create a database to store split merge status
$command = "....Lesson01ProvisionAzureSQLdatabase.ps1 -ResourceGroup $ResourceGroup -SQLServer $SqlServer -UserName $UserName -Password $Password -SQLdatabase $SplitMergedatabase -Edition Basic"
Invoke-Expression -Command $command
Exit;
}
This code calls the
ProvisionAzureSQLdatabase.ps1
PowerShell script to create a new Azure SQL database to store the split-merge cloud service status.
The database is created only if
CreateSplitMergedatabase
is set to true.
# Import SplitMerge module $ScriptDir = Split-Path -parent $MyInvocation.MyCommand.Path Import-Module $ScriptDirSplitMerge -Force
The split-merge PowerShell module has helper functions to call the split-merge cloud service.
Write-Output 'Sending split request' $splitOperationId = Submit-SplitRequest ' -SplitMergeServiceEndpoint $SplitMergeServiceEndpoint ' -ShardMapManagerServerName "$SqlServer.database.windows.net" ' -ShardMapManagerdatabaseName $ShardMapManagerdatabaseName ' -TargetServerName "$SqlServer.database.windows.net" ' -TargetdatabaseName $Shard2 ' -UserName $UserName ' -Password $Password ' -ShardMapName $ShardMapName ' -ShardKeyType 'Int32' ' -SplitRangeLowKey $SplitRangeLow ' -SplitValue $SplitValue ' -SplitRangeHighKey $SplitRangeHigh ' -CertificateThumbprint $CertificateThumbprint
This code calls the
Submit-SplitRequest
functions defined in the
SplitMerge
module. The
Submit-SplitRequest
function submits the split request by specifying the different parameter values.
The
SplitMerge
module contains helper functions for Merge requests as well. Merge refers to merging two range mappings into a single shard.
The
Submit-SplitRequest
returns the operation
id
value. The operation id is assigned to the
$splitOperationId
variable and is used to get the split request status.
# Get split request output Wait-SplitMergeRequest -SplitMergeServiceEndpoint $SplitMergeServiceEndpoint -OperationId $splitOperationId -CertificateThumbprint $CertificateThumbprint
This code calls the
Wait-SplitMergeRequest
helper function defined in the SplitMerge PowerShell module. The function checks for the split operation status for the operation id
$splitOperationId
and writes the status to the console.
Executing PowerShell Script
toystore
.ps1 script. For example, if the script is in the C:CodeLesson06Sharding
directory, then run the following command to switch to this directory:cd C:CodeLesson06Splitting
C:CodeLesson06Executions.txt
file:.Split-toystore
-shard.ps1 -ResourceGrouptoystore
-SqlServer toyfactory -UserName sqladmin -Password Packt@pub2 -SplitMergedatabasetoystore
_splitmerge -SplitMergeServiceEndpoint "https://splitmerge.cloudapp.net/" -ShardMapManagerdatabaseNametoystore
_SMM -Shard2toystore
_Shard_200 -ShardMapNametoystore
rangemap -SplitRangeLow 0 -SplitRangeHigh 200 -SplitValue 100 -AzureProfileFilePath C:CodeMyAzureProfile.json
Once you have changed the parameter values, copy and paste the command in the PowerShell console window opened in step 1 and hit Enter.
If the script executes successfully, you should get the following output:
Sending split request Polling request status. Press Ctrl-C to end Progress: 0% | Status: Queued | Details: [Informational] Operation has been queued. Progress: 5% | Status: Starting | Details: [Informational] Starting Split-Merge state machine for request. Progress: 5% | Status: Starting | Details: [Informational] Performing data consistency checks on target shards. Progress: 20% | Status: CopyingReferenceTables | Details: [Informational] Successfully copied reference table [Applicati on].[Countries]. … … Progress: 80% | Status: CopyingShardedTables | Details: [Informational] Successfully copied key range [190:200) for shar ded table [Sales].[Orders]. Progress: 90% | Status: Completing | Details: [Informational] Deleting any temp tables that were created while processin g the request. Progress: 100% | Status: Succeeded | Details: [Informational] Successfully processed request.
RequestStatus
table in the split-merge database.The
RequestStatus
tables has one row for each split-merge request. The Details column contains the XML with the error details if the request fails.
Verify the Split Operation
To review that the split request has correctly moved the data:
toystore_Shard_1_100
database and select New Query from the context menu.SELECT DB_NAME() AS databaseName, COUNT(*) AS TotalRows FROM Sales.Customers
You should get the following output:
toystore_Shard_200
database and select New Query from the context menu.SELECT DB_NAME() AS databaseName, COUNT(*) AS TotalRows FROM Sales.Customers
You should get the following output:
This validates that the Split-Merge operation has successfully split 200 rows between the two shards,
toystore_Shard_1_100 (100 rows)
and
toystore_Shard_200 (100 rows)
.
toystore_SMM
database and select New Query from the context menu. Execute the following query in the new query window:SELECT sg.databaseName AS ShardName ,sg.ServerName AS ServerName ,smg.Name AS ShardMapName ,smg.KeyType ,CAST(MinValue AS SMALLINT) AS RangeLowKey ,CAST(MaxValue AS SMALLINT) AS RangeHighKey FROM [__ShardManagement].[ShardMapsGlobal] smg JOIN [__ShardManagement].[ShardsGlobal] sg ON sg.ShardMapID = smg.ShardMapId JOIN [__ShardManagement].[ShardMappingsGlobal] smng ON smg.ShardMapID=smng.ShardMapID AND sg.ShardId=smng.ShardId
You should get the following output:
The
MinValue
and
MaxValue
columns are
varbinary
columns and are therefore converted to
SmallInt
.
If you remember the first activity of this chapter, the table
ShardMappingsGlobal
had only one
mapping, which was added as the part of sharding configuration.
However, it now has two rows, and the second row for the shard
toystore_Shard_200
is added as part of the split operation.
This completes the activity.
In this activity, we will use elastic or cross-database queries to query the sharded tables (created in previous activities) across the shards as a single table.
To query multiple shards as a single table using elastic queries, follow these steps:
toystorereporting
database:CREATE DATABASE toystorereporting;
toystorereporting
database, and select New Query from the context menu.CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Packt@pub2';
You may get the following error if a master key already exists in the database:
Msg 15578, Level 16, State 1, Line 3 There is already a master key in the database. Please drop it before performing this statement.
Ignore the error and proceed to the next step.
CREATE DATABASE SCOPED CREDENTIAL toystore_creds WITH IDENTITY = 'sqladmin', SECRET = 'Packt@pub2'
The identity and secret should be the same as your Azure SQL Server administrator username and password.
CREATE EXTERNAL DATA SOURCE toystore_dsrc WITH ( TYPE=SHARD_MAP_MANAGER, LOCATION='toyfactory.database.windows.net', DATABASE_NAME='toystore_SMM', CREDENTIAL= toystore_creds, SHARD_MAP_NAME='toystorerangemap' );
This query creates an external data source
toystore_dsrc
of type
Shard_Map_Manager
, which connects to the shard map manager database
toystore_SMM
using the
toystore_creds
database scoped credentials created in step 5.
The shard map name in the external data source will help resolve the individual shards to get the data from.
We didn't specify individual shards as the external data source
database_Name
because it'll return the data of individual shards. However, our goal is to get data for a table from all shards.
The external data source type can be Hadoop, RDBMS, or Blob Storage. For more details on external data sources, refer to this link: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql.
customers
table in the toystorereporting
database. Observe that the table is created with the External keyword and on the external data source toystore_dsrc
created in step 6:CREATE EXTERNAL TABLE [dbo].[Customers]( [CustomerID] [int] NOT NULL, [CustomerName] [nvarchar](100) NOT NULL, [BillToCustomerID] [int] NOT NULL, [CustomerCategoryID] [int] NOT NULL, [BuyingGroupID] [int] NULL, [PrimaryContactPersonID] [int] NOT NULL, [AlternateContactPersonID] [int] NULL, ... ... );
dbo.Customers
is an external table that gets its data from the
toystore_dsrc
external
data source, the
Sales.Customers
table.
The Distribution parameter specifies how the data is distributed for this table. In our case, the table is horizontally partitioned, hence the distribution used is sharded with
customerid
(sharding key). The other available distributions are:
SELECT * FROM dbo.Customers
You should get all 200 rows.
The database engine uses the information specified in the
toystore_dsrc
external data source to connect and return data from all the shards.
-- Get Existing External Data sources SELECT * FROM sys.external_data_sources; -- Get Existing External Tables SELECT * FROM sys.external_tables
This completes the activity.
3.145.55.198