
9. Database Consistency

Peter A. Carter, Southampton, UK

Databases involve lots of I/O. When you have a lot of I/O, you inherently run the risk of corruption. Your primary defense against database corruption is to take regular backups of your database and to periodically test that these backups can be restored. You need to look out for database corruption, however, and SQL Server provides tools you can use to check the consistency of your database, as well as to resolve consistency issues if backups are not available. This chapter will look at the options you have for both checking and fixing consistency issues.

Consistency Errors

Consistency errors can occur in user databases or system databases, leaving tables, databases, or even entire instances in an inaccessible state. Consistency errors can occur for many reasons, including hardware failures and issues with the Database Engine. The following sections discuss the types of error that can occur, how to detect these errors, and what to do if your system databases become corrupt.

Understand Consistency Errors

Different database consistency errors can occur; these cause a query to fail or a session to be disconnected and a message to be written to the SQL Server error log. The most common errors are detailed in the following sections.

605 Error

A 605 error can point to one of two issues, depending on the error severity. If the severity is level 12, then it indicates a dirty read. A dirty read is a transactional anomaly that occurs when you are using the Read Uncommitted isolation level or the NOLOCK query hint. It occurs when a transaction reads a row that never existed in the database, due to another transaction being rolled back. Transactional anomalies will be discussed in more detail in Chapter 18. To resolve this issue, either rerun the query until it succeeds or rewrite the query to avoid the use of the Read Uncommitted isolation level or the NOLOCK query hint.

The 605 error may indicate a more serious issue, however, and often it indicates a hardware failure. If the severity level is 21, then the page may be damaged, or the operating system may be serving up the wrong page. If this is the case, then you need to either restore from a backup or use DBCC CHECKDB to fix the issue. (DBCC CHECKDB is discussed later in this chapter.) Additionally, you should have the Windows administrators and storage team check for possible hardware or disk-level issues.

823 Error

An 823 error occurs when SQL Server attempts to perform an I/O operation and the Windows API that it uses to perform this action returns an error to the Database Engine. An 823 error is almost always associated with a hardware or driver issue.

If an 823 error occurs, then you should use DBCC CHECKDB to check the consistency of the rest of the database and any other databases that reside on the same volume. You should liaise with your storage team to resolve the issue with the storage. Your Windows administrator should also check the Windows event log for correlated error messages. Finally, you should either restore the database from a backup or use DBCC CHECKDB to “fix” the issue.

824 Error

If the call to the Windows API succeeds but there are logical consistency issues with the data returned, then an 824 error is generated. Just like an 823 error, an 824 error usually means that there is an issue with the storage subsystem. If an 824 error is generated, then you should follow the same course of action as you do when an 823 error is generated.

5180 Error

A 5180 error occurs when a file ID is discovered that is not valid. File IDs are stored in page pointers, as well as in system pages at the beginning of each file. This error is usually caused by a corrupt pointer within a page, but it can potentially also indicate an issue with the Database Engine. If you experience this error, you should restore from a backup or run DBCC CHECKDB to fix the error.

7105 Error

A 7105 error occurs when a row within a table references an LOB (large object) structure that does not exist. This can happen because of a dirty read, in the same manner as a 605 severity 12 error, or it can happen as the result of a corrupt page. The corruption can either be in the data page that points to the LOB structure or in a page of the LOB structure itself.

If you encounter a 7105 error, then you should run DBCC CHECKDB to check for errors. If you don’t find any, then the error is likely the result of a dirty read. If you find errors, however, then either restore the database from a backup or use DBCC CHECKDB to fix the issue.

Detecting Consistency Errors

SQL Server provides mechanisms for verifying the integrity of pages as they are read from and written to disk. It also provides a log of corrupt pages that helps you identify the type of error that has occurred, how many times it has occurred, and the current status of the page that has become corrupt. These features are discussed in the following sections.

Page Verify Option

A database-level option called Page Verify determines how SQL Server checks for page corruption that the I/O subsystem causes when it is reading and writing pages to disk. It can be configured as CHECKSUM, which is the default option, TORN_PAGE_DETECTION, or NONE.

The recommended setting for Page Verify is CHECKSUM. When this option is selected, every time a page is written, a CHECKSUM value is calculated against the entire page and saved in the page header. A CHECKSUM value is a hash sum, which is deterministic for the value that the hashing function is run against. This value is then recalculated when a page is read into the buffer cache and compared to the original value.

When TORN_PAGE_DETECTION is specified, whenever a page is written to disk, the first 2 bytes of every 512-byte sector of the page are written to the page’s header. When the page is subsequently read into memory, these values are checked to ensure that they are the same. The flaw here is obvious; it is perfectly possible for a page to be corrupt, and for this corruption not to be noticed, because it is not within the bytes that are checked. TORN_PAGE_DETECTION is a deprecated feature of SQL Server, which means that it will not be available in future versions. You should avoid using it. If Page Verify is set to NONE, then SQL Server performs no page verification whatsoever. This is not good practice.

If all of your databases have been created in a SQL Server 2022 instance, then they are all configured to use CHECKSUM by default. If you have migrated your databases from a previous version of SQL Server, however, then they may be configured to use TORN_PAGE_DETECTION. You can check the Page Verify setting of your databases by using the script in Listing 9-1.
--Create the Chapter9 database
CREATE DATABASE Chapter9 ;
GO
--View page verify option, for all databases on the instance
SELECT
        name
        ,page_verify_option_desc
FROM sys.databases ;
Listing 9-1

Checking the Page Verify Option

If you find that a database is using TORN_PAGE_DETECTION, or, worse, NONE, then you can resolve the issue by altering the setting in the Options page of the Database Properties dialog box, as shown in Figure 9-1.
Figure 9-1

The Options page

Note

Changing the Page Verify option does not cause the CHECKSUM to be created against the data pages immediately. The CHECKSUM is only generated when the pages are written back to disk after being modified.
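If you do not want to wait for pages to be modified organically, rebuilding a table rewrites every one of its pages, so each page receives a CHECKSUM as it is written back to disk. A minimal sketch, assuming a hypothetical table named dbo.MyTable:
--Rebuilding all indexes on a table rewrites its pages,
--so each page gets a CHECKSUM as it is written back to disk
ALTER INDEX ALL ON dbo.MyTable REBUILD ;
--For a heap (a table with no clustered index), rebuild the table itself
ALTER TABLE dbo.MyTable REBUILD ;
Bear in mind that rebuilds are I/O intensive, so this is best done inside a maintenance window.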

Alternatively, you can change the Page Verify option with T-SQL by using an ALTER DATABASE <DatabaseName> SET PAGE_VERIFY CHECKSUM WITH NO_WAIT statement. The script in Listing 9-2 causes all databases that are currently set to either NONE or TORN_PAGE_DETECTION to be reconfigured to use CHECKSUM. The script uses the XQuery data() function to avoid the need for a cursor. It works by building the required statement for every row in the table: it flips the data for each row into XML, strips the tags out using the data() function so that only the statement remains, converts the result back to a relational string, and passes it into a Unicode variable, which is then executed as dynamic SQL.
DECLARE @SQL NVARCHAR(MAX)
SELECT @SQL =
(
SELECT
        'ALTER DATABASE ' + QUOTENAME(Name) +
                               ' SET PAGE_VERIFY CHECKSUM WITH NO_WAIT; ' AS [data()]
FROM sys.databases
WHERE page_verify_option_desc <> 'CHECKSUM'
FOR XML PATH('')
) ;
BEGIN TRY
    EXEC(@SQL) ;
END TRY
BEGIN CATCH
    SELECT 'Failure executing the following SQL statement ' + CHAR(13) +CHAR(10) + @SQL ;
END CATCH
Listing 9-2

Reconfiguring All Databases to Use CHECKSUM

Tip

You can use this technique any time you require a script to perform an operation against multiple databases. The code is far more efficient than using a cursor and promotes good practice by allowing DBAs to lead by example. You are always telling your developers not to use cursors, right?

Suspect Pages

If SQL Server discovers a page with a bad checksum or a torn page, then it records the page in the MSDB database, in a table called dbo.suspect_pages. It also records any pages that encounter an 823 or 824 error in this table. The table consists of six columns, as described in Table 9-1.
Table 9-1

suspect_pages Columns

database_id: The ID of the database that contains the suspect page

file_id: The ID of the file that contains the suspect page

page_id: The ID of the page that is suspect

event_type: The nature of the event that caused the suspect page to be recorded or updated

error_count: An incremental counter that records the number of times that the event has occurred

last_update_date: The last time the row was updated

The possible values for the event_type column are explained in Table 9-2.
Table 9-2

Event Types

1: 823 or 824 error

2: Bad checksum

3: Torn page

4: Restored

5: Repaired

7: Deallocated by DBCC CHECKDB

After recording the suspect page in the suspect_pages table, SQL Server updates the row when you fix the issue, either by restoring the page from a backup or by using DBCC CHECKDB. It also increments the error count every time an error with the same event_type is encountered. You should monitor this table for new and updated entries, and you should also periodically delete rows that have an event_type of 4 or 5, since the table is limited to 1,000 rows and new errors cannot be recorded once it is full.
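For example, the following statement, which you might schedule as a housekeeping job, clears out rows for pages that have already been restored or repaired:
--Remove rows for pages that have been restored (4) or repaired (5)
DELETE FROM msdb.dbo.suspect_pages
WHERE event_type IN (4, 5) ;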

Note

Page restores will be discussed in Chapter 12.

The script in Listing 9-3 creates a table called CorruptTable in the Chapter9 database and populates it with data. It then causes one of the table's pages to become corrupt.
USE Chapter9
GO
--Create the table that we will corrupt
CREATE TABLE dbo.CorruptTable
(
ID    INT    NOT NULL    PRIMARY KEY CLUSTERED    IDENTITY,
SampleText NVARCHAR(50)
) ;
--Populate the table
DECLARE @Numbers TABLE
(ID        INT)
;WITH CTE(Num)
AS
(
SELECT 1 Num
UNION ALL
SELECT Num + 1
FROM CTE
WHERE Num <= 100
)
INSERT INTO @Numbers
SELECT Num
FROM CTE ;
INSERT INTO dbo.CorruptTable
SELECT 'SampleText'
FROM @Numbers a
CROSS JOIN @Numbers b ;
--DBCC WRITEPAGE will be used to corrupt a page in the table. This requires the
--database to be placed in single user mode.
--THIS IS VERY DANGEROUS – DO NOT EVER USE THIS IN A PRODUCTION ENVIRONMENT
ALTER DATABASE Chapter9 SET  SINGLE_USER WITH NO_WAIT ;
GO
DECLARE @SQL NVARCHAR(MAX) ;
SELECT @SQL = 'DBCC WRITEPAGE(' +
(
        SELECT CAST(DB_ID('Chapter9') AS NVARCHAR)
) +
', 1, ' +
(
        SELECT TOP 1 CAST(page_id AS NVARCHAR)
        FROM dbo.CorruptTable
        CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
) +
', 2000, 1, 0x61, 1)' ;
EXEC(@SQL) ;
ALTER DATABASE Chapter9 SET  MULTI_USER WITH NO_WAIT ;
GO
SELECT *
FROM dbo.CorruptTable ;
Listing 9-3

Corrupting a Page

The results in Figure 9-2 show that the final query in the script, which tried to read the data from the table, failed because one of the pages is corrupt, and therefore, there is a bad checksum.
Figure 9-2

Bad checksum error

Caution

DBCC WRITEPAGE is used here for educational purposes only. It is undocumented and extremely dangerous. It should never be used on a production system, and it should only ever be used against any database with extreme caution.

You can use the query in Listing 9-4 to generate a friendly output from the msdb.dbo.suspect_pages table. This query uses the DB_NAME() function to find the name of the database, joins to the sys.master_files system table to find the name of the file involved, and uses a CASE expression to translate the event_type into an event type description.
SELECT
    DB_NAME(sp.database_id) [Database]
    ,mf.name
    ,sp.page_id
    ,CASE sp.event_type
        WHEN 1 THEN '823 or 824 Error'
        WHEN 2 THEN 'Bad Checksum'
        WHEN 3 THEN 'Torn Page'
        WHEN 4 THEN 'Restored'
        WHEN 5 THEN 'Repaired (DBCC)'
        WHEN 7 THEN 'Deallocated (DBCC)'
    END AS [Event]
    ,sp.error_count
    ,sp.last_update_date
FROM msdb.dbo.suspect_pages sp
INNER JOIN sys.master_files mf
        ON sp.database_id = mf.database_id
                AND sp.file_id = mf.file_id ;
Listing 9-4

Querying suspect_pages

After corrupting a page of our CorruptTable table, running this query will produce the results in Figure 9-3. Obviously, the page_id is likely to be different if you were to run the scripts on your own system, since the Database Engine is likely to have allocated different pages to the table that you created.
Figure 9-3

Results of querying suspect_pages

Note

We will fix the error later in this chapter, but that involves losing data that was stored on the page. If a backup is available, then a page restore is a better option than a repair in this scenario.

Consistency Issues for Memory-Optimized Tables

Corruption usually occurs during a physical I/O operation, so you can be forgiven for thinking that memory-optimized tables are immune to corruption, but this is a fallacy. As you may remember from Chapter 7, although memory-optimized tables reside in memory, a copy of the tables—and depending on your durability settings, a copy of your data—is kept in physical files. This is to ensure that the tables and data are still available after a restart of the instance. These files can be subject to corruption. It is also possible for data to become corrupt in memory, due to issues such as a faulty RAM chip.

Unfortunately, the repair options of DBCC CHECKDB are not supported against memory-optimized tables. However, when you take a backup of a database that contains a memory-optimized filegroup, a checksum validation is performed against the files within this filegroup. It is therefore imperative that you not only take regular backups, but also regularly check that they can be restored successfully. This is because your only option, in the event of a corrupted memory-optimized table, is to restore from the last known good backup.
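A minimal sketch of this routine is shown below; the backup path is hypothetical. RESTORE VERIFYONLY confirms that the backup media is readable and that its checksums are valid, without actually restoring the database, although a periodic full test restore remains the strongest guarantee.
--Back up the database; WITH CHECKSUM validates checksums,
--including the files of any memory-optimized filegroup, during the backup
BACKUP DATABASE Chapter9
TO DISK = N'F:\Backups\Chapter9.bak' --Hypothetical path
WITH CHECKSUM ;
GO
--Verify the backup without restoring it
RESTORE VERIFYONLY
FROM DISK = N'F:\Backups\Chapter9.bak'
WITH CHECKSUM ;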

System Database Corruption

If system databases become corrupt, your instance can be left in an inaccessible state. The following sections discuss how to respond to corruption in the Master database and the Resource database.

Corruption of the Master Database

If the Master database becomes corrupted, it is possible that your instance will be unable to start. If this is the case, then you need to rebuild the system databases and then restore the latest copies from backups. Chapter 12 discusses strategies for database backups in more detail, but this highlights why backing up your system databases is important. In the event that you need to rebuild your system databases, you will lose all instance-level information, such as Logins, SQL Server Agent jobs, Linked Servers, and so on, if you are not able to restore from a backup. Even knowledge of the user databases within the instance will be lost and you will need to reattach the databases.

In order to rebuild the system databases, you need to run setup. When you are rebuilding system databases using setup, the parameters described in Table 9-3 are available.
Table 9-3

System Database Rebuild Parameters

/ACTION: Specifies Rebuilddatabase for the action parameter.

/INSTANCENAME: Specifies the name of the instance that contains the corrupt system database.

/Q: Stands for quiet. Use this to run setup without any user interaction.

/SQLCOLLATION: An optional parameter that you can use to specify a collation for the instance. If you omit it, the collation of the Windows OS is used.

/SAPWD: If your instance uses mixed-mode authentication, use this parameter to specify the password for the SA account.

/SQLSYSADMINACCOUNTS: Use this parameter to specify which accounts should be made sysadmins of the instance.

The PowerShell command in Listing 9-5 rebuilds the system databases of the PROSQLADMIN instance.
.\setup.exe /ACTION=rebuilddatabase /INSTANCENAME=PROSQLADMIN /SQLSYSADMINACCOUNTS=SQLAdministrator
Listing 9-5

Rebuilding System Databases

As mentioned, when this action is complete, we would ideally restore the latest copy of the Master database from a backup. Since we do not have one, we need to reattach our Chapter9 database in order to continue. Additionally, the detail of the corrupt page within the suspect_pages table will also be lost. Attempting to read the CorruptTable table in the Chapter9 database causes this data to be repopulated, however. The script in Listing 9-6 reattaches the Chapter9 database. You should change the file paths to match your own configuration before you run the script.
CREATE DATABASE Chapter9 ON
( FILENAME = N'F:\MSSQL\DATA\Chapter9.mdf' ),
( FILENAME = N'F:\MSSQL\DATA\Chapter9_log.ldf' )
 FOR ATTACH ;
Listing 9-6

Reattaching a Database

Corruption of the Resource Database or Binaries

It is possible for the instance itself to become corrupt. This can include corrupt Registry keys or the Resource database becoming corrupt. If this happens, then you can use the repair utility that ships with the SQL Server installation media. To invoke this tool, select Repair from the Maintenance tab of the SQL Server Installation Center.

After the wizard has run the appropriate rule checks, you are presented with the Select Instance page, as illustrated in Figure 9-4.
Figure 9-4

The Select Instance page

After you select the instance that needs to be repaired, the following page of the wizard runs an additional rules check to ensure that the required features can be repaired. Finally, on the Ready To Repair page, you see a summary of the actions that are to be performed. After choosing to repair, you see the repair progress report. Once the repair completes, a Summary page displays, which provides you with the status of each operation that was performed and also a link to a log file that you may wish to review if you need to perform troubleshooting.

As an alternative to using SQL Server Installation Center, you can achieve the same rebuild from the command line. This is useful if your instance is running on Windows Server Core. When you are repairing an instance from the command line, the parameters available to you are those listed in Table 9-4. Because the Master database is not being rebuilt when you are repairing an instance, you do not need to specify a collation or Administrator details.
Table 9-4

Instance Repair Parameters

/ACTION: Specifies Repair for the action parameter.

/INSTANCENAME: Specifies the name of the instance to repair.

/Q: Stands for quiet. Use this to run setup without any user interaction.

/ENU: An optional parameter that you can use on a localized operating system to specify that the English version of SQL Server should be used.

/FEATURES: An optional parameter you can use to specify a list of components to repair.

/HIDECONSOLE: An optional parameter that causes the console window to be suppressed.

The PowerShell command in Listing 9-7 repairs the PROSQLADMIN instance. This script also works for instances hosted on Windows Server Core.
.\setup.exe /ACTION=repair /INSTANCENAME=PROSQLADMIN /q
Listing 9-7

Repairing an Instance

DBCC CHECKDB

DBCC CHECKDB is a utility that can be used both to discover corruption and to fix the errors. When you run DBCC CHECKDB, by default it creates a database snapshot and runs the consistency checks against this snapshot. This provides a transactionally consistent point from which the checks can occur, while at the same time reducing contention in the database. It can check multiple objects in parallel to improve performance, but this depends on the number of cores that are available and the MAXDOP setting of the instance.

Checking for Errors

When you run DBCC CHECKDB for the purpose of discovering corruption only, you can specify the arguments detailed in Table 9-5.
Table 9-5

DBCC CHECKDB Arguments

NOINDEX: Specifies that integrity checks should be performed on heap and clustered index structures but not on nonclustered indexes.

EXTENDED_LOGICAL_CHECKS: Forces logical consistency checks of XML indexes, indexed views, and spatial indexes to be performed.

NO_INFOMSGS: Prevents informational messages from being returned in the results. This can reduce noise when you are searching for an issue, since only errors and warnings with a severity level greater than 10 are returned.

TABLOCK: DBCC CHECKDB creates a database snapshot and runs its consistency checks against this structure to avoid taking out locks in the database, which would cause contention. Specifying this option changes that behavior so that instead of creating a snapshot, SQL Server takes out a temporary exclusive lock on the database, followed by exclusive locks on the structures that it is checking. In the event of high write load, this can reduce the time it takes to run DBCC CHECKDB, but at the expense of contention with other processes that may be running. It also causes the system table metadata validation and Service Broker validation to be skipped.

ESTIMATEONLY: When this argument is specified, no checks are performed. The only thing that happens is that the space required in TempDB to perform the checks is calculated, based on the other arguments specified.

PHYSICAL_ONLY: When this argument is used, DBCC CHECKDB is limited to performing allocation consistency checks on the database, consistency checks on system catalogs, and validation of each page of every table within the database. This option cannot be used in conjunction with DATA_PURITY.

DATA_PURITY: Specifies that column integrity checks are carried out, such as ensuring that values are within their data type boundaries. For use only with databases that have been upgraded from SQL Server 2000 or below. For any newer databases, or SQL Server 2000 databases that have already been scanned with DATA_PURITY, the checks happen by default.

ALL_ERRORMSGS: For backward compatibility only. Has no effect on SQL Server 2022 databases.

DBCC CHECKDB is a very intensive process that can consume significant CPU and I/O resources. Therefore, it is advisable to run it during a maintenance window to avoid performance issues for applications. The Database Engine automatically decides how many CPU cores to assign to DBCC CHECKDB, based on the instance-level setting for MAXDOP and the amount of throughput on the server when the process begins. If you expect load to increase during the window in which DBCC CHECKDB will be running, however, then you can throttle the process to a single core by turning on Trace Flag 2528. This flag should be used with caution, because it causes DBCC CHECKDB to take much longer to complete. If a snapshot is not generated, either because you have specified TABLOCK or because there was not enough space on disk to generate a snapshot, then it also causes each table to be locked for a much longer period.
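A minimal sketch of using the trace flag around a check might look like this; enabling it with -1 applies it globally, and it is turned off again once the check completes:
--Disable parallel checking by enabling Trace Flag 2528 globally
DBCC TRACEON (2528, -1) ;
GO
DBCC CHECKDB ('Chapter9') ;
GO
--Re-enable parallel checking
DBCC TRACEOFF (2528, -1) ;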

The sample in Listing 9-8 does not perform any checks but calculates the amount of space required in TempDB in order for DBCC CHECKDB to run successfully against the Chapter9 database.
USE Chapter9
GO
DBCC CHECKDB WITH ESTIMATEONLY ;
Listing 9-8

Checking TempDB Space Required for DBCC CHECKDB

Because our Chapter9 database is tiny, we require less than half a megabyte of space in TempDB. This is reflected in the results, shown in Figure 9-5.
Figure 9-5

TempDB space required for DBCC CHECKDB results

The script in Listing 9-9 uses DBCC CHECKDB to perform consistency checks across the entire Chapter9 database.
USE Chapter9
GO
DBCC CHECKDB ;
Listing 9-9

Running DBCC CHECKDB

Figure 9-6 displays a fragment of the results of running this command. As you can see, the issue with the corrupt page in the CorruptTable table has been identified.
Figure 9-6

DBCC CHECKDB identifies corrupt page

In real life, unless you are troubleshooting a specific error, you are unlikely to be running DBCC CHECKDB manually. It is normally scheduled to run with SQL Server Agent or a maintenance plan. So how do you know when it encounters an error? Simply put, the SQL Server Agent job step fails. Figure 9-7 shows the error message being displayed in the history of the failed job. The output from DBCC CHECKDB is also written to the SQL Server error log, regardless of whether it was invoked manually or through a SQL Server Agent job.
Figure 9-7

Errors in job history

Because DBCC CHECKDB finding errors causes the job to fail, you can set up a notification so that a DBA receives an alert. Assuming that Database Mail is configured on the server, you can create a new operator that receives e-mails by selecting New Operator from the context menu of the Operators folder under the SQL Server Agent folder in SQL Server Management Studio, as illustrated in Figure 9-8.
Figure 9-8

Create a new operator

Once you have created the operator, you are able to specify that operator in the Notifications tab of the Job Properties page of the SQL Server Agent job. This is illustrated in Figure 9-9.
Figure 9-9

Configure notification
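If you prefer to script the alerting configuration, a minimal sketch is shown below; the operator name, e-mail address, and job name are hypothetical and should be replaced with your own:
USE msdb
GO
--Create an operator that receives e-mail alerts (hypothetical details)
EXEC dbo.sp_add_operator
    @name = N'DBATeam',
    @enabled = 1,
    @email_address = N'dba-team@example.com' ;
GO
--Configure an existing job to e-mail the operator when it fails
--(assumes a job named 'DBCC CHECKDB - Nightly' already exists)
EXEC dbo.sp_update_job
    @job_name = N'DBCC CHECKDB - Nightly',
    @notify_level_email = 2, --2 = notify on failure
    @notify_email_operator_name = N'DBATeam' ;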

SQL Server Agent jobs are discussed fully in Chapter 24.

Fixing Errors

When we use DBCC CHECKDB to repair corruption in a database, we need to specify an additional argument that determines the repair level to use. The options available are REPAIR_REBUILD and REPAIR_ALLOW_DATA_LOSS. REPAIR_REBUILD is, of course, the preferred option, and it can be used to resolve issues that will not cause data loss, such as bad page pointers or corruption inside a nonclustered index. REPAIR_ALLOW_DATA_LOSS attempts to fix all errors it encounters, but as its name suggests, this may involve data being lost.

Caution

You should only use REPAIR_ALLOW_DATA_LOSS if no backup is available from which to restore the data.

Always run DBCC CHECKDB without a repair option first, because its output tells you the minimum repair level that you can use to resolve the errors. If we look again at the output of the run against the Chapter9 database, we can see that the end of the output advises the most appropriate repair option to use. This is illustrated in Figure 9-10.
Figure 9-10

Suggested repair option

In our case, we are informed that we need to use the REPAIR_ALLOW_DATA_LOSS option. If we try to use the REPAIR_REBUILD option, we receive the following message from DBCC CHECKDB:

CHECKDB found 0 allocation errors and 4 consistency errors in database 'chapter9'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (chapter9, repair_rebuild).
Since we do not have a backup of the Chapter9 database, this is our only chance of fixing the corruption. In order to use the repair options, we also have to put our database in SINGLE_USER mode. The script in Listing 9-10 places the Chapter9 database in SINGLE_USER mode, runs the repair, and then alters the database again to allow multiple connections.
ALTER DATABASE Chapter9 SET SINGLE_USER ;
GO
DBCC CHECKDB (Chapter9, REPAIR_ALLOW_DATA_LOSS) ;
GO
ALTER DATABASE Chapter9 SET MULTI_USER ;
GO
Listing 9-10

Repairing Corruption with DBCC CHECKDB

The partial results in Figure 9-11 show that the errors in CorruptTable have been fixed. It also shows that the page has been deallocated. This means that we have lost all data on the page.
Figure 9-11

Results of repairing corruption with DBCC CHECKDB

If we query the msdb.dbo.suspect_pages table again using the same query as demonstrated in Listing 9-4, we see that the Event column has been updated to state that the page has been deallocated. We can also see that the error_count column has been incremented every time we accessed the page, through either SELECT statements or DBCC CHECKDB. These results are displayed in Figure 9-12.
Figure 9-12

Suspect_pages table, following repair

Emergency Mode

If your database files are damaged to the extent that your database is inaccessible and unrecoverable, even by using the REPAIR_ALLOW_DATA_LOSS option, and you do not have usable backups, then your last resort is to run DBCC CHECKDB in emergency mode using the REPAIR_ALLOW_DATA_LOSS option. Remember, emergency mode is a last resort option for repairing your databases, and if you cannot access them through this mode, you will not be able to access them via any other means. When you perform this action with the database in emergency mode, DBCC CHECKDB treats pages that are inaccessible due to corruption as if they do not have errors in an attempt to recover data.

This operation can also bring back online databases that are inaccessible due to log corruption. This is because it attempts to force the transaction log to recover, even if it encounters errors. If this fails, it rebuilds the transaction log. Of course, this may lead to transactional inconsistencies, but as mentioned, this is an option of last resort.

As an example, we will delete the transaction log file for the Chapter9 database in the operating system. You can find the operating system location of the transaction log file by running the query in Listing 9-11.
SELECT physical_name
FROM sys.master_files
WHERE database_id = DB_ID('Chapter9')
    AND type_desc = 'Log' ;
Listing 9-11

Finding the Transaction Log Path

Because data and log files are locked by the SQL Server process, we first need to stop the instance. After starting the instance again, we can see that our Chapter9 database has been marked as Recovery Pending, as shown in Figure 9-13.
Figure 9-13

Database in Recovery Pending

Since we have no backup available for the Chapter9 database, the only option that we have is to use DBCC CHECKDB in emergency mode. The script in Listing 9-12 puts the Chapter9 database in emergency mode and then uses DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option to fix the error.
ALTER DATABASE Chapter9 SET EMERGENCY ;
GO
ALTER DATABASE Chapter9 SET SINGLE_USER ;
GO
DBCC CHECKDB ('Chapter9', REPAIR_ALLOW_DATA_LOSS) ;
GO
ALTER DATABASE Chapter9 SET MULTI_USER ;
GO
Listing 9-12

DBCC CHECKDB in Emergency Mode

The partial results, displayed in Figure 9-14, show that SQL Server was able to bring the database online by rebuilding the transaction log. However, they also show that this means transactional consistency has been lost and the restore chain has been broken. Because we have lost transactional consistency, we should now run DBCC CHECKCONSTRAINTS to find errors in foreign key and check constraints. DBCC CHECKCONSTRAINTS is covered later in this chapter.
Figure 9-14

Results of DBCC CHECKDB in emergency mode

Note

If running DBCC CHECKDB in emergency mode fails, then there is no other way that the database can be repaired.

Other DBCC Commands for Corruption

A number of other DBCC commands perform a subset of the work carried out by DBCC CHECKDB. These are discussed in the following sections.

DBCC CHECKCATALOG

In SQL Server, the system catalog is a collection of metadata that describes the database and the data held within it. DBCC CHECKCATALOG performs consistency checks on this catalog. This command is run as part of DBCC CHECKDB, but it can also be run in its own right, in which case it accepts the same arguments as DBCC CHECKDB, with the exception of PHYSICAL_ONLY and DATA_PURITY, which are not available for this command.
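For example, a standalone check of the Chapter9 database's system catalog might look like this:
DBCC CHECKCATALOG ('Chapter9') WITH NO_INFOMSGS ;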

DBCC CHECKALLOC

DBCC CHECKALLOC performs consistency checks against the disk allocation structures within a database. It is run as part of DBCC CHECKDB but can also be run in its own right, in which case it accepts many of the same arguments as DBCC CHECKDB, with the exception of PHYSICAL_ONLY, DATA_PURITY, and REPAIR_REBUILD, which are not available for this command. The output is broken down by table, index, and partition.
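For example, the allocation structures of the Chapter9 database could be checked on their own as follows:
DBCC CHECKALLOC ('Chapter9') WITH NO_INFOMSGS ;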

DBCC CHECKTABLE

DBCC CHECKTABLE is run against every table and indexed view in a database as part of DBCC CHECKDB. However, it can also be run as a separate command in its own right against a specific table and that table's indexes. It performs consistency checks against that specific table, and if any indexed views reference the table, it also performs cross-table consistency checks. It accepts the same arguments as DBCC CHECKDB, but with it, you also need to specify the name or ID of the table that you want to check.
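For example, the CorruptTable table created earlier in this chapter could be checked on its own like this:
USE Chapter9
GO
DBCC CHECKTABLE ('dbo.CorruptTable') WITH NO_INFOMSGS ;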

Caution

I have witnessed people split their tables into two buckets and replace DBCC CHECKDB with a run of DBCC CHECKTABLE against half of their tables on alternate nights. This not only leaves gaps in what is being checked, but a new database snapshot is also generated for every table that is checked, as opposed to one snapshot being generated for all checks to be performed. This can lead to longer overall run times.

DBCC CHECKFILEGROUP

DBCC CHECKFILEGROUP performs consistency checks on the system catalog, the allocation structures, tables, and indexed views within a specified filegroup. There are some limitations to this, however, when a table has indexes that are stored on a different filegroup. In this scenario, the indexes are not checked for consistency. This still applies if it is the indexes that are stored on the filegroup that you are checking, but the corresponding base table is on a different filegroup.

If you have a partitioned table that is stored on multiple filegroups, DBCC CHECKFILEGROUP only checks the consistency of the partition(s) stored on the filegroup being checked. The arguments for DBCC CHECKFILEGROUP are the same as those for DBCC CHECKDB, with the exceptions that DATA_PURITY is not valid and you cannot specify any repair options. You also need to specify the filegroup name or ID.
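For example, the PRIMARY filegroup of the Chapter9 database could be checked as follows:
USE Chapter9
GO
DBCC CHECKFILEGROUP ('PRIMARY') WITH NO_INFOMSGS ;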

DBCC CHECKIDENT

DBCC CHECKIDENT scans all rows within a specified table to find the highest value in the IDENTITY column. It then checks to ensure that the next IDENTITY value, which is stored in a table’s metadata, is higher than the highest value in the IDENTITY column of the table. DBCC CHECKIDENT accepts the arguments detailed in Table 9-6.
Table 9-6

DBCC CHECKIDENT Arguments

Table Name: The name of the table to be checked.

NORESEED: Returns the maximum value of the IDENTITY column and the current IDENTITY value, but does not reseed the column, even if required.

RESEED: Reseeds the current IDENTITY value to that of the maximum IDENTITY value in the table.

New Reseed Value: Used with RESEED; specifies a seed for the IDENTITY value. This should be used with caution, since setting the IDENTITY value lower than the maximum value in the table can cause errors to be generated if there is a primary key or unique constraint on the IDENTITY column.

WITH NO_INFOMSGS: Causes informational messages to be suppressed.

We could check the IDENTITY value against the maximum IDENTITY value in our CorruptTable table by using the command in Listing 9-13.
DBCC CHECKIDENT('CorruptTable',NORESEED) ;
Listing 9-13

DBCC CHECKIDENT

The results, displayed in Figure 9-15, show that the maximum value in the IDENTITY column and the current IDENTITY value are both 10201, meaning that there is not currently an issue with the IDENTITY value in our table.
Figure 9-15

DBCC CHECKIDENT results
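If the current IDENTITY value had fallen out of step with the data, you could correct it with the RESEED argument; a minimal sketch, in which the explicit seed value is illustrative:
--Reseed using the maximum value currently in the IDENTITY column
DBCC CHECKIDENT ('dbo.CorruptTable', RESEED) ;
--Or reseed to an explicit value (use with caution)
DBCC CHECKIDENT ('dbo.CorruptTable', RESEED, 10201) ;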

DBCC CHECKCONSTRAINTS

DBCC CHECKCONSTRAINTS can check the integrity of a specific foreign key or check constraint within a table, check all constraints on a single table, or check all constraints on all tables of a database. DBCC CHECKCONSTRAINTS accepts the arguments detailed in Table 9-7.
Table 9-7

DBCC CHECKCONSTRAINTS Arguments

Table or Constraint: Specifies either the name or ID of the constraint you wish to check, or the name or ID of a table to check all enabled constraints on that table. Omitting this argument causes all enabled constraints on all tables within the database to be checked.

ALL_CONSTRAINTS: If DBCC CHECKCONSTRAINTS is being run against an entire table or entire database, then this option forces disabled constraints to be checked as well as enabled ones.

ALL_ERRORMSGS: By default, if DBCC CHECKCONSTRAINTS finds rows that violate a constraint, it returns the first 200 of these rows. Specifying ALL_ERRORMSGS causes all rows violating the constraint to be returned, even if this number exceeds 200.

NO_INFOMSGS: Causes informational messages to be suppressed.

The script in Listing 9-14 creates a table called BadConstraint and inserts a single row. It then creates a check constraint on the table, with the NOCHECK option specified, which allows us to create a constraint that is immediately violated by the existing row that we have already added. Finally, we run DBCC CHECKCONSTRAINTS against the table.
USE Chapter9
GO
--Create the BadConstraint table
CREATE TABLE dbo.BadConstraint
(
ID        INT PRIMARY KEY
) ;
--Insert a negative value into the BadConstraint table
INSERT INTO dbo.BadConstraint
VALUES(-1) ;
--Create a CHECK constraint, which enforces positive values in the ID column
ALTER TABLE dbo.BadConstraint WITH NOCHECK ADD CONSTRAINT chkBadConstraint CHECK (ID > 0) ;
GO
--Run DBCC CHECKCONSTRAINTS against the table
DBCC CHECKCONSTRAINTS('dbo.BadConstraint') ;
Listing 9-14

DBCC CHECKCONSTRAINTS

The results of running this script are shown in Figure 9-16. You can see that DBCC CHECKCONSTRAINTS has returned the details of the row that breaches the constraint.
Figure 9-16

DBCC CHECKCONSTRAINTS results

Tip

After running DBCC CHECKDB, or other DBCC commands to repair corruption, it is good practice to run DBCC CHECKCONSTRAINTS. This is because the repair options of the DBCC commands do not take constraint integrity into account.

Even if DBCC CHECKCONSTRAINTS does not find any bad data, it still does not mark the constraint as trusted. You must do this manually. The script in Listing 9-15 first runs a query to see if the constraint is trusted and then manually marks it as trusted.
SELECT
    is_not_trusted
FROM sys.check_constraints
WHERE name = 'chkBadConstraint' ;
ALTER TABLE dbo.BadConstraint WITH CHECK CHECK CONSTRAINT chkBadConstraint ;
Listing 9-15

Marking a Constraint as Trusted

Consistency Checks on VLDBs

If you have VLDBs (very large databases), then it may be difficult to find a maintenance window long enough to run DBCC CHECKDB, and running it while users or ETL processes are connected is likely to cause performance issues. You may also encounter a similar issue if you have a large estate of smaller databases that use common infrastructure, such as a SAN or a private cloud. Ensuring that your databases are consistent should be a priority near the top of your agenda, however, so you should try to find a strategy that achieves both your maintenance and performance goals. The following sections discuss strategies that you may choose to adopt to achieve this balance.

DBCC CHECKDB with PHYSICAL_ONLY

One strategy that you can adopt is to run DBCC CHECKDB regularly, ideally nightly, using the PHYSICAL_ONLY option, and then run a complete check on a periodic but less frequent basis, ideally weekly. When you run DBCC CHECKDB with the PHYSICAL_ONLY option, consistency checks are carried out on the system catalogs and allocation structures, and each page of every table is scanned and validated. The net result of this is that corruption caused by I/O errors is trapped, but other issues, such as logical consistency errors, are not identified. This is why it is important to still run a full check weekly.
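A minimal sketch of the two checks under this strategy:
--Nightly: physical checks only (I/O corruption, bad checksums)
DBCC CHECKDB ('Chapter9') WITH PHYSICAL_ONLY ;
--Weekly: full logical and physical checks
DBCC CHECKDB ('Chapter9') WITH NO_INFOMSGS ;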

Backing Up WITH CHECKSUM and DBCC CHECKALLOC

If all of your databases receive a full backup every night and all are configured with a PAGE_VERIFY option of CHECKSUM, then an alternative to the approach mentioned in the previous section is to add the WITH CHECKSUM option to your full backups, followed by a DBCC CHECKALLOC, in place of the nightly DBCC CHECKDB with PHYSICAL_ONLY. The DBCC CHECKALLOC command, which is actually a subset of the DBCC CHECKDB command, validates the allocation structures within the database. When the full backups are taken WITH CHECKSUM, this fulfills the requirement to scan and verify each page of every table for I/O errors. Just like running DBCC CHECKDB with the PHYSICAL_ONLY option specified, this identifies any corruption caused by I/O operations and identifies any bad checksums. Any page errors that occurred in memory, however, are not identified. This means that, like the PHYSICAL_ONLY strategy, you still require a full run of DBCC CHECKDB once a week to trap logical consistency errors or corruption that occurred in memory. This option is very useful if you have an environment with common infrastructure and you are performing full nightly backups of all databases, since you will reduce the overall amount of I/O on a nightly basis. This is at the expense of the duration of your backup window, however, and it increases the resources used during this time.
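A minimal sketch of the nightly routine under this strategy, with a hypothetical backup path:
--Nightly: full backup with checksum validation of every page
BACKUP DATABASE Chapter9
TO DISK = N'F:\Backups\Chapter9.bak' --Hypothetical path
WITH CHECKSUM ;
GO
--Nightly: validate the allocation structures
DBCC CHECKALLOC ('Chapter9') WITH NO_INFOMSGS ;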

Splitting the Workload

Another strategy for VLDBs may be to split the load of DBCC CHECKDB over multiple nights. For example, if your VLDB has multiple filegroups, then you could run DBCC CHECKFILEGROUP against half of the filegroups on Monday, Wednesday, and Friday, and against the other half on Tuesday, Thursday, and Saturday. You could reserve Sunday for a full run of DBCC CHECKDB. A full run of DBCC CHECKDB is still advised on a weekly basis, since DBCC CHECKFILEGROUP does not perform certain checks, such as validating Service Broker objects.

If your issue is common infrastructure, as opposed to VLDBs, then you can adapt the concept just described so that you run DBCC CHECKDB on a subset of databases on alternate nights. This can be a little complex, since the approach here, in order to avoid swamping the SAN, is to segregate the databases intelligently, based on size, as opposed to a random 50/50 split. This often involves centralizing the maintenance routines on a central server, known as a Central Management Server (CMS). A CMS is a SQL Server instance that sits in the middle of your SQL Server estate and provides centralized monitoring and maintenance. Often, for intelligent maintenance such as I am describing here, the CMS controls maintenance routines by scheduling PowerShell scripts, which interact with metadata stored on the CMS to decide which maintenance jobs to run.

Offloading to a Secondary Server

The final strategy for reducing the load that DBCC CHECKDB places on production systems is to offload the work to a secondary server. If you take this approach, you take a full backup of the VLDB, restore it on a secondary server, and then run DBCC CHECKDB on the secondary server. This approach has several disadvantages, however. First, and most obviously, you have the expense of procuring and maintaining a secondary server just for the purpose of running consistency checks. This makes it the most expensive of the options discussed (unless, of course, you reuse an existing server, such as a UAT server). Also, if you find corruption, you will not know whether the corruption was generated on the production system and copied over in the backup, or whether it was actually generated on the secondary server. This means that if errors are found, you still have to run DBCC CHECKDB on the production server.
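A minimal sketch of the routine on the secondary server is shown below; the backup share, file paths, and logical file names are hypothetical and should be replaced with your own:
--Restore the latest full backup on the secondary server
RESTORE DATABASE Chapter9
FROM DISK = N'\\BackupShare\Chapter9.bak'
WITH MOVE N'Chapter9' TO N'D:\MSSQL\DATA\Chapter9.mdf',
     MOVE N'Chapter9_log' TO N'D:\MSSQL\DATA\Chapter9_log.ldf',
     REPLACE ;
GO
--Run the full consistency check against the restored copy
DBCC CHECKDB ('Chapter9') WITH NO_INFOMSGS ;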

Summary

Many types of corruption can occur in SQL Server. These include pages that have been damaged at the file system level, logical consistency errors, and corrupt pages that have a bad checksum. Pages can also be damaged in memory, which would not be identified through a checksum.

Three page verification options can be selected for a database. The NONE option leaves you totally exposed to issues and is regarded as bad practice. The TORN_PAGE_DETECTION option is deprecated and should not be used, since it only checks the first 2 bytes of every 512-byte sector. The final option is CHECKSUM. This is the default option and should always be selected.

Pages that are damaged are recorded in a table called dbo.suspect_pages in the MSDB database. Here, the error count is incremented every time the error is encountered, and the event type of the page is updated to indicate that it has been repaired or restored, as appropriate.

If a system database, especially Master, becomes corrupt, you may be unable to start your instance. If this is the case, then you can rectify the issue by running setup with the ACTION parameter set to Rebuilddatabases. Alternatively, if the instance itself has become corrupt, then you can run setup with the ACTION parameter set to repair. This resolves issues such as corrupt Registry keys or corruption to the Resource database.

DBCC CHECKDB is a command that you should run on a regular basis to check for corruption. If you find corruption, then you can also use this command to fix the issue. There are two repair modes that are available, depending on the nature of the corruption: REPAIR_REBUILD or REPAIR_ALLOW_DATA_LOSS. You should only use the REPAIR_ALLOW_DATA_LOSS option as a last resort in the event that no backup is available from which to restore the database or corrupt pages. This is because the REPAIR_ALLOW_DATA_LOSS option is liable to deallocate the corrupt pages, causing all data on these pages to be lost.

Other DBCC commands can be used to perform a subset of DBCC CHECKDB functionality. These include DBCC CHECKTABLE, which can validate the integrity of a specific table, and DBCC CHECKCONSTRAINTS, which you can use to verify the integrity of foreign keys and check constraints, especially after you run DBCC CHECKDB with a repair option.

For VLDBs or estates that share infrastructure, running DBCC CHECKDB can be an issue because of performance impact and resource utilization. You can mitigate this by adopting a strategy that offers a trade-off between maintenance and performance goals. These strategies include splitting the workload, offloading the workload to a secondary server, or running only subsets of the checking functionality on a nightly basis and then performing a full check on a weekly basis.
