Chapter 13. Highly Available Upgrades

Upgrades present a challenge that all organizations face at some point. Upgrades fall into different categories, such as upgrading versions of Microsoft SQL Server, server consolidation, applying a SQL Server or Microsoft Windows service pack or hotfix, or applying third-party patches for the software you run in your environment. The challenge with upgrades is not so much the upgrade process itself—that is fairly mechanical—it is ensuring that you minimize downtime and the impact on users. If you have specific service level agreements (SLAs) in place, you understand the complexities of maintaining a highly available environment when you need to apply changes to your production environment. This chapter guides you through the process of how to plan and execute highly available upgrades and the considerations you need to take into account.

Important

Regardless of any discussion that follows, please keep in mind that only qualified personnel should perform upgrades of any type. Leaving your upgrade to someone who is unfamiliar with your environment could prove costly.

General Upgrade, Consolidation, and Migration Tips

An upgrade or migration is all about having a solid plan, testing it, and developing contingency plans to deal with common problems or possibilities. From a high availability perspective, at some point your upgrade process is likely to incur some downtime. Remember that there are application-specific patches or upgrades as well as ones for operating systems and that the availability of the operating system will absolutely affect the availability of the applications running on the server. Whether you are upgrading for security reasons or to fix broken functionality, as you try to minimize the impact take into account the following rules of thumb that apply to all the concepts presented in this chapter:

  • Everything affects everything. Think about your body. When a doctor prescribes a new medicine for you to take, it is absorbed by your blood and might solve a problem, but it might also have some undesirable side effects that affect you daily. Or it might not have any negative interactions with your internal systems, and you will be just fine.

    From a technology standpoint, if you apply a Windows service pack, assume it will affect SQL Server in some way. There can be no 100 percent guarantee that whatever you do is transparent, and it might even subtly change the behaviors of your application or operating system. The more you have running on one server, the greater the chance of possible interactions. Even doing something as seemingly innocuous as upgrading the BIOS on your system or upgrading to the latest driver or firmware for a specific piece of hardware (such as a SAN)—seemingly unrelated to your application software—can affect your software behaviors or break them completely.

    You should consult the hardware or software manufacturer for any and all available information on their update and what it might do to your systems. However, manufacturers do not have the ability to test all permutations of the effects of their patches (and probably do not have your application code to do so even if they wanted to), so it is your responsibility to ensure that upgrades behave in your environment.

  • Try not to combine multiple updates, especially large ones. For example, do not upgrade to a higher bandwidth network card and then install various driver updates to existing components in one maintenance window. It is even more important not to perform multiple updates if they were not tested both alone and together. Troubleshooting after simultaneous upgrades is infinitely more difficult than after just one; it is more difficult to determine what is causing the problem because you applied different changes.

    Another example is upgrading the Windows service pack and the SQL Server service pack at the same time. You probably save some downtime during the upgrades, but if you experience some form of regression, you will more than lose that time troubleshooting the problem.

  • Test any patches, driver updates, hotfixes, service packs, and so on prior to rolling them out. If availability is one of your company’s concerns, there is no better time than the present to start testing. The best way to test, should you want to apply a hotfix, is to do it on a dedicated testing or staging environment to determine how it will impact your applications, the server itself, and so on. You do not want to cause a potentially larger availability outage by having to roll back your installation or reinstall from scratch should you encounter a worst-case scenario. You need additional hardware to do these tests, but think about the business case. Is it better to spend some money up front to ensure minimal downtime or to pay an expensive price for being down?

  • Test the patches under your production load. Do not test on a server with no or minimal usage. Many companies are great about testing the functionality of what they are upgrading to but neglect to test it under stress. The testing environment must have some method of simulating load on the servers so that you will know there are no issues waiting for you around a "performance" bend in the upcoming road.

    This is especially true with SQL Server because of its self-tuning nature. This internal tuning is incredibly complex, but it can also be somewhat fragile: the optimizer picks a particular plan based on statistical information and the underlying cost calculations. You might be right on the borderline for a particular plan, and the smallest possible change could give you a completely different plan for that subset of queries. The performance difference is probably not noticeable unless this is a plan you happen to use very often and it is also the biggest load point (bottleneck) in your application at the time. The solution might be as simple as a two-word hint for that query to slide things back to the original plan. However, finding the problem during testing allows you to implement this hint during your upgrade instead of finding it during your peak load the next day and being unable to do anything about it quickly enough to prevent a negative user experience.
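
    Such a fix can be as small as a single hint. The following is a hypothetical sketch, not a prescription; the table, index, and date range are illustrative only. One possibility is an index hint that forces the optimizer back to the index it used before the upgrade nudged a borderline plan the wrong way:

        -- Hypothetical example: pin the pre-upgrade index choice.
        -- Table and index names are placeholders.
        SELECT CustomerID, OrderDate, TotalDue
        FROM dbo.Orders WITH (INDEX (IX_Orders_OrderDate))
        WHERE OrderDate >= '20030101'
          AND OrderDate < '20030201'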

  • How do you build your test platform? Your test platforms will prove your deployment. If they are not built in the same manner as the production servers, even a small delta could skew the results.

  • Always read the documentation that comes with the upgrade or patch carefully. It contains information about the fixes and behaviors, as well as how to install the upgrade or patch. Too often support calls are generated because people skip this step and just click on Setup.exe or Install.bat (or whatever the installation mechanism is) without thinking. If you have any questions or concerns that are not addressed in the documentation, especially in relation to your configurations, call a support professional. This is your production environment, not some system that no one is using (if it is, you might want to make it your test server). Although some might see it as a wasted support call, it is better to ensure that the process will go smoothly and properly than to skip that call but spend hours on the phone later, after an unsuccessful installation. Also, keep an eye on newsgroups, magazine articles, and other resources, because your peers might point you to helpful information they have learned through their implementation experiences.

  • Notify anyone who will be affected by the upgrade and give them plenty of lead time. The point is to inconvenience others as little as possible. Notifying them five minutes beforehand or killing their session or connection without any warning will not create goodwill. Remember to notify them after the system is available for use again. Give them a solid contact point to notify if they do experience difficulties. You might have been up all night doing the upgrade, and somebody else could try to fix the problem the next day while you are catching up on your sleep without knowing what you were working on or where you left off.

  • Do not allow user connections to the server while you are applying the upgrade or fix. The upgrade instructions should include any specific implications and details, but it is best to ensure that nothing and no one can interfere with the installation process. For example, a Microsoft SQL Server 2000 service pack installation puts the server into single-user mode; if a user grabs that single connection at the wrong moment, the installation will fail.
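
    Before kicking off the installation, verify that no one is still connected. A quick check on SQL Server 2000, assuming you run it from a sysadmin connection:

        -- List user connections (system processes use the low spid range)
        SELECT spid, loginame, hostname, program_name,
               db_name(dbid) AS database_name
        FROM master.dbo.sysprocesses
        WHERE spid > 50

        -- After your notification window has passed, terminate stragglers:
        -- KILL 57   -- replace 57 with the offending spid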

  • Prepare a detailed implementation plan and ensure your disaster recovery plans are up to date prior to the upgrade or update. You do not want surprises in the middle of your implementation. As noted earlier in this book, your plan and the way to back out of it will be crucial for your success. Unfortunately, many people still attempt to apply an upgrade or fix to their systems without thinking of the consequences. When availability is one of your main goals, you are asking for trouble if you are without a workable plan.

  • Consider the time frame within which you will execute your plan. If you know your business does its monthly sales forecasting the last week of the month and users need access to the sales database at that time, do not perform the upgrade that week, or maybe even not the one before or after. That would leave you, in a four-week month, a one-week window to execute the plan at a time when it would have the least impact on the business and give you the most time to recover should something catastrophic happen.

  • Make backups of all of your user and system databases prior to performing an update or upgrade. This is one of the most important things you should do. It ensures (assuming the backups are good) that in a worst-case scenario you can get back to the point in time prior to the bad update or upgrade. If the process goes well, make another set of backups afterward so you have a new baseline of known-good backups; that way, if disaster strikes later, you can restore to the post-upgrade state without having to redo the upgrade itself.
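
    A minimal sketch of the pre-upgrade backups; the paths and the user database name are illustrative:

        BACKUP DATABASE master TO DISK = 'D:\Backup\master_pre.bak' WITH INIT
        BACKUP DATABASE msdb   TO DISK = 'D:\Backup\msdb_pre.bak'   WITH INIT
        BACKUP DATABASE model  TO DISK = 'D:\Backup\model_pre.bak'  WITH INIT
        BACKUP DATABASE Sales  TO DISK = 'D:\Backup\Sales_pre.bak'  WITH INIT

    Repeat with new file names after a successful upgrade to establish the post-upgrade baseline.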

  • For SQL-based backups, you should consider your backup strategy carefully. You should also calculate the effect of both your backup strategy and your current checkpoint timers and how they affect your SQL Server log size. It might be worth a little extra work to quiesce or at least partially quiesce your database if you will be backing it up twice! Think of the time, disk, or tape savings in pure size and time for the backup. Then think also of the restore time savings if things do go wrong in spite of all your careful planning.

Note

Good planning cannot protect you from every problem, but it can give you a solid plan for dealing with most problems.

More Info

For more information on putting plans together and good background information in general for this chapter, consult Chapter 1, Chapter 2, Chapter 9, Chapter 10, and Chapter 12.

Upgrading, Consolidating, and Migrating to SQL Server 2000

Whether you are just upgrading from another version of SQL Server, consolidating your database environment, or migrating to SQL Server from another platform, you need to be concerned with some basics before you even begin the process. This section focuses primarily on the technical aspects, but it is crucial to understand the business reasons or drivers for undertaking such a project to ensure that the technical meets the business, and vice versa. Follow the guidelines in Chapter 1 and Chapter 2 for setting up a steering committee and so on. Any planning fits squarely within Microsoft Solutions Framework (MSF) and Microsoft Operations Framework (MOF) models.

Your business sponsors should be able to answer the following questions. The information will greatly help you in your technical planning.

  • What business value do you believe will be gained from your upgrade, consolidation, or migration effort?

  • If you are consolidating, what is the desired consolidation ratio—2:1, 5:1, 10:1, or what? Is it realistic?

  • What are the drivers for your effort? Why are you upgrading, consolidating, or migrating?

  • Have the IT and business stakeholders bought into and signed off on the effort? Is there adequate funding?

  • Is chargeback for resource usage a need or a consideration? Will the tools currently in place to measure chargeback still work?

  • How many users are expected to be supported concurrently by this solution? in the short term? in the long term?

  • What is the life expectancy of this solution (that is, how long is this solution, along with its systems, supposed to stay in production)?

  • How much will the data grow over time? What is projected versus actual growth (if known)?

  • What is acceptable performance from end-user, administrative, and management perspectives? Keep in mind that performance is defined differently—it could mean throughput, response time, and so on.

  • What is the current availability goal for each individual system? Each solution? Will the availability goal for the component or solution meet the new requirements?

  • How is maintenance going to be performed on this system? Will maintenance change due to consolidation?

  • How are backups going to be performed on the system and the data? How will this work affect performance?

  • What are the security requirements for both the application and the systems for the solution? Are they in line with corporate policies?

  • Are there security conflicts between systems and objects that might be consolidated together?

  • What is the cost of developing, implementing, and supporting? in the short term? in the long term?

  • What is the actual cost of downtime for each individual system or database installation? The entire solution? What is the current cost of administering and running the server?

  • What are the dependencies of the components in the solution? For example, are there external data feeds into SQL Server that might fail as the result of an availability problem or move in a system?

  • What technologies are currently used in the application, solution, or system, and what is desired? How will a change affect each component of your solution individually and as a whole?

  • What is the budget allocation for hardware? What is the available budget for the entire effort?

  • If this is an existing SQL Server, is the system already at or exceeding capacity, whether it is processor, memory, or disk?

  • What SLAs are in place and how will they be affected?

  • Where are the servers located? Are they currently in the same place? If not, how will changing location or geographic placement affect the entire group of users, performance, and the SLA?

  • What roles exist within the IT division? Do you have a person who is dedicated solely to capacity management?

  • Do you have effective change control processes in place with the existing systems or do you need new ones?

  • Do you have proper development, test, and staging environments?

  • Do you have a testing or quality assurance team? Are there dedicated DBAs? If so, how many?

  • What is the current ratio of SQL Servers and databases to DBAs?

  • Do you plan on eliminating or downsizing personnel as a result of this effort?

  • What are the different types of systems considered for upgrading, consolidating, or migrating—production, test, and development?

  • Will you employ new technologies, such as clustering for high availability? How will that affect your employees from a skills, cost, and training perspective?

  • Do you have specific SLAs to maintain with each "application" even after consolidating them onto a single system in a server consolidation effort?

  • Are there multiple systems with conflicting SLAs currently or being planned on the same machine?

  • Are certain systems required to be separated from others due to things like performance or SLAs?

  • What is the desired end date for the project? Are there any constraints that might come into play, such as those posed by hardware or service vendors?

  • Do you know if the systems are mission critical? If you are looking to combine certain target servers, how will that affect the other systems?

  • How much business, in terms of both actual transactions (business transactions, not database transactions) and revenue, does each system generate?

  • List any reference projects done to date that have influenced your interest.

  • State any particular issues or information that must be taken into account, such as the following:

    • Are there any corporate policies that might affect the effort?

    • Such things as "sole source" providers are illegal with some government agencies, so that might restrict consolidation contracts in some way.

    • Corporate security policies might inhibit or change the nature of an effort, such as server consolidation, by requiring physical isolation of systems and data from different departments, as in the case of Limited Liability Partnerships.

Phase 1: Envisioning

While the business side of the house is figuring out the nontechnical drivers for your effort, the administrators (DBAs and non-DBAs, too) must start to gather information for the plan on the technical side. This involves a great deal of information that you should have been gathering all along to establish baselines and make judgments. Ask the following sample questions and document the answers, as each one will influence your planning. Add questions if they are relevant to your environment.

  • What technical advantages do you believe you will gain from this effort?

  • How did the original deployments occur? What can be learned (good and bad) from those deployments?

  • How many servers are going to be part of the effort? Have the servers even been identified? Who is your vendor, and what class will the target servers be?

  • How many databases are on the targeted servers?

  • Are the servers to be consolidated currently separated geographically?

  • What are the usage patterns of the targeted servers, including use of the system databases? Do you have a performance baseline for each server? Is this baseline broken down to business units and their respective systems?

  • Do you know the resource utilization of each server? Can you accurately assess if the targeted servers are overutilized or underutilized?

  • Are these servers already employing some form of high availability, such as failover clustering or log shipping?

  • Is replication configured for any of the databases?

  • How do you manage database schemas, including stored procedures and functions? Do you keep them in some form of version control?

  • Are security policies and procedures documented for each server (physical hardware, operating system, SQL Server, and the applications)? Are security policies in line with your current corporate policies?

  • Do you know all the information about each server (such as service pack levels, disk space usage, SQL Server configuration settings, and so on)?

  • Do you know the types of workload on each system (that is, OLTP versus OLAP/Decision-support)?

  • Do you use administrative tools such as Alerts and SQL Mail? Do you have standardization between all servers?

  • Do you have any custom extended stored procedures that might affect other databases?

  • Do you have conflicting objects with the same name?

  • Do you have access to all source code for the applications accessing SQL Server as well as stored procedures (especially if you use stored procedure encryption)?

  • Do you know how many users, on average, connect to SQL Server? What types of clients connect to SQL Server (such as handheld devices, fat clients, and so on)? How do they connect?

  • Do you know the requirements of the third-party applications that might use SQL Server and be consolidated?

  • What current backup technologies are used?

  • What current high availability technologies are used?

  • Have the DBAs in your organization been formally trained on SQL Server in all aspects, including administration, high availability, and performance?

  • Are the current servers in different domains?

  • What versions of SQL Server are these servers running?

  • What versions of the operating system are used on these servers?

  • How much will the data grow over time? What is projected versus actual growth (if known)?

  • What technologies are currently used in the solution?

Profile Target Systems

The most important component of any upgrade, consolidation, or migration is having adequate technical information about each system. Without this, you cannot proceed successfully with any of your efforts. If you do not know if you can combine a workload, it is impossible to make any decisions.

Each system that is being considered must be profiled, and not for just one day or, sometimes, even just one week. The profile is the result of monitoring over time to gather trends. These trends might be daily or weekly jobs, as well as daily, weekly, and even monthly trends in workload (for example, end-of-month sales reports). Profiling includes much more than just the system name: every important performance aspect of the server must be captured. When figuring out if workloads and systems can be combined, you need to determine if incompatibilities will result from, say, a configuration setting or an application requirement.

Besides capturing general items such as operating system version, service pack levels, IP addresses, and disk configurations through the worksheets, the specific settings for each application also need to be captured. SQL Server provides the following methods to assist in the documentation of your systems:

  • Run the system stored procedure sp_configure, which captures all of the configuration settings for SQL Server.

  • Run the system stored procedure sp_helpstartup on Microsoft SQL Server 6.5 or sp_procoption on Microsoft SQL Server 7.0 and SQL Server 2000. These help determine the startup procedures and settings used by SQL Server.

  • Use the command-line Sqldiag.exe utility to capture SQL Server system information. This utility is usually used for diagnostic purposes, but this is one of its side benefits. This is the syntax to use:

    SQLDIAG.EXE -X -U user_name -P password -I instance_name -O output_file

    -X skips the error logs; add -C to gather cluster information; -I is needed only for a named (non-default) instance.
You should also document all of the objects associated with each database. This includes stored procedures, jobs, logins, and system-level logins as well as their corresponding database users, maintenance plans, and so on, even if they reside outside of the database. You will use this information later to determine conflicts, such as duplicate stored procedure names and logins or differences in naming standardizations. If you want, run the same DBCC commands on each system. Document anything and everything, even if you think it might not be relevant. Include linked servers, modified system databases, and so on.
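
As a starting point, the inventory can be scripted. The following sketch pulls logins and scheduled jobs, two of the most common sources of duplicate-name conflicts; adapt it to cover the other object types you use:

    -- Logins defined on the server
    SELECT name, dbname AS default_database
    FROM master.dbo.syslogins
    ORDER BY name

    -- Scheduled jobs and their categories
    SELECT j.name, j.enabled, c.name AS category
    FROM msdb.dbo.sysjobs j
    JOIN msdb.dbo.syscategories c ON j.category_id = c.category_id
    ORDER BY j.name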

More Info

For more information on gathering information about your systems, consult Chapter 4, Chapter 12, and Chapter 15. The run book described in Chapter 12 is a great resource for information. Chapter 4 and Chapter 15 cover monitoring and capacity management from a system performance perspective.

Gathering System Performance Metrics

Gathering performance metrics from your server is more than a one-time event. To truly understand what is going on with your system, performance metrics must be done over a period of time and then analyzed. At a minimum, you should gather statistics about your systems for one business week. During that time, record numbers when there is little to no load, medium load, and heavy load. Gathering numbers only at times that show one extreme or the other does not paint an accurate picture of system use.
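
In addition to System Monitor counters, SQL Server exposes its own counters, which you can snapshot on a schedule and store for trend analysis. A sketch; the counter names shown are a small, illustrative subset:

    -- Snapshot internal performance counters with a timestamp
    SELECT GETDATE() AS sample_time, object_name, counter_name,
           instance_name, cntr_value
    FROM master.dbo.sysperfinfo
    WHERE counter_name IN ('Batch Requests/sec', 'User Connections',
                           'Transactions/sec')

    -- Transaction log utilization for every database
    DBCC SQLPERF(LOGSPACE)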

Other SQL Server Information to Gather

This section describes the most common data to collate, which should help you plan for the issues that are most commonly encountered when consolidating. Refer to the section on technical considerations under the planning section of this chapter for more information and details on most of these points.

Consolidating System Databases Be sure to take care of the following:

  • Identify any objects in system tables that are not part of a standard installation.

  • Identify any objects in system tables that have been modified since a base installation.

  • Look for duplicate names.

  • Review each object for relevance in the new environment.

  • Review each object to determine if duplication exists at a more granular level.

  • Review each object for explicit references, such as path names, server names, and task names.

  • Set aside tasks found in msdb; they are covered in the next section.

  • Only search for nondefault objects found in tempdb. Use of tempdb is dealt with later.
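
One way to surface nondefault objects and explicit references is to query the system catalog directly. A hedged sketch; 'OLDSERVER' is a placeholder for whatever hard-coded server name you are hunting for:

    USE master

    -- Objects that did not ship with SQL Server
    SELECT name, type, crdate
    FROM sysobjects
    WHERE objectproperty(id, 'IsMSShipped') = 0
    ORDER BY crdate

    -- Procedures whose source contains a hard-coded server name
    SELECT DISTINCT object_name(id) AS object_name
    FROM syscomments
    WHERE text LIKE '%OLDSERVER%'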

Collation and Sort Order Check out the following:

  • What is the sort order of the source server?

  • Identify any objects that deviate from the collation setting of the server.
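
These checks can be made with a few queries on SQL Server 2000:

    -- Server-level sort order and collation
    EXEC sp_helpsort
    SELECT SERVERPROPERTY('Collation') AS server_collation

    -- Databases that deviate from the server collation
    SELECT name, DATABASEPROPERTYEX(name, 'Collation') AS database_collation
    FROM master.dbo.sysdatabases

    -- Within a database: columns that deviate from the database collation
    SELECT object_name(id) AS table_name, name AS column_name, collation
    FROM syscolumns
    WHERE collation IS NOT NULL
      AND collation <> CONVERT(sysname,
                       DATABASEPROPERTYEX(db_name(), 'Collation'))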

Security (Logins, Security, and Permissions) Be sure you have information about the following items:

  • Collect and identify duplicate logins.

  • Determine whether trusts are required across domains.

  • Determine if guest is an active user.

  • Collect data of logins with administrative rights.

  • Collect security settings and the permission structure for the public group.

  • Determine if you will need specific registry permissions.

  • Determine what permissions are required for extended stored procedures, particularly xp_cmdshell.

  • What accounts are the SQL Server services running under?

  • Understand the security model that the client is implementing.

  • Compare security models of source servers and target servers.

  • Collect database options, to ensure that read-only, single-user, and dbo-only settings are maintained during migration.

  • Collect the encrypted passwords to transfer to the new server.

  • Compile a list of all Windows Authentication and SQL logins that have administrative access.

  • Pay special attention to the scheduled jobs on the system. Not only do these jobs have to be migrated, but they also identify specific workloads that kick off at specific times. They also tell you which user accounts are commonly performing the maintenance for each instance.
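
A couple of the items above can be gathered quickly from the security catalog. A sketch for SQL Server 2000:

    -- Logins holding the sysadmin role
    SELECT name, dbname AS default_database
    FROM master.dbo.syslogins
    WHERE sysadmin = 1

    -- Permissions granted to the public role in the current database
    EXEC sp_helprotect NULL, 'public'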

Serverwide Configuration Settings Here are some factors to check out:

  • Determine if any SQL Server configurations, such as affinity mask, Address Windowing Extensions (AWE), and so on, have been modified since installation—that is, do they deviate from a standard installation?

  • Monitor context switches after migration, mainly to decide if fiber mode is necessary.

  • Determine if there are requirements for XML stored procedures and OLE automation objects.

  • Establish which statistics are inaccurate when running in fiber mode.

  • Monitor the worker threads to see if they are exceeding the maximum value allocated.

  • Collect error messages that have been added to sysmessages and sysservermessages.

  • Establish if any system-supplied error messages have been modified.
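
A few of these settings can be spot-checked as follows; the values in the comments are the SQL Server 2000 defaults:

    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    EXEC sp_configure 'affinity mask'        -- default 0
    EXEC sp_configure 'awe enabled'          -- default 0
    EXEC sp_configure 'lightweight pooling'  -- default 0 (fiber mode off)
    EXEC sp_configure 'max worker threads'   -- default 255

    -- User-added messages (user-defined errors start at 50001)
    SELECT error, severity, description
    FROM master.dbo.sysmessages
    WHERE error >= 50001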

Determining the Existing Environment (Workstations, Servers, Applications, Users, and so on) Some more information you must gather follows:

  • Verify that clients with older versions of Microsoft Data Access Components (MDAC)—such as MDAC 2.5—can connect to a default or named instance of SQL Server 2000, depending on which you are implementing. Default instances should be fine, but named instance support was introduced with MDAC 2.6.

  • Collect the host names that connect to named instances of SQL Server.

  • Identify applications where the connection string is hard coded. This also includes connection objects, such as COM objects and data source names (DSNs).

  • Identify any connections that use a protocol other than TCP/IP. Exclude internal connections that use named pipes.

  • Identify all applications that are known to have hard-coded parameters.

  • Identify all applications for which vendors are no longer accessible.

  • Collect information that describes the type of workload the application generates.

  • Determine the domain that the central server will reside in.

  • Identify the domains of the source servers.

  • Identify the domains of the connection sources.

  • Collect the trust relationships between central server and connection sources.

Health Checks Take care of these, too:

  • Execute DBCC statements and scan logs for errors.

  • Review error logs and event viewers.

  • Determine if any procedures must be added to the cache before opening the environment up to your user community.

Multiple Versions Take care of this:

  • Collect all versions of servers, binaries, and compatibility modes of the databases.

Excessive Use of Tempdb Take the following actions:

  • Determine the maximum space used by tempdb for source servers.

  • Establish the amount of memory used for procedures.

  • Decide if each SQL Server instance should have its own physical place for tempdb or if they should all share a single space.
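
To establish the first item, sample tempdb's size and usage over time, for example:

    USE tempdb
    EXEC sp_spaceused

    -- File sizes; sysfiles reports size in 8-KB pages
    SELECT name, filename, size / 128 AS size_mb
    FROM tempdb.dbo.sysfiles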

Database Options Answer these questions, too:

  • How much space has been assigned to devices on source servers?

  • How much space is actually used?

  • What is the maximum space used for the transaction logs?

  • What has the growth rate been like over the last week, month, quarter, six months, and year?

  • Is the current backup strategy at source servers in line with the strategy to be used at the target server?

  • Are there requirements for connection settings to be other than the default ANSI settings and connection-related properties?

  • Collect the current database options.
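
Most of these answers can be collected with the standard catalog procedures. A sketch, with 'Sales' as a placeholder database name:

    EXEC sp_helpdb             -- size and owner for every database
    EXEC sp_helpdb 'Sales'     -- files and options for one database

    SELECT DATABASEPROPERTYEX('Sales', 'Status')       AS status,
           DATABASEPROPERTYEX('Sales', 'Recovery')     AS recovery_model,
           DATABASEPROPERTYEX('Sales', 'IsAutoShrink') AS auto_shrink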

Phase 2: Technical Considerations for Planning

Once you gather the necessary information, you can then perform the analysis to put the proper plans in place.

Single Instance Versus Multiple Instances

Before you can consider any other technical aspect, you must decide if you will be using one instance with many databases or multiple instances, each with its own set of databases. Instances were first introduced in SQL Server 2000. One instance of SQL Server 2000 equates to one installation of SQL Server on your machine. Prior to SQL Server 2000, you could only have one installation of SQL Server per physical server. There are two types of instances: default and named. A default instance behaves like an installation of a previous version of SQL Server. A named instance of SQL Server is exactly like it sounds: you give your SQL Server installation a unique name. SQL Server 2000 supports up to 16 instances on one physical server or in a single Windows server cluster, regardless of the number of physical nodes comprising the cluster. In other words, you can have 16 named instances or one default and 15 named instances. Consult SQL Server Books Online for any changes in the number of instances supported.

Note

You can only have one default instance, so older applications that cannot handle named instances might have problems. Test older applications thoroughly if you are implementing named instances.

Each instance gets its own set of core binaries, but they share underlying components such as MDAC, Microsoft Distributed Transaction Coordinator (MS DTC), and one Microsoft Search service. If there is more than one instance of SQL Server, each has the ability to stay at a certain service pack level, with the exception of any shared components. This is a benefit for a consolidation effort, because you can have instances at the specific levels at which you need them; however, for example, if you need to install a SQL Server 2000 service pack and it for some reason requires a physical reboot, you will affect all other services running on that server. In general, once you have upgraded one instance to that service pack level you will not have to reboot to upgrade the other instances because these shared files have already been replaced. Regardless, your availability and SLAs must be taken into account when designing consolidated SQL Servers.

From a performance perspective, each instance will get its own memory, including cache for stored procedures. Is it better for your endeavor to put hundreds of databases under one instance or to spread them out over more than one? The "Memory and Processor" section later in this chapter delves into this topic in more depth.

In most cases, a single instance means less administrative overhead: with only one instance to manage, the complexity of what goes on in SQL Server can be easily managed and localized. It also ensures that "automatic" settings, such as using dynamic memory, are easier to consider if there are no other processes to contend with. If you have limited hardware, this might be your best option.

Multiple instances allow for each instance to cater to a different set of rules, SLAs, and so on. The biggest advantage is that you can stop the services for each instance individually. Multiple instances also mean more complexity when it comes to system design and management on all levels, but the rewards of dedicated memory and separate processes (with the exception of shared components) might be worth the trade-off.

When thinking about single versus multiple instances—as stated in the introductory paragraph of this section—I/O, processor, and memory are the key considerations.

Application Compatibility with Instances

Because the architecture of SQL Server 2000 is different with instances, particularly named instances, you should test older applications to ensure that they still work properly. Named instances require MDAC 2.6 or later to connect properly; a default instance should be fine. If an application cannot connect to a named instance, there are a few potential ways to solve the problem:

  • Install MDAC 2.6 or higher on the machine hosting the application or on each client machine if the application is run directly by the client. This is the version of MDAC that ships with SQL Server 2000, and it contains support for named instances. Installing MDAC 2.6 does not guarantee that the application will work with named instances, however. Testing on one or a small number of machines is recommended before rolling this out to large numbers of users.

  • If it is not possible to update the client utilities to the SQL Server 2000 versions, do not use the name but rather connect directly to the IP address and port number. Configure that in Client Network Utility.

  • Update the client tools to SQL Server 2000. Then use the SQL Server 2000 version of Client Network Utility to create an alias to the named instance.

  • Update the application code to support named instances. If the application has server names, database names, and paths hard-coded, there are more significant issues to address than named instances.

Disk Subsystem

The first stop for any SQL Server system should be the architecture of its disk subsystem. If you are looking to combine workloads or to upgrade, it is imperative to know not only how much disk space you will need but, if you are consolidating, what workloads will work well together from an I/O perspective. That is a serious challenge faced by customers both big and small. Read-only reporting databases have different I/O requirements than heavily used OLTP systems with many writes and a smaller number of reads. Do not take this configuration lightly; you might initially save cost by consolidating, but if you have underperforming applications, the consolidated environment will be a failure.

More Info

Disks were covered in great depth in Chapter 4. Refer to that chapter for any further discussions relating to disk issues.

Memory and Processor

There are a few important considerations when determining how many processors the new servers will contain. As with disk and memory, you need existing benchmark data and application and system profiles to make an accurate decision. Here are some of the main issues:

  • Based on your memory or processor needs, you need to choose the appropriate operating system. Do not assume you will always be able to utilize the same hardware; budget appropriately.

    More Info

    See the topic "Memory Used by SQL Server Objects Specifications" in SQL Server Books Online for more information on how much memory certain types of objects will consume.

  • When it comes to memory and SQL Server usage, the biggest consideration is your procedure cache, which also influences whether you can use one instance or multiple instances for your consolidation. With 32 bits, you can access at most 2.6 GB of procedure cache (the exact figure also depends on what memory options you use). If you are now consolidating multiple databases, each with many stored procedures, you might run out of procedure cache. Each stored procedure has a plan in cache so that it runs faster. Even if many databases contain the same stored procedures, they are still different to SQL Server, because the statistics for each database are potentially different, and therefore the plans are different as well. On top of that, if your server contains more than one processor, you will have two plans for every stored procedure, because the SQL Server engine decides at run time how many processors will be used to run a given stored procedure at that moment. Thus you could potentially be trying to cache the number of stored procedures in each database, times the number of databases, times two. This will definitely affect performance and might push you into multiple instances. You might not know this requirement up front, and you might only discover it in your testing phase.

  • If you need more than 2 GB of memory for your SQL Server 2000 instance, you will need to use a form of advanced memory, which must be configured properly for SQL Server to recognize it.

    More Info

    See the section "Memory Management for SQL Server 2000" in Chapter 14, for a full description of how SQL Server uses memory, recommendations on tuning memory, and how to configure memory for use with SQL Server 2000.

  • Affinity has two aspects. Processor affinity limits the SQL Server instance to using only certain processors instead of using all the processors on the system (which is the default). Connection affinity ties the user scheduler for a set of connections to a specific processor.

    To restrict processor usage for a particular instance, use the affinity mask option of SQL Server. The ability to limit which processors are used would be useful if you had an eight-processor system and wished to run two instances such that neither ever affected the processor workload of the other. You would simply configure the first instance to use processors 0, 1, 2, and 3 and the second instance to use processors 4, 5, 6, and 7. This sounds great, but in practice it is often less than optimal. For example, when instance 1 is very busy, it cannot use any of the processor time from instance 2, even when instance 2 is idle. A better approach would be to configure instance 1 with processors 0, 1, 2, 3, 4, and 5 and instance 2 with processors 2, 3, 4, 5, 6, and 7. Thus each instance has two dedicated processors that it can use and four processors that are shared.

    Tip

    Windows and other applications can still use all of the processors. This setting affects only SQL Server instances unless you restrict the operating system or other applications as well. For more information, see the topic "Managing SQL Server Resources with Other Tools" in Chapter 14.

    Connection affinity, the tying of user connections to processors, is not as straightforward. It allows an instance to always direct a set of client connections to a specific SQL Server scheduler, which in turn manages the dispatching of threads on the available processors.

    There is a limited set of conditions under which connection affinity provides a noticeable performance benefit. One of its advantages is that it can increase the number of hits against a Level 2 hardware cache, but it also prevents a task from going to another processor if the processor it is tied to is busy. The disadvantages of using connection affinity are similar to those discussed for processor affinity.

    A good example of a disadvantage specific to connection affinity would be a Windows-level memory check that comes up and is assigned to a particular processor. Your Transact-SQL query can no longer jump to another processor for completion and must wait until this check completes. Processors multitask, but they also have a priority component; on occasion, Windows priorities can and need to outrank SQL Server thread priorities.

    If misused, connection affinity can prevent you from fully utilizing all available resources. Keep this in mind when evaluating the use of this feature.

    More Info

    For more information on the affinity mask option, see the topic "Affinity Mask Option" in SQL Server Books Online.

  • If you are moving an application from a system of one type—for example, with one or two processors—to one that has six or eight processors to handle the increased load, each query might perform differently because a new execution plan might exist due to the increased number of processors available. If this is the case, you might need to alter the configuration settings of max degree of parallelism, and possibly also affinity mask. The problem with combining multiple workloads is that tuning these two things affects everything running under that instance.

Important

Do not use any current tool other than SQL Server to manage your processor resources, such as Process Control under Windows 2000 Datacenter Edition. These other tools might not produce the desired effect. You should do all processor-related tweaking with the SQL Server parameters. Consult updated versions of SQL Server Books Online for any information on other tools as they become available.
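As a sketch of the processor settings discussed above, both affinity mask and max degree of parallelism are set with sp_configure. The bitmask values here correspond to the eight-processor example (processors 0 through 5 for instance 1), and the parallelism limit of 4 is illustrative only:

```sql
-- Enable advanced options so the two settings below are visible.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
-- Instance 1: allow processors 0 through 5 (bitmask 0x3F = 63).
-- On instance 2 you would use 252 (0xFC) for processors 2 through 7.
-- Note: a change to affinity mask may require restarting the instance.
EXEC sp_configure 'affinity mask', 63
RECONFIGURE
GO
-- Limit parallel query plans to at most four processors.
EXEC sp_configure 'max degree of parallelism', 4
RECONFIGURE
GO
```

Remember that tuning these settings affects every database running under that instance, as noted above.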

Networking

You might now encounter more network traffic flowing to one SQL Server. As with any other planning consideration, do you currently know how many concurrent users are usually connected to the SQL Server at various times, especially under maximum load? From a memory perspective, each connection consumes 12 KB added to the network packet size multiplied by three:

memory per user connection = 12 KB + (3 x network packet size)

Therefore, in addition to the memory required for the normal SQL Server usage for things like procedure cache and queries, you need to take into account network connections. This factors into whether you decide to consolidate on one instance or multiple instances.
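The arithmetic can be verified with the instance's configured packet size (the figures in the comments assume the default value):

```sql
-- Check the configured network packet size; the default run value is 4096 bytes.
EXEC sp_configure 'network packet size'
-- Per-connection memory with the default: 12 KB + (3 x 4 KB) = 24 KB.
-- So 1,000 concurrent connections consume roughly 24,000 KB, or about 23 MB.
```

Multiply the per-connection figure by your expected peak concurrent connections and add the result to your memory budget.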

Once you know how much memory user connections take up for each database, you need to worry about network bandwidth. Can the traffic coming in and going out be handled by the proposed networking on the server? For example, if you have one 10-megabit network card but your overall required throughput is 20 megabits per second, do you add another network card and IP address (both of which add a bit of system overhead), or do you get a higher bandwidth network card and ensure that the network itself can handle the load?

Other networking issues include the following:

  • What port is SQL Server using? On installation, SQL Server 2000 grabs a dynamic port and not necessarily 1433. This is partially due to the instance support, as multiple instances cannot really share the same port. Generally the first instance (whether named or default) is assigned to 1433. Use Server Network Utility to assign a known, static port to each instance.

  • Does the SQL Server require domain connectivity (that is, a failover cluster) or have processes that use other SQL Servers (such as log shipping)? Are they all in the same domain? Mixed domains can cause problems in consolidation.

  • Do you have a mixture of SQL Servers that use domain accounts for the service accounts? Are they all in the same domain? Do you also have SQL Servers set to use the local service account? This might seem easy to solve when you consolidate because you can standardize; however, when it comes to object ownership and the login that was configured to run SQL Server Agent jobs, it is a different story. You must open the properties of each SQL Server Agent job and make sure that the Owner is set to the proper SQL Server login.
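Job ownership can also be corrected with Transact-SQL rather than through the properties dialog box. This sketch uses a hypothetical job name and login:

```sql
-- Reassign a migrated job to the standardized owner login.
-- Both the job name and the login shown here are hypothetical.
EXEC msdb.dbo.sp_update_job
     @job_name = N'Nightly Full Backup',
     @owner_login_name = N'DOMAIN\SQLAgentJobs'
```

Scripting this for every migrated job is usually faster and less error-prone than editing each job's properties by hand.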

Security and Logins

Security is an obvious concern. Are there different standards for all of the applications, databases, and servers targeted? If there are, will they be in conflict with one another if they are combined? Or, for example, is something like IPSec required, but you are consolidating to a cluster where IPSec is not designed for failover, meaning the application will need to be changed or possibly not consolidated but put instead on its own instance or server?

Logins and how applications and users actually access SQL Server are the biggest areas of concern for security in a consolidation effort. User migration could consume a great deal of planning time. The "Networking" section earlier in this chapter already mentioned domain connectivity and domain accounts, but here the focus is on the actual logins. Do you have servers that have a mix of both Windows Authentication and mixed mode? Will that introduce risk when workloads are combined?

Also, security has changed a bit if you are only familiar with SQL Server 6.5 and earlier. SQL Server 7.0 introduced the concept of roles, so you might want to rethink how security is done for a particular application. This might become scope creep, so do not consider or implement it if it will impede the consolidation effort.

Here are some issues to consider with logins:

  • Plan sufficient time for your login and user migration plan, test, and implementation.

  • You have a login named JoeJ at the SQL Server level who has sysadmin privileges on one database server, is a generic user on another database server, and is dbo in one specific database. JoeJ maps back to one person. The problem in consolidation is that you do not want to accidentally give a user more or fewer rights than he or she needs, so take that into account when combining workloads.

  • You have three different users named JoeJ across different database servers and, until now, this was not a problem. It will take good communication to resolve these types of issues.

  • Even worse is the same login for the same person with different passwords across different servers that are now being consolidated. You can assign a new password, choose one of the existing passwords, or ask the user to select which password to use.

  • Do you need the BuiltinAdministrators login? Do you currently remove it from all installations? If you do not remove it but are considering doing so, what will its impact be? Does a third-party software application require it?

  • Because consolidation means that you will more than likely have more users connected to the server, what permissions will you grant to public and guest users?

  • Not all logins that exist are in use (for example, old employees, active users versus configured users, and so on). Only migrate logins that are used. Remember that adding and dropping logins impacts the transaction log.

  • Do the logins, if Windows Authentication is used, come from different domains? Will you need to set up two-way trusts at the Windows level?

  • Does the database, or specifically, the application using it, currently use or require sa access? If so, that should be evaluated.

  • Have you locked down Group Policies on Windows Servers since the server was rolled out? How will that impact the new consolidated environment (for example, have you locked down the policies needed for clustering if that is the desired end goal)?

  • Did you plan your port assignments for your SQL Server instances, or are you letting SQL Server choose them dynamically? It is best to use static, known ports.
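After migrating logins, database users frequently become orphaned because the security identifiers no longer match. The sp_change_users_login procedure can detect and repair this; the database and login names here are hypothetical:

```sql
USE ConsolidatedDB   -- hypothetical migrated database
GO
-- List database users whose logins no longer match (orphaned users).
EXEC sp_change_users_login 'Report'
-- Relink a specific orphaned user to the corresponding migrated login.
EXEC sp_change_users_login 'Update_One', 'JoeJ', 'JoeJ'
```

Run the report step on every migrated database as part of your login migration test plan.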

High Availability

Each instance of SQL Server 2000 needs to be made highly available to meet your SLAs. Because this book is all about high availability, putting solutions together, planning, and so on, this section is pretty self-evident.

Replication

If replication is configured on any, or many, of the databases being considered, planning will certainly be impacted. Each database or publisher needs a distributor, and the distributor can live on the publisher, on a subscriber, or on a separate system. Do you plan on having a 1:1 ratio of publishers to distributors? That is probably unrealistic in a consolidated environment, which means that you will need to do some up-front capacity planning to ensure that the distributor can handle multiple publishers. Disk I/O will be impacted greatly, followed by processor and memory, especially depending on the type of replication implemented. This also means that the distributor will have to exist in a completely separate instance or, most likely, on a different server to ensure good performance and high availability. You might consider a remote distributor as well. Remote distributors for various publishers can be consolidated under one SQL Server instance, but the distribution databases per publisher should be distinct. One instance with a publisher and distributor might be acceptable in a nonconsolidated environment, but it will not be acceptable in a consolidated environment.
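Pointing a publisher at a shared remote distributor is done with sp_adddistributor. The server name and password in this sketch are hypothetical:

```sql
-- Run at each publisher to designate the shared remote distributor.
-- DISTSRV and the password shown are hypothetical values.
EXEC sp_adddistributor
     @distributor = N'DISTSRV',
     @password    = N'StrongDistributorPwd'   -- distributor_admin password
```

Each publisher you consolidate would run this against the same remote distributor, with its own distinct distribution database configured there.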

You also have to consider the base snapshot directory and its location per publisher: will you share one per SQL Server instance or use another one?

Also consider the subscribers, as they will be impacted, too, if you move publishers and distributors. This is important if you have to disable replication during the implementation process.

Migrating Objects

One of the hardest tasks for any upgrade, consolidation, or migration effort will be migrating objects, such as alerts, SQL Server Agent jobs, and Data Transformation Services (DTS) packages. For alerts, jobs, and operators, you can easily script them; however, before applying them to the consolidated SQL Server, ensure that there are no naming conflicts and that when you migrate to the new server, the proper user is associated with SQL Server Agent jobs. For DTS packages, you have a few options for migrating the packages to other servers, but once you move them, you might need to make modifications to them. Do not assume that they will run as is. Also ensure that the package format is compatible: as of SQL Server 7.0 Service Pack 2, Microsoft changed the DTS package format from that used by SQL Server 7.0 RTM and Service Pack 1. In that case, you would have to completely re-create the package under SQL Server 2000. Remember to account for time in your testing phase for testing and possibly fixing these types of issues.
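Before scripting and applying the objects, a simple inventory of the msdb tables on each source server helps catch naming conflicts early:

```sql
-- Compare these lists across all source servers to find duplicate names
-- before applying scripted objects to the consolidated server.
SELECT name, enabled FROM msdb.dbo.sysjobs ORDER BY name
SELECT name FROM msdb.dbo.sysalerts ORDER BY name
SELECT name FROM msdb.dbo.sysoperators ORDER BY name
```

Any name that appears on more than one source server needs to be renamed or reconciled before migration.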

Tip

SQL Server 2000 includes a utility, the Copy Database Wizard, that can help move databases and their corresponding users, jobs, and other elements. An interesting option with this utility is to select the Run Later option and allow it to create five separate DTS tasks to move these pieces. You can then run all five or any combination of them that you need to transfer your database information.

You could also consider using the DTS Transfer Databases task.

Administration

Prior to migration, think about the following considerations that impact the eventual administration of each consolidated SQL Server instance:

  • Serverwide settings definitely need to be reconciled before bringing the new server online. Each individual SQL Server might have its own settings, and one setting could impact a database in many ways, so coming up with the final set of global server settings is crucial.

  • Similarly, if you use any startup options, those also have to be reconciled.

  • Do you have any history for objects or technologies configured, such as replication or SQL Server Agent jobs existing in msdb that are still valid and need to be migrated?

  • How will your maintenance change as a result of adding multiple databases or instances? For example, how will your entire enterprise backup strategy change? Will one backup now conflict with another and affect performance? Is your maintenance window for the server now smaller because of increased load?

    Tip

    Be especially careful if you plan to back up to disk, and then from disk to tape. This speeds the backup process but triples the workload on disk spindles (the first operation is the read from the database, the second is the write to disk, and the third is the read from disk to write to tape).

  • Watch msdb growth, because with more databases it might increase in size due to more statuses and histories being written (depending on which functionality of SQL Server you are using).

Chargeback

Chargeback is the process of assessing the cost of hardware utilization so business units or customers can be charged appropriately. If chargeback is a consideration, you have a few options. One is to use SQL Server itself; third-party tools such as ARMTech from Aurema can also assist you in your cost accounting for system resource usage.

System Databases

System databases are an important consideration, especially in a consolidation effort. The system databases contain data and structures that are relevant to the entire server installation. Unlike user databases that can exist independently of each other, system databases are closely associated with each other as well as to the user databases from the source server.

How will master, model, and msdb be configured? Because you are now combining multiple servers, each with its own copy of the databases, you cannot just restore multiple copies, as the last restore will overwrite the previous copy. Remember that msdb includes all of your jobs, alerts, and operators (as well as history for some features such as log shipping), so plan on a size of at least 45 MB, and then add the additional amount per instance as well as about 10 percent overage.

Besides these space requirements, you need to analyze master, model, tempdb, and msdb to detect duplicate database objects and document them. This extends from logins, stored procedures, user-defined data types, and tasks through to the specifics of each object. For instance, an object of the same name on two servers might represent or perform actions particular to each source server, or objects of different names might perform equivalent actions at the server level. This becomes particularly difficult when modifications beyond logins have been made to these databases. Remember to include any and all startup stored procedures.

You must identify any elements that are not part of a standard SQL Server build or installation, whether they are modifications or new objects. Review each object for relevance in a consolidated environment, for duplication at a more granular level, and for explicit references such as path names, server names, and task names. Do not address tasks found in msdb here, as those are handled separately and must be dealt with on their own. Search tempdb only for nondefault objects. Also consider in advance how you will respond to application needs for previous compatibility levels. In general, these tips can save you much trouble in having to "update" applications, but remember that staying fully on the current version usually minimizes support difficulty and costs.

Collations and Sort Orders

You must also take into account the collations and sort orders between servers. It is more likely that databases of varying collations will coexist in a consolidated environment than in numerous stand-alone servers. You can encounter unexpected results after migrating to a consolidated environment, and you must realize that certain applications depend on the collation and sort order setting of the database. If they are not set properly, the applications will not work after the switch to the consolidated environment.

Temporary tables are created in tempdb with tempdb's collation settings unless collation syntax is included in their definition. Passwords in a case-insensitive SQL Server are converted to uppercase before being stored or used; passwords in a case-sensitive SQL Server are not. Because of this difference, passwords originally encrypted on a case-sensitive server and later transferred to a case-insensitive server cannot be used unless all alphabetic characters in the password are uppercase.
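The temporary table behavior is easy to demonstrate; the explicit collation name below is illustrative only:

```sql
-- #t1 inherits tempdb's collation, which on a consolidated server
-- might differ from the source database's collation.
CREATE TABLE #t1 (col1 varchar(10))
-- #t2 pins the column to an explicit collation (name shown is illustrative);
-- COLLATE database_default would instead inherit the current database's collation.
CREATE TABLE #t2 (col1 varchar(10) COLLATE SQL_Latin1_General_CP1_CS_AS)
```

Applications that join temporary tables against user tables are the ones most likely to break when tempdb's collation differs from the source database's.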

Similarly, understand how you need NULL to behave in your queries. If you turn the database option ANSI_NULLS on, all comparisons to a null value evaluate to NULL, which equates to unknown. If this option is off, comparisons of non-Unicode data evaluate as true if both values are null (that is, NULL = NULL is true). This is set to Off by default, so you should see the latter behavior. Also look at any other ANSI-related setting that might affect your environment.
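The NULL comparison behavior can be demonstrated directly:

```sql
-- With ANSI_NULLS OFF, NULL = NULL evaluates to true.
SET ANSI_NULLS OFF
IF NULL = NULL PRINT 'Printed: NULL = NULL is true with ANSI_NULLS OFF'

-- With ANSI_NULLS ON, the comparison evaluates to UNKNOWN, so nothing prints.
SET ANSI_NULLS ON
IF NULL = NULL PRINT 'Never printed: the comparison evaluates to UNKNOWN'
```

Test any application that relies on the OFF behavior before moving it to a consolidated server where the setting might differ.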

More Info

For more information about collations, see the "Effect on Passwords of Changing Sort Orders" topic in either the SQL Server 6.5 Books Online or 7.0 Books Online and "Selecting Collations" in SQL Server Books Online.

Other Technical Considerations

Here are other technical considerations to think about prior to migration:

  • Are you currently using linked servers to enable connectivity and queries between servers that might or might not exist any longer? Are you changing domains that might also impact linked servers? If you consolidate, how will it impact the application? Will queries need to be rewritten? How will you migrate your linked server settings?

  • As mentioned a few times, certain resources, such as MS DTC and the underlying Microsoft Search service that powers full-text search, are shared among all of the instances and even among other applications. That will be a concern in a consolidated environment as you add more databases and applications that depend on these shared resources.

  • General application and database upgrade and migration rules apply in a consolidation effort. Remember to take into account running of processes like DBCCs, reviewing logs, updating statistics, and so on, and fix any issues prior to moving the database. Also remember that depending on the physical layout (that is, where you put the data and log files for your consolidated SQL Server), what you do to one database might affect others.

  • Have you modified any of the system tables? If so, those modifications might not work with the consolidated SQL Servers. Also, Microsoft does not recommend that you modify the system tables or add objects to databases like msdb or master.

  • Are there collation and sort order conflicts between the current SQL servers and the proposed consolidated SQL Server environment? This is an easy item to miss if the planners, implementers, and DBAs are not in touch. Resolve any conflicts prior to consolidating.

  • Does the database to be consolidated use extended stored procedures? If so, how will that impact other databases and, potentially, other instances?

  • Service packs are a major consideration for a consolidated environment. If you have one SQL Server at one level and another at a different level, is that acceptable? (With SQL Server 2000, you can have this situation.)

  • Remember that if you previously implemented SQL Server 7.0 clustering and upgraded to SQL Server 2000, you cannot go back on the same hardware without reinstalling the operating system due to the changes in MDAC that SQL Server 2000 installs. Plan for this.

  • XML stored procedures are not supported in fiber mode, so use thread mode or a separate instance to avoid error 6604. Thread mode is excellent for systems performing MS DTC tasks, remote queries, linked servers, and extended stored procedures.
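For the linked server concerns above, inventorying the existing definitions on each source server is the first step; the server name in the commented example is hypothetical:

```sql
-- List the linked servers defined on the source instance.
EXEC sp_linkedservers
-- Re-create each one on the consolidated server, for example:
-- EXEC sp_addlinkedserver @server = N'OLDSRV', @srvproduct = N'SQL Server'
```

Remember that four-part queries reference linked servers by name, so any renaming during consolidation means the dependent queries must be changed as well.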

Phase 3: Consolidation Planning—The Server Design and the Test Process

Once the envisioning phase is done and guiding principles are agreed on and documented, work on planning the consolidation can begin. There are two main parts to planning: designing what the consolidated servers will "look" like and testing the implementation process.

Designing the New Production Servers

When you start to design the new environment, you need to take into account every aspect from the ground up: administration and operations (including monitoring), performance, backup and recovery, chargeback (if necessary), disaster recovery, high availability, security, and so forth. Do not assume that you can just use what you have in place for your current system, as the new environment might be different. Also, with more databases and SQL Server instances, you will have different rules and options that need to be reconciled. At this stage, if you identify new tools that need to be either built or acquired, document and plan for those, too.

Migration of Applications, Users, and Data

One of the important factors for the DBA is how the applications, users, and data will be migrated to the new consolidated environment. At this stage, any potential SQL Server 6.5 and SQL Server 7.0 migration issues should be completely isolated and identified, from Transact-SQL incompatibilities to system settings and everything in between. You might also need an intermediate staging server. Determine the order in which you will migrate the applications, users, and data. Resolve any conflicts in object names and location. Worry about the end users’ experience and how you will make the effort as transparent as possible for them. Also determine how you will notify your end users about the move. How will business processes be impacted?

Avoid scope creep. You are consolidating, not adding new functionality or enhancements to applications or data schemas. Problems could be identified during the migration planning that, as long as they will not break in the consolidated environment, should be left alone. Once you have consolidated, you might want to consider these problems, but trying to solve them at this stage only causes the consolidation to take much longer than necessary.

Finally, do not waste time adding applications to a consolidation until the application is reasonably stable. Stability is crucial; many of the troubleshooting tools that might be required to diagnose the flaky application (for example extra perfmon counters, debug builds, and so on) could adversely affect the performance of the entire system.

Test the Process

Prior to actually doing the planned upgrade or consolidation in a production environment, you must build and test all of the new procedures, tools, and the entire migration. The capacity and capability of the environment used for testing should be the same as, if not close to, the final production environment. However, you may be constrained by budget, resources, and other limitations. The importance of this stage is to isolate, identify, document, and fix any defects, errors, or problems that you encounter so a production effort will, one hopes, be flawless. Devising the proper test plans will ensure a successful production rollout. The plans should identify the following:

  • Do applications and utilities still function in the same way as before the change, and how will that be tested and measured? Remember, it is not just the changed component that matters, but also the components that have dependencies on it. This can be a large matrix that must be modeled and then tested.

  • Does the system, and subsequently SQL Server and all related applications, still provide the same performance as before the change, and how can that be tested and measured? This is as difficult to test as functionality. Do you have playback scripts or dummy test loads that you can use to reliably and accurately assess and compare the performance impact of a change?

Building a test platform is not an easy task. In some cases, it does not need to be an identical system or set of systems from the standpoint of number of processors, amount of memory, or total storage accessible by the cluster nodes, but in all other ways the test platform and the production environment should be identical. The ability of the system or systems to be partitioned affects this as well. Partitioned systems can be set up so that a small partition has exactly the same BIOS, chipset, devices (including firmware and BIOS versions), driver versions, kernel components (for example, antivirus, storage management, security monitor, system management utilities, firewall, disk mirroring, and so on), and user mode applications. All versions of all components should match the production environment and configuration; otherwise, the confidence gained from testing an update or upgrade is lowered. Systems that do not support partitioning must be mirrored separately, but it is still important to ensure that all the components are identical between the test system and the production system. The same logic applies to clusters.

The most critical factor affecting whether the update or upgrade will be successful (and not cause any unplanned downtime) is the interaction between different versions of hardware and software components, not the total size of the system as measured by number of processors, amount of memory, total storage, number of network or storage adapters, and so on. Thus, a test system or partition need not be exactly the same capacity as the production system, but it should be representative. For example, the production environment might have 12, 16, or even more processors, but to be representative, the test partition might only need to contain 4 CPUs. Similarly, the production system might include 32 GB, 64 GB, or even more RAM, but to be representative, the test partition might only need to contain 8 GB of RAM.

Continuing with the same logic, it might not be necessary to fully reproduce the storage capacity of the production system, but only to provide the same behavior and characteristics, scaled to match the smaller test configuration. Some storage arrays support multiple systems and provide internal partitioning. The test system could be attached to the same partitionable storage resource as the production system but scaled down, providing the same storage behavior and characteristics for the test system as the production system. Finally, if you are using a server cluster, it is important to model the test cluster to match the production cluster configuration as closely as possible. Doing this ensures that planned and unplanned failovers allow the application or entire solution to function correctly, guaranteeing the highest possible availability. You can accomplish this testing with just two cluster nodes even if you have a four-node cluster in production, but if you do not match capabilities as closely as possible, your testing may be somewhat invalid.

The reasons that versions are the key factor to test, rather than size or capacity, include the following:

  • No single vendor can test against all possible permutations that customers might deploy. They number in the millions.

  • Quick fix engineering updates (QFEs) are typically less risky than version changes because the architecture of the component (that is, how it functions internally as well as how it interacts with the rest of the hardware components, operating system, the various applications and utilities, and so on) does not change.

    Version changes, even "point" version changes such as a difference between version 1.0 and 1.1, or service packs from hardware and software vendors, might have many small changes compiled together or might include architectural changes. The aggregation of many small changes or the change in architecture might impact other components in the complete configuration or solution and thus need to be more extensively tested.

    Obviously, a full version change of any type, be it for hardware, the operating system, applications, or utilities, requires similar or even larger amounts of testing than that done for service packs.

    From a Windows Datacenter Edition standpoint, certain conditions must be met for a configuration to be considered tested well enough to mitigate the risks inherent in updates and upgrades. These include the support of an original equipment manufacturer (OEM) as part of the Datacenter program, where the customer and the OEM have mutually agreed on configuration management and change control processes and policies as well as the pretesting required for an SLA.

The use of a scaled-down environment is usually possible, with the most glaring exception being when the production system is running close to its maximum capacity, as measured in processor utilization, memory utilization, paging activity, network bandwidth, storage queues, and so on. In this case, even small changes in the behavior of components might cause the production system to begin to function noticeably more slowly and not be responsive enough for production use. This is unavoidable and simply a function of how any system responds when the work demand exceeds its ability to complete that work. In a worst case, the system might effectively be unusable. Thus, customers should always leave headroom in the production system that can absorb an unexpected performance impact. This also gives the system the same headroom for spikes in demand. As a rule of thumb, many customers limit average utilizations of various system resources to between 65 percent and 70 percent. This guideline, however, may not apply to your systems at all. There is a difference between peak usage and overall average use of systems. Those same customers begin planning to expand or otherwise upgrade the system capacity when those levels are consistently exceeded.

Once the test systems are built and are "identical" with the production systems, baseline testing should occur to validate functionality and assess performance characteristics and how they scale in comparison to the production systems.

There is some expense involved for the test system or systems, adapters, fabrics, networks, storage, and possible clusters and other hardware components. This is also true for the various software components, such as the applications and utilities that run on the system. However, you need to measure these expenses against the cost of encountering unplanned downtime that results from a failure, or when the update or upgrade goes bad and results in hours—or even more—of downtime. For all of these components, both hardware and software, customers should ask their vendors about costs and licensing, vendor policy on product instances used strictly for testing purposes, and other concerns.

Finally, it might be necessary to have some clients and a small network set up so that loads can be applied to the system or components using synthetic or dummy loads supplied by vendors or built by the customer.

Important

Pay attention to time in your testing. Too often, testing occurs in the weeks before a migration when there is plenty of time. Your staff might kick off jobs and go home. The next morning, the jobs have completed successfully and everybody is happy. However, when you go to migrate production applications in the 8-hour window of downtime you have arranged with users and find out the migration takes 12 hours, you have a problem. Every test task should measure not only the migration function, but also the time it takes to do this function. You might find that your available downtime window limits how much of the task you can accomplish in a night or a weekend.
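One low-tech way to heed this advice is to record start and end times around each migration task during testing. The following Transact-SQL sketch wraps a single step; the database name and backup path are placeholders, and the backup shown stands in for whatever step you are timing:

```sql
-- Record the start time of the test task.
DECLARE @start datetime, @finish datetime
SET @start = GETDATE()

-- The migration step being timed; database name and path are placeholders.
BACKUP DATABASE pubs TO DISK = 'E:\backups\pubs_migration.bak'

SET @finish = GETDATE()

-- Elapsed time in minutes; compare this against your downtime window.
SELECT DATEDIFF(mi, @start, @finish) AS ElapsedMinutes
```

Summing these measurements across all tasks tells you whether the full migration fits the window you negotiated with your users.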

Determining Risk

It is important to determine the risks, whether based on people, technology, or cost, against which the effort will be measured. The success of a plan depends on knowing where potential failures can occur so you can mitigate them and decide whether the risks are worth taking. The best thing you can do to determine these risks is to test the entire process. However, you should also discuss the risks with anyone who has migrated these applications in the past. They might save you a lot of trouble by helping you avoid problems they encountered.

Phase 4: Developing

The developing stage is when the plans are realized and implemented for the first time in a testing or staging environment. This requires hardware completely separate from the eventual production servers, as it is crucial to work out any potential problems before going live. Ideally, the servers used in this stage have the same capacity and horsepower as their production counterparts, but that might be unrealistic due to budgetary constraints. The bottom line is that the better your testing or staging environments mirror your production environment, the more successful you will be when the actual production migration occurs.

There are at least three steps during this phase: technology validation, proof of concept, and piloting.

Once you decide on and plan for the technology that will be used during this phase, it is time to see whether the technology performs as expected or is not appropriate at all. This is one of your last chances to modify the consolidation effort’s physical design. Once the technology has been proven, it is time to roll out a proof of concept to show that the final environment will work as expected. The proof of concept should mirror the production environment exactly, just on a smaller scale. Finally, choose one of the servers that will be consolidated and use it as a pilot. Even if the technology has been evaluated and the proof of concept was successful, you still need a proper pilot to prove that in a production capacity (unlike the proof of concept, which is on a much smaller scale) the consolidation will benefit the entire business. If the pilot is a failure or is scrapped for other reasons, this is the time to figure things out.

Follow the configuration plans and migration plans as documented. If problems arise, document them, and if there is a fix, document the fix and revise the plans accordingly. After migrating applications, data, and users to the new consolidated SQL Server, put it through its paces—use it as it would be used on a daily basis—and test the servers under load. It is crucial to ensure that the databases that are now consolidated under one SQL Server will work well together when all applications are experiencing high utilization. Otherwise, there will be performance issues in production.

If any issues arise due to coexistence after testing, document the issues, and possibly rethink the strategy for that particular application’s databases. Mitigating risk at this stage is crucial. Remember to test not only the migration, but also any administration plans, scripts, procedures, tools and so on that will be used in the eventual production environment. Just testing the migration process is not enough.

Phase 5: Deploying and Stabilizing

There are two main steps in this phase: back-end deployment and application deployment.

Back-end deployment is when the backbone of the system (hardware, operating system, networking, SQL Server, and so on) has been completely configured and tested. Application deployment is just that—configuring the databases and rolling out the applications in the new consolidated environment. At some point, you reach the go/no-go point, beyond which you cannot easily go back to the old environment. Stabilization continues through and beyond this stage.

Once you are confident that the proper hardware designs have been crafted and the migration plans are well tested, it is time to build and deploy the production consolidated SQL Servers. Even if your plans are well tested, do not perform the entire consolidation at once. Take a phased approach: deploy one consolidated server, migrated server, or upgrade; thoroughly test it; compare it with the original environment; and then finally retire the old environment. Only at this point should you consider consolidating another SQL Server, because if you do not completely verify that one effort went as expected and problems are encountered during another migration, you could obscure and compound the problems. If an effort was unsuccessful for whatever reason, update your plans accordingly. Keep the old environment available until you have completed several or all of the upgrades, just in case.

Windows Version Upgrades

No discussion on highly available upgrades can ignore the operating system. Although you may be a SQL Server DBA, you need to consider that your operating system will need to be upgraded at some point. This is one of the reasons you will need a great working relationship with your systems engineers to ensure that the proper communication flows between groups.

Should You Upgrade Your Version of Windows?

Upgrading to a new version of Windows causes heartache for many administrators because they are changing the foundation of their entire server. Windows is Windows, but as you know, there are differences between Microsoft Windows NT 4.0 Server, Microsoft Windows 2000 Server, and now Microsoft Windows Server 2003. Should you upgrade your operating system? You must answer this question, and the answer here is the same as it was in Chapter 3: It depends. If you have a current production system that is running, stable, and has SLAs with valid support contracts in place (that is, the product has not reached the end of its support life from the manufacturer), you might not need to upgrade the operating system. You might want to upgrade it, on the other hand, if your application software (such as SQL Server) has a feature or performance characteristic you need that is dependent on some feature of the new operating system or, say, if you need more scalability, which could be provided by the higher memory and processor capacity of a new operating system. The bottom line is that although you might want to upgrade, you might be able to avoid this for a period of time that you will determine.

If your operating system is near, at, or beyond its supported phase, regardless of the supportability status of your application software, you should consider upgrading your version of Windows. This is not a ploy to trick you into spending money. Microsoft would like you to have stable, available, and supported solutions that you are happy with. It wants you to have the peace of mind that comes from knowing that, should you run into any problems, you can pick up the phone and have someone say, "That product is still supported."

Like many customers, you might have a legacy system with no support. That system should be fine as long as you and your staff can handle any known issues that come up, understand that the manufacturer might not have fixes for other issues that arise, and know that any downtime caused by that lack of fixes is acceptable from a total cost of ownership (TCO) perspective. If no more fixes are available from the manufacturer, you will be able to handle only problems that fall within the realm of what you can fix yourself for an end-of-life product, so understanding and accepting the risk is crucial for systems that may be mission-critical. If that is the case, ensure that any and all relevant documentation is complete and handy.

Another major factor is the abilities built into each version of the operating system. For example, if you are using SQL Server virtual servers, upgrading from Windows NT 4.0 Enterprise Edition to Windows 2000 Advanced Server or even Windows Server 2003 would be a huge benefit because clustering is greatly improved in those versions. Remember to take into account how the technologies you are using might have been enhanced or changed (possibly impacting you in a negative way) in the upgrade to the next version.

Also remember the hardware platform. At some point, the OEMs and hardware vendors who provide the system, adapters, and other components will cease both production and support on a certain operating system or hardware platform. If you later need a replacement component such as a processor, memory, disk controller, network card, or connector, it might not be in the supply chain, and the system would then have a point of failure from which there is no easy recovery. Also, if the software that controls the hardware component is no longer supported, no fixes will be possible for a recurring problem, even if the underlying hardware still functions.

There are many factors that go into a decision to upgrade your operating system—cost, features, compatibility with your current applications, and so on. Just because an operating system is new does not mean you should wait for one service pack and then upgrade to it automatically, as has been a common practice for years. Windows is tested extremely thoroughly for reliability and stability prior to release, and it gets better with each succeeding release of the operating system. If you need the features or benefits that come with, say, Windows Server 2003, you should definitely consider it for your environment.

Performing a Windows Version Upgrade on a Server

There are really two varieties of Windows upgrades: upgrading stand-alone servers and upgrading clustered servers (that is, server clusters, not Network Load Balancing). When you are running a mission-critical application like SQL Server on your server, minimizing your downtime becomes crucial. The most obvious method of reducing downtime besides planning and testing is to perform your version change onto new hardware and not on your existing servers. Why?

First and foremost, from a contingency standpoint, if something goes wrong on your new hardware during the cutover, your old hardware configuration has not changed, giving you the perfect fallback or rollback plan. In fact, it should be exactly the same as when you stopped allowing traffic to hit it, so you should have zero loss of data or functionality. If you use the same hardware, rollback might be difficult, and if you have to do a complete reinstall, you will never be in exactly the same state you were in prior to reconfiguration. Second, you can reduce the amount of downtime if you can start building the new system while the old one is still up; after you stop traffic to the current production server, only a few minor tasks remain instead of many. In terms of keeping an upgrade highly available, whether Windows or SQL Server, performing it on new hardware is one of the best things you can do; the difference between being down 20 minutes or 20 hours is huge. Last, but not least, you get the benefits of newer, faster hardware, which should extend the amount of time the system can be kept in production. If you use your existing hardware, which might be more than a few years old, it might work, but how long will you be able to keep it in production? Remember that newer versions of software, including operating systems, require more horsepower. Hardware prices have come down over the past few years, and you can get an extremely capable server at a fairly reasonable price.

Tip

In a multiple upgrade scenario, such as upgrading from Windows NT 4.0 to Windows 2000 and SQL Server 7.0 to SQL Server 2000, you should upgrade your operating system first, and then SQL Server.

Important

Remember to test your systems thoroughly after configuration and back them up once you have established that the installation of the new operating system is working properly. This ensures you can recover to the new, known, good configuration. Do not decommission old servers until you know your new ones are working properly!

Upgrading Stand-Alone Servers

Upgrading a stand-alone server is straightforward. If you are using the same hardware, whatever functionality that server hosts will be completely unavailable during the upgrade process. This puts you at risk in the event that things do not go as planned during or after the upgrade and you need to roll back to your previous state. If you are doing this on the same hardware, at that point you will be relying on your backups. It is always best to configure a new server and then decommission the older one after you confirm that it is up and running in the way that it should be.

Note

Doing a version or SKU upgrade can present some challenges if you are a constant consumer of SQL Server or MDAC hotfixes. Be sure to reapply any hotfixes not included in the Windows service pack level on the system if you elect this procedure.

Upgrading Clustered Servers

Upgrading servers in a server cluster is not dissimilar to upgrading stand-alone servers, as each node would individually need to be upgraded. A cluster presents different challenges, however. First and foremost, is your hardware solution still on the cluster Hardware Compatibility List (HCL) for your operating system choice? It might be, but it also might not. If it is not, you will have to buy new hardware. Could you upgrade on your current hardware and have it still work? Maybe, but if problems occur, you will technically have an unsupported solution. The biggest problem would be in terms of driver compatibility for your hardware, especially for components like RAID controllers. Do not put yourself in this situation.

You also now have more than one server to deal with because it is an entire solution. How do you handle a multiple-server upgrade and keep your servers somewhat available? The best approach is to use new hardware, but if you are going to use the same hardware, the answer is actually easy: perform a rolling upgrade. A rolling upgrade is when you take the resources currently owned by one node and manually fail them over to another node while you are upgrading its operating system. This is where all of your planning comes into play. If you planned your system resource usage properly, you should have no performance impact after a failover, and there should be no need to cause another availability interruption by failing the resources back (or to yet another node) until you upgrade the other nodes in the cluster.

Warning

You cannot mix versions of server clusters in a day-to-day production environment. This means that you cannot, say, keep one node at Windows NT 4.0 Enterprise Edition and one at Windows Server 2003 while you are "checking things out." Once you make the decision to upgrade your existing server cluster, all nodes must be upgraded. During a rolling upgrade, this state does occur because not all nodes are upgraded at once. A mixed-mode cluster is fully supported in a rolling upgrade scenario but not as a permanent production platform.

More Info

For detailed information on performing a rolling upgrade to a Windows 2000 server cluster and what versions you can upgrade from and to, see http://www.microsoft.com/windows2000/techinfo/planning/incremental/rollupgr.asp for Windows 2000. For Windows Server 2003, see http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/windowsserver2003/deploy/upgrdmigrate/RollUpNT.asp.

SQL Server Version Upgrades or Migrations

Upgrading a SQL Server version is something most DBAs have encountered or will encounter at some point in their careers. Whether you are upgrading the legacy Enterprise Resource Planning (ERP) system that was on SQL Server 6.5 and is finally being retired, or migrating to a new SQL Server 2000 instance on new hardware because you outgrew capacity on your current server, you have to take into account the amount of downtime that will be incurred and what effect it will have on your end users and the SLA.

Tip

Much of the information in an earlier section, "Upgrading, Consolidating, and Migrating to SQL Server 2000," is applicable to any upgrade or migration process, including version upgrades for SQL Server. Use that in conjunction with the information in this section to help you plan your upgrades or migrations from another SQL Server version or instance to your target instance of SQL Server 2000.

Depending on which version you are starting from, you will have different options. Regardless of the version, you should have a dedicated testing environment as well as new hardware for the new SQL Server 2000 instances. When you are trying to upgrade on the same server, you increase your risk greatly for a few reasons. The most important reason is that should something go wrong, you might not only ruin the new environment, but also damage the old environment to the point that it can never be a fallback plan. Another reason is that while the upgrade is happening, you might not be able to use the server at all, whereas on different hardware, you might be able to keep things running while you are doing some, if not all, of the upgrade. From an availability perspective, that is crucial.

Another reason to keep the original hardware available is troubleshooting against the perceptions of the occasional change-fearing user. Consider a case in which you performed a flawless migration, only to have users complain about the performance of a particular task a few days, or even a week or two later. Keep in mind that this might be noise, as sometimes hearing that the system changed at all makes users think there actually will be a performance difference, causing perceived problems. Having the ability to run the query on the existing old system and prove that it is just as fast or even faster can save you much time and effort. It can also save your reputation and your performance review.

Depending on which version of SQL Server you are starting from, different native SQL-based tools are available to assist during an upgrade, consolidation, or migration effort. Other requirements also drive the decision process of what to use, such as the window of time that is open to perform the migration. Whatever tool you decide to use, realize that each has its strengths and weaknesses, and you might need to combine tools.

Warning

If you are using replication, it must be disabled or unconfigured prior to the upgrade. It would be necessary to have the replication configuration documented, as well as scripted, so that it can be set up after the upgrade. For more upgrade information with replication, see the topics "Backing Up and Restoring Replication Databases," "Scripting Replication," and "Replication and Upgrading" in SQL Server Books Online.
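Checking for and removing replication before the upgrade can be done with system stored procedures, as in the following sketch. The database name is a placeholder, and this assumes you have already scripted the existing configuration so the topology can be re-created afterward:

```sql
-- Check whether the database is enabled for publishing or subscribing.
EXEC sp_helpreplicationdboption @dbname = 'pubs'

-- After scripting the existing configuration (see "Scripting Replication"
-- in SQL Server Books Online), remove all replication objects from the
-- database prior to the upgrade.
EXEC sp_removedbreplication 'pubs'
```

After the upgrade completes, run your saved replication scripts to reconfigure publishing and subscriptions.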

Caution

Do not assume your application running on a previous version of SQL Server will work or perform better than it did before the upgrade to SQL Server 2000. First, if it is from a third-party manufacturer, make sure the software is supported on SQL Server 2000. Second, make sure that syntax that is used by any application is still valid and that some variables or things like column names that you are using are not now reserved keywords in SQL Server 2000 (see "Reserved Keywords" in SQL Server Books Online under the topic Transact-SQL Reference) or are not being deprecated. The same goes for any applications, as Open Database Connectivity (ODBC) keywords are also listed.
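If an existing object or column name has become a reserved keyword, you can keep the application working by delimiting the identifier. A small sketch, using hypothetical table and column names:

```sql
-- FUNCTION is a reserved keyword in SQL Server 2000, so a legacy column
-- with that name must now be delimited with brackets (or with double
-- quotation marks when SET QUOTED_IDENTIFIER ON is in effect).
SELECT [function], title
FROM dbo.employees
WHERE [function] = 'DBA'
```

Delimiting is a stopgap; renaming the offending identifiers during the migration is the cleaner long-term fix.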

More Info

For more information on upgrading your SQL Server version, some key topics in SQL Server Books Online you can reference (there are more that are not listed here) are "Upgrading an Existing Installation of SQL Server," "Upgrading from SQL Server 7.0 to SQL Server 2000," "Preparing to Upgrade from SQL Server 6.5," and "How to Upgrade from SQL Server 6.5."

Tools for Upgrading from SQL Server 6.5

The database formats of SQL Server 6.5 and SQL Server 2000 are incompatible, so it is not possible to use the backup and restore process to create the database on the target SQL Server 2000 instance, an approach that would otherwise jump-start the process. You also cannot apply transaction logs of a SQL Server 6.5 database to a SQL Server 2000 database. There are a few options to consider when planning your upgrade or migration to SQL Server 2000:

  • BCP/BULK INSERT. One tried-and-true method of migrating data from one platform to another is the use of flat files. BCP has been in SQL Server since version 4.21a, and now there is not only the command-line version but also the Transact-SQL command BULK INSERT. In some ways, this is easier to plan than using the Upgrade Wizard, and in other ways it is more difficult. For example, you now need to worry about creating all your databases in SQL Server 2000 with the proper size and devising a process to migrate the users. BCP takes care of only the data migration. Indexes, views, and other elements should be recreated after the bulk insert, because having indexes in place while inserting data can slow the process down.

  • SQL Server Upgrade Wizard. SQL Server 2000 has a built-in wizard to assist you with your migration from SQL Server 6.5. It takes into account all aspects of your SQL Server 6.5 configuration, including users. However, the Upgrade Wizard is not appropriate for all migrations to SQL Server 2000.

More Info

For more information and details about the specifics of using the Upgrade Wizard, see the topic "Upgrading Databases from SQL Server 6.5 (Upgrade Wizard)" in SQL Server Books Online.
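The BCP/BULK INSERT approach described above can be sketched as follows. The server names, login, and file paths are placeholders, and the target database and table are assumed to have been created on the SQL Server 2000 instance beforehand:

```sql
-- Export from the SQL Server 6.5 source using character-mode bcp
-- (run from the command prompt; server, login, and path are placeholders):
--   bcp pubs..authors out C:\migrate\authors.dat -c -SOLDSERVER -Usa -P
--
-- Then load the flat file into the pre-created SQL Server 2000 database.
-- These terminators match character-mode bcp defaults.
BULK INSERT pubs.dbo.authors
FROM 'C:\migrate\authors.dat'
WITH (DATAFILETYPE = 'char',
      FIELDTERMINATOR = '\t',
      ROWTERMINATOR = '\n')
-- Create indexes, views, and users after the load completes.
```

Remember that this moves data only; logins, permissions, and other objects need their own migration process.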

Tools for Upgrading from SQL Server 7.0

If you are migrating from SQL Server 7.0 to SQL Server 2000, you have some different options than you would if you started with SQL Server 6.5.

  • Log shipping. It is possible to send and then apply transaction logs from a SQL Server 7.0 Service Pack 2 database to a SQL Server 2000 database. This is one of the better migration options, if not the best, as it not only provides a fallback plan by leaving the source server in place, but also the amount of downtime incurred will most likely be minimal.

  • Backup/restore. Backup and restore needs no long introduction. It is possible to take a SQL Server 7.0 backup and restore it under SQL Server 2000; the restore process upgrades the database and performs other functions, such as rebuilding statistics.

  • Copy Database Wizard. This wizard uses the attach and detach functionality of SQL Server to do the work. The database files are detached, copied, and then attached to the target server. When going from SQL Server 7.0 to SQL Server 2000, the attach process upgrades the database to SQL Server 2000, but statistics are not automatically rebuilt. That might be a consideration when deciding between backup/restore and the Copy Database Wizard. Also, consider whether the collations are different between the SQL Servers, especially if you have an international application. For example, if you use char/varchar data types, you will not be able to store and retrieve any data except what can be represented by the code page on the destination SQL Server. Unicode would be a workaround.

    More Info

    For more information on the differences between attaching and detaching versus backup and restore, see the section "Attaching and Detaching Databases Versus Backup and Restore" later in this chapter.

  • BCP/BULK INSERT. BCP or BULK INSERT work the same as they do with a SQL Server 6.5 or SQL Server 2000 database; you take flat files and use them to import data into a database. You would need to find another method to import users, objects, and other items.
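The backup/restore and attach/detach options above can be sketched in Transact-SQL as follows. All names and paths are placeholders; the logical file names shown match the pubs sample database. The RESTORE runs on the SQL Server 2000 target, which upgrades the database as part of the restore (log shipping follows the same pattern, repeatedly applying RESTORE LOG ... WITH NORECOVERY until the final cutover):

```sql
-- On the SQL Server 7.0 source:
BACKUP DATABASE pubs TO DISK = 'E:\migrate\pubs_70.bak'

-- On the SQL Server 2000 target; WITH MOVE relocates the files:
RESTORE DATABASE pubs
FROM DISK = 'E:\migrate\pubs_70.bak'
WITH MOVE 'pubs' TO 'F:\data\pubs.mdf',
     MOVE 'pubs_log' TO 'G:\log\pubs_log.ldf'

-- Alternatively, detach on the source, copy the files, and attach on
-- the target (statistics are not rebuilt automatically this way):
EXEC sp_detach_db 'pubs'
-- ...copy pubs.mdf and pubs_log.ldf to the target server...
EXEC sp_attach_db @dbname = 'pubs',
    @filename1 = 'F:\data\pubs.mdf',
    @filename2 = 'G:\log\pubs_log.ldf'

-- Rebuild statistics after attaching.
EXEC sp_updatestats
```

Either way, keep the source server untouched until the target has been fully verified, so you retain a fallback.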

Upgrading Between Different Versions of SQL Server 2000

The same options for SQL Server 7.0 are valid for SQL Server 2000 to SQL Server 2000 migrations, and because all databases are at the same version level (sans service pack differences), there is less work that needs to be done because the databases do not need to be upgraded. Concerns about collations and service packs are still valid, but you do not need to worry as much about things like behavior differences and syntax changes.
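You can quickly compare collations and service pack levels between the source and target instances with the SERVERPROPERTY function, which is available in SQL Server 2000:

```sql
-- Run on both the source and target instances and compare the results.
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel')   AS ServicePackLevel,
       SERVERPROPERTY('Collation')      AS ServerCollation
```

Resolving any mismatches before the migration is far easier than diagnosing collation conflicts afterward.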

Upgrading from Previous Versions of SQL Server Clustering

If you are looking to upgrade from previous versions of SQL Server clustering to SQL Server 2000 failover clustering, or even from a stand-alone SQL Server to a failover cluster, this section will help you. The same rules for a standard upgrade apply (such as having to unconfigure replication), but there are other considerations to take into account:

  • You absolutely cannot mix different clustered versions of SQL Server on the same server cluster. If you are currently running SQL Server 6.5 or SQL Server 7.0 clustering, those absolutely cannot coexist with a SQL Server 2000 failover cluster unless they are physically installed on separate server clusters. This means you would potentially need new hardware for your upgrade to SQL Server 2000.

  • If you are upgrading your operating system as well, see the section "Windows Version Upgrades" earlier in this chapter for information about Windows. SQL Server’s health and stability in a clustered environment is directly dependent on a properly configured operating system.

  • With SQL Server 6.5 and SQL Server 7.0, the binaries were put on the shared drive and possibly the quorum. Now all binaries will be on the local system drive in the same place on each node. Are the nodes configured the same?

  • Where are the database files physically located on your shared disk array? Does your disk configuration meet your current and future needs? This is probably the most important factor in any SQL Server cluster upgrade, whether it is SQL Server 6.5 or SQL Server 7.0 to SQL Server 2000, or SQL Server 2000 to SQL Server 2000. In SQL Server 6.5 and SQL Server 7.0, files could be placed on the quorum drive with no warnings. In SQL Server 2000, this is not a recommended configuration.

    As you learned, the quorum drive should be left alone and used only by the server cluster. Nothing else should be placed on it if at all possible. Another potential problem is that even if the data and log files are not on the quorum but on one LUN/drive letter and you are going to be using multiple instances, you now need dedicated drive letters to support each instance. Multiple SQL Server 2000 instances cannot share a drive letter. You need to move the data and log files for the databases that will live on other instances off that existing drive and reconfigure your disk subsystem by adding drives. Even with a modern storage area network (SAN), for which adding disks and space is relatively painless, adding disks to an existing server cluster is a regimented process detailed elsewhere in this book. Remember to try to add all the disk resources to be used on the server cluster before you install clustering to avoid this extra work. Finally, if you are on an old SCSI-attached drive array, will it meet your growth needs? Chances are it will not in the long term, so you might even need a new disk subsystem.

More Info

For the exact technical steps for upgrading and more information, see the topics "Upgrading to a SQL Server 2000 Failover Cluster," "How to Upgrade from a Default Instance to a Default Clustered Instance of SQL Server 2000 (Setup)," and "How to Upgrade from a Local Default Instance to a Clustered, Named Instance of SQL Server 2000 (Setup)" in SQL Server Books Online.

Attaching and Detaching Databases Versus Backup and Restore

One feature introduced in SQL Server 7.0 was the ability to attach and detach a database. This is a very useful function in many scenarios, including upgrades and disaster recovery, but what exactly is the difference between detaching and attaching a database and using backup and restore? Both seemingly do the same thing, but in reality they are different. Detaching and then attaching a database, notwithstanding the physical time it takes to copy the files, can be a relatively quick process compared to a straightforward backup and restore. That is one of the reasons many people like to use it as an option for their databases.

If you detach and then attach a database from the same SQL Server version (for example SQL Server 2000 to SQL Server 2000), nothing should really change. The on-page structure of the database will be intact after the move. You might notice performance issues if the server is not exactly the same (capacity, memory, RAID type on the disks). Why? Remember that your statistics, as well as execution plans, are based on a specific configuration. These are not changed or updated in the attach process to match the new server, so you need to ensure that after the database is attached, it performs and behaves as you expect it to.

If you use the detach and attach process, or the Copy Database Wizard that is based on that functionality, to upgrade from SQL Server 7.0 to SQL Server 2000, that is a fully supported method of migration. During the attach process, the database’s metadata is upgraded to SQL Server 2000. However, the earlier caveat becomes much more important: your statistics are not automatically rebuilt. Although SQL Server 2000 is built on the foundation started by SQL Server 7.0, the engine did change a bit in SQL Server 2000, so statistics generated under SQL Server 7.0 are invalid, meaning that queries cannot take full advantage of the new optimizer. You should thus ensure that your execution plans and statistics are rebuilt based on the hardware you will now be running.

Take into account that your database will be unavailable during the entire detach and attach process. You are literally detaching the files from SQL Server; you can then copy them and reattach the database once the copy is complete. Once you have your copy (or if you plan to use the original files without making a backup), moving it is a function of time, bandwidth, and disk speed. The time saved by the attach process can therefore be negated by the copy time for a very large database over a WAN.
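
The detach, copy, and attach sequence can be sketched in Transact-SQL. The database name and file paths here are hypothetical; substitute your own. Note that sp_updatestats is included at the end to address the statistics caveat discussed earlier.

```sql
-- Detach the database (no users can be connected at this point).
EXEC sp_detach_db @dbname = 'Sales'

-- Copy Sales_Data.mdf and Sales_Log.ldf to the destination
-- outside of SQL Server (for example, with xcopy or robocopy).

-- On the destination server, attach using the copied files.
EXEC sp_attach_db @dbname    = 'Sales',
                  @filename1 = 'D:\SQLData\Sales_Data.mdf',
                  @filename2 = 'E:\SQLLogs\Sales_Log.ldf'

-- The attach process does not rebuild statistics, so refresh them
-- to reflect the new hardware.
USE Sales
EXEC sp_updatestats
```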

Tip

One of the keys to good performance with attaching and detaching (and subsequently the Copy Database Wizard) is to copy the files to a different set of disks. When copying from the same physical disk or stripe of disks to the same disk or stripe, you will spend most of your time seeking back and forth for the write location, and therefore your I/O performance will suffer a great deal.

Tip

Remember that you have the ability to use compression with a third-party tool to reduce the size of your files before you copy them over the network. You are in effect trading processor time on both ends for reduced network time in the middle. You also might want to try alternative copy programs such as xcopy or robocopy from the Windows 2000 Resource Kit if you have particularly large files to move around. Even if they do not move the data faster for your particular case, the ability to restart at the point at which you left off in the case of a disconnect or timeout can be invaluable.

Backup and restore was discussed in detail in Chapter 10. However, when comparing backup and restore to detaching and attaching a database, why should you consider it? First and foremost, it is tried and true (as is detach and attach at this point). Like the attach process, a restore upgrades a SQL Server 7.0 database when it is restored on an instance of SQL Server 2000. The main difference that you are likely to be concerned with is that statistics are rebuilt automatically during the restore process. All other things being equal, you have two excellent choices for ensuring your database is upgraded and restored on your SQL Server 2000 instance.
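
As a sketch, the backup and restore approach looks like the following. The database name, logical file names, and paths are hypothetical; you can confirm the logical names in a backup with RESTORE FILELISTONLY.

```sql
-- On the source server: take a full backup.
BACKUP DATABASE Sales
    TO DISK = 'D:\Backups\Sales.bak'

-- On the SQL Server 2000 instance: confirm the logical file names...
RESTORE FILELISTONLY FROM DISK = 'D:\Backups\Sales.bak'

-- ...then restore, relocating the files as needed. Restoring a
-- SQL Server 7.0 backup here upgrades the database, and statistics
-- are rebuilt automatically as part of the restore.
RESTORE DATABASE Sales
    FROM DISK = 'D:\Backups\Sales.bak'
    WITH MOVE 'Sales_Data' TO 'D:\SQLData\Sales_Data.mdf',
         MOVE 'Sales_Log'  TO 'E:\SQLLogs\Sales_Log.ldf'
```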

Note

Keep in mind that server capacity and configuration on a different server affect your database, whether as the result of an attach or a restore. For example, SQL Server will keep the on-page structures, but if it is on a different disk subsystem (say, RAID 5 versus a striped mirror), your data access might have better (or possibly worse) performance.

Service Packs and Hotfixes

All systems need normal maintenance, which greatly increases your long-term availability. This means that from time to time you need to apply patches and upgrades. Many administrators consider service packs to be normal maintenance, but in reality, they are a form of upgrade. A service pack for any Microsoft product is a collection of patches, bug fixes, and so on, collectively tested and released as one distribution. Some service packs might enhance functionality; others might not. Hotfixes are different from a SQL Server 2000 service pack or a service pack for Windows 2000 or Windows Server 2003. A hotfix is exactly what it sounds like: when an issue that needs to be addressed is detected, a patch is made available for you to apply between service pack releases.

Critical hotfixes can be defined as those hotfixes that address the following problems:

  • Data is at risk of being corrupted.

  • Data is at risk of being lost.

  • Security is at risk of being compromised on the system.

  • System crashes (blue screens) that are repeatable (same failure) and occur regularly.

  • System hangs that are frequent (same criteria as for blue screens) and, if the Dump Switch, Service Processor, or Secure Admin Console is used to create a crash, the resulting dump file indicates a common cause (which would qualify as repeatable).

  • System performance is at an unacceptable level, such as data transfers that take hours instead of minutes.

  • The customer might have a system usage or configuration that is very similar to customers who have actually experienced one of these problems.

Keep these general rules in mind when applying any type of service pack or hotfix:

  • Test, test some more, and then test again. You want to make absolutely sure that when you roll this out in production, you know how it is done and that you will have no problems in terms of compatibility or availability as a result. Create a standard set of tests that are run for base operating system functionality and for application functionality. Record these results for comparison after the upgrade is performed.

  • Make backups of everything prior to applying the service pack or hotfix. Remember, system files such as DLLs and binaries are being changed. Should you need to go back to the state of your system if the installation does not work and cannot be backed out cleanly, having backups will save you. For a SQL Server service pack, back up all user databases and system databases before and after the upgrade.

  • Make sure your system has enough disk space to meet the requirements of the upgrade. This is sometimes overlooked. You might have to unlock your resources (for example, a read-only database) to run the upgrade scripts or add new procedures.

  • Again, as mentioned earlier, read all documentation that comes with the upgrade carefully. Do not apply and ask questions later; that could prove costly in both the short term and the long term.

  • After the application of the upgrade, test everything thoroughly alone as well as under load. Use the same standard set of tests that are run during the testing phase for base operating system functionality and application functionality so that you can do a before–after comparison. This will, more often than not, tell you if the upgrade was a success or a failure.

  • Once you are sure everything is okay, make full backups of your operating system and your databases so that you have a snapshot of a known, good post-upgrade configuration. Do not wait, because if something goes wrong, your backups might be older, and you will need to reapply any upgrades you just performed.

  • In terms of all versions of Windows Datacenter Server, these systems are often strictly managed from the standpoint of installing updates, adding enhancements, and making other changes. There are a number of reasons for this, such as the customer having an SLA with another party (for example, an OEM). For this reason and others, it is necessary to have retest requirements for systems and drivers so that the risk of change and possible reliability decrease is mitigated.

    Datacenter Server customers occasionally require hotfixes. However, these should be infrequent and uncommon occurrences. You should apply hotfixes only when customers are actually experiencing problems, with some critical exceptions, as explained in this section.

Important

Hotfixes should not be applied in a preventive manner, except when it is clear you need them based on known issues. In all other cases, if a Datacenter Server system is not exhibiting signs of a failure that is resolved by a hotfix, OEMs and independent software vendors should not recommend that the customer install the hotfix. Remember that this applies to the operating system layer; any support and hotfixes for applications installed on top of Windows Datacenter editions should be cleared for installation to ensure that you will not invalidate your OEM support agreements.

Emergency Hotfixes and Testing Requirements

In some rare cases you might need a fix immediately, even without the delay incurred from update testing. In general, it can be better for your availability to "work around" the problem instead of installing an untested kernel component. Note that because the provided file might be unsigned, the customer is notified in a pop-up message of that fact when the kernel component is installed.

In these cases, even if a hotfix is immediately required and testing cannot be accomplished before delivery to the customer, the responsible party should accomplish the testing as soon as possible to ensure the stability of the system. The driver should be submitted to Windows Update, if appropriate, only after this testing has occurred (see Figure 13-1).

Figure 13-1. Automatic Updates screen in Windows.

Tip

If you are using Windows Update for operating system hotfixes and critical updates, you might want to disable automatic updates on a server that you want to be highly available so you can control what is installed. First and foremost, the server might or might not be directly connected to the Internet. Second, if you choose to automatically download and apply fixes that you have not tested, you might unknowingly affect your production server. If you do enable automatic updates, the best thing to do would be to have Windows Update notify you before downloading and installing updates.

More Info

For more information on putting a plan together for your production changes, see Chapter 1 and Chapter 2.

Applying a Windows Service Pack

Applying a service pack to your operating system will affect the availability of your SQL Server in one way or another, whether it is clustered or not. A Windows service pack is applied per server, or more specifically, per installation of Windows (if you have a multiboot system). On a stand-alone system, the process is straightforward in that you install the service pack on that server, and the server’s functionality is unavailable during the whole process. Clustered systems are similar, yet different.

Windows Service Packs and Server Clusters

One significant difference between a stand-alone server and a server cluster in terms of an upgrade is that your environment can be a bit more available (depending on how you have things configured). Because a Windows service pack is applied per installation of Windows, in an N-node cluster you will hopefully have N – 1 nodes still completely up and available (assuming no other problems). Because of this, you use a process similar to a rolling upgrade of the operating system. First, notify users that the clustered process (such as SQL Server) will be unavailable at specific times. You can then manually fail the resources over to another node and start servicing requests again while you upgrade node 1. However, here is where your planning comes into play. If you planned your system resource usage properly, you should have no performance impact after a failover, and there should be no need to cause another availability interruption by failing the resources back (or to yet another node) until you are ready to upgrade the other nodes in the cluster.

Note

At the time of the writing of this book, no information about Windows Server 2003 Service Packs was available, so all information is based on Windows 2000.

Applying a SQL Server 2000 Service Pack

Before applying a SQL Server 2000 service pack, you need to understand how it works for both clustered and nonclustered environments. As you know, a single server or a server cluster running SQL Server 2000 supports up to 16 instances per operating system installation. Each installation has some dedicated binaries as well as a set of shared binaries (including MDAC). The binaries for each instance are usually found on the drive you selected during installation under Program Files\Microsoft SQL Server. Under that directory, you will find an 80 directory containing any shared binaries for all instances that are not installed under your Windows directories. You will find the dedicated binaries for a default instance in the Mssql directory and the dedicated binaries for a named instance in the Mssql$instance_name directory.

What does all of this background mean to you? Should you have multiple instances of SQL Server 2000, Microsoft fully supports a mixed service pack environment. However, consider the ramifications of such an environment: whenever you apply a service pack, you upgrade any shared binaries. For example, if you have two instances of SQL Server both at Service Pack 2 and then upgrade one, you now have one instance at Service Pack 3 and one at Service Pack 2, with shared binaries at the Service Pack 3 level. Going back to the idea that everything affects everything, although this is fully supported from a Microsoft standpoint, make sure that it is supported by any third-party software vendors you are using. It also complicates any plan to roll back an environment to a previous version: if your installation of a service pack on one instance fails and you reinstall, for example, the RTM version of SQL Server 2000, you will have potentially downgraded your shared components until you upgrade back to the right patch or service pack level.
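
One way to keep track of a mixed service pack environment is to record the build and level of each instance. A minimal sketch, run against each instance in turn:

```sql
-- Record each instance's name, build number, and service pack level.
SELECT SERVERPROPERTY('ServerName')     AS instance_name,
       SERVERPROPERTY('ProductVersion') AS build,     -- for example, 8.00.760
       SERVERPROPERTY('ProductLevel')   AS sp_level   -- for example, SP3
```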

A SQL Server 2000 service pack is applied per instance, so if you have five instances, you will have to run the installation process five times; there is no way to install them all at once in one pass. You generally have to reboot only after the first one, because files locked by the operating system require a reboot to upgrade; on subsequent installations, files that were not locked before will probably not be locked now, or they will already have been upgraded and therefore be skipped by the installation. On nonclustered instances, you can script the installation of a SQL Server 2000 service pack.

When a SQL Server 2000 service pack is installed, a log file is created under your Windows installation directory, such as C:\Windows. If there has been more than one execution of a SQL Server 2000 service pack install, the log files are numbered sequentially, such as Sqlsp1.log, Sqlsp2.log, and so on.

Note

There are separate service packs for SQL Server 2000 (including Standard Edition, Enterprise Edition, and Developer Edition), SQL Server 2000 Desktop Engine (MSDE 2000), and SQL Server 2000 Analysis Services. You might need to get different versions of the service pack and apply each of them separately depending on what is installed on your server.

Although there are no 64-bit SQL Server 2000 service packs as of the publication of this edition of the book, the 32-bit and 64-bit service packs should behave the same. If the behavior winds up being different for some reason under the 64-bit version, it will be documented in the information that comes with the service pack and in Knowledge Base articles.

Important

You must install SQL Server 2000 Service Pack 3 (or later) for SQL Server 2000 to work with Windows Server 2003.

Installing a SQL Server 2000 service pack is a permanent option for a SQL Server instance. There is no way to revert to the previous version of SQL Server without a complete reinstall from an installation point or CD-ROM, which obviously causes an availability problem. However, keep in mind that rolling back could destroy other things like MDAC versions that were upgraded in the interim and needed for an application, so you must keep track of everything done to the server so that you can put the server back in the state you need it to be in for your applications.

Failover Clustering and SQL Server 2000 Service Packs

Applying SQL Server 2000 service packs is not the same as applying a service pack to a clustered server running SQL Server 7.0. The older process for SQL Server 7.0 involved using a wizard to uncluster SQL Server 7.0. You would then apply the service pack and recluster. It was an awkward procedure at best, and it caused problems for some. With a SQL Server 2000 service pack, as with the main installation process, the service pack is cluster-aware. It will detect that it is going to be applied to a virtual server.

However, there is no concept of a rolling upgrade as there is with the operating system, so SQL Server is unavailable during the entire operation. During the installation process, the installer upgrades not only the databases but also the binaries on each node defined for that specific virtual server. Therefore, you run the service pack install once and have it applied to all nodes. You do not run the service pack install on each node; however, there is one (and only one) exception to that rule: should you encounter a catastrophic failure on one node and have to rebuild it (see Chapter 5 and Chapter 6 for details), you do not have to rerun Setup to patch all nodes. You can just patch the newly repaired node. This is possible because when the node is added back into the SQL virtual server definition, a registry key of need_sp_key is added to the newly rebuilt node, indicating it needs to have a service pack applied to it. You can then run the SQL Server 2000 service pack installation on that node, and Setup checks for need_sp_key and updates the binaries on that node only, because the databases were already updated. This also allows you to fully service requests from applications or clients because SQL Server 2000 is not put into single-user mode. On the node you are updating, you will see some entries in the Sqlsp.log file that look similar to the following:

[args]
NumRemoteServers=1
Server.1=<newnodename>
[Server.1]
NumRemoteServices=1
...

These entries confirm that the unattended setup processes are running on the one node only. From a GUI standpoint, you will see the standard "Setup Is Performing Required Operations On ..." messages, but behind the scenes, it is doing the right thing.

Caution

If you rerun the SQL Server 2000 service pack installation from the node currently owning the SQL Server resources and not the failed ones added back into the SQL virtual server definition, you will then be reapplying the service pack to all nodes defined for the SQL virtual server. This should be seen as a potential risk, as you have one or more perfectly functioning nodes that are now being affected. Make sure before you start the process that you are on the right node!

Warning

Due to file replacements that need to be registered, the first installation of a SQL Server 2000 service pack usually requires a reboot of all nodes. Once you do this, unless there is some other requirement for a reboot in the service pack, other instances on the cluster that have the service pack applied should not need a reboot because the shared files are already installed and registered. You can check the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations registry value for a list. By stopping whatever services were using these files before running the upgrade, you might avoid the reboot, but perform this procedure with caution. It might just be best to do the reboot. Again, test before doing it in your production environment.

Log Shipping and SQL Server 2000 Service Packs

If you employ log shipping, you must also understand the impact of a SQL Server 2000 service pack. First and foremost, a service pack upgrade does not break log shipping. Setup automatically detects user databases (as well as filegroups) that cannot be written to and skips them. Because a log shipped database on a secondary is in either NORECOVERY or STANDBY, it cannot be written to or updated. These skipped databases are documented in the Sqlsp.log file mentioned earlier, and the following message is also displayed during the process:

Setup has detected one or more databases and filegroups which are not writable.
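
Before running the service pack install on a log shipping pair, you can see in advance which databases Setup will skip. This sketch flags any database that is not both online and writable:

```sql
-- Databases the service pack Setup will skip because they cannot
-- be written to (log shipping secondaries in NORECOVERY or STANDBY,
-- read-only databases, and so on).
SELECT name,
       DATABASEPROPERTYEX(name, 'Updateability') AS updateability,
       DATABASEPROPERTYEX(name, 'Status')        AS status
FROM   master..sysdatabases
WHERE  DATABASEPROPERTYEX(name, 'Updateability') <> 'READ_WRITE'
    OR DATABASEPROPERTYEX(name, 'Status')        <> 'ONLINE'
```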

It is fully supported to log ship from any version of SQL Server (RTM, Service Pack 1, Service Pack 2, and Service Pack 3 as of the writing of this book) to another because there are no metadata changes that would affect log shipping. If there were to be any metadata changes to the user databases themselves in future service packs, this might require you to have all databases at the same service pack level. Read the documentation that ships with the service pack to see if there are any user database metadata changes that are included.

You should, however, carefully consider the implications of a service pack upgrade to an instance of SQL Server if it will leave your primary at one level and your secondary at another.

Important

If you recover a database from secondary status to be the active database servicing requests, you do not need to reapply the SQL Server 2000 service pack to it unless there are metadata changes that would affect the database in question (as noted earlier). This would be clearly documented in whatever comes with the update. The one exception to this rule is if you are using replication and use the keep_replication flag when bringing the database online. Once the database is fully recovered, before opening it up to users and applications, run the stored procedure sp_vupgrade_replication to upgrade the replication metadata. If you do not do this, the replication metadata for that database will be out of sync. If running sp_vupgrade_replication is not necessary, it will be noted in the accompanying documentation.
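
The recovery sequence described here might look like the following sketch; the database name is hypothetical:

```sql
-- Bring the secondary online, keeping its replication settings.
RESTORE DATABASE Sales WITH RECOVERY, KEEP_REPLICATION

-- Before opening the database to users, upgrade the replication
-- metadata to match the newly applied service pack.
USE master
EXEC sp_vupgrade_replication
```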

Warning

Do not apply any SQL Server 2000 service pack prior to Service Pack 2 on a log shipped server. Service Pack 1 would not apply successfully on a server that had databases that could not be written to.

Replication and SQL Server 2000 Service Packs

Replication and SQL Server 2000 service packs are a more complicated combination than log shipping or failover clustering. The order in which you upgrade the instances participating in a replication chain absolutely matters, specifically:

  • Distributor has to be upgraded before Publisher. If Publisher and Distributor are on the same instance of SQL Server, this is not a concern, as they will be upgraded together.

  • If you are using transactional replication with some read-only Subscribers (that is, they do not participate in replication other than receiving data), the Subscriber can be upgraded before or after you do Publisher and Distributor, so order does not matter for the Subscriber in this case.

  • If you are using merge or transactional replication with updating Subscribers (that is, they replicate data elsewhere), you must update the Subscriber after you update Distributor and then Publisher. The order in this case would be Distributor, Publisher, and then all Subscribers.

    Tip

    In this case, if you have a database that, for example, is a read-only subscriber for one publication but the source for another chain of replication servers, you must know your hierarchy so that you upgrade everything in the proper order.

  • If you are using a remote distributor with merge replication, you need to generate a new snapshot after applying SQL Server 2000 Service Pack 3. This might affect the availability of some servers, so you must plan for it.

  • As noted in many places, back up all databases participating in replication after the application of the service pack. This will ensure that you will not have to reapply the service pack if you need to restore the database.

Important

Heed the information in the previous section on log shipping about databases that are recovered and running sp_vupgrade_replication. This also applies to purely read-only databases, which are covered in the next section.

Applying a SQL Server Service Pack to a Nonwritable Database or Filegroup

There are cases where you literally have a read-only database (such as in replication) or one that is used for reporting but still needs to be upgraded to the latest service pack (that is, it is a Subscriber). Because read-only databases are skipped by the service pack install for SQL Server, they must be made writable to be upgraded. To accomplish this task, follow these steps prior to applying the service pack, or after, depending on your needs. It is better to do it beforehand so that you have only one availability outage and not multiple outages:

Nonwritable Database

  1. Alter the state of the database that needs to be modified to make it writable. You can either clear the Read-Only option in the Options tab of the database Properties dialog box or run the following Transact-SQL statement:

    ALTER DATABASE database_name SET READ_WRITE
  2. Repeat step 1 for each read-only database on that instance that should be upgraded.

  3. Apply (or reapply) the service pack.

  4. Reselect the Read-Only option, or run the following Transact-SQL statement for each database to make it read-only again:

    ALTER DATABASE database_name SET READ_ONLY
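
Rather than toggling each database by hand, you can generate the statements for steps 1 and 2. A sketch:

```sql
-- Generate one ALTER DATABASE statement per read-only database.
-- Run the output, apply the service pack, then generate and run
-- the matching SET READ_ONLY statements to reverse the change.
SELECT 'ALTER DATABASE [' + name + '] SET READ_WRITE'
FROM   master..sysdatabases
WHERE  DATABASEPROPERTYEX(name, 'Updateability') = 'READ_ONLY'
```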

Nonwritable Filegroup

  1. Alter the state of the filegroup that needs to be modified to make it writable. Run the following Transact-SQL statement:

    ALTER DATABASE database_name MODIFY FILEGROUP filegroup_name READWRITE
  2. Repeat Step 1 for each read-only filegroup on that instance that should be upgraded.

  3. Apply (or reapply) the service pack.

  4. Run the following Transact-SQL statement for each database to make it read-only again:

    ALTER DATABASE database_name MODIFY FILEGROUP filegroup_name READONLY

Hotfixes

Hotfixes are ultimately rolled, in most cases, into the next major service pack. Should you apply a hotfix when it is released? Should you avoid it? The answer is not clear cut. Not all hotfixes apply to you, so you have to read any relevant Knowledge Base articles. Like anything else, assess the risk and reward of applying the hotfix. Be aware that because hotfixes are essentially one-off fixes released quickly due to their critical nature, they might not undergo the rigorous months of regression testing that, say, a service pack is subjected to. They fix a problem, but in doing so, they might affect other applications or processes running on the server.

In addition to individual hotfixes, often bundles or rollups of hotfixes are released from time to time. For example, a common rollup would be security fixes that are wrapped together. These packages sometimes receive additional testing before release to help address some of the more common issues at that moment. Security is very important, of course, and putting off the testing and deployment of these patches could cost you the server, its data, and—in a worst-case scenario—potentially even your job. Finding the correct middle ground between testing and deployment is the only safe way to proceed.

Note

As of the release of this book, Microsoft policy is to support the creation of hotfixes for the previous service pack for 60 days after the current service pack ships. For example, after SQL Server 2000 Service Pack 3 is released, hotfixes of any nature (security, bug, and so on) can still be created specifically for SQL Server 2000 Service Pack 2 for up to 60 days.

Note

Keep in mind that the next service pack will also contain other enhancements or fixes that might not have been released as a patch for the service pack version you have installed. Once the new service pack is released, you should test your applications against it and upgrade as soon as possible. If you cannot upgrade for other reasons, such as those that are business-related (for example, a third-party application must be certified with it or you will invalidate your support contract), evaluate these situations and take the appropriate actions to ensure that you can eventually upgrade to the latest service pack.

You will be supported on whatever version of a service pack you are using as long as the platform itself is in active support; you just might not be able to get new patches for it. If this policy changes for future service packs, such as when SQL Server 2000 Service Pack 4 is released, please check http://support.microsoft.com or consult a Microsoft support professional about the policy for hotfixes and service packs.

Summary

Achieving a highly available upgrade is a challenge for even the best IT shops. Everything about upgrades is ultimately a function of time. Upgrades, highly available or not, require a great deal of planning and testing before you actually implement them in your production environment. Remember to take into account any SLAs when you are planning your upgrades, as there is always a chance that you might not achieve them should the upgrade require you to be down longer than the time allotted in the SLA itself.

Doing your homework should pay off. If no one notices any functionality differences after the upgrade and downtime is minimal, this provides the best situation possible for everyone involved. If you do not do your homework, it could mean a great deal of both short-term and long-term pain. Upgrades of any sort should never be taken lightly. Mitigate your risks and, whenever possible, upgrade to completely new hardware so that your old environments provide the perfect fallback plan.
