Chapter 2. Data Disasters

 

‘There’s no disaster that can’t become a blessing, and no blessing that can’t become a disaster.’

 
 --Richard Bach

This chapter deals with the various ways in which your organization’s data can be exposed to risk, and possible prevention methods.

What is data?

A general dictionary defines ‘data’ as ‘factual information, especially information organized for analysis or used to reason or make decisions’. Organizations hold many kinds of data, in several formats and of varying importance. For example, important finance documents, Excel spreadsheets, Word documents, databases, e-mails, employee information, customer details, etc, can all be classified as data. Different organizations attach different importance to different kinds of data. For example, a credit card issuer will consider its card numbers, cardholder details, etc, as extremely important. Another organization may view its technical information – software code and the tools it develops – as its most important data. Whatever form it takes, data is of paramount importance to any organization and must be protected carefully.

What is meant by risk to data?

A risk to data is a potential for loss or corruption. Every organization depends on various kinds of current and historical data for its business. For example, a simple piece of data can be the details of all customers entered on an Excel spreadsheet and stored on a computer hard disk. A complicated piece of data can be the complete details of millions of credit card transactions held in a huge database inside a major organization like VISA or MasterCard. Depending on the user, department or organization, a loss or corruption of data can result in various business problems, so protection and safe retrieval of data are of the utmost importance to any organization.

Why and how do companies lose data?

Computer data is a virtual, not a physical, item. It cannot be protected by security guards, guard dogs or insurance. Companies often lose valuable data, for a variety of reasons, eg:

  • Many organizations do not invest enough money and resources in taking regular backups or installing proper anti-virus measures. Even today, many organizations and business owners believe that computers can be used and discarded just like any other electrical appliance: if an old computer fails, they can simply throw it out and buy a later model at a lower price. They fail to understand that ordinary appliances – refrigerators and fans, for example – do not hold data, but computers do, and if a computer fails suddenly the organization loses its data instantly. Computers should not be treated like any other electrical appliance.

  • Many organizations do not have proper and qualified technical staff to maintain their computer systems.

  • Lack of proper uninterruptible power supplies means that sudden power failures can cause disk failures, resulting in loss of data.

  • Data is stored anywhere and everywhere – on floppies, CD-ROMs, local hard disks, etc. End-users are not told of any clearly identified location where data must be stored.

  • Organizations don’t spend enough on anti-virus and other tools – sudden virus attacks can wipe out years of data in a few minutes.

Many business managers think it is easy to hire or replace experienced technical staff immediately from some outsourcing or external company should there be a computer-related disaster. It is not so easy, and is often impossible. It is not possible for an outside IT person, however qualified, to suddenly walk into an organization and start assisting in disaster recovery, business continuity or data recovery. It usually takes several weeks or months for any IT professional to fully understand the IT nature and functioning of any organization. It is not possible to pick it all up in a day, even in small organizations. For example:

Example

A small financial firm had a single computer running a financial application. The application had been developed by a freelance programmer for a fee, and the freelancer also maintained the company data. Only he knew how the system worked and how to enter data or extract reports for the business. Due to a power outage, the disk of the central computer crashed and the machine stopped booting. Fortunately (or unfortunately) it was a soft crash, meaning that the hardware was okay. Without consulting the freelance programmer, the business owner picked up the phone and called the hardware vendor who had supplied the computer. The vendor promptly sent a new techie to have a look. The techie investigated the issue, realized that the operating system had to be reloaded and informed the business owner. The business owner simply said, ‘Go ahead. Do what you want. Get our system back to working condition.’ The techie immediately reformatted the hard disk and reloaded the operating system. The business owner was happy that the computer had been fixed without any hardware costs (computer hardware was very expensive in those days). The next day, the freelance programmer came to the office as usual, and all hell broke loose. The company had lost about two years’ worth of data, entered at huge effort. This is a true story. The company had not invested in any backup unit other than a few floppies, which held outdated data anyway. As you must have guessed by now, the techie was none other than the author of this book, who had just started his computer career by innocently destroying somebody’s data in 1989.

How should organizations store data safely?

Organizations should consider how to store and recover data as soon as they decide to computerize their operations, even if the system will be used by just one person or a single department. A data backup strategy has to be created that determines the timeframes, technologies, media and offsite storage of the backups. This also helps to ensure that recovery point and recovery time objectives can be met. Depending on the organization’s size and scale of operations, data storage can be one or more of the following:

  • Very small organizations may have just one or two computers. They can have a single directory or folder called ‘data’ on the hard disk where they create, save and access all data. This data can be as simple as a bunch of Word and Excel documents. As a disaster recovery option, the computer-user can copy all the files every day to an alternative location, eg, floppies, CD-ROMs or low capacity USB disks. In the event of a computer failure or disk crash, they will need to call some IT staff to fix the computer, reload the operating system and then copy the data directory back to its original location from the floppies, CD-ROMs or USB disk. Nowadays, small pen-sized USB disks with capacities ranging from sixteen megabytes to more than four gigabytes are available for prices ranging from US$50 to about US$400. These USB disks plug directly into the USB ports on most computers and data can be copied directly to them.

  • Small to medium organizations may have several computers. They can dedicate one large computer as a data storage server for all the other users. This is usually called a file server. All users create files – documents or spreadsheets – on their local computers, but save and store the files in specified locations on the file server. For example, the file server can have several directories or folders called ‘Finance’, ‘Design’, ‘HR’, etc. The different departments will have to store their important files in their respective directories. A small tape drive can be attached to the file server to back up all the specified directories and folders on the server every day. In the event of a file server crash, or an accidental deletion of files, a techie can restore the folders after the server has been fixed. Tape drives are available in various formats, with capacities ranging from two gigabytes to more than two hundred gigabytes on a single tape.

  • Large organizations will have hundreds, and maybe even thousands, of computers and servers of all types. Servers can include file servers, database servers, web servers, e-mail servers, and so on. Some sort of enterprise planning is needed to ensure that everyone stores data on properly identified servers. Each of the servers can be backed up using high capacity tape drives, archival systems, mirror servers, etc. For example, huge organizations like Ford, Shell, IBM, etc, will have hundreds of servers of various types containing terabytes of important data. Large investments in backup equipment and qualified staff are required for such massive operations. Depending on the time available, or technical limitations, the backup can be centralized or distributed.
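The daily copy described above for very small organizations can be sketched in a few lines of Python. This is only a minimal illustration, not a recommended product: the function names `backup_data_folder` and `restore_data_folder` and the dated folder naming are assumptions made for this example.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_data_folder(data_dir: Path, backup_root: Path, day: date) -> Path:
    """Copy the whole 'data' folder to a dated folder on the backup medium."""
    target = backup_root / f"data-{day.isoformat()}"
    # copytree refuses to overwrite an existing folder, so an earlier
    # backup taken on the same day is never silently replaced.
    shutil.copytree(data_dir, target)
    return target


def restore_data_folder(backup_dir: Path, data_dir: Path) -> None:
    """Copy a dated backup back to the original location after a repair."""
    # dirs_exist_ok (Python 3.8+) allows restoring into a folder that the
    # repaired system has already recreated.
    shutil.copytree(backup_dir, data_dir, dirs_exist_ok=True)
```

Keeping one dated folder per day, rather than overwriting a single copy, means that a file damaged today can still be recovered from yesterday's folder, at the cost of more space on the backup medium.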

What are some of the most common storage and backup options?

Various types exist, but there is no single solution for everyone. Some of the common storage and backup options range from inexpensive floppies that can store a couple of megabytes to expensive tape libraries that can hold several terabytes of data. Small tape drives are available for about US$1,000. Gigantic tape libraries that can back up mainframes and large Unix boxes can easily cost more than US$100,000. New technologies such as Advanced ATA, SCSI-RAID, Fiber Channel, etc, are available on modern backup devices. These advanced technologies can back up large amounts of data in short amounts of time. Some of the common storage and backup devices are:

  • DATs (Digital Audio Tapes), ranging from 2 to 24 gigabytes.

  • DLTs (Digital Linear Tapes), from 20 to 80 gigabytes.

  • SDLTs (Super Digital Linear Tapes), 40 to 100 gigabytes.

  • LTOs (Linear Tape Open), 100 to 300 gigabytes.

  • Tape libraries of LTO or DLT drives. These can hold seven to twenty or more tapes inside one box, giving a combined backup capacity of a terabyte or more.

  • Special tape drives for bigger machines like large mainframes, AS/400s, Unix boxes, etc.

These devices usually come with their own backup software, or can work with some popular backup software like Arcserve, Brightstor, Veritas, etc. The manufacturers of these packages also provide various flavours of backup software, eg:

  • Standard: Basic file backup option

  • Open files option: This is used to back up files even though some users could be accessing them.

  • Database backup option: This can be used to back up online databases like SQL Server, Oracle, etc.

  • E-mail backup: These can be used to back up e-mail servers like MS-Exchange, Lotus Notes, etc.

  • Remote backup: These can be used to back up workstations, laptops, etc, which are connected on the network.

  • Image backup: These can be used to take a snapshot of the entire hard disk, sector by sector. These sorts of backups will be very useful to restore a hard disk perfectly back to its backed-up state in the event of a disk failure.

In addition there are other types of specialized backup options like Norton’s Ghost. This software can take a snapshot of the entire hard disk data as a single image file. In the event of a disk crash, the image file can be used to rebuild the hard disk back to its original condition on the same machine. It is also possible to restore the image file on an identical new hard disk on an identical machine model.

What is meant by recovery time objective (RTO) and recovery point objective (RPO)?

These terms define acceptable loss limits, usually for your data and business downtime. They are used quite frequently in disaster recovery discussions. Determining the recovery time objective (RTO) and the recovery point objective (RPO) will define how fast an organization needs to recover in order to survive and how much data loss can be tolerated.

RTO defines the timeframe (in hours or days) within which specific business operations must be restored. It answers the question: How long can a business afford to be down? For example, will the organization tolerate a downtime of one business day? If yes, the RTO is one business day.

RPO defines the point in time to which data must be recovered. For example, an RPO can be stated as ‘Data can be recovered as of 9 pm last night’. It answers the question: How much data can the business afford to lose? For example, suppose an organization takes daily backups of one of its critical servers during the night and the server fails abruptly the next afternoon. The IT staff will only be able to restore data as of last night’s backup, so the achievable recovery point is the previous day’s end of business, and everything entered since then is lost.

Organizations can prepare tables, like the one below, for all their critical systems and tackle them one by one.

Businesses must be able to resume operations quickly. Most would prefer to resume from the point at which operations were disrupted or stopped, preserving the last data entry or transaction. This means that 100% of the data must be available on alternative systems at all times. The shorter the delay and the less data lost, the sooner the business can be back in action. But there is a heavy cost for this: if you want to lose no data at all, expensive online backup and archival systems must be used. RTO and RPO must therefore be decided in the specific context of your particular organization.

Table 2. RTO and RPO for critical systems

System               RTO                RPO

Development server   One business day   Data restored as of previous business day.

Finance system       One day            Data restored as of previous business day.

Data connectivity    Four hours         N/A
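The arithmetic behind RTO and RPO can be made concrete with a small Python sketch. It assumes, for illustration only, that both objectives are expressed as durations (maximum tolerable downtime and maximum tolerable span of lost work); the function names are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Work entered after the last good backup is lost when the server fails."""
    return failure - last_backup


def meets_objectives(last_backup: datetime, failure: datetime,
                     restored: datetime, rto: timedelta,
                     rpo: timedelta) -> bool:
    """True when both the downtime and the lost work stay within the limits."""
    downtime = restored - failure
    return downtime <= rto and data_loss_window(last_backup, failure) <= rpo
```

For the nightly-backup example, a backup taken at 9 pm followed by a failure at 2 pm the next day gives a data loss window of 17 hours. That fits an RPO of one day, but it would not fit an RPO of, say, four hours, which would call for more frequent backups.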

What does ‘Internet backup’ mean?

Nowadays, Internet service providers also provide various options for backing up important data to a server on the Internet, for a fee. Using this service it is possible to copy important data to a secure server or disk space dedicated to an organization. A simple software utility can be installed on the PC or server that allows the user to schedule backups, select the files and folders to be backed up, password-protect files, and so on. Data can also be encrypted for transmission. Though many organizations are currently reluctant to have their data stored on the Internet, it will one day become a popular method of backup once the right security practices are established.

What is a ‘geocluster’?

‘Geocluster’ is short for geographic cluster, which is a very expensive backup option. Geoclusters are usually used by very big organizations that have to keep data synchronized between different countries. A geocluster is made up of several servers operating in tandem to provide load balancing and fail-over services. For example, if an organization wants to keep a mirror of an important data server at a disaster recovery site in a different city, it can use a geocluster to keep the main and the backup server in sync at all times. Geoclusters are too expensive for small to medium sized firms.

How often should backups be taken, and what should be backed up?

Ideally, ‘everything, every day’. In practice, some organizations take a full backup over the weekend and incremental backups (only the files changed since the previous backup) daily. As your organization becomes heavily dependent on computers and data for its day-to-day work, storage and retrieval of data become of paramount importance. Companies should ensure that all important data is backed up regularly and that the backups are stored in proper fire- and waterproof safes. Your organization should ensure that end-users store their data only on specified file, mail and database servers that are backed up regularly. In the event of any data loss or accidental deletion, your IT staff can then restore the previous day’s data to the user. Your end-users should also be educated not to store important data on local hard disks, floppies, etc, from which it cannot be recovered if they are damaged.
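The difference between a full and an incremental backup comes down to which files are selected. The following Python sketch illustrates one common way to choose them, by file modification time; the function name `files_to_back_up` is hypothetical, and real backup packages may track changes differently (for example with archive flags or catalogues).

```python
from pathlib import Path
from typing import List, Optional


def files_to_back_up(root: Path, last_backup: Optional[float] = None) -> List[Path]:
    """Select files for a backup run.

    With last_backup=None every file is selected (full backup). Otherwise
    only files modified after that timestamp (seconds since the epoch)
    are selected (incremental backup).
    """
    selected = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if last_backup is None or path.stat().st_mtime > last_backup:
            selected.append(path)
    return selected
```

Incremental runs are quicker and use less tape, but a restore then needs the last full backup plus every incremental taken since, so all of those tapes must remain readable.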

How can one decide what data needs to be backed-up?

This decision should be taken by involving the heads of every department. Ask them what data they consider important and cannot afford to lose. Then provide secure folders and other server access to the respective department’s staff. Educate users to ensure they store their important data only inside the specified server locations. Then back up those locations every day. Test the backups periodically by restoring them to a test location.

Some organizations back up everything that is stored on a server. This could be a useful practice in some cases. However, it could lead to backups taking a lot of time and unnecessary tape consumption if users store non-business related files like MP3 songs, image files, etc, on the servers. Ensure your users only store business-related files and not personal stuff on company systems.
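One way to find non-business files before they inflate the backups is to scan the server folders for telltale file types. The Python sketch below is an illustration only: the list of suffixes is an assumption and must be adapted, since image files, for instance, may be legitimate business data for a design department.

```python
from pathlib import Path
from typing import List

# Assumed list of file types that are unlikely to be business data.
NON_BUSINESS_SUFFIXES = {".mp3", ".avi", ".mpg"}


def non_business_files(root: Path) -> List[Path]:
    """Return files under root whose type suggests personal content."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in NON_BUSINESS_SUFFIXES
    )
```

The resulting list can be reviewed with the department heads rather than deleted automatically, since a file type alone does not prove that a file is personal.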

How and where should backup tapes be stored?

The location for storing your data tapes is of paramount importance. Backup tapes, like audiocassettes, get damaged easily by heat, moisture, etc, so they must be stored in a secure location. Some of the best practices for storing and using tapes are:

  • After every backup, tapes should be labelled and stored in a fire-proof safe in a non-humid area.

  • Backup tapes should not be stored inside or near the data centre. This is to ensure that the tapes don’t get destroyed in the event of any disaster, like fire or water seepage, within the data centre.

  • Data tapes are usually stored off-site, at an alternative location outside the organization’s premises. Also, the same backup tapes should not be used for years and years, as they tend to lose their magnetic retention over time.

  • Old tapes should be periodically tested for their ability to restore data. If a tape cannot be read, take a new backup on a new tape immediately.

  • Old tapes should be destroyed safely so that they do not fall into the wrong hands.

  • Study and implement all the manufacturers’ recommendations for the model of backup tape and drive purchased.

How often should backups be tested?

This is a very important exercise. Your IT department could be backing up data for years without ever getting a chance to test whether the data can be retrieved from the backup tapes. Tapes don’t last for ever and get damaged by heat, moisture, disuse, etc, so it is necessary to test every backup tape periodically to see whether you can retrieve the data. If a server fails and the tape is also unreadable, you will have a major crisis. It is highly advisable to plan a regular schedule for restoring a sizeable amount of data to a test location from each and every tape that is used for backups. Your end-users can then verify whether the data restored to the test location is correct and readable. These exercises can be made part of your organization’s DR policy to prevent data recovery surprises.
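Part of the verification after a test restore can be automated by comparing checksums of the restored files against the originals. The Python sketch below assumes that both the original folder and the test location are reachable as ordinary directories; the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original: Path, restored: Path) -> List[str]:
    """Return relative paths that are missing or differ in the restored copy."""
    problems = []
    for source in sorted(original.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original)
        candidate = restored / relative
        if not candidate.is_file() or file_digest(candidate) != file_digest(source):
            problems.append(relative.as_posix())
    return problems
```

An empty list suggests the tape is readable and the restore is complete. Files that users have changed since the backup was taken will also be reported as different, so the report still needs a human check.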

Will just taking proper data backups daily ensure disaster recovery?

No, that is not enough. Backups on tape or other media simply ensure that a copy of your data is safe. Disaster recovery is a different ball game. Just having the data on a backup tape will be of no use if the file server blows to pieces and there is nothing to restore it to. In order to have proper disaster recovery safeguards and recovery methods, organizations must invest in the following additional precautions to prevent disasters from striking the servers in the first place:

  • Maintenance: Comprehensive hardware maintenance contracts for all critical servers to ensure that the vendor repairs or replaces faulty equipment within hours of failure.

  • Spares: On-site availability of parts like spare hard disks, spare power supplies, or even a spare machine.

  • Mirror servers: Where the tolerable downtime is low, some organizations may even invest in mirrored servers for critical functions.

  • UPS: Uninterruptible power supply to all critical equipment.

  • Fire prevention mechanisms

  • Water seepage prevention

  • Security: Unauthorized access prevention.

  • Anti-virus: Virus prevention, with anti-virus updates.

  • Updates: Applying proper service packs, hot fixes, BIOS updates, driver updates, etc, as recommended or supplied by the equipment or software manufacturer.

... and other manufacturers’ recommendations.

Some questions to ask before starting backups on critical servers and equipment:

  • Do you have a complete list of critical equipment that needs to be backed up daily? Have you missed any important equipment?

  • Do you know what needs to be backed up on each item of critical equipment?

  • Who is assigned to take backups?

  • How is the backup taken?

  • How long should your backups be stored?

  • Who is authorized to initiate restores if necessary?

  • Will backup tapes need to be stored offsite?

For a few more ‘Where’, ‘How’, ‘Why’, ‘When’, ‘What’ questions – see chapter 14 on ‘Plenty of Questions’.

What do you mean by ‘disk mirroring’?

It is possible to have data duplicated in real time across two separate hard disks within a single machine or between two machines. This is called ‘disk mirroring’. It ensures continuous availability and accuracy. For example, if there is a server that has two disks of identical capacity, then it is possible to establish a mirror between the two such that data on primary disk-1 always gets mirrored to secondary disk-2. Hence, if the primary disk fails, the secondary disk will have all the data of the primary disk intact. Mirroring can be software-based or hardware-based, although hardware-based mirroring is superior. Nowadays, third party software and hardware is available for mirroring. These packages contain several useful and configurable features not directly available with the basic operating system or hardware.

What are some of the high-end storage and backup solutions available today?

Some of the high-end solutions that are available from reputable manufacturers like Veritas and Hewlett Packard are listed below. Visit their websites for detailed information and specifications.

  • VERITAS Cluster Server is a high-availability solution. It is ideal for reducing both planned and unplanned downtime, facilitating server consolidation, and effectively managing a range of applications in heterogeneous environments. Visit www.veritas.com for details.

  • HP StorageWorks Enterprise Backup Solution (EBS) is a complete enterprise backup / recovery / archive hardware solution built around HP StorageWorks tape-automation products such as the HP StorageWorks ESL9000 Tape Libraries and the HP StorageWorks Ultrium 460 Tape Drive. See www.hp.com/storage for further details.

  • Replication/data availability solution. In the disaster-recovery arena, VERITAS has a replication/ data availability solution called VERITAS Volume Replicator.

What do you mean by ‘database replication’?

Nowadays organizations depend on databases, which hold a variety of structured information in one place and can be accessed and updated by many people simultaneously. Some of the common names in databases are SQL Server, Oracle, DB2, MS-Access, and so on. Corruption or deletion of a database can wipe out years of data entered and accessed by thousands of users, so it is vital to take enormous precautions when handling and maintaining databases. Specially qualified staff, called database administrators, are required to manage them. Database replication is a partial or full duplication of data from a source database to a destination database. For example, if a database server holds a database called CUSTDATA containing all the customer data of an organization, then replication can periodically pump all the data within CUSTDATA into another database called CUSTDATABKUP on a different server. Replication may use any one of a number of methods – synchronous, asynchronous, mirroring, etc. Hence, if the main server fails, it is still possible to extract all the data from the backup database server. Special tools are available that automatically replicate a main database’s contents into another database, ranging from low-end to high-end products with different features. Some popular examples are available from: www.dbbalance.com and www.red-gate.com.
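The CUSTDATA example can be illustrated with a small Python sketch. It uses SQLite, which is bundled with Python, purely as a stand-in for the server databases named above; the function name `replicate` and the file paths are assumptions. It shows the simplest form of replication, a periodic full copy, not synchronous replication.

```python
import sqlite3


def replicate(source_path: str, destination_path: str) -> None:
    """Copy the full contents of the source database into the destination."""
    source = sqlite3.connect(source_path)
    destination = sqlite3.connect(destination_path)
    try:
        # Connection.backup (Python 3.7+) takes a consistent snapshot of
        # the source even while other connections are using it.
        source.backup(destination)
    finally:
        destination.close()
        source.close()
```

Calling `replicate("CUSTDATA.db", "CUSTDATABKUP.db")` from a scheduled job would refresh the copy at each run. Anything entered between two runs exists only on the main server, which is the trade-off of this asynchronous, periodic approach.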

What does ‘server load balancing’ mean?

Many heavy-duty applications cannot run on just one machine. There could be thousands of users accessing such an application, and a single server would get bogged down, unable to service thousands of simultaneous requests. In such cases, server load balancing is necessary: multiple servers are used to host a common application.

Through load balancing, traffic can be distributed automatically across multiple servers running a common application so that no one server is overloaded. With this technique, a group of servers appears as a single server to the network. Load balancing can be implemented among servers within a site or among servers on different sites. Using load balancing among different sites can enable the application to continue to operate as long as one or more sites remain operational.
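The idea of spreading requests across servers while skipping failed ones can be sketched as a simple round-robin selector in Python. Round-robin is only one of several possible distribution policies, and the class and method names here are hypothetical; commercial load balancers also perform health checks, session handling and more.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._servers: List[str] = list(servers)
        self._down: Set[str] = set()
        self._next = 0

    def mark_down(self, server: str) -> None:
        self._down.add(server)

    def mark_up(self, server: str) -> None:
        self._down.discard(server)

    def pick(self) -> str:
        """Return the next available server for an incoming request."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server not in self._down:
                return server
        raise RuntimeError("no servers available")
```

Clients only ever call `pick()`, so the group behaves like a single server from their point of view, and the application keeps running as long as at least one server remains up.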

How can one prevent loss of IT equipment?

IT equipment can be broadly classified into two categories:

  • Equipment that holds company and user data, such as file servers, database servers, hard disks, tapes, laptops, etc.

  • Equipment that does not hold company or user data, such as LAN switches, hubs, routers, monitors, etc.

It is more important to protect equipment that holds data than equipment that does not hold data. However, it does not mean that non-data equipment is of any less importance. It is just a matter of higher priority, as any company data is of paramount importance to any organization and cannot be purchased from external sources, whereas non-data equipment can be purchased off-the-shelf from several vendors and reconfigured. For example, if an important file server in the finance department blows to pieces, or gets stolen, the situation cannot be resolved just by buying another brand new file server, because data is not repurchasable. On the other hand, if a LAN switch connecting several machines gets damaged it is possible to buy a new one immediately and reconfigure to standard settings. Some of the specific precautions for critical equipment that holds data are:

  • Have standby power supplies, hard disks or even spare machines if possible.

  • Ensure that your equipment is under complete, comprehensive warranty and insurance.

  • Ensure full daily backups. Insist on, and verify, that every employee stores important data only on identified server locations that get backed up every day.

  • Have all manuals, CD-ROMs, bootable disks, repair disks, etc, handy.

  • Verify data integrity regularly by restoring data to a test location.

  • Do not store all important data on a single server. Have multiple physical servers to split the load.

  • Buy useful recovery tools like Disk Repair, File Undelete, Registry Recover, etc, and become familiar with their usage.

One simple way to ensure that all critical IT systems are covered for various risks can begin as follows.

A critical server housing an important application and data can be protected from predictable disasters by having a checklist:

  • System or function name.

  • Used for.

  • How important is this system for your business?

Checklists are explained in detail later.

On-site disaster prevention methods:

  • Data from system backed up fully every day.

  • Data tapes and storage media stored properly in fire- and water-proof safes.

  • Essential spares like power supplies, spare hard disks, etc, available on-site.

  • Servers under comprehensive hardware maintenance guarantee by a qualified vendor backed by an SLA.

  • Servers housed in a secure data centre with clean UPS power.

  • Servers maintained by qualified and trained staff.

  • Server and data access restricted to authorized staff.

  • Servers protected from viruses and hackers by anti-virus and intrusion detection systems.

  • Installation of all necessary upgrades, service packs, hot fixes, driver updates, bug fixes, etc, to prevent faults.

  • Clear step-by-step documents to assist in data restoration, replacement of spares, etc.

  • Insurance to cover theft, fire, damage to equipment, etc.

... and other essential information and precautions.

DR and BC methods

  • Hot standby server in a DR or BC site, preferably identical to the one in the main site in all respects.

  • Automatic or manual data synchronization between main and standby server.

  • Copies of every important document, test plans, etc, placed in the DR or BC site.

  • Testing and periodic dry runs at the DR or BC site.

  • Other essential information and precautions.

Similar checklists and questions can be asked and tackled for each critical system or business function.

Do’s and don’ts for preventing data disasters

Do’s

  • Ensure that the technical support team is responsible for full and proper backup of all servers daily.

  • Invest money in buying good quality tape drives and other backup devices.

  • Ensure all important data is stored only on servers that are backed up daily.

  • Learn how to restore data properly.

  • Store tapes and key papers in fire- and water-proof safes.

  • Test whether you can read old backup tapes and restore data from them.

Don’ts

  • Don’t allow employees to store business data on their local drives.

  • Don’t allow unauthorized access to servers and databases.

  • Don’t allow data to exceed tape drive capacity.

  • Don’t use the same tapes for a long time.
