Chapter 17

Backing Up

This chapter examines the practice of safeguarding data by creating backup copies, restoring that same data if necessary, and recovering data in case of a catastrophic hardware or software failure. The chapter gives you a full understanding of the reasons for sound backup practices. You can use the information here to make intelligent choices about which strategies are best for you. The chapter also shows you how to perform some types of data recovery and system restoration on your own and provides advice about when to seek professional assistance.

Choosing a Backup Strategy

Backups are always trade-offs. Any backup consumes time, money, and effort on an ongoing basis; backups must be monitored, validated, indexed, and stored, and you must continuously purchase new media. Sound expensive? The cost of not having backups is the loss of your critical data. Re-creating data from scratch costs time and money, and if the cost of doing it all again is greater than the cost associated with backing up, you should be performing backups. At their most basic, backups are insurance against financial loss for you or your business.

Your first step in formulating and learning to use an effective backup strategy is to choose the strategy that is right for you. First, you must understand some of the most common (and not-so-common) causes of data loss so that you can better understand the threats your system faces. Then, you need to assess your own system, how it is used and by whom, your available hardware and software resources, and your budget constraints. The following sections look at each of these issues in detail and provide some backup system examples.

Why Data Loss Occurs

Files may disappear for any number of reasons: Hardware can fail and take data with it, or a moment of inattention can lead you to accidentally delete or overwrite a file. Some data loss occurs as a result of natural disasters and other circumstances beyond your control. A tornado, a flood, or an earthquake could strike; the water pipes could burst; or the building could catch on fire. Your data, as well as the hardware, would likely be destroyed in such a disaster. A disgruntled employee might destroy files or hardware in an attempt at retribution. Equipment can be stolen, or it might fail; all equipment fails at some time—most likely when it is extremely important for it not to fail.

A Case in Point

A recent Harris poll of Fortune 500 executives found that roughly two-thirds of them had problems with their backup and disaster recovery plans. How about you?

Data can also be lost because of malfunctions that corrupt the data as it is being written to the disk. Other applications, utilities, and drivers might be poorly written, buggy (the phrase most often heard today is “still beta quality”), or might suffer some corruption and fail to correctly write that all-important data you have just created. If that happened, the contents of your data file would be indecipherable garbage of no use to anyone.

All these accidents and other disasters offer important reasons for having a good backup strategy; however, the most frequent cause of data loss is human error. Who among us has not overwritten a new file with an older version or unintentionally deleted a needed file? This applies not only to data files, but also to configuration files and binaries. In mail lists, Usenet newsgroup postings, and online forums, stories about deleting entire directories such as /home, /usr, or /lib are all too common. On a stable server that is not frequently modified or updated, you can choose to mount /usr read-only to prevent writing over or deleting anything in it. Incorrectly changing a configuration file and not saving the original in case it has to be restored (which happens more often than not because the person reconfigured it incorrectly) is another common error.

Tip

To make a backup of a configuration file you are about to edit, use the cp command:

matthew@seymour:~$ cp filename filename.original

To restore it, use the following:

matthew@seymour:~$ cp filename.original filename

Never edit or move the *.original file, or the original copy will be lost. You can also change the file’s mode to be unwritable; then if you try to delete it, rm warns you and asks for confirmation first, which helps prevent accidents.
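For example, assuming the backup copy is named filename.original, removing write permission looks like this:

matthew@seymour:~$ chmod a-w filename.original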

Proper backups can help you recover from these problems with a minimum of hassle, but you have to put in the effort to keep backups current, verify that they are intact, and practice restoring the data in different disaster scenarios.

Assessing Your Backup Needs and Resources

By now you have realized that some kind of plan is needed to safeguard your data, and, like most others, you may be overwhelmed by the prospect. Entire books, as well as countless articles and white papers, have been written on the subject of backing up and restoring data. What makes the topic so complex is that each solution is truly individual. However, the proper approach to making the decision is very straightforward. You start the process by answering two questions:

What data must be safeguarded?

How often does the data change?

The answers to these two questions help you determine how important the data is, understand the volume of the data, and determine the frequency of the backups. This information, in turn, helps you choose the backup medium. Only then can you select the software to accommodate all these considerations. (You learn about choosing backup software, hardware, and media later in this chapter.)

Available resources are another important consideration when selecting a backup strategy. Backups require time, money, and personnel. Begin your planning activities by determining what limitations you face for all these resources. Then construct your plan to fit those limitations—or be prepared to justify the need for more resources with a careful assessment of both backup needs and costs.

Tip

If you are not willing or able to assess your backup needs and choose a backup solution, you can choose from the legions of consultants, hardware vendors, and software vendors that are available to assist you. The best way to choose one in your area is to ask other UNIX and Linux system administrators (located through user groups, discussion groups, or mail lists) who are willing to share their experiences and make recommendations. If you cannot get a referral, ask consultants for references and check them out.

Many people also fail to consider the element of time when formulating backup plans. Some backup devices are faster than others, and some recovery methods are faster than others. You need to consider that when making choices.

To formulate a backup plan, you need to determine the frequency of backups. The necessary frequency of backups should be determined by how quickly the important data on your system changes. On a home system, most files never change, a few change daily, and some change weekly. No elaborate strategy needs to be created to deal with that. A good strategy for home use is to back up (to any kind of removable media) critical data frequently and back up configuration and other files weekly.

At the enterprise level on a larger system with multiple users, a different approach is called for. Some critical data changes constantly, and it could be expensive to re-create this data because doing so typically involves elaborate and expensive solutions. Most of us exist somewhere in between these extremes. Assess your system and its use to determine where you fall in this spectrum.

Backup schemes and hardware can be elaborate or simple, but they all require a workable plan and faithful follow-through. Even the best backup plan is useless if the process is not carried out, data is not verified, and data restoration is not practiced on a regular basis. Whatever backup scheme you choose, be sure to incorporate in it these three principles:

Have a plan—Design a plan that is right for your needs and have equipment appropriate to the task. This involves assessing all the factors that affect the data you are backing up. We delve into more detail later in the chapter.

Follow the plan—Faithfully complete each part of your backup strategy and verify the data stored in the backups. Backups with corrupt data are of no use to anyone. Even backup operations can go wrong.

Practice your skills—Practice restoring data from your backup systems from time to time so that when disaster strikes, you are ready (and able) to benefit from the strength of your backup plan. (For restoring data, see the section “Using Backup Software.”) Keep in mind that it is entirely possible that the flaws in your backup plan will become apparent only when you try restoring.

Sound Practices

You have to create your own best backup plan, but here are some building blocks that go into the foundation of any sound backup program:

Maintain more than one copy of critical data.

Label backups.

Store backups in a climate-controlled and secure area.

Use secure offsite storage of critical data. Many companies choose bank vaults for their offsite storage, and this is highly recommended.

Establish a backup policy that makes sense and can be followed religiously. Try to back up your data when the system is consistent (that is, no data is being written), which is usually overnight.

Keep track of who has access to your backup media and keep the total number of people as low as possible. If you can, allow only trusted personnel near your backups.

Routinely verify backups and practice restoring data from them.

Routinely inspect backup media for defects and regularly replace them (after destroying the data on them if it is sensitive).

Evaluating Backup Strategies

When you are convinced that you need backups, you need a strategy. Being specific about an ideal strategy is difficult because each user’s or administrator’s strategy will be highly individualized, but here are a few general examples:

Home user—At home, the user has the Ubuntu installation media that takes less than an hour to reinstall, so the time issue is not a major concern. The home user will want to back up any configuration files that have been altered, keep an archive of any files that have been downloaded, and keep an archive of any data files created while using any applications. Unless the home user has a special project in which constant backups are useful, a weekly backup is probably adequate. The home user will likely use a consumer-focused online cloud service like Dropbox, an external hard drive, or other removable media for backups.

Small office—Many small offices tend to use the same strategy as home users but are more likely to back up critical data daily and use an automated cloud service. Although they will use scripts to automate backups, some of this is probably still being done by hand.

Small enterprise—Here is where backups begin to require higher-end equipment with fully automated on- and off-site backups. Commercial backup software usually makes an introduction at this level, but a skillful system administrator on a budget can use one of the basic applications discussed in this chapter. Backups are highly structured and supervised by a dedicated system administrator. You might have guessed that small enterprises are also moving their backups to online cloud services.

Large enterprise—Large enterprises are the most likely candidates for the use of expensive, proprietary, highly automated backup solutions. At this level, data means money, lost data means lost money, and delays in restoring data mean money lost as well. These system administrators know that backups are necessary insurance and plan accordingly. Often, they own their own online, distributed cloud systems, with multiple redundant data centers in geographically diverse locations.

Does all this mean that enterprise-level backups are better than those done by a home user? Not at all. The “little guy” with Ubuntu can do just as well as the enterprise operation by investing more time in the process. By examining the higher-end strategies in this chapter, therefore, we can apply useful concepts across the board.

This chapter focuses on local-level activities rather than cloud services, which build on the techniques listed here but add networking and cloud-service-specific details. The chapter also discusses some technologies that are a bit outdated for the enterprise but might be useful to a hobbyist with cheap and easy access to older equipment. If you want to use an online cloud service, take what you learn here, read everything made available by your cloud service provider, and then do your homework to design a backup solution suited to your unique needs. That could be as simple as putting all your important files in a Dropbox-style cloud folder that automatically updates to another computer you own. This can work well if you are a casual user backing up simple documents and a few media files, but remember that services like these generally do not guarantee that your data will be permanently backed up (especially the free versions). Although we’ve not had problems with these solutions, we warn that they are not enterprise backup solutions. You might need to study up on Amazon Web Services, OpenStack, or other cloud providers and learn the fine details of their services to see if they suit your needs.

Note

If you are a new system administrator, you might inherit an existing backup strategy. Take some time to examine it and see if it meets the current needs of the organization. Think about what backup protection your organization really needs and determine whether the current strategy meets that need. If it does not, change the strategy. Consider whether the current policy is being practiced by the users, and, if not, why it is not.

Backup Levels

UNIX uses the concept of backup levels as a shorthand way of referring to how much data is backed up in relation to a previous backup. It works this way: A level 0 backup is a full backup. The next backup level is 1.

Backups at the other numbered levels save everything that has changed since the most recent backup at a numerically lower level. (The dump command, for example, offers 10 backup levels, 0 through 9.) For example, a level 4 backup run after a level 3 captures only the changes made since that level 3 backup (an incremental backup), whereas a level 3 run after a level 4 reaches back past it to the most recent lower-level backup—typically the level 0—and captures everything changed since then (a differential backup).
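As a sketch of how the levels interact (the dump utility works on ext-family file systems and must be installed separately; the file system and output paths shown here are only examples), you might take a level 0 dump on Sunday and a level 1 dump each weekday. The -u option records each run in /etc/dumpdates so that later runs know what has already been saved:

matthew@seymour:~$ sudo dump -0u -f /backup/sunday-level0.dump /dev/sda1
matthew@seymour:~$ sudo dump -1u -f /backup/monday-level1.dump /dev/sda1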

The following sections examine a few of the many strategies in use today. Many strategies are based on these sample schemes; one of them can serve as a foundation for the strategy you construct for your own system.

Simple Strategy

If you need to back up just a few configuration files and some small data files, copy them to a USB stick, label it, and keep it someplace safe. Most users have switched to using an external hard drive for backups because they are becoming less and less expensive and hold a great amount of data, or they have moved backups online.

In addition to configuration and data files, you should archive each user’s /home directory and entire /etc directory. Between the two, that backup would contain most of the important files for a small system. Then, if necessary, you can easily restore this data from the backup media device you have chosen after a complete reinstall of Ubuntu.
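A minimal sketch of such a backup, assuming an external drive mounted at /media/backup (the mount point and filename are arbitrary), is a single dated tar command:

matthew@seymour:~$ sudo tar czvf /media/backup/system-$(date +%Y-%m-%d).tar.gz /etc /home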

Experts used to say that if you have more data than can fit on a floppy disk, you really need a formal backup strategy. Because a floppy disk only held a little over 1MB (and is now incredibly obsolete), perhaps we should change that to “if you have more data than can fit on a cheap USB stick.” In any case, some formal backup strategies are discussed in the following sections.

Full Backup on a Periodic Basis

A full backup on a periodic basis is a strategy that involves a backup of the complete file system on a weekly, biweekly, or other periodic basis. The frequency of the backup depends on the amount of data being backed up, the frequency of changes to the data, and the cost of losing those changes.

This backup strategy is not complicated to perform, and it can be accomplished with the swappable disk drives discussed later in the chapter. If you are connected to a network, it is possible to mirror the data on another machine (preferably offsite); the rsync tool is particularly well suited to this task. Recognize that this does not address the need for archives of the recent state of files; it only presents a snapshot of the system at the time the update is done.
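For example, a nightly mirror of /home to another machine over SSH might look like the following (the host name and destination path are placeholders); rsync transfers only the files that have changed and removes from the mirror any files deleted from the source:

matthew@seymour:~$ rsync -av --delete /home/ backuphost:/srv/mirror/seymour/home/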

Full Backups with Incremental Backups

Another scheme involves performing a full backup of the entire system once a week, along with a daily incremental backup of only those files that have changed in the previous day, and it begins to resemble what a system administrator of a medium to large system traditionally uses.

This backup scheme can be advanced in two ways. First, each incremental backup can be made with reference to the original full backup. In other words, a level 0 backup is followed by a series of level 1 backups. The benefit of this backup scheme is that a restoration requires only two tapes (the full backup and the most recent incremental backup). But because it references the full backup, each incremental backup might be large (and could grow ever larger) on a heavily used system.

Alternatively, each incremental backup could reference the previous incremental backup. This is a level 0 backup followed by a level 1, followed by a level 2, and so on. Incremental backups are quicker (less data each time) but require every tape to restore a full system. Again, it is a classic trade-off decision.

Modern backup applications such as Amanda or BRU assist in organizing the process of managing complex backup schedules and tracking backup media. Doing it yourself using the classic dump or employing shell scripts to run tar requires that system administrators handle all the organization themselves. For this reason, complex backup situations are typically handled with commercial software and specialized hardware packaged, sold, and supported by vendors.

Mirroring Data or RAID Arrays

Given adequate (and often expensive) hardware resources, you can always mirror the data somewhere else, essentially maintaining a real-time copy of your data on hand. This is often a cheap, workable solution if no large amounts of data are involved. The use of redundant array of independent disks (RAID) arrays (in some of their incarnations) provides for recovery if a disk fails.

Note that RAID arrays and mirroring systems just as happily write corrupt data as valid data. Moreover, if a file is deleted, a RAID array will not save it. RAID arrays are best suited for protecting the current state of a running system, not for backup needs.

Making the Choice

Only you can decide what is best for your situation. After reading about the backup options in this book, put together some sample backup plans; then run through a few likely scenarios and assess the effectiveness of your choice.

In addition to all the other information you have learned about backup strategies, here are a couple good rules of thumb to remember when making your choice:

If the backup strategy and policy is too complicated (and this holds true for most security issues), it will eventually be disregarded and fall into disuse.

The best scheme is often a combination of strategies; use what works.

Choosing Backup Hardware and Media

Any device that can store data can be used to back it up, but that is like saying that anything with wheels can take you on a cross-country trip. Trying to fit 10GB worth of data on a big stack of CD-RWs or DVD-RWs is an exercise in frustration, and using an expensive automated tape device to save a single copy of an email is a waste of resources. In addition, those technologies are rapidly disappearing.

In this section, you find out about some of the most common backup hardware available and how to evaluate its appropriateness for your backup needs. With large storage devices becoming increasingly affordable (you can now get multiple-TB hard drives for around $100), decisions about backup hardware for small businesses and home users have become more interesting.

External Hard Drive

This is an easy option. Buy an external hard drive that connects to your system via USB and copy important data to it regularly. This has replaced past recommendations for most home users.

Network Storage

For network backup storage, remote arrays of hard drives provide one solution to data storage. With the declining cost of mass storage devices and the increasing need for larger storage space, network storage (NAS, or network-attached storage) is available and supported in Linux. Network storage involves cabinets full of hard drives and their associated controlling circuitry, as well as special software to manage all of it. NAS systems are connected to the network and act as huge (and expensive) mass storage devices.

More modest and simple network storage can be done on a remote desktop-style machine that has adequate storage space (up to eight 1TB drives is a lot of storage space, easily accomplished with off-the-shelf parts), but then that machine and the local system administrator have to deal with all the problems of backing up, preserving, and restoring the data. Several hardware vendors offer such products in varying sizes.

Tape Drive Backups

While this is becoming less common, tape drive backup is a viable technology that is still in use. Tape drives have been used in the computer industry from the beginning. Tape drive storage has been so prevalent in the industry that the tar command (the most commonly used command for archiving) is derived from the words tape archive. Capacities and durability of tapes vary from type to type and range from a few gigabytes to hundreds of gigabytes, with commensurate increases in cost for the equipment and media. Autoloading tape-drive systems can accommodate archives that exceed the capacity of the file systems.

Tip

Older tape equipment is often available in the used equipment market and might be useful for smaller operations that have outgrown more limited backup device options.

Tape equipment is well supported in Linux and, when properly maintained, is extremely reliable. The tapes themselves are inexpensive, given their storage capacity and the ability to reuse them. Be aware, however, that tapes do deteriorate over time and, being mechanical, tape drives can and will fail.
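To give a flavor of basic tape use (the mt utility may need to be installed separately, and /dev/st0 is a typical SCSI tape device name that may differ on your system), the following writes an archive of /home directly to the tape, rewinds it, and then lists the archive to verify it:

matthew@seymour:~$ sudo tar cvf /dev/st0 /home
matthew@seymour:~$ sudo mt -f /dev/st0 rewind
matthew@seymour:~$ sudo tar tvf /dev/st0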

Caution

Neglecting to clean, align, and maintain tape drives puts your data at risk. The tapes themselves are also susceptible to mechanical wear and degradation. Hardware maintenance is part of a good backup policy. Do not ever forget that it is a question of when—not if—hardware will fail.

Cloud Storage

Services such as Dropbox and Amazon’s AWS and S3 offer a way to create and store backups offsite. Larger companies may create their own offsite, online storage options as well. In each of these and similar cases, data is copied and stored remotely on a file server set aside specifically for that purpose. The data backups may be scheduled with great flexibility and according to the plans and desires of the customer.

Cloud storage is a backup solution that is recent and growing in popularity, but it is also a technology that is changing rapidly. To learn more about the options mentioned here, take a look at www.dropbox.com and https://aws.amazon.com/s3/. Although these are not the only services of the kind available, they offer a good introduction to the concept. If you like to “roll your own,” you definitely want to take a look at Ubuntu Enterprise Cloud at www.ubuntu.com/cloud.

Using Backup Software

Because there are thousands of unique situations requiring as many unique backup solutions, it is not surprising that Linux offers many backup tools. Along with command-line tools such as tar and dd, Ubuntu also provides a graphical archiving tool for desktop installations called Déjà Dup that is quite powerful. Another excellent but complicated alternative is the Amanda backup application—a sophisticated backup application that works well over network connections and can be configured to automatically back up all the computers on a network. Amanda works with drives as well as tapes.

Note

The software in a backup system must support the hardware, and this relationship can determine which hardware or software choices you make. Many system administrators choose particular backup software not because they prefer it to other options but because it supports the hardware they own.

The price seems right for free backup tools, but consider the software’s ease of use and automation when assessing costs. If you must spend several hours implementing, debugging, documenting, and otherwise dealing with overly elaborate automation scripts, the real costs go up.

tar: The Most Basic Backup Tool

The tar tool, the bewhiskered old man of archiving utilities, is installed by default. It is an excellent tool for saving entire directories full of files. For example, here is the command used to back up the /etc directory:

matthew@seymour:~$ sudo tar cvf etc.tar /etc

This example uses tar to create an archive, calls for verbose message output, and uses the filename etc.tar as the archive name for the contents of the directory /etc.

Alternatively, if the output of tar is sent to the standard output and redirected to a file, the command appears as follows:

matthew@seymour:~$ sudo tar cv /etc > etc.tar

The result is the same as with the preceding tar options: All files in the /etc directory will be saved to a file named etc.tar.

With an impressive array of options (see the man page), tar is quite flexible and powerful in combination with shell scripts. With the -z option, it can even create and restore gzip-compressed archives; the -j option does the same for bzip2, and -J handles archives compressed with xz.
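For instance, the following commands create gzip-, bzip2-, and xz-compressed archives of /etc (the output filenames are arbitrary):

matthew@seymour:~$ sudo tar czvf etc.tar.gz /etc
matthew@seymour:~$ sudo tar cjvf etc.tar.bz2 /etc
matthew@seymour:~$ sudo tar cJvf etc.tar.xz /etc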

Creating Full and Incremental Backups with tar

If you want to create a full backup, the following creates a bzip2 compressed tarball (the j option) of the entire system:

matthew@seymour:~$ sudo tar cjvf fullbackup.tar.bz2 /

To perform an incremental backup, you must locate all the files that have been changed since the last backup. For simplicity, assume that you do incremental backups on a daily basis. To locate the files, use the find command:

matthew@seymour:~$ sudo find / -newer name_of_last_backup_file ! -type d -print

When run alone, find generates a list of files system-wide and prints it to the screen. The ! -type d expression eliminates directories from the list; otherwise, each entire directory would be sent to tar, even if not all of its contents had changed.

Pipe the output of the find command to tar as follows:

matthew@seymour:~$ sudo find / -newer name_of_last_backup_file ! -type d -print | \
tar czTf - backup_file_name_or_device_name

Here, the T - option tells tar to read the list of filenames to archive from its standard input (the - stands for stdin), and the f option names the archive file or device to write.
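As an alternative to tracking changed files with find, GNU tar can manage incremental state itself through the --listed-incremental option, which records file metadata in a snapshot file. The first run against a new snapshot file produces a full backup; later runs with the same snapshot file capture only what has changed (the snapshot and archive names here are only examples):

matthew@seymour:~$ sudo tar --listed-incremental=/var/backups/home.snar -czvf full.tar.gz /home
matthew@seymour:~$ sudo tar --listed-incremental=/var/backups/home.snar -czvf incremental-1.tar.gz /home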

Note

The tar command can back up to a raw device (one with no file system) and to a formatted partition. For example, the following command backs up those directories to device /dev/hdd (not /dev/hda1, but to the unformatted device itself):

matthew@seymour:~$ sudo tar cvzf /dev/hdd  /boot  /etc /home
Restoring Files from an Archive with tar

The xp options used with tar extract files from a backup and preserve the file attributes; tar also creates any subdirectories it needs. Be careful when using these options because the backups might have been created with either relative or absolute paths. You should use the tvf option with tar to list the files in the archive before extracting them so that you know where they will be placed.

For example, to restore a tar archive compressed with bzip2, use the following:

matthew@seymour:~$ sudo tar xjvf ubuntutest.tar.bz2

To list the contents of a tar archive compressed with bzip2, use this:

matthew@seymour:~$ sudo tar tjvf ubuntutest.tar.bz2
tar: Record size = 8 blocks

drwxr-xr-x matthew/matthew         0 2013-07-08 14:58 other/
-rwxr-xr-x matthew/matthew      1856 2013-04-29 14:37 other/matthew helmke public.asc
-rwxr-xr-x matthew/matthew       170 2013-05-28 18:11 backup.sh
-rwxr-xr-x matthew/matthew      1593 2013-10-11 10:38 backup method

Note that because the pathnames do not start with a slash (/), they are relative pathnames and will be extracted into your current working directory. If they were absolute pathnames, they would be extracted exactly where the paths state.
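If you would rather not extract into the current directory, the -C option tells tar to change to another directory first (the target shown here is just an example and must already exist):

matthew@seymour:~$ mkdir /tmp/restore
matthew@seymour:~$ sudo tar xjvf ubuntutest.tar.bz2 -C /tmp/restore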

The GNOME File Roller

The GNOME desktop file archiving graphical application File Roller (file-roller) views, extracts, and creates archive files using tar, gzip, bzip2, compress, zip, rar, lha, and several other compression formats. Note that File Roller is only a front end to the command-line utilities that actually provide these compression formats; if a format is not installed, File Roller cannot use that format.

Caution

File Roller does not complain if you select a compression format that is not supported by installed software until after you attempt to create the archive. So be sure to install any needed compression utilities before you use File Roller.

File Roller is well integrated with the GNOME desktop environment to provide convenient drag-and-drop functionality with the Nautilus file manager. To create a new archive, select Archive, New to open the New Archive dialog box and navigate to the directory where you want the archive to be kept. Type your archive’s name in the Selection: /root text box at the bottom of the New Archive dialog box. Use the Archive Type drop-down menu to select a compression method. Then drag the files that you want to be included from Nautilus into the empty space of the File Roller window, and the animated icons show that files are being included in the new archive. When you have finished, a list of files appears in the previously blank File Roller window. To save the archive, select Archive, Close. Opening an archive is as easy as using the Archive, Open dialog to select the appropriate archive file. You can learn more at https://help.ubuntu.com/community/File%20Roller.

The KDE ark Archiving Tool

Ubuntu also offers the KDE ark and kdat GUI tools for backups; they are installed only if you select the KDE desktop during installation, but you can search through Synaptic to find them. Archiving has traditionally been a function of the system administrator and not seen as a task for individual users, so no elaborate GUI was believed necessary. Backing up has also been seen as a script-driven, automated task in which a GUI is not as useful. Although that’s true for system administrators, home users usually want something a little more attractive and easier to use, and that’s the gap ark was created to fill.

You launch ark from the command line. It is integrated with the KDE desktop (as File Roller is with GNOME), so it might be a better choice if you use KDE. This application provides a graphical interface for viewing, creating, adding to, and extracting from archived files. Several configuration options are available with ark to ensure its compatibility with Microsoft Windows. You can drag and drop from the KDE desktop or Konqueror file browser to add or extract files, or you can use the ark menus.

As long as the associated command-line programs are installed, ark can work with tar, gzip, bzip2, zip, and lha files (the last four being compression methods used to save space through compaction of the archived files).

Existing archives are opened after launching the application itself. You can add files and directories to the archive or delete them from the archive. After opening the archive, you can extract all of its contents or individual files. You can also perform searches by using patterns (all *.jpg files, for example) to select files.

To create new archives, choose File, New, and then type the name of the archive, providing the appropriate extension (.tar, .gz, and so on). Then you can add files and directories as you desire.

Déjà Dup

Déjà Dup is a simple backup tool with a useful GUI. It supports local, remote, or cloud backups. It can encrypt and compress your data for secure and fast transfers and more. In the applications list, Ubuntu just calls it Backups (see Figure 17.1).


FIGURE 17.1 The Backups icon is easy to find.

After you open the Backups application, go through the menu items on the left to set where the backup will be stored, what will be backed up, a schedule for automatic backups, and more (see Figure 17.2). When you have set everything to your taste, remember to turn on Déjà Dup by toggling the setting at the upper right from Off to On.


FIGURE 17.2 Backup settings are accessed using the menu entries on the left.

Back In Time

Back In Time is a viable alternative to Déjà Dup for many users. It is easily available from the Ubuntu software repositories, is stable, and has a clear and easy-to-understand interface.

Back In Time uses rsync, diff, and cp to monitor, create, and manipulate files, and it uses cron to schedule when it will run. Using these command-line tools is described later in this chapter. Back In Time is little more than a well-designed GUI front end; the default package is designed for GNOME, and a separate package in the repositories provides a front end for KDE. If you use the standard Ubuntu interface, install a package called nautilus-actions to get context menu access to some of the backup features.

The first time you run Back In Time, it takes a snapshot of your drive. This may take a long time, depending on the amount of data you have. You designate which files and directories to back up and where to back them up. Then set when to schedule the backup. The program takes care of the rest.

To restore, select the most recent snapshot from the list in Back In Time. Then browse through the list of directories and files until you find the file that interests you. You may right-click the file to view a pop-up menu, from which you may open a file, copy a file to a desired location, or view the various snapshots of a file and compare them to determine which one you might want to restore.

Back In Time keeps multiple logs of actions and activities, file changes, and versions, and it is a useful tool.

You can find the official documentation for Back In Time at https://backintime.readthedocs.io/.

Unison

Unison is a file-synchronization tool that works on multiple platforms, including Linux, other flavors of UNIX such as Solaris and macOS, and Windows. After Unison is set up, it synchronizes files in both directions and across platforms. If changes are made on both ends, files are updated in both directions. When file conflicts arise, such as when the same file was modified on each system, the user is prompted to decide what to do. Unison can connect across a network using many protocols, including ssh. It can connect with and synchronize many systems at the same time and even to the cloud.

Unison was developed at the University of Pennsylvania as a research project among several academics. It is no longer under active development as a research project, but it does appear to continue to be maintained with bug fixes and very occasional feature additions. The original developers claim to still be using it daily, so it is not completely abandoned.

Unison is powerful and configurable. Its foundation is the rsync algorithm, but with some additions that enable functionality that is generally available only from a version control system.

Even though the project is no longer the primary focus of any of the developers, many people still use Unison. For that reason, it gets a mention in this chapter and might be worthy of your time and effort if you are interested. Unison is released under the free GPL license, so you might decide you want to dig in to the code. The developers have publicly stated that they do not have time to maintain it regularly but welcome patches and contributions. If this is a project that interests you, see www.cis.upenn.edu/~bcpierce/unison/.

Amanda

Amanda is a powerful network backup application created by the University of Maryland at College Park. Amanda is a robust backup and restore application best suited to unattended backups with an autoloading tape drive of adequate capacity. It benefits from good user support and documentation.

Amanda’s features include compression and encryption. It is intended for use with high-capacity tape drives, floptical, CD-R, and CD-RW devices.

Amanda uses GNU tar and dump; it is intended for unattended, automated tape backups and is not well suited for interactive or ad hoc backups. The support for tape devices in Amanda is robust, and file restoration is relatively simple. Although Amanda does not support older Macintosh clients, it uses Samba to back up Microsoft Windows clients, as well as any UNIX client that can use GNU tools (including macOS). Because Amanda runs on top of standard GNU tools, file restoration can be made using those tools on a recovery disk even if the Amanda server is not available. File compression can be done on either the client or server, thus lightening the computational load on less-powerful machines that need to be backed up.

Caution

Amanda does not support dump images larger than a single tape and requires a new tape for each run. If you forget to change a tape, Amanda continues to attempt backups until you insert a new tape, but those backups will not capture the data as you intended them to. Do not use a tape that is too small or forget to change a tape, or you will not be happy with the results.

There is no GUI for Amanda. Configuration is done in the time-honored UNIX tradition of editing text configuration files located in /etc/amanda. The default installation in Ubuntu includes a sample cron file because it is expected that you will be using cron to run Amanda regularly. The client utilities are installed with the package amanda-client; the Amanda server is called amanda-server. Install both. As far as backup schemes are concerned, Amanda calculates an optimal scheme on-the-fly and schedules it accordingly. It can be forced to adhere to a traditional scheme, but other tools are possibly better suited for that job.

The man page for Amanda (the client is amdump) is well written and useful, explaining both the configuration of Amanda and detailing the several programs that actually make up Amanda. The configuration files found in /etc/amanda are well commented; they provide a number of examples to assist you with configuration.

The program’s home page is www.amanda.org. There you can find information about subscribing to the mail list and links to Amanda-related projects and a FAQ.

Alternative Backup Software

Commercial and other freeware backup products do exist; BRU and Veritas are good examples of effective commercial backup products. Here are some useful free software backup tools that are not installed with Ubuntu:

flexbackup—This backup tool is a large Perl script that makes dump and restore easier to use. flexbackup’s command syntax can be found by using the command with the -help argument. It also can use afio, cpio, and tar to create and restore archives locally or over a network using rsh or ssh if security is a concern. Its home page is www.edwinh.org/flexbackup/. Note that it has not received any updates or changes in a very long time.

afio—This tool creates cpio formatted archives but handles input data corruption better than cpio (which does not handle data input corruption very well at all). It supports multivolume archives during interactive operation and can make compressed archives. If you feel the need to use cpio, you might want to check out afio.

Many other alternative backup tools exist, but covering all of them is beyond the scope of this book.

Copying Files

Often, when you have only a few files that you need to protect from loss or corruption, it might make sense to simply copy the individual files to another storage medium rather than create an archive of them. You can use the tar, cp, rsync, and even cpio commands to do this; you can also use a handy file management tool known as mc. tar is the traditional choice because older versions of cp did not handle symbolic links and permissions well at times, causing those attributes (characteristics of the file) to be lost; tar handled those file attributes in a better manner. cp has been improved to fix those problems, but tar is still more widely used. rsync is an excellent choice for mirroring sets of files, especially when done over a network.

To illustrate how to use file copying as a backup technique, the examples here show how to copy (not archive) a directory tree. This tree includes symbolic links and files that have special file permissions you need to keep intact.

Copying Files Using tar

One choice for copying files into another location is to use the tar command; you create a tar archive that is piped to a second tar process, which extracts it in the new location. To accomplish this, first change to the source directory. Then the entire command resembles this:

matthew@seymour:~$ tar -cvf - files | (cd target_directory ; tar -xpf -)

In this command, files stands for the names of the files you want to include; you can use * to include the entire current directory.

When you change to the source directory and execute tar, you use the cvf arguments to do the following:

c—Creates an archive.

v—Specifies verbose; that is, lists the files processed so you can see that it is working.

f—Specifies the filename of the archive. (In this case, it is -, which sends the archive to standard output.)

The following tar command options can be useful for creating file copies for backup purposes:

--one-file-system (historically l)—Stays in the local file system (so that you do not include remote volumes).

--atime-preserve—Does not change access times on files, even though you are accessing them now (to preserve the old access information for archival purposes).

The contents of the tar archive (written to standard output, which is what the - denotes) are then piped to the second expression, which extracts the files to the target directory. In shell programming (refer to Chapter 14), enclosing an expression in parentheses causes it to run in a subshell, so the cd affects only the extracting tar process.

After you change to the target directory, you use the following options with tar:

x—Extracts files from a tar archive.

p—Preserves permissions.

f—Specifies the filename, which in this case is -, meaning tar reads the archive from standard input (the other end of the pipe).

Compressing, Encrypting, and Sending tar Streams

The file copy techniques using the tar command in the previous section can also be used to quickly and securely copy a directory structure across a LAN or the Internet (using the ssh command). One way to make use of these techniques is to use the following command line to first compress the contents of a designated directory and then decompress the compressed and encrypted archive stream into a designated directory on a remote host:

matthew@seymour:~$ tar -cvzf - data_folder | ssh remote_host '( cd ~/mybackup_dir; tar -xvzf - )'

The tar command is used to create and compress an archive of the files in the directory named data_folder, listing them as it works. The output is piped through the ssh (Secure Shell) command and sent to the remote computer named remote_host. On the remote computer, the stream is extracted and saved in the directory named ~/mybackup_dir. You are prompted for a password to send the stream.
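The same technique works in the other direction. To pull a directory from the remote machine down to the local one (again, the host and directory names are placeholders), run tar remotely and extract the stream locally:

matthew@seymour:~$ mkdir ~/restored_data
matthew@seymour:~$ ssh remote_host 'tar -czf - data_folder' | tar -xzvf - -C ~/restored_data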

Copying Files Using cp

To copy files, you could use the cp command. The general format of the command when used for simple copying is as follows:

matthew@seymour:~$ cp -a source_directory target_directory

The -a argument is the same as -dpR:

-d—Preserves symbolic links as links (does not dereference them) rather than copying the files they point to.

-p—Preserves all file attributes, if possible. (File ownership might interfere.)

-R—Copies directories recursively.

You can also use the cp command to quickly replicate directories and retain permissions by using it with the -avR command-line options. Using these options preserves file and directory permissions, gives verbose output, and recursively copies and re-creates subdirectories. You can also create a log of the backup during the backup by redirecting the standard output like this:

matthew@seymour:~$ sudo cp -avR directory_to_backup destination_vol_or_dir 1> /root/backup_log.txt

You can get the same effect this way:

matthew@seymour:~$ sudo cp -avR ubuntu /test2 1> /root/backup_log.txt

This example makes an exact copy of the directory named ubuntu on the volume mounted at /test2 and saves a backup report named backup_log.txt under /root.

Using rsync

An old favorite for backing up is rsync. One big reason for this is that rsync enables you to copy only files that have changed since the last backup. With this tool, although the initial backup might take a long time, subsequent backups are much faster. rsync is also highly configurable and can be used with removable media such as USB hard drives or over a network. Let’s look at one way to use rsync.

First, create an empty file and call it backup.sh:

matthew@seymour:~$ sudo touch backup.sh

Then, using your favorite text editor, enter the following command into the file and save it:

rsync --force --ignore-errors --delete --delete-excluded \
--exclude-from=/home/matthew-exclude.txt --backup \
--backup-dir=`date +%Y-%m-%d` -av \
/ /media/externaldrive/backup/seymour

Make the file executable:

matthew@seymour:~$ sudo chmod +x backup.sh

This command uses several options with rsync and puts them in a script that is quick and easy to remember and run. You can run the script at the command line by using sudo sh ./backup.sh or as an automated cron job.

Here is a rundown of what is going on in the command. Basically, rsync is told to copy all new and changed files (what to back up) and delete from any existing backup any files that have been deleted on the source (and back them up in a special directory, just to be safe). It is told where to place the backup copy and is given details on how to deal with specific issues in the process. (Read the rsync man page for more options and to customize to your needs.)

Following are the options used here:

--force—Forces deletion of directories in the target location that are deleted in the source, even if the directories in the destination are not empty.

--ignore-errors—Tells --delete to go ahead and delete files even when there are I/O errors.

--delete—Deletes extraneous files from destination directories.

--delete-excluded—Also deletes excluded files from destination directories.

--exclude-from=/home/matthew-exclude.txt—Prevents backing up files or directories listed in this file. (It is a simple list with each excluded directory on its own line.)

--backup—Creates backups of files before deleting them from a currently existing backup.

--backup-dir=`date +%Y-%m-%d`—Creates a backup directory for the previously mentioned files that looks like this: 2013-07-08. Why this format for the date? Because it is standard, as outlined in ISO 8601 (see www.iso.org/iso/home/standards/iso8601.htm). It is clear, works with scripts, and sorts beautifully, making your files easy to find.

-av—Tells rsync to use archive mode and verbose mode.

/—Denotes the directory to back up. In this case, it is the root directory of the source, so everything in the filesystem is being backed up. You could put /home here to back up all user directories or make a nice list of directories to exclude in the filesystem.

/media/externaldrive/backup/seymour—Sets the destination for the backup as the /backup/seymour directory on an external hard drive mounted at /media/externaldrive.

To restore from this backup to the same original location, you reverse some of the details and may omit others. Something like this works nicely:

matthew@seymour:~$ rsync --force --ignore-errors --delete --delete-excluded \
/media/externaldrive/backup/seymour /

This becomes even more useful when you think of ways to script its use. You could create an entry in crontab, as described in Chapter 14. Even better, you could set two computers to allow for remote SSH connections using private keys created with ssh-keygen, as described in Chapter 19, so that one could back up the files from one computer to the other computer without requiring manual login. Then you could place that in an automated script.
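As a rough sketch of that kind of automation (the schedule and script path here are arbitrary), a root crontab entry such as the following, added with sudo crontab -e, would run the backup script every night at 2:30 a.m.:

30 2 * * * /home/matthew/backup.sh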

Version Control for Configuration Files

For safety and ease of recovery when configuration files are corrupted or incorrectly edited, the use of a version control system is recommended. In fact, this is considered an industry best practice. Many top-quality version control systems are available, such as Git, Subversion, Mercurial, and Bazaar. If you already have a favorite, perhaps one that you use for code projects, you can do what we describe in this section using that version control system. The suggestions here are to get you thinking about the idea of using version control for configuration files and to introduce a few well-used and documented options for those who are unfamiliar with version control. First, some background.

Version control systems are designed to make it easy to revert changes made to a file, even after the file has been saved. Each system does this a little bit differently, but the basic idea is that not only is the current version of the file saved, but each and every version that existed previously is also saved. Some version control systems do this by saving the entire file every time. Some use metadata to describe just the differences between versions. In any case, it is possible to roll back to a previous version of the file to restore a file to a state before changes were made. Developers who write software are well aware of the power and benefit to being able to do this quickly and easily; it is no longer required that the file editor remember the technical details of where, what, or even how a file has been edited. When a problem occurs, the file is simply restored to its previous state. The version control system is also able to inform the user where and how each file has changed at each save.

Using a version control system for configuration files means that every time a configuration is changed, those changes are recorded and tracked. This enables easy discovery of intruders (if a configuration has been changed by an unauthorized person trying to reset, say, the settings for Apache so that the intruder can allow a rogue web service or site to run on your server), easy recovery from errors and glitches, and easy discovery of new features or settings that have been enabled or included in the configuration by software upgrades.

Many older and well-known tools do this task, such as changetrack, which is quite a good example. All such tools seek to make the job of tracking changes to configuration files easier and quicker, but with the advances in version control systems, most provide very little extra benefit. Instead of suggesting any of these tools, we think you are probably better off learning a modern and good version control system. One exception is worth a bit of discussion because of its ability to work with your software package manager, which saves you the task of remembering to commit changes to your version control system each time the package manager runs. This exception is etckeeper.

etckeeper takes all of your /etc directory and stores the configuration files from it in a version control system repository. You can configure the program by editing the etckeeper.conf file to store data in a Git, Mercurial, Bazaar, or Subversion repository. In addition, etckeeper connects automatically to the APT package management tool used by Ubuntu and automatically commits changes made to /etc and the files in it during normal software package upgrades. Other package managers, such as Yum, can also be tracked when using other Linux distributions such as Fedora. It even tracks file metadata that is often not easily tracked by version control systems, like the permissions in /etc/shadow.

Caution

Using any version control system to track files that contain sensitive data such as passwords can be a security risk. Tracked files and the version control system itself should be treated with the same level of care as the sensitive data itself.

By default, etckeeper uses Git. On Ubuntu, this is changed to Bazaar (bzr) because it is the version control system used by Ubuntu developers. Because this is configurable, we mention just the steps here and leave it to you to adapt them for your particular favorite version control system.

First, edit /etc/etckeeper/etckeeper.conf to use your desired settings, such as the version control system to use, the system package manager being used, and whether to have changes automatically committed daily. After etckeeper is installed from the Ubuntu repositories, it must be initialized from the command line:

matthew@seymour:~$ etckeeper init
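For reference, the relevant settings in /etc/etckeeper/etckeeper.conf look something like the following on a typical Ubuntu system (check your own file, because the exact defaults can vary between releases):

VCS="bzr"
HIGHLEVEL_PACKAGE_MANAGER=apt
LOWLEVEL_PACKAGE_MANAGER=dpkg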

If you are only going to use etckeeper to track changes made to /etc when software updates are made using APT, you do not need to do anything else. If you edit files by hand, make sure you use your version control system’s commands to commit those changes or use the following:

matthew@seymour:~$ etckeeper commit "Changed prompt style"

The message in quotes should reflect the change just made. This makes reading logs and finding exact changes much easier later.

Recovering or reverting file changes is then done using your version control system directly. Suppose, for example, that you have made a change in /etc/bash.bashrc, the file that sets the defaults for your bash shell. You read somewhere how to change the prompt and did not like the result. However, because the changes are being tracked, you can roll it back to the previous version. Because bzr is the default for etckeeper in Ubuntu, here is how you do that with bzr. First, check the log to find the commit number for the previous change:

matthew@seymour:~$ bzr log /etc/bash.bashrc
------------------------------------------------------------
revno: 2
committer: matthew <matthew@seymour>
branch nick: seymour etc repository
timestamp: Fri 2021-07-16 11:08:22 -0700
message:
  Changed /etc/bash.bashrc
------------------------------------------------------------
revno: 1
committer: matthew <matthew@seymour>
branch nick: seymour etc repository
timestamp: Fri 2021-07-16 11:00:16 -0700
message:
  Changed /etc/bash.bashrc
------------------------------------------------------------

If you know that revision 2 (revno is short for revision number) holds the version of the file you want to return to, you can revert the file to that revision:

matthew@seymour:~$ sudo bzr revert --revision 2 /etc/bash.bashrc

Today it is common for programmers, systems administrators, and developer types to back up their dotfiles using version control. Dotfiles are the configuration files and directories in a user’s home directory, all of which begin with a dot, like .bashrc. These are not necessarily backed up by all software, and because they are often customized by highly technical people to suit their desires, backing them up is a good idea. Version control systems are commonly used. A program for Ubuntu called dotdee performs this task for a different type of configuration file or directory that ends with .d and is stored in /etc. You can find more information about dotdee in Chapter 9, “Managing Software.”

System Rescue

There will come a time when you need to engage in system rescue efforts. This need arises when the system will not boot into Linux at all, leaving you unable to recover any files. This problem is most frequently associated with the boot loader program or partition table, but it could be that critical system files have been inadvertently deleted or corrupted. If you have been making backups properly, these kinds of system failures are easily, though not quickly, recoverable through a full restore. Still, valuable current data might not have been backed up since the last scheduled backup, and the backup archives may be found to be corrupt, incomplete, or missing. A full restore also takes time you might not have. If the problem causing the system failure is simply a damaged boot loader, a damaged partition table, a missing library, or misconfiguration, a quick fix can get the system up and running, and the data can then be easily retrieved.

In this section, you learn a couple of quick things to try to restore a broken boot loader or recover your data when your system fails to boot.

The Ubuntu Rescue Disc

The Ubuntu installation DVD or USB drive works quite well as a live rescue system. To use it, insert the medium and reboot the computer, booting from it just as you did when you installed Ubuntu originally.

Restoring the GRUB2 Boot Loader

The easiest way to restore a broken system’s GRUB2 files is simply to replace them. Your best bet is to use installation media from the same release as what you have installed on the hard drive.

To get started, boot using the live DVD and open a terminal. Then determine which of the hard drive’s partitions holds the Ubuntu installation, which you can discover by using the following:

matthew@seymour:~$ sudo fdisk -l

You may find this block ID command useful, as it tends to return a bit more information:

matthew@seymour:~$ sudo blkid

Unless you customized your installation—in which case you probably already know your partitioning scheme and the location of your Ubuntu installation—the installation will probably be on the first partition of a drive called sda (/dev/sda1), which you can mount now by using this:

matthew@seymour:~$ sudo mount /dev/sda1 /mnt

This mounts the drive in the current file system (running from the live DVD) at /mnt, where it will be accessible to you for reading and modifying as needed. Next, you reinstall GRUB2 on this device:

matthew@seymour:~$ sudo grub-install --boot-directory=/mnt/boot /dev/sda

At this point, reboot (using your hard drive and not the live DVD), and all should be well. After the reboot is complete, enter the following:

matthew@seymour:~$ sudo update-grub

This refreshes the GRUB2 menu and completes the restoration. You can find a lot of great information about GRUB2 at https://help.ubuntu.com/community/Grub2.

Saving Files from a Nonbooting Hard Drive

If restoring the GRUB2 boot loader fails and you still cannot boot from the hard drive, try to use the live DVD to recover your data. Boot and mount the hard drive as shown previously and then attach an external storage device such as a USB thumb drive or an external hard drive. Then copy the files you want to save from the mounted drive to the external drive.
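For example, with the failed system's partition mounted at /mnt as shown earlier and a USB drive at /dev/sdb1 (the device name and mount point are examples and may differ on your system), you could copy the home directories to the external drive with rsync, preserving permissions and ownership:

matthew@seymour:~$ sudo mkdir /media/rescue
matthew@seymour:~$ sudo mount /dev/sdb1 /media/rescue
matthew@seymour:~$ sudo rsync -av /mnt/home/ /media/rescue/home-rescue/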

If you cannot mount the drive at all, your options become more limited and possibly more expensive. In this case, it is likely that either the hardware has failed or the file system has become badly corrupted. Either way, recovery is either impossible or more difficult and best left to experts if the data is important to you. But, the good news is that you have been making regular backups, right? So, you probably lost only a day or maybe a week of work and can buy a new drive, install it, and start from scratch, putting the data from your backup on your new Ubuntu installation on the new hardware.

Every experienced system administrator has had a drive fail; no hardware is infallible. We expect occasional hardware failures, and that’s why we have good backup and recovery schemes in place for data. There are two types of system administrators: those who lose data when this happens and those who have good schemes in place. Be forewarned and be wise.

If you cannot boot a drive and do not have a backup, which happens to most system administrators only once in their lives (because they learn from the mistake), immediately stop messing with the hard drive. Your best bet to recover the data will be very expensive, but you should look for a company that specializes in the task and pay them to do it. If your data is not worth the expense for recovery and you want to try to recover it yourself, you can try, but this is not a task for the faint of heart, and more often than not, the data is simply lost. Again, the best course is to back up regularly, check your backups to be sure they are valid, and repeat. Practice restoring from backups before you need to do it, perhaps with a test system that is not vital and will not hurt anything if you make a mistake.

References

https://help.ubuntu.com/community/BackupYourSystem—An excellent place to start for learning and examining backup methods in Ubuntu

www.tldp.org—The Linux Documentation Project, which offers several useful HOWTO documents that discuss backups and disk recovery
