Chapter 12
Protecting Files

  • Objective 3.6: Given a scenario, backup, restore, and compress files.

Protecting data includes creating and managing backups. A backup, often called an archive, is a copy of data that can be restored sometime in the future should the data be destroyed or become corrupted.

Backing up your data is a critical activity. But even more important is planning your backups. These plans include choosing backup types, determining the right compression methods to employ, and identifying which utilities will serve your organization’s data needs best. You may also need to transfer your backup files over the network. In this case, ensuring that the archive is secure during transit is critical, as is validating its integrity once it arrives at its destination. All of these topics concerning protecting your data files are covered in this chapter.

Understanding Backup Types

There are different classifications for data backups. Understanding these various categories is vital for developing your backup plan. The following are the most common backup types:

  • System image
  • Full
  • Incremental
  • Differential
  • Snapshot
  • Snapshot clone

Each of these backup types is explored in this section. Their advantages and disadvantages are included.

System Image A system image is a copy of the operating system binaries, configuration files, and anything else you need to boot the Linux system. Its purpose is to quickly restore your system to a bootable state. Sometimes called a clone, these backups are not normally used to recover individual files or directories, and in the case of some backup utilities, you cannot do so.

Full A full backup is a copy of all the data, ignoring its modification date. This backup type’s primary advantage is that it takes a lot less time than other types to restore a system’s data. However, not only does it take longer to create a full backup compared to the other types, it requires more storage. It needs no other backup types to restore a system fully.

Incremental An incremental backup only makes a copy of data that has been modified since the last backup operation (any backup operation type). Typically, a file’s modified timestamp is compared to the timestamp of the last backup operation. An incremental backup takes a lot less time to create than the other types, and it requires a lot less storage space. However, the data restoration time for this backup type can be significant. Imagine that you performed a full backup on Monday and incremental backups on Tuesday through Friday. On Saturday the disk crashes and must be replaced. After the disk is replaced, you will have to restore the data using Monday’s backup and then continue restoring data using the incremental backups created on Tuesday through Friday. This is very time-consuming and will cause significant delays in getting your system back in operation. Therefore, for optimization purposes, it requires a full backup to be completed periodically.

Differential A differential backup makes a copy of all data that has changed since the last full backup. It could be considered a good balance between full and incremental backups. This backup type takes less time than a full backup but potentially more time than an incremental backup. It requires less storage space than a full backup but more space than a plain incremental backup. Also, it takes a lot less time to restore using differential backups than incremental backups, because only the full backup and the latest differential backup are needed. For optimization purposes, it requires a full backup to be completed periodically.

Snapshot A snapshot backup is considered a hybrid approach, and it is a slightly different flavor of backups. First a full (typically read-only) copy of the data is made to backup media. Then pointers, such as hard links, are employed to create a reference table linking the backup data with the original data. The next time a backup is made, instead of a full backup, an incremental backup occurs (only modified or new files are copied to the backup media), and the pointer reference table is copied and updated. This saves space because only modified files and the updated pointer reference table need to be stored for each additional backup.

The snapshot backup type described here is a copy-on-write snapshot. There is another snapshot flavor called a split-mirror snapshot, where the data is kept on a mirrored storage device. When a backup is run, a copy of all the data is created, not just new or modified data.

With a snapshot backup, you can go back to any point in time and do a full system restore from that point. It also uses a lot less space than the other backup types. In essence, snapshots simulate multiple full backups per day without taking up the same space or requiring the same processing power as a full backup type would. The rsync utility (described later in this chapter) uses this method.

Snapshot Clone Another variation of a snapshot backup is a snapshot clone. Once a snapshot is created, such as an LVM snapshot, it is copied, or cloned. Snapshot clones are useful in high data IO environments. When performing the cloning, you minimize any adverse performance impacts to production data IO because the clone backup takes place on the snapshot and not on the original data.

While not all snapshots are writable, snapshot clones are typically modifiable. If you are using LVM, you can mount these snapshot clones on a different system. Thus, a snapshot clone is useful in disaster recovery scenarios.

Your particular server environment as well as data protection needs will dictate which backup method to employ. Most likely you need a combination of the preceding types to properly protect your data.

Looking at Compression Methods

Backing up data can potentially consume large amounts of additional disk or media space. Depending upon the backup types you employ, you can reduce this consumption via data compression utilities. The following popular utilities are available on Linux:

  • gzip
  • bzip2
  • xz
  • zip

The advantages and disadvantages of each of these data compression methods are explored in this section.

gzip The gzip utility was developed in 1992 as a replacement for the old compress program. Using the Lempel-Ziv (LZ77) algorithm to achieve text-based file compression rates of 60–70%, gzip has long been a popular data compression utility. To compress a file, simply type in gzip followed by the file’s name. The original file is replaced by a compressed version with a .gz file extension. To reverse the operation, type in gunzip followed by the compressed file’s name.

bzip2 Developed in 1996, the bzip2 utility offers higher compression rates than gzip but takes slightly longer to perform the data compression. The bzip2 utility employs multiple layers of compression techniques and algorithms. Until 2013, this data compression utility was used to compress the Linux kernel for distribution. To compress a file, simply type in bzip2 followed by the file’s name. The original file is replaced by a compressed version with a .bz2 file extension. To reverse the operation, type in bunzip2 followed by the compressed file’s name, which decompresses the data.

Originally there was a bzip utility program. However, its layered approach employed a patented data compression algorithm. Thus, bzip2 was created to replace it, using the patent-free Huffman coding algorithm instead.

xz Developed in 2009, the xz data compression utility quickly became very popular among Linux administrators. It boasts a higher default compression rate than bzip2 and gzip via the LZMA2 compression algorithm, though with certain xz command options you can employ the legacy LZMA compression algorithm if needed. In 2013, the xz utility replaced bzip2 for compressing the Linux kernel for distribution. To compress a file, simply type in xz followed by the file’s name. The original file is replaced by a compressed version with an .xz file extension. To reverse the operation, type in unxz followed by the compressed file’s name.

zip The zip utility is different from the other data compression utilities in that it operates on multiple files. If you have ever created a zip file on a Windows operating system, then you’ve used this file format. Multiple files are packed together in a single archive file (what Windows calls a compressed folder) and then compressed. Another difference from the other Linux compression utilities is that zip does not replace the original file(s). Instead it places a copy of the file(s) into the archive file.

To archive and compress files with zip, type in zip followed by the final archive file’s name, which traditionally ends in a .zip extension. After the archive file, type in one or more files you desire to place into the compressed archive, separating them with a space. The original files remain intact, but a copy of them is placed into the compressed zip archive file. To reverse the operation, type in unzip followed by the compressed archive file’s name.
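If you need to archive an entire directory tree, zip provides the -r option to recurse into subdirectories. Here is a minimal sketch, assuming a Projects subdirectory exists in the current directory (the names are illustrative):

$ zip -r Projects.zip Projects/
$ unzip -l Projects.zip

The unzip -l command lists the archive’s contents without extracting them, which is a quick way to confirm what was packed.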

It’s helpful to see a side-by-side comparison of the various compression utilities using their defaults. In Listing 12.1, an example on a CentOS Linux distribution is shown.

Listing 12.1: Comparing the various Linux compression utilities

# cp /var/log/wtmp wtmp
#
# cp wtmp wtmp1
# cp wtmp wtmp2
# cp wtmp wtmp3
# cp wtmp wtmp4
#
# ls -lh wtmp?
-rw-r--r--. 1 root root 210K Oct  9 19:54 wtmp1
-rw-r--r--. 1 root root 210K Oct  9 19:54 wtmp2
-rw-r--r--. 1 root root 210K Oct  9 19:54 wtmp3
-rw-r--r--. 1 root root 210K Oct  9 19:54 wtmp4
#
# gzip wtmp1
# bzip2 wtmp2
# xz wtmp3
# zip wtmp4.zip wtmp4
  adding: wtmp4 (deflated 96%)
#
# ls -lh wtmp?.*
-rw-r--r--. 1 root root 7.7K Oct  9 19:54 wtmp1.gz
-rw-r--r--. 1 root root 6.2K Oct  9 19:54 wtmp2.bz2
-rw-r--r--. 1 root root 5.2K Oct  9 19:54 wtmp3.xz
-rw-r--r--. 1 root root 7.9K Oct  9 19:55 wtmp4.zip
#
# ls wtmp?
wtmp4
#

In Listing 12.1, first the /var/log/wtmp file is copied to the local directory using super user privileges. Four copies of this file are then made. Using the ls -lh command, you can see in human-readable format that the wtmp files are 210K in size. Next, the various compression utilities are employed. Notice that when using the zip command, you must give it the name of the archive file, wtmp4.zip, and follow it with any file names. In this case, only wtmp4 is put into the zip archive. After the files are compressed with the various utilities, another ls -lh command is issued in Listing 12.1. Notice the various file extension names as well as the files’ compressed sizes. You can see that the xz program produces the highest compression of this file, because its file is the smallest in size. The last command in Listing 12.1 shows that all the compression programs but zip removed the original file.

For the gzip, bzip2, and xz data compression utilities, you can specify the level of compression and control the speed via the -# option. The # is a number from 1 to 9, where 1 is the fastest but lowest compression and 9 is the slowest but highest compression method. The zip utility accepts these numeric levels as well. Typically, the utilities use -6 as the default compression level. It is a good idea to review these level specifications in each utility’s man page, as there are useful but subtle differences.
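As a quick sketch of these levels in action, mirroring the file copies in Listing 12.1 (the test file names are illustrative):

# cp /var/log/wtmp speed.test
# cp /var/log/wtmp size.test
# gzip -1 speed.test
# gzip -9 size.test
# ls -lh speed.test.gz size.test.gz

Comparing the two resulting file sizes shows the trade-off: -1 finishes faster, while -9 squeezes out a smaller file.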

There are many compression methods. However, when you use a compression utility along with an archive and restore program for data backups, it is vital that you use a lossless compression method. A lossless compression is just as it sounds: no data is lost. The gzip, bzip2, xz, and zip utilities provide lossless compression. Obviously it is important not to lose data when doing backups.

Comparing Archive and Restore Utilities

There are several programs you can employ for managing backups. Some of the more popular products are Amanda, Bacula, Bareos, Duplicity, and BackupPC. Yet, often these GUI and/or web-based programs have command-line utilities at their core. Our focus here is on those command-line utilities:

  • cpio
  • dd
  • rsync
  • tar

Copying with cpio

The cpio utility’s name stands for “copy in and out.” It gathers together file copies and stores them in an archive file. The program has several useful options. The more commonly used ones are described in Table 12.1.

Table 12.1 The cpio command’s commonly used options

Short Long Description
-I N/A Designates an archive file to use.
-i --extract Copies files from an archive or displays the files within the archive, depending upon the other options employed. Called copy-in mode.
N/A --no-absolute-filenames Designates that only relative path names are to be used. (The default is to use absolute path names.)
-o --create Creates an archive by copying files into it. Called copy-out mode.
-t --list Displays a list of files within the archive. This list is called a table of contents.
-v --verbose Displays each file’s name as each file is processed.

To create an archive using the cpio utility, you have to generate a list of files and then pipe them into the command. Listing 12.2 shows an example of doing this task.

Listing 12.2: Employing cpio to create an archive

$ ls Project4?.txt
Project42.txt  Project43.txt  Project44.txt
Project45.txt  Project46.txt
$
$ ls Project4?.txt | cpio -ov > Project4x.cpio
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
59 blocks
$
$ ls Project4?.*
Project42.txt  Project44.txt  Project46.txt
Project43.txt  Project45.txt  Project4x.cpio
$

Using the ? wildcard and the ls command, various text files within the present working directory are displayed first in Listing 12.2. This command is then run again, and its STDOUT is piped as STDIN to the cpio utility. (See Chapter 4 if you need a refresher on STDOUT and STDIN.) The options used with the cpio command are -ov, which create an archive containing copies of the listed files and display each file’s name as it is copied into the archive. The archive file is named Project4x.cpio. Though not necessary, it is considered good form to use the .cpio extension on cpio archive files.

You can back up data based upon its metadata, and not its file location, via the cpio utility. For example, suppose you want to create a cpio archive for any files within the virtual directory system owned by the JKirk user account. You can use the find / -user JKirk command and pipe it into the cpio utility in order to create the archive file. This is a handy feature.
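A minimal sketch of that approach, run with super user privileges so find can traverse the whole directory tree (the JKirk account and archive name are illustrative):

# find / -user JKirk | cpio -ov > /tmp/JKirk_files.cpio

The find command writes one matching path name per line to STDOUT, which is exactly the input format cpio’s copy-out mode expects.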

You can view the files stored within a cpio archive fairly easily. Just employ the cpio command again, and use its -itv options and the -I option to designate the archive file, as shown in Listing 12.3.

Listing 12.3: Using cpio to list an archive’s contents

$ cpio -itvI Project4x.cpio
-rw-r--r--   1 Christin Christin 29900 Aug 19 17:37 Project42.txt
-rw-rw-r--   1 Christin Christin     0 Aug 19 18:07 Project43.txt
-rw-rw-r--   1 Christin Christin     0 Aug 19 18:07 Project44.txt
-rw-rw-r--   1 Christin Christin     0 Aug 19 18:07 Project45.txt
-rw-rw-r--   1 Christin Christin     0 Aug 19 18:07 Project46.txt
59 blocks
$

Though not displayed in Listing 12.3, the cpio utility maintains each file’s absolute directory reference. Thus, it is often used to create system image and full backups.

To restore files from an archive, employ just the -ivI options. However, because cpio maintains the files’ absolute paths, this can be tricky if you need to restore the files to another directory location. To do this, you need to use the --no-absolute-filenames option, as shown in Listing 12.4.

Listing 12.4: Using cpio to restore files to a different directory location

$ ls -dF Projects
Projects/
$
$ mv Project4x.cpio Projects/
$
$ cd Projects
$ pwd
/home/Christine/Answers/Projects
$
$ ls Project4?.*
Project4x.cpio
$
$ cpio -iv --no-absolute-filenames -I Project4x.cpio
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
59 blocks
$
$ ls Project4?.*
Project42.txt  Project44.txt  Project46.txt
Project43.txt  Project45.txt  Project4x.cpio
$

In Listing 12.4 the Project4x.cpio archive file is moved into a preexisting subdirectory, Projects. By stripping the absolute path names from the archived files via the --no-absolute-filenames option, you restore the files to a new directory location. If you wanted to restore the files to their original location, simply leave that option off and just use the other cpio switches shown in Listing 12.4.

Archiving with tar

The tar utility’s name stands for tape archiver, and it is popular for creating data backups. As with cpio, with the tar command, the selected files are copied and stored in a single file. This file is called a tar archive file. If this archive file is compressed using a data compression utility, the compressed archive file is called a tarball.

The tar program has several useful options. The more commonly used ones for creating data backups are described in Table 12.2.

Table 12.2 The tar command’s commonly used tarball creation options

Short Long Description
-c --create Creates a tar archive file. The backup can be a full or incremental backup, depending upon the other selected options.
-u --update Appends files to an existing tar archive file, but only copies those files that were modified since the original archive file was created.
-g --listed-incremental Creates an incremental or full archive based upon metadata stored in the provided file.
-z --gzip Compresses tar archive file into a tarball using gzip.
-j --bzip2 Compresses tar archive file into a tarball using bzip2.
-J --xz Compresses tar archive file into a tarball using xz.
-v --verbose Displays each file’s name as each file is processed.

To create an archive using the tar utility, you have to add a few arguments to the options and the command. Listing 12.5 shows an example of creating a tar archive.

Listing 12.5: Using tar to create an archive file

$ ls Project4?.txt
Project42.txt  Project43.txt  Project44.txt
Project45.txt  Project46.txt
$
$ tar -cvf Project4x.tar Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$

In Listing 12.5, three options are used. The -c option creates the tar archive. The -v option displays the file names as they are placed into the archive file. Finally, the -f option designates the archive file name, which is Project4x.tar. Though not required, it is considered good form to use the .tar extension on tar archive files. The command’s last argument designates the files to copy into this archive.

You can also use the old-style tar command options. For this style, you remove the single dash from the beginning of the tar option. For example, -c becomes c. Keep in mind that additional old-style tar command options must not have spaces between them. Thus, tar cvf is valid, but tar c v f is not.

If you are backing up lots of files or large amounts of data, it is a good idea to employ a compression utility. This is easily accomplished by adding an additional switch to your tar command options. An example is shown in Listing 12.6, which uses gzip compression to create a tarball.

Listing 12.6: Using tar to create a tarball

$ tar -zcvf Project4x.tar.gz Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$
$ ls Project4x.tar.gz
Project4x.tar.gz
$

Notice in Listing 12.6 that the tarball file name has the .tar.gz file extension. It is considered good form to use the .tar extension and tack on an indicator showing the compression method that was used. However, you can shorten it to .tgz if desired.

There is a useful variation of this command to create both full and incremental backups. A simple example helps to explain this concept. The process for creating a full backup is shown in Listing 12.7.

Listing 12.7: Using tar to create a full backup

$ tar -g FullArchive.snar -Jcvf Project42.txz Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$
$ ls FullArchive.snar Project42.txz
FullArchive.snar  Project42.txz
$

Notice the -g option in Listing 12.7. The -g option creates a snapshot file, FullArchive.snar. The .snar file extension indicates that the file is a tarball snapshot file. The snapshot file contains metadata used in association with tar commands for creating full and incremental backups. Because the snapshot file contains file timestamps, the tar utility can determine if a file has been modified since it was last backed up. The snapshot file is also used to identify any files that are new and to determine if files have been deleted since the last backup.

The previous example created a full backup of the designated files along with the metadata snapshot file, FullArchive.snar. Now the same snapshot file will be used to determine which files have been modified, are new, or have been deleted, in order to create an incremental backup, as shown in Listing 12.8.

Listing 12.8: Using tar to create an incremental backup

$ echo "Answer to everything" >> Project42.txt
$
$ tar -g FullArchive.snar -Jcvf Project42_Inc.txz Project4?.txt
Project42.txt
$
$ ls Project42_Inc.txz
Project42_Inc.txz
$

In Listing 12.8, the file Project42.txt is modified. Again, the tar command uses the -g option and points to the previously created FullArchive.snar snapshot file. This time, the metadata within FullArchive.snar tells the tar command that the Project42.txt file has been modified since the previous backup. Therefore, the new tarball only contains the Project42.txt file, and it is effectively an incremental backup. You can continue to create additional incremental backups using the same snapshot file as needed.

The tar command views full and incremental backups in levels. A full backup is one that includes all of the files indicated, and it is considered a level 0 backup. The first tar incremental backup after a full backup is considered a level 1 backup. The second tar incremental backup is considered a level 2 backup, and so on.
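One practical consequence: each backup updates the snapshot file, so successive incrementals advance a level. If instead you want every incremental to stay at level 1 (relative to the full backup), a common approach is to run each incremental against a copy of the original snapshot file. A sketch, assuming the files from Listing 12.7 still exist (the copied snapshot file name is illustrative):

$ cp FullArchive.snar Level1.snar
$ tar -g Level1.snar -Jcvf Project42_L1.txz Project4?.txt

Because the pristine FullArchive.snar is left untouched, you can repeat these two commands later to get a fresh level 1 backup containing everything changed since the full backup.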

Whenever you create data backups, it is a good practice to verify them. Table 12.3 provides some tar command options for viewing and verifying data backups.

Table 12.3 The tar command’s commonly used archive verification options

Short Long Description
-d --compare, --diff Compares a tar archive file’s members with external files and lists the differences.
-t --list Displays a tar archive file’s contents.
-W --verify Verifies each file as the file is processed. This option cannot be used with the compression options.

Backup verification can take several different forms. You might ensure that the desired files (sometimes called members) are included in your backup by using the -v option on the tar command in order to watch the files being listed as they are included in the archive file. You can also verify that desired files are included in your backup after the fact. Use the -t option to list tarball or archive file contents. An example is shown in Listing 12.9.

Listing 12.9: Using tar to list a tarball’s contents

$ tar -tf Project4x.tar.gz
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$

You can verify files within an archive file by comparing them against the current files. The option to accomplish this task is the -d option. An example is shown in Listing 12.10.

Listing 12.10: Using tar to compare tarball members to external files

$ tar -df Project4x.tar.gz
Project42.txt: Mod time differs
Project42.txt: Size differs
$

Another good practice is to verify your backup automatically immediately after the tar archive is created. This is easily accomplished by tacking on the -W option, as shown in Listing 12.11.

Listing 12.11: Using tar to verify backed-up files automatically

$ tar -Wcvf ProjectVerify.tar Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
Verify Project42.txt
Verify Project43.txt
Verify Project44.txt
Verify Project45.txt
Verify Project46.txt
$

You cannot use the -W option if you employ compression to create a tarball. However, you could create and verify the archive first and then compress it in a separate step. You can also use the -W option when you extract files from a tar archive. This is handy for instantly verifying files restored from archives.
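A sketch of that two-step approach, reusing the files from Listing 12.11:

$ tar -Wcvf ProjectVerify.tar Project4?.txt
$ gzip ProjectVerify.tar
$ ls ProjectVerify.tar*
ProjectVerify.tar.gz

The verification happens while the uncompressed archive is built; gzip then replaces the archive with the compressed tarball.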

Table 12.4 lists some of the options that you can use with the tar utility to restore data from a tar archive file or tarball. Be aware that several options used to create the backup, such as -g and -W, can also be used when restoring data.

Table 12.4 The tar command’s commonly used file restore options

Short Long Description
-x --extract, --get Extracts files from a tarball or archive file and places them in the current working directory
-z --gunzip Decompresses files in a tarball using gunzip
-j --bunzip2 Decompresses files in a tarball using bunzip2
-J --unxz Decompresses files in a tarball using unxz

Extracting files from an archive or tarball is fairly simple with the tar utility. Listing 12.12 shows an example of extracting files from a previously created tarball.

Listing 12.12: Using tar to extract files from a tarball

$ mkdir Extract
$
$ mv Project4x.tar.gz Extract/
$
$ cd Extract
$
$ tar -zxvf Project4x.tar.gz
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$
$ ls
Project42.txt  Project44.txt  Project46.txt
Project43.txt  Project45.txt  Project4x.tar.gz
$

In Listing 12.12, a new subdirectory, Extract, is created. The tarball created back in Listing 12.6 is moved to the new subdirectory, and then the files are restored from the tarball. If you compare the tar command used in this listing to the one used in Listing 12.6, you’ll notice that here the -x option was substituted for the -c option used in Listing 12.6. Also notice in Listing 12.12 that the tarball is not removed after a file extraction, so you can use it again and again, as needed.

The tar command has many additional capabilities, such as using tar backup parameters and/or the ability to create backup and restore shell scripts. Take a look at the GNU tar website, https://www.gnu.org/software/tar/manual/, to learn more about this popular command-line backup utility.

Since the tar utility is the tape archiver, you can also place your tarballs or archive files on tape, if desired. After mounting and properly positioning your tape, simply substitute your SCSI tape device file name, such as /dev/st0 or /dev/nst0, in place of the archive or tarball file name within your tar command.
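For example, assuming a SCSI tape drive at /dev/st0 that is already loaded and positioned, a full backup of the Project files could be written to tape and then listed back for verification:

# tar -cvf /dev/st0 Project4?.txt
# tar -tf /dev/st0

Using the rewinding device file (/dev/st0) rewinds the tape after each operation, while the non-rewinding device file (/dev/nst0) leaves the tape positioned so you can append additional archives.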

Duplicating with dd

The dd utility allows you to back up nearly everything on a disk, including the old Master Boot Record (MBR) partitions some older Linux distributions still employ. It’s primarily used to create low-level copies of an entire hard drive or partition. It is often used in digital forensics, for creating system images, for copying damaged disks, and for wiping partitions.

The command itself is fairly straightforward. The basic syntax structure for the dd utility is as follows:

dd  if=input-device of=output-device [OPERANDS]

Both the input-device and the output-device are either an entire drive or a partition. Just make sure that you get the right device for of and the right one for if; otherwise, you may unintentionally wipe data.

Besides the of and if, there are a few other arguments (called operands) that can assist in dd operations. The more commonly used ones are described in Table 12.5.

Table 12.5 The dd command’s commonly used operands

Operand Description
bs=BYTES Sets the maximum block size (number of BYTES) to read and write at a time. The default is 512 bytes.
count=N Sets the number (N) of input blocks to copy.
status=LEVEL Sets the amount (LEVEL) of information to display to STDERR.

The status=LEVEL operand needs a little more explanation. LEVEL can be set to one of the following:

  • none only displays error messages.
  • noxfer does not display final transfer statistics.
  • progress displays periodic transfer statistics.

It is usually easier to understand the dd utility through examples. A snipped example of performing a bit-by-bit copy of one entire disk to another disk is shown in Listing 12.13.

Listing 12.13: Using dd to copy an entire disk

# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
[…]
sdb               8:16   0    4M  0 disk
└─sdb1            8:17   0    4M  0 part
sdc               8:32   0    1G  0 disk
└─sdc1            8:33   0 1023M  0 part
[…]
#
# dd if=/dev/sdb of=/dev/sdc status=progress
8192+0 records in
8192+0 records out
4194304 bytes (4.2 MB) copied, 0.232975 s, 18.0 MB/s
#

In Listing 12.13, the lsblk command is used first. When copying disks via the dd utility, it is prudent to make sure the drives are not mounted anywhere in the virtual directory structure. The two drives involved in this operation, /dev/sdb and /dev/sdc, are not mounted. With the dd command, the if operand is used to indicate the disk we wish to copy, which is the /dev/sdb drive. The of operand indicates that the /dev/sdc disk will hold the copied data. Also, the status=progress operand displays periodic transfer statistics. You can see in Listing 12.13 from the transfer statistics that there is not much data on /dev/sdb, so the dd operation finished quickly.

You can also create a system image backup using a dd command similar to the one in shown in Listing 12.13, with a few needed modifications. The basic steps are as follows:

  1. Shut down your Linux system.
  2. Attach the necessary spare drives. You’ll need one drive the same size or larger for each system drive.
  3. Boot the system using a live CD, DVD, or USB so that you can either keep the system’s drives unmounted or unmount them prior to the backup operation.
  4. For each system drive, issue a dd command, specifying the drive to back up with the if operand and the spare drive with the of operand (see the sketch after this list).
  5. Shut down the system, and remove the spare drives containing the system image.
  6. Reboot your Linux system.
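A sketch of the dd command in step 4, assuming the system drive is /dev/sda and the attached spare is /dev/sdb (device names will vary on your system):

# dd if=/dev/sda of=/dev/sdb bs=64K status=progress

The larger block size (bs=64K) typically speeds up a whole-disk copy compared to the 512-byte default.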

If you have a disk you are getting rid of, you can also use the dd command to zero out the disk. An example is shown in Listing 12.14.

Listing 12.14: Using dd to zero an entire disk

# dd if=/dev/zero of=/dev/sdc status=progress
1061724672 bytes (1.1 GB) copied, 33.196299 s, 32.0 MB/s
dd: writing to ‘/dev/sdc’: No space left on device
2097153+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 34.6304 s, 31.0 MB/s
#

The if=/dev/zero uses the zero device file to write zeros to the disk. You need to perform this operation at least 10 times to thoroughly wipe the disk. You can also employ the /dev/random and/or the /dev/urandom device files to put random data onto the disk. This particular task can take a long time to run for large disks. It is still better to shred any disks that will no longer be used by your company.
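A sketch of a random-data pass on the same disk from Listing 12.14 (the device name is assumed; double-check it with lsblk first):

# dd if=/dev/urandom of=/dev/sdc bs=1M status=progress

The /dev/urandom device is usually preferred over /dev/random for this job because it does not block while waiting for entropy, so the wipe runs at a steadier pace.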

Replicating with rsync

Originally covered in Chapter 3, the rsync utility is known for speed. With this program, you can copy files locally or remotely, and it is wonderful for creating backups.

Before exploring the rsync program, it is a good idea to review a few of the commonly used options. Table 3.4 in Chapter 3 contains the more commonly used rsync options. Besides the options listed in Table 3.4, there are a few additional switches that help with secure data transfers via the rsync utility:

  • The -e (or --rsh) option changes the program used for communication between a local and remote connection. The default is OpenSSH.
  • The -z (or --compress) option compresses the file data during the transfer.

Back in Chapter 3 we briefly mentioned the archive option, -a (or --archive), which directs rsync to perform a backup copy. However, it needs a little more explanation. This option is the equivalent of using the -rlptgoD options and does the following:

  • Directs rsync to copy the directory’s contents and to descend into any subdirectory within the original directory tree, copying their contents as well (recursively).
  • Preserves the following items:
    • Device files (only if run with super user privileges)
    • File group
    • File modification time
    • File ownership (only if run with super user privileges)
    • File permissions
    • Special files
    • Symbolic links

It’s fairly simple to conduct an rsync backup locally. The most popular options, -ahv, allow you to back up files to a local location quickly, as shown in Listing 12.15.

Listing 12.15: Using rsync to back up files locally

$ ls -sh *.tar
40K Project4x.tar  40K ProjectVerify.tar
$
$ mkdir TarStorage
$
$ rsync -avh *.tar TarStorage/
sending incremental file list
Project4x.tar
ProjectVerify.tar

sent 82.12K bytes  received 54 bytes  164.35K bytes/sec
total size is 81.92K  speedup is 1.00
$
$ ls TarStorage
Project4x.tar  ProjectVerify.tar
$

Where the rsync utility really shines is with protecting files as they are backed up over a network.

For a secure remote copy to work, you need the OpenSSH service up and running on the remote system. In addition, the rsync utility must be installed on both the local and remote machines. An example of using the rsync command to securely copy files over the network is shown in Listing 12.16.

Listing 12.16: Using rsync to back up files remotely

$ ls -sh *.tar
40K Project4x.tar  40K ProjectVerify.tar
$
$ rsync -avP -e ssh *.tar [email protected]:~
[email protected]’s password:
sending incremental file list
Project4x.tar
      40,960 100%    7.81MB/s    0:00:00 (xfr#1, to-chk=1/2)
ProjectVerify.tar
      40,960 100%   39.06MB/s    0:00:00 (xfr#2, to-chk=0/2)

sent 82,121 bytes  received 54 bytes  18,261.11 bytes/sec
total size is 81,920  speedup is 1.00
$

Notice in Listing 12.16 that the -avP options are used with the rsync utility. These options not only set the copy mode to archive but also provide detailed information as the file transfers take place. The important switch to notice in this listing is the -e option. This option specifies that OpenSSH is used for the transfer and effectively creates an encrypted tunnel so that anyone sniffing the network cannot see the data flowing by. The *.tar in the command simply selects which local files are to be copied to the remote machine. The last argument in the rsync command specifies the following:

  • The user account (user1) located at the remote system to use for the transfer.
  • The remote system’s IPv4 address, but a hostname can be used instead.
  • Where the files are to be placed. In this case, it is the home directory, indicated by the ~ symbol.

Notice also in that last argument that there is a needed colon (:) between the IPv4 address and the directory symbol. If you do not include this colon, rsync treats the entire destination as a local file name and copies the files to a new file in the local directory instead of to the remote system.

The rsync utility uses OpenSSH by default. However, it’s good practice to use the -e option. This is especially true if you are using any ssh command options, such as designating an OpenSSH key to employ or using a different port than the default port of 22. OpenSSH is covered in more detail in Chapter 16.
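For instance, if the remote OpenSSH server listens on a nonstandard port, you can pass the ssh option through -e. A sketch, where the port 2222 and the user1 account are purely illustrative:

$ rsync -avP -e "ssh -p 2222" *.tar [email protected]:~

Quoting the -e argument keeps the ssh command and its option together as a single program string for rsync.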

The rsync utility can be handy for copying large files to remote media. If you have a fast CPU but a slow network connection, you can speed things up even more by employing the rsync -z option to compress the data for transfer. This is not using gzip compression but instead applying compression via the zlib compression library. You can find out more about zlib at https://zlib.net/.
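A sketch combining in-transit compression with a secure transfer (the user1 name is illustrative, as in the earlier listing):

$ rsync -avPz -e ssh *.tar [email protected]:~

The data is compressed only while in transit; the files arrive on the remote system in their original form.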

Securing Offsite/Off-System Backups

In business, data is money. Thus it is critical not only to create data archives but also to protect them. There are a few additional ways to secure your backups when they are being transferred to remote locations.

Besides rsync, you can use the scp utility, which is based on the Secure Copy Protocol (SCP). Also, the sftp program, which is based on the SSH File Transfer Protocol (SFTP), is a means for securely transferring archives. We’ll cover both utilities in the following sections.

Copying Securely via scp

The scp utility is geared for quickly transferring files in a noninteractive manner between two systems on a network. This program employs OpenSSH.

It is best used for small files that you need to securely copy on the fly, because if it gets interrupted during its operation, it cannot pick back up where it left off. For larger files or larger collections of files, it is better to employ either the rsync or sftp utility.

There are some rather useful scp options. A few of the more commonly used switches are listed in Table 12.6.

Table 12.6 The scp command’s commonly used copy options

Short Description
-C Compresses the file data during transfer
-p Preserves file access and modification times as well as file permissions
-r Copies the directory’s contents and the contents of any subdirectory within the original directory tree (recursively)
-v Displays verbose information concerning the command’s execution

Performing a secure copy of files from a local system to a remote system is rather simple. You do need the OpenSSH service up and running on the remote system. An example is shown in Listing 12.17.

Listing 12.17: Using scp to copy files securely to a remote system

$ scp Project42.txt  [email protected]:~
[email protected]’s password:
Project42.txt   100%   29KB  20.5MB/s   00:00
$

Notice that to accomplish this task, no scp command options are employed. The -v option gives a great deal of information that is not needed in this case.

The scp utility will overwrite any remote files with the same name as the one being transferred, without asking or even displaying a message stating that fact. You need to be careful when copying files using scp that you don’t tromp on any existing files.

A handy way to use scp is to copy files from one remote machine to another remote machine. An example is shown in Listing 12.18.

Listing 12.18: Using scp to copy files securely from/to a remote system

$ ip addr show | grep 192 | cut -d" " -f6
192.168.0.101/24
$
$ scp [email protected]:Project42.txt [email protected]:~
[email protected]’s password:
[email protected]’s password:
Project42.txt                100%   29KB   4.8MB/s   00:00
Connection to 192.168.0.104 closed.
$

First in Listing 12.18, the current machine’s IPv4 address is checked using the ip addr show command. Next the scp utility is employed to copy the Project42.txt file from one remote machine to another. Of course, you must have OpenSSH running on these machines and have a user account you can log into as well.

Transferring Securely via sftp

The sftp utility will also allow you to transfer files securely across the network. However, it is designed for a more interactive experience. With sftp, you can create directories as needed, immediately check on transferred files, determine the remote system’s present working directory, and so on. In addition, this program also employs OpenSSH.

To get a feel for how this interactive utility works, it’s good to see a simple example. One is shown in Listing 12.19.

Listing 12.19: Using sftp to access a remote system

$ sftp [email protected]
[email protected]’s password:
Connected to 192.168.0.104.
sftp>
sftp> bye
$

In Listing 12.19, the sftp utility is used with a username and a remote host’s IPv4 address. Once the user account’s correct password is entered, the sftp utility’s prompt is shown. At this point, you are connected to the remote system. At the prompt you can enter any commands, including help to see a display of all the possible commands and, as shown in the listing, bye to exit the utility. Once you have exited the utility, you are no longer connected to the remote system.

Before using the sftp interactive utility, it’s helpful to know some of the more common commands. A few are listed in Table 12.7.

Table 12.7 The sftp command’s commonly used commands

Command Description
bye Exits the remote system and quits the utility.
exit Exits the remote system and quits the utility.
get Gets a file (or files) from the remote system and stores it (them) on the local system. Called downloading.
reget Resumes an interrupted get operation.
put Sends a file (or files) from the local system and stores it (them) on the remote system. Called uploading.
reput Resumes an interrupted put operation.
ls Displays files in the remote system’s present working directory.
lls Displays files in the local system’s present working directory.
mkdir Creates a directory on the remote system.
lmkdir Creates a directory on the local system.
progress Toggles on/off the progress display. (Default is on.)

It can be a little tricky the first few times you use the sftp utility if you have never used an FTP interactive program in the past. An example of sending a local file to a remote system is shown in Listing 12.20.

Listing 12.20: Using sftp to copy a file to a remote system

$ sftp [email protected]
[email protected]’s password:
Connected to 192.168.0.104.
sftp>
sftp> ls
Desktop    Documents   Downloads   Music   Pictures
Public     Templates
Videos
sftp>
sftp> lls
AccountAudit.txt  Grades.txt         Project43.txt  ProjectVerify.tar
err.txt           Life               Project44.txt  TarStorage
Everything        NologinAccts.txt   Project45.txt  Universe
Extract           Project42_Inc.txz  Project46.txt
FullArchive.snar  Project42.txt      Project4x.tar
Galaxy            Project42.txz      Projects
sftp>
sftp> put Project4x.tar
Uploading Project4x.tar to /home/Christine/Project4x.tar
Project4x.tar               100%   40KB  15.8MB/s   00:00
sftp>
sftp> ls
Desktop        Documents   Downloads   Music   Pictures
Project4x.tar  Public      Templates   Videos
sftp>
sftp> exit
$

In Listing 12.20, after the connection to the remote system is made, the ls command is used in the sftp utility to see the files in the remote user’s directory. The lls command is used to see the files within the local user’s directory. Next the put command is employed to send the Project4x.tar archive file to the remote system. There is no need to issue the progress command because by default progress reports are already turned on. Once the upload is completed, another ls command is used to see if the file is now on the remote system, and it is.
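Downloading works the same way in the other direction via the get command; if a download is interrupted, reget resumes it where it left off. A sketch of retrieving the same file back from the remote system (the output shown is illustrative):

sftp> get Project4x.tar
Fetching /home/Christine/Project4x.tar to Project4x.tar
Project4x.tar               100%   40KB  15.8MB/s   00:00
sftp>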


Backup Rule of Three

Businesses need to have several archives in order to properly protect their data. The Backup Rule of Three is typically good for most organizations, and it dictates that you should have three archives of all your data. One archive is stored remotely to prevent natural disasters or other catastrophic occurrences from destroying all your backups. The other two archives are stored locally, but each is on a different media type. You hear about the various statistics concerning companies that go out of business after a significant data loss. A scarier statistic would be the number of system administrators who lose their jobs after such a data loss because they did not have proper archival and restoration procedures in place.

The rsync, scp, and sftp utilities all provide a means to securely copy files. However, when determining what utilities to employ for your various archival and retrieval plans, keep in mind that one utility will not work effectively in every backup case. For example, generally speaking, rsync is better to use than scp in backups because it provides more options. However, if you just have a few files that need secure copying, scp works well. The sftp utility works well for any interactive copying, yet scp is faster because sftp is designed to acknowledge every packet sent across the network. It’s most likely you will need to employ all of these various utilities in some way throughout your company’s backup plans.

Checking Backup Integrity

Securely transferring your archives is not enough. You need to consider the possibility that the archives could become corrupted during transfer.

Ensuring a backup file’s integrity is fairly easy. A few simple utilities can help.

Digesting an MD5 Algorithm

The md5sum utility is based on the MD5 message-digest algorithm. It was originally created to be used in cryptography. It is no longer used in such capacities due to various known vulnerabilities. However, it is still excellent for checking a file’s integrity.

A simple example is shown in Listing 12.21 and Listing 12.22. Using the file that was uploaded via sftp earlier in the chapter, the md5sum utility is run on both the original and the uploaded file.

Listing 12.21: Using md5sum to check the original file

$ ip addr show | grep 192 | cut -d" " -f6
192.168.0.101/24
$
$ md5sum Project4x.tar
efbb0804083196e58613b6274c69d88c  Project4x.tar

Listing 12.22: Using md5sum to check the uploaded file

$ ip addr show | grep 192 | cut -d" " -f6
192.168.0.104/24
$
$ md5sum Project4x.tar
efbb0804083196e58613b6274c69d88c  Project4x.tar
$

The md5sum produces a 128-bit hash value. You can see from the results in the two listings that the hash values match. This indicates no file corruption occurred during its transfer.

A malicious attacker can create two files that have the same MD5 hash value. However, at this point in time, a file that is not under the attacker’s control cannot have its MD5 hash value modified. Therefore, it is imperative that you have checks in place to ensure that your original backup file was not created by a third-party malicious user. An even better solution is to use a stronger hash algorithm.

Securing Hash Algorithms

The Secure Hash Algorithms (SHA) are a family of hash functions. Though typically used for cryptography purposes, they can also be used to verify an archive file’s integrity.

Several utilities implement these various algorithms on Linux. The quickest way to find them is via the method shown in Listing 12.23. Keep in mind your particular distribution may store them in the /bin directory instead.

Listing 12.23: Looking at the SHA utility names

$ ls -1 /usr/bin/sha???sum
/usr/bin/sha224sum
/usr/bin/sha256sum
/usr/bin/sha384sum
/usr/bin/sha512sum
$

Each utility includes the SHA message digest it employs within its name. Therefore, sha384sum uses the SHA-384 algorithm. These utilities are used in a similar manner to the md5sum command. A few examples are shown in Listing 12.24.

Listing 12.24: Using sha224sum and sha512sum to check the original file

$ sha224sum Project4x.tar
c36f1632cd4966967a6daa787cdf1a2d6b4ee55924e3993c69d9e9d0  Project4x.tar
$
$ sha512sum Project4x.tar
6d2cf04ddb20c369c2bcc77db294eb60d401fb443d3277d76a17b477000efe46c00478cdaf25ec6fc09833d2f8c8d5ab910534ff4b0f5bccc63f88a992fa9eb3  Project4x.tar
$

Notice in Listing 12.24 the different hash value lengths produced by the different commands. The sha512sum utility uses the SHA-512 algorithm, which is the best to use for security purposes and is typically employed to hash salted passwords in the /etc/shadow file on Linux.

You can use these SHA utilities, just like the md5sum program was used in Listings 12.21 and 12.22, to verify archive files’ integrity. That way, you can detect backup corruption as well as any malicious modifications to the file.
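You can also automate the comparison: generate a checksum file next to the archive before transfer, copy both files to the destination, and verify there with the -c option. A sketch, with file names following the earlier listings:

$ sha512sum Project4x.tar > Project4x.tar.sha512
$ # (transfer both files to the remote system)
$ sha512sum -c Project4x.tar.sha512
Project4x.tar: OK

A FAILED message instead of OK means the archive changed somewhere along the way and should not be trusted.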

Summary

Providing appropriate archival and retrieval of files is critical. Understanding your business and data needs is part of the backup planning process. As you develop your plans, look at integrity issues, archive space availability, privacy needs, and so on. Once rigorous plans are in place, you can rest assured your data is protected.

Exam Essentials

Describe the different backup types. A system image backup takes a complete copy of files the operating system needs to operate. This allows a restore to take place, which will get the system back up and running. The full, incremental, and differential backups are tied together in how data is backed up and restored. Snapshots and snapshot clones are also closely related and provide the opportunity to achieve rigorous backups in high IO environments.

Summarize compression methods. The different utilities, gzip, bzip2, xz, and zip, provide different levels of lossless data compression. Each one’s compression level is tied to how fast it operates. Reducing the size of archive data files is needed not only for backup storage but also for increasing transfer speeds across the network.

Compare the various archive/restore utilities. The assorted command-line utilities each have their own strengths in creating data backups and restoring files. While cpio is one of the oldest, it allows various files throughout the system to be gathered and put into an archive. The tar utility has long been used with tape media but provides rigorous and flexible archiving and restoring features, which make it still very useful in today’s environment. The dd utility shines when it comes to making system images of an entire disk. Finally, rsync is not only very fast, it allows encrypted transfers of data across a network for remote backup storage.

Explain the needs when storing backups on other systems. To move an archive across the network to another system, it is important to provide data security. Thus, often OpenSSH is employed. In addition, once an archive file arrives at its final destination, it is critical to ensure no data corruption has occurred during the transfer. Therefore, tools such as md5sum and sha512sum are used.

Review Questions

  1. Time and space to generate archives are not an issue, and your system’s environment is not a high IO one. You want to create full backups for your system only once per week and need to restore data as quickly as possible. Which backup type plan should you use?

    1. Full archive daily
    2. Incremental archive daily
    3. Differential archive daily
    4. Full archive weekly; incremental daily
    5. Full archive weekly; differential daily
  2. The system admin took an archive file and applied a compression utility to it. The resulting file extension is .gz. Which compression utility was used?

    1. The xz utility
    2. The gzip utility
    3. The bzip2 utility
    4. The zip utility
    5. The dd utility
  3. You need to quickly create a special archive. This archive will be a single compressed file, which contains any .snar files across the virtual directory structure. Which archive utility should you use?

    1. The tar utility
    2. The dd utility
    3. The rsync utility
    4. The cpio utility
    5. The zip utility
  4. An administrator needs to create a full backup using the tar utility, compress it as much as possible, and view the files as they are being copied into the archive. What tar options should the admin employ?

    1. -xzvf
    2. -xJvf
    3. -czvf
    4. -cJf
    5. -cJvf
  5. You need to create a low-level backup of all the data on the /dev/sdc drive and want to use the /dev/sde drive to store it on. Which dd command should you use?

    1. dd of=/dev/sde if=/dev/sdc
    2. dd of=/dev/sdc if=/dev/sde
    3. dd of=/dev/sde if=/dev/sdc count=5
    4. dd if=/dev/sde of=/dev/sdc count=5
    5. dd if=/dev/zero of=/dev/sdc
  6. You need to create a backup of a user directory tree. You want to ensure that all the file metadata is retained. Employing super user privileges, which of the following should you use with the rsync utility?

    1. The -r option
    2. The -z option
    3. The -a option
    4. The -e option
    5. The --rsh option
  7. You decide to compress the archive you are creating with the rsync utility and employ the -z option. Which compression method are you using?

    1. compress
    2. gzip
    3. bzip2
    4. xz
    5. zlib
  8. Which of the following is true concerning the scp utility? (Choose all that apply.)

    1. Well suited for quickly transferring files between two systems on a network
    2. Is faster than the sftp utility
    3. An interactive utility useful for quickly transferring large files
    4. Can be interrupted during file transfers with no ill effects
    5. Uses OpenSSH for file transfers
  9. You are transferring files for a local backup using the sftp utility to a remote system and the process gets interrupted. What sftp utility command should you use next?

    1. The progress command
    2. The get command
    3. The reget command
    4. The put command
    5. The reput command
  10. You have completed a full archive and sent it to a remote system using the sftp utility. You employ the md5sum program on both the local archive and its remote copy. The numbers don’t match. What most likely is the cause of this?

    1. The local archive was corrupted when it was created.
    2. The archive was corrupted when it was transferred.
    3. You used incorrect commands within the sftp utility.
    4. The numbers only match if corruption occurred.
    5. You used incorrect utility switches on md5sum.