Compressing Files

Problem

You need to compress some files and aren’t sure of the best way to do it.

Solution

First, you need to understand that in traditional Unix, archiving (combining) and compressing files are two different operations performed by two different tools, while in the DOS and Windows world they are typically one operation with one tool. A “tarball” is created by combining several files and/or directories with the tar (tape archive) command, then compressing the result with the compress, gzip, or bzip2 tools. This results in files like tarball.tar.Z, tarball.tar.gz, tarball.tgz, or tarball.tar.bz2. That said, many other tools, including zip, are also available on Unix.

In order to use the correct format, you need to understand where your data will be used. If you are simply compressing some files for yourself, use whatever you find easiest. If other people will need to use your data, consider what platform they will be using and what they are comfortable with.

The traditional Unix tarball was tarball.tar.Z, but gzip is now much more common, and bzip2 (which offers better compression than gzip) is gaining ground. There is also the question of tooling: some versions of tar can apply the compression of your choice automatically while creating the archive; others can’t.

The most widely accepted Unix or Linux format is a tarball.tar.gz created like this:

$ tar cf tarball_name.tar directory_of_files
$ gzip tarball_name.tar
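Before sending a compressed tarball anywhere, it is worth checking what went into it. The sketch below is self-contained: it builds a throwaway directory (demo_data and tarball_name are placeholder names), archives and compresses it in two steps as above, then lists and extracts it by piping gzip -dc into tar, which works even with a tar that has no built-in gzip support.

```shell
#!/bin/sh
# Minimal sketch: two-step tar + gzip, then verify the result.
# demo_data and tarball_name are placeholder names for illustration.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Build a small sample directory to archive.
mkdir demo_data
echo "hello" > demo_data/a.txt
echo "world" > demo_data/b.txt

# Step 1: combine the files into one archive.
tar cf tarball_name.tar demo_data
# Step 2: compress it; gzip replaces tarball_name.tar with tarball_name.tar.gz.
gzip tarball_name.tar

# List the contents without extracting: decompress to stdout, feed tar on stdin.
gzip -dc tarball_name.tar.gz | tar tf -

# Extract into a separate directory to check the round trip.
mkdir restore
(cd restore && gzip -dc ../tarball_name.tar.gz | tar xf -)
cmp demo_data/a.txt restore/demo_data/a.txt && echo "round trip OK"
```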

If you have GNU tar, you can use -Z for compress (don’t; it is obsolete), -z for gzip (safest), or -j for bzip2 (highest compression). Don’t forget to give the file an appropriate extension; tar does not add one automatically.

$ tar czf tarball_name.tgz directory_of_files
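If your tar lacks the -z and -j options, you can get the same one-command effect with a pipe: tar writes the archive to standard output (f -) and gzip or bzip2 compresses it on the fly. A minimal sketch with placeholder names; the bzip2 step is skipped if bzip2 is not installed.

```shell
#!/bin/sh
# Sketch: create compressed tarballs without relying on tar's -z/-j options.
# demo_data and tarball_name are placeholder names.
set -e

workdir=$(mktemp -d)
cd "$workdir"
mkdir demo_data
printf 'line %s\n' 1 2 3 > demo_data/notes.txt

# Equivalent of "tar czf tarball_name.tgz demo_data" for any tar.
tar cf - demo_data | gzip > tarball_name.tgz

# Equivalent of "tar cjf tarball_name.tar.bz2 demo_data", if bzip2 exists.
if command -v bzip2 >/dev/null 2>&1; then
    tar cf - demo_data | bzip2 > tarball_name.tar.bz2
fi

# Unpacking works the same way in reverse.
mkdir restore
gzip -dc tarball_name.tgz | (cd restore && tar xf -)
ls restore/demo_data
```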

While tar and gzip are available for many platforms, if you need to share with Windows you are better off using zip, which is nearly universal. The zip and unzip commands are supplied by the Info-ZIP packages on Unix and almost any other platform you can think of. Unfortunately, they are not always installed by default. Because these tools do not behave like most other Unix tools, run each command by itself to get some helpful usage information. Also note zip’s -l option, which converts Unix line endings to DOS line endings, and -ll, which does the reverse.

$ zip -r zipfile_name directory_of_files
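Here is a short sketch of the zip route, including the -l option for text files bound for Windows. It assumes the Info-ZIP zip and unzip commands are installed and simply exits if they are not; demo_data, zipfile_name, and zipfile_dos are placeholder names. unzip -l lists the archive without extracting it.

```shell
#!/bin/sh
# Sketch: zip a directory recursively, then inspect and extract it.
# Assumes Info-ZIP zip/unzip; the directory and archive names are placeholders.
set -e

if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip/unzip not installed; skipping" >&2
    exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"
mkdir demo_data
printf 'first\nsecond\n' > demo_data/readme.txt

# -r recurses into the directory; zip appends .zip because no extension is given.
zip -q -r zipfile_name demo_data

# -l converts Unix LF line endings to DOS CR LF as the files are stored.
zip -q -r -l zipfile_dos demo_data

# List contents without extracting.
unzip -l zipfile_name.zip

# Extract the DOS copy; the text file now ends each line with CR LF.
mkdir restore
unzip -q zipfile_dos.zip -d restore
od -c restore/demo_data/readme.txt
```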

Discussion

There are far too many compression algorithms and tools to cover here; others include AR, ARC, ARJ, BIN, BZ2, CAB, JAR, CPIO, DEB, HQX, LHA, LZH, RAR, RPM, UUE, and ZOO.

When using tar, we strongly recommend storing all the files in a relative directory. If you use an absolute directory, you might overwrite something on another system that you shouldn’t. If you don’t use any directory, you’ll clutter up whatever directory the user is in when they extract the files (see Checking a tar Archive for Unique Directories). We recommend naming that directory after the data you are packaging, possibly including a version number. Table 8-2 shows some examples.

Table 8-2. Good and bad examples of naming files for the tar utility

Good             Bad
./myapp_1.0.1    myapp.c
                 myapp.h
                 myapp.man
./bintools       /usr/local/bin
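One way to follow the advice in Table 8-2 is to run tar from the parent directory so every member starts with the versioned top-level name, then check the listing before distributing it. The sketch below uses a hypothetical helper, check_single_topdir, written only for this illustration (it is not a standard tool). It rejects absolute member names and archives whose members do not all sit under a single top-level directory.

```shell
#!/bin/sh
# Sketch: build a tarball rooted in one relative directory and check it.
# check_single_topdir is a hypothetical helper written for this example.
set -e

check_single_topdir() {
    # $1: a .tar.gz file. Prints the single top-level directory on success.
    listing=$(gzip -dc "$1" | tar tf -)
    if printf '%s\n' "$listing" | grep -q '^/'; then
        echo "absolute path found in $1" >&2
        return 1
    fi
    # Drop a leading "./", keep the first path component, count distinct ones.
    tops=$(printf '%s\n' "$listing" | sed -e 's|^\./||' -e 's|/.*||' | sort -u)
    count=$(printf '%s\n' "$tops" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected one top-level directory in $1, found:" >&2
        printf '%s\n' "$tops" >&2
        return 1
    fi
    printf '%s\n' "$tops"
}

workdir=$(mktemp -d)
cd "$workdir"
mkdir myapp_1.0.1
touch myapp_1.0.1/myapp.c myapp_1.0.1/myapp.h myapp_1.0.1/myapp.man

# Good: everything lives under ./myapp_1.0.1.
tar cf - ./myapp_1.0.1 | gzip > good.tar.gz
check_single_topdir good.tar.gz

# Bad: loose files at the top level of the archive.
(cd myapp_1.0.1 && tar cf - myapp.c myapp.h myapp.man) | gzip > bad.tar.gz
if check_single_topdir bad.tar.gz; then echo "unexpected pass"; fi
```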

It is worth noting that Red Hat Package Manager (RPM) files are actually CPIO files with a header. You can get a shell or Perl script called rpm2cpio (http://fedora.redhat.com/docs/drafts/rpm-guide-en/ch-extra-packaging-tools.html) to strip that header and then extract the files like this (the -d option lets cpio create any directories the files need):

$ rpm2cpio some.rpm | cpio -id
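Since an RPM payload is a cpio archive, you can try the extraction side without an RPM at hand. This sketch builds a small cpio archive containing only a file (no entries for its parent directories), lists it with cpio -it, and extracts it with -i; the -d flag creates the missing leading directories and -m preserves modification times. The names payload.cpio and demo_tree are placeholders, and the script skips itself if cpio is not installed.

```shell
#!/bin/sh
# Sketch: cpio round trip showing why -d matters on extraction.
# payload.cpio and demo_tree are placeholder names.
set -e

if ! command -v cpio >/dev/null 2>&1; then
    echo "cpio not installed; skipping" >&2
    exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p demo_tree/usr/share/doc
echo "docs" > demo_tree/usr/share/doc/README

# Create: cpio -o reads file names on stdin; -H newc selects the SVR4 format.
# Only the file is archived, so the archive has no directory entries.
(cd demo_tree && find . -type f | cpio -o -H newc > ../payload.cpio)

# List members without extracting.
cpio -it < payload.cpio

# Extract: -i extracts, -d creates leading directories, -m keeps mtimes.
mkdir restore
(cd restore && cpio -idm < ../payload.cpio)
cat restore/usr/share/doc/README
```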

Debian’s .deb files are actually ar archives containing gzipped or bzipped tar archives. They can be extracted with the standard ar, tar, gunzip, or bunzip2 tools.
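As a sketch of the manual route, the script below unpacks a .deb with ar and then tar. To stay self-contained it first builds a stand-in package, fake.deb, whose contents are hypothetical and not a real Debian build. The unpacking loop picks the decompressor from the data member’s suffix; the .xz case is included on the assumption that newer packages may use data.tar.xz.

```shell
#!/bin/sh
# Sketch: unpack a .deb by hand with ar and tar.
# fake.deb and everything in it are hypothetical stand-ins for a real package.
set -e

if ! command -v ar >/dev/null 2>&1; then
    echo "ar not installed; skipping" >&2
    exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"

# Build the stand-in package: an ar archive of three members.
mkdir -p build/usr/bin ctl
echo 'echo hi' > build/usr/bin/hello
echo 'Package: hello' > ctl/control
echo '2.0' > debian-binary
(cd ctl && tar cf - ./control) | gzip > control.tar.gz
(cd build && tar cf - ./usr) | gzip > data.tar.gz
ar rc fake.deb debian-binary control.tar.gz data.tar.gz
rm control.tar.gz data.tar.gz

# Unpack it: list the ar members, extract them, then untar the data member.
mkdir unpack
cd unpack
ar t ../fake.deb
ar x ../fake.deb

mkdir root
for member in data.tar.*; do
    case "$member" in
        *.gz)  gzip -dc "$member"  | (cd root && tar xf -) ;;
        *.bz2) bzip2 -dc "$member" | (cd root && tar xf -) ;;
        *.xz)  xz -dc "$member"    | (cd root && tar xf -) ;;  # assumed newer format
        *) echo "unhandled data member: $member" >&2; exit 1 ;;
    esac
done
cat root/usr/bin/hello
```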

Many of the Windows-based tools such as WinZip, PKZIP, FilZip, and 7-Zip can handle many or all of the above formats and more (including tarballs and RPMs).
