Project: File Compression

Even though storage devices continue to become larger, file compression remains an important tool for everyday computer use. For example, images are often compressed to enable faster transmission across a network or storage on a mobile device.

A simple compression algorithm (see, e.g., [3]) for text files is based on encoding long strings of repeated characters. The idea is to replace a string of repeated characters:

 !!!!!!!!

with an escape sequence made up of an otherwise rare character (such as ~), the count, and then the repeated character:

 ~8!

This saves space as soon as there are at least four in a row of any character. Certainly, this type of compression will only benefit some files, but many documents contain, for example, long strings of spaces.

The only difficulty with decompressing a file that has been compressed this way is if the original file contains one of the escape characters (i.e., the tilde ~), because then the decompression program will have difficulty deciding whether to print the escape character or a sequence. One solution is to have the compression program replace any tilde ~ with two tildes ~~; that way, when the decompressor sees a tilde, it can check the next character to see if it is a tilde or a number.

Exercises

  1. Describe what happens if this compression algorithm is applied to a file that contains only escape characters (tildes).
  2. Write a compress(text) function that implements this type of file compression. Use constants for the escape character and minimum number of repetitions. Test your program on itself.
  3. Write a decompress(text) function that returns compressed text to its original state. You may find it helpful to write a separate function like this:
     def int_at_start(s):
      # return positive integer at beginning of string s
      # and index of next character

    to use after finding an escape character, since the count might be longer than one digit. The index of the next character is used both to find the character to repeat, and to know where to continue processing after the escape.

    Be sure that decompress(compress(text)) is the same as text. Test your program on itself.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.144.170