Even though storage devices continue to become larger, file compression remains an important tool for everyday computer use. For example, images are often compressed to enable faster transmission across a network or storage on a mobile device.
A simple compression algorithm (see, e.g., [3]) for text files is based on encoding long strings of repeated characters. The idea is to replace a string of repeated characters:
!!!!!!!!
with an escape sequence made up of an otherwise rare character (such as ~), the count, and then the repeated character:
~8!
This saves space as soon as there are at least four in a row of any character. Certainly, this type of compression will only benefit some files, but many documents contain, for example, long strings of spaces.
The only difficulty with decompressing a file that has been compressed this way is if the original file contains one of the escape characters (i.e., the tilde ~), because then the decompression program will have difficulty deciding whether to print the escape character or a sequence. One solution is to have the compression program replace any tilde ~ with two tildes ~~; that way, when the decompressor sees a tilde, it can check the next character to see if it is a tilde or a number.
def int_at_start(s):
# return positive integer at beginning of string s
# and index of next character
to use after finding an escape character, since the count might be longer than one digit. The index of the next character is used both to find the character to repeat, and to know where to continue processing after the escape.
Be sure that decompress(compress(text)) is the same as text. Test your program on itself.
3.15.144.170