Image compression

The purpose of compression is to represent images by methods that require fewer units of information (for example, bytes) than the naive storage of every pixel in an array.

For instance, recall the binary image we constructed in the first section; that is a 128 x 128 image represented by 16,384 bits (True/False), where all but 113 of those bits are False. There surely must be more efficient ways to store this information using fewer than 16,384 bits. We could very well do so by simply providing the size of the canvas (two bytes), the location of the center of the disk (two more bytes), and the value of its radius (another byte). This new representation uses only 40 bits (assuming each byte consists of 8 bits). We refer to such an exact representation as lossless compression.
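Such a scheme can be sketched in a few lines. The helper names below are ours, and we assume the disk was centered at (64, 64) with radius 6, since a radius-6 disk covers exactly 113 lattice points:

```python
import numpy as np

def encode_disk(side, center, radius):
    # 5-byte payload: canvas side (2 bytes), center (2 bytes), radius (1 byte)
    return bytes([side // 256, side % 256, center[0], center[1], radius])

def decode_disk(payload):
    # Rebuild the full boolean canvas from the 5-byte description
    side = payload[0] * 256 + payload[1]
    cx, cy, radius = payload[2], payload[3], payload[4]
    x, y = np.mgrid[0:side, 0:side]
    return (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2

# Assumed parameters: center (64, 64) and radius 6 are our reconstruction
payload = encode_disk(128, (64, 64), 6)   # 5 bytes = 40 bits
image = decode_disk(payload)              # back to a 128 x 128 boolean array
```

Note that the decoder recovers the image exactly; nothing is lost in the round trip.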

Another possible way to compress an image is to turn a color image into its black and white representation, for example. We performed this operation on the image skimage.data.coffee, turning an object of size 400 x 600 x 3 (720,000 bytes) into an object of size 400 x 600 (240,000 bytes). In the process, however, we lost the ability to see its color. This kind of operation is appropriately called lossy compression.
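A minimal sketch of such a conversion, using the common Rec. 601 luma weights (the exact weights used for the coffee example are not recorded here), on a synthetic color array:

```python
import numpy as np

def to_grayscale(rgb):
    # Weighted average of the three channels (Rec. 601 luma coefficients);
    # a (H, W, 3) array of bytes becomes a (H, W) array, one third the size
    return (rgb @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

# Stand-in for a 400 x 600 color photograph: constant channels 200, 100, 50
rgb = np.dstack([np.full((400, 600), v, dtype=np.uint8) for v in (200, 100, 50)])
gray = to_grayscale(rgb)   # shape (400, 600): 240,000 bytes instead of 720,000
```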

In the following pages, we are going to explore several settings for image compression from a mathematical point of view. We will also develop efficient code to perform these operations from within the SciPy stack. We are not concerned with the creation of code to read or save these compressed images to file; for that, we already have reliable utilities in the Python Imaging Library, which have also been wrapped by different functions in the modules scipy.misc, scipy.ndimage, and the toolkit scikit-image. If we wish to compress and store a numpy array A representing a black-and-white photograph as different file types, we simply issue something along these lines:

In [1]: import numpy as np; 
   ...: from scipy.misc import lena, imsave
In [2]: A = lena()
In [3]: imsave("my_image.png", A); 
   ...: imsave("my_image.tiff", A); 
   ...: imsave("my_image.pcx", A); 
   ...: imsave("my_image.jpg", A); 
   ...: imsave("my_image.gif", A)

A quick look at the contents of the folder in which we are working shows the sizes of the files created. For instance, under a *NIX system, we could issue the following command:

% ls -nh my_image.*
-rw-r--r--  1 501  20   257K May 29 08:16 my_image.bmp
-rw-r--r--  1 501  20    35K May 29 08:16 my_image.jpg
-rw-r--r--  1 501  20   273K May 29 08:15 my_image.pcx
-rw-r--r--  1 501  20   256K May 29 08:16 my_image.tiff

Note the different sizes of the resulting files. The lossless formats PCX, BMP, and TIFF offer similar compression rates (273K, 257K, and 256K, respectively). On the other hand, the lossy JPEG format offers an obvious improvement (35K).

Lossless compression

Some of the most common lossless compression schemes used for image processing are as follows:

  • Run-length encoding: This method is used in PCX, BMP, TGA, and TIFF file types, when the original image can be regarded as palette-based bitmapped, for example, a cartoon or a computer icon.
  • Lempel-Ziv-Welch (LZW): This is used by default in the GIF image format.
  • Deflation: This is very powerful and reliable. It is the method used for PNG image files. It is also the compression method employed to create ZIP files.
  • Chain code: This is the preferred method to encode binary images, especially if these contain a small number of large blobs.
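Deflation, in particular, is available in the Python standard library through the zlib module. A quick sketch of its effect on highly repetitive data, similar in spirit to a palette-based bitmap:

```python
import zlib

# A very repetitive byte string: a 49-byte pattern repeated 400 times
raw = (b'W' * 26 + b'B' * 23) * 400        # 19,600 bytes
compressed = zlib.compress(raw, 9)          # deflation, as in PNG and ZIP files

ratio = len(compressed) / len(raw)          # well under 1 percent here
```

Decompressing with zlib.decompress(compressed) recovers raw exactly, as expected from a lossless scheme.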

Let's examine, for instance, how run-length encoding works in a suitable example. Consider the checkerboard image skimage.data.checkerboard. We receive it as a 200 x 200 array of integer values, which as such requires 40,000 bytes of storage. Note that it can be regarded as a palette-based bitmap with only two colors. We start by transforming each zero value to a B, and each 255 to a W:

In [5]: from skimage.data import checkerboard
In [6]: def color(value):
   ...:     if value==0: return 'B'
   ...:     else: return 'W'
   ...:
In [7]: image = np.vectorize(color)(checkerboard()); 
   ...: print image
[['W' 'W' 'W' ..., 'B' 'B' 'B']
 ['W' 'W' 'W' ..., 'B' 'B' 'B']
 ['W' 'W' 'W' ..., 'B' 'B' 'B']
 ...,
 ['B' 'B' 'B' ..., 'W' 'W' 'W']
 ['B' 'B' 'B' ..., 'W' 'W' 'W']
 ['B' 'B' 'B' ..., 'W' 'W' 'W']]

Next, we create a function that encodes both lists and strings of characters, producing instead a single string composed of patterns of the form "character plus run count":

In [7]: from itertools import groupby
In [8]: def runlength(string):
   ...:     groups = [k + str(sum(1 for _ in g)) for k,g in
   ...:               groupby(string)]
   ...:     return ''.join(groups)
   ...:

Notice what happens when we rewrite the image as a flattened string containing its colors and encode it in this fashion:

In [9]: coded_image = runlength(image.flatten().tolist())
In [10]: print coded_image
W26B23W27B23W27B23W27B24W26B23W27B23W27B23W27B24W26B23W27B23W27B23W27
B24W26B23W27B23W27B23W27B24W26B23W27B23W27B23W27B24W26B23W27B23W27B23
W27B24W26B23W27B23W27B23W27B24W26B23W27B23W27B23W27B24W26B23W27B23W27
B23W27B24W26B23W27B23W27B23W27B24W26B23W27B23W27B23W27B24W26B23W27B23
W27B23W27B24W26B23W27B23W27B23W27B24W26B23W27B23W27B23W27B24W26B23W27
B23W27B23W27B24W26B23W27B23W27B23W27B24W26B23W27B23W27B23W27B24W26B23
W27B23W27B23W27B24W26B23W27B23W27B23W27B24W26B23W27B23W27B23W27B24W26
B23W27B23W27B23W27B24W26B23W27B23W27B23W27B24W26B
...
26B24W27B23W27B23W27B23W26B24W27B23W27B23W27B23W26B24W27B23W27B23W27B
23W26B24W27B23W27B23W27B23W26B24W27B23W27B23W27B23W26B24W27B23W27B23W
27B23W26B24W27B23W27B23W27B23W26B24W27B23W27B23W27B23W26B24W27B23W27B
23W27B23W26B24W27B23W27B23W27B23W26B24W27B23W27B23W27B23W26B24W27B23W
27B23W27B23W26B24W27B23W27B23W27B23W26B24W27B23W27B23W27B23W26B24W27B
23W27B23W27B23W26B24W27B23W27B23W27B23W26B24W27B23W27B23W27B23W26B24W
27B23W27B23W27B23W26B24W27B23W27B23W27B23W26B24W27B23W27B23W27B23W26
In [11]: len(coded_image)
Out[11]: 4474

We have reduced its size to a mere 4,474 bytes. Now, how would you decode this information back to an image? Imagine you are provided with this string, the additional information of the size of the image (200 x 200), and the palette information (B for black, and W for white).
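One possible decoder, sketched here with a regular expression split (the helper name is ours):

```python
import re
import numpy as np

def runlength_decode(coded, shape):
    # Split the code into (character, count) pairs and expand each run
    pairs = re.findall(r'([A-Za-z])(\d+)', coded)
    flat = [char for char, count in pairs for _ in range(int(count))]
    return np.array(flat).reshape(shape)

# Small example: 'W3B2W1' on a 2 x 3 canvas
decoded = runlength_decode('W3B2W1', (2, 3))
```

Applied to coded_image with shape (200, 200), this recovers the original palette image exactly.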

Another nice exercise is to find descriptions of the other mentioned lossless compression methods, and write Python code for their corresponding encoders and decoders.

Lossy compression

Among the many possible schemes of lossy compression, we are going to focus on the method of transform coding. The file type JPEG, for instance, is based on the discrete cosine transform.

In any of these cases, the process is similar. We assume that the image is a function. Its visualization can be regarded as a representation of its graph, and as such this is a spatial operation. Instead, we can compute a transform of the image (say, Fourier, discrete cosine, or wavelet). The image is then represented by a collection of values: the coefficients of the function in the corresponding transform. Compression occurs when we disregard a large quantity of those coefficients and reconstruct the function with the corresponding inverse transform.

We have already observed the behavior of the reconstruction of an image after disregarding 25 percent of its lower frequencies, when addressing the Fourier analysis techniques in the previous section. This is not the only way to disregard coefficients. For instance, we can instead keep only the coefficients with a large enough absolute value. Let's examine the result of performing that operation on the same image, this time using the discrete cosine transform:

In [12]: from skimage.data import text; 
   ....: from scipy.fftpack import dct, idct
In [13]: image = text().astype('float')
In [14]: image_DCT = dct(image)

Let's disregard the values that are less than or equal to 1000 in absolute value. Note that there are more than 256,317 such coefficients (almost 98 percent of the original data):

In [15]: mask = np.absolute(image_DCT)>1000
In [16]: compressed = idct(image_DCT * mask)

In spite of having thrown away most of the coefficients, the reconstruction is very faithful. There are obvious artifacts, but these are not too distracting.
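Note that scipy.fftpack.dct operates along a single axis and, without normalization, idct(dct(x)) returns a scaled copy of x. A self-contained two-dimensional variant with the orthonormal normalization, sketched here on synthetic data, avoids that scaling:

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(a):
    # Orthonormal type-II DCT applied along both axes
    return dct(dct(a, norm='ortho', axis=0), norm='ortho', axis=1)

def idct2(a):
    # Exact inverse of dct2, thanks to the orthonormal normalization
    return idct(idct(a, norm='ortho', axis=0), norm='ortho', axis=1)

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))

coeffs = dct2(image)
# Keep only the largest 10 percent of coefficients in absolute value
mask = np.abs(coeffs) > np.percentile(np.abs(coeffs), 90)
reconstruction = idct2(coeffs * mask)
```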

We can perform a similar operation using the wavelet transform. A naive way to perform compression in this setting could be to disregard whole levels of coefficients, and then reconstruct. In the example that we covered in the previous sections (a Haar wavelet representation of the image skimage.data.camera with nine levels of coefficients), if we eliminate the last two levels, we are throwing away 245,760 coefficients (almost 94 percent of the original information). Observe the quality of the reconstruction:

In [18]: import pywt; 
   ....: from skimage.data import camera
In [19]: levels = int(np.floor(np.log2(camera().shape).max()))
In [20]: wavelet = pywt.Wavelet('haar')
In [21]: wavelet_coeffs = pywt.wavedec2(camera(), wavelet,
   ....:                                level=levels)
In [22]: compressed = pywt.waverec2(wavelet_coeffs[:8], wavelet)
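We can verify the coefficient count on a synthetic array of the same shape (a sketch; any 512 x 512 array gives the same counts):

```python
import numpy as np
import pywt

# Same shape as skimage.data.camera, so the coefficient counts match the text
image = np.random.default_rng(2).standard_normal((512, 512))
coeffs = pywt.wavedec2(image, 'haar', level=9)

# coeffs[0] is the coarsest approximation; coeffs[1:] are detail tuples,
# ordered from coarsest to finest. Dropping the last two levels discards:
discarded = sum(band.size for level in coeffs[8:] for band in level)
```

Here discarded equals 3 x 128 x 128 + 3 x 256 x 256 = 245,760, in agreement with the count above.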

Similar to transform coding is the method of compression by singular value decomposition. In this case, we regard an image as a matrix. We represent it by means of its singular values. Compression in this setting occurs when we disregard a large quantity of the smaller singular values, and then reconstruct. For an example of this technique, read Chapter 3, SciPy for Linear Algebra, of the book Learning SciPy for Numerical and Scientific Computing, Second Edition.
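A generic low-rank reconstruction of this kind can be sketched in a few lines with numpy (this is not the code from the referenced chapter):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((64, 64))   # stand-in for an image matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 16                                # keep only the 16 largest singular values
A_k = (U[:, :k] * s[:k]) @ Vt[:k]     # rank-k reconstruction

# By the Eckart-Young theorem, A_k is the best rank-k approximation in the
# spectral norm, and the error equals the first discarded singular value
error = np.linalg.norm(A - A_k, 2)
```

Storing U[:, :k], s[:k], and Vt[:k] takes 2nk + k numbers instead of n squared, a substantial saving when k is small relative to n.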
