1.3. Data Compression

A ninety-minute uncompressed video clip digitized in High Definition Television (HDTV) format at 1.2 gigabits per second (Gb/s) requires 791 GByte of storage. Even though modern storage devices, such as magnetic disk drives, provide large storage capacities, it might not be economical to store SM objects uncompressed. Moreover, the data transfer rate of magnetic disks is not high enough to retrieve multiple high bit rate objects in real time. A more serious problem arises when transferring a large number of SM objects over the network. Even though network speeds are rapidly increasing, it is not economically feasible to support the simultaneous display of a large number of SM objects over existing networks.

To resolve these problems and to efficiently handle a large number of SM objects, we need to compress these objects: a smaller SM object requires less disk storage space and less network bandwidth. In the remainder of this section, we briefly overview data compression techniques.

1.3.1. Information vs. Data

Data is an individual fact, a collection of facts, a value, or a set of values. For example, a collection of digitally captured pixel values of an image is data. Information is data in a usable form, usually interpreted in a predefined way. For example, the content of an image is information. In general, information is meaningful to us, and it is stored and handled as data in computer systems. Thus, the main goal of compression techniques is to reduce the size of data while maintaining as much information as possible.

A popular approach in multimedia data compression is perceptual coding, which hides errors where humans will not see or hear them. Based on studies of human hearing and vision, perceptual coding exploits the limits of human perception. For example, human audio perception ranges between 20 Hz and 20 kHz, but most voice sounds lie in the low frequencies (below 4 kHz). Thus, audio signals above 4 kHz are eliminated in telephone systems. Human visual perception is strongly influenced by edges and by low frequency information such as large objects. Thus, many detailed patterns in an image can be ignored in the coding process. Perceptual coding takes advantage of these facts and encodes only those signals that will be perceived by humans, eliminating many imperceptible signals.

1.3.2. Coding Overview

Good compression algorithms should satisfy the following objectives:

  • Achieve high compression ratios. It is obvious that a higher compression ratio is more beneficial.

  • Ensure a good quality of the reconstructed information, i.e., it is important to maintain a good or acceptable quality while providing a high compression ratio.

  • Minimize the complexity of the encoding and decoding process, i.e., if it takes too long to encode and/or decode the SM objects, the usefulness of the algorithm becomes limited. For real-time applications, both encoding and decoding must be fast.

Some other general requirements include independence from a specific image size and frame rate, and support for the various data rates of an object.

Depending on the importance of the original information, there are two types of compression: lossless compression and lossy compression. Using lossless compression, the reconstructed information is mathematically equivalent to the original information (i.e., reconstruction is perfect). Thus, it does not lose or alter any information in the process of compression and decompression. For example, documents, databases, program executable files, and medical images, to name a few, do not tolerate a loss or alteration of information. Lossless compression reduces the size of data for these important types of information by detecting repeated patterns and compressing them. However, it achieves only a modest level of compression (about a factor of 5), which may not be sufficient for SM objects.

While lossless compression is desirable for information that cannot tolerate the loss of even a bit, some media types are not very sensitive to a loss of information. Human eyes usually do not detect minor color changes between an original image and a reconstructed one, and such small changes usually do not pose a problem in practical applications. Thus, lossy compression achieves a very high degree of compression (compression ratios up to 200) by removing less important information, such as imperceptible audio signals in music. Using lossy compression, reconstructed images may demonstrate some degradation in image quality. The practical objective of lossy compression is to maximize the degree of compression while keeping the reconstructed information virtually indistinguishable from the original.

Entropy and Source Encoding

Entropy is a measure of the amount of information. If N states are possible, each characterized by a probability p_i with Σ_{i=1}^{N} p_i = 1, then the entropy, H = −Σ_{i=1}^{N} p_i log₂ p_i, is the lower bound on the average number of bits needed to describe all parts of the system. It corresponds to the information content of the system. For example, when there are only two symbols, S1 (0.9) and S2 (0.1), the entropy is −0.9 log₂ 0.9 − 0.1 log₂ 0.1 ≈ 0.47. Another typical example is as follows. In an image with a uniform distribution of gray-level intensity, i.e., p_i = 1/256, the number of bits needed to code each gray level is 8 bits. Thus, the entropy of this image is 8.
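
To make this concrete, the following Python sketch (our own illustration, not from any particular library) computes the entropy of both examples above:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Two-symbol source from the text: S1 with probability 0.9, S2 with 0.1.
print(round(entropy([0.9, 0.1]), 2))   # 0.47 bits/symbol

# Uniform distribution over 256 gray levels: p_i = 1/256 for each level.
print(entropy([1 / 256] * 256))        # 8.0 bits/symbol
```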

Entropy encoding ignores the semantics of the input data and compresses media streams by regarding them as sequences of bits. Run-length encoding and Huffman encoding are popular entropy encoding schemes. In contrast, source encoding optimizes the compression ratio by considering media-specific characteristics; it is also called semantic-based coding. Many advanced schemes such as DPCM, DCT, FFT, and Wavelet fall into this category. In reality, most practical compression algorithms employ a hybrid of source and entropy coding: a source encoding is applied at the early stage of compression, and then an entropy encoding is applied to the result in an attempt to further reduce the data size.

Run-length Encoding

This is one of the simplest compression techniques. It replaces consecutive occurrences of a symbol with the symbol followed by the number of times it is repeated. For example, the string "aaaaa" can be represented by two bytes, "5a". The first byte shows the number of occurrences of the character, and the second byte represents the character itself. This scheme is naturally lossless, and its compression factor ranges from 1/2 to 1/5 depending on the content of the objects. The scheme is most useful where symbols appear in long runs: e.g., for images that have areas where the pixels all have the same value, such as cartoons. However, it may not be efficient for complex objects where adjacent symbols are all different.
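
A minimal lossless run-length coder in Python (a sketch of the idea, not a production codec) shows the round trip:

```python
def rle_encode(data):
    """Replace each run of a repeated symbol with a (count, symbol) pair."""
    runs = []
    for ch in data:
        if runs and runs[-1][1] == ch:
            runs[-1] = (runs[-1][0] + 1, ch)   # extend the current run
        else:
            runs.append((1, ch))               # start a new run
    return runs

def rle_decode(runs):
    return "".join(ch * count for count, ch in runs)

encoded = rle_encode("aaaaabbbc")
print(encoded)                             # [(5, 'a'), (3, 'b'), (1, 'c')]
assert rle_decode(encoded) == "aaaaabbbc"  # lossless round trip
```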

Relative Encoding

This is a technique that attempts to improve efficiency by transmitting only the difference between each value and its predecessor. In an image, based on the fact that neighboring pixels often change slowly, one can digitize the value at a pixel by using the values of the neighboring pixels and encode only the difference between the two. A differential PCM coder (DPCM) quantizes and transmits the difference, d(n) = x(n) − x(n − 1). The advantage of using the difference d(n) instead of the actual value x(n) is the reduced number of bits needed to represent a sample. For example, the series of pixel values "60 105 161 129 78" can be represented by "60 +45 +56 −32 −51". Assuming the differences are far smaller than the original values, reducing the size of each value reduces the overall size of the data (in the example, 7 bits, including a sign bit, are enough to represent each difference, while 8 bits are required to represent the original values). This scheme works well when adjacent values do not change greatly, as in voice signals. Furthermore, the transmitter can predict each value based on a mathematical model and transmit only the difference between the predicted and actual values, further reducing the number of bits required to represent a value (predictive coding).
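
The pixel series above can be reproduced with a short Python sketch of difference coding (omitting the quantization step that a real DPCM coder would add):

```python
def dpcm_encode(samples):
    """Send the first sample, then only the differences d(n) = x(n) - x(n-1)."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def dpcm_decode(codes):
    """Rebuild each sample by accumulating the transmitted differences."""
    samples = [codes[0]]
    for d in codes[1:]:
        samples.append(samples[-1] + d)
    return samples

codes = dpcm_encode([60, 105, 161, 129, 78])
print(codes)                                   # [60, 45, 56, -32, -51]
assert dpcm_decode(codes) == [60, 105, 161, 129, 78]
```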

Huffman Encoding

Huffman encoding is a popular compression technique that assigns variable length codes to symbols, so that the most frequently occurring symbols have the shortest codes. Thus, more frequently occurring values are assigned fewer bits, exploiting the statistical distribution of the values within an object. To correctly decompress the encoded data, encoder and decoder must share the same codebook. Huffman coding is particularly effective where the data are dominated by a small number of symbols. Suppose that one wants to encode a source of N = 8 symbols: a, b, c, d, e, f, g, h. The probabilities of these symbols are: P(a) = 0.01, P(b) = 0.02, P(c) = 0.05, P(d) = 0.09, P(e) = 0.18, P(f) = 0.2, P(g) = 0.2, P(h) = 0.25. If we assign 3 bits per symbol (N = 2³ = 8), the average length of the symbols is: L = Σ_i 3 P(i) = 3 bits/symbol. The minimum average length we could ever achieve is equal to the entropy (according to Shannon's theorem): S = −Σ_i P(i) log₂ P(i) = 2.5821 bits/symbol = 0.86 × L.

The Huffman code assignment procedure is based on a binary tree structure. This tree is developed by a sequence of pairing operations in which the two least probable symbols are joined at a node to form two branches of a tree.

  • Step 1. The list of probabilities of the source symbols is associated with the leaves of a binary tree.

  • Step 2. Take the two smallest probabilities in the list and generate an intermediate node as their parent and label the branch from parent to one of the child nodes 1 and the branch from parent to the other child 0.

  • Step 3. Replace the probabilities and associated nodes in the list by the single new intermediate node with the sum of the two probabilities. If the list contains only one element, quit. Otherwise, go to step 2.

It is very important to estimate the probability p_i, the relative frequency of each symbol, accurately. To make the variable length codes decodable, Huffman encoding uses prefix codes, which have the property that no codeword is a prefix (i.e., an initial segment) of any other codeword. Note that there is no guarantee that even the best possible codes reduce the size of a source; in the worst case, there is no compression.
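
The pairing procedure above maps naturally onto a priority queue. The following Python sketch (our own illustration) builds a codebook for the eight-symbol source of this section:

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a prefix code by repeatedly pairing the two least probable nodes."""
    tiebreak = count()  # breaks probability ties so dicts are never compared
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, zero = heapq.heappop(heap)                    # two smallest
        p1, _, one = heapq.heappop(heap)                     # probabilities
        merged = {s: "0" + c for s, c in zero.items()}       # branch labeled 0
        merged.update({s: "1" + c for s, c in one.items()})  # branch labeled 1
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.01, "b": 0.02, "c": 0.05, "d": 0.09,
         "e": 0.18, "f": 0.2, "g": 0.2, "h": 0.25}
code = huffman_code(probs)
print(code)  # frequent symbols (h, g, f) get 2-bit codes; a and b get 6 bits
avg = sum(probs[s] * len(code[s]) for s in probs)
print(round(avg, 2))  # 2.63 bits/symbol, close to the 2.5821-bit entropy bound
```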

Transform Coding

In transform coding, using a mathematical transformation, the original data to be coded are converted into new data (transform coefficients) which are more suitable for compression. This process is reversible in that the original data can be recovered through the inverse transformation. For example, the Fourier transform and the Cosine transform convert signals from the space domain into the frequency domain to obtain the frequency spectrum, with which one can easily separate low frequency information.

Transform coefficients represent the proportion of energy contributed by different frequencies. In data compression using transform coding, it is important to choose a transformation such that only a subset of the coefficients have significant values. In other words, energy is confined to a subset of important coefficients, a property known as energy compaction. Energy compaction is good for coding because we can consider only the significant coefficients. If the number of significant coefficients is far smaller than the number of samples in the original sequence, compression is possible. In practice, one can code significant coefficients accurately using a greater number of bits while allocating fewer or no bits to other, less meaningful coefficients (which offer less perceptible information to humans). Moreover, many low energy coefficients can even be discarded through quantization. In practice, many algorithms apply transform coding at the block level. For example, an N × N image is divided into several n × n blocks and each n × n block undergoes a reversible transformation.
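
As a small demonstration of energy compaction, the numpy sketch below (an illustration under our own naming; dct_matrix builds the orthonormal DCT-II basis) transforms a smooth 8-sample signal and shows the energy collapsing into the first few coefficients:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis: row u samples cos((2j + 1) u pi / 2n)."""
    j = np.arange(n)
    basis = np.cos((2 * j[None, :] + 1) * np.arange(n)[:, None] * np.pi / (2 * n))
    basis[0] *= 1 / np.sqrt(2)          # C(0) normalization
    return basis * np.sqrt(2 / n)

# A smooth, slowly varying signal, typical of a row of neighboring pixels.
x = np.array([60.0, 62, 65, 66, 68, 70, 71, 73])
coeffs = dct_matrix(8) @ x
print(np.round(coeffs, 1))  # first two coefficients dominate; the rest are near 0
```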

1.3.3. JPEG

JPEG (Joint Photographic Experts Group) is an international compression standard for continuous-tone still images, both gray-scale and color. Its development was motivated by the fact that the compression ratio of lossless methods such as Huffman coding is not high enough for many applications, such as high quality gray-scale images, photographic images, and still video. JPEG achieves a high compression ratio by removing spatial redundancy (i.e., correlation between neighboring pixels) within an image, which is fundamentally a lossy process.

Figure 1.5 shows the components and sequence of JPEG. JPEG utilizes discrete cosine transform (DCT), which is one of the best known transforms and is a close approximation to the optimal for a large class of images. DCT transforms data from a spatial domain to a frequency domain and separates high frequency information and low frequency information.

Figure 1.5. JPEG overview.


Discrete Cosine Transform (DCT)

As the first step in JPEG compression, an original image is partitioned into 8x8 pixel blocks and the algorithm independently applies DCT to each block. DCT transforms data from the spatial domain to the frequency domain in which they can be more efficiently encoded.

The definition of the forward discrete cosine transform (FDCT) and the inverse discrete cosine transform (IDCT) is as follows:

Equation 1.1

F(u, v) = (1/4) C(u) C(v) Σ_{i=0}^{7} Σ_{j=0}^{7} f(i, j) cos[(2i + 1)uπ/16] cos[(2j + 1)vπ/16]

Equation 1.2

f(i, j) = (1/4) Σ_{u=0}^{7} Σ_{v=0}^{7} C(u) C(v) F(u, v) cos[(2i + 1)uπ/16] cos[(2j + 1)vπ/16]

where: C(u), C(v) = 1/√2 for u, v = 0; C(u), C(v) = 1 otherwise.

Using the FDCT, the original pixel values f(i, j), where 0 ≤ i, j < 8, are transformed into DCT coefficients F(u, v). The output DCT coefficient matrix represents the information of the 8x8 block in the frequency domain. DCT enhances compression by concentrating most of the energy of the signal in the lower spatial frequencies. The upper left corner of the matrix represents low frequency information while the lower right corner represents high frequency information. The IDCT restores the original pixel values from the DCT coefficients.
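
A direct numpy transcription of Equation 1.1 might look as follows (a sketch for clarity; real codecs use fast factorized DCT algorithms rather than this matrix form):

```python
import numpy as np

def fdct_8x8(block):
    """Forward DCT of Equation 1.1 applied to one 8x8 pixel block."""
    i = np.arange(8)
    # cos_table[u, i] = cos((2i + 1) u pi / 16)
    cos_table = np.cos((2 * i[None, :] + 1) * np.arange(8)[:, None] * np.pi / 16)
    c = np.ones(8)
    c[0] = 1 / np.sqrt(2)               # C(u) = 1/sqrt(2) for u = 0
    return 0.25 * np.outer(c, c) * (cos_table @ block @ cos_table.T)

# A uniform block has all its energy in the DC coefficient F(0, 0).
flat = np.full((8, 8), 128.0)
print(np.round(fdct_8x8(flat), 1))      # F(0, 0) = 1024, everything else 0
```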

Quantization

The purpose of quantization is to achieve high compression by representing DCT coefficients with no greater precision than necessary, discarding information which is not visually significant to human perception. After the FDCT, each of the 64 DCT coefficients is quantized: F[u, v] = Round(F[u, v]/q[u, v]). For example, assume a DCT coefficient 101101 = 45 (6 bits). If it is quantized by a quanta value q[u, v] = 4, the original DCT value is reduced to 4 bits, 1011 = 11, lowering the number of bits required to represent the coefficient. Because of this many-to-one mapping, quantization is a fundamentally lossy process, and it is the principal source of lossiness in DCT-based encoders. There are two different types of quantization: uniform and non-uniform. Uniform quantization divides each F[u, v] by the same constant N. Non-uniform quantization uses a quantization table derived from psychovisual experiments to exploit the limits of the human visual system.
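
The round trip below (a minimal sketch using the uniform quanta value 4 from the example) makes the loss visible: the decoder can only recover multiples of q[u, v]:

```python
import numpy as np

def quantize(F, q):
    """F[u, v] = Round(F[u, v] / q[u, v]) -- the lossy many-to-one mapping."""
    return np.round(F / q).astype(int)

def dequantize(Fq, q):
    """The decoder can only restore multiples of the quanta value."""
    return Fq * q

coeff = np.array([45.0])                   # the DCT coefficient from the text
print(quantize(coeff, 4))                  # [11]
print(dequantize(quantize(coeff, 4), 4))   # [44] -- the remainder is lost
```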

Zig-Zag Sequence

After quantization, the 8x8 matrix of quantized coefficients is ordered into a one-dimensional array (1x64) using the zig-zag sequence (Figure 1.6). Mapping the 8x8 matrix to a 1x64 vector in this order places the low-frequency coefficients at the front of the vector, which facilitates entropy coding.
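
One way to generate the zig-zag order is to walk the anti-diagonals of the matrix, reversing direction on alternate diagonals; a possible Python sketch:

```python
def zigzag_order(n=8):
    """List (row, col) indices along anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):      # s = row + col indexes one anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

order = zigzag_order()
print(order[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
# Flattening a block: vector = [block[i][j] for i, j in zigzag_order()]
```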

Figure 1.6. Zig-zag sequence.


Entropy Coding

To achieve additional lossless compression by encoding the quantized DCT coefficients more compactly based on their statistics, JPEG applies Differential Pulse Code Modulation (DPCM) to the DC component, which is a measure of the average value of the 64 image samples. DPCM is useful here because there is usually a strong correlation between the DC coefficients of adjacent 8x8 blocks. Run Length Encoding (RLE) is applied to the AC components, which usually contain long runs of zeros. For further compression, Huffman coding is applied to both DC and AC coefficients at the last stage.

JPEG Operating Modes

JPEG supports the following operation modes to meet the various needs of many applications:

  • Sequential Mode:

    Each image component is encoded in a single left-to-right, top-to-bottom scan as explained in this section.

  • Lossless Mode:

    A special case of JPEG in which there is indeed no loss. It does not use a DCT-based method; instead, it uses a predictive (differential coding) method. A typical compression ratio is 2:1.

  • Progressive Mode:

    Each image is coded in multiple scans with each successive scan refining the image until a desired quality of image is achieved. Display of the encoded image follows the same steps: a coarser image is displayed first, then more components are decompressed to provide a finer version of the image.

  • Hierarchical Mode:

    An image is compressed to multiple resolution levels. This makes it possible to display a high resolution image on a low resolution monitor by accessing a lower resolution version without having to decompress the full version.

1.3.4. MPEG

JPEG achieves intra-frame compression for still images by exploiting the redundancy within an image (spatial redundancy). Intra-frame compression alone is not enough for video because it does not exploit the redundancy between successive frames. We need a different scheme for video that exploits both spatial redundancy and temporal redundancy. MPEG is a de facto compression standard for video, audio, and their synchronization. MPEG (Moving Picture Experts Group) was established in 1988 to create standards for the delivery of video and audio. MPEG achieves intra-frame encoding using DCT-based compression for the reduction of spatial redundancy (similar to JPEG). Its inter-frame encoding utilizes block-based motion compensation for the reduction of temporal redundancy. Specifically, MPEG uses bidirectional motion compensation.

Block-based Motion Compensation

To exploit the temporal redundancy between adjacent video frames, difference coding compares pixels in the current frame with those in the previous frame so that only changed pixels are recorded. In this way, only a fraction of the pixel values needs to be recorded for the current frame. In practice, pixel values differ slightly even when no object moves. This can be interpreted as many changes and results in less compression. Thus, difference coding needs to ignore small changes in pixel values, and it is therefore naturally lossy.

For more efficient compression, difference coding is done at the block level. Block-based difference coding receives a sequence of blocks rather than frames. For example, a 160 × 120 image (19200 pixels) can be divided into 300 8x8 blocks. If a block in the current frame is similar to the one in the same location of the previous frame, the algorithm skips it or stores only the difference. If they are different, a whole block of pixels is updated at once. Limitations of difference coding are obvious. It is useless where there is a lot of motion (few pixels unchanged). What if objects in the frame move fast? What if a camera itself is moving?

Motion compensation assumes that the current frame can be modelled as a translation of the previous frame. As in block-based difference coding, the current frame is divided into uniform non-overlapping blocks. Each block in the current frame to be encoded is compared to areas of similar size from the preceding frame in order to find an area that is similar, i.e., the best matching block. The relative difference in locations is known as the motion vector. The matching process can be based on prediction or interpolation. Because fewer bits are required to code a motion vector than to code an actual block, compression is achieved. Motion compensation is the basis of most video compression algorithms.
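
A minimal exhaustive-search sketch of block matching in numpy (the function name and the +/-4 pixel search window are our own illustrative choices, not values mandated by MPEG):

```python
import numpy as np

def best_match(block, ref, top, left, search=4):
    """Find the motion vector (dy, dx) minimizing the sum of absolute
    differences (SAD) within +/-search pixels of the block's position."""
    n = block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= ref.shape[0] - n and 0 <= x <= ref.shape[1] - n:
                sad = np.abs(block - ref[y:y + n, x:x + n]).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best, best_sad

# Current frame = reference frame translated by 2 rows and 1 column.
ref = np.random.default_rng(0).integers(0, 256, (32, 32)).astype(float)
cur = np.roll(ref, (2, 1), axis=(0, 1))
vector, sad = best_match(cur[8:16, 8:16], ref, 8, 8)
print(vector, sad)   # (-2, -1) 0.0 -- a perfect match two rows up, one column left
```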

For further and better compression, bidirectional motion compensation can be used. Areas just uncovered in the current frame are not predictable from the past, but they can be predicted from the future. Thus, bidirectional motion compensation searches both past and future frames to find the best matching block. Moreover, the effect of noise and errors can be reduced by averaging between previous and future frames. Bidirectional interpolation also provides a high degree of compression. However, it requires that frames be encoded and transmitted in a different order from that in which they will be displayed. In reality, because exact matching is not possible, this is a lossy compression.

Group of Pictures

MPEG defines a set of pictures to form a group of pictures (GOP) consisting of the following types (Figure 1.7):

  • I frame: Intra-coded picture

  • P frame: Unidirectionally predicted picture

  • B frame: Bidirectionally predicted picture

Figure 1.7. A group of pictures in MPEG.


I-frames are encoded using intra-frame compression that is similar to JPEG. Thus, they can be decoded without other frames. I-frames are used as access points for random access. P-frames are predicted frames with reference to a previous I or P frame. B-frames are bidirectional frames encoded using the previous and the next I/P frames.

MPEG Standards

MPEG-1 (ISO/IEC 11172) is a standard for storing and playing video on a single computer at low bit-rates of up to 1.5 Mb/s. MPEG-2 (ISO/IEC 13818) is a standard for high quality video such as digital TV (HDTV and DVD). MPEG-2 builds upon the MPEG-1 standard and supports both field prediction and frame prediction (interlaced video format). While MPEG-2 aims at high quality video, the MPEG-4 standard (ISO/IEC 14496) supports low bit-rate encoding of audio and video, user interactivity, and the special requirements of next generation broadcast services. The MPEG-7 standard (ISO/IEC 15938) provides a set of standardized tools to describe multimedia content: metadata elements and their structure and relationships, which are defined by the standard in the form of Descriptors and Description Schemes. Officially, MPEG-7 is called the Multimedia Content Description Interface [129]. The MPEG-21 standard (ISO/IEC 21000) is being developed to define a multimedia framework to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities.

Hierarchical Coding

Hierarchical coding in MPEG encodes images in a manner that facilitates access to them at different quality levels or resolutions. This makes progressive transmission possible: partial image information is transmitted in stages, and at each stage the reconstructed image is progressively improved. Hierarchical coding is motivated by the need to transmit images over low-bandwidth channels. Progressive transmission can be stopped either if an intermediate version is of satisfactory quality or if the image is found to be of no interest. Hierarchical coding is also very useful in multi-use environments where applications need to support a number of display devices with different resolutions. It can also optimize the utilization of storage server and network resources.

MPEG Issues

The following issues are commonly considered in the design of streaming applications using MPEG:

  • Avoiding propagation of errors. Due to the dependency among successive frames, errors such as missing frames or packets propagate across multiple frames. We can limit this problem by sending an I-frame periodically (e.g., by choosing a reasonable group of pictures size).

  • Bit-rate control. In general, complicated images result in less compression and a higher bit-rate than simple images. To regulate the output bit-rate, a simple feedback loop based on buffer fullness is used: if the buffer is close to full, MPEG increases the quantization scale factor to reduce the size of the data. A toy sketch of such a feedback loop appears after this list.

  • Constant Bit Rate (CBR) vs. Variable Bit Rate (VBR). MPEG streams can be encoded either at a constant bit rate or at a variable bit rate. The CBR approach is more appropriate for video broadcasting through fixed bandwidth channels, while VBR supports fixed image quality, as on DVDs, better than CBR.
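
As a toy illustration of the buffer-fullness feedback loop mentioned above, the sketch below adjusts a quantization scale factor within the 1-31 range used by MPEG-1/2; the thresholds and step size are our own illustrative assumptions, not values from any standard:

```python
def adjust_qscale(qscale, buffer_fullness, low=0.25, high=0.75):
    """Toy rate control: quantize more coarsely as the output buffer fills."""
    if buffer_fullness > high:
        return min(31, qscale + 1)   # buffer near full: compress harder
    if buffer_fullness < low:
        return max(1, qscale - 1)    # buffer draining: spend bits on quality
    return qscale                    # comfortable range: leave quality alone

q = 8
for fullness in (0.5, 0.8, 0.9, 0.3, 0.1):
    q = adjust_qscale(q, fullness)
    print(q)   # 8, 9, 10, 10, 9
```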
