Chapter 10

High Dynamic Range Video Compression

Y. Zhang; D. Agrafiotis; D.R. Bull    University of Bristol, Bristol, United Kingdom

Abstract

High dynamic range (HDR) technology is able to offer high levels of immersion with a dynamic range comparable to that of the human visual system. This increase in the level of visual immersion comes at the cost of higher bit rate requirements compared with those associated with conventional imaging technologies. As a result, efficient HDR-specific coding solutions are necessary. In this chapter, we review existing HDR image and video coding methods, emphasizing how these exploit certain properties of the human visual system and how they provide enhanced rate-quality performance compared with conventional approaches.

Keywords

High dynamic range; High dynamic range image coding; High dynamic range video coding; Backward compatible; High bit depth

10.1 Introduction

The simultaneous dynamic range of the human visual system (HVS) covers a range of approximately four orders of magnitude (Kunkel and Reinhard, 2010). Overall, the HVS can adapt to light conditions with a dynamic range (range of luminance levels) of approximately 14 orders of magnitude (Hood and Finkelstein, 1986; Ferwerda et al., 1996). This range includes the so-called photopic and scotopic vision ranges.

In contrast to the wide range of luminance adaptation exhibited by the HVS, most existing conventional capture and display devices can accommodate a dynamic range of only between two and three orders of magnitude. This is often referred to as “low dynamic range” (LDR). High dynamic range (HDR) imaging has been demonstrated to increase the immersiveness of the viewing experience by capturing a luminance range more compatible with the scene and displaying visual information that covers the full visible (instantaneous) luminance range of the HVS with a larger color gamut (Reinhard et al., 2010). This, however, comes at the cost of much larger storage and transmission bandwidth requirements owing to the increased bit depth that is used by most HDR formats to represent this information. Efficient HDR image and video compression algorithms are hence needed that can produce manageable bit rates for a given target perceptual quality. At the time of writing, standardization of compression for wide color gamut and HDR video is still an ongoing process (Sullivan et al., 2007; Winken et al., 2007; Wu et al., 2008; Zhang et al., 2013b; Duenas et al., 2014).

This chapter offers an introduction to the topic of HDR video compression. We review some of the latest HDR image and video coding methods, looking at both layered (backward-compatible/residual-based) and high-bit-depth (native) HDR compression approaches. In the process, we also examine certain HDR-related aspects of the HVS. This chapter assumes a working knowledge of image and video compression. For further details on this aspect, the reader is referred to Bull (2014).

10.2 HDR Image Storage Formats and Compression

To preserve the benefits of HDR imaging and avoid taking up excess memory and bandwidth, an efficient HDR format is necessary for storing HDR image and video data. There are three recognized HDR image formats that have been standardized: Radiance RGBE, LogLuv TIFF, and OpenEXR. At the time of writing, no HDR video format existed.

The Radiance RGBE (.hdr) file format uses run-length coding to represent data with 32 bits per pixel (Ward, 1991, 1994). Eight bits are used for each of the three color channels — red, green, and blue — and an extra eight bits are used for the exponent. The dynamic range covered by this format is more than 76 orders of magnitude. The Radiance format also supports XYZE encoding, which offers full color gamut coverage.
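The shared-exponent scheme can be sketched as follows. This is an illustrative per-pixel version of the idea (eight-bit mantissas plus one common biased exponent), not the actual Radiance library routines:

```python
import math

def rgbe_encode(r, g, b):
    """Pack a linear RGB triple into shared-exponent RGBE form:
    one 8-bit mantissa per channel plus a common 8-bit exponent."""
    v = max(r, g, b)
    if v < 1e-38:                       # too dark to represent: store zero
        return (0, 0, 0, 0)
    mantissa, exponent = math.frexp(v)  # v = mantissa * 2**exponent, 0.5 <= mantissa < 1
    scale = mantissa * 256.0 / v        # = 256 / 2**exponent
    return (int(round(r * scale)),
            int(round(g * scale)),
            int(round(b * scale)),
            exponent + 128)             # exponent stored with a bias of 128

def rgbe_decode(rm, gm, bm, e):
    """Recover linear RGB from an RGBE pixel."""
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e - 128 - 8)    # 2**(e - 128) / 256
    return (rm * f, gm * f, bm * f)
```

Because the exponent is shared, the quantization step is set by the largest of the three channels, so dim channels in a bright pixel are represented relatively coarsely.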

The LogLuv HDR image format (Larson, 1998) comes in two versions that support 24 and 32 bits per pixel, respectively. The LogLuv color space has been adopted as part of the TIFF library (.tiff). The advantage of the LogLuv format is that it stores luminance and chrominance information separately, allowing these values to be directly processed during tone mapping and compression.

The OpenEXR HDR image format (.exr) encodes pixels using 16-bit floating point values for the red, green, and blue channels (Bogart et al., 2003). Each color channel is encoded with use of a half-precision floating point number, where one bit is used for the sign, five bits are used for the exponent, and 10 bits are used for the mantissa. This encoding strategy covers around 10.7 orders of magnitude. The ZIP deflate library and other more efficient lossless wavelet compression tools can be used to achieve, on average, a 60% reduction in storage space compared with uncompressed data (Reinhard et al., 2010).

All of these HDR image formats apply lossless compression to the image data in order to preserve the original HDR information. This makes them unsuitable for applications requiring reduced transmission bit rates and scenarios where storage space is severely limited. JPEG XR (Srinivasan et al., 2007) and JPEG 2000 (Boliek, 2014) are image codecs that offer lossy compression for high-bit-depth (HDR) images (Srinivasan et al., 2007; Richter, 2009; Zhang et al., 2012a). An extension to the JPEG standard called JPEG XT (in the process of standardization at the time of writing) targets backward-compatible, scalable lossy-to-lossless coding of HDR images (Husak and Ninan, 2012; Richter, 2013b, 2014; Richter et al., 2014). Pinheiro et al. (2014) investigated the performance of three profiles (profiles A, B, and C) of JPEG XT using the signal-to-noise ratio and the feature similarity index as the objective quality metrics. The three profiles differ in terms of the HDR reconstruction method used in the standard. The results of this performance evaluation indicate that the rate-distortion performances (signal-to-noise ratio) of profiles A and B are similar, showing fast “saturation” at higher bit rates. Profile C exhibits different rate-distortion behavior, with no such “saturation” at high bit rates. Profiles B and C show stronger dependency than profile A on the choice of the tone-mapping operator (TMO) that generates the LDR image. The feature similarity index results suggest that the rate-distortion curves of all three profiles exhibit fast “saturation” at high bit rates.

A backward-compatible JPEG method (JPEG-HDR) was proposed in Ward and Simmons (2005). JPEG-HDR codes an HDR image in a layered fashion, with an eight-bit tone-mapped image forming the base layer and an image residual in the extension layer. Both layers are coded with standard JPEG. Before the introduction of this method, Ward and Simmons (2004) had proposed another residual-based HDR extension for JPEG. The HDR image is tone-mapped down to an eight-bit LDR image that is coded with JPEG. A ratio image between the original HDR image and the tone-mapped version is downsampled and stored as a tag in the header file. This ratio image can be used by HDR-capable decoders to reconstruct the HDR content, while all other legacy devices would ignore the tag and directly display the tone-mapped LDR image. Spaulding et al. (2003) proposed JPEG backward-compatible layered coding for color gamut extension. In the base layer, an image with a clipped color gamut is encoded. In the enhancement layer, a residual image is formed in a subband. This residual image is defined as the arithmetic difference between an extended reference input medium metric RGB (ERIMM RGB) color space input image and the encoded standard RGB (sRGB) foreground image (limited to eight bits). Korshunov and Ebrahimi (2012, 2013) presented a generic HDR JPEG backward-compatible image compression scheme. They highlight the importance of the TMO in the performance of the codec and its dependence on the content, the device used, and environmental parameters such as backlighting, display type and size, and environment illumination. Korshunov and Ebrahimi (2013) evaluated the performance of their method using three simple TMOs (“log,” “gamma,” and “linear”). A number of different viewing condition parameters (ie, backlighting, display type, display size, environment illumination) were taken into account.
The experimental results indicated that, on the basis of peak signal-to-noise ratio evaluations, the proposed solution offered better compression efficiency for the luminance channel than competing methods.

The method proposed in Richter (2013a) is also JPEG backward compatible and proposes an extended discrete cosine transform process with additional refinement bits. The refinement bits are the result of coding the residual formed after subtraction of the inverse tone-mapped, JPEG-coded LDR layer from the input HDR image. The refinement bits are placed within application markers, thus being hidden from legacy JPEG decoders. The experimental results presented in Richter (2013b) indicate that the high-bit-depth codecs (JPEG 2000 and JPEG XR) offer better HDR rate-distortion performance.

A more detailed discussion of the JPEG XT standard for JPEG-compatible compression can be found in Chapter 12.

10.3 HDR Video Compression

HDR video has significantly higher bit rate requirements than LDR video, not only because of the increased bit depth but also because of increased levels of noise. HDR video compression methods can be grouped into two categories: layered approaches that are backward compatible with standard LDR decoders and high-bit-depth HDR coding methods that code the input HDR signal as close to its native format as possible.

10.3.1 Layered (Backward-Compatible) HDR Video Compression

Layered HDR compression methods are designed so that legacy decoders, which can cope only with standard dynamic range bit depths and LDR displays (displays that are unable to render HDR content), are still able to decode the base layer and display a tone-mapped version of the HDR image/video. HDR-capable decoders would be able to decode the full stream and deliver the higher-performance HDR content. Fig. 10.1 shows a general block diagram describing the structure of a typical backward-compatible HDR encoder. The base layer encodes a tone-mapped eight-bit LDR representation of the HDR input using a fully compatible legacy encoder/decoder. The enhancement layer contains the difference (residual) between the inverse tone-mapped base layer and the original HDR input. This residual is used for reconstruction of the HDR content to be used with HDR devices.

Figure 10.1 General structure of a backward-compatible HDR image and video codec. Source: Zhang et al. (2015).
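The residual-based structure described above can be sketched as follows. A simple logarithmic mapping stands in for the TMO, and the base-layer encoder/decoder is idealized as lossless eight-bit quantization; both are assumptions for illustration, not the pipeline of any specific codec:

```python
import numpy as np

def tmo(hdr, lmax):
    """Illustrative global TMO: log mapping of luminance to [0, 1]."""
    return np.log1p(hdr) / np.log1p(lmax)

def inv_tmo(ldr, lmax):
    return np.expm1(ldr * np.log1p(lmax))

def encode_layers(hdr):
    """Split an HDR luminance frame into an 8-bit base layer plus a residual."""
    lmax = float(hdr.max())
    base = np.round(tmo(hdr, lmax) * 255).astype(np.uint8)  # legacy LDR layer
    recon = inv_tmo(base / 255.0, lmax)                     # inverse-tone-mapped base
    residual = hdr - recon                                  # enhancement layer
    return base, residual, lmax

def decode_hdr(base, residual, lmax):
    """HDR-capable decoder: inverse tone mapping plus the residual."""
    return inv_tmo(base / 255.0, lmax) + residual

hdr = np.random.uniform(0.01, 3000.0, (64, 64))  # synthetic HDR luminance (cd/m2)
base, residual, lmax = encode_layers(hdr)
hdr_rec = decode_hdr(base, residual, lmax)
```

A legacy decoder would stop after decoding `base`; in a real codec both layers (and the residual in particular) would of course be lossy-compressed rather than stored exactly.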

The TMO plays a significant role in the performance of layered HDR video coding methods. The main goal of tone mapping is to reduce the dynamic range of the input HDR image/video while preserving perceptual aspects in the resulting LDR image/video. Several TMOs have been developed for different purposes, such as photographic tone reproduction (Reinhard et al., 2002), photoreceptor physiology–based modeling (Reinhard and Devlin, 2005), simulation of artistic drawing (Tumblin and Turk, 1999), gradient domain dynamic range compression (Fattal et al., 2002), display adaptive tone mapping (Mantiuk et al., 2008), and user interactive local adjustment tone mapping (Lischinski et al., 2006).
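As a concrete example, the global form of the photographic operator of Reinhard et al. (2002) scales luminance by the log-average "key" of the scene and then compresses it toward [0, 1]. The key value a = 0.18 and the choice of white point are illustrative defaults in this minimal sketch:

```python
import numpy as np

def reinhard_global(lum, a=0.18, eps=1e-6):
    """Global photographic tone reproduction (Reinhard et al., 2002).
    a is the 'key' that the log-average luminance is mapped to."""
    l_avg = np.exp(np.mean(np.log(lum + eps)))  # log-average scene luminance
    lm = (a / l_avg) * lum                      # scale relative to the key
    l_white = lm.max()                          # burn out only the brightest pixel
    return lm * (1.0 + lm / l_white**2) / (1.0 + lm)  # compress into [0, 1]

hdr = np.random.uniform(0.01, 5000.0, (32, 32))  # synthetic HDR luminance
ldr = reinhard_global(hdr)
```

The mapping is monotone in luminance, so relative brightness ordering is preserved while the dynamic range is compressed to the displayable range.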

Different visual and coding results are obtained when HDR content is mapped with different TMOs. A number of studies have evaluated, objectively and subjectively, the visual impact of TMOs (Yoshida et al., 2005; Eilertsen et al., 2013; Yeganeh and Wang, 2013; Narwaria et al., 2013a). Narwaria et al. (2012) investigated the relation between TMOs and visual attention. They performed psychovisual experiments that assessed the impact of eight TMOs on visual attention. The results of their experiments suggest that TMOs can modify fixation behavior, with contrast greatly influencing saliency.

In most video tone mapping cases a TMO is applied independently to each frame of the HDR video. This produces brightness level variation between consecutive tone-mapped frames, causing flickering artifacts. Hence, when designing a TMO for HDR video, one has to take the temporal dimension into consideration. Lee and Kim (2007) proposed a time-dependent adaptation method for the TMO of Fattal et al. (2002) which exploits the motion information and prevents flickering artifacts. A real-time tone-mapping system for HDR video was proposed in Kiser et al. (2012). This adapts to changes in the scene and reduces the amount of flickering artifacts in the resulting tone-mapped HDR video. The method was implemented in a field programmable gate array for use as a real-time tone-mapping system. Boitard et al. (2014) classified the temporal artifacts generated by TMOs when tone mapping video into different types and surveyed methods designed to remove such artifacts. Aydin et al. (2014) designed a local TMO for avoiding visible artifacts in the spatial and temporal domains. The proposed method decomposes the signal into a base layer and a detail layer. A temporal filter is applied on the detail layer and a spatiotemporal filter is applied on the base layer. An extensive review of TMOs can be found in Reinhard et al. (2010) and Banterle et al. (2011).
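A common remedy, used in various forms by the methods above, is to low-pass filter the tone-mapping parameters over time rather than the pixels themselves. The sketch below smooths the per-frame log-average luminance with a leaky integrator before a simple global operator uses it; the filter constant and the operator are illustrative assumptions:

```python
import numpy as np

def log_average(lum, eps=1e-6):
    return float(np.exp(np.mean(np.log(lum + eps))))

def tone_map_sequence(frames, alpha=0.9):
    """Tone-map a luminance sequence, smoothing the adaptation luminance
    across frames with a leaky integrator to suppress flicker."""
    ldr_frames = []
    l_adapt = None
    for lum in frames:
        l_frame = log_average(lum)
        # blend the new frame statistic into the temporally filtered one
        l_adapt = l_frame if l_adapt is None else alpha * l_adapt + (1 - alpha) * l_frame
        scaled = 0.18 * lum / l_adapt          # illustrative global operator
        ldr_frames.append(scaled / (1.0 + scaled))
    return ldr_frames

# a scene that brightens by 20% per frame: per-frame adaptation would flicker
frames = [np.full((16, 16), 100.0 * (1.2 ** t)) for t in range(10)]
ldr = tone_map_sequence(frames)
```

Because the adaptation luminance lags the scene statistics, abrupt per-frame changes in the mapping (and hence flicker) are smoothed out at the cost of a brief adaptation delay.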

Mantiuk et al. (2004) were the first to propose layered coding for backward-compatible HDR video compression. Their method was designed as an extension to the MPEG-4 (Part II) compression standard. The method applies tone mapping (perception-based HDR luminance-to-integer encoding) to form the LDR base layer. The input to the MPEG-4 codec consists of eight-bit RGB data and HDR data in the CIE XYZ color space. The CIE XYZ color space enables encoding of the full visible luminance range and color gamut. The luminance values are quantized with a nonlinear function that distributes the error according to the luminance response curve of the HVS. A color space transformation is introduced that facilitates comparisons between LDR and HDR pixels of different color spaces so that the chrominance data of the residual (enhancement) layer can be determined. Mantiuk et al. (2006a,b) extended this work to include a contrast sensitivity function (CSF)-based prefiltering of the residual stream that removes imperceptible high spatial frequency information, thus reducing the bit rate of the coded sequence.

Mai et al. (2011) proposed a backward-compatible method that aims to find an optimal tone curve for mapping the input HDR video to a backward-compatible eight-bit LDR video format. The optimal tone curve minimizes the quality loss due to tone mapping, encoding/decoding, and inverse tone mapping of the original video. To compute the optimal tone-mapping curve, closed-form and statistical solutions were proposed. The LDR video can be compressed by a conventional video codec such as H.264. The reconstructed video can either be displayed on a conventional LDR display or can be inverse-tone-mapped and augmented by an optional enhancement layer containing an HDR residual signal (also compressed by the codec) for display on an HDR display.

Koz and Dufaux (2014) presented a backward-compatible HDR video compression scheme that uses the tone-mapping method of Mai et al. (2011) and the perceptually uniform mapping of Aydin et al. (2008). Perceptually uniform mapping converts real-world luminance values into perceptually uniform encoded values. The aim of the proposed coding method is to minimize the mean square error between the perceptually uniform encoded values of the original HDR signal and those of the reconstructed signal. To constrain flickering distortion in the tone-mapped HDR video, their method restrains the average luminance of neighboring frames within the just noticeable difference (JND) interval. Using perceptually uniform mapping, Koz and Dufaux define the perceptually uniform peak signal-to-noise ratio, which accepts perceptually uniform mapped values as its input, for estimating the quality of HDR (or LDR) content displayed on bright displays (Aydin et al., 2008). The experimental results indicate that this method provides a good trade-off between HDR video compression performance and the flickering distortion introduced by the tone mapping applied in most backward-compatible HDR video compression approaches. Compared with the method of Mai et al. (2011), this method offers better performance in terms of the perceptually uniform peak signal-to-noise ratio but achieves a smaller objective score when quality is measured with the HDR-MSE and HDR-VDP metrics.

The method proposed in Le Dauphin et al. (2014) adjusts the TMO depending on the coding type of the video frame (intraframe or interframe) in order to maximize the compression efficiency. The experimental results presented suggest that this adaptation can lead to more than 20% bit rate reduction for the LDR layer and more than 2% reduction for HDR video.

The scalable extension of High Efficiency Video Coding (HEVC), known as SHVC, offers both bit-depth and color-gamut scalability. Bit-depth-scalable approaches have been proposed in Winken et al. (2007), Wu et al. (2008), and Liu et al. (2008). Typically, the proposed methods create a backward-compatible eight-bit base layer and an enhancement layer for higher-bit-depth reconstruction (ie, 10, 12, or 14 bits). In Bordes et al. (2013) and Duenas et al. (2014), color-gamut scalability was investigated in SHVC. The proposed method addresses the case where the enhancement layer uses a color gamut different from that used by the base layer. A prediction tool for color differences between the base layer and the enhancement layer is thus required. This can be useful, for instance, for deployment of HDR or wider color gamut services compatible with legacy LDR devices. Typically, LDR or high-definition TV uses the ITU-R Rec. 709 format, while UHD (possibly HDR) is likely to be based on ITU-R Rec. 2020.

10.3.2 High-Bit-Depth (Native) HDR Video Compression

High-bit-depth HDR video compression methods directly code the input high-bit-depth HDR video, as opposed to first separating it into base and enhancement layers through tone mapping. They rely on existing codecs that can handle a high-bit-depth input (eg, JPEG 2000, H.264/AVC, HEVC) with added preprocessing steps and/or modifications/extensions that aim to exploit HDR-specific aspects of the HVS for compression gains.

Motra and Thoma (2010) proposed an adaptive-LogLuv transform which can be used with an existing video encoder such as H.264/AVC. Their approach can represent the luminance channel at any specified bit depth. Eight bits are used for the chrominance channels. This method requires that side information is stored/transmitted with each HDR frame. Round-off quantization noise produced by the logarithmic color space transformation used in this method can have repercussions on visual quality. An extended version of this work was presented in Garbas and Thoma (2011), wherein a temporally coherent adaptive dynamic range mapping was proposed. This generates temporal weights and shift parameters for each frame and makes use of the weighted prediction tool specified by the H.264/AVC codec (Boyce, 2004). An electro-optical transfer function is proposed in Miller et al. (2013) which acts as a perceptual quantizer that maximizes the perceptual quality when one is creating HDR content with bit depths of 10 or 12 bits.
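The adaptive mapping idea can be sketched as follows, in the spirit of (but not identical to) the adaptive-LogLuv transform: each frame's luminance is mapped to an n-bit integer code in the log domain, and the per-frame (lmin, lmax) pair is the side information a decoder needs. The bit depth and mapping details are assumptions for illustration:

```python
import numpy as np

def adaptive_log_encode(lum, bits=12, eps=1e-6):
    """Map a frame's luminance to n-bit integer codes in the log domain.
    The per-frame (lmin, lmax) pair is the side information required to
    invert the mapping, as in adaptive LogLuv-style schemes."""
    log_l = np.log2(lum + eps)
    lmin, lmax = float(log_l.min()), float(log_l.max())
    span = max(lmax - lmin, eps)              # guard against flat frames
    scale = ((1 << bits) - 1) / span
    code = np.round((log_l - lmin) * scale).astype(np.uint16)
    return code, (lmin, lmax)

def adaptive_log_decode(code, side, bits=12, eps=1e-6):
    lmin, lmax = side
    span = max(lmax - lmin, eps)
    log_l = code.astype(np.float64) * span / ((1 << bits) - 1) + lmin
    return np.exp2(log_l) - eps

lum = np.random.uniform(0.05, 10000.0, (64, 64))  # synthetic HDR luminance
code, side = adaptive_log_encode(lum)
rec = adaptive_log_decode(code, side)
```

Quantizing uniformly in the log domain makes the round-off error roughly a constant *relative* error in luminance, which matches the approximately logarithmic luminance sensitivity of the HVS.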

The HVS exhibits nonlinear sensitivity to the distortions introduced by lossy image and video coding. This effect is due to the luminance masking, contrast masking, and spatial and temporal frequency masking characteristics of the HVS. Efficient perception-based compression of HDR imagery requires models that capture accurately the masking effects experienced by the HVS under HDR conditions, so that bits are not wasted coding redundant imperceptible information. Fig. 10.2 shows the general structure of perception-based HDR image and video coding.

Figure 10.2 General structure of perception-based HDR image and video encoding. Source: Modified from Zhang et al. (2015).

Zhang et al. (2012a,b) proposed a perception-based HDR image and video compression method for both JPEG 2000 and H.264/AVC. The method uses a discrete wavelet packet transform (DWPT) and applies coefficient weighting factors that are derived from luminance and chrominance CSFs. The aim of the weighting factors is to reduce the amount of imperceptible information that is sent to the encoder. Fig. 10.3 shows the relationship between luminance and chrominance CSFs and a five-level two-dimensional discrete wavelet packet transform decomposition. The perceptual CSF model characterizes the relationship between contrast sensitivity and spatial frequency. In the luminance case (black curve), the HVS is more sensitive to middle spatial frequencies and is less sensitive to lower and higher spatial frequencies. In the chrominance case (blue and red curves), the HVS is more sensitive to lower spatial frequencies and less sensitive to high spatial frequencies. Psychophysical experiments reported in Barten (1999), Mullen (1985), and Mannos and Sakrison (1974) have quantified this phenomenon and describe the ability of the HVS to recognize differences in luminance and chrominance as a function of contrast and spatial frequency. The weighted wavelet coefficients in the case of HDR images are passed to a JPEG 2000 encoder, whereas in the case of HDR video, they are inverse-transformed and the resulting video frames are fed to an H.264/AVC encoder.

Figure 10.3 Relationship between luminance and chrominance CSFs and a five-level two-dimensional discrete wavelet packet transform decomposition. Source: Zhang et al. (2011).
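The luminance CSF can be approximated, for example, with the classic model of Mannos and Sakrison (1974), and per-subband weights obtained by evaluating it at a representative frequency of each octave band. This is a simplified sketch of the weighting idea, not the exact model or weights of Zhang et al.; the Nyquist frequency and band-centre choice are assumptions:

```python
import numpy as np

def csf_mannos_sakrison(f):
    """Luminance contrast sensitivity vs. spatial frequency f (cycles/degree),
    after Mannos and Sakrison (1974); peaks at mid frequencies (~8 cpd)."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def subband_weights(levels, nyquist_cpd=32.0):
    """One illustrative weight per wavelet decomposition level: evaluate the
    CSF at a representative frequency of each octave band and normalize."""
    centres = [nyquist_cpd / (2 ** (l + 1)) * 1.5 for l in range(levels)]
    w = csf_mannos_sakrison(np.array(centres))
    return w / w.max()

weights = subband_weights(levels=5)  # weights for a 5-level decomposition
```

Scaling wavelet coefficients by such weights attenuates the highest-frequency (least visible) subbands most, so less coding budget is spent on information the HVS cannot see.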

The experimental results presented in Zhang et al. (2012b) indicate that the method offers visually lossless quality at a significantly reduced bit rate compared with JPEG 2000 (Table 10.1, Fig. 10.4). The HDR images listed in Table 10.1 were taken from Debevec and Malik (1997), Drago and Mantiuk (2005), and Reinhard et al. (2010). Quality was measured with the HDR-VDP (Mantiuk et al., 2005) and HDR-VDP-2 (Mantiuk et al., 2011) quality metrics. The method also outperforms that of Motra and Thoma (2010) for coding HDR video (Fig. 10.5).

Table 10.1

Storage Requirements (kB) of Ten Test HDR Images When Coded With the Method of Zhang et al. (2012b), With JPEG 2000, or With One of the Three HDR Image Formats

HDR Image      | Resolution (pixels) | Proposed Method (Visually Lossless) | JPEG 2000 (Visually Lossless) | OpenEXR (.exr) | LogLuv (.tiff) (32 bits) | RGBE (.hdr)
memorial       | 512 × 768           | 590   | 787   | 1246   | 1190 | 1312
AtriumNight    | 760 × 1016          | 1151  | 1308  | 1983   | 2412 | 2547
mpi_atrium_1   | 1024 × 676          | 1376  | 1531  | 1916   | 2000 | 2375
EMPstair       | 852 × 1136          | 1211  | 1411  | 2076   | 2667 | 3040
BristolBridge  | 2048 × 1536         | 7332  | 8195  | 10,132 | 7696 | 9895
BoyScoutFalls  | 1000 × 1504         | 2974  | 3281  | 4497   | 4794 | 5142
BoyScoutTrail5 | 998 × 1496          | 2935  | 3321  | 4464   | 5048 | 5280
BoyScoutTree   | 998 × 1489          | 3231  | 3624  | 4224   | 4784 | 5169
sfmoma1        | 852 × 1136          | 1521  | 1825  | 1962   | 2640 | 2996
WardFlowers    | 1504 × 1000         | 2935  | 3344  | 3620   | 4292 | 4676


Source: Zhang et al. (2012a).

Figure 10.4 (A) Rate-distortion performance of the method of Zhang et al. (2012b) compared with that of JPEG 2000 (test images, memorial.hdr, EMPstair.hdr). (B)–(E) Visually lossless images obtained with the method of Zhang et al. (2012b) compared with the original HDR image (display adaptive TMO used for tone mapping). Source: Zhang et al. (2012a).
Figure 10.5 Original “Tunnel” and “Sun” sequence frames [(A) and (E)] versus frames reconstructed [(B)–(D) and (F)–(H), respectively] with the method of Zhang et al. (2012b) using different QPs. (I) and (J) HDR-VDP (95%) results versus bit rate for the methods of Zhang et al. (2012b), a previous method of Zhang et al. (2011) and the adaptive LogLuv transform of Motra and Thoma (2010) for the “Tunnel” and “Sun” sequences, respectively. Source: Zhang et al. (2012a).

The method of Zhang et al. (2012b) assumes that HDR content exhibits the same CSF as standard dynamic range content. However, HDR content and HDR displays can offer a much larger contrast than conventional imaging. Compared with conventional LDR imaging, HDR imaging (HDR images/video displayed on HDR displays) exhibits considerably larger brightness variations. From a psychophysics point of view, this significant contrast extension has the potential to change the noise visibility threshold in HDR content.

Zhang et al. (2013a) performed psychovisual experiments using a Dolby HDR display (prototype code name Seymour) and HDR stimuli (luminance range 0.103–3162 cd/m2) to examine HDR luminance masking. The experimental results indicated that the noise visibility threshold increases at both low and high background luminance levels, and especially so with a low luminance background. These results suggest that quantization errors could be hidden by a background luminance masking effect in the darker and brighter areas of HDR content.

To exploit this varying distortion sensitivity, a luminance masking profile can be used to map the average luminance intensity to the quantization step of each coded block. Naccari et al. (2012) proposed the intensity-dependent quantization (IDQ) method for the HEVC standard. Their method applies coarser quantization to darker and brighter image areas without introducing coding artifacts that are noticeable to the HVS. The mapping from average pixel intensity to quantization step is performed through an IDQ profile that is fixed for all sequences and is a modified version of the profile proposed in Jia et al. (2006).
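The idea can be illustrated with a simple U-shaped profile: blocks whose mean intensity lies near the extremes of the range receive a positive QP offset, ie, coarser quantization. The profile shape and constants below are illustrative assumptions, not the actual profile of Naccari et al. (2012); only the HEVC relation that the quantization step doubles every 6 QP values is standard:

```python
def idq_qp_offset(mu, bit_depth=10, max_offset=6):
    """Illustrative intensity-dependent QP offset: zero for mid-grey blocks,
    rising toward the dark and bright extremes where masking is stronger."""
    mid = (1 << bit_depth) / 2.0
    d = abs(mu - mid) / mid            # 0 at mid-grey, 1 at the extremes
    return round(max_offset * d ** 2)  # quadratic rise toward the extremes

def quant_step(qp):
    """HEVC-style quantization step: doubles every 6 QP values."""
    return 2.0 ** ((qp - 4) / 6.0)

# a dark block is quantized more coarsely than a mid-grey one
qp = 27
dark = quant_step(qp + idq_qp_offset(80))
mid = quant_step(qp + idq_qp_offset(512))
```

Because the offset is derived from the block's average intensity, extra quantization noise is steered into regions where the luminance masking results above predict it will remain invisible.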

An IDQ profile scaling based on a global tone-mapping curve computed for each HDR frame was proposed in Zhang et al. (2013b). A global tone-mapping curve is a function that maps HDR luminance values either to the displayable luminance range (Mantiuk et al., 2008) or directly to LDR pixel values with eight bits per channel (Larson et al., 1997). In Zhang et al. (2013b) the latter case was considered with the histogram adjustment TMO of Larson et al. (1997). Scaling the IDQ profile using a global tone-mapping curve not only enables adaptation to any HDR content, but also offers a more accurate adaptation with respect to the HVS.
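A global curve in the spirit of the histogram adjustment TMO of Larson et al. (1997) can be sketched by mapping the cumulative histogram of log luminance to the eight-bit range. This is a simplified version, without the ceiling on histogram counts that the original method applies:

```python
import numpy as np

def histogram_tone_curve(lum, n_bins=256, eps=1e-6):
    """Global tone-mapping curve: the cumulative histogram of log luminance
    mapped to 8-bit codes. Returns (bin_edges, curve) so the same curve can
    be reused, eg, to scale an IDQ profile."""
    log_l = np.log(lum + eps)
    hist, edges = np.histogram(log_l, bins=n_bins)
    cdf = np.cumsum(hist) / hist.sum()
    curve = np.round(cdf * 255).astype(np.uint8)
    return edges, curve

def apply_curve(lum, edges, curve, eps=1e-6):
    idx = np.clip(np.digitize(np.log(lum + eps), edges) - 1, 0, len(curve) - 1)
    return curve[idx]

lum = np.random.lognormal(mean=2.0, sigma=2.0, size=(64, 64))  # synthetic frame
edges, curve = histogram_tone_curve(lum)
ldr = apply_curve(lum, edges, curve)
```

The curve allocates output codes in proportion to how often each log-luminance range occurs in the frame, which is also why its local slope is a useful per-frame indicator of how visible quantization errors at a given intensity will be.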

With this IDQ profile, the quantization step can be perceptually tuned on a transform unit basis. However, the intensity-dependent QP (idQP) value depends on the average pixel intensity (μ) of the original content. Given that the original content is not available to the decoder, for the coded bitstream to be properly decoded, μ would have to be communicated to the decoder. This creates significant overhead that increases the resulting bit rate.

To avoid this overhead the method of Zhang et al. (2013b) estimates μ from the predictor used for each coded block, thus avoiding the need for additional signaling. The predictor can be spatial in the case of intraframe coding or temporal for interframe coding. Estimation of μ from the block predictor avoids the introduction of overhead but creates dependencies in the decoding process that can significantly complicate/delay it. More specifically, the inverse quantization process at the decoder is now dependent on the availability of the predictor. For the case of a temporal predictor, the processing required by motion compensation may delay significantly the output of the pixel data since several memory access operations would have to first be performed.

To prevent such data dependency the method makes use of the intensity-dependent spatial quantization (IDSQ) proposed in Naccari et al. (2012). As can be seen in Fig. 10.6, IDSQ does not require availability of the predictor during the inverse quantization step. Instead, perceptual quantization is applied at the point of final reconstruction of the block, when prediction is added to the residual.

Figure 10.6 Block diagram of IDSQ processing. Source: Zhang et al. (2013b).

With IDSQ, the inverse quantization process is decoupled into two main steps (see Fig. 10.6):

1. Step 1: Over the quantized coefficients c, perform inverse quantization as specified in the HEVC standard with the quantization parameter (QP), then perform the inverse transform to obtain r′.

2. Step 2: Before r′ is added to the predictor, perform inverse quantization (scaling) with idQP(μ) to obtain the rescaled residual r̂.

The inverse scaling in IDSQ (Fig. 10.6) takes place in the spatial domain, which is the reason for the acronym IDSQ.
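The decoder-side flow of these two steps, including the estimation of μ from the predictor, can be sketched as follows. The idQP-to-scale mapping is an illustrative U-shaped profile, and the inverse transform is omitted for brevity, so r′ stands for the output of step 1:

```python
import numpy as np

def idq_scale(mu, bit_depth=10, max_offset=6):
    """Map the estimated block intensity to an inverse-quantization scale
    via an illustrative U-shaped idQP profile (scale doubles every 6 QP)."""
    mid = (1 << bit_depth) / 2.0
    idqp = max_offset * (abs(mu - mid) / mid) ** 2
    return 2.0 ** (idqp / 6.0)

def idsq_reconstruct(r_prime, predictor):
    """Step 2 of IDSQ: estimate mu from the predictor (no extra signaling),
    rescale the residual in the spatial domain, then add the prediction."""
    mu = float(predictor.mean())       # decoder-side estimate of block intensity
    r_hat = r_prime * idq_scale(mu)    # spatial-domain inverse scaling
    return predictor + r_hat

predictor = np.full((8, 8), 60.0)      # dark block: coarser effective quantization
r_prime = np.ones((8, 8))              # residual after standard inverse quant/transform
rec = idsq_reconstruct(r_prime, predictor)
```

Because the scaling is applied only at final reconstruction, the standard inverse quantization and inverse transform of step 1 can proceed without waiting for the (possibly motion-compensated) predictor to become available.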

IDSQ was integrated into the HM reference codec considered for the HEVC range extensions (HM-10.0-RExt-2.0), and its performance was assessed by measurement of the bit rate reduction relative to the codec without perceptual quantization (Zhang et al., 2013b). The test material used in the performance evaluation consisted of six sequences captured by Zhang et al. (2013b) with a RED EPIC camera using its HDRx function (Fig. 10.7A–F). All the test sequences were then downsampled to 1920 × 1080 and provided as OpenEXR half-float files. The input bit depth for the luminance channel was 14 bits, with two eight-bit chrominance channels in the LogLuv color space (Zhang et al., 2011). The luma/chroma sampling pattern was 4:4:4. The coding configurations selected for the experiments were the ones described in Flynn and Rosewarne (2013): all intra, random access, and low-delay B. The QP values considered were 12, 17, 22, and 27.

Figure 10.7 Example frames from the test HDR sequences used in the performance evaluation of the method of Zhang et al. (2013b). Nighttime sequences “Carnivalx”: (A) “Carnival1,” (B) “Carnival2,” (C) “Carnival3” and (D) “Carnival4.” Daytime sequences: (E) “Library” and (F) “ViceChancelorRoom.” Source: Zhang et al. (2013b).

HDR-VDP-2 was used to assess the quality of the reconstructed video. The results obtained (HDR-VDP-2 predicted mean opinion score (MOS)) suggest that the proposed HDR-IDSQ method provides quality perceptually equivalent to that of the anchor but at a significantly lower bit rate. An average bit rate reduction of up to 9.53% was observed when compared with the anchor, with the largest reduction (18.55%) being achieved for “Carnival2.”

As with most objective metrics, HDR-VDP-2 has certain limitations in terms of its correlation with subjective results. The experimental results of Narwaria et al. (2013b), for example, suggest that at higher bit rates HDR-VDP-2 scores do not increase in proportion to the subjective scores. To overcome this, recent work by Zhang et al. (2015) focused on conducting objective and subjective quality tests on an extended HDR dataset. The results of these tests support the conclusions in Zhang et al. (2013b), demonstrating a bit rate reduction of up to 42% for the HDR-IDSQ codec compared to an HEVC anchor across various coding conditions.

10.4 Summary

In this chapter, we have presented a review of state-of-the-art HDR image and video coding methods, with an emphasis on the latter. Layered (backward-compatible) methods apply tone mapping before compression to create a base (LDR) layer for legacy decoders and displays. Decoding of the optional enhancement layer enables reconstruction of the HDR output. In this case the choice of TMO can affect the performance of the codec not only in terms of the visual quality but also in terms of the bit rate produced. With layered HDR video compression, attention should also be paid to potential flickering artifacts resulting from tone mapping successive frames. Optimization of the tone-mapping curve can be used to minimize the quality loss due to the tone mapping, encoding/decoding, and inverse tone mapping chain used by layered HDR codecs.

High-bit-depth (native) coding involves standard codecs, such as HEVC, operating on the original HDR input. Rate/quality benefits can be obtained by the introduction of HDR-specific perceptual preprocessing, such as CSF-based prefiltering. Modifications can also be made to the actual codec, as in the case of the IDSQ method of Zhang et al. (2013b, 2015), which exploits HDR-specific redundancies/masking effects of the HVS. This method represents the state of the art in HDR video compression.

References

Aydin T.O., Mantiuk R., Seidel H.P. Extending quality metrics to full luminance range images. In: Human Vision and Electronic Imaging XIII. 2008;vol. 6806:68060B.

Aydin T.O., Stefanoski N., Croci S., Gross M., Smolic A. Temporally coherent local tone mapping of HDR video. ACM Trans. Graph. (TOG). 2014;33(6):196.

Banterle F., Artusi A., Debattista K., Chalmers A. Advanced High Dynamic Range Imaging. Natick, MA: AK Peters (now CRC Press); 2011.

Barten P.G.J. Contrast Sensitivity of the Human Eye and its Effects on Image Quality. Bellingham, WA: SPIE Press; 1999.

Bogart R., Kainz F., Hess D. OpenEXR image file format. In: Proceedings of ACM SIGGRAPH, Sketches & Applications. 2003.

Boitard R., Cozot R., Thoreau D., Bouatouch K. Survey of temporal brightness artifacts in video tone mapping. In: HDRi2014—International Conference and SME Workshop on HDR Imaging. 2014.

Boliek M., ed. Information Technology – The JPEG 2000 Image Coding System: Part 1 (Amendment 8), ISO/IEC IS 15444-1/ITU-T T.800. 2014.

Bordes P., Ye Y., Alshina E., Li X., Kim S., Duenas A., Ugur K., Sato K. Description of HEVC scalable extensions core experiment SCE1: color gamut and bit-depth scalability. In: JCTVC-O1101, 15th Meeting: Geneva, CH. 2013.

Boyce J.M. Weighted prediction in the H.264/MPEG AVC video coding standard. In: Proceedings of International Symposium on Circuits and Systems (ISCAS). 2004;vol. 3:III-789–III-792.

Bull D.R. Communicating Pictures: A Course in Image and Video Coding. San Diego, CA: Academic Press; 2014.

Debevec P.E., Malik J. Recovering high dynamic range radiance maps from photographs. In: Proceedings of Computer Graphics and Interactive Techniques. 1997:369–378.

Drago F., Mantiuk R. HDR Project, Image Gallery, Max-Planck-Institut Informatik. 2005.

Duenas A., Andrivon P., Ye Y., Alshina E., Li X., Ugur K., Auyeung C., Kim S. Description of HEVC scalable extensions core experiment SCE1: color gamut scalability. In: JCTVC-Q1101, 17th Meeting: Valencia, ES. 2014.

Eilertsen G., Wanat R., Mantiuk R., Unger J. Evaluation of tone mapping operators for HDR-video. In: Computer Graphics Forum. 2013;vol. 32:275–284.

Fattal R., Lischinski D., Werman M. Gradient domain high dynamic range compression. In: ACM Transactions on Graphics (TOG). New York, NY: ACM; 2002;vol. 21:249–256.

Ferwerda J.A., Pattanaik S.N., Shirley P., Greenberg D.P. A model of visual adaptation for realistic image synthesis. In: Proceedings of Computer Graphics and Interactive Techniques. New York, NY: ACM; 1996:249–258.

Flynn D., Rosewarne C. Common test conditions and software reference configurations for HEVC range extensions. In: JCTVC-L1006, 12th Meeting, Geneva, CH. 2013.

Garbas J.U., Thoma H. Temporally coherent luminance-to-luma mapping for high dynamic range video coding with H.264/AVC. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2011:829–832.

Hood D., Finkelstein M. Chapter 5: Sensitivity to light. In: Boff K., Kaufman L., Thomas J., eds. Handbook of Perception and Human Performance. New York, NY: Wiley; 1986;vol. 1.

Husak W., Ninan A. Call for proposals for JPEG HDR. In: ISO /IEC JTC 1/SC 29 /WG 1 N6147. 2012.

Jia Y., Lin W., Kassim A.A. Estimating just-noticeable distortion for video. IEEE Trans. Circ. Syst. Video Technol. 2006;16(7):820–829.

Kiser C., Reinhard E., Tocci M., Tocci N. Real time automated tone mapping system for HDR video. In: Proceedings of IEEE International Conference Image Processing (ICIP), Orlando, USA. 2012.

Korshunov P., Ebrahimi T. A JPEG backward-compatible HDR image compression. In: Proceedings of SPIE Conference on Applications of Digital Image Processing XXXV. 2012:84990J.

Korshunov P., Ebrahimi T. Context-dependent JPEG backward-compatible high-dynamic range image compression. Opt. Eng. 2013;52(10):102006.1–102006.11.

Koz A., Dufaux F. Methods for improving the tone mapping for backward compatible high dynamic range image and video coding. Signal Process. Image Commun. 2014;29(2):274–292.

Kunkel T., Reinhard E. A reassessment of the simultaneous dynamic range of the human visual system. In: Proceedings of the ACM Symposium on Applied Perception in Graphics and Visualization. Los Angeles: ACM; 2010:17–24.

Larson G.W. LogLuv encoding for full-gamut, high-dynamic range images. J. Graph. Tools. 1998;3(1):15–32.

Larson G.W., Rushmeier H., Piatko C. A visibility matching tone reproduction operator for high dynamic range scenes. IEEE Trans. Vis. Comput. Graph. 1997;3(4):291–306.

Le Dauphin A., Boitard R., Thoreau D., Olivier Y., Francois E., LeLéannec F. Prediction-guided quantization for video tone mapping. In: SPIE 9217, Applications of Digital Image Processing XXXVII. 2014:92170B.

Lee C., Kim C.S. Gradient domain tone mapping of high dynamic range videos. In: Proceedings of IEEE International Conference Image Processing (ICIP). 2007:III-461–III-464.

Lischinski D., Farbman Z., Uyttendaele M., Szeliski R. Interactive local adjustment of tonal values. ACM Trans. Graph. (TOG). 2006;25(3):646–653.

Liu S., Kim W.S., Vetro A. Bit-depth scalable coding for high dynamic range video. In: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. 2008;vol. 6822:24.

Mai Z., Mansour H., Mantiuk R., Nasiopoulos P., Ward R., Heidrich W. Optimizing a tone curve for backward-compatible high dynamic range image and video compression. IEEE Trans. Image Process. 2011;20(6):1558–1571.

Mannos J., Sakrison D. The effects of a visual fidelity criterion of the encoding of images. IEEE Trans. Inform. Theory. 1974;20(4):525–536.

Mantiuk R., Daly S., Kerofsky L. Display adaptive tone mapping. ACM Trans. Graph. (TOG). 2008;27(3):68.

Mantiuk R., Daly S.J., Myszkowski K., Seidel H.P. Predicting visible differences in high dynamic range images: model and its calibration. In: SPIE Proceedings. 204–214. 2005;vol. 5666.

Mantiuk R., Efremov A., Myszkowski K., Seidel H.P. Backward compatible high dynamic range MPEG video compression. ACM Trans. Graph. (TOG). 2006a;25(3):713–723.

Mantiuk R., Kim K.J., Rempel A.G., Heidrich W. HDR-VDP-2: a calibrated visual metric for visibility and quality predictions in all luminance conditions. ACM Trans. Graph. 2011;30(4):40.

Mantiuk R., Krawczyk G., Myszkowski K., Seidel H.P. Perception-motivated high dynamic range video encoding. ACM Trans. Graph. (TOG). 2004;23(3):733–741.

Mantiuk R., Myszkowski K., Seidel H.P. Lossy compression of high dynamic range images and video. In: SPIE Proceedings Vol. 6057: Human Vision and Electronic Imaging XI. 2006b:60570V.

Miller S., Nezamabadi M., Daly S. Perceptual signal coding for more efficient usage of bit codes. SMPTE Motion Imaging J. 2013;122(4):52–59.

Motra A., Thoma H. An adaptive LogLuv transform for high dynamic range video compression. In: Proceedings of IEEE International Conference Image Processing (ICIP). 2010:2061–2064.

Mullen K.T. The contrast sensitivity of human colour vision to red-green and blue-yellow chromatic gratings. J. Physiol. 1985;359(1):381–400.

Naccari M., Mrak M., Flynn D., Gabriellini A. On intensity dependent quantisation in the HEVC codec. In: JCTVC-I0257, 9th Meeting, Geneva, CH, April 2012.

Narwaria M., Da Silva M.P., Le Callet P., Pepion R. Effect of tone mapping operators on visual attention deployment. In: Proceedings of SPIE Conference on Applications of Digital Image Processing XXXV. 2012:1–15.

Narwaria M., Da Silva M.P., Le Callet P., Pepion R. Tone mapping-based high-dynamic-range image compression: study of optimization criterion and perceptual quality. Opt. Eng. 2013a;52(10):102008.1–102008.15.

Narwaria M., Da Silva M.P., Le Callet P., Pepion R. Tone mapping based HDR compression: does it affect visual experience? Signal Process. Image Commun. 2013b;29(2):257–273.

Pinheiro A., Fliegel K., Korshunov P., Krasula L., Bernardo M., Pereira M., Ebrahimi T. Performance evaluation of the emerging JPEG XT image compression standard. In: IEEE International Workshop on Multimedia Signal Processing (MMSP). 2014:1–6.

Reinhard E., Devlin K. Dynamic range reduction inspired by photoreceptor physiology. IEEE Trans. Vis. Comput. Graph. 2005;11(1):13–24.

Reinhard E., Heidrich W., Pattanaik S., Debevec P., Ward G., Myszkowski K. High dynamic range imaging: acquisition, display, and image-based lighting. San Francisco, CA: Morgan Kaufmann; 2010.

Reinhard E., Stark M., Shirley P., Ferwerda J. Photographic tone reproduction for digital images. ACM Trans. Graph. 2002;21(3):267–276.

Richter T. Evaluation of floating point image compression. In: International Workshop on Quality of Multimedia Experience (QoMEx). 2009:222–227.

Richter T. Backwards compatible coding of high dynamic range images with JPEG. In: Data Compression Conference (DCC), 2013. 2013a:153–160.

Richter T. On the standardization of the JPEG XT image compression. In: Proceedings of Picture Coding Symposium (PCS). 2013b:37–40.

Richter T. On the integer coding profile of JPEG XT. In: SPIE Optical Engineering+ Applications. 2014:921719–921719-19.

Richter T., Artusi A., Agostinelli M. Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images, HDR Floating-Point Coding, ISO/IEC 18477-7. 2014.

Spaulding K., Woolfe G., Joshi R. Using a residual image to extend the color gamut and dynamic range of an sRGB image. In: Proceeding of IS&T PICS Conference. 2003:307–314.

Srinivasan S., Tu C., Zhou Z., Ray D., Regunathan S., Sullivan G. An introduction to the HD Photo technical design. In: JPEG Document WG1N4183. 2007.

Sullivan G.J., Yu H., Sekiguchi S., Sun H., Wedi T., Wittmann S., Lee Y.L., Segall A., Suzuki T. New standardized extensions of MPEG-4 AVC/H.264 for professional-quality video applications. In: IEEE International Conference on Image Processing (ICIP). 2007;vol. 1:I-13–I-16.

Tumblin J., Turk G. LCIS: a boundary hierarchy for detail-preserving contrast reduction. In: Proceedings of Computer Graphics and Interactive Techniques. New York, NY: ACM Press/Addison-Wesley Publishing Co.; 1999:83–90.

Ward G. Real pixels. In: Graphics Gems II. 1991:80–83.

Ward G., Simmons M. Subband encoding of high dynamic range imagery. In: Proceedings of Symposium on Applied Perception in Graphics and Visualization. New York, NY: ACM; 2004:83–90.

Ward G., Simmons M. JPEG-HDR: a backwards-compatible, high dynamic range extension to JPEG. In: Proceedings of Color Imaging Conference. 2005:283–290.

Ward G.J. The radiance lighting simulation and rendering system. In: Proceedings of Computer Graphics and Interactive Techniques. New York, NY: ACM; 1994:459–472.

Winken M., Marpe D., Schwarz H., Wiegand T. Bit-depth scalable video coding. In: IEEE International Conference on Image Processing (ICIP). 2007;vol. 1:I-5–I-8.

Wu Y., Gao Y., Chen Y. Bit-depth scalability compatible to H.264/AVC-scalable extension. J. Vis. Commun. Image Represent. 2008;19(6):372–381.

Yeganeh H., Wang Z. Objective quality assessment of tone mapped images. IEEE Trans. Image Process. 2013;22(2):657–667.

Yoshida A., Blanz V., Myszkowski K., Seidel H.P. Perceptual evaluation of tone mapping operators with real-world scenes. In: Electronic Imaging. 2005:192–203.

Zhang Y., Agrafiotis D., Naccari M., Mrak M., Bull D. Visual masking phenomena with high dynamic range content. In: IEEE International Conference on Image Processing (ICIP). 2013a:2284–2288.

Zhang Y., Naccari M., Agrafiotis D., Mrak M., Bull D. High dynamic range video compression by intensity dependent spatial quantization in HEVC. In: Proceedings of Picture Coding Symposium (PCS). 2013b:353–356.

Zhang Y., Naccari M., Agrafiotis D., Mrak M., Bull D. High dynamic range video compression exploiting luminance masking. IEEE Trans. Circuits Syst. Video Technol. 2015. DOI: 10.1109/TCSVT.2015.2426552.

Zhang Y., Reinhard E., Agrafiotis D., Bull D. Image and video compression for HDR content. In: Proceedings of SPIE Conference on Applications of Digital Image Processing XXXV. 2012a:84990H1–84990H13.

Zhang Y., Reinhard E., Bull D. Perception-based high dynamic range video compression with optimal bit-depth transformation. In: IEEE International Conference on Image Processing (ICIP). 2011:1321–1324.

Zhang Y., Reinhard E., Bull D. Perceptually lossless high dynamic range image compression with JPEG 2000. In: Proceedings of IEEE International Conference Image Processing (ICIP), Orlando, USA. 2012b:1057–1060.
