Perceptual Criterion Based Rate Control And Fast Mode Search For Spatial Intra Prediction In Video Coding
Abstract
This thesis addresses two important problems in video coding: rate control and spatial-domain intra prediction. While the former applies to most video compression standards, the latter applies to recent advanced video compression standards such as H.264, VC-1 and AVS.
Rate control regulates the instantaneous video bit-rate to maximize a picture quality metric while satisfying channel rate and buffer size constraints. It therefore has an important bearing on the picture quality of the encoded video. Typically, a quality metric such as peak signal-to-noise ratio (PSNR) or weighted signal-to-noise ratio (WSNR) is chosen out of convenience. However, neither metric is a true measure of perceived video quality.
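For reference, the PSNR metric mentioned above can be computed as follows. This is a minimal sketch assuming 8-bit samples (peak value 255) and frames given as lists of rows; the function name psnr is illustrative and is not taken from the thesis.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Return the PSNR in dB between two equally sized 2-D sample arrays."""
    if len(reference) != len(distorted) or any(
        len(r) != len(d) for r, d in zip(reference, distorted)
    ):
        raise ValueError("frames must have identical dimensions")
    count = sum(len(row) for row in reference)
    squared_error = sum(
        (a - b) ** 2
        for r, d in zip(reference, distorted)
        for a, b in zip(r, d)
    )
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)
```

Because PSNR depends only on the mean squared error, it weights every pixel error equally regardless of where or when it occurs, which is why it can diverge from perceived quality.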
A few researchers have attempted to derive rate control algorithms that combine standard PSNR with ad-hoc perceptual metrics of video quality. The concept of using a perceptual criterion for video coding was introduced in [7] within the context of perceptual adaptive quantization. In that work, quantization noise levels were adjusted so that more noise was allowed where it was less visible (busy and textured areas), while sensitive areas (typically flat and low-detail regions) were finely quantized. Macroblocks were classified into low-detail, texture and edge areas by a classifier that examined the variance of sub-blocks within a macroblock (MB). The rate models were trained on sets of pre-classified video. One drawback of this scheme, shared with standard PSNR, is that it does not account for the perceptual effect of motion. The work in [8] addressed this by assigning higher weights to the regions of the image that were experiencing the highest motion. In addition, the center of the image and objects in the foreground are perceived as more important than the sides.
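The variance-based macroblock classification described above can be illustrated with a small sketch. The decision rule, the 8x8 sub-block size, and the thresholds flat_threshold and edge_ratio are assumptions made for illustration only; they are not the classifier used in [7].

```python
def variance(samples):
    """Population variance of a flat list of samples."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((s - mean) ** 2 for s in samples) / n


def sub_block_variances(mb, sub=8):
    """Variance of each sub x sub block inside a square macroblock."""
    size = len(mb)
    result = []
    for top in range(0, size, sub):
        for left in range(0, size, sub):
            samples = [
                mb[y][x]
                for y in range(top, top + sub)
                for x in range(left, left + sub)
            ]
            result.append(variance(samples))
    return result


def classify_macroblock(mb, flat_threshold=25.0, edge_ratio=8.0):
    """Label a 16x16 macroblock as low_detail, edge or texture.

    Hypothetical rule: all sub-blocks quiet -> low_detail; activity
    concentrated in only some sub-blocks -> edge; otherwise texture.
    """
    variances = sub_block_variances(mb)
    if max(variances) < flat_threshold:
        return "low_detail"
    if max(variances) > edge_ratio * (min(variances) + 1.0):
        return "edge"
    return "texture"
```

In a perceptual adaptive quantizer, a label of this kind would then steer the quantizer step: coarser for texture, finer for low-detail and edge areas.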
However, attempts to use perceptual metrics for video quality have been limited by the accuracy of the video quality metrics chosen. In recent years, new and improved metrics of subjective quality have been developed and their statistical accuracy has been studied formally. Particularly interesting is the work undertaken by the ITU and the Video Quality Experts Group (VQEG). VQEG conducted two phases of testing. In the first phase, several algorithms were tested but were not found to be very accurate; in fact, none was found to be any more accurate than a PSNR-based metric. In the second phase of testing a few years later, several new algorithms were evaluated, and it was concluded that four of these achieved results good enough to warrant their standardization as part of ITU-T Recommendation J.144. These experiments are referred to as the FR-TV (Full Reference Television) Phase II evaluations. ITU-T J.144 does not explicitly identify a single algorithm but provides guidelines on the selection of appropriate techniques to objectively measure subjective video quality. It describes four reference algorithms as well as PSNR. Amongst the four, the NTIA General Video Quality Model (VQM) [11] is the best performing and has been adopted by the American National Standards Institute (ANSI) as North American standard T1.801.03. NTIA's approach has been to focus on defining parameters that model how humans perceive video quality. These parameters are combined using linear models to produce estimates of video quality that closely approximate subjective test results. The NTIA General VQM has been shown to correlate strongly with subjective quality.
In the first part of the thesis, we apply metrics motivated by the NTIA VQM model within a rate control algorithm to maximize perceptual video quality. We derive perceptual weights from key NTIA parameters and use them to influence the quantization parameter (QP), which determines the degree of quantization. Our experiments demonstrate that a perceptually motivated TMN8 rate control in an H.263 encoder yields perceivable quality improvements over a baseline TMN8 rate control algorithm that uses a PSNR metric. On a set of 11 sequences, the proposed algorithm reduces the bit-rate by 6% on average at the same perceptual quality as standard TMN8.
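As a rough illustration of how a perceptual weight might influence QP, consider the sketch below. The mapping is hypothetical and is not the one derived in the thesis: it assumes a positive weight where values above 1 mark perceptually sensitive macroblocks, divides the rate-control QP by that weight, and clamps the result to the H.263 quantizer range of 1 to 31. The function name perceptual_qp is illustrative.

```python
H263_QP_MIN = 1
H263_QP_MAX = 31


def perceptual_qp(base_qp, weight):
    """Scale a rate-control QP by a perceptual weight (hypothetical mapping).

    weight > 1: sensitive region, quantize more finely (lower QP).
    weight < 1: masked region, quantize more coarsely (higher QP).
    """
    if weight <= 0:
        raise ValueError("weight must be positive")
    scaled = int(base_qp / weight + 0.5)  # round half up
    return max(H263_QP_MIN, min(H263_QP_MAX, scaled))
```

A real rate controller would also have to rebalance the bit budget across macroblocks so that the target rate is still met after the per-macroblock adjustment.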
The second part of the thesis deals with spatial-domain intra prediction as used in advanced video coding standards such as H.264. The H.264 Advanced Video Coding standard [36] has been shown to achieve video quality similar to older standards such as MPEG-2 and H.263 at nearly half the bit-rate. This compression improvement is generally attributed to several new tools introduced in H.264, including spatial intra prediction, adaptive block size for motion compensation, an in-loop de-blocking filter, context-adaptive binary arithmetic coding (CABAC), and multiple reference frames.
While the new tools allow better coding efficiency, they also introduce additional computational complexity at both the encoder and the decoder. We are especially concerned here with the impact of intra prediction on the computational complexity of the encoder. H.264 reference implementations such as JM [29] search through all allowed intra-prediction "modes" to find the optimal mode. While this approach yields the optimal prediction mode, it comes at an extremely heavy computational cost. Hence there is considerable interest in well-motivated algorithms that reduce the computational complexity of the search for the best prediction mode while retaining the quality advantages of full-search Intra4x4.
We propose a novel algorithm that reduces the complexity of full search by exploiting knowledge of the source statistics. Specifically, we analyze the transform-domain energy distribution of the original 4x4 block in different directions and use the results of this analysis to eliminate unlikely modes and reduce the search space for the optimal intra mode. Experimental results show that the proposed algorithm achieves quality (PSNR) similar to full search at nearly a third of the complexity.
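The general idea of pruning intra modes from transform-domain energy can be sketched as follows. This is an illustrative simplification, not the algorithm proposed in the thesis: it applies the H.264 4x4 forward core transform without its scaling stage, compares the energy of purely horizontal-frequency, purely vertical-frequency and mixed coefficients, and keeps a reduced candidate list when one direction dominates. The thresholds flat_threshold and dominance and the candidate groupings are assumptions; only the mode numbering follows the H.264 Intra4x4 convention.

```python
# H.264 4x4 forward core transform matrix (scaling stage omitted).
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

# H.264 Intra4x4 prediction mode numbers.
VERTICAL, HORIZONTAL, DC = 0, 1, 2
VERTICAL_RIGHT, HORIZONTAL_DOWN, VERTICAL_LEFT, HORIZONTAL_UP = 5, 6, 7, 8
ALL_MODES = list(range(9))


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def core_transform(block):
    """Return CF * block * CF^T for a 4x4 block."""
    cf_t = [list(col) for col in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)


def candidate_modes(block, flat_threshold=64, dominance=4.0):
    """Return the Intra4x4 modes worth searching (hypothetical pruning rule)."""
    y = core_transform(block)
    # Row 0: the block varies along x only, i.e. vertical structure.
    e_h = sum(y[0][v] ** 2 for v in range(1, 4))
    # Column 0: the block varies along y only, i.e. horizontal structure.
    e_v = sum(y[u][0] ** 2 for u in range(1, 4))
    e_mixed = sum(y[u][v] ** 2 for u in range(1, 4) for v in range(1, 4))
    if e_h + e_v + e_mixed < flat_threshold:
        return [DC]
    if e_h > dominance * (e_v + e_mixed):
        return [VERTICAL, VERTICAL_RIGHT, VERTICAL_LEFT, DC]
    if e_v > dominance * (e_h + e_mixed):
        return [HORIZONTAL, HORIZONTAL_DOWN, HORIZONTAL_UP, DC]
    return ALL_MODES  # no clear direction: fall back to full search
```

For example, a block whose rows are all identical has energy only in row 0 of the transform, so the search would be limited to the vertical family of modes plus DC, four candidates instead of nine.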
This thesis has four chapters and is organized as follows. In the first chapter, we introduce the basics of video encoding, present existing work in the area of perceptual rate control, and briefly introduce the TMN8 rate control algorithm. At the end of the chapter, we introduce spatial-domain intra prediction. In the second chapter, we explain the challenges in combining NTIA perceptual parameters with the TMN8 rate control algorithm. We examine the perceptual features used by NTIA from a video compression perspective and explain how the perceptual metrics capture typical compression artifacts. We then present a two-pass perceptual rate control (PRC-II) algorithm. Finally, we report experimental results on a set of video sequences showing an average bit-rate reduction of 6% when using PRC-II rate control instead of standard TMN8 rate control. Chapter 3 contains the second part of the thesis, on spatial-domain intra prediction. We start by reviewing existing work in intra prediction and then present the details of our proposed intra prediction algorithm and experimental results. We conclude the thesis in Chapter 4 and discuss directions for future work on both proposed algorithms.
Related items
Showing items related by title, author, creator and subject.
- Bayesian Nonparametric Modeling of Temporal Coherence for Entity-Driven Video Analytics
Mitra, Adway (2018-05-14) In recent times there has been an explosion of online user-generated video content. This has generated significant research interest in video analytics. Human users understand videos based on high-level semantic ...
- Bitrate Reduction Techniques for Low-Complexity Surveillance Video Coding
Gorur, Pushkar (2017-09-26) High resolution surveillance video cameras are invaluable resources for effective crime prevention and forensic investigations. However, increasing communication bandwidth requirements of high definition surveillance videos ...
- Techniques For Low Power Motion Estimation In Video Encoders
Gupte, Ajit D (2013-02-12) This thesis looks at hardware algorithms that help reduce dynamic power dissipation in video encoder applications. Computational complexity of motion estimation and the data traffic between external memory and the video ...