Compressed domain analysis of video sequences

Babu, R Venkatesh

Abstract

Digital video compression has become an essential part of day-to-day life due to a wide variety of applications, including video delivery over the Internet, television broadcasting, video streaming, video conferencing, as well as video storage and editing. Many modern compression algorithms such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263, and H.264/MPEG-4 have been developed to handle specific applications. Since raw video data rates can be reduced by a factor of 15–80 without considerable loss in video quality using these compression techniques, all captured video data is stored in compressed form. Video analysis in the pixel domain requires full decoding of the compressed video, which is computationally very expensive. Hence, it is more efficient to perform analysis in the compressed domain itself without fully decoding the bitstream. This calls for techniques that can utilise information available in the compressed domain, such as motion vectors and DCT coefficients.

In this thesis, we address various video-analysis tasks that can be performed in the compressed-domain framework. The encoded motion information available in MPEG-compressed video is extensively used for the following problems:

- Recognition of Human Actions
- Video Object Segmentation
- Video Retrieval
- Sprite/Mosaic Generation

A brief description of each problem is given below.

Recognition of Human Actions

While many pixel-domain approaches for action recognition exist, they are computationally expensive and unsuitable for real-time applications. We aim to develop efficient compressed-domain recognition systems using pre-encoded motion information. Two systems are proposed for classifying seven human actions: walk, run, jump, bend up, bend down, twist right, and twist left.

The first system uses a Hidden Markov Model (HMM). Time-series features are extracted from encoded motion vectors available in the P- and B-frames of MPEG video. Noisy motion vectors are removed using morphological filtering.
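The abstract does not state which morphological operation or structuring element is used to remove noisy motion vectors. As a rough illustration only, the sketch below assumes a binary "has motion" mask over the macroblock grid and a 3x3 morphological opening (erosion followed by dilation), which discards isolated non-zero vectors while keeping compact moving regions. The function names and the choice of a 3x3 square element are hypothetical, not taken from the thesis.

```python
# Illustrative sketch: suppress isolated motion vectors with a 3x3 binary opening.
# Assumptions (not from the thesis): vectors are (dx, dy) tuples on a macroblock
# grid, and a vector counts as "moving" when it is not (0, 0).

NEIGHBOURHOOD = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def erode(mask):
    """3x3 binary erosion; cells outside the grid are treated as 0."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                     for dr, dc in NEIGHBOURHOOD))
             for c in range(w)] for r in range(h)]


def dilate(mask):
    """3x3 binary dilation; cells outside the grid are treated as 0."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                     for dr, dc in NEIGHBOURHOOD))
             for c in range(w)] for r in range(h)]


def filter_motion_vectors(mvs):
    """Zero out every vector whose macroblock does not survive the opening."""
    mask = [[int(v != (0, 0)) for v in row] for row in mvs]
    opened = dilate(erode(mask))
    return [[v if keep else (0, 0) for v, keep in zip(row, keep_row)]
            for row, keep_row in zip(mvs, opened)]
```

A 3x3 element removes any moving region smaller than 3x3 macroblocks, so a smaller element may be preferable for low-resolution video; that trade-off is a property of this sketch, not a claim about the thesis.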
Three types of features are proposed:

i) Projected 1-D (from horizontal/vertical motion-vector components)
ii) 2-D Polar (motion-vector polar tiling)
iii) 2-D Cartesian (motion-vector Cartesian tiling)

The performances of all three feature types are compared.

The second system introduces coarse Motion Flow History (MFH), representing the extent of motion per macroblock. By adapting Motion History Images (MHI) to the compressed domain, coarse MHI and MFH characterise temporal and motion behaviour compactly. Features extracted from MHI (projection profile, centroids) and MFH (affine model, projected 1-D, 2-D polar) are used to train KNN, neural-network, SVM, and Bayes classifiers.

Object Segmentation

Video-object extraction is a challenging and important problem, especially for MPEG-4, which supports object-level interactivity. We propose a compressed-domain algorithm for extracting independently moving video objects using encoded motion vectors from inter-coded frames. A motion-accumulation process is introduced, projecting neighbouring-frame motion vectors to their correct spatial positions, thereby enriching motion data. Dense motion flow is generated via spatial interpolation. Coarse object segmentation is achieved using the Expectation–Maximization (EM) algorithm. A block-based affine-clustering method determines the number of required motion models. Objects are temporally tracked using estimated motion parameters, eliminating the need for repeated EM processing. Finally, a boundary-refinement strategy yields fine-level segmentation.

Video Retrieval

Motivated by MPEG-7’s emphasis on multimedia content description, we propose a compressed-domain system for video retrieval that extracts object-based as well as global features from accumulated motion information.

Object-based features include:

- Speed
- Approximate area
- Trajectory

Global features include:

- Motion activity
- Camera motion

Coarse segmentation is performed at macroblock level using EM.
Representative motion vectors (median of accumulated motion vectors) enhance robustness. Each object is tracked to derive its trajectory. Users query the system by specifying global and object-level attributes, including sketch-based trajectory input.

Sprite Generation

Sprite (mosaic) generation is used in MPEG-4 for videos with restricted backgrounds. Existing methods using motion vectors do not account for foreground objects. We propose a background-sprite generation method from MPEG video using background motion vectors only. The system includes:

(i) Motion-vector processing (representative motion per macroblock)
(ii) Coarse object segmentation using K-means
(iii) Camera-motion estimation using weighted background vectors
(iv) Frame integration according to camera-motion parameters to generate sprites/mosaics.
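Steps (i) and (iii) above can be illustrated with a small sketch. The abstract does not specify how the median is taken, how the weights are chosen, or which camera model is fitted, so the code below makes three loud assumptions: the representative vector is a component-wise median, the weights are supplied by the caller, and camera motion is approximated by a pure translation (the weighted mean of background vectors). All function names are hypothetical.

```python
# Illustrative sketch: representative motion per macroblock and a weighted,
# translation-only estimate of the apparent background displacement.
# The component-wise median and the translational model are assumptions.
from statistics import median


def representative_vector(vectors):
    """Component-wise median of the vectors accumulated for one macroblock."""
    xs = [v[0] for v in vectors]
    ys = [v[1] for v in vectors]
    return (median(xs), median(ys))


def estimate_translation(vectors, is_background, weights):
    """Weighted mean of background vectors; foreground vectors are ignored."""
    total = sum(w for w, bg in zip(weights, is_background) if bg)
    if total == 0:
        return (0.0, 0.0)  # no usable background macroblocks
    dx = sum(w * v[0] for v, w, bg in zip(vectors, weights, is_background) if bg)
    dy = sum(w * v[1] for v, w, bg in zip(vectors, weights, is_background) if bg)
    return (dx / total, dy / total)
```

The median keeps a single outlying vector from dominating a macroblock, and masking out foreground macroblocks keeps moving objects from biasing the background estimate. A real system would likely fit a richer model (for example affine) before integrating frames into the sprite.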

URI

https://etd.iisc.ac.in/handle/2005/9329

Collections

Electrical Engineering (EE) [451]