Show simple item record

dc.contributor.advisor    Soundararajan, Rajiv
dc.contributor.author    Mitra, Shankhanil
dc.date.accessioned    2024-12-16T04:33:45Z
dc.date.available    2024-12-16T04:33:45Z
dc.date.submitted    2024
dc.identifier.uri    https://etd.iisc.ac.in/handle/2005/6729
dc.description.abstract    No-reference (NR) video quality assessment (VQA) refers to the study of the quality of degraded videos without the need for pristine reference videos. The problem has wide applications, ranging from the quality assessment of camera-captured videos to other user-generated content such as gaming, animation, and screen sharing. While several successful NR-VQA methods have been designed in the last decade, they all require a large amount of human annotation to learn effective models. This poses significant challenges, since large-scale human subjective studies must be conducted repeatedly as distortions and camera pipelines evolve. In this thesis, we focus on addressing these problems through label-efficient and generalizable NR-VQA methods.

We first propose a semi-supervised learning (SSL) framework that exploits a large number of unlabelled videos together with a small number of labelled videos. Our main contributions here are twofold. Leveraging the benefits of consistency regularization and pseudo-labelling, our SSL model generates reliable pairwise pseudo-ranks for unlabelled video pairs using a student-teacher model on strong-weak augmented videos. We design the strong and weak augmentations to be quality invariant so that the unlabelled videos can be used effectively in SSL. The generated pseudo-ranks are used along with the limited labels to train our SSL model. While our SSL framework improves performance with several existing VQA features, we also present a spatial and temporal feature extraction method based on capturing spatial and temporal entropic differences. We show that these features achieve even better performance with our SSL framework.

In the second part, we further improve the features for superior generalization across varied VQA tasks. In this context, we learn self-supervised quality-aware features without relying on any reference videos, human opinion scores, or training videos from the target database. In particular, we present a self-supervised multiview contrastive learning framework to learn spatio-temporal quality representations. We capture the common information between frame differences and frames by treating them as a pair of views, and similarly obtain the shared representations between frame differences and optical flow. Further, we evaluate the self-supervised features in an opinion-unaware setup to test their relevance to VQA. In this regard, we compare the representations of the degraded video given by our module with a corpus of pristine natural video patches to predict the quality of the distorted video. Detailed experiments on multiple camera-captured VQA datasets reveal the superior performance of our method over other features when evaluated without training on human scores.

To further improve the self-supervised VQA features, we seek to learn spatio-temporal features from video clips instead of merely operating on video frames and frame differences. In particular, we leverage the benefits of the attention mechanism in 3D transformers to model spatio-temporal dependencies. Thus, we first design a self-supervised Spatio-Temporal Visual Quality Representation Learning (ST-VQRL) framework to generate robust quality-aware features using a novel statistical contrastive loss for videos. Then, we propose a dual-model-based SSL method specifically designed for the video quality assessment (SSL-VQA) task through a novel knowledge transfer of quality predictions between the two models. Despite being learned with limited human-annotated videos, our SSL-VQA method uses the ST-VQRL backbone to produce robust performance across various VQA datasets, including cross-database settings.

Finally, we address the problem of generalization in VQA. Recent works have shown the remarkable generalizability of text-to-image latent diffusion models (LDMs) for various discriminative computer vision tasks. In this work, we leverage the denoising process of an LDM for generalizable NR-VQA by understanding the degree of alignment between perceptually relevant visual concepts and quality-aware text prompts. Since applying text-to-image LDMs to every video frame is computationally expensive, we estimate the quality of only a frame-rate sub-sampled version of the original video. To compensate for the loss in motion information due to frame-rate sub-sampling, we propose a novel temporal quality modulator (TQM). Our TQM adjusts the quality prediction by computing cross-attention between the diffusion model's representation and the motion features of the original and sub-sampled videos. Our extensive cross-database experiments across various user-generated, frame-rate variation, Ultra-HD, and streaming content-based databases show that our model achieves superior generalization in VQA.    en_US
dc.language.iso    en_US    en_US
dc.relation.ispartofseries    ;ET00739
dc.rights    I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.    en_US
dc.subject    NR-VQA    en_US
dc.subject    No-reference video quality assessment    en_US
dc.subject    video quality assessment    en_US
dc.subject    spatio-temporal features    en_US
dc.subject    Spatio-Temporal Visual Quality Representation Learning    en_US
dc.subject    semi-supervised learning    en_US
dc.subject.classification    Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electronics    en_US
dc.title    Label Efficient and Generalizable No-reference Video Quality Assessment    en_US
dc.type    Thesis    en_US
dc.degree.name    PhD    en_US
dc.degree.level    Doctoral    en_US
dc.degree.grantor    Indian Institute of Science    en_US
dc.degree.discipline    Engineering    en_US
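
Note on the pairwise pseudo-ranking idea described in the abstract: the SSL framework trains on pseudo-ranks that a teacher model assigns to unlabelled video pairs. The sketch below is a rough, hypothetical illustration only, assuming +1/-1 pseudo-ranks and a margin ranking objective; the thesis's actual loss, models, and augmentations are not specified in this record, and the function name pseudo_rank_loss and the example tensors are invented for illustration.

    # Minimal sketch, assuming +1/-1 pseudo-ranks from a teacher model;
    # this is an illustrative stand-in, not the loss used in the thesis.
    import torch
    import torch.nn as nn

    def pseudo_rank_loss(q_a, q_b, pseudo_rank, margin=0.1):
        # q_a, q_b: predicted quality scores for the two videos in each pair.
        # pseudo_rank: +1.0 if the teacher ranks video A above video B, else -1.0.
        # MarginRankingLoss pushes (q_a - q_b) to agree in sign with pseudo_rank.
        return nn.MarginRankingLoss(margin=margin)(q_a, q_b, pseudo_rank)

    # Hypothetical usage on a batch of four pseudo-ranked pairs.
    q_a = torch.tensor([0.7, 0.4, 0.9, 0.2])
    q_b = torch.tensor([0.5, 0.6, 0.3, 0.8])
    ranks = torch.tensor([1.0, -1.0, 1.0, -1.0])
    print(pseudo_rank_loss(q_a, q_b, ranks).item())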

