Show simple item record

dc.contributor.advisor: Soundararajan, Rajiv
dc.contributor.author: Totate, Sanjot Sagar
dc.date.accessioned: 2025-06-26T11:43:54Z
dc.date.available: 2025-06-26T11:43:54Z
dc.date.submitted: 2025
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/6967
dc.description.abstract: One of the major challenges in no-reference (NR) image quality assessment (IQA) is generalizing to diverse quality assessment applications. Recently, multi-modal vision-language models have been found to be very promising in this direction, and they are beginning to form a part of several state-of-the-art NR IQA methods. At the same time, multi-modal large language models (LLMs) are increasingly being studied for various computer vision applications, including IQA. In this work, we perform a thorough study of the ability of multi-modal LLMs to perform NR IQA by training some of their components and testing their generalizability. In particular, we keep the LLM frozen and learn parameters corresponding to the querying transformer, the LLM prompt, and some layers that process the embedding output by the LLM. We observe that some of these components offer a generalization performance far superior to any existing NR IQA algorithm. With the rapid emergence of artificial intelligence (AI)-generated images, there is also a need to understand human preferences for these images. We explore the fundamental dimensions of AI-generated image quality assessment, particularly the relationship between alignment (how well images match their text prompts) and quality (both low-level artifacts and high-level structural coherence). We analyze how these dimensions interact and contribute to the overall perceived quality, examining whether separate assessment of alignment and quality yields better results than holistic evaluation approaches. Through comparative analysis of existing and novel assessment models, we provide insights into effective strategies for evaluating AI-generated images. [en_US]
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: ;ET00977
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation [en_US]
dc.subject: Image representations [en_US]
dc.subject: Appearance and texture representations [en_US]
dc.subject: texture representations [en_US]
dc.subject: Computer vision tasks [en_US]
dc.subject: Supervised learning [en_US]
dc.subject: image quality assessment [en_US]
dc.subject: artificial intelligence generated images [en_US]
dc.subject: LLM [en_US]
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Image analysis [en_US]
dc.title: Generalizable No-Reference Image Quality Assessment: Multi-Modal Models and Human Preference Analysis for AI Generated Images [en_US]
dc.type: Thesis [en_US]
dc.degree.name: MTech (Res) [en_US]
dc.degree.level: Masters [en_US]
dc.degree.grantor: Indian Institute of Science [en_US]
dc.degree.discipline: Engineering [en_US]

