Generalizable No-Reference Image Quality Assessment: Multi-Modal Models and Human Preference Analysis for AI Generated Images

Totate, Sanjot Sagar

dc.contributor.advisor	Soundararajan, Rajiv
dc.contributor.author	Totate, Sanjot Sagar
dc.date.accessioned	2025-06-26T11:43:54Z
dc.date.available	2025-06-26T11:43:54Z
dc.date.submitted	2025
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/6967
dc.description.abstract	One of the major challenges in no-reference (NR) image quality assessment (IQA) is generalizing to diverse quality assessment applications. Recently, multi-modal vision-language models have been found to be very promising in this direction, and they are beginning to form part of several state-of-the-art NR IQA methods. At the same time, multi-modal large language models (LLMs) are increasingly being studied for various computer vision applications, including IQA. In this work, we perform a thorough study of the ability of multi-modal LLMs to perform NR IQA by training some of their components and testing their generalizability. In particular, we keep the LLM frozen and learn parameters corresponding to the querying transformer, the LLM prompt, and some layers that process the embedding output by the LLM. We observe that some of these components offer generalization performance far superior to that of any existing NR IQA algorithm. With the rapid emergence of artificial intelligence (AI)-generated images, there is also a need to understand human preferences for these images. We explore the fundamental dimensions of AI-generated image quality assessment, particularly the relationship between alignment (how well images match their text prompts) and quality (both low-level artifacts and high-level structural coherence). We analyze how these dimensions interact and contribute to the overall perceived quality, examining whether separate assessment of alignment and quality yields better results than holistic evaluation approaches. Through comparative analysis of existing and novel assessment models, we provide insights into effective strategies for evaluating AI-generated images.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;ET00977
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Image representations	en_US
dc.subject	Appearance and texture representations	en_US
dc.subject	texture representations	en_US
dc.subject	Computer vision tasks	en_US
dc.subject	Supervised learning	en_US
dc.subject	image quality assessment	en_US
dc.subject	artificial intelligence generated images	en_US
dc.subject	LLM	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology::Image analysis	en_US
dc.title	Generalizable No-Reference Image Quality Assessment: Multi-Modal Models and Human Preference Analysis for AI Generated Images	en_US
dc.type	Thesis	en_US
dc.degree.name	MTech (Res)	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

Files in this item

Name: sanjot_thesis (1).pdf
Size: 3.006Mb
Format: PDF
Description: Thesis full text

This item appears in the following Collection(s)

Electrical Communication Engineering (ECE) [470]