
dc.contributor.advisor: Bhattacharyya, Chiranjib
dc.contributor.author: Patel, Vishal
dc.date.accessioned: 2011-08-09T06:33:26Z
dc.date.accessioned: 2018-07-31T04:40:23Z
dc.date.available: 2011-08-09T06:33:26Z
dc.date.available: 2018-07-31T04:40:23Z
dc.date.issued: 2011-08-09
dc.date.submitted: 2009
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/1346
dc.identifier.abstract: http://etd.iisc.ac.in/static/etd/abstracts/1740/G23536-Abs.pdf [en_US]
dc.description.abstract: For the task of near-duplicate document detection, the bag-of-words comparison approaches used in the information retrieval community are not sufficiently accurate. This work presents a novel approach for the setting in which instance-level constraints are given for a set of documents and those documents must be retrieved for a new query document. The framework incorporates the instance-level constraints and clusters documents into groups using a novel clustering approach, Grouped Latent Dirichlet Allocation (gLDA). A distance metric is then learned for each cluster using the large margin nearest neighbor algorithm, and finally documents are ranked for a new, unseen query document using the learned distance metrics. Experimental results on various datasets demonstrate that the clustering method (gLDA with side constraints) performs better than other clustering methods and that the overall approach outperforms other near-duplicate detection algorithms. [en_US]
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: G23536 [en_US]
dc.subject: Document Clustering - Artificial Intelligence [en_US]
dc.subject: Latent Dirichlet Allocation [en_US]
dc.subject: Information Retrieval [en_US]
dc.subject: Near-Duplicate Detection [en_US]
dc.subject: Constrained Clustering [en_US]
dc.subject: Group LDA [en_US]
dc.subject: Duplicate Bug Report Detection [en_US]
dc.subject: Near-Duplicate Document Detection [en_US]
dc.subject.classification: Computer Science [en_US]
dc.title: Near-Duplicate Detection Using Instance Level Constraints [en_US]
dc.type: Thesis [en_US]
dc.degree.name: MSc Engg [en_US]
dc.degree.level: Masters [en_US]
dc.degree.discipline: Faculty of Engineering [en_US]
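
The retrieval stage described in the abstract (assign the query to a cluster, then rank that cluster's documents under a cluster-specific distance metric) can be sketched roughly as follows. This is an illustration only, not the thesis's implementation: the cluster labels are assumed to be given (standing in for gLDA), and a simple per-dimension inverse-variance weighting stands in for a metric learned with large margin nearest neighbor. All function names and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def fit_diagonal_metric(vectors, eps=1e-6):
    """Stand-in for a learned metric: one inverse-variance weight per dimension."""
    dims = range(len(vectors[0]))
    return [1.0 / (pvariance([v[d] for v in vectors]) + eps) for d in dims]


def weighted_sq_distance(a, b, weights):
    """Squared distance between a and b under a diagonal metric."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_near_duplicates(query, docs, cluster_of):
    """Rank the documents of the query's nearest cluster, closest first.

    docs: {doc_id: feature vector}
    cluster_of: {doc_id: cluster label}, assumed to come from an
    upstream clustering step (gLDA in the thesis; given here).
    """
    clusters = defaultdict(list)
    for doc_id, label in cluster_of.items():
        clusters[label].append(doc_id)

    # Choose the cluster whose centroid is closest to the query
    # (plain squared Euclidean distance; an assumption of this sketch).
    def centroid_dist(label):
        c = centroid([docs[d] for d in clusters[label]])
        return sum((x - y) ** 2 for x, y in zip(query, c))

    best = min(clusters, key=centroid_dist)
    members = clusters[best]

    # Fit one metric per cluster and rank only that cluster's members.
    weights = fit_diagonal_metric([docs[d] for d in members])
    return sorted(
        members,
        key=lambda d: weighted_sq_distance(query, docs[d], weights),
    )
```

Restricting the ranking to a single cluster keeps each metric local to documents that are already topically similar, which is the motivation the abstract gives for learning one metric per cluster.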

