Machine Learning and Rank Aggregation Methods for Gene Prioritization from Heterogeneous Data Sources

Laha, Anirban

dc.contributor.advisor	Agarwal, Shivani
dc.contributor.author	Laha, Anirban
dc.date.accessioned	2017-12-05T16:42:18Z
dc.date.accessioned	2018-07-31T04:38:44Z
dc.date.available	2017-12-05T16:42:18Z
dc.date.available	2018-07-31T04:38:44Z
dc.date.issued	2017-12-05
dc.date.submitted	2013
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/2866
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/3725/G26678-Abs.pdf	en_US
dc.description.abstract	Gene prioritization involves ranking genes by possible relevance to a disease of interest. This is important in order to narrow down the set of genes to be investigated biologically, and over the years, several computational approaches have been proposed for automatically prioritizing genes using some form of gene-related data, mostly using statistical or machine learning methods. Recently, Agarwal and Sengupta (2009) proposed the use of learning-to-rank methods, which have been used extensively in information retrieval and related fields, to learn a ranking of genes from a given data source, and used this approach to successfully identify novel genes related to leukemia and colon cancer using only gene expression data. In this work, we explore the possibility of combining such learning-to-rank methods with rank aggregation techniques to learn a ranking of genes from multiple heterogeneous data sources, such as gene expression data, gene ontology data, protein-protein interaction data, etc. Rank aggregation methods have their origins in voting theory, and have been used successfully in meta-search applications to aggregate webpage rankings from different search engines. Here we use graph-based learning-to-rank methods to learn a ranking of genes from each individual data source represented as a graph, and then apply rank aggregation methods to aggregate these rankings into a single ranking over the genes. The thesis describes our approach, reports experiments with various data sets, and presents our findings and initial conclusions.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G26678	en_US
dc.subject	Gene Prioritization	en_US
dc.subject	Gene Ranking	en_US
dc.subject	Bipartite Ranking	en_US
dc.subject	Learning To Rank	en_US
dc.subject	Rank Aggregation Methods	en_US
dc.subject	Bipartite Instance Ranking	en_US
dc.subject	Rank Aggregation	en_US
dc.subject	Ranking of Genes	en_US
dc.subject	Gene Data Sources	en_US
dc.subject	Genes Bipartite Ranking	en_US
dc.subject	Bipartite Graph Ranking	en_US
dc.subject.classification	Bioinformatics	en_US
dc.title	Machine Learning and Rank Aggregation Methods for Gene Prioritization from Heterogeneous Data Sources	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.discipline	Faculty of Engineering	en_US

Files in this item

Name: G26678.pdf
Size: 746.3Kb
Format: PDF


This item appears in the following Collection(s)

Computer Science and Automation (CSA) [545]
