Show simple item record

dc.contributor.advisorJain, Chirag
dc.contributor.authorRajput, Jyotshna
dc.date.accessioned2024-07-17T05:13:54Z
dc.date.available2024-07-17T05:13:54Z
dc.date.submitted2024
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/6562
dc.description.abstractThe growing availability of genome sequences of several species, including humans, has created the opportunity to utilize multiple reference genomes for bioinformatics analyses and improve the accuracy of genome resequencing workflows. Graph-based data structures are suitable for compactly representing multiple closely related reference genomes. Pangenome graphs use a directed graph format, where vertices are labeled with strings, and the individual reference genomes are represented as paths in the graph. Aligning sequences (reads) to pangenome graphs is fundamental for pangenome-based genome resequencing. The sequence-to-graph alignment problem seeks a walk in the graph that spells a sequence with minimum edit distance from the input sequence. Unfortunately, the exact algorithms known for solving this problem are not scalable. Among the known heuristics, co-linear chaining is a common technique for quickly aligning reads to a graph. However, the known chaining algorithms are restricted to directed acyclic graphs (DAGs) and are not trivially generalizable to cyclic graphs. Addressing this limitation is important because pangenome graphs often contain cycles due to inversions, duplications, or copy number mutations within the reference genomes. This thesis presents the first practical formulation and algorithm for co-linear chaining on cyclic pangenome graphs. Our work builds upon the known chaining algorithms for DAGs. We propose a novel iterative algorithm to handle cycles and provide a rigorous proof of correctness and runtime complexity. We also use the domain-specific small-width property of pangenome graphs. The proposed optimizations enable our algorithm to scale to large human pangenome graphs. We implemented the algorithm in C++ and referred to it as PanAligner (https://github.com/at-cg/PanAligner). PanAligner is an end-to-end long-read aligner for pangenome graphs. We evaluated its speed and accuracy by aligning simulated long reads to a cyclic human pangenome graph comprising 95 haplotypes. We achieved superior read mapping accuracy compared to existing methods.en_US
dc.description.sponsorshipSERB Start-up Research Grant (SRG), National Supercomputing Mission (NSM)en_US
dc.language.isoen_USen_US
dc.relation.ispartofseries;ET00572
dc.rightsI grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertationen_US
dc.subjectSequence alignmenten_US
dc.subjectvariation graphen_US
dc.subjectgenome sequencingen_US
dc.subjectco-linear chainingen_US
dc.subjectpath coveren_US
dc.subject.classificationResearch Subject Categories::INTERDISCIPLINARY RESEARCH AREASen_US
dc.subject.classificationResearch Subject Categories::TECHNOLOGY::Information technology::Computer scienceen_US
dc.titleSequence Alignment to Cyclic Pangenome Graphsen_US
dc.typeThesisen_US
dc.degree.nameMTech (Res)en_US
dc.degree.levelMastersen_US
dc.degree.grantorIndian Institute of Scienceen_US
dc.degree.disciplineEngineeringen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record