Communication Overlapping Krylov Subspace Methods for Distributed Memory Systems

Tiwari, Manasi

dc.contributor.advisor	Vadhiyar, Sathish
dc.contributor.author	Tiwari, Manasi
dc.date.accessioned	2023-01-25T07:37:58Z
dc.date.available	2023-01-25T07:37:58Z
dc.date.submitted	2022
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/5990
dc.description.abstract	Many high performance computing applications in computational fluid dynamics, electromagnetics, etc. need to solve a linear system of equations $Ax=b$. For linear systems where $A$ is large and sparse, Krylov Subspace Methods (KSMs) are generally used. In this thesis, we propose communication overlapping KSMs. We start with the Conjugate Gradient (CG) method, which is used when $A$ is sparse, symmetric and positive definite. Recent variants of CG include the Pipelined CG (PIPECG) method, which overlaps the allreduce in CG with independent computations, i.e., one Preconditioner (PC) and one Sparse Matrix Vector Product (SPMV). As we move towards the exascale era, the time for global synchronization and communication in allreduce increases with the large number of cores available in exascale systems, and the allreduce time becomes the performance bottleneck, which leads to poor scalability of CG. Therefore, it becomes necessary to reduce the number of allreduces in CG and to overlap the larger allreduce time with more independent computations than PIPECG provides. Towards this goal, we have developed PIPECG-OATI (PIPECG-One Allreduce per Two Iterations), which reduces the number of allreduces from three per iteration to one per two iterations and overlaps it with two PCs and two SPMVs. For better scalability with more overlapping, we also developed the Pipelined s-step CG method, which reduces the number of allreduces to one per s iterations and overlaps it with s PCs and s SPMVs. We compared our methods with state-of-the-art CG variants on a variety of platforms and demonstrated that our methods give 2.15x - 3x speedup over the existing methods. We have also generalized our work on parallelizing CG on multi-node CPU systems in two dimensions.
First, we have developed communication overlapping variants of KSMs other than CG, including the Conjugate Residual (CR), Minimum Residual (MINRES) and BiConjugate Gradient Stabilised (BiCGStab) methods, for matrices with different properties. The pipelined variants give up to 1.9x, 2.5x and 2x speedup over the state-of-the-art MINRES, CR and BiCGStab methods respectively. Second, we developed communication overlapping CG variants for GPU accelerated nodes, where we proposed and implemented three hybrid CPU-GPU execution strategies for the PIPECG method. The first two strategies achieve task parallelism and the third achieves data parallelism. Our experiments on GPUs showed that our methods give 1.45x - 3x average speedup over existing CPU and GPU-based implementations. The third strategy gives up to 6.8x speedup for problems that do not fit in GPU memory. We also implemented GPU-related optimizations for the PIPECG-OATI method and show performance improvements over other GPU implementations of PCG and PIPECG on multiple nodes with multiple GPUs.	en_US
dc.language.iso	en_US	en_US
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Communication Overlap	en_US
dc.subject	Asynchronous Executions	en_US
dc.subject	Krylov Subspace Methods	en_US
dc.subject	Pipelined methods	en_US
dc.subject	Exascale systems	en_US
dc.subject	GPU Systems	en_US
dc.subject	Conjugate Gradient method	en_US
dc.subject	MINRES method	en_US
dc.subject	Conjugate Residual method	en_US
dc.subject	s-step CG method	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Computer science	en_US
dc.title	Communication Overlapping Krylov Subspace Methods for Distributed Memory Systems	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US
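
The abstract describes PIPECG as overlapping the allreduce in CG with independent computation. The following is a minimal serial sketch of that idea, not the thesis code: it assumes no preconditioner, a dense list-of-lists matrix standing in for the sparse one, a zero initial guess, and the commonly published unpreconditioned pipelined CG recurrences. The names `dot`, `matvec` and `pipecg` are hypothetical. Comments mark where a distributed implementation would start a non-blocking allreduce and which SPMV could overlap with it.

```python
import math


def dot(u, v):
    """Inner product; in a distributed run this is where an allreduce is needed."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense stand-in for the sparse matrix-vector product (SPMV)."""
    return [dot(row, v) for row in A]


def pipecg(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned pipelined CG for symmetric positive definite A, x0 = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # r0 = b - A*x0 = b
    w = matvec(A, r)     # w0 = A*r0
    p = [0.0] * n
    s = [0.0] * n        # s tracks A*p via a recurrence
    z = [0.0] * n        # z tracks A*s via a recurrence
    gamma_prev = alpha_prev = 1.0  # placeholders, unused in the first iteration

    for i in range(max_iter):
        # Both dot products depend only on r and w, so a distributed code
        # could fuse them into one non-blocking allreduce ...
        gamma = dot(r, r)
        delta = dot(w, r)
        # ... and overlap that allreduce with this SPMV, which needs neither.
        q = matvec(A, w)

        # In a distributed code the allreduce result is first needed here.
        if math.sqrt(gamma) <= tol:
            break
        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_prev
            alpha = gamma / (delta - beta * gamma / alpha_prev)

        # Vector recurrences replace the SPMVs that plain CG would need.
        z = [qi + beta * zi for qi, zi in zip(q, z)]
        s = [wi + beta * si for wi, si in zip(w, s)]
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = [wi - alpha * zi for wi, zi in zip(w, z)]
        gamma_prev, alpha_prev = gamma, alpha
    return x
```

Calling `pipecg(A, b)` on a small symmetric positive definite system should reproduce the direct solution to within the tolerance. The sketch only shows the dependency structure that makes the overlap possible; the PIPECG-OATI and pipelined s-step variants in the abstract restructure the iteration further and are not shown here.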

Files in this item

Name: Manasi_PhD_Thesis (3).pdf
Size: 2.962Mb
Format: PDF
Description: Thesis full text

This item appears in the following Collection(s)

Department of Computational and Data Sciences (CDS) [116]

Related items

Study of Higher Order Split-Step Methods for Stiff Stochastic Differential Equations

Optimal Control Of Numerical Dissipation In Modified KFVS (m-KFVS) Using Discrete Adjoint Method

Smooth Finite Element Methods with Polynomial Reproducing Shape Functions
