Hardware-Software Co-Design Accelerators for Sparse BLAS

Ramesh, Chinthala

dc.contributor.advisor	Nandy, S K
dc.contributor.advisor	Raha, S
dc.contributor.advisor	Datta, Amitava
dc.contributor.advisor	Narayan, Ranjani
dc.contributor.author	Ramesh, Chinthala
dc.date.accessioned	2019-08-30T09:03:30Z
dc.date.available	2019-08-30T09:03:30Z
dc.date.submitted	2017
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/4276
dc.description.abstract	Sparse Basic Linear Algebra Subroutines (Sparse BLAS) is an important library comprising three levels of subroutines. Level 1 routines perform computations on a sparse vector and a sparse or dense vector. Level 2 routines deal with sparse matrix and vector operations. Level 3 routines deal with sparse matrix and dense matrix operations. On General Purpose Processors (GPPs), these routines not only underutilize hardware resources but also take more compute time than the workload requires, owing to the poor data locality of sparse vector/matrix storage formats. The literature reports considerable software effort to improve the performance of Sparse BLAS routines on GPPs. GPPs are best suited to applications with high data locality, whereas Sparse BLAS routines operate on data with low locality; hence GPP performance is poor. Various Custom Function Units (hardware accelerators) have been proposed in the literature and shown to be more efficient than software approaches to accelerating Sparse BLAS subroutines. Although existing hardware accelerators improve Sparse BLAS performance over software routines, there is still considerable scope to improve them. This thesis describes existing software solutions and hardware-software co-designs (HW/SW co-designs) and identifies their limitations. We propose a new sparse data representation called Sawtooth Compressed Row Storage (SCRS) and corresponding SpMV and SpMM algorithms. The SCRS-based SpMV and SpMM algorithms perform better than existing software solutions, but they still do not reach theoretical peak performance.
The knowledge gained from studying the limitations of these existing solutions, including the proposed SCRS-based SpMV and SpMM, is used to propose new HW/SW co-designs. Software accelerators are limited by the hardware properties of GPPs and GPUs themselves; hence, we propose HW/SW co-designs to accelerate a few basic Sparse BLAS operations (SpVV and SpMV). Our proposed Parallel Sparse BLAS HW/SW co-design achieves near-theoretical-peak performance with reasonable hardware resources.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G28775;
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Sparse Matrix Storage Formats	en_US
dc.subject	Hardware-Software Codesign Accelerators	en_US
dc.subject	Sparse BLAS	en_US
dc.subject	Hardware Accelerator	en_US
dc.subject	Sawtooth Compressed Row Storage	en_US
dc.subject	Sparse Vector Vector Multiplication	en_US
dc.subject	Sparse Matrix Matrix Multiplication	en_US
dc.subject	Sparse Matrix Vector Multiplication	en_US
dc.subject	Compressed Row Storage	en_US
dc.subject	Sparse Basic Linear Algebra Subroutines	en_US
dc.subject	SpMV Multiplication	en_US
dc.subject	SpMM Multiplication	en_US
dc.subject.classification	Nano Science and Engineering	en_US
dc.title	Hardware-Software Co-Design Accelerators for Sparse BLAS	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US