Hardware-Software Co-Design Accelerators for Sparse BLAS
Author
Ramesh, Chinthala
Abstract
Sparse Basic Linear Algebra Subroutines (Sparse BLAS) is an important library comprising three levels of routines. Level 1 routines compute over a sparse vector and a sparse or dense vector. Level 2 routines perform sparse matrix and vector operations. Level 3 routines perform sparse matrix and dense matrix operations. On General Purpose Processors (GPPs), these routines not only underutilize hardware resources but also take more compute time than the workload warrants, because sparse vector and matrix storage formats have poor data locality.
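As an illustration of the three levels, the following is a minimal Python sketch that assumes the standard Compressed Row Storage (CRS) layout, not the SCRS format proposed in this thesis. The function names (`spvv_dense`, `spmv_crs`, `spmm_crs`) and the (index, value) pair encoding of a sparse vector are hypothetical choices made for this example. The indirect reads through `col_idx` are the kind of access pattern that gives sparse formats poor data locality.

```python
from typing import List, Sequence, Tuple

# Assumed encoding: a sparse vector is a list of (index, value) pairs.
SparseVec = List[Tuple[int, float]]


def spvv_dense(x: SparseVec, y: Sequence[float]) -> float:
    """Level 1: dot product of a sparse vector x with a dense vector y."""
    return sum(value * y[index] for index, value in x)


def spmv_crs(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """Level 2: y = A * x, with the sparse matrix A stored in CRS."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for r in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # x is read at a data-dependent position: an indirect access.
            acc += vals[k] * x[col_idx[k]]
        y[r] = acc
    return y


def spmm_crs(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float],
             B: Sequence[Sequence[float]]) -> List[List[float]]:
    """Level 3: C = A * B, with A sparse (CRS) and B dense (list of rows)."""
    n_rows = len(row_ptr) - 1
    n_cols = len(B[0]) if B else 0
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            a = vals[k]
            b_row = B[col_idx[k]]  # row of B selected by a stored column index
            for j in range(n_cols):
                C[r][j] += a * b_row[j]
    return C


# Example matrix A = [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CRS form.
ROW_PTR = [0, 2, 3, 5]
COL_IDX = [0, 2, 2, 0, 1]
VALS = [1.0, 2.0, 3.0, 4.0, 5.0]

if __name__ == "__main__":
    print(spvv_dense([(0, 2.0), (2, 1.0)], [1.0, 2.0, 3.0]))
    print(spmv_crs(ROW_PTR, COL_IDX, VALS, [1.0, 2.0, 3.0]))
    print(spmm_crs(ROW_PTR, COL_IDX, VALS, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

In each routine the arithmetic per stored nonzero is small, while every nonzero triggers a lookup whose address depends on the stored index, which is one way to see why cache-oriented processors are poorly utilized by these workloads.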
In the literature, considerable effort has gone into software that improves the performance of Sparse BLAS routines on GPPs. GPPs are best suited to applications with high data locality, whereas Sparse BLAS routines operate on data with low locality, so GPP performance is poor. Various custom function units (hardware accelerators) have been proposed in the literature and shown to be more efficient than software approaches to accelerating Sparse BLAS routines. Although existing hardware accelerators improve Sparse BLAS performance over software routines, considerable scope remains to improve them.
This thesis describes the existing software solutions and hardware-software co-designs (HW/SW co-designs) and identifies their limitations. We propose a new sparse data representation called Sawtooth Compressed Row Storage (SCRS), along with corresponding SpMV and SpMM algorithms. SCRS-based SpMV and SpMM perform better than existing software solutions, but they still do not reach theoretical peak performance.
The knowledge gained from studying the limitations of these existing solutions, including the proposed SCRS-based SpMV and SpMM, is used to propose new HW/SW co-designs. Because software accelerators are limited by the hardware properties of the GPPs and GPUs they run on, we propose HW/SW co-designs to accelerate a few basic Sparse BLAS operations (SpVV and SpMV). Our proposed Parallel Sparse BLAS HW/SW co-design achieves near theoretical peak performance with reasonable hardware resources.
Related items
Showing items related by title, author, creator and subject.
- Sparse Bayesian Learning For Joint Channel Estimation Data Detection In OFDM Systems
  Prasad, Ranjitha (2018-08-30)
  Bayesian approaches for sparse signal recovery have enjoyed a long-standing history in signal processing and machine learning literature. Among the Bayesian techniques, the expectation maximization based Sparse Bayesian ...
- Efficient Design of Embedded Data Acquisition Systems Based on Smart Sampling
  Satyanarayana, J V (2018-05-10)
  Data acquisition from multiple analog channels is an important function in many embedded devices used in avionics, medical electronics, robotics and space applications. It is desirable to engineer these systems to reduce ...
- Bayesian Techniques for Joint Sparse Signal Recovery: Theory and Algorithms
  Khanna, Saurabh
  This thesis contributes new theoretical results, solution concepts, and algorithms concerning the Bayesian recovery of multiple joint sparse vectors from noisy and underdetermined linear measurements. The thesis is written ...