    • High Performance GPU Tensor Core Code Generation for Matmul using MLIR

      Katel, Navdeep Kumar
      The state of the art in high-performance deep learning is primarily driven by highly tuned libraries. These libraries are often hand-optimized and tuned, with significant effort, by expert programmers using low-level abstractions. ...
    • Optimizing Dense Matrix Computations with PolyMage 

      Kumudha, K N
      Linear algebra computations and other arbitrary affine accesses are ubiquitous in applications from domains like scientific computing, digital signal processing (DSP), and deep neural networks. Libraries such as OpenBLAS, ...