An MLIR-Based High-Level Synthesis Compiler for Hardware Accelerator Design
Abstract
The emergence of machine learning, image and audio processing on edge devices has motivated research towards power-efficient custom hardware accelerators. Though FPGAs are an ideal target for custom accelerators, the difficulty of hardware design and the lack of vendor-agnostic, standardized hardware compilation infrastructure has hindered their adoption. High-level synthesis (HLS) offers a more compiler-centric alternative to the traditional Verilog-based hardware design improving developer productivity.
Though HLS offers many advantages over traditional HDL-based hardware design flow, it is still not a mature ecosystem. There is a need for research in both programming abstraction for hardware design and compiler optimizations to meet the efficiency of hand-optimized HDL designs. In the software world, LLVM has enabled rapid prototyping of programming languages. A new programming language can target the LLVM compiler and benefit from the existing optimizations and backend code generation. LLVM also enables the development of new compiler optimizations. A new optimization pass can be plugged into the existing compiler pipeline to evaluate its benefits on existing programming languages and benchmarks. This decoupling of different stages of the compiler pipeline can be largely attributed to the LLVM intermediate representation.
The high-level synthesis ecosystem still lacks such extensible modular compiler infrastructure which could be used for the development of new HLS programming languages and optimizations. In this work, we propose an MLIR based end-to-end HLS compiler and an intermediate representation that is suitable for the design and implementation of domain-specific accelerators for affine workloads. Our compiler brings similar levels of modularity and extensibility to the HLS compilation domain, which LLVM brought in the area of software compilation. A modular compiler infrastructure offers the advantage of incrementally introducing new language frontends and optimization passes without the need to reinvent the whole HLS compiler stack.
Our compiler converts a high-level description of the accelerator specified in the C programming language into a register-transfer-level(RTL) design (SystemVerilog). We use memory dependence analysis and integer-linear-program(ILP) based automatic scheduling to improve loop pipelining, and introduce parallelization between producer-consumer kernels. Our ILP-based optimizer beats the state-of-the-art Vitis HLS compiler by 1.3x in performance over a representative set of benchmarks, while requiring fewer FPGA resources.