Analog Compute for Edge-AI: Devices, Circuits & SoC
Abstract
Machine learning and artificial intelligence research has produced models of very high computational complexity to solve a wide range of problems. Running these models quickly requires substantial computational resources. However, with Moore’s law nearing its end and Dennard scaling having already hit a wall, it is difficult to increase computing capability while staying within power budgets. This has created a fundamental hardware bottleneck for machine learning on general-purpose digital hardware and on specialized digital accelerators for high-performance computing, such as graphics processing units (GPUs) and tensor processing units (TPUs), which prioritize performance over energy and area efficiency.
This thesis therefore presents a new approach that uses analog computing to design resilient and scalable machine learning systems. The objective of this research is to address the challenges that conventional analog design faces in creating large-scale analog systems for machine learning. Unlike conventional analog design approaches, this work first proposes a robust mathematical framework and associated design methodologies for creating modular, programmable, and scalable analog machine learning circuits and systems. The framework is then extended to develop an end-to-end Analog AI Compute Ecosystem. This ecosystem includes ARYABHAT, the first fully indigenous analog AI computing chipset in India; ARYAFlow, a design compiler for mapping dataflow graphs onto the processor; and ARYATest, an open-source automated AI test framework for chip testing. More broadly, this work enables the design of analog machine learning systems that are invariant to transistor operating regimes, modular in the manner of digital design, robust to device non-idealities, and scalable across process technologies.
In addition, this thesis proposes hybrid computing architectures that integrate standard CMOS processes with novel 2D synaptic memory technology. The objective is to capitalize on the inherent limitations of standard CMOS technology, such as mismatch and nonlinearity, in conjunction with novel memory devices such as 2D materials and memristors, which offer high integration density. Combining these technologies makes it possible to surpass the limitations of CMOS technology alone and to improve overall system performance.
In summary, this thesis tackles the hardware challenges of machine learning by advancing analog computing. It develops a comprehensive analog compute framework and explores hybrid architectures and algorithms that use standard CMOS processes and emerging memory devices. These efforts aim to open new possibilities for high-performance machine learning systems.

