PyGraph: Compiler Support for Efficient and Transparent Use of CUDA Graphs
Abstract
CUDA Graphs --- a recent hardware feature introduced for NVIDIA GPUs --- aim to reduce CPU launch overhead by capturing and launching a series of GPU tasks (kernels) as a DAG.
However, deploying CUDA Graphs faces several challenges today because a captured graph has a static structure. Deployment also incurs performance overhead from data copies. In fact, we show a counter-intuitive result: deploying CUDA Graphs hurts performance in many cases.
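To make this trade-off concrete, here is a minimal back-of-the-envelope cost model. It is a sketch under stated assumptions, not the paper's method. It assumes that per-kernel launch overhead is fully exposed on the critical path. It also assumes that each graph replay must first copy fresh inputs into the fixed buffers the graph recorded at capture time. The names `CostModel`, `eager_overhead`, `graph_overhead`, and `graph_helps`, and every number below, are hypothetical and are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # Hypothetical per-event costs in microseconds (illustrative only).
    launch_us: float        # CPU cost to launch one kernel individually
    graph_launch_us: float  # CPU cost to launch a whole captured graph once
    copy_us_per_mb: float   # time to copy 1 MB of inputs into the graph's static buffers


def eager_overhead(n_kernels: int, m: CostModel) -> float:
    """Launch overhead when each of n_kernels is launched separately."""
    return n_kernels * m.launch_us


def graph_overhead(input_mb: float, m: CostModel) -> float:
    """Overhead of one graph replay: a single launch plus copying new inputs
    into the fixed addresses the graph captured (the static-structure cost)."""
    return m.graph_launch_us + input_mb * m.copy_us_per_mb


def graph_helps(n_kernels: int, input_mb: float, m: CostModel) -> bool:
    """True if replaying the graph is cheaper than launching kernels eagerly."""
    return graph_overhead(input_mb, m) < eager_overhead(n_kernels, m)


m = CostModel(launch_us=5.0, graph_launch_us=10.0, copy_us_per_mb=20.0)
print(graph_helps(50, 1.0, m))  # many small kernels, little input to copy
print(graph_helps(4, 2.0, m))   # few kernels, larger input copy
```

With these made-up numbers, a 50-kernel sequence with 1 MB of input benefits (30 µs vs. 250 µs). A 4-kernel sequence with 2 MB of input is slower as a graph (50 µs vs. 20 µs). This mirrors the qualitative point that copy overhead can outweigh the launch savings.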