Compiler-Controlled Task Management in Runtime Systems for the Dynamic Dataflow Model of Execution
Abstract
For the past 40 years, relentless focus on Moore's Law transistor scaling has provided ever-increasing transistor performance and density. Ever-increasing demand for large-scale parallelism has driven hardware designers to fit more cores per die, reaching the physical limits of power dissipation. Attention has now turned to low-power lightweight cores such as ARM, with thousands of wimpy and brawny cores per die. The responsibility of running applications on such cores, however, still remains with the operating system, adding to the overheads of an otherwise lightweight, shared-memory-based application. A massively parallel low-power chip with high scalability is the preferred building block for a larger compute infrastructure. Such chips are expected to offer inexpensive computation and communication along with a low-latency runtime interface. State-of-the-art runtime systems incur significant performance penalties because they are tied to traditional parallel computing models. These performance concerns have led researchers to consider alternative models of computing such as Dynamic Dataflow. Such models have proven to be more scalable and friendlier to tight power budgets, making parallelism easier to exploit even in irregular applications that are usually difficult to parallelize and scale. An ideal runtime implementation exposes runtime management primitives for the software abstraction layer to use. We introduce one such distributed hardware runtime for a massively parallel manycore processor (called REDEFINE) that exposes parallelism handles as instructions in its ISA. We present a compilation strategy that utilizes these primitives to manage tasks effectively on the hardware. REDEFINE's compiler controls task creation and deletion, manages communication between tasks, and balances task loads on REDEFINE's distributed execution fabric.