Algorithmic and architectural solutions for scheduling task graphs on multiprocessors
Abstract
Scheduling is a critical challenge in the design of multicomputer systems, especially for minimizing the execution time of parallel programs represented as task graphs. This thesis presents both algorithmic and architectural solutions for scheduling task graphs on multiprocessors. It introduces new parallelism measures that capture exploitable parallelism more accurately, and derives improved lower and upper bounds on the execution time of directed acyclic graphs (DAGs) under the assumption of zero communication delays. These bounds are established through a novel graph partitioning approach and are shown to be sharper than existing bounds while remaining efficient to compute.
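The improved bounds themselves are not reproduced in this abstract. As a baseline only (a classical result, not the thesis's contribution): for a DAG with total work T_1 and critical-path length T_\infty, any greedy schedule on p identical processors with zero communication delays has makespan T_p satisfying

\[
\max\left(\frac{T_1}{p},\; T_\infty\right) \;\le\; T_p \;\le\; \frac{T_1}{p} + \frac{p-1}{p}\, T_\infty .
\]

The bounds derived in the thesis tighten estimates of this kind.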
A basic scheduling algorithm is proposed and proven to achieve the derived upper bound, establishing optimality for task graphs more general than trees. The thesis also extends these bounds to systems with communication delays, improving upon existing results.
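The proposed algorithm is not detailed in this abstract. For orientation, the sketch below shows generic greedy list scheduling of a weighted DAG on p identical processors with zero communication delays, the family of schedules to which bounds of this kind apply; the function name and data layout are illustrative assumptions, not the thesis's algorithm.

import heapq
from collections import defaultdict

def greedy_list_schedule(weights, edges, p):
    """Greedy list scheduling of a weighted DAG on p identical processors.

    weights: dict mapping each task to its execution time
    edges:   iterable of (u, v) pairs meaning u must finish before v starts
    Communication delays are assumed to be zero.  Returns (makespan, start_times).
    Illustrative sketch only, not the algorithm proposed in the thesis.
    """
    succ = defaultdict(list)
    indeg = {t: 0 for t in weights}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    ready = [t for t in weights if indeg[t] == 0]   # no unfinished predecessors
    earliest = {t: 0.0 for t in weights}            # earliest start allowed by precedence
    free = [0.0] * p                                # min-heap of processor free times
    heapq.heapify(free)
    running = []                                    # min-heap of (finish time, task)
    start, finish = {}, {}

    while ready or running:
        while ready:                                # dispatch every ready task greedily
            t = ready.pop()                         # ties broken arbitrarily here
            proc_free = heapq.heappop(free)
            start[t] = max(proc_free, earliest[t])
            finish[t] = start[t] + weights[t]
            heapq.heappush(free, finish[t])
            heapq.heappush(running, (finish[t], t))
        done_time, done = heapq.heappop(running)    # retire the earliest finisher
        for v in succ[done]:                        # release its successors
            earliest[v] = max(earliest[v], done_time)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    return (max(finish.values()) if finish else 0.0), start

A priority function for ordering the ready list (for example, by critical-path length) can be plugged in where ties are currently broken arbitrarily.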
Special attention is given to diamond DAGs, common in scientific and numerical applications. These are categorized into four types, and their structural properties are analyzed. Linear-time optimal scheduling algorithms are developed for these graphs, marking the first polynomial-time solutions that account for communication delays.
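The four categories and the linear-time algorithms are not spelled out in the abstract. As background, the simplest diamond (grid) DAG places node (i, j) after (i-1, j) and (i, j-1), so the nodes on each anti-diagonal i + j = k are mutually independent. The short sketch below, an illustrative assumption that ignores communication delays and is not one of the thesis's algorithms, groups such a DAG into parallel wavefronts.

def diamond_wavefronts(n, m):
    """Group the nodes of an n-by-m diamond (grid) DAG into wavefronts.

    Node (i, j) depends on (i-1, j) and (i, j-1); every node on the
    anti-diagonal i + j = k is independent of the others on it, so each
    wavefront can run in parallel once the previous one has finished.
    """
    waves = []
    for k in range(n + m - 1):
        waves.append([(i, k - i)
                      for i in range(max(0, k - m + 1), min(n, k + 1))])
    return waves

# Example: a 3x3 diamond DAG yields 5 wavefronts of sizes 1, 2, 3, 2, 1.
for step, wave in enumerate(diamond_wavefronts(3, 3)):
    print(step, wave)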
Finally, the thesis introduces ASYFLOW (Asynchronous Task Flow Large Parallel System), an abstract model for large-scale parallel systems. ASYFLOW supports coarse-grain parallelism and features a dynamic, distributed scheduling and load balancing algorithm. It incorporates techniques for hiding memory latency, efficient synchronization, and scalable interconnection networks, and its performance is evaluated by simulation. The simulation results demonstrate that ASYFLOW scales to 1024 processors, validating the practical impact of the proposed algorithms and architectural model.
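ASYFLOW's scheduling and load balancing policy is not described in this abstract. The toy simulation below is a minimal sketch under the assumption of a simple randomized work-sharing rule; it only illustrates what dynamic, distributed load balancing of coarse-grain tasks means and is not ASYFLOW's mechanism.

import random

def simulate_work_sharing(initial_loads, threshold=2, steps=100, seed=0):
    """Toy simulation of dynamic, distributed load balancing.

    Each processor keeps a local task count; whenever its backlog exceeds
    `threshold`, it offloads one task to a randomly chosen, less loaded peer.
    Purely illustrative: ASYFLOW's actual policy is not specified here.
    """
    rng = random.Random(seed)
    loads = list(initial_loads)
    p = len(loads)
    for _ in range(steps):
        for me in range(p):
            if loads[me] > threshold:
                peer = rng.randrange(p)
                if peer != me and loads[peer] < loads[me]:
                    loads[me] -= 1       # hand one coarse-grain task over
                    loads[peer] += 1
    return loads

# Example: 40 tasks initially on one of 8 processors spread out over time.
print(simulate_work_sharing([40, 0, 0, 0, 0, 0, 0, 0]))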