Dynamic hybrid partitioned/non-partitioned queue configurations based on workloads in supercomputer systems
Abstract
Supercomputers rely on batch queues to manage parallel job execution, and commercial schedulers offer configurable parameters that influence job routing, execution order, and the order in which queues are visited. However, selecting an optimal configuration is challenging because workload patterns and system behavior vary. Manual reconfiguration is tedious and often ineffective, especially as workloads evolve. This project addresses the problem of dynamic queue reconfiguration by proposing a methodology that adapts queue parameters (such as processor limits, runtime limits, and priorities) based on historical workload data.
Assuming that workload patterns from the previous month are indicative of future behavior, the system reconfigures queues from that history while preserving the influence of prior configurations. To handle unpredictable job arrivals, a hybrid partitioned/non-partitioned queuing framework is introduced: each queue keeps its dedicated nodes, and a shared pool of nodes is made accessible to queues based on job priority. Experimental results demonstrate that this hybrid approach, combined with dynamic reconfiguration, reduces average job wait time by 48%, significantly improving system utilization and responsiveness.
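
To make the two ideas above concrete, the following is a minimal Python sketch, not the method evaluated in this work. The abstract does not state the update rule or the allocation policy, so everything here is an assumption made for illustration: the names `QueueConfig`, `blend`, `reconfigure`, and `allocate`, the use of a weighted average (weight `alpha`) to keep the influence of the prior configuration, the use of last month's peak values as the observation, and the rule that the shared pool is offered to queues in descending priority.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class QueueConfig:
    """Hypothetical queue description; field names are illustrative only."""
    name: str
    max_procs: int         # processor limit per job
    max_runtime_h: float   # runtime limit per job, in hours
    priority: int          # larger value = served first from the shared pool
    dedicated_nodes: int   # nodes reserved for this queue (partitioned part)


def blend(old: float, observed: float, alpha: float = 0.5) -> float:
    """Move a limit toward the observed value while keeping part of the old one.

    Assumption: a weighted average stands in for "preserving the influence
    of prior configurations"; the source does not specify the rule.
    """
    return (1 - alpha) * old + alpha * observed


def reconfigure(cfg: QueueConfig,
                jobs: List[Tuple[int, float]],
                alpha: float = 0.5) -> QueueConfig:
    """Derive next month's limits from last month's jobs in this queue.

    jobs holds (processors, runtime_hours) pairs.
    """
    if not jobs:
        return cfg  # no history: keep the prior configuration unchanged
    peak_procs = max(p for p, _ in jobs)
    peak_runtime = max(r for _, r in jobs)
    return QueueConfig(
        name=cfg.name,
        max_procs=round(blend(cfg.max_procs, peak_procs, alpha)),
        max_runtime_h=blend(cfg.max_runtime_h, peak_runtime, alpha),
        priority=cfg.priority,
        dedicated_nodes=cfg.dedicated_nodes,
    )


def allocate(queues: List[QueueConfig],
             demand: Dict[str, int],
             shared_nodes: int) -> Dict[str, int]:
    """Grant nodes: dedicated nodes first, then the shared pool by priority."""
    granted = {}
    # Partitioned part: each queue may use up to its own dedicated nodes.
    for q in queues:
        granted[q.name] = min(demand.get(q.name, 0), q.dedicated_nodes)
    # Non-partitioned part: leftover demand is served from the shared pool,
    # highest-priority queue first (an assumed policy).
    for q in sorted(queues, key=lambda q: q.priority, reverse=True):
        extra = min(demand.get(q.name, 0) - granted[q.name], shared_nodes)
        granted[q.name] += extra
        shared_nodes -= extra
    return granted
```

For example, with a `small` queue (4 dedicated nodes, priority 1), a `large` queue (8 dedicated nodes, priority 2), and 5 shared nodes, a demand of 6 and 12 nodes respectively is first met from the dedicated nodes; the higher-priority `large` queue then takes 4 shared nodes and `small` receives the remaining one.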

