Integrating A New Cluster Assignment And Scheduling Algorithm Into An Experimental Retargetable Code Generation Framework
Abstract
This thesis presents a new unified algorithm for cluster assignment and acyclic region
scheduling in a partitioned architecture, and preliminary results on its integration into an experimental retargetable code generation framework. The object of this work is twofold. Firstly, to validate for the first time, and evaluate the framework which is almost automatic, so as to gain insights into possibilities for improvement. This was done by using as a baseline for comparison, highly optimized code generated by the handcrafted compiler of Texas Instruments, the TI Code Composer Studio V2. The second objective is to compare the integrated scheduling algorithm with another well known algorithm which performs scheduling and cluster allocation in the same phase, the Unified Assign and Schedule (UAS) algorithm. The computational complexity of the two algorithms is
comparable.
The components of the framework experimented with here are (a) a tree transformer generator, which takes as input, a description of the instruction set of the target architecture in the form of a regular tree grammar augmented with actions and attributes, and outputs a data dependency directed acyclic graph, (b) the well known public domain IMPACT front end for C, (c)a microarchitecture description module which uses a modification of the HMDES architecture description language of the TRIMARAN project, to include cluster information, and (d) a combined cluster allocator and acyclic region scheduler and a register allocator designed and implemented by us. Experiments have been carried out on creating the proper interfaces for all the modules to work together, and the targeting of the tool to the Texas Instruments TMS320c62x architecture to establish the feasibility of this approach. We present the results of our implementation on a set of benchmarks and some sorting programs and compare them with those obtained from the state-of-the-art TI compiler. The performance without software pipelining shows that our executables take on the average 1.4 times the execution time as that of those generated by the TI compiler. The integrated scheduling algorithm proposed in this thesis performs
at least as well as the UAS algorithm and sometimes better by as much as 9 % in terms
of the parallelism obtained.