| dc.description.abstract | Multimedia processing constitutes a significant part of current-day average microprocessor workloads. The coming of age of various enabling technologies like the Internet, compact discs, and digital versatile discs (DVDs) places enormous demands on the dynamic media (video, music, animation, etc.) processing capabilities of a computer system. Traditional workloads consist of scalar processing of 32-bit integer data types and processing of static media (images, sounds, etc.). In contrast, current multimedia processing involves real-time processing of continuous media data streams and uses vectors of packed 8-, 16-, and 32-bit integer and floating-point data. Further, multimedia applications exhibit significant amounts of parallelism, both fine-grained (instruction-level and data-level parallelism) and coarse-grained (thread-level parallelism).
In this thesis we propose a framework-based approach to manage the complexity of designing a media-processing system-on-a-chip. To this end, we have defined an architectural framework called SYMPHONY, which consists of a linear array of processors. Processors in the array use fast register- and memory-based communication and synchronization mechanisms to deliver high performance. Memory-based communication and synchronization are realized using special memory modules called single-assignment memory. Parallelism in an application is exploited using a combination of orthogonal parallel processing techniques, namely instruction-level parallelism (ILP), data parallelism (SIMD), multithreading, and multiprocessing.
Simulation studies were carried out to evaluate the performance of a simultaneously multithreaded processor in the SYMPHONY framework. The simulations used the UTAH Media Benchmarks (UCLA MediaBench) and a DLMS adaptive-filtering kernel. To maximize performance, we exploit ILP, data parallelism (in the form of SIMD parallelism using media extensions), and thread-level parallelism (TLP) together in a simultaneously multithreaded superscalar processor. To the best of our knowledge, this is the first attempt toward combining these three orthogonal forms of parallelism.
Our results demonstrate that ILP, on average, does not degrade with increasing data-level and thread-level parallelism. Therefore, the benefits of SIMD parallelism are preserved even as TLP is exploited. We also show that simultaneously multithreaded processor configurations, as we advocate, offer complexity-effective realizations with superior performance compared to wide-issue superscalar processors. | |