Show simple item record

dc.contributor.advisor: Amrutur, Bharadwaj
dc.contributor.author: Vivekanandham, Rajesh
dc.date.accessioned: 2009-03-09T10:18:23Z
dc.date.accessioned: 2018-07-31T04:39:32Z
dc.date.available: 2009-03-09T10:18:23Z
dc.date.available: 2018-07-31T04:39:32Z
dc.date.issued: 2009-03-09T10:18:23Z
dc.date.submitted: 2006
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/413
dc.description.abstract: A large instruction window is a key requirement for exploiting greater Instruction Level Parallelism (ILP) in out-of-order superscalar processors. Along with the instruction window, various other structures, including the issue queue, store queue and register file, must grow as well. However, the cycle time and energy consumption of large monolithic Content Addressable Memories (CAMs), the underlying structure of most conventional issue queue and store queue designs, worsen rapidly with size. This results in a three-way trade-off among ILP, clock frequency and energy consumption. In this thesis, we propose efficient designs for the issue queue and the store queue that improve circuit latency and energy consumption while minimizing the loss in instructions per cycle (IPC). We propose the Scalable Low power Issue Queue (SLIQ) design, which segments the issue queue structure to reduce latency. This is complemented by a fast wakeup index that, for every instruction, points to a consumer in the issue queue. As this consumer instruction can be woken up directly, without any delay, the index mitigates the IPC loss faced by the pipelined issue queue. Also, as the scheme incorporates a pipelined broadcast, the indices are not required for correctness and can simply be gang invalidated on branch mispredictions. The IPC loss of an 8-segment SLIQ is within 2.3% for the entire SPEC CPU2000 benchmark suite, while issue latency is reduced by 39.3%. Further, the SLIQ design avoids unnecessary broadcasts to the higher segments most of the time because, in a large majority of cases, an instruction has a single consumer. This consumer is woken up either by direct indexing or by broadcast in the first segment of the SLIQ. This enables the 8-segment SLIQ to reduce energy consumption and the energy-delay product by 48.3% and 67.4% respectively, on average. SLIQ also allows architects to segment the issue queue carefully so that the latency of the issue logic is just within the per-pipeline-stage latency goals of the design.
We also propose the Scalable Low power Store Queue (SLSQ) to address similar problems associated with the store queue data forwarding logic. We extend the state-of-the-art Store Vector based Disambiguator to also predict the index of the store that will forward to a given load. SLSQ adds marginally to the hardware budget, but predicts the store queue index of the forwarding store with an accuracy of 99.5% on average. SLSQ thus eliminates unnecessary address broadcasts and compares, and reduces the energy consumption of the store-to-load forwarding logic by 78.4% and 91.6% for the SPEC Int and FP suites respectively. Another variant of SLSQ eliminates the need for a CAM in the forwarding logic and achieves a 49.9% reduction in store-to-load data forwarding latency while incurring a minimal IPC loss of less than 0.1% on average for the entire SPEC CPU2000 benchmark suite. [en]
dc.language.iso: en_US [en]
dc.relation.ispartofseries: G20943 [en]
dc.subject: Parallel Processing (Computer Science) [en]
dc.subject: Queuing Processes [en]
dc.subject: Queue Design [en]
dc.subject: Scalable Low Power Issue Queue (SLIQ) Microarchitecture [en]
dc.subject: Scalable Low Power Store Queue (SLSQ) Microarchitecture [en]
dc.subject: Superscalar Processors [en]
dc.subject: Large Instruction Window [en]
dc.subject.classification: Computer Science [en]
dc.title: Scalable Low Power Issue Queue And Store Queue Design For Superscalar Processors [en]
dc.type: Thesis [en]
dc.degree.name: MSc Engg [en]
dc.degree.level: Doctoral [en]
dc.degree.discipline: Faculty of Engineering [en]


