Show simple item record

dc.contributor.advisorSingh, Virendra
dc.contributor.advisorParekhji, Rubin
dc.contributor.authorPrasanth, V
dc.date.accessioned2018-03-05T19:28:44Z
dc.date.accessioned2018-07-31T05:09:17Z
dc.date.available2018-03-05T19:28:44Z
dc.date.available2018-07-31T05:09:17Z
dc.date.issued2018-03-06
dc.date.submitted2012
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/3232
dc.identifier.abstracthttp://etd.iisc.ac.in/static/etd/abstracts/4093/G25445-Abs.pdfen_US
dc.description.abstractCMOS technology scaling is bringing new challenges to the designers in the form of new failure modes. The challenges include long term reliability failures and particle strike induced random failures. Studies have shown that increasingly, the largest contributor to the device reliability failures will be soft errors. Due to reliability concerns, the adoption of soft error mitigation techniques is on the increase. As the soft error mitigation techniques are increasingly adopted, the area and performance overhead incurred in their implementation also becomes pertinent. This thesis addresses the problem of providing low cost soft error mitigation. The main contributions of this thesis include, (i) proposal of a new delayed capture methodology for low overhead soft error detection, (ii) adopting Error Control Coding (ECC) for delayed capture methodology for correction of single event upsets, (iii) analyzing the impact of different derating factors to reduce the hardware overhead incurred by the above implementations, and (iv) proposal for hardware software co-design for reliability based upon critical component identification determined by the application executing on the hardware (as against standalone hardware analysis). This thesis first surveys existing soft error mitigation techniques and their associated limitations. It proposes a new delayed capture methodology as a low overhead soft error detection technique. Delayed capture methodology is an enhancement of the Razor flip-flop methodology. In the delayed capture methodology, the parity for a set of flip-flops is calculated at their inputs and outputs. The input parity is latched on a second clock, which is delayed with respect to the functional clock by more than the soft error pulse width. It requires an extra flip-flop for each set of flip-flops. On the other hand, in the Razor flip-flop methodology an additional flip-flop is required for every functional flip-flop. Due to the skew in the clocks, either the parity flip-flop or the functional flip-flop will capture the effect of transient, and hence by comparing the output parity and latched input parity an error can be detected. Fault injection experiments are performed to evaluate the bneefits and limitations of the proposed approach. The limitations include soft error detection escapes and lack of error correction capability. Different cases of soft error detection escapes are analyzed. They are attributed mainly to a Single Event Upset (SEU) causing multiple flip-flops within a group to be in error. The error space due to SEUs is analyzed and an intelligent flip-flop grouping method using graph theoretic formulations is proposed such that no SEU can cause multiple flip-flops within a group to be in error. Once the error occurs, leaving the correction aspects to the application may not be desirable. The proposed delayed capture methodology is extended to replace parity codes with codes having higher redundancy to enable correction. The hardware overhead due to the proposed methodology is analyzed and an area savings of about 15% is obtained when compared to an existing soft error mitigation methodology with equivalent coverage. The impact of different derating factors in determining the hardware overhead due to the soft error mitigation methodology is then analyzed. We have considered electrical derating and timing derating information for the evaluation purpose. The area overhead of the circuit with implementation of delayed capture methodology, considering different derating factors standalone and in combination is then analyzed. Results indicate that in different circuits, either a combination of these derating factors yield optimal results, or each of them considered standalone. This is due to the dependency of the solution on the heuristic nature of the algorithms used. About 23% area savings are obtained by employing these derating factors for a more optimal grouping of flip-flops. A new paradigm of hardware software co-design for reliability is finally proposed. This is based on application derating in which the application / firmware code is profiled to identify the critical components which must be guarded from soft errors. This identification is based on the ability of the application software to tolerate certain errors in hardware. An algorithm to identify critical components in the control logic based on fault injection is developed. Experimental results indicated that for a safety critical automotive application, only 12% of the sequential logic elements were found to be critical. This approach provides a framework for investigating how software methods can complement hardware methods, to provide a reduced hardware solution for soft error mitigation.en_US
dc.language.isoen_USen_US
dc.relation.ispartofseriesG25445en_US
dc.subjectSoft Error Mitigationen_US
dc.subjectHardware Reliabilityen_US
dc.subjectDelayed Capture Methodologyen_US
dc.subjectFault Tolerant Computingen_US
dc.subjectHardware Deratingen_US
dc.subjectTransient Faultsen_US
dc.subjectSoft Errorsen_US
dc.subjectSoft Error Tolerant Designsen_US
dc.subjectError Control Coding (ECC)en_US
dc.subjectDelayed Captureen_US
dc.subjectFault Toleranceen_US
dc.subject.classificationComputer Scienceen_US
dc.titleLow Overhead Soft Error Mitigation Methodologiesen_US
dc.typeThesisen_US
dc.degree.nameMSc Enggen_US
dc.degree.levelMastersen_US
dc.degree.disciplineFaculty of Engineeringen_US


Files in this item

This item appears in the following Collection(s)

Show simple item record