Fault-tolerant transputer-based firmware for parallel solution of partial differential equations by the boundary element method
Abstract
This work presents a fault-tolerant multi-transputer architecture that handles single-node failures without interrupting program execution. The system dynamically reconfigures the interconnections among transputers to bypass faulty processing nodes and activates the corresponding backup processes. The architecture is implemented on the IMS B008 transputer motherboard and has been validated through extensive testing under simulated fault conditions. Although the current implementation addresses single faults, the design is scalable and can be extended to tolerate multiple faults through further reconfiguration. For instance, if the node at position i fails, the system can tolerate additional faults in nodes positioned before (i - 2) or after (i + 2), and operation continues through bypassing and backup activation. The architecture thereby improves the reliability and resilience of distributed computing systems and is suitable for critical applications that require high fault tolerance.
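The bypass rule described in the abstract can be illustrated with a small model. The sketch below is not the firmware itself: the `Chain` class, its method names, the linear node ordering, and the policy of hosting a backup process on a neighbouring node are all assumptions made for illustration. It also assumes the strict reading of the multiple-fault condition, in which a further fault at position j is tolerable only if j < i - 2 or j > i + 2 for every earlier fault i; the `min_distance` parameter can be set to 2 for the inclusive reading.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Chain:
    """Hypothetical model of n transputer nodes connected in a line."""
    n: int
    failed: Set[int] = field(default_factory=set)

    def can_tolerate(self, j: int, min_distance: int = 3) -> bool:
        """True if a fault at node j is far enough from every earlier fault.

        min_distance=3 encodes the assumed strict reading:
        j < i - 2 or j > i + 2 for each already-failed node i.
        """
        if not 0 <= j < self.n or j in self.failed:
            return False
        return all(abs(j - i) >= min_distance for i in self.failed)

    def fail(self, j: int, min_distance: int = 3) -> bool:
        """Record a fault at node j if it can be bypassed; return success."""
        if not self.can_tolerate(j, min_distance):
            return False
        self.failed.add(j)
        return True

    def active_path(self) -> List[int]:
        """Nodes that remain linked once every failed node is bypassed."""
        return [k for k in range(self.n) if k not in self.failed]

    def backup_host(self, j: int) -> Optional[int]:
        """Assumed policy: node j's backup runs on its nearest live neighbour,
        preferring the successor over the predecessor."""
        for k in (j + 1, j - 1):
            if 0 <= k < self.n and k not in self.failed:
                return k
        return None


chain = Chain(n=8)
print(chain.fail(3))         # True: first fault, node 3 is bypassed
print(chain.active_path())   # [0, 1, 2, 4, 5, 6, 7]
print(chain.backup_host(3))  # 4
print(chain.fail(5))         # False: 5 is not after i + 2 = 5
print(chain.fail(6))         # True: 6 > i + 2
```

Keeping the spacing rule in a single parameter makes it easy to compare the two readings of the condition without touching the reconfiguration logic.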