SWIFT: Software Implemented Fault Tolerance
George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, and David I. August
Proceedings of the Third International Symposium on
Code Generation and Optimization (CGO), March 2005.
Winner Best Paper Award.
Winner of the 2015 International Symposium on
Code Generation and Optimization Test of Time Award.
To improve performance and reduce power consumption, processor
designers employ advances that shrink feature sizes, lower voltage
levels, reduce noise margins, and increase clock rates. These
advances, however, also make processors more susceptible to transient
faults that can affect program correctness. To mitigate this
increasing problem, designers build redundancy into systems to the
degree that the soft-error budget will allow.
While reliable systems typically employ hardware techniques to address
soft errors, software techniques can provide a lower cost and more
flexible alternative. To make this alternative more attractive, this
paper presents a new software fault tolerance technique, called SWIFT,
for detecting transient errors. Like other single-threaded software
fault tolerance techniques, SWIFT efficiently manages redundancy by
reclaiming unused instruction-level resources present during the
execution of most programs. SWIFT, however, eliminates the need to
double the memory requirement by acknowledging the use of ECC in
caches and memory. SWIFT also provides a higher level of protection
with enhanced checking of the program counter (PC) at no performance
cost. In addition, this enhanced PC checking makes most code inserted
to detect faults in prior methods unnecessary, significantly enhancing
performance. While SWIFT can be implemented on any architecture and
can protect individual code segments to varying degrees, we evaluate a
fully redundant implementation running on Itanium 2. In these
experiments, SWIFT demonstrates exceptional fault coverage with a
reasonable performance cost. Compared to the best known
single-threaded approach utilizing an ECC memory system, SWIFT
demonstrates a 51% average speedup.
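The redundancy idea summarized in the abstract can be illustrated with a small sketch. This is not the paper's implementation, which is a compiler transformation evaluated on Itanium 2; it is a hypothetical Python model, assuming that a computation is carried out twice on separate copies of its operands and that the two results are compared before a value is written to memory. Memory is assumed to be ECC-protected, so only one copy is stored. The names FaultDetected, checked_store, add_and_store, and inject_fault are invented for illustration.

```python
class FaultDetected(Exception):
    """Raised when the two redundant copies of a value disagree."""


def checked_store(memory, addr, value, value_dup):
    # Compare the original and redundant copies before the value is
    # written out. Memory is assumed to be ECC-protected, so only a
    # single copy is stored (no doubling of the memory footprint).
    if value != value_dup:
        raise FaultDetected(f"mismatch before store to address {addr:#x}")
    memory[addr] = value


def add_and_store(memory, addr, a, b, inject_fault=False):
    # Copy the operands into a second, redundant set of "registers".
    a_dup, b_dup = a, b

    total = a + b              # original computation
    total_dup = a_dup + b_dup  # redundant computation

    if inject_fault:
        # Simulate a transient fault: flip one bit in one copy only.
        total ^= 1 << 3

    checked_store(memory, addr, total, total_dup)


memory = {}
add_and_store(memory, 0x10, 2, 3)  # fault-free run: the store goes through
```

Calling add_and_store with inject_fault=True raises FaultDetected before anything is written, which models detection of a transient error in one of the two copies. The sketch covers data redundancy only; the enhanced program counter checking mentioned in the abstract is not modeled here.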