Software Fault Detection Using Dynamic Instrumentation
George A. Reis, David I. August, Robert Cohn, and Shubhendu S. Mukherjee
Proceedings of the Fourth Annual Boston Area Architecture Workshop (BARC), February 2006.
Software-only approaches to increase hardware reliability have
been proposed and evaluated as alternatives to hardware
modification. These techniques have been shown to significantly
improve reliability with reasonable performance
overhead. Software-only techniques do not require any hardware support
and thus are far cheaper and easier to deploy. These techniques can
be used for systems that have already been manufactured and now
require higher reliability than the hardware can offer.

All previous proposals have been static compilation techniques
that rely on source code transformations or alterations to the
compilation process. Our proposal is the first application of
software fault detection for transient errors that increases
reliability dynamically. Applying our technique is trivial,
since it requires only the program binary; this makes it
applicable to legacy programs whose source code is no longer readily
available or easily recompiled. Our dynamic reliability
technique can seamlessly handle variable-length instructions, mixed
code and data, statically unknown indirect jump targets, dynamically
generated code, and dynamically loaded libraries. Our technique is
also able to attach to an already running application to increase its
reliability, and detach when appropriate, thus returning to faster
(although unreliable) execution.
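
The abstract does not spell out the detection mechanism. As a rough illustration only, one common approach in software-only fault detection is to compute each value twice and compare the two copies before the result is used. The sketch below assumes that approach and is not taken from the paper; the names `checked_sum`, `flip_bit`, `fault_at`, and `fault_bit` are hypothetical, and the fault injection hook exists only to mimic a transient bit flip.

```python
class FaultDetected(Exception):
    """Raised when the original and shadow computations disagree."""


def flip_bit(value: int, bit: int) -> int:
    """Simulate a transient fault by flipping one bit of an integer."""
    return value ^ (1 << bit)


def checked_sum(values, fault_at=None, fault_bit=0):
    """Sum `values` twice (original and shadow) and compare before returning.

    `fault_at` is a hypothetical test hook: if set, one bit of the original
    accumulator is flipped after processing that index, mimicking a soft error.
    """
    total = 0   # original computation
    shadow = 0  # redundant (shadow) computation
    for i, v in enumerate(values):
        total += v
        shadow += v
        if i == fault_at:
            total = flip_bit(total, fault_bit)  # injected transient fault
    # Compare the two copies before the result leaves the protected region.
    if total != shadow:
        raise FaultDetected(f"mismatch: {total} != {shadow}")
    return total
```

Called as `checked_sum([1, 2, 3])`, the function returns the ordinary sum; called with `fault_at=1`, the two accumulators diverge and `FaultDetected` is raised. A dynamic instrumentation system would presumably insert equivalent redundant instructions and checks into the running binary rather than into source code, which is what lets it work without recompilation.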