IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, v.31, no.3, pp.404 - 417
Abstract
Conventional computer-aided design (CAD) methodologies optimize a processor module for correct operation and prohibit timing violations during nominal operation. We propose recovery-driven design, a design approach that optimizes a processor module for a target timing error rate (ER) instead of correct operation. The target ER is chosen based on how many errors can be gainfully tolerated by a hardware or software error resilience mechanism. We show that significant power benefits are possible from a recovery-driven design approach that deliberately allows errors caused by voltage overscaling to occur during nominal operation, while relying on an error resilience technique to tolerate these errors. We present a detailed evaluation and analysis of such a CAD methodology that minimizes the power of a processor module for a target ER. We show how this design-level methodology can be extended to design recovery-driven processors-processors that are optimized to take advantage of hardware or software error resilience. We also discuss a gradual slack recovery-driven design approach that optimizes for a range of ERs to create soft processors-processors that have graceful failure characteristics and the ability to trade throughput or output quality for additional energy savings over a range of ERs. We demonstrate significant power benefits over conventional design-11.8% on average over all modules and ER targets, and up to 29.1% for individual modules. Processor-level benefits were 19.0%, on average. Benefits increase when recovery-driven design is coupled with an error resilience mechanism or when the number of available voltage domains increases