Seminar: Aviral Shrivastava (Arizona State University) “Software Techniques for Soft Errors”
2017.07.19
Abstract
The exponentially growing rate of soft errors makes reliability a major concern in modern processor design. Initial efforts to protect computation from soft errors were at the hardware level. However, software solutions offer flexible protection (protecting only critical applications or critical parts of an application) and can protect computation even on existing off-the-shelf processors. As a result, several software approaches for detecting soft errors have been developed over the last two to three decades. In this decade, we have been busy double-checking these software techniques and re-evaluating how good they really are. Existing software detection techniques can be classified into two main groups: (i) duplication-based and (ii) control flow checking. In our DAC 2014 paper, we demonstrated that existing control flow checking techniques are not effective in detecting silent data corruptions (SDCs); in fact, they may actually increase a program’s vulnerability to soft errors. In our DAC 2016 paper, we showed that state-of-the-art instruction duplication techniques such as SWIFT are also unable to detect many SDCs and are not as effective as advertised. Through all this, we gathered insights into the vulnerability windows of programs and components, and we propose a technique, nZDC (nearly Zero silent Data Corruption), to detect (almost) all soft errors. Extensive fault injection experiments on almost all the unprotected microarchitectural components of a gem5-simulated ARM Cortex-A53 demonstrate that nZDC is extremely effective, without incurring any more performance penalty than the state of the art.
Speaker biography
Prof. Aviral Shrivastava is an Associate Professor in the School of Computing, Informatics, and Decision Systems Engineering at Arizona State University. He received his Ph.D. and Master's in Information and Computer Science from the University of California, Irvine, and his bachelor's in Computer Science and Engineering from the Indian Institute of Technology, Delhi. He is the recipient of a 2011 NSF CAREER Award and the 2011 Outstanding Junior Researcher award in CSE at ASU. He is the second most prolific author at ESWEEK in the last 5 years, and has a 6-year running streak of papers at DAC. His research lies at the intersection of compilers and architectures of embedded and multi-core systems, with the goal of improving predictability, power, performance, and reliability. His research is funded by several federal agencies, including NSF, DOE, and NIST, and by several companies, including Intel, Toyota, and Raytheon Missile Systems. He is currently serving as an associate editor for ACM TECS and IEEE TMSCS. This year, he is also the co-chair of CODES+ISSS 2017. He serves on the organizing and program committees of several top embedded systems conferences, including DAC, ISLPED, CODES+ISSS, CASES, and LCTES, and regularly serves on NSF and DOE review panels.