Cache-based error recovery for shared memory multiprocessor systems
A cache-based checkpointing and recovery scheme for recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce the degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.
Document ID
19900050418
Acquisition Source
Legacy CDMS
Document Type
Conference Paper
Authors
Wu, Kun-Lung (Illinois Univ. Urbana, IL, United States)
Fuchs, W. Kent (Illinois Univ. Urbana, IL, United States)
Patel, Janak H. (Illinois Univ. Urbana, IL, United States)
Date Acquired
August 14, 2013
Publication Date
January 1, 1989
Subject Category
Computer Systems
Meeting Information
Meeting: 1989 International Conference on Parallel Processing