NTRS - NASA Technical Reports Server

Experimental evaluation of multiprocessor cache-based error recovery

Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have recently been developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, which differ in how they avoid rollback propagation, are evaluated. Simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor is used to evaluate the performance effect of integrating the recovery schemes into the cache coherence protocol. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead, but with high and uncontrollable variability in the checkpoint interval.
Document ID
19930071617
Document Type
Conference Paper
Authors
Janssens, Bob
(NASA Langley Research Center Hampton, VA, United States)
Fuchs, W. K.
(Illinois Univ. Urbana, United States)
Date Acquired
August 16, 2013
Publication Date
August 1, 1991
Subject Category
COMPUTER PROGRAMMING AND SOFTWARE
Meeting Information
1991 International Conference on Parallel Processing (St. Charles, IL)
Funding Number(s)
CONTRACT_GRANT: N00014-91-J-1283
CONTRACT_GRANT: NAG1-613
Distribution Limits
Public
Copyright
Other