NTRS - NASA Technical Reports Server

Failure analysis and modeling of a VAXcluster system
This paper discusses the results of a measurement-based analysis of real error data collected from a DEC VAXcluster multicomputer system. In addition to an evaluation of basic system dependability characteristics, such as error and failure distributions and hazard rates for both the individual machines and the VAXcluster, reward models were developed to analyze the impact of failures on the system as a whole. The results show that more than 46 percent of all failures were due to errors in shared resources, even though these errors have a recovery probability greater than 0.99. The hazard rate calculations show that not only errors but also failures occur in bursts. Approximately 40 percent of all failures occur in bursts and involve multiple machines, which indicates that correlated failures are significant. Analysis of rewards shows that software errors have the lowest reward (0.05, versus 0.74 for disk errors). The expected reward rate (a reliability measure) of the VAXcluster drops to 0.5 in 18 hours for the 7-out-of-7 model and in 80 days for the 3-out-of-7 model.
Document ID
19900057801
Acquisition Source
Legacy CDMS
Document Type
Conference Paper
Authors
Tang, Dong
(Illinois Univ. Urbana, IL, United States)
Iyer, Ravishankar K.
(Illinois Univ. Urbana, IL, United States)
Subramani, Sujatha S.
(Illinois Univ. Urbana, IL, United States)
Date Acquired
August 14, 2013
Publication Date
June 1, 1990
Subject Category
Computer Operations And Hardware
Accession Number
90A44856
Funding Number(s)
CONTRACT_GRANT: NAG1-613
CONTRACT_GRANT: NCA2-184
Distribution Limits
Public
Copyright
Other
