NTRS - NASA Technical Reports Server

Software Error Incident Categorizations in Aerospace
Since the first use of computers in space and aircraft, software errors have occurred. Their consequences range from loss of life to less catastrophic outcomes. As the demand for automation increases, software in safety-critical systems should be designed to tolerate the most likely software faults. This paper categorizes historic aerospace software errors to determine trends in how and where automation is most likely to fail. A distinction between software producing wrong (erroneous) output versus no output (fail-silent) is introduced. Of the historical incidents analyzed, 87% were from software acting unexpectedly rather than simply stopping. Rebooting was found to be ineffective at clearing erroneous behavior, and only partially effective for silent software. Errors were traced back to the software logic itself in 62% of cases, to configurable data in 13%, and to input in 25%. Thirty percent (30%) of unexpected software behavior was caused by the absence of software and 20% was due to “unknown-unknowns”. These findings indicate that to achieve fault tolerance in safety-critical systems, backup strategies must be employed to detect and respond to erroneous software behavior beyond only fail-silent cases, and robust off-nominal testing should be performed to uncover unanticipated situations.
Document ID
20230012154
Acquisition Source
Langley Research Center
Document Type
Technical Publication (TP)
Authors
Lorraine E. Prokop
(Johnson Space Center, Houston, Texas, United States)
Date Acquired
August 16, 2023
Publication Date
August 1, 2023
Subject Category
Computer Programming and Software
Report/Patent Number
NESC-NPP-22-01775
Funding Number(s)
WBS: 869021.01.23.01.01
Distribution Limits
Public
Copyright
Portions of document may include copyright protected material.
Technical Review
NASA Peer Committee
Keywords
Software
Error
Failure
Fault-tolerance