NTRS - NASA Technical Reports Server

Historical Aerospace Software Errors Categorized to Influence Fault Tolerance

Since the first use of computers in space and aircraft, software errors have occurred. These errors can result in loss of life or in less catastrophic outcomes. As the demand for automation increases, software in mission- or safety-critical systems should be designed to be tolerant of the most likely software faults. This paper categorizes a set of 55 historic aerospace software error incidents from 1962 to 2023 to determine trends in how and where automation is most likely to fail or behave unexpectedly. A distinction between software producing unexpected (erroneous) output versus no output (fail-silent) is introduced. Of the historical incidents analyzed, 85% were from software producing wrong output rather than simply stopping. Rebooting was found to be ineffective at clearing erroneous behavior and unreliable for recovering from silent failures. Error origin was within the code/logic itself in 58% of cases, 16% from configurable data, 15% from unexpected sensor input, and 11% from command/operator input. A substantial forty percent (40%) of unexpected software behavior was attributable to the absence of code, arising from unanticipated situations and missing requirements, and 16% of incidents were subjectively deemed “unknown-unknowns”. No incidents were found to be the result of programming language, compiler, tool, or operating system, and only sixteen percent (16%) of all incidents were considered to be traditional computer science or programming errors. These findings indicate that for fault tolerance, erroneous automation behavior must be a primary consideration, especially at critical moments, and reboot recoverability may not be viable. Special care should be taken to validate configurable data and commands prior to use.
“Test-like-you-fly”, including hardware-in-the-loop combined with robust off-nominal testing, should be used to uncover missing logic arising from unanticipated situations not covered by requirements alone. This study uniquely focuses on manifestations of unexpected flight software behavior, independent of ultimate root cause. We characterize software error behavior and origin to improve software design, test, and operations for resilience to the most common manifestations, and provide a rich dataset for further study.
Document ID
20230012909
Acquisition Source
Langley Research Center
Document Type
Conference Paper
Authors
Lorraine E Prokop
(Johnson Space Center Houston, Texas, United States)
Date Acquired
September 5, 2023
Publication Date
March 2, 2024
Publication Information
Publisher: Institute of Electrical and Electronics Engineers
Subject Category
Space Transportation and Safety
Computer Programming and Software
Meeting Information
Meeting: 45th International IEEE Aerospace Conference
Location: Big Sky, MT
Country: US
Start Date: March 2, 2024
End Date: March 9, 2024
Sponsors: American Institute of Aeronautics and Astronautics, Prognostics and Health Management Society, Institute of Electrical and Electronics Engineers
Funding Number(s)
WBS: 869021.01.23.01.01
Distribution Limits
Public
Copyright
Work of the US Gov. Public Use Permitted.
Technical Review
Single Expert
Keywords
Aerospace
Software
Errors