NASA Logo

NTRS

NTRS - NASA Technical Reports Server

Back to Results
Fault Management Techniques in Human Spaceflight OperationsThis paper discusses human spaceflight fault management operations. Fault detection and response capabilities available in current US human spaceflight programs Space Shuttle and International Space Station are described while emphasizing system design impacts on operational techniques and constraints. Preflight and inflight processes along with products used to anticipate, mitigate and respond to failures are introduced. Examples of operational products used to support failure responses are presented. Possible improvements in the state of the art, as well as prioritization and success criteria for their implementation are proposed. This paper describes how the architecture of a command and control system impacts operations in areas such as the required fault response times, automated vs. manual fault responses, use of workarounds, etc. The architecture includes the use of redundancy at the system and software function level, software capabilities, use of intelligent or autonomous systems, number and severity of software defects, etc. This in turn drives which Caution and Warning (C&W) events should be annunciated, C&W event classification, operator display designs, crew training, flight control team training, and procedure development. Other factors impacting operations are the complexity of a system, skills needed to understand and operate a system, and the use of commonality vs. optimized solutions for software and responses. Fault detection, annunciation, safing responses, and recovery capabilities are explored using real examples to uncover underlying philosophies and constraints. These factors directly impact operations in that the crew and flight control team need to understand what happened, why it happened, what the system is doing, and what, if any, corrective actions they need to perform. If a fault results in multiple C&W events, or if several faults occur simultaneously, the root cause(s) of the fault(s), as well as their vehicle-wide impacts, must be determined in order to maintain situational awareness. This allows both automated and manual recovery operations to focus on the real cause of the fault(s). An appropriate balance must be struck between correcting the root cause failure and addressing the impacts of that fault on other vehicle components. Lastly, this paper presents a strategy for using lessons learned to improve the software, displays, and procedures in addition to determining what is a candidate for automation. Enabling technologies and techniques are identified to promote system evolution from one that requires manual fault responses to one that uses automation and autonomy where they are most effective. These considerations include the value in correcting software defects in a timely manner, automation of repetitive tasks, making time critical responses autonomous, etc. The paper recommends the appropriate use of intelligent systems to determine the root causes of faults and correctly identify separate unrelated faults.
Document ID
20080026132
Acquisition Source
Johnson Space Center
Document Type
Preprint (Draft being sent to journal)
Authors
O'Hagan, Brian
(NASA Johnson Space Center Houston, TX, United States)
Crocker, Alan
(NASA Johnson Space Center Houston, TX, United States)
Date Acquired
August 24, 2013
Publication Date
January 1, 2006
Subject Category
Mathematical And Computer Sciences (General)
Distribution Limits
Public
Copyright
Work of the US Gov. Public Use Permitted.
No Preview Available