NASA System Safety Handbook

System safety assessment is defined in NPR 8715.3C, NASA General Safety Program Requirements, as a disciplined, systematic approach to the analysis of risks resulting from hazards that can affect humans, the environment, and mission assets. Achievement of the highest practicable degree of system safety is one of NASA's highest priorities. Traditionally, system safety assessment at NASA and elsewhere has focused on the application of a set of safety analysis tools to identify safety risks and formulate effective controls. Familiar tools used for this purpose include various forms of hazard analyses, failure modes and effects analyses, and probabilistic safety assessment (commonly also referred to as probabilistic risk assessment (PRA)). In the past, it has been assumed that to show that a system is safe, it is sufficient to provide assurance that the process for identifying the hazards has been as comprehensive as possible and that each identified hazard has one or more associated controls.

The NASA Aerospace Safety Advisory Panel (ASAP) has made several statements in its annual reports supporting a more holistic approach. In 2006, it recommended that "... a comprehensive risk assessment, communication and acceptance process be implemented to ensure that overall launch risk is considered in an integrated and consistent manner." In 2009, it advocated for "... a process for using a risk-informed design approach to produce a design that is optimally and sufficiently safe." As a rationale for the latter advocacy, it stated that "... the ASAP applauds switching to a performance-based approach because it emphasizes early risk identification to guide designs, thus enabling creative design approaches that might be more efficient, safer, or both."

For purposes of this preface, it is worth mentioning three areas where the handbook emphasizes a more holistic type of thinking.
First, the handbook takes the position that it is important not only to address risks individually but also to consider measures of aggregate safety risk, and to ensure wherever possible that there are quantitative measures for evaluating how effective the controls are in reducing these aggregate risks. The term aggregate risk, when used in this handbook, refers to the accumulation of risks from individual scenarios that lead to a shortfall in safety performance at a high level: e.g., an excessively high probability of loss of crew, loss of mission, planetary contamination, etc. Without aggregated quantitative measures such as these, it is not reasonable to expect that safety has been optimized with respect to other technical and programmatic objectives. At the same time, it is fully recognized that not all sources of risk are amenable to precise quantitative analysis and that the use of qualitative approaches and bounding estimates may be appropriate for those risk sources.

Second, the handbook stresses the necessity of developing confidence that the controls derived for the purpose of achieving system safety not only handle risks that have been identified and properly characterized but also provide a general, more holistic means for protecting against unidentified or uncharacterized risks. For example, while it is not possible to be assured that all credible causes of risk have been identified, there are defenses that can provide protection against broad categories of risks and thereby increase the chances that individual causes are contained.

Third, the handbook strives at all times to treat uncertainties as an integral aspect of risk and as a part of making decisions. The term "uncertainty" here does not refer to an actuarial type of data analysis, but rather to a characterization of our state of knowledge regarding results from logical and physical models that approximate reality.
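The idea of accumulating individual scenario risks into an aggregate measure can be illustrated with a minimal sketch. The scenario probabilities below are invented for illustration, and the sketch assumes the scenarios are statistically independent, an assumption a real assessment would have to justify or correct for; it is not a method prescribed by the handbook.

```python
# Hypothetical illustration: combine per-scenario loss probabilities
# into one aggregate measure, e.g. the probability of loss of crew.
# Assumes independent scenarios (an assumption, not a general rule).

def aggregate_risk(scenario_probs):
    """Probability that at least one loss scenario occurs."""
    p_no_loss = 1.0
    for p in scenario_probs:
        p_no_loss *= (1.0 - p)
    return 1.0 - p_no_loss

# Three illustrative scenario probabilities (invented values).
probs = [0.001, 0.0005, 0.002]
total = aggregate_risk(probs)
# total is slightly below the simple sum of the individual risks,
# because the rare-event overlap is not double-counted
```

An aggregate figure like `total` is the kind of top-level quantity that can then be compared against a safety threshold, which no individual scenario probability can support on its own.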
Uncertainty analysis finds how the output parameters of the models are related to plausible variations in the input parameters and in the modeling assumptions. The evaluation of uncertainties represents a method of probabilistic thinking wherein the analyst and decision makers recognize possible outcomes other than the outcome perceived to be "most likely." Without this type of analysis, it is not possible to determine the worth of an analysis product as a basis for making decisions related to safety and mission success.

In line with these considerations, the handbook does not take a hazard-analysis-centric approach to system safety. Hazard analysis remains a useful tool to facilitate brainstorming but does not substitute for a more holistic approach geared to a comprehensive identification and understanding of individual risk issues and their contributions to aggregate safety risks. The handbook strives to emphasize the importance of identifying the most critical scenarios that contribute to the risk of not meeting the agreed-upon safety objectives and requirements using all appropriate tools (including but not limited to hazard analysis). Thereafter, emphasis shifts to identifying the risk drivers that cause these scenarios to be critical and ensuring that there are controls directed toward preventing or mitigating the risk drivers. To address these and other areas, the handbook advocates a proactive, analytic-deliberative, risk-informed approach to system safety, enabling the integration of system safety activities with systems engineering and risk management processes. It emphasizes how one can systematically provide the necessary evidence to substantiate the claim that a system is safe to within an acceptable risk tolerance, and that safety has been achieved in a cost-effective manner.
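Relating plausible variations in model inputs to the spread of model outputs is commonly done by Monte Carlo sampling. The following sketch is illustrative only: the model, parameter values, and distribution are invented, not taken from the handbook, and serve merely to show how a distribution over outcomes, rather than a single "most likely" value, characterizes the state of knowledge.

```python
import random

# Minimal sketch of uncertainty propagation: sample an uncertain
# input parameter, run it through a simple (invented) physical model,
# and summarize the resulting spread of outputs.

def model(burn_rate):
    # Hypothetical model: propellant consumed over a fixed 100 s burn.
    return burn_rate * 100.0

def propagate(n_samples=10000, seed=42):
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        # Burn rate uncertain: illustrative normal about 2.0 kg/s.
        rate = rng.gauss(2.0, 0.1)
        outputs.append(model(rate))
    outputs.sort()
    mean = sum(outputs) / len(outputs)
    p05 = outputs[int(0.05 * len(outputs))]
    p95 = outputs[int(0.95 * len(outputs))]
    return mean, p05, p95

mean, p05, p95 = propagate()
# The 5th-95th percentile band, not just the mean, is what supports
# probabilistic thinking about outcomes other than the "most likely."
```

In practice the same pattern scales to many correlated inputs and to variations in modeling assumptions, but the essential output is unchanged: a distribution that a decision maker can weigh against a risk tolerance.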
The methodology discussed in this handbook is part of a systems engineering process and is intended to be integral to the system safety practices being conducted by the NASA safety and mission assurance and systems engineering organizations. The handbook posits that to conclude that a system is adequately safe, it is necessary to consider a set of safety claims that derive from the safety objectives of the organization. The safety claims are developed from a hierarchy of safety objectives and are therefore hierarchical themselves. Assurance that all the claims are true within acceptable risk tolerance limits implies that all of the safety objectives have been satisfied, and therefore that the system is safe. The acceptable risk tolerance limits are provided by the authority who must make the decision whether or not to proceed to the next step in the life cycle. These tolerances are therefore referred to as the decision maker's risk tolerances. In general, the safety claims address two fundamental facets of safety: 1) whether required safety thresholds or goals have been achieved, and 2) whether the safety risk is as low as possible within reasonable impacts on cost, schedule, and performance. The latter facet includes consideration of controls that are collective in nature (i.e., apply generically to broad categories of risks) and thereby provide protection against unidentified or uncharacterized risks.
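The hierarchical structure of safety claims described above can be sketched as a small tree in which a parent claim is substantiated only if all of its subordinate claims hold. The claim wording and class design below are invented for illustration; the handbook defines the concept, not this code.

```python
# Toy sketch (names and structure invented) of hierarchical safety
# claims: a parent claim holds only if every subordinate claim holds
# within the decision maker's risk tolerance; leaves carry evidence.

class Claim:
    def __init__(self, statement, satisfied=None, children=None):
        self.statement = statement
        self.satisfied = satisfied          # leaf evidence: True/False
        self.children = children or []

    def holds(self):
        if self.children:
            return all(c.holds() for c in self.children)
        return bool(self.satisfied)

system_is_safe = Claim("System is adequately safe", children=[
    Claim("Required safety thresholds and goals are met",
          satisfied=True),
    Claim("Safety risk is as low as practicable", children=[
        Claim("Identified risk drivers are controlled",
              satisfied=True),
        Claim("Collective controls protect against unidentified risks",
              satisfied=True),
    ]),
])
# system_is_safe.holds() is True only when every leaf is satisfied
```

Flipping any single leaf to `False` falsifies the top-level claim, mirroring the point that assurance of all claims, within tolerance, is what implies the system is safe.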
Special Publication (SP)
Dezfuli, Homayoon (NASA Headquarters, Washington, DC, United States)
Benjamin, Allan (Information Systems Labs., Inc., United States)
Everett, Christopher (Information Systems Labs., Inc., United States)
Smith, Curtis (Idaho National Lab., Idaho Falls, ID, United States)
Stamatelatos, Michael (NASA Headquarters, Washington, DC, United States)
Youngblood, Robert (Idaho National Lab., Idaho Falls, ID, United States)