Towards Reliable Evaluation of Anomaly-Based Intrusion Detection PerformanceThis report describes the results of research into the effects of environment-induced noise on the evaluation process for anomaly detectors in the cyber security domain. This research was conducted during a 10-week summer internship program from the 19th of August, 2012 to the 23rd of August, 2012 at the Jet Propulsion Laboratory in Pasadena, California. The research performed lies within the larger context of the Los Angeles Department of Water and Power (LADWP) Smart Grid cyber security project, a Department of Energy (DoE) funded effort involving the Jet Propulsion Laboratory, California Institute of Technology and the University of Southern California/ Information Sciences Institute. The results of the present effort constitute an important contribution towards building more rigorous evaluation paradigms for anomaly-based intrusion detectors in complex cyber physical systems such as the Smart Grid. Anomaly detection is a key strategy for cyber intrusion detection and operates by identifying deviations from profiles of nominal behavior and are thus conceptually appealing for detecting "novel" attacks. Evaluating the performance of such a detector requires assessing: (a) how well it captures the model of nominal behavior, and (b) how well it detects attacks (deviations from normality). Current evaluation methods produce results that give insufficient insight into the operation of a detector, inevitably resulting in a significantly poor characterization of a detectors performance. In this work, we first describe a preliminary taxonomy of key evaluation constructs that are necessary for establishing rigor in the evaluation regime of an anomaly detector. We then focus on clarifying the impact of the operational environment on the manifestation of attacks in monitored data. We show how dynamic and evolving environments can introduce high variability into the data stream perturbing detector performance. Prior research has focused on understanding the impact of this variability in training data for anomaly detectors, but has ignored variability in the attack signal that will necessarily affect the evaluation results for such detectors. We posit that current evaluation strategies implicitly assume that attacks always manifest in a stable manner; we show that this assumption is wrong. We describe a simple experiment to demonstrate the effects of environmental noise on the manifestation of attacks in data and introduce the notion of attack manifestation stability. Finally, we argue that conclusions about detector performance will be unreliable and incomplete if the stability of attack manifestation is not accounted for in the evaluation strategy.