NTRS - NASA Technical Reports Server

CyberGAN: Generating High-fidelity Cybersecurity Data With Generative Adversarial Networks
Machine learning for cyber defense offers the promise of detecting adversarial activity against the ground data systems managing critical space assets. A fundamental challenge facing machine learning research in cybersecurity is the lack of high-fidelity, shareable datasets for robust evaluation and testing of machine learning-based solutions. High-fidelity, real-world datasets are necessary for reliable benchmarking of nominal system behavior and malicious activity. Unfortunately, such realistic datasets of both nominal and adversarial activity are rarely shared publicly by data owners due to security and privacy concerns. In addition, the available adversarial data is sparse, which makes training models on malicious activity much harder. This situation continues to impede research on, and successful adoption of, machine learning methods for cyber defense. Researchers have dealt with this problem by generating data within a low-fidelity lab environment, using classified and thus unshareable datasets, or downloading low-fidelity public datasets made available by others. We propose an innovative solution to the problem by employing machine learning methods to generate high-fidelity data. Specifically, we propose the use of Generative Adversarial Networks (GANs) to generate high-fidelity data for cybersecurity purposes. GANs have been applied successfully in image processing and natural language applications, but have not yet been investigated for cyber data generation. Our proposed approach first involves training the 'discriminator' network of the GAN with a sample of real-world data consisting of malicious and nominal samples. We then use the 'generator' network to generate new high-fidelity data samples consisting of an appropriate mix of malicious and nominal activity.
We demonstrate applications of our architecture by generating high-fidelity cybersecurity data containing both malicious and nominal samples. We thoroughly evaluate the fidelity of our generated data using heuristics and evaluate its usefulness for machine learning applications using three different datasets. Overall, our approach results in high-fidelity, shareable datasets.
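The discriminator/generator interplay described in the abstract can be illustrated with a deliberately tiny sketch. This is not the CyberGAN architecture, which this record does not describe; it is a generic one-dimensional GAN with a linear generator and a logistic discriminator, written with hand-derived gradients and only the Python standard library. The "real" data is a synthetic Gaussian placeholder standing in for a single numeric feature, and the function name `train_toy_gan`, its parameters, and the chosen distribution are all assumptions made for illustration. A linear model of this kind is not guaranteed to converge and could at best capture a mean and a scale.

```python
import math
import random
from typing import List


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 500, lr: float = 0.01,
                  seed: int = 0, n_out: int = 5) -> List[float]:
    """Alternate discriminator and generator updates, then sample the generator."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator D(x) = sigmoid(w*x + c)

    for _ in range(steps):
        # Placeholder "real" sample (assumption: one feature ~ N(4.0, 0.5)).
        x_real = rng.gauss(4.0, 0.5)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: gradient ascent on
        # log D(x_real) + log(1 - D(x_fake)).
        s_real = sigmoid(w * x_real + c)
        s_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - s_real) * x_real - s_fake * x_fake)
        c += lr * ((1.0 - s_real) - s_fake)

        # Generator step: gradient ascent on log D(G(z))
        # (the non-saturating generator objective).
        s_fake = sigmoid(w * x_fake + c)
        grad_x = (1.0 - s_fake) * w  # d/dx log D(x) evaluated at x_fake
        a += lr * grad_x * z
        b += lr * grad_x

    # Draw new synthetic samples from the trained generator.
    return [a * rng.gauss(0.0, 1.0) + b for _ in range(n_out)]
```

Calling `train_toy_gan(seed=0)` returns a short list of synthetic values, and the same seed reproduces the same list. A realistic version would replace both linear models with neural networks over multi-feature records and would need the kind of fidelity evaluation the abstract mentions.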
Document ID
20220001546
Acquisition Source
Jet Propulsion Laboratory
Document Type
Preprint (Draft being sent to journal)
External Source(s)
Authors
Zhang, Yuening
Viswanathan, Arun A
Le, Joie
Gonik, Julia
Date Acquired
November 16, 2020
Publication Date
November 16, 2020
Publication Information
Publisher: Pasadena, CA: Jet Propulsion Laboratory, National Aeronautics and Space Administration, 2020
Distribution Limits
Public
Copyright
Other
Technical Review

Available Downloads

There are no available downloads for this record.