NTRS - NASA Technical Reports Server

Creating Benchmark Data for Artificial Intelligence and Machine Learning Space Biology Research
To identify an appropriate AI/ML approach for a specific problem, the best practice is to measure algorithm performance through the benchmarking process. A scientific benchmark consists of an AI-ready dataset and a reference implementation on a specific scientific question. The NASA Science Mission Directorate (SMD) has started the “Benchmark Initiative for AI/ML” to create scientific benchmark datasets in three applications: 1) scientific benchmarking, which finds the best algorithm for a specific problem; 2) application benchmarking, which measures algorithm performance against a set of parameters; and 3) system benchmarking, which evaluates performance of hardware and software architecture. Currently, there are no standardized datasets available to benchmark AI/ML algorithms in the domain of space biology. In this work, we constructed two AI/ML-ready biological datasets from experiments in space-flown mice: cellular imaging and RNA-seq. First, radiation-exposed immune cells harbor DNA damage foci that can be fluorescently marked to visualize the amount of damage following exposure to ionizing radiation. However, the resulting large image datasets are difficult to analyze visually, due to imaging inconsistencies and human bias, and classical image processing approaches can fail on imaging artifacts. AI/ML approaches are therefore an exciting alternative, providing the speed of machines and the accuracy of humans. We have made this dataset available at https://registry.opendata.aws/bps_microscopy/. Second, high-throughput nucleic acid sequencing (DNA-seq, RNA-seq) has become widespread in biomedical research due to the growing availability and affordability of these assays. However, most sequencing datasets suffer from high dimensionality and low sample count. In this work, we used a generative adversarial network to synthesize a standardized, AI-ready, publicly available benchmark dataset for space biology RNA-seq data with sufficient space-flown and ground control mouse liver samples from NASA GeneLab. This dataset is available at https://registry.opendata.aws/bps_rnaseq/.
These datasets are now fully open to the Space Biology community to test their favorite AI/ML approaches.
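As a rough illustration of scientific benchmarking in the sense used above (scoring candidate algorithms on one fixed dataset against a reference implementation), the following sketch uses only the Python standard library. The synthetic data, the two class labels, every function name, and both classifiers are assumptions made for illustration; they are not the BPS datasets or the authors' reference implementations.

```python
import random

def make_toy_dataset(n_samples=40, n_features=100, seed=0):
    """Synthetic stand-in for a high-dimensional, low-sample-count dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # two hypothetical classes, e.g. ground control vs. flight
        # Only the first 10 features carry class signal; the rest are pure noise.
        row = [rng.gauss(3.0 * label if j < 10 else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def majority_baseline(X_train, y_train):
    """Hypothetical reference: always predict the most common training label."""
    most_common = max(sorted(set(y_train)), key=y_train.count)
    return lambda row: most_common

def nearest_centroid(X_train, y_train):
    """Hypothetical candidate: predict the class with the closest mean vector."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )
    return predict

def benchmark(candidates, X, y, n_train):
    """Fit every candidate on the same split; report held-out accuracy."""
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]
    scores = {}
    for name, fit in candidates.items():
        predict = fit(X_train, y_train)
        hits = sum(predict(r) == t for r, t in zip(X_test, y_test))
        scores[name] = hits / len(y_test)
    return scores

X, y = make_toy_dataset()
scores = benchmark(
    {"majority_baseline": majority_baseline, "nearest_centroid": nearest_centroid},
    X, y, n_train=30,
)
print(scores)
```

Because every candidate sees the same training and held-out split and is scored with the same metric, the resulting numbers are directly comparable, which is the point of a shared benchmark dataset.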
Document ID
20230015992
Acquisition Source
Ames Research Center
Document Type
Poster
Authors
James Casaletto
(University of California, Santa Cruz Santa Cruz, California, United States)
Lauren Sanders
(Blue Marble Space Seattle, Washington, United States)
Sylvain V Costes
(Ames Research Center Mountain View, California, United States)
Date Acquired
November 3, 2023
Subject Category
Life Sciences (General)
Documentation and Information Science
Meeting Information
Meeting: American Society for Gravitational and Space Research (ASGSR 2023)
Location: Washington, DC
Country: US
Start Date: November 14, 2023
End Date: November 18, 2023
Sponsors: American Society for Gravitational and Space Research
Funding Number(s)
WBS: 296511.01.01.12.02.02
Distribution Limits
Public
Copyright
Public Use Permitted.
Technical Review
NASA Peer Committee