NTRS - NASA Technical Reports Server

Statistical Classification of Biosignature Information using Multiple Instrument Observations

The accurate identification of biosignatures (indications of life) in data from remote or in situ planetary exploration is one of the most important challenges in astrobiology, the interdisciplinary field that examines habitability and the potential for extraterrestrial life. This study applies machine learning algorithms to improve the identification of biosignatures, with an emphasis on biosignatures that are agnostic to a specific biochemical basis. We draw on the wealth of terrestrial data available from biogenic and abiogenic systems to prioritize features efficiently. Our dataset, compiled from public databases and laboratory measurements, includes elemental abundances, isotopic fractionation, and VNIR/Raman spectra. Data curation included standardization for detection limits and measurement ranges. Subsequent feature extraction produced detailed inputs for machine learning, including combinations of elemental content, isotopic ratios, and parameters of spectral peaks and troughs. Feature significance was evaluated across several machine learning methods (k-nearest neighbors, logistic regression, Random Forest, support vector machines, and Gaussian Naïve Bayes), as well as a combined voting classifier. We used the Receiver Operating Characteristic Area Under the Curve (ROC AUC), computed across 2,000 repeated 50/50 train-test splits, as a robust metric of model performance. The combined voting classifier achieved a promising ROC AUC of 0.853. Removing the elemental abundance data notably reduced model performance (a 13% decrease in AUC), highlighting its critical role in biosignature detection. Several other individual features were also significant within their respective data types, offering additional granularity.
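The evaluation protocol summarized above (a voting ensemble scored by ROC AUC across repeated 50/50 train-test splits, plus dropping one feature group to measure its contribution) can be illustrated with a short, dependency-free Python sketch. It is not the authors' code and rests on stated assumptions: only two of the five member models (k-nearest neighbors and Gaussian Naïve Bayes) are included, voting is assumed to be soft (averaged class probabilities), splits are assumed to be stratified, features are z-scored with training-set statistics, and the data are synthetic. All function names (`roc_auc`, `standardize`, `knn_proba`, `gnb_proba`, `soft_vote`, `repeated_split_auc`, `make_toy_data`) and the column layout are hypothetical. AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample scores above a randomly chosen negative one.

```python
# Hypothetical sketch, NOT the authors' pipeline: synthetic data, two of the five member
# models, soft (probability-averaging) voting, and stratified splits are all assumptions.
import math
import random
from statistics import mean, pstdev, pvariance

def roc_auc(labels, scores):
    """ROC AUC as P(a random positive scores above a random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes in the test set")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def standardize(train, test):
    """z-score every column with training-set statistics only, so no test data leaks in."""
    cols = list(zip(*train))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]
    return scale(train), scale(test)

def knn_proba(train, y_train, test, k=5):
    """k-nearest neighbors: share of the k closest (Euclidean) training points labeled 1."""
    out = []
    for x in test:
        nearest = sorted(range(len(train)), key=lambda i: math.dist(x, train[i]))[:k]
        out.append(sum(y_train[i] for i in nearest) / len(nearest))
    return out

def gnb_proba(train, y_train, test, eps=1e-9):
    """Gaussian Naive Bayes: one independent normal per feature and class; returns P(1 | x)."""
    stats = {}
    for c in (0, 1):
        rows = [r for r, lab in zip(train, y_train) if lab == c]
        cols = list(zip(*rows))
        stats[c] = (math.log(len(rows) / len(train)),      # log prior
                    [mean(col) for col in cols],             # per-feature means
                    [pvariance(col) + eps for col in cols])  # smoothed variances
    out = []
    for x in test:
        logp = {}
        for c, (prior, mus, variances) in stats.items():
            loglik = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                         for xi, m, v in zip(x, mus, variances))
            logp[c] = prior + loglik
        d = max(min(logp[0] - logp[1], 700.0), -700.0)  # clamp so math.exp cannot overflow
        out.append(1.0 / (1.0 + math.exp(d)))
    return out

def soft_vote(train, y_train, test):
    """Soft voting: average the class-1 probabilities of the member models."""
    members = [knn_proba(train, y_train, test), gnb_proba(train, y_train, test)]
    return [mean(ps) for ps in zip(*members)]

def repeated_split_auc(X, y, features, n_splits=2000, seed=0):
    """Mean ensemble ROC AUC over repeated stratified 50/50 train/test splits, using only
    the column indices in `features` (drop a feature group to run an ablation)."""
    rng = random.Random(seed)
    Xf = [[row[j] for j in features] for row in X]
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(pos)
        rng.shuffle(neg)
        train_idx = pos[: len(pos) // 2] + neg[: len(neg) // 2]
        test_idx = pos[len(pos) // 2:] + neg[len(neg) // 2:]
        train, test = standardize([Xf[i] for i in train_idx], [Xf[i] for i in test_idx])
        scores = soft_vote(train, [y[i] for i in train_idx], test)
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return mean(aucs)

def make_toy_data(n=60, seed=1):
    """SYNTHETIC data for illustration only. Columns 0-1 stand in for elemental-abundance
    features (informative); column 2 stands in for one spectral feature (weakly informative)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        lab = i % 2  # 1 = "biogenic", 0 = "abiogenic" (hypothetical labels)
        X.append([rng.gauss(2.0 * lab, 1.0), rng.gauss(1.5 * lab, 1.0), rng.gauss(0.3 * lab, 1.0)])
        y.append(lab)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    full = repeated_split_auc(X, y, features=[0, 1, 2], n_splits=200)
    no_elemental = repeated_split_auc(X, y, features=[2], n_splits=200)
    print(f"all features: mean AUC = {full:.3f}; without elemental columns: {no_elemental:.3f}")
```

The default of 2,000 splits mirrors the abstract; the demo uses 200 so it runs quickly. With scikit-learn available, the same protocol would more naturally use `VotingClassifier(voting="soft")`, `roc_auc_score`, and `StratifiedShuffleSplit(n_splits=2000, test_size=0.5)`; the abstract does not say whether the study used soft or hard voting or stratified splits.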
This research strengthens the case for machine learning in astrobiology and could enhance life detection missions by enabling algorithmic prioritization of high-interest samples for further investigation. Future work will refine data standardization, expand the dataset to include more terrestrial systems, and incorporate convolutional neural networks for spectral feature extraction. Public sharing of the data is also being explored, in support of collective scientific advancement.
Document ID
20230011796
Acquisition Source
Ames Research Center
Document Type
Conference Paper
Authors
Abdullah Shahid
(North Carolina State University Raleigh, United States)
Tao Sheng
(University of Pittsburgh Pittsburgh, United States)
Sunanda Sharma
(Jet Propulsion Laboratory La Cañada Flintridge, United States)
Diana Gentry
(Ames Research Center Mountain View, California, United States)
Date Acquired
August 8, 2023
Subject Category
Earth Resources and Remote Sensing
Meeting Information
Meeting: AGU Fall Meeting 2023 (AGU23)
Location: San Francisco, CA
Country: US
Start Date: December 11, 2023
End Date: December 15, 2023
Sponsors: American Geophysical Union
Funding Number(s)
WBS: 822174.01.21
Distribution Limits
Public
Copyright
Portions of document may include copyright protected material.
Keywords
Statistical
Classification
Biosignature
Information
Multiple