NTRS - NASA Technical Reports Server

Transcriptomics-based Machine Learning Analysis Predicts Space-Exposed Murine Livers

Limited sample sizes, high data dimensionality, and the sensitivity of next-generation sequencing (NGS) to technical and biological variability typically limit machine learning (ML) approaches in spaceflight studies, including those that examine radiation effects. However, pooling smaller studies while addressing intra- and inter-study variability makes ML predictive modeling possible. Here, integration methods were applied to whole transcriptome shotgun sequencing (RNA-seq) data from six mouse liver GeneLab datasets (GLDS), each containing 6 to 39 samples, for a total of 81 spaceflight and ground-control samples. The goal was to determine the top features (i.e., genes) relevant to spaceflight, including the effect of radiation exposure.
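Pooling studies requires putting each study's counts on a common scale before merging. The record does not name the normalization or scaling methods, so the following is only a minimal sketch under stated assumptions: log2 counts-per-million (CPM) within each study, then a per-gene z-score across all merged samples, keeping only genes measured in every study. The input layout (sample ID mapped to gene counts, with sample IDs unique across studies) and the names `log_cpm` and `merge_and_scale` are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_cpm(counts):
    """counts: dict sample -> dict gene -> raw count (hypothetical layout).

    Assumption: log2(CPM + 1) within one study; the source does not specify
    the per-study normalization. Assumes every library size is nonzero.
    """
    out = {}
    for sample, genes in counts.items():
        library_size = sum(genes.values())
        out[sample] = {g: math.log2(c / library_size * 1e6 + 1) for g, c in genes.items()}
    return out


def merge_and_scale(studies):
    """Normalize each study, merge samples, then z-score each shared gene.

    Assumption: sample IDs are unique across studies, and only genes present
    in every sample of every study are kept.
    """
    normalized = [log_cpm(study) for study in studies]
    merged = {sample: values for study in normalized for sample, values in study.items()}
    shared = set.intersection(*(set(v) for study in normalized for v in study.values()))
    scaled = {sample: {} for sample in merged}
    for gene in sorted(shared):
        column = [merged[s][gene] for s in merged]
        mu, sd = mean(column), pstdev(column)
        for s in merged:
            # Constant genes carry no information after merging; map them to 0.
            scaled[s][gene] = (merged[s][gene] - mu) / sd if sd > 0 else 0.0
    return scaled
```

Scaling after merging, rather than only within each study, puts every gene on the same unit variance across the pooled samples, which distance- and margin-based classifiers such as SVM generally expect.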

RNA-seq counts were normalized within each study, then merged and scaled across all datasets. Data dimensionality was reduced using minimum redundancy maximum relevance (MRMR) feature selection, with redundancy computed as the Pearson correlation and relevance as the F-statistic. The top 100 MRMR features were used to classify spaceflight vs. ground-control samples with Random Forest (RF), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA) classifiers under 5-fold cross-validation (CV). Principal component analysis (PCA) of the complete feature set versus the MRMR features shows separation between spaceflight samples and ground controls (Figure 1A). The ML-derived gene sets were compared against differential gene expression results obtained with DESeq2 for each individual GLDS.
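A minimal pure-Python sketch of greedy MRMR with F-statistic relevance and Pearson-correlation redundancy follows. It assumes the quotient form (relevance divided by the mean absolute correlation with already-selected genes); the record does not say whether a quotient or a difference was used. The function names are hypothetical, and a real analysis on thousands of genes would more likely use NumPy or a dedicated MRMR package.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def f_statistic(x, labels):
    """One-way ANOVA F-statistic of one feature across class labels.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for value, label in zip(x, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(x)
    grand = mean(x)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def mrmr(features, labels, k):
    """Greedy MRMR (quotient form, an assumption).

    features: dict gene -> list of per-sample values; labels: per-sample classes.
    Returns up to k gene names in selection order.
    """
    relevance = {g: f_statistic(v, labels) for g, v in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(g):
            if not selected:
                return relevance[g]  # first pick: most relevant gene
            redundancy = mean(abs(pearson(features[g], features[s])) for s in selected)
            return relevance[g] / max(redundancy, 1e-6)
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a small two-class example, an exact copy of the first selected gene scores only its own relevance, so a less relevant but weakly correlated gene can be chosen ahead of it. This is what makes the returned set decorrelated.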

Using all features, or randomly sampled subsets of the same sizes as the MRMR sets, classifiers reached a maximum accuracy of 69% on the test set over 5 folds. In contrast, for all classifiers, CV training with at least the top 30 MRMR genes yielded a minimum accuracy of 89% and an AUC of 0.95 on the test set over 5 folds (Figure 1B). Baseline set analysis of differentially expressed genes (DEGs) identified at padj ≤ 0.05 found 295 DEGs shared by at least two studies and 13 DEGs shared by three studies (Figure 1C). Set analysis between the top 100 MRMR features and the DEGs found 47 genes that overlap the DEGs of at least one study and 24 genes that overlap the DEGs of two studies. Over-representation analysis identified overlapping biological processes related to fatty acid and lipid metabolism, which may indicate a role for these processes in the response to spaceflight stressors.
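The overlap counts above come from simple set operations. A minimal sketch, assuming each study's DESeq2 results are available as a mapping from gene to adjusted p-value, with `None` standing in for the `NA` values DESeq2 reports for genes removed by independent filtering. The input layout and the function name `genes_shared_by` are hypothetical, and "shared by N studies" is treated here as "significant in at least N studies".

```python
from collections import Counter


def genes_shared_by(deg_tables, padj_cutoff=0.05, min_studies=2):
    """Genes significant (padj <= cutoff) in at least `min_studies` studies.

    deg_tables: dict study_id -> dict gene -> padj (None for NA); hypothetical layout.
    """
    counts = Counter()
    for table in deg_tables.values():
        significant = {g for g, padj in table.items() if padj is not None and padj <= padj_cutoff}
        counts.update(significant)  # each gene counted at most once per study
    return {g for g, n in counts.items() if n >= min_studies}
```

The MRMR comparison is then an intersection, for example `set(top_mrmr_genes) & genes_shared_by(deg_tables, min_studies=1)`, where `top_mrmr_genes` is a hypothetical list of the selected features.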

MRMR feature selection improves the performance of the selected ML methods relative to classifiers built on all features or on randomly sampled subsets. Permutation feature importance computed within the decorrelated MRMR features showed concordant feature rankings across ML methods. A challenge of applying ML methods to heterogeneous NGS data is accounting for the signal-to-noise ratio. Here, signal was validated across studies by intersecting the top MRMR genes with the DEGs from the DESeq2 analysis. The non-intersecting sets offer an opportunity to explore additional genes relevant to distinguishing spaceflight-exposed groups, and applying ML methods across existing NGS datasets may help overcome sample-size limitations.
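Permutation feature importance scores each feature by how much a fitted model's accuracy drops when that feature's values are shuffled across samples, which breaks its link to the labels while leaving the other features intact. The sketch below uses a nearest-centroid classifier purely as a hypothetical stand-in for the RF, SVM, and LDA models; all names are illustrative. In a CV setting, the data passed to `permutation_importance` would be the held-out fold.

```python
import random
from statistics import mean


def fit_nearest_centroid(X, y):
    """Per-class mean vectors; a hypothetical stand-in for RF/SVM/LDA."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, X):
    """Assign each sample to the class with the nearest centroid (squared Euclidean)."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(sorted(centroids), key=lambda lab: dist2(x, centroids[lab])) for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(centroids, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled; rows must be lists."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(centroids, X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # permute only feature j
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(centroids, X_perm)))
        importances.append(mean(drops))
    return importances
```

When features are strongly correlated, shuffling one of them leaves its information available through the others, so importance gets split or understated; computing it on decorrelated MRMR features avoids much of that. Rankings from different models can then be compared, for example with a rank correlation.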
Document ID
20230001594
Acquisition Source
Johnson Space Center
Document Type
Presentation
Authors
Hari Ilangovan
(Science Applications International Corporation (United States) McLean, Virginia, United States)
Prachi Kothiyal
(Wyle (United States) El Segundo, California, United States)
Katherine Hoadley
(Wyle (United States) El Segundo, California, United States)
Robin Elgart
(Wyle (United States) El Segundo, California, United States)
Newton Campbell
(Science Applications International Corporation (United States) McLean, Virginia, United States)
Greg Eley
(Wyle (United States) El Segundo, California, United States)
Parastou Eslami
(Wyle (United States) El Segundo, California, United States)
Date Acquired
February 1, 2023
Subject Category
Space Radiation
Life Sciences (General)
Meeting Information
Meeting: Human Research Program Investigators' Workshop (HRP IWS)
Location: Houston, TX
Country: US
Start Date: February 7, 2023
End Date: February 9, 2023
Sponsors: National Aeronautics and Space Administration
Funding Number(s)
WBS: 651549.01.04.10
CONTRACT_GRANT: NNX16MB01C
CONTRACT_GRANT: NNJ15HK11B
Distribution Limits
Public
Copyright
Public Use Permitted.
Technical Review
Single Expert
Keywords
Machine Learning