NTRS - NASA Technical Reports Server

Benchmarking Computational Tools for Calling SNPs and Indels in Complex Microbial Populations

The NASA BioNutrients missions seek to understand the suitability of microorganisms for bioproduction during space flight. One topic of interest is the stability of microbial genomes during long-term ambient storage and subsequent rehydration and growth. To address this question, samples from 8 species were flown to the ISS for 5 years of desiccated storage at ambient temperature (Stasis Packs), and 2 species were packaged along with powdered media inside a bioreactor system to allow hydration and growth in microgravity (Production Packs). For both systems, Whole Genome Sequencing (WGS) of the DNA extracted from the returned samples and paired ground controls will be conducted to identify changes in genome stability due to time, storage conditions, and growth in space. Across the technical replicates, ground controls, 10 timepoints, and multiple experimental conditions, ~300 samples have been selected for initial analysis with WGS to 100x coverage. A flexible and resource-efficient mutation calling pipeline is needed to process this large dataset and allow for comparisons between species.

Many bioinformatics tools for calling Indels and Single Nucleotide Variants (SNVs) are designed for use with pure isolates, where true variations from the reference genome are expected to dominate the reads aligning to the location of the mutation. In contrast, DNA from the Stasis Pack (SP) samples was collected directly after recovery from desiccated storage, and DNA from the Production Pack (PP) samples was collected after fermentation. In this context, reads with mutations are expected to be less frequent than reads that match the reference genome, as each sample will include multiple cell lineages. Thus, BioNutrients samples are expected to resemble samples from cancer cell sequencing or “pooled” sequencing approaches. In preparation for the analysis of the BioNutrients samples, we have tested three mutation calling tools designed for complex samples: GATK for Microbes, BreSeq, and DiscoSNP.
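
To illustrate why a caller tuned for pure isolates can miss variants in a mixed population, the minimal sketch below computes the fraction of reads supporting an alternate allele at a single site. The function name, the read counts, and the 0.5 cutoff are illustrative assumptions; none of them come from the BioNutrients pipeline or from the tools named above.

```python
def variant_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Return the fraction of reads at a site that carry the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


# Hypothetical site at roughly 100x depth where a minority lineage carries a mutation.
vaf = variant_allele_frequency(ref_reads=92, alt_reads=8)

# Hypothetical majority-rule cutoff of the kind an isolate-oriented workflow might apply.
MAJORITY_CUTOFF = 0.5
called_by_majority_rule = vaf >= MAJORITY_CUTOFF
```

Under these assumed counts the variant is supported by a minority of reads, so a majority-rule cutoff would not report it even though it is present in the population.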

A challenge in validating mutation identification pipelines is the lack of “Ground Truth” datasets, especially for complex samples. To compare these three tools, we sought to identify mutations in pre-existing WGS data collected from populations of Chlamydomonas reinhardtii that were exposed to UV mutagenesis and growth in low Earth orbit (LEO) as part of the Space Algae-1 mission. Here we compare the results from these tools with the analysis originally conducted using the CRISP tool. The metrics reported for each tool include runtime, the number of SNPs, the number and size of Indels, and the patterns of transitions and transversions identified. By sharing these benchmarking results collected in support of the BioNutrients mission, we aim to guide others seeking to identify SNVs in similarly complex microbial samples.
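
As a rough illustration of two of the metrics named above, the sketch below classifies single-base substitutions as transitions (purine to purine, or pyrimidine to pyrimidine) or transversions (purine to pyrimidine, or the reverse) and computes a signed indel size. The helper names and the example variants are hypothetical and are not taken from the benchmarked tools or the Space Algae-1 data.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Label a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # A transition keeps the base within the same chemical class.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def indel_size(ref: str, alt: str) -> int:
    """Signed length change: positive for insertions, negative for deletions."""
    return len(alt) - len(ref)


# Hypothetical substitutions, for illustration only.
example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
counts = Counter(classify_snp(ref, alt) for ref, alt in example_snps)
ts_tv_ratio = counts["transition"] / counts["transversion"]
```

Comparing per-tool counts of each class, rather than only the total number of calls, can show whether tools disagree on particular substitution types.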
Document ID
20240013728
Acquisition Source
Ames Research Center
Document Type
Poster
Authors
Philip Sweet
(Oak Ridge Associated Universities Oak Ridge, United States)
Natalie N Ball
(KBR (United States) Houston, Texas, United States)
Barbara Muller
(University of Florida Gainesville, United States)
Sandra Vu
(Wyle (United States) El Segundo, California, United States)
Lisa Anderson
(Wyle (United States) El Segundo, California, United States)
Sadie Downing
(KBR (United States) Houston, Texas, United States)
Amy Gresser
(Ames Research Center Mountain View, United States)
Aditya Hindupur
(Wyle (United States) El Segundo, California, United States)
John Hogan
(Ames Research Center Mountain View, United States)
Hiromi Kagawa
(SETI Institute Mountain View, California, United States)
Aphrodite Kostakis
(Universities Space Research Association Columbia, United States)
Matthew Paddock
(Wyle (United States) El Segundo, California, United States)
Kevin Sims
(Wyle (United States) El Segundo, California, United States)
Kevin Tyre
(Geosyntec Consultants (United States) Atlanta, Georgia, United States)
Fuzhong Zhang
(University of Florida Gainesville, United States)
Fang Bai
(University of Florida Gainesville, United States)
Frances Donovan
(Ames Research Center Mountain View, United States)
A Mark Settles
(Ames Research Center Mountain View, United States)
Date Acquired
October 29, 2024
Subject Category
Man/System Technology and Life Support
Meeting Information
Meeting: NASA Postdoctoral Program Virtual Symposium
Location: Virtual
Country: US
Start Date: November 19, 2024
End Date: November 21, 2024
Sponsors: Oak Ridge Associated Universities
Funding Number(s)
WBS: 858549.07.01.04.21
WBS: 596118.04.25.21.07
Distribution Limits
Public
Copyright
Portions of document may include copyright protected material.
Technical Review
NASA Technical Management
Keywords
Biology
Genomics
Synthetic biology
BioNutrients
Space Algae-1
SNP
Indel