Optimizing a Small RNAseq Analysis Pipeline for NASA GeneLab Using Open-Source Tools and Libraries
Small RNA sequencing (small RNAseq) is a powerful tool for studying the regulation of gene expression in various organisms. Small RNAseq has been leveraged in space biology research to study how the expression of small RNAs, e.g. microRNAs (miRNAs), small interfering RNAs (siRNAs), and piwi-interacting RNAs (piRNAs), changes upon exposure to the space environment. NASA GeneLab currently hosts small RNAseq raw data derived from space-relevant experiments on the Open Science Data Repository (OSDR). Raw data are interpretable only by bioinformaticians, so to maximize the accessibility of these data to the scientific community, GeneLab plans to process all small RNAseq datasets and make the processed data available via the OSDR alongside the raw data. In this study, we present the development of the GeneLab standardized pipeline for processing small RNAseq datasets. Using human, plant, and synthetic small RNAseq datasets, we evaluate various open-source software tools and publicly available databases for accuracy and reproducibility at each step of the pipeline. For quality control and adapter detection and trimming, we evaluated Trim Galore!, FASTX, SeqKit, and DNApi to optimize alignment to reference genomes. We compared BWA, Bowtie, and Bowtie2 to determine the optimal alignment tool. For each alignment tool, we also assessed various reference databases, including Ensembl reference genomes and several types of small RNA references (genome, hairpin, and miRNA) from the miRBase and MirGeneDB databases. To quantify the aligned data, we compared SAMtools, HTSeq, and RSEM for counting alignment events from each alignment tool used. Finally, we evaluated various tools, including DESeq2 and edgeR, for data normalization and subsequent differential expression analysis. We will present the results from our comparative analyses for each pipeline step and propose a consensus pipeline for processing small RNAseq data derived from various organisms exposed to the space environment.
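As a rough illustration of the trimming, counting, and normalization stages named in the abstract, the following Python sketch walks a handful of toy reads through each stage. It is not the GeneLab pipeline and does not call any of the tools evaluated. The adapter sequence, length limits, reference names, and function names are all assumptions made for this example, and exact string matching stands in for a real aligner.

```python
from collections import Counter

# Assumed 3' adapter and insert-length limits, chosen only for illustration.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
MIN_LEN, MAX_LEN = 18, 30

# Hypothetical reference set: name -> mature small RNA sequence.
REFERENCES = {
    "toy-miR-1": "TGAGGTAGTAGGTTGTATAGTT",
    "toy-miR-2": "TAGCTTATCAGACTGATGTTGA",
}


def trim_adapter(read, adapter=ADAPTER, min_overlap=8):
    """Cut the read at the first occurrence of the adapter's leading bases."""
    idx = read.find(adapter[:min_overlap])
    return read if idx == -1 else read[:idx]


def count_reads(reads, references=REFERENCES):
    """Trim each read, apply the length filter, and count exact matches."""
    by_sequence = {seq: name for name, seq in references.items()}
    counts = Counter()
    for read in reads:
        insert = trim_adapter(read)
        if MIN_LEN <= len(insert) <= MAX_LEN and insert in by_sequence:
            counts[by_sequence[insert]] += 1
    return counts


def counts_per_million(counts):
    """Scale raw counts to counts per million assigned reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total * 1_000_000 for name, n in counts.items()}


# Toy input: three reads from toy-miR-1, one from toy-miR-2, one short insert.
toy_reads = [
    REFERENCES["toy-miR-1"] + ADAPTER,
    REFERENCES["toy-miR-1"] + ADAPTER[:12],  # adapter only partly sequenced
    REFERENCES["toy-miR-1"],                 # no adapter present
    REFERENCES["toy-miR-2"] + ADAPTER,
    "ACGTACGT" + ADAPTER,                    # insert too short, filtered out
]
toy_counts = count_reads(toy_reads)
toy_cpm = counts_per_million(toy_counts)
```

Counts per million is used here only because it is simple to show. Tools such as DESeq2 and edgeR apply their own normalization methods, which this sketch does not reproduce.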
Document ID
20230015795
Acquisition Source
Ames Research Center
Document Type
Poster
Authors
Richard Barker (Blue Marble Space Seattle, Washington, United States)
Amanda Saravia-Butler (Wyle (United States) El Segundo, California, United States)
Alexis Torres (Blue Marble Space Seattle, Washington, United States)
Mike Lee (Wyle (United States) El Segundo, California, United States)
Lauren M Sanders (Blue Marble Space Seattle, Washington, United States)
Samrawit Gebre (Wyle (United States) El Segundo, California, United States)
Sylvain V Costes (Ames Research Center Mountain View, California, United States)
Date Acquired
November 1, 2023
Subject Category
Computer Programming and Software
Life Sciences (General)
Meeting Information
Meeting: Annual Meeting of the American Society for Gravitational and Space Research, 2023
Location: Washington, DC
Country: US
Start Date: November 14, 2023
End Date: November 18, 2023
Sponsors: American Society for Gravitational and Space Research