NTRS - NASA Technical Reports Server

Using Docker Containers to Extend Reproducibility Architecture for the NASA Earth Exchange (NEX)

NASA Earth Exchange (NEX) is a data, supercomputing and knowledge collaboratory that houses NASA satellite, climate and ancillary data, where a focused community can come together to address large-scale challenges in Earth sciences. As NEX has grown into a petabyte-size platform for analysis, experiments and data production, it has become increasingly important to let users easily retrace their steps, identify which datasets were produced by which process chains, and readily reproduce their results. This can be tedious and difficult even for a small project, and is almost impossible on large processing pipelines. We have developed an initial reproducibility and knowledge capture solution for NEX. However, if users want to move their code to another system, whether their home institution's cluster, a laptop or the cloud, they have to find, build and install all the dependencies required to run it. This can be a tedious and tricky process, and it is a major impediment both to moving code to data and to reproducibility outside the original system. The NEX team has tried to help users who wanted to move their code into OpenNEX on the Amazon cloud by creating custom virtual machines with all the software and dependencies installed. This solves some of the issues but creates a new bottleneck: the NEX team must be involved in every new request, every update to the virtual machines, and general maintenance. In this presentation, we will describe a solution that integrates NEX and Docker to bridge the gap in code-to-data migration. The core of the solution is semi-automatic conversion of science codes, tools and services that are already tracked and described in the NEX provenance system into Docker, an open-source Linux container platform.
Docker is available on most computer platforms, easy to install and capable of seamlessly creating and/or executing any application packaged in the appropriate format. We believe this is an important step towards seamless process deployment in heterogeneous environments that will enhance community access to NASA data and tools in a scalable way, promote software reuse, and improve reproducibility of scientific results.
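The abstract does not describe how the conversion works internally. The sketch below is only a minimal illustration of the general idea, rendering a provenance entry into a Dockerfile. It assumes a hypothetical record schema containing a base image, system and Python packages, source files and a run command, which is not the actual NEX provenance format. `ProvenanceRecord`, `render_dockerfile`, the `provenance.process` label key and all example values are illustrative names. The Dockerfile instructions it emits (`FROM`, `LABEL`, `RUN`, `WORKDIR`, `COPY`, `CMD`) are standard.

```python
from dataclasses import dataclass, field
import json
import shlex


@dataclass
class ProvenanceRecord:
    """Hypothetical subset of what a provenance system might record for one process."""
    name: str
    base_image: str  # assumed Debian-based image, since apt-get is used below
    system_packages: list[str] = field(default_factory=list)
    python_packages: list[str] = field(default_factory=list)
    source_files: list[str] = field(default_factory=list)
    command: list[str] = field(default_factory=list)


def render_dockerfile(rec: ProvenanceRecord, workdir: str = "/opt/app") -> str:
    """Render a provenance record as Dockerfile text (illustrative sketch only)."""
    lines = [f"FROM {rec.base_image}"]
    # Tag the image with the process name so it can be traced back to its record
    # (the label key is a hypothetical convention).
    lines.append(f"LABEL provenance.process={json.dumps(rec.name)}")
    if rec.system_packages:
        # Sorted for a deterministic, diff-friendly Dockerfile.
        pkgs = " ".join(sorted(rec.system_packages))
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            f"{pkgs} && rm -rf /var/lib/apt/lists/*"
        )
    if rec.python_packages:
        pins = " ".join(shlex.quote(p) for p in sorted(rec.python_packages))
        lines.append(f"RUN pip install --no-cache-dir {pins}")
    lines.append(f"WORKDIR {workdir}")
    for src in rec.source_files:
        lines.append(f"COPY {src} {workdir}/")
    if rec.command:
        # JSON array ("exec form") so the command runs without a shell wrapper.
        lines.append("CMD " + json.dumps(rec.command))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rec = ProvenanceRecord(
        name="example-process",          # hypothetical process name
        base_image="python:3.6-slim",
        system_packages=["libgdal-dev"],
        python_packages=["numpy"],
        source_files=["process.py"],
        command=["python", "process.py"],
    )
    print(render_dockerfile(rec), end="")
```

In a semi-automatic workflow, a generated file like this would be a starting point for a user or maintainer to review before running `docker build`. It is not a finished image definition.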
Document ID
20170001275
Document Type
Conference Paper
Authors
Votava, Petr
(California State Univ. at Monterey Bay Seaside, CA, United States)
Michaelis, Andrew
(California State Univ. at Monterey Bay Seaside, CA, United States)
Spaulding, Ryan
(ADNET Systems, Inc. Reston, VA, United States)
Becker, Jeffrey C.
(Computer Sciences Corp. Moffett Field, CA, United States)
Date Acquired
February 3, 2017
Publication Date
December 12, 2016
Subject Category
Earth Resources And Remote Sensing
Computer Systems
Report/Patent Number
ARC-E-DAA-TN38171
Meeting Information
Meeting: 2016 AGU Fall Meeting
Location: San Francisco, CA
Country: United States
Start Date: December 12, 2016
End Date: December 16, 2016
Sponsors: American Geophysical Union
Funding Number(s)
CONTRACT_GRANT: NNA07CA29C
CONTRACT_GRANT: NNX12AD05A
Distribution Limits
Public
Copyright
Public Use Permitted.
Keywords
earth exchange
data
supercomputing