NTRS - NASA Technical Reports Server

ASDC’s Python-Based Metadata Extraction Pipeline for Suborbital Campaigns

The FAIRness of data products, especially their findability and accessibility, depends on rich metadata that, once extracted, enables proper curation. Over the past few years, the Atmospheric Science Data Center (ASDC) suborbital science support team has developed a metadata extraction pipeline that retrieves the required metadata systematically and efficiently so that the data can be used by a broad community. Developing the pipeline has presented many challenges that had to be overcome to support archival and distribution of ASDC’s 30+ suborbital missions. Although instrument scientists provide sufficient metadata, it may not be readily machine-actionable because of differing formats and templates. Further complicating extraction, our team has found that the metadata itself is quite diverse, given the differences in measurement types, instruments, and measurement platforms.

The metadata extraction pipeline provides an efficient, plugin-based method for adding new parsers, a configuration system that lets non-developers customize how files are processed, and a system for identifying and logging metadata quality issues so that they are readily found and addressed. It identifies the critical pieces of metadata needed to promote data FAIRness, including location, file revision, and measurement start/end datetime, and can be easily modified to extract further information (such as variables). Given the wide-ranging datasets, the pipeline has been modified to accommodate multiple file formats, including multiple versions of ICARTT (International Consortium for Atmospheric Research on Transport and Transformation), HDF (Hierarchical Data Format), netCDF (Network Common Data Form), and multiple versions of the Ames File Format. The pipeline also supports building metadata for file formats from which metadata cannot easily be extracted, such as PDF (Portable Document Format) and GIF (Graphics Interchange Format). The pipeline has allowed our team to maintain a consistent flow of data and metadata to archival and distribution services, ensuring that the ASDC meets the needs of the suborbital science community. This presentation will highlight the ASDC’s suborbital metadata extraction pipeline, its development, how it has been modified to support data FAIRness, and plans for maintaining the pipeline and adding new features.
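
The abstract does not describe the pipeline's code, so the following is only a minimal sketch of how a plugin-based parser registry, a per-format configuration, and quality-issue logging might fit together in Python. Every name in it (FileMetadata, register_parser, extract, the logger name, the configuration keys) and the file-name pattern <name>_<YYYYMMDD>_R<revision> are assumptions made for illustration, not ASDC's actual implementation or the ICARTT specification.

```python
import logging
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("metadata_pipeline")  # hypothetical logger name


@dataclass
class FileMetadata:
    path: str
    file_format: str
    start_date: Optional[str] = None
    revision: Optional[str] = None
    issues: List[str] = field(default_factory=list)  # metadata quality problems


Parser = Callable[[Path, dict], FileMetadata]
PARSERS: Dict[str, Parser] = {}  # file extension -> parser plugin


def register_parser(*extensions: str):
    """Decorator that registers a parser plugin for one or more extensions."""
    def decorator(func: Parser) -> Parser:
        for ext in extensions:
            PARSERS[ext.lower()] = func
        return func
    return decorator


# Assumed file-name pattern, for illustration only: <name>_<YYYYMMDD>_R<revision>
FILENAME_RE = re.compile(r"_(\d{8})_R(\d+)$")


@register_parser(".ict")
def parse_from_filename(path: Path, options: dict) -> FileMetadata:
    """Toy parser: reads the date and revision from the file name only."""
    meta = FileMetadata(path=str(path), file_format="ICARTT")
    match = FILENAME_RE.search(path.stem)
    if match:
        meta.start_date, meta.revision = match.group(1), match.group(2)
    else:
        meta.issues.append("date/revision not found in file name")
    return meta


@register_parser(".pdf", ".gif")
def build_from_config(path: Path, options: dict) -> FileMetadata:
    """For formats without extractable metadata, use configured values."""
    meta = FileMetadata(path=str(path),
                        file_format=path.suffix.lstrip(".").upper())
    meta.start_date = options.get("start_date")
    meta.revision = options.get("revision")
    if meta.start_date is None:
        meta.issues.append("no start_date configured")
    return meta


def extract(path: Path, config: dict) -> FileMetadata:
    """Dispatch to the registered parser and log any quality issues."""
    ext = path.suffix.lower()
    parser = PARSERS.get(ext)
    if parser is None:
        meta = FileMetadata(path=str(path), file_format="unknown",
                            issues=[f"no parser registered for '{ext}'"])
    else:
        meta = parser(path, config.get(ext, {}))
    for issue in meta.issues:
        logger.warning("%s: %s", path.name, issue)
    return meta
```

In this sketch, supporting a new format means writing one decorated function, and the config dictionary (which could be loaded from a file that non-developers edit) supplies values for formats such as PDF that carry no extractable metadata. For example, extract(Path("overview.pdf"), {".pdf": {"start_date": "20240128"}}) builds a record from the configured date, while a file with an unrecognized extension yields a record whose issue is logged as a warning.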
Document ID
20240000973
Acquisition Source
Langley Research Center
Document Type
Presentation
Authors
Abraham Porter
(Adnet Systems (United States) Bethesda, Maryland, United States)
Nathan Jester
(Booz Allen Hamilton (United States) Tysons Corner, United States)
Megan Buzanowicz
(Adnet Systems (United States) Bethesda, Maryland, United States)
Sean Leavor
(Adnet Systems (United States) Bethesda, Maryland, United States)
Gabriel Mojica
(Adnet Systems (United States) Bethesda, Maryland, United States)
John Kusterer
(Langley Research Center Hampton, United States)
Gao Chen
(Langley Research Center Hampton, United States)
Date Acquired
January 22, 2024
Subject Category
Meteorology and Climatology
Meeting Information
Meeting: 104th American Meteorological Society Annual Meeting
Location: Baltimore, MD
Country: US
Start Date: January 28, 2024
End Date: February 1, 2024
Sponsors: American Meteorological Society
Funding Number(s)
WBS: 1038.C3.15.00118
Distribution Limits
Public
Copyright
Public Use Permitted.
Technical Review
NASA Peer Committee