NTRS - NASA Technical Reports Server

Reusing Data and Metadata to Create New Metadata Through Machine-Learning & Other Programmatic Methods
Recent improvements in natural language processing (NLP) enable metadata to be created programmatically from reused original metadata or even from the dataset itself. Transfer learning applied to NLP has greatly improved performance and reduced training data requirements. In this talk, we’ll compare machine-generated metadata to human-generated metadata and discuss the characteristics of metadata and data archives that affect their suitability for machine-learning reuse of metadata. Whereas human-generated metadata is often populated once, populated from the perspective of the data supplier, populated by many individuals who use different words for the same thing, and limited in length, machine-generated metadata can be updated any number of times, generated from the perspective of any user, constrained to a standardized set of terms that can evolve over time, and made any length required. Machine-learning-generated metadata offers benefits but also brings additional needs in terms of version control, process transparency, human-computer interaction, and IT requirements. As a successful example, we’ll discuss how a dataset of abstracts and associated human-tagged keywords, drawn from a standardized list of several thousand keywords, was used to create a machine-learning model that predicted keyword metadata for open-source code projects on code.nasa.gov. We’ll also discuss a less successful example from data.nasa.gov to show how data archive architecture and the characteristics of the initial metadata can strongly control how easy it is to use programmatic methods to reuse metadata to create additional metadata.
Document ID
20190033900
Acquisition Source
Headquarters
Document Type
Abstract
Authors
Gosses, Justin
(Science Applications International Corp. Houston, TX, United States)
Buonomo, Anthony R.
(NASA Headquarters Washington, DC, United States)
Thomas, Brian A.
(NASA Headquarters Washington, DC, United States)
Yates, Evan Taylor
(Science Applications International Corporation (SAIC) Mountain View, CA, United States)
Yuan, Rena W.
(United States Department of Agriculture (USDA-Headquarters) Washington, DC, United States)
Date Acquired
December 12, 2019
Publication Date
December 9, 2019
Subject Category
Computer Programming And Software
Report/Patent Number
IN23D-0898
HQ-E-DAA-TN75294
Meeting Information
Meeting: American Geophysical Union (AGU) Fall Meeting 2019
Location: San Francisco, CA
Country: United States
Start Date: December 9, 2019
End Date: December 13, 2019
Sponsors: American Geophysical Union (AGU)
Funding Number(s)
CONTRACT_GRANT: NNX16MB01C
Distribution Limits
Public
Copyright
Public Use Permitted.
Technical Review
Single Expert