NTRS - NASA Technical Reports Server

Understanding Machine Learning in Earth Science: A Natural Language Processing Approach

Machine learning (ML) is increasingly used in Earth science research. Its benefits include efficiency, reduction of human error, and the ability to extract hidden patterns from data. However, ML practitioners and Earth scientists often lack each other’s domain knowledge, which is a barrier to timely and effective implementation. Earth science, in particular, has difficulty generating sample data. In traditional ML problems such as face recognition or stock prediction, data are abundant and come with the ground truth needed for labeling; in Earth science, this is often not the case. Earth science data also vary more in format (for example, HDF5) and in image resolution, and are not standardized across instruments, even within a given Earth science discipline. Some previous studies have outlined the specific challenges that Earth science faces with ML, while others have focused on mining information efficiently from existing publications. Resources such as Scikit-Learn provide decision trees for choosing appropriate machine learning algorithms, but applying them to Earth science subjects is much more complex. For the current study, we propose a methodology and tool that aids the implementation of ML in Earth science using natural language processing (NLP). Our work comprises three main parts: (1) analyzing existing publications related to ML and Earth science, using natural language processing; (2) extracting from the publications information on ML models and subjects in Earth science; and (3) visualizing the extracted relationships as a network graph. The resulting network graph should help Earth science communities apply optimal ML algorithms and should guide data preparation through visualization of similar studies.
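The three parts listed in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: the term lists, the sample abstracts, and the helper names `find_terms` and `build_edges` are all hypothetical, and plain substring matching stands in for whatever NLP extraction the study actually used. The result is a weighted edge list linking ML models to Earth science subjects, which is the kind of data a network graph would be drawn from.

```python
from collections import Counter
from itertools import product

# Hypothetical vocabularies; the study's actual term lists are not given.
ML_MODELS = ("random forest", "neural network", "support vector machine")
EARTH_SUBJECTS = ("precipitation", "soil moisture", "aerosol")


def find_terms(text, vocabulary):
    """Return the vocabulary terms that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in vocabulary if term in lowered]


def build_edges(abstracts):
    """Count, for each (model, subject) pair, how many abstracts mention both."""
    edges = Counter()
    for text in abstracts:
        models = find_terms(text, ML_MODELS)
        subjects = find_terms(text, EARTH_SUBJECTS)
        # Every model/subject pair found in one abstract becomes one edge vote.
        edges.update(product(models, subjects))
    return edges


# Made-up abstracts, for illustration only.
abstracts = [
    "A random forest is used to estimate soil moisture from satellite data.",
    "We downscale precipitation with a neural network.",
    "Random forest and neural network models are compared for precipitation.",
]

edges = build_edges(abstracts)
for (model, subject), weight in sorted(edges.items()):
    print(f"{model} -- {subject} (weight {weight})")
```

The edge weights could then be passed to a graph library for layout and drawing; that step is omitted here to keep the sketch dependency-free.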
The network graph and analysis of document similarity will be the basis of our next step, which is to develop a decision tree for selecting optimal machine learning methodologies for specified Earth science applications.
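The abstract does not say how document similarity is measured. As one common possibility, the sketch below compares documents by the cosine similarity of their word-count vectors. The function names `term_counts` and `cosine_similarity` and the sample texts are assumptions made for illustration, not part of the study.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up document snippets, for illustration only.
doc_a = term_counts("random forest for soil moisture retrieval")
doc_b = term_counts("random forest for precipitation retrieval")
doc_c = term_counts("neural network for cloud detection")

print(f"a vs b: {cosine_similarity(doc_a, doc_b):.3f}")
print(f"a vs c: {cosine_similarity(doc_a, doc_c):.3f}")
```

In this toy case the two random forest documents score higher than the unrelated pair. A real pipeline would likely weight terms (for example, with TF-IDF) and remove stop words such as "for" before comparing.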
Document ID
Document Type
Authors
Zheng, Laura
(Maryland Univ. College Park, MD, United States)
Albayrak, Arif
(Adnet Systems, Inc. Greenbelt, MD, United States)
Teng, William
(Adnet Systems, Inc. Greenbelt, MD, United States)
Khayat, Mohammad
(Adnet Systems, Inc. Greenbelt, MD, United States)
Pham, Long
(NASA Goddard Space Flight Center Greenbelt, MD, United States)
Date Acquired
January 16, 2020
Publication Date
December 9, 2019
Subject Category
Cybernetics, Artificial Intelligence And Robotics
Report/Patent Number
Meeting Information
Meeting: American Geophysical Union (AGU) Fall Meeting 2019
Location: San Francisco, CA
Country: United States
Start Date: December 9, 2019
End Date: December 13, 2019
Sponsors: American Geophysical Union (AGU)
Funding Number(s)
Distribution Limits
Public Use Permitted.
Technical Review
NASA Technical Management