NTRS - NASA Technical Reports Server

Enriching the Twitter Stream: Increasing Data Mining Yield and Quality Using Machine Learning
Social media data streams are important sources of real-time and historical global information for science applications. At the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC), we are exploring the Twitter data stream for its potential in augmenting the validation program of NASA Earth science missions, specifically the Global Precipitation Measurement (GPM) mission. We have implemented a tweet processing infrastructure that outputs classified precipitation tweets. Inputs are "passive" tweets, along with a smaller number of tweets from "active" participants, i.e., those knowingly contributing to our effort. The "active" tweets, presumably of higher quality, enrich the Twitter stream. "Active" sources include data scraped from other social media (e.g., public Facebook posts) and data from existing crowdsourcing programs (e.g., mPING reports). In addition, there is likely relevant precipitation information in images and documents that are the end points of links often included in tweets. Information derived from these "active" sources could then be tweeted into the Twitter stream, thus enriching its quality. The objective of our current work is to mine these tweet-linked images and documents, using neural networks, to increase the information content and quality related to precipitation. For images, we classified them as either precipitation-related or not. For training and validation, we used images obtained via the Google custom search API. We created two models: (1) by training a simple Convolutional Neural Network and (2) by using transfer learning principles to adapt a pre-trained object recognition model. For documents, both those linked to tweets and the tweet contents, we trained Hierarchical Attention Networks to determine precipitation occurrence, type, and intensity. For training and validation, we used a keyword-filtered tweet data set labelled with ground truth data from Dark Sky (an API to retrieve weather-related labels) and the National Severe Storms Laboratory's Multi-Radar/Multi-Sensor (MRMS) system. Our results demonstrated the efficacy of our machine learning approaches for enriching the Twitter stream, to derive information potentially useful for validation of Earth science satellite data.
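The abstract describes building a keyword-filtered tweet data set labelled with ground truth. A minimal Python sketch of that preparation step might look like the following. It is an illustration under stated assumptions, not the authors' implementation: the keyword list, the `Tweet` fields, and the `ground_truth` callback are all hypothetical, and no real Dark Sky or MRMS call is made.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Hypothetical keyword list; the abstract does not give the actual filter terms.
PRECIP_KEYWORDS = ("rain", "snow", "hail", "sleet", "drizzle")

# Match a keyword at the start of a word, so "raining" matches but "brain" does not.
# Note that this simple pattern would also accept words such as "rainbow".
_KEYWORD_RE = re.compile(r"\b(" + "|".join(PRECIP_KEYWORDS) + r")\w*", re.IGNORECASE)


@dataclass
class Tweet:
    """Hypothetical minimal tweet record (geolocated and time-stamped)."""
    text: str
    lat: float
    lon: float
    timestamp: str  # ISO 8601 string, e.g. "2018-06-01T12:00:00Z"


@dataclass
class LabelledTweet:
    tweet: Tweet
    precipitating: bool


# Assumed signature for a ground-truth source: returns True/False when an
# observation exists for that place and time, or None when none is available.
GroundTruth = Callable[[float, float, str], Optional[bool]]


def keyword_filter(tweets: Iterable[Tweet]) -> List[Tweet]:
    """Keep only tweets whose text contains a precipitation keyword."""
    return [t for t in tweets if _KEYWORD_RE.search(t.text)]


def label_tweets(tweets: Iterable[Tweet], ground_truth: GroundTruth) -> List[LabelledTweet]:
    """Attach a ground-truth label to each tweet, skipping tweets with no observation."""
    labelled = []
    for t in tweets:
        observed = ground_truth(t.lat, t.lon, t.timestamp)
        if observed is not None:
            labelled.append(LabelledTweet(t, observed))
    return labelled
```

In this sketch the ground-truth source is passed in as a function, so a weather API client or a radar-product lookup could be substituted without changing the filtering and labelling logic.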
Document ID
20180008558
Document Type
Presentation
Authors
Albayrak, Arif (Adnet Systems, Inc. Greenbelt, MD, United States)
Teng, William (Adnet Systems, Inc. Greenbelt, MD, United States)
Corcoran, John (Cornell Univ. Ithaca, NY, United States)
Wang, Sky C. (Michigan Univ. Ann Arbor, MI, United States)
Maksumov, Daniel (Queens Coll. Flushing, NY, United States)
Loeser, Carlee (Adnet Systems, Inc. Greenbelt, MD, United States)
Pham, Long (NASA Goddard Space Flight Center Greenbelt, MD, United States)
Date Acquired
December 18, 2018
Publication Date
December 10, 2018
Subject Category
Earth Resources and Remote Sensing
Report/Patent Number
GSFC-E-DAA-TN63898
NH43B-2988
Meeting Information
American Geophysical Union (AGU) Fall Meeting (Washington, DC)
Funding Number(s)
CONTRACT_GRANT: 80GSFC17C0003
Distribution Limits
Public
Copyright
Use by or on behalf of the US Gov. Permitted.
