NTRS - NASA Technical Reports Server

Natural Language Processing for Extracting Rich Disease Data Aligned To Satellite Meteorological Data
Global climate change is redefining our understanding of how diseases spread. In Sri Lanka, vector-borne diseases such as dengue fever, encephalitis, and leptospirosis historically surged during the monsoon seasons, when temperatures were high enough for mosquito eggs to hatch. Because of rising temperatures and more erratic rainfall patterns, mosquito eggs can now hatch year-round and do so increasingly unpredictably, leading to a rising number of hospitalizations and deaths. More data is needed to adapt our response to these diseases in a warming world. A wealth of disease information is available today, yet it remains hard to access because much of it is held in unstructured formats such as PDFs. Converting unstructured disease reports into structured formats is therefore necessary to use these data effectively. This paper introduces a comprehensive framework for collecting unstructured disease reports and transforming them into analyzable formats. By creating separate models tailored to each data format, we can achieve higher accuracy than with general models. These straightforward models enhance accessibility and empower other researchers to use our tools. The resulting structured data can then be used for analysis, statistics, and evidence-based public health interventions, supporting more informed decision-making in healthcare. We deploy this framework to produce geospatial data for Sri Lanka and Brazil for many different conditions and align these data with satellite environmental data, providing for the first time a structured, aligned, and powerful dataset for disease modeling.
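The two steps the abstract describes, turning report text into structured records and aligning those records with environmental data, might look roughly like the Python sketch below. This is not the authors' implementation. The row layout, the field names, the helper names `parse_report` and `align`, and the sample values in the usage note are all hypothetical, and the sketch assumes the text has already been extracted from the PDF.

```python
import re
from datetime import date

# Hypothetical row layout for text already extracted from a weekly report:
# "<district> <dengue cases> <leptospirosis cases>"
ROW = re.compile(r"^(?P<district>[A-Za-z ]+?)\s+(?P<dengue>\d+)\s+(?P<lepto>\d+)$")


def parse_report(text, week_start):
    """Turn extracted report text into one structured record per district."""
    records = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # skip headers, footers, and blank lines
        records.append({
            "district": m["district"],
            "week_start": week_start,
            "dengue": int(m["dengue"]),
            "leptospirosis": int(m["lepto"]),
        })
    return records


def align(records, weather):
    """Attach weather values keyed by (district, week_start).

    Records with no matching weather entry get None for both fields.
    """
    missing = {"rain_mm": None, "temp_c": None}
    return [
        {**r, **weather.get((r["district"], r["week_start"]), missing)}
        for r in records
    ]
```

With made-up input such as `parse_report("Colombo 120 3\nNuwara Eliya 4 0", date(2024, 1, 1))`, each district row becomes a dictionary that `align` can join against a weather table keyed by district and week. A format-specific pattern like `ROW` is one simple way to read "separate models tailored to each data format"; each report layout would need its own pattern or parser.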
Document ID
20240015043
Acquisition Source
Goddard Space Flight Center
Document Type
Conference Paper
Authors
Mahi Pasarkar
(Rensselaer Polytechnic Institute Troy, New York, United States)
Junseob Kim
(Rensselaer Polytechnic Institute Troy, New York, United States)
Eoin O’Gara
(Rensselaer Polytechnic Institute Troy, New York, United States)
Alan Zhang
(Rensselaer Polytechnic Institute Troy, New York, United States)
Malik Magdon-Ismail
(Rensselaer Polytechnic Institute Troy, New York, United States)
Thilanka Munasinghe
(Rensselaer Polytechnic Institute Troy, New York, United States)
Jiaqi Weng
(Rensselaer Polytechnic Institute Troy, New York, United States)
David Qiu
(Rensselaer Polytechnic Institute Troy, New York, United States)
Ethan Cruz
(Rensselaer Polytechnic Institute Troy, New York, United States)
Jennifer C Wei
(Goddard Space Flight Center Greenbelt, United States)
Ashan Pathirana
(Sri Lanka Ministry of Health)
Date Acquired
November 25, 2024
Subject Category
Computer Programming and Software
Meeting Information
Meeting: 2024 IEEE International Conference on Big Data
Location: Washington, DC
Country: US
Start Date: December 15, 2024
End Date: December 18, 2024
Sponsors: Institute of Electrical and Electronics Engineers
Funding Number(s)
WBS: 656052.04.01.08.04
Distribution Limits
Public
Copyright
Portions of document may include copyright protected material.
Technical Review
External Peer Committee
Keywords
Open Source Open Science
Parsing PDF
Satellite Data
Disease Data
Natural Language Processing