NTRS - NASA Technical Reports Server

Analysis Ready Data in Analytics Optimized Data Stores for Analysis of Big Earth Data in the Cloud

Cloud computing offers the possibility of making the analysis of Big Data approachable for a wider community due to affordable access to computing power, an ecosystem of usable tools for parallel processing, and migration of many large datasets to archives in the cloud, allowing data-proximal computing. Generally, data analysis acceleration in the cloud comes from running multiple nodes in a split-apply-combine strategy. Data systems such as the Earth Observing System Data and Information System are in a position to "pre-split" the data by storing them in a data store that is optimized for data-parallel computing, i.e., an Analytics-Optimized Data Store (AODS). A variety of approaches to AODS are possible, from highly scalable databases to scalable filesystems to data formats optimized for cloud access (e.g., zarr and cloud-optimized datasets), with the optimal choice dependent on both the types of analysis and the geospatial structure of the data. A key question is how much preprocessing of the data to do, both before splitting and as the first part of the apply step. Again, the geospatial structure of the data and the analysis type influence the decision, with the added complexity of the user type. Trans-disciplinary users who are not well-versed in the nuances of quality-filtering and georeferencing of remote sensing orbit/swath/scene data tend to ask for more highly processed data, relying on the data provider to make sensible decisions on preprocessing parameters. (This accounts for the popularity of "Level 3" gridded data, despite the lower spatial resolution it provides.) In this case, data can be preprocessed before the split, resulting in higher performance in the rest of the "apply" step, which can be transformative for use cases such as interactive data exploration at scale.
Discipline researchers who are experienced with remote sensing data often prefer more flexibility in customizing the preprocessing of data into Analysis Ready Data, resulting in more need for on-the-fly preprocessing.
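The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fill value, chunk size, function names, and the use of local threads in place of cloud nodes are all assumptions made for illustration. It contrasts preprocessing before the split (data stored already analysis-ready) with on-the-fly preprocessing as the first part of the apply step.

```python
from concurrent.futures import ThreadPoolExecutor

FILL = -9999.0  # hypothetical fill value marking low-quality retrievals


def split(values, chunk_size):
    """Pre-split a series into fixed-size chunks, as a chunked data store might."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def quality_filter(chunk):
    """Preprocessing step: drop fill values."""
    return [v for v in chunk if v != FILL]


def apply_partial(chunk, filter_on_the_fly):
    """Apply step: return a per-chunk (sum, count), optionally filtering first."""
    if filter_on_the_fly:
        chunk = quality_filter(chunk)
    return sum(chunk), len(chunk)


def combine(partials):
    """Combine step: merge per-chunk partial results into a global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else float("nan")


def mean_over_chunks(chunks, filter_on_the_fly):
    # Threads stand in for the multiple cloud nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(
            pool.map(lambda c: apply_partial(c, filter_on_the_fly), chunks)
        )
    return combine(partials)


raw = [1.0, 2.0, FILL, 4.0, 5.0, FILL, 7.0, 8.0]

# Option A: the provider preprocesses before the split; the apply step is lighter.
ard_chunks = split(quality_filter(raw), 3)
mean_a = mean_over_chunks(ard_chunks, filter_on_the_fly=False)

# Option B: raw chunks are stored; each worker filters on the fly during apply.
raw_chunks = split(raw, 3)
mean_b = mean_over_chunks(raw_chunks, filter_on_the_fly=True)
```

Both options yield the same mean here. The difference is where the filtering cost is paid and who chooses the filter parameters: the data provider in option A, the analyst in option B.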
Document ID
20190033486
Acquisition Source
Goddard Space Flight Center
Document Type
Presentation
Authors
Lynnes, Christopher
(NASA Goddard Space Flight Center Greenbelt, MD, United States)
Hua, Hook
(Jet Propulsion Laboratory (JPL), California Institute of Technology (CalTech) Pasadena, CA, United States)
Date Acquired
December 12, 2019
Publication Date
December 9, 2019
Subject Category
Computer Programming And Software
Report/Patent Number
GSFC-E-DAA-TN75582
Meeting Information
Meeting: American Geophysical Union Fall Meeting
Location: San Francisco, CA
Country: United States
Start Date: December 9, 2019
End Date: December 13, 2019
Sponsors: American Geophysical Union (AGU)
Distribution Limits
Public
Copyright
Public Use Permitted.
Technical Review
Single Expert