NTRS - NASA Technical Reports Server

Forming Aggregations using Virtual Sharding: Lessons Learned from Simple Scalable Storage (S3)
Data aggregation, the ability to combine separate datasets to form a single new logical dataset, provides users with a powerful abstraction. The advantage of an aggregate dataset is that users are freed from having to understand, and incorporate into their workflow, knowledge about the (ad hoc) organization of the constituent datasets. However, aggregating large numbers of files can be computationally complex, with data server systems performing many repetitive operations. As part of the authors' work on subsetting data stored on Amazon Web Services (AWS) Simple Storage Service (S3), we developed technology to read portions of otherwise monolithic data files. This enables the formation of virtual shards for use in subsetting data stored in HDF5 (Hierarchical Data Format, version 5) files. This same tool can be used to form aggregations that combine data stored in many HDF5 files when those files are stored on S3. The nature of the virtual sharding and the algorithm that exploits it for subsetting is such that it can also be used for aggregation without the need for many of the repetitive operations required by the per-file aggregation techniques. We will present timing information that demonstrates the flexibility of this approach. However, the lesson learned is that, while this is a useful result in and of itself, these very same techniques can be applied in other contexts where data are stored in services and on media other than S3. For example, this same technique can be applied to data stored on spinning disk. Pushing the envelope for S3 forced a reexamination of our data access techniques, which led to unexpected positive benefits.
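The idea of reading portions of monolithic files and combining them across files can be illustrated with a minimal sketch. It is not the authors' implementation: the `Shard` record, the 16-byte header, the flat float32 layout, and the file names are all assumptions made for illustration, and local files stand in for S3 objects (on S3 the `seek`/`read` pair would correspond to a ranged read of the object). A real HDF5 file would need an index of chunk offsets and sizes built from the file's own metadata.

```python
import os
import struct
import tempfile
from typing import List, NamedTuple


class Shard(NamedTuple):
    """Hypothetical 'virtual shard': a byte range inside a larger file."""
    path: str
    offset: int
    nbytes: int


def read_shard(shard: Shard) -> bytes:
    """Read only the bytes of one shard, not the whole file."""
    with open(shard.path, "rb") as f:
        f.seek(shard.offset)
        return f.read(shard.nbytes)


def aggregate(shards: List[Shard]) -> List[float]:
    """Concatenate the little-endian float32 values of all shards, in order."""
    values: List[float] = []
    for shard in shards:
        raw = read_shard(shard)
        values.extend(struct.unpack(f"<{len(raw) // 4}f", raw))
    return values


def make_demo_file(directory: str, name: str, data: List[float]) -> Shard:
    """Write a stand-in 'monolithic' file: a 16-byte header, then float32 data.

    Returns the shard that locates the data portion of the file.
    """
    path = os.path.join(directory, name)
    header = b"H" * 16  # assumed placeholder for file metadata
    with open(path, "wb") as f:
        f.write(header)
        f.write(struct.pack(f"<{len(data)}f", *data))
    return Shard(path, len(header), 4 * len(data))


def demo() -> List[float]:
    """Build an index over two files and read them as one logical dataset."""
    with tempfile.TemporaryDirectory() as tmp:
        index = [
            make_demo_file(tmp, "granule_1.bin", [1.0, 2.0]),
            make_demo_file(tmp, "granule_2.bin", [3.0, 4.0]),
        ]
        return aggregate(index)
```

In this sketch the aggregation is just an ordered list of shards, so combining files does not require opening and parsing each file again at read time; only the indexed byte ranges are fetched. The same index works whether the bytes sit on local disk or in an object store.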
Document ID
20190033897
Acquisition Source
Goddard Space Flight Center
Document Type
Poster
Authors
Gallagher, James
(OPeNDAP, Inc. Narragansett, RI, United States)
Potter, Nathan
(OPeNDAP, Inc. Narragansett, RI, United States)
Neumiller, Kodi
(OPeNDAP, Inc. Narragansett, RI, United States)
Date Acquired
December 12, 2019
Publication Date
December 9, 2019
Subject Category
Earth Resources And Remote Sensing
Report/Patent Number
GSFC-E-DAA-TN76065
Meeting Information
Meeting: AGU Fall Meeting 2019
Location: San Francisco, CA
Country: United States
Start Date: December 9, 2019
End Date: December 13, 2019
Sponsors: American Geophysical Union (AGU)
Funding Number(s)
CONTRACT_GRANT: NNG15HZ39C
Distribution Limits
Public
Copyright
Public Use Permitted.