NTRS - NASA Technical Reports Server

A petabyte size electronic library using the N-Gram memory engine
A model library containing petabytes of data is proposed by Triada, Ltd., Ann Arbor, Michigan. The library uses the newly patented N-Gram Memory Engine (Neurex) for storage, compression, and retrieval. Neurex splits data into two parts: a hierarchical network of associative memories that store 'information' from data, and a permutation operator that preserves sequence. Neurex is expected to offer four advantages in mass storage systems. First, Neurex representations are dense and fully reversible, hence less expensive to store. Second, Neurex becomes exponentially more stable with increasing data flow; thus its contents and the inverting algorithm may be mass produced for low cost distribution, and only a small permutation operator would be recalled from the library to recover data. Third, Neurex may be enhanced to recall patterns using a partial pattern. Fourth, Neurex nodes are measures of their pattern, so researchers might use nodes in statistical models to avoid costly sorting and counting procedures. Neurex subsumes a theory of learning and memory that the author believes extends information theory. Its first axiom is a symmetry principle: learning creates memory and memory evidences learning. The theory treats an information store that evolves from a null state to stationarity. A Neurex extracts information from data without a priori knowledge; i.e., unlike neural networks, neither feedback nor training is required. The model consists of an energetically conservative field of uniformly distributed events with variable spatial and temporal scale, and an observer walking randomly through this field. A bank of band limited transducers (an 'eye'), each transducer in the bank being tuned to a sub-band, outputs signals upon registering events. Output signals are 'observed' by another transducer bank (a mid-brain), whose band limit is narrower than that of the first bank. The banks are arrayed as $n$ 'levels' or 'time domains', $td$. The banks are the hierarchical network (a cortex) and the transducers are (associative) memories. A model Neurex was built and studied. Data were 50 MB to 10 GB samples of text, data base, and images: black/white, grey scale, and high resolution in several spectral bands. Memories at $td$, $S(m_{td})$, were plotted against outputs of memories at $td-1$. $S(m_{td})$ was Boltzmann distributed, and memory frequencies exhibited self-organized criticality (SOC); i.e., '$1/f^{\beta}$' after long exposures to data. Because output signals from level $n$ may be encoded with $B_{\text{output}} = O(-\log_2 f^{\beta})$ bits, input data may be encoded with $B_{\text{input}} = O((S(td)/S(td-1))^{n})$ bits, and $B_{\text{output}}/B_{\text{input}}$ is always much less than 1, the Neurex determines a canonical code for data and is a lossless data compressor. Further tests are underway to confirm these results with more data types and larger samples.
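The split the abstract describes, pattern memories arranged in levels plus a separate record of sequence, can be illustrated with a toy sketch. This is not the patented Neurex algorithm, whose details the abstract does not give. It is a minimal assumption-laden analogy in which each level stores every distinct adjacent pair once (the 'memory') and passes a list of pair ids up to the next level. The names `build_level`, `encode`, and `decode` are hypothetical, and the top-level id list only loosely stands in for the permutation operator.

```python
from typing import Hashable, List, Sequence, Tuple


def build_level(seq: Sequence[Hashable]) -> Tuple[List[tuple], List[int]]:
    """Group adjacent items into pairs; store each distinct pair once."""
    memory: List[tuple] = []   # distinct patterns, in first-seen order
    index = {}                 # pattern -> position in memory
    ids: List[int] = []        # order information: which pattern came when
    for i in range(0, len(seq), 2):
        pattern = tuple(seq[i:i + 2])  # the last pattern may hold one item
        if pattern not in index:
            index[pattern] = len(memory)
            memory.append(pattern)
        ids.append(index[pattern])
    return memory, ids


def encode(data: str, levels: int) -> Tuple[List[List[tuple]], List[int]]:
    """Stack `levels` pair memories; return them plus the top id sequence."""
    memories: List[List[tuple]] = []
    seq: List[Hashable] = list(data)
    for _ in range(levels):
        memory, seq = build_level(seq)
        memories.append(memory)
    return memories, seq


def decode(memories: List[List[tuple]], top: List[int]) -> str:
    """Invert encode() by expanding ids level by level, top down."""
    seq: List[Hashable] = list(top)
    for memory in reversed(memories):
        seq = [item for i in seq for item in memory[i]]
    return "".join(seq)


memories, top = encode("abababab", levels=2)
restored = decode(memories, top)
```

For highly repetitive input the memories stay small while the id sequence shrinks by half at each level, and `decode` recovers the input exactly, which is the sense in which this toy is reversible. It makes no claim about the compression ratios or the statistical behaviour reported in the abstract.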
Document ID: 19940029296
Acquisition Source: Legacy CDMS
Document Type: Conference Paper
Authors: Bugajski, Joseph M. (Triada Ltd. Ann Arbor, MI, United States)
Date Acquired: September 6, 2013
Publication Date: April 1, 1993
Publication Information: Publication: NASA. Goddard Space Flight Center, The Third NASA Goddard Conference on Mass Storage Systems and Technologies
Subject Category: Documentation And Information Science
Accession Number: 94N33802
Distribution Limits: Public
Copyright: Work of the US Gov. Public Use Permitted.