NTRS - NASA Technical Reports Server

Method and system for data clustering for very large databases

Multi-dimensional data contained in very large databases is efficiently and accurately clustered to determine patterns in the data and extract useful information from those patterns. Conventional computer processors with limited memory capacity and ordinary operating speed may be used, so massive data sets can be processed in a reasonable time with reasonable computer resources. The clustering process is organized around a clustering feature tree, in which each clustering feature consists of three values for its cluster: the number of data points, the linear sum of the data points, and the square sum of the data points. A dense region of data points is treated collectively as a single cluster, and points in sparsely occupied regions can be treated as outliers and removed from the clustering feature tree. The clustering can run continuously: new data points are received and processed, and the clustering feature tree is restructured as necessary to accommodate the information from the newly received points.
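The clustering feature triple described above can be illustrated with a minimal Python sketch. It assumes the square sum is stored as a single scalar (the sum of squared norms of the points), and it uses a flat list of entries as a hypothetical stand-in for one leaf of the clustering feature tree. The names `ClusteringFeature`, `insert_point`, and `threshold` are illustrative and are not taken from the patent. Node splitting, tree restructuring, and outlier removal are omitted. The sketch shows why the triple is useful: clusters merge by simple addition, and a cluster's centroid and radius can be computed from the triple alone, without revisiting the original points.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """CF triple: point count n, linear sum ls (vector), square sum ss.

    Assumption: ss is the scalar sum of squared norms of the member points.
    """
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusteringFeature":
        # A single point forms a cluster of size 1.
        return cls(1, list(x), sum(v * v for v in x))

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CFs are additive: merging two clusters adds their components.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of member points from the centroid,
        # computed from (n, ls, ss) alone: sqrt(ss/n - ||ls/n||^2).
        c = self.centroid()
        var = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


def insert_point(entries: List[ClusteringFeature],
                 x: Sequence[float], threshold: float) -> None:
    """Hypothetical leaf-level insert: absorb x into the nearest entry if the
    merged radius stays within threshold, otherwise start a new entry."""
    point_cf = ClusteringFeature.from_point(x)
    if entries:
        def dist2(cf: ClusteringFeature) -> float:
            return sum((a - b) ** 2 for a, b in zip(cf.centroid(), x))

        i = min(range(len(entries)), key=lambda k: dist2(entries[k]))
        merged = entries[i].merge(point_cf)
        if merged.radius() <= threshold:
            entries[i] = merged
            return
    entries.append(point_cf)


# Usage: three nearby points collapse into one entry; the distant point
# starts its own entry.
leaf: List[ClusteringFeature] = []
for p in [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (0.0, 0.2)]:
    insert_point(leaf, p, threshold=0.5)
print(len(leaf))  # prints 2
```

Because each entry keeps only three quantities, memory use depends on the number of entries rather than the number of points. This is the property that lets a dense region be summarized as one cluster.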
Document ID
20080004550
Acquisition Source
Headquarters
Document Type
Other - Patent
Authors
Zhang, Tian
Ramakrishnan, Raghu
Livny, Miron
Date Acquired
August 24, 2013
Publication Date
November 3, 1998
Subject Category
Computer Operations And Hardware
Report/Patent Number
Patent Number: US-PATENT-5,832,182
Patent Application Number: US-PATENT-APPL-SN-690876
Funding Number(s)
CONTRACT_GRANT: NAGW 3921
Distribution Limits
Public
Copyright
Work of the US Gov. Public Use Permitted.