NTRS - NASA Technical Reports Server

A Dissimilarity Measure for Clustering High- and Infinite Dimensional Data that Satisfies the Triangle Inequality

The cosine or correlation measures of similarity used to cluster high dimensional data are interpreted as projections, and the orthogonal components are used to define a complementary dissimilarity measure, forming a similarity-dissimilarity measure pair. Using a geometrical approach, a number of properties of this pair are established, and the approach is extended to general inner-product spaces of any dimension. These properties include the triangle inequality for the defined dissimilarity measure, error estimates for the triangle inequality, and bounds on both measures that can be obtained with a few floating-point operations from previously computed values of the measures. The bounds and error estimates for the similarity and dissimilarity measures can be used to reduce the computational complexity of clustering algorithms and enhance their scalability, and the triangle inequality allows the design of clustering algorithms for high dimensional distributed data.
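The pairing described in the abstract can be illustrated with a small sketch. It assumes one plausible reading: for two vectors at angle θ, the similarity is the cosine (the length of the projection of one unit vector onto the other) and the dissimilarity is the length of the orthogonal component, sqrt(1 - cos²θ). The report's exact definitions and bounds may differ. The function names are hypothetical, and the bounds shown are only the generic ones that follow from any triangle inequality, not the report's sharper estimates.

```python
import math
import random


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (projection length for unit vectors)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def orthogonal_dissimilarity(x, y):
    """Length of the component of unit(y) orthogonal to x: sqrt(1 - cos^2).

    Assumed definition for illustration; clamped at 0 to absorb rounding error.
    """
    c = cosine_similarity(x, y)
    return math.sqrt(max(0.0, 1.0 - c * c))


def dissimilarity_bounds(d_xy, d_yz):
    """Generic triangle-inequality bounds on d(x, z) from two known values.

    Costs a handful of floating-point operations and needs no access to
    the vectors themselves. The upper bound is capped at 1, the maximum
    value of the assumed dissimilarity.
    """
    lower = abs(d_xy - d_yz)
    upper = min(1.0, d_xy + d_yz)
    return lower, upper


def worst_triangle_gap(n_trials=1000, dim=50, seed=0):
    """Largest observed d(x,z) - d(x,y) - d(y,z) over random triples.

    A value <= 0 (up to rounding) is consistent with the triangle inequality.
    """
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(n_trials):
        x, y, z = ([rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3))
        gap = (orthogonal_dissimilarity(x, z)
               - orthogonal_dissimilarity(x, y)
               - orthogonal_dissimilarity(y, z))
        worst = max(worst, gap)
    return worst
```

In a clustering loop, bounds of this kind let an algorithm skip an exact similarity computation whenever the interval returned by `dissimilarity_bounds` already decides the comparison, which is the sort of saving the abstract refers to.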
Document ID
20030004236
Acquisition Source
Langley Research Center
Document Type
Contractor Report (CR)
Authors
Socolovsky, Eduardo A.
(Hampton Univ. VA United States)
Bushnell, Dennis M.
Date Acquired
September 7, 2013
Publication Date
December 1, 2002
Subject Category
Numerical Analysis
Report/Patent Number
NASA/CR-2002-212136
ICASE-IR-43
NAS 1.26:212136
Funding Number(s)
CONTRACT_GRANT: NAS1-97046
Distribution Limits
Public
Copyright
Work of the US Gov. Public Use Permitted.