NTRS - NASA Technical Reports Server

Clustering with Missing Values: No Imputation Required
Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.
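The two common workarounds named in the abstract can be illustrated with a minimal sketch. It is not the KSC algorithm and is not taken from the paper; the function names `mean_impute` and `marginalized_sq_distance`, the choice of column-mean imputation, and the toy data are all assumptions made for illustration. Missing values are represented as NaN, and each column is assumed to have at least one observed value.

```python
import math

def mean_impute(rows):
    """Imputation (one common variant): replace each NaN with the mean
    of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Assumes every column has at least one observed value.
        means.append(sum(observed) / len(observed))
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows]

def marginalized_sq_distance(a, b):
    """Marginalization: squared Euclidean distance computed only over
    features that are observed in both items."""
    return sum((x - y) ** 2 for x, y in zip(a, b)
               if not (math.isnan(x) or math.isnan(y)))

# Hypothetical toy data: the second item is missing its second feature.
data = [
    [1.0, 2.0],
    [3.0, math.nan],
    [5.0, 8.0],
]

imputed = mean_impute(data)

# Distance between items 0 and 1 under each strategy.
d_marg = marginalized_sq_distance(data[0], data[1])       # ignores feature 1
d_imp = marginalized_sq_distance(imputed[0], imputed[1])  # trusts the filled value
print(imputed[1], d_marg, d_imp)
```

In this toy case the imputed value contributes to the distance exactly as an observed value would, which is the concern the abstract raises about imputation.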
Document ID
20070019774
Document Type
Conference Paper
Authors
Wagstaff, Kiri (Jet Propulsion Lab., California Inst. of Tech. Pasadena, CA, United States)
Date Acquired
August 23, 2013
Publication Date
July 15, 2004
Subject Category
Numerical Analysis
Meeting Information
International Federation of Classification Societies Annual Meeting (Chicago, IL)
Funding Number(s)
CONTRACT_GRANT: NSF IIS-03-25329
Distribution Limits
Public
Copyright
Other
Keywords
constraints
data analysis
clustering
missing values