NTRS - NASA Technical Reports Server

Clustering with Missing Values: No Imputation Required

Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.
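
The two common workarounds named in the abstract can be illustrated with a minimal sketch. It is not the KSC algorithm and is not taken from the paper. It assumes missing values are encoded as `None`, that imputation uses the column mean (one of many possible imputation rules), and that marginalization means computing distance over only the features observed in both items. The function names `mean_impute` and `marginalized_distance` are hypothetical.

```python
import math

def mean_impute(rows):
    """Fill each missing value (None) with the mean of the observed values
    in the same column. Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def marginalized_distance(a, b):
    """Euclidean distance over only the features observed in both items."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no features observed in both items")
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

# Toy data: the second item is missing its second feature.
rows = [
    [1.0, 2.0],
    [3.0, None],
    [5.0, 6.0],
]

filled = mean_impute(rows)

# Imputation: the filled-in value takes part in the distance as if observed.
imputed_dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(filled[0], filled[1])))

# Marginalization: the missing feature is simply left out of the distance.
marginal_dist = marginalized_distance(rows[0], rows[1])
```

In this toy case the two strategies give different distances between the first two items, because the imputed value is treated as if it had been measured. That is the reliability concern the abstract raises about imputation.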
Document ID
20070019774
Acquisition Source
Jet Propulsion Laboratory
Document Type
Conference Paper
Authors
Wagstaff, Kiri
(Jet Propulsion Lab., California Inst. of Tech. Pasadena, CA, United States)
Date Acquired
August 23, 2013
Publication Date
July 15, 2004
Subject Category
Numerical Analysis
Meeting Information
Meeting: International Federation of Classification Societies Annual Meeting
Location: Chicago, IL
Country: United States
Start Date: July 15, 2004
End Date: July 18, 2004
Funding Number(s)
CONTRACT_GRANT: NSF IIS-03-25329
Distribution Limits
Public
Copyright
Other
Keywords
constraints
data analysis
clustering
missing values
