Dimensionality Reduction Through Classifier Ensembles
In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such datasets is often impractical because of long run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction is often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire dataset or disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal component analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" selected for their ability to discriminate among the classes. These feature subsets are then used to train ensembles of classifiers, which, on both real and synthetic datasets, yield results superior to single classifiers, to ensembles that use the full set of features, and to ensembles based on principal component analysis.
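The abstract does not say how the class-specific feature subsets are scored or how the ensemble members are combined, so the Python sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes that each feature is scored by the absolute Pearson correlation between that feature and a class's 0/1 indicator, that the top `k` features are kept for each class, that one base classifier is trained per class on its subset, and that the members' outputs are averaged. The names `decimate`, `NearestCentroid`, `fit_ensemble`, and `predict`, the parameter `k`, and the nearest-centroid base learner are all hypothetical choices made for illustration.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def decimate(X, y, n_classes, k):
    """Hypothetical input decimation step: for each class c, keep the k feature
    indices whose values correlate most strongly (in absolute value) with the
    0/1 indicator "label == c"."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    subsets = []
    for c in range(n_classes):
        indicator = [1.0 if label == c else 0.0 for label in y]
        scores = [abs(pearson(col, indicator)) for col in columns]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        subsets.append(sorted(ranked[:k]))
    return subsets


class NearestCentroid:
    """Illustrative base classifier restricted to one feature subset.
    Assumes every class has at least one training sample."""

    def __init__(self, features):
        self.features = features

    def fit(self, X, y, n_classes):
        # One centroid per class, computed only over this member's features.
        self.centroids = []
        for c in range(n_classes):
            rows = [[row[j] for j in self.features]
                    for row, label in zip(X, y) if label == c]
            self.centroids.append([mean(col) for col in zip(*rows)])
        return self

    def predict_proba(self, x):
        # Softmax over negative squared distances gives posterior-like scores
        # that sum to 1; they are not calibrated probabilities.
        v = [x[j] for j in self.features]
        neg_d = [-sum((a - b) ** 2 for a, b in zip(v, cen)) for cen in self.centroids]
        top = max(neg_d)  # subtract the max for numerical stability
        exps = [math.exp(d - top) for d in neg_d]
        total = sum(exps)
        return [e / total for e in exps]


def fit_ensemble(X, y, n_classes, k):
    """One base classifier per class, each trained on that class's feature subset."""
    return [NearestCentroid(f).fit(X, y, n_classes)
            for f in decimate(X, y, n_classes, k)]


def predict(ensemble, x):
    """Average the members' scores and return the highest-scoring class."""
    outputs = [m.predict_proba(x) for m in ensemble]
    n_classes = len(outputs[0])
    avg = [mean(out[c] for out in outputs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])
```

On a toy dataset `X = [[5, 0, 0, 1], [6, 0, 0, 2], [0, 5, 0, 1], [0, 6, 0, 2], [0, 0, 5, 1], [0, 0, 6, 2]]` with labels `y = [0, 0, 1, 1, 2, 2]`, where feature j identifies class j and the last feature is noise, `decimate(X, y, 3, 1)` returns `[[0], [1], [2]]`: one discriminating feature per class, with the noise feature dropped. Unlike principal component analysis, the selected features are original attributes, so they stay interpretable.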
Acquisition Source
Ames Research Center
Document Type
Preprint (Draft being sent to journal)
Authors
Oza, Nikunj C.
(California Univ. Berkeley, CA United States)
Tumer, Kagan
(NASA Ames Research Center Moffett Field, CA United States)
Norvig, Peter
Date Acquired
September 7, 2013
Publication Date
September 17, 1999
Subject Category
Documentation And Information Science
Distribution Limits
Work of the US Gov. Public Use Permitted.
