Dimensionality Reduction Through Classifier Ensembles

Oza, Nikunj C.; Tumer, Kagan; Norvig, Peter

In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such datasets is frequently impractical because of long run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." To avoid these problems, feature selection and/or extraction is commonly used to reduce data dimensionality before the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire dataset or disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal component analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" selected for their ability to discriminate among the classes. These feature subsets are then used in ensembles of classifiers. On both real and synthetic datasets, the resulting ensembles outperform single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis.
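The abstract describes input decimation only at a high level. The following is a minimal sketch of the idea under stated assumptions: for each class c, features are ranked by the absolute Pearson correlation between the feature and a one-vs-rest indicator for c (1 if a sample belongs to c, 0 otherwise), and the top k are kept. Each subset would then train one base classifier, and the ensemble averages the class-score vectors of those classifiers. The correlation criterion, the averaging combiner, and the names `pearson`, `decimate_inputs`, `ensemble_predict`, and `k` are assumptions for illustration, not details taken from the abstract.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def decimate_inputs(X: List[List[float]], y: List[int],
                    n_classes: int, k: int) -> List[List[int]]:
    """Hypothetical selection step: for each class c, keep the k features
    with the largest |correlation| to the indicator (label == c)."""
    n_features = len(X[0])
    subsets = []
    for c in range(n_classes):
        indicator = [1.0 if label == c else 0.0 for label in y]
        scores = []
        for j in range(n_features):
            column = [row[j] for row in X]
            scores.append((abs(pearson(column, indicator)), j))
        # Highest |correlation| first; ties go to the lower feature index.
        scores.sort(key=lambda s: (-s[0], s[1]))
        subsets.append(sorted(j for _, j in scores[:k]))
    return subsets


def ensemble_predict(member_scores: List[List[float]]) -> int:
    """Assumed combiner: average the base classifiers' class-score vectors
    and return the index of the highest average score."""
    n = len(member_scores)
    avg = [sum(col) / n for col in zip(*member_scores)]
    return max(range(len(avg)), key=avg.__getitem__)


if __name__ == "__main__":
    # Toy 3-class data: feature j tracks class j, feature 3 is constant noise.
    X = [[1, 0, 0, 0.5], [1, 0, 0, 0.5],
         [0, 1, 0, 0.5], [0, 1, 0, 0.5],
         [0, 0, 1, 0.5], [0, 0, 1, 0.5]]
    y = [0, 0, 1, 1, 2, 2]
    print(decimate_inputs(X, y, n_classes=3, k=1))  # [[0], [1], [2]]
```

One consequence of this particular criterion: in a two-class problem the two indicators are complements of each other, so their correlations with any feature differ only in sign and both classes receive the same subset. The per-class selection therefore matters most when there are three or more classes.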

Document ID

20000102382

Acquisition Source

Ames Research Center

Document Type

Preprint (Draft being sent to journal)

Date Acquired

September 7, 2013

Publication Date

September 17, 1999

Distribution Limits

Public

Work of the US Gov. Public Use Permitted.

NTRS - NASA Technical Reports Server
