NTRS - NASA Technical Reports Server

Experiments on Supervised Learning Algorithms for Text Categorization

Modern information society faces the challenge of handling massive volumes of online documents, news, intelligence reports, and so on. How to use this information accurately and in a timely manner has become a major concern in many areas. While information in general may also include images and voice, this paper focuses on the categorization of text data. We provide a brief overview of the information processing flow for text categorization and discuss two supervised learning algorithms, support vector machines (SVM) and partial least squares (PLS), which have been applied successfully in other domains, e.g., fault diagnosis [9]. While SVM has been well explored for binary classification and has been reported to be an efficient algorithm for text categorization, PLS has not yet been applied to text categorization. Our experiments are conducted on three data sets: the Reuters-21578 dataset on corporate mergers and acquisitions (ACQ), WebKB, and 20-Newsgroups. Results show that the performance of PLS is comparable to that of SVM in text categorization. A major drawback of SVM for multi-class categorization is that it requires a voting scheme based on the results of pair-wise classification. PLS does not have this drawback and could be a better candidate for multi-class text categorization.
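
The pair-wise voting scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid rule stands in for each binary SVM, and the class labels, feature vectors, and function names are hypothetical. The point is only that k classes require k(k-1)/2 binary classifiers, whose votes are then tallied.

```python
from collections import Counter
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_pairwise(data):
    """Build one binary decision rule per unordered pair of classes.

    data maps a class label to its training vectors. Each rule is a
    stand-in for a binary SVM (assumption): it returns whichever of its
    two classes has the nearer centroid.
    """
    cents = {label: centroid(vecs) for label, vecs in data.items()}
    rules = {}
    for a, b in combinations(sorted(cents), 2):
        rules[(a, b)] = (
            lambda x, a=a, b=b:
            a if sq_dist(x, cents[a]) <= sq_dist(x, cents[b]) else b
        )
    return rules


def predict_by_voting(rules, x):
    """Each pair-wise rule casts one vote; the most-voted class wins."""
    votes = Counter(rule(x) for rule in rules.values())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three classes, three term-count features each.
data = {
    "acq": [[3, 0, 0], [4, 1, 0]],
    "course": [[0, 3, 0], [1, 4, 0]],
    "sci": [[0, 0, 3], [0, 1, 4]],
}
rules = train_pairwise(data)
print(len(rules))                            # number of pair-wise classifiers
print(predict_by_voting(rules, [5, 0, 0]))   # document dominated by feature 0
```

With 20 classes, as in 20-Newsgroups, this scheme would need 190 pair-wise classifiers, which is the overhead the abstract contrasts with a single multi-class PLS model.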
Document ID
20060051818
Document Type
Conference Paper
Authors
Namburu, Setu Madhavi (Connecticut Univ. Storrs, CT, United States)
Tu, Haiying (Connecticut Univ. Storrs, CT, United States)
Luo, Jianhui (Connecticut Univ. Storrs, CT, United States)
Pattipati, Krishna R. (Connecticut Univ. Storrs, CT, United States)
Date Acquired
August 23, 2013
Publication Date
January 1, 2005
Subject Category
Documentation and Information Science
Report/Patent Number
IEEEAC Paper 1260
Meeting Information
Aerospace 2005 IEEE Conference
Funding Number(s)
CONTRACT_GRANT: NAG2-1635
Distribution Limits
Public
Copyright
Other