NTRS - NASA Technical Reports Server

Distributed Monitoring of the R² Statistic for Linear Regression
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large-scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R² statistic). When the nodes collectively determine that R² has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
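To make the monitored quantity concrete, the following is a minimal sketch of how R² over the union of all nodes' data can be assembled from small per-node summaries and compared against a threshold. It is not the DReMo algorithm from the paper, which is designed to avoid collecting summaries from every node at every step; the sketch only illustrates the statistic and the recomputation trigger. All names here (`LocalStats`, `local_stats`, `global_r2`, `needs_recompute`), the single-target setup, and the convention that the intercept is the last weight are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LocalStats:
    """Per-node summary (hypothetical structure, not from the paper)."""
    n: int          # number of local instances
    sum_y: float    # sum of local targets
    sum_y2: float   # sum of squared local targets
    ss_res: float   # sum of squared residuals under the current model


def local_stats(rows: Sequence[Tuple[Sequence[float], float]],
                weights: Sequence[float]) -> LocalStats:
    """Summarize one node's data; weights holds coefficients, intercept last."""
    n, sum_y, sum_y2, ss_res = 0, 0.0, 0.0, 0.0
    for x, y in rows:
        pred = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
        n += 1
        sum_y += y
        sum_y2 += y * y
        ss_res += (y - pred) ** 2
    return LocalStats(n, sum_y, sum_y2, ss_res)


def global_r2(stats: List[LocalStats]) -> float:
    """R^2 = 1 - SS_res / SS_tot, computed from summed per-node summaries."""
    n = sum(s.n for s in stats)
    sum_y = sum(s.sum_y for s in stats)
    sum_y2 = sum(s.sum_y2 for s in stats)
    ss_res = sum(s.ss_res for s in stats)
    if n == 0:
        raise ValueError("no data")
    ss_tot = sum_y2 - sum_y ** 2 / n  # equals sum of (y - mean(y))^2
    if ss_tot == 0:
        raise ValueError("targets are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def needs_recompute(stats: List[LocalStats], threshold: float) -> bool:
    """True when the global R^2 has dropped below the fixed threshold."""
    return global_r2(stats) < threshold


# Two nodes whose data initially follows y = 2x + 1 exactly.
weights = [2.0, 1.0]
node1 = [([1.0], 3.0), ([2.0], 5.0)]
node2 = [([3.0], 7.0), ([4.0], 9.0)]
before = [local_stats(node1, weights), local_stats(node2, weights)]

# Node 2's local data changes, so the old model no longer fits it.
node2_drifted = [([3.0], 1.0), ([4.0], 0.0)]
after = [local_stats(node1, weights), local_stats(node2_drifted, weights)]
```

Because the four summary values are plain sums, they combine across nodes by addition, which is why the global statistic can be evaluated without moving raw instances. In this toy setup `needs_recompute(before, 0.8)` is false and `needs_recompute(after, 0.8)` is true, which is the point at which a model recomputation would be triggered.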
Document ID
20110012183
Document Type
Conference Paper
Authors
Bhaduri, Kanishka (MCT, Inc. Moffett Field, CA, United States)
Das, Kamalika (SGT, Inc. Moffett Field, CA, United States)
Giannella, Chris R. (Mitre Corp. McLean, VA, United States)
Date Acquired
August 25, 2013
Publication Date
April 28, 2011
Subject Category
Statistics and Probability
Report/Patent Number
ARC-E-DAA-TN2898
Meeting Information
2011 SIAM International Conference on Data Mining (Mesa, AZ)
Funding Number(s)
CONTRACT_GRANT: NNA08CG83C
Distribution Limits
Public
Copyright
Public Use Permitted.

Available Downloads

Name
20110012183.pdf
Type
STI