
Kanishka Bhaduri

Member since: Sep 24, 2010, Mission Critical Technologies Inc

Distributed Monitoring of the R² Statistic for Linear Regression

Shared by Kanishka Bhaduri, updated on Dec 26, 2010

Summary

Author(s):
Kanishka Bhaduri, Kamalika Das, C. Giannella
Abstract

The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large-scale data in a distributed computing environment, where only a subset of instances is available at each node and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead for monitoring the quality of a regression model in terms of its coefficient of determination (R² statistic). When the nodes collectively determine that R² has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and we also provide theoretical guarantees on correctness.
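To make the monitored quantity concrete, the sketch below computes the coefficient of determination over the union of several nodes' data from small per-node summaries, and then checks it against a threshold. It is an illustration only and is not the DReMo protocol: it simply sums every node's summary in one place, which is the kind of centralization the abstract says the algorithm is designed to avoid. The function names, the summary layout, and the example data are all assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-node summary: (n, sum of y, sum of y^2, sum of squared residuals)
Stats = Tuple[int, float, float, float]


def local_stats(X: Sequence[Sequence[float]], y: Sequence[float],
                w: Sequence[float]) -> Stats:
    """Summarize one node's data under the shared linear model w."""
    sum_sq_resid = 0.0
    for row, target in zip(X, y):
        prediction = sum(wi * xi for wi, xi in zip(w, row))
        sum_sq_resid += (target - prediction) ** 2
    return len(y), sum(y), sum(v * v for v in y), sum_sq_resid


def global_r2(stats: List[Stats]) -> float:
    """R^2 = 1 - SS_res / SS_tot over the union of all nodes' data."""
    n = sum(s[0] for s in stats)
    sum_y = sum(s[1] for s in stats)
    sum_y_sq = sum(s[2] for s in stats)
    ss_res = sum(s[3] for s in stats)
    ss_tot = sum_y_sq - sum_y ** 2 / n  # total sum of squares about the global mean
    return 1.0 - ss_res / ss_tot


def needs_recompute(stats: List[Stats], threshold: float) -> bool:
    """True when the global R^2 has dropped below the fixed threshold."""
    return global_r2(stats) < threshold


if __name__ == "__main__":
    w = [2.0, 1.0]  # assumed model y = 2x + 1; the second feature is a constant 1
    node_a = local_stats([[1.0, 1.0], [2.0, 1.0]], [3.0, 5.0], w)
    node_b = local_stats([[3.0, 1.0], [4.0, 1.0]], [7.0, 5.0], w)  # last point drifted
    print(global_r2([node_a, node_b]), needs_recompute([node_a, node_b], 0.5))
```

Because each summary is a fixed-size tuple that adds component-wise, the global R² depends only on the sums across nodes, not on the raw instances held at any node.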

Publication Name
Distributed Monitoring of the R² Statistic for Linear Regression
Publication Location
SIAM Data Mining Conference (SDM'11). pp.
Year Published
2011

Files

LinearRegression.pdf
509.6 KB 172 downloads
