commons-dev mailing list archives

From Елена Картышева <el.kartysh...@yandex.ru>
Subject [statistics] Pull request for GLSMultipleLinearRegression
Date Wed, 22 May 2019 17:18:14 GMT
Hello.

I would like to propose a pull request that adds an option to use a variance vector instead
of a covariance matrix. When the errors are uncorrelated but heteroscedastic, this avoids
unnecessary memory usage and computation, which makes it possible to work with very large
input matrices. Using a variance vector in such cases reduces the time complexity from
O(N^2) to O(N) (where N is the number of observations) and dramatically reduces memory usage.
For example, in my own work I needed to train a generalized linear model. The iteratively
reweighted least squares (IRLS) algorithm requires a weighted regression with more than a million
observations. The current implementation would require approximately 12 terabytes of memory, while
the patched version needs only 8 megabytes. Since IRLS is an iterative algorithm, a million-fold
reduction in complexity is also very useful.
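
To illustrate the idea, here is a minimal sketch, not the code from the pull request. It assumes
uncorrelated errors, so the covariance matrix is diagonal and the GLS estimate
(X' W X)^-1 X' W y only needs the weights w_i = 1 / variance_i. The class name DiagonalGlsSketch
and the method fit are hypothetical and are not part of the Commons API. The sketch accumulates
the small p-by-p normal equations in one pass over the N observations, so it never allocates
an N-by-N matrix.

```java
final class DiagonalGlsSketch {

    /**
     * Estimates beta for y = X beta + e, assuming Var(e_i) = variance[i]
     * and uncorrelated errors (hypothetical helper, for illustration only).
     */
    static double[] fit(double[][] x, double[] y, double[] variance) {
        int n = x.length;
        int p = x[0].length;

        // Augmented normal equations [X'WX | X'Wy], built in O(N * p^2) time
        // with O(p^2) extra memory.
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < n; i++) {
            double w = 1.0 / variance[i];
            for (int j = 0; j < p; j++) {
                double wx = w * x[i][j];
                for (int k = 0; k < p; k++) {
                    a[j][k] += wx * x[i][k];
                }
                a[j][p] += wx * y[i];
            }
        }

        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int k = col; k <= p; k++) {
                    a[r][k] -= f * a[col][k];
                }
            }
        }

        // Back substitution.
        double[] beta = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int k = r + 1; k < p; k++) {
                s -= a[r][k] * beta[k];
            }
            beta[r] = s / a[r][r];
        }
        return beta;
    }
}
```

Solving the normal equations directly is the simplest way to show the memory saving. A production
implementation would more likely scale each row by 1 / sqrt(variance_i) and reuse a QR-based
OLS solver for better numerical stability.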

 
-- 
Sincerely yours, Elena Kartysheva.


