commons-dev mailing list archives

From Phil Steitz <>
Subject [math] Improving numerics in OLSMultipleLinearRegression
Date Mon, 09 Jun 2008 02:17:00 GMT
While clear and elegant from a matrix algebra standpoint, the "naive" 
implementation in OLSMultipleLinearRegression has bad numerical 
qualities.  It is well known that solving the normal equations directly 
gives poor numerics, since forming X'X squares the condition number of 
the design matrix.  I just added some tests to actually verify 
parameter values, using the classic "Longley" dataset, for which NIST 
provides certified statistics.  This is a "hard" (ill-conditioned) design 
matrix.  R was able to match the certified parameter values to within 
1E-8; OLSMultipleLinearRegression only gets to within 1E-1. 
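
A check like this boils down to comparing each estimated parameter with 
its certified value under some tolerance.  A minimal sketch follows; 
the class and method names are hypothetical, and the use of a relative 
(rather than absolute) error criterion is an assumption for illustration.

```java
/**
 * Hypothetical helper: compares estimated regression parameters with
 * certified reference values (e.g. NIST-certified ones) using a relative
 * error criterion. This is an assumption, not an existing test utility.
 */
final class CertifiedCheck {

    private CertifiedCheck() {
    }

    /** Largest |estimated - certified| / |certified| over all parameters. */
    static double maxRelativeError(double[] estimated, double[] certified) {
        if (estimated.length != certified.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double worst = 0.0;
        for (int i = 0; i < certified.length; i++) {
            double err = Math.abs(estimated[i] - certified[i]);
            // Fall back to absolute error when the certified value is exactly zero.
            double scale = certified[i] == 0.0 ? 1.0 : Math.abs(certified[i]);
            worst = Math.max(worst, err / scale);
        }
        return worst;
    }

    /** True if every parameter agrees with its certified value within tol. */
    static boolean agrees(double[] estimated, double[] certified, double tol) {
        return maxRelativeError(estimated, certified) <= tol;
    }
}
```

With tol = 1E-8 this expresses the "within 1E-8" criterion; an 
implementation that only reaches 1E-1 would fail it.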

We have talked in the past about providing an implementation based on QR 
decomposition.  Is anyone up for using the QR decomposition that we now 
have to do this?  I really think we need to do it (or something else to 
improve numerics) before releasing this class.  I will get to it 
eventually, but am a little pegged at the moment.  I will review and 
apply patches if someone is willing to do the implementation.  I can 
also explain here or offline how the R tests and NIST datasets work, as 
these are useful in validating code.
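
To make the QR suggestion concrete, here is a minimal sketch of least 
squares via Householder reflections, written against plain double 
arrays.  It does not use the QR decomposition we now have in [math]; the 
class name QrLeastSquares is hypothetical, and it assumes a design 
matrix with full column rank and at least as many rows as columns.

```java
/**
 * Hypothetical sketch: least squares via Householder QR on plain arrays.
 * Not the [math] QR decomposition API.
 */
final class QrLeastSquares {

    private QrLeastSquares() {
    }

    /**
     * Returns b minimizing ||X b - y||_2. Assumes X is n-by-p, n >= p, full
     * column rank; exact rank deficiency is reported, near-deficiency is not.
     */
    static double[] solve(double[][] x, double[] y) {
        int n = x.length;
        int p = x[0].length;
        if (y.length != n || n < p) {
            throw new IllegalArgumentException("need y.length == rows and rows >= columns");
        }
        // Work on copies so the caller's arrays are left untouched.
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = x[i].clone();
        }
        double[] qty = y.clone(); // becomes Q^T y as reflections are applied
        double[] v = new double[n];

        for (int k = 0; k < p; k++) {
            // Norm of column k from the diagonal down (hypot avoids overflow).
            double norm = 0.0;
            for (int i = k; i < n; i++) {
                norm = Math.hypot(norm, a[i][k]);
            }
            if (norm == 0.0) {
                throw new IllegalArgumentException("design matrix is rank deficient");
            }
            // Choose alpha with sign opposite to a[k][k] to avoid cancellation in v.
            double alpha = a[k][k] > 0 ? -norm : norm;
            v[k] = a[k][k] - alpha;
            for (int i = k + 1; i < n; i++) {
                v[i] = a[i][k];
            }
            double vtv = 0.0;
            for (int i = k; i < n; i++) {
                vtv += v[i] * v[i];
            }
            // Apply H = I - 2 v v^T / (v^T v) to columns k..p-1 of A.
            for (int j = k; j < p; j++) {
                double dot = 0.0;
                for (int i = k; i < n; i++) {
                    dot += v[i] * a[i][j];
                }
                double f = 2.0 * dot / vtv;
                for (int i = k; i < n; i++) {
                    a[i][j] -= f * v[i];
                }
            }
            // Apply the same reflection to y, accumulating Q^T y.
            double dotY = 0.0;
            for (int i = k; i < n; i++) {
                dotY += v[i] * qty[i];
            }
            double fy = 2.0 * dotY / vtv;
            for (int i = k; i < n; i++) {
                qty[i] -= fy * v[i];
            }
        }
        // R is now in the upper triangle of a; solve R b = (Q^T y)[0..p-1].
        double[] b = new double[p];
        for (int k = p - 1; k >= 0; k--) {
            double s = qty[k];
            for (int j = k + 1; j < p; j++) {
                s -= a[k][j] * b[j];
            }
            b[k] = s / a[k][k];
        }
        return b;
    }
}
```

The point of the design is that X'X is never formed, so we never pay for 
its squared condition number; the reflections are orthogonal and do not 
amplify rounding errors.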

Another thing that we should think about before releasing any of this 
stuff is the completeness of the API.  Many standard regression 
statistics are missing.  If we are going to stick with the Interface / 
Implementation setup, we need to get the right stuff into the 
interface.  It is also awkward to have to insert a column of 1s in the design 
matrix to get an intercept term estimated.  This is convenient for 
implementation, but awkward for users.  A more natural setup (IMHO) 
would be to expose a "noIntercept" or "hasIntercept" property for the model.
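
A "hasIntercept" property could be implemented by having the model add 
the column of ones internally, so users pass only their regressors.  A 
minimal sketch; DesignMatrices and build are hypothetical names, not 
part of the existing API.

```java
/**
 * Hypothetical helper: builds the internal design matrix from the user's
 * regressors, prepending a column of ones only when an intercept is wanted.
 */
final class DesignMatrices {

    private DesignMatrices() {
    }

    /** Returns a copy of x if hasIntercept is false, otherwise [1 | x]. */
    static double[][] build(double[][] x, boolean hasIntercept) {
        int offset = hasIntercept ? 1 : 0;
        double[][] design = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            double[] row = new double[x[i].length + offset];
            if (hasIntercept) {
                row[0] = 1.0; // constant regressor for the intercept term
            }
            System.arraycopy(x[i], 0, row, offset, x[i].length);
            design[i] = row;
        }
        return design;
    }
}
```

The first estimated parameter is then the intercept, and users never see 
the extra column.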
