commons-dev mailing list archives

From: Phil Steitz
Subject: Re: [math] Refactoring multiple regression classes
Date: Thu, 14 Jul 2011 03:31:12 GMT
On 7/13/11 7:14 PM, Greg Sterijevski wrote:
> Phil,
> "How exactly do interfaces make the hierarchy flatter in this case?
> I agree we should aim for as simple a structure as possible.  The
> question is, what is that structure?"
> They may or may not make the structure different. Any design we come up with
> today is likely to be outmoded in 6 months. (In war throw your battle plans
> out the window after the first five minutes.) What I propose is an interface
> which is the most minimal set of functionality (identifiable now) that
> comprises regression. Over time, as we define more and more implementations
> of regression we might see further functionality which is common across
> regressions. These methods will migrate to the interface. The interface will
> grow organically. More importantly any dependency which is not too picky can
> use the interface reference, instead of referencing the concrete class.
> Dependencies which care, will and should have intimate knowledge of the
> class. Most pieces of code which depend on regression will not. The
> interface will not preclude abstract classes.

Fortunately for users, maybe less fortunately for developers, we
can't really "evolve" our API rapidly and incrementally, unless that
evolution avoids backward-incompatible change.  The reason for this
is that we combine bug fixes and API changes in point releases and
users need to be able to upgrade to point releases without having to
make code changes.  We make incompatible changes in major releases
only.  The good news is that we are in the runup right now to a
major release of [math], so we have a once-every-few-years
opportunity to make incompatible changes.  The maybe less wonderful
news is that what we design for 3.0 we will need to live with for a
couple of years, so we need to be careful not to lock ourselves into
design constraints that will be hard to innovate within.  This is
why we favor abstract classes over interfaces: a method added to an
abstract class can ship with a default implementation, while a
method added to an interface breaks every existing implementor.
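
To make the compatibility argument concrete, here is a minimal
sketch. All names in it (AbstractRegression, UserRegression, update)
are hypothetical and are not the [math] API. It assumes the Java of
this thread's era, where interface methods cannot carry bodies.

```java
// Hypothetical names; a sketch, not the [math] API.
abstract class AbstractRegression {
    private long n;

    // Present since the major release: subclasses implement the update step.
    protected abstract void update(double[] x, double y);

    public final void addObservation(double[] x, double y) {
        update(x, y);
        n++;
    }

    // Imagine this method was added in a later point release. Because it
    // is a concrete method on the base class, subclasses written against
    // the earlier version keep compiling without source changes.
    public long getN() {
        return n;
    }
}

// A user subclass written against the earlier version: it only knows
// about update(...).
class UserRegression extends AbstractRegression {
    private double sumY;

    @Override
    protected void update(double[] x, double y) {
        sumY += y;
    }

    double getSumY() {
        return sumY;
    }
}
```

Had AbstractRegression been an interface, adding getN() would have
forced UserRegression to change before it could compile again.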
> The way I see it, you would have a core interface:
> public interface RegressionIface {
>     boolean hasIntercept();
>     long getN();
>     void addObservation(double[] x, double y);
>     void addObservation(double[] xy);
>     RegressionResults regress();
>     RegressionResults regress(int[] vars);
> }
> You would then have a subinterface
> public interface UpdatingRegression extends RegressionIface {
>     void clear();
>     void addObservations(double[][] x, double[] y);
> }

I thought about that model; but the "fixed model" versions may not
need to or want to support the "addAll" semantics - just setData.  I
was thinking that addObservations above would be included in the
base, since it could always be implemented serially.
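
A minimal sketch of what "implemented serially" could look like,
assuming an abstract base class rather than an interface. The names
UpdatingRegressionBase and CountingRegression are hypothetical, and
the length check is an added assumption, not something settled in
this thread.

```java
// Hypothetical names; a sketch, not the [math] API.
abstract class UpdatingRegressionBase {
    // The single-observation update every implementation must provide.
    public abstract void addObservation(double[] x, double y);

    // Batch add expressed serially in terms of the single-row method,
    // so every subclass gets it for free.
    public void addObservations(double[][] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException(
                "x has " + x.length + " rows but y has " + y.length);
        }
        for (int i = 0; i < x.length; i++) {
            addObservation(x[i], y[i]);
        }
    }
}

// Toy subclass that only counts rows, to show the inherited batch method.
class CountingRegression extends UpdatingRegressionBase {
    private long n;

    @Override
    public void addObservation(double[] x, double y) {
        n++;
    }

    public long getN() {
        return n;
    }
}
```

An implementation with a cheaper bulk path could still override
addObservations.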

> Why should code which is running a regression need to know more than this?
> If for example, the QR regression and the SVD based regression share common
> functionality for manipulating the data incore, then they can inherit from
> an abstract base class which implements RegressionIface.  The user in most
> cases will not care. He/she may care whether the data is incore or not, but
> that's about it.

Exactly, which is why I like your design at the top level.
> The real action, in my opinion, is in the RegressionResults class. Here you
> might need a bushy, thick tree. All regressions must generate an immutable
> RegressionResults. However, that is the minimum info that would be
> generated. We might, for example, have ConstrainedRegressionResults.
> public class ConstrainedRegressionResults extends RegressionResults {
>     private double[] lagrangian;
> }

Agree here again.  RegressionResults should include only the basic
stuff that every model will include and subclasses will extend it.
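
A minimal sketch of that split, under stated assumptions: the fields
parameters and n, the constructors, and the accessors are
hypothetical stand-ins for "the basic stuff". Only the lagrangian
field comes from the proposal above. Immutability is approximated
here with final fields and defensive copies.

```java
// Hypothetical fields and accessors; a sketch, not the [math] API.
class RegressionResults {
    private final double[] parameters;
    private final long n;

    RegressionResults(double[] parameters, long n) {
        this.parameters = parameters.clone(); // copy so callers cannot mutate us
        this.n = n;
    }

    public double[] getParameters() {
        return parameters.clone(); // hand out a copy, never the internal array
    }

    public long getN() {
        return n;
    }
}

// Model-specific results extend the base with extra read-only state.
class ConstrainedRegressionResults extends RegressionResults {
    private final double[] lagrangian;

    ConstrainedRegressionResults(double[] parameters, long n, double[] lagrangian) {
        super(parameters, n);
        this.lagrangian = lagrangian.clone();
    }

    public double[] getLagrangian() {
        return lagrangian.clone();
    }
}
```

Code that only needs the basics can hold a RegressionResults
reference; code that cares about the constraint multipliers works
with the subclass directly.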
