Hello All,
Sorry for being a bit slow on the uptake... I am still in the wilds of
numerical imprecision with the Longley data. I am getting close to figuring
out where the error is being accumulated.
I agree that interfaces impose rigidity in the design. However, there are
broad similarities in linear regression. Whether we are doing OLS, panel
regression, ridge regression, robust regression, etc., the model is linear:
Y = XB + e. The estimation technique may be complicated (or nonlinear),
but that is an implementation detail. I like interfaces because they force
discipline on the code. They force you to specify a minimum contract, which
oftentimes makes larger problems more tractable. The end user should be able
to swap out one technique for another with minimal recoding. It also makes
it easy to write RMI stubs and make distributed calls.
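
To make the "minimum contract" point concrete, here is a rough sketch of
what I have in mind. All of the names (LinearRegression, OlsRegression,
RidgeRegression, RegressionSupport) are made up for illustration and are
not existing classes in the package. Both estimators solve the normal
equations directly, which is fine for a toy example but is not what you
would want for ill-conditioned data like Longley.

```java
/** Hypothetical minimal contract: fit Y = XB + e and return the estimate of B. */
interface LinearRegression {
    double[] estimate(double[][] x, double[] y);
}

/** OLS through the normal equations (X'X) b = X'y. */
final class OlsRegression implements LinearRegression {
    @Override
    public double[] estimate(double[][] x, double[] y) {
        return RegressionSupport.solveNormalEquations(x, y, 0.0);
    }
}

/** Ridge regression, same contract: solves (X'X + lambda I) b = X'y. */
final class RidgeRegression implements LinearRegression {
    private final double lambda;

    RidgeRegression(double lambda) {
        this.lambda = lambda;
    }

    @Override
    public double[] estimate(double[][] x, double[] y) {
        return RegressionSupport.solveNormalEquations(x, y, lambda);
    }
}

/** Shared helper (illustrative only, not numerically robust). */
final class RegressionSupport {
    static double[] solveNormalEquations(double[][] x, double[] y, double lambda) {
        int n = x.length;
        int p = x[0].length;
        // Build the augmented system [X'X + lambda I | X'y].
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < n; k++) {
                    a[i][j] += x[k][i] * x[k][j];
                }
            }
            a[i][i] += lambda;
            for (int k = 0; k < n; k++) {
                a[i][p] += x[k][i] * y[k];
            }
        }
        // Gaussian elimination with partial pivoting.
        for (int c = 0; c < p; c++) {
            int pivot = c;
            for (int r = c + 1; r < p; r++) {
                if (Math.abs(a[r][c]) > Math.abs(a[pivot][c])) {
                    pivot = r;
                }
            }
            double[] tmp = a[c];
            a[c] = a[pivot];
            a[pivot] = tmp;
            for (int r = c + 1; r < p; r++) {
                double f = a[r][c] / a[c][c];
                for (int j = c; j <= p; j++) {
                    a[r][j] -= f * a[c][j];
                }
            }
        }
        // Back substitution.
        double[] b = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = a[i][p];
            for (int j = i + 1; j < p; j++) {
                s -= a[i][j] * b[j];
            }
            b[i] = s / a[i][i];
        }
        return b;
    }
}
```

Client code that is written against LinearRegression can switch from
new OlsRegression() to new RidgeRegression(1.0) by changing one line, which
is the kind of swap I am arguing for.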
I like abstract classes in general, but we might end up with an abstract
class with no concrete methods or, even worse, a very deep inheritance
tree (AbstractRegression >> AbstractUpdatingLinearRegression >>
PanelRegression >> OneWayFixedEffects).
That being said, Phil and Ted, you guys are definitely the experts on the
design. I thought I would add my opinion to the mix.
On the features front, as we deliberate over the design issues, it's important
that we have an eye to what is missing. Here are some features which I
believe should be in the regression package:
1. An SVD-based OLS regression (because sometimes messing with eigenvalues
is a must)
2. Functionality to impose arbitrary linear equality restrictions.
Perhaps a regress method with the following signature (the second parameter
cannot be named const, since that is a reserved word in Java):
public RegressionResult regress(RealMatrix coeff, RealVector constants);
3. Related to (2), linear hypothesis testing.
4. Related to (2), estimates of the Lagrange multipliers and their
variance-covariance matrix.
5. Robust variance-covariance estimators.
6. Perhaps panel regression.
This one is a bit larger than regression. We would need to track the
other dimensions of an observation (if we are studying income for a group
of people, we might track the individual, the year, race and so forth as
other dimensions which impose level shifts in the hyperplane). The
regression might need a pull mechanism to make two passes through the data.
First pass builds things like means or augments the design matrix with dummy
variables. The second pass would actually run the regression on the
transformed data.
7. Some sort of redundancy indicator. This is especially important when you
allow for parameter restrictions since you want to know which parameter is
not being used.
8. Some meta structure to allow for stepwise regression.
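
To illustrate the two-pass idea in item 6, here is a minimal sketch of a
one-way fixed effects (within) estimator for a single regressor. The class
and method names are hypothetical, and the data is passed as plain arrays
rather than through a pull mechanism, just to keep it short. Pass one
accumulates the group means; pass two regresses the demeaned y on the
demeaned x.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: y = alpha_g + beta * x + e, one level shift per group. */
final class OneWayFixedEffectsSketch {

    /** Returns the within estimate of beta. */
    static double slope(String[] group, double[] x, double[] y) {
        // Pass 1: accumulate {sum of x, sum of y, count} for each group.
        Map<String, double[]> sums = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            double[] s = sums.computeIfAbsent(group[i], g -> new double[3]);
            s[0] += x[i];
            s[1] += y[i];
            s[2] += 1.0;
        }
        // Pass 2: subtract the group means, then run the simple regression
        // on the transformed data.
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            double[] s = sums.get(group[i]);
            double dx = x[i] - s[0] / s[2];
            double dy = y[i] - s[1] / s[2];
            sxy += dx * dy;
            sxx += dx * dx;
        }
        return sxy / sxx;
    }
}
```

The alternative mentioned in item 6, augmenting the design matrix with one
dummy variable per group, gives the same slope but makes the matrix grow
with the number of groups.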
Greg
