commons-dev mailing list archives

From Ben Nguyen <bennguye...@gmail.com>
Subject RE: [statistics] Proposed OLS grammar
Date Fri, 19 Jul 2019 15:47:23 GMT
Hello Dr. Paul King,

I am working on the new regression module for Commons Statistics as a GSoC student. I had
a brief look at your Groovy Data Science work (which I will look at more closely in the
future, because it is an interesting, high-quality tutorial/showcase) and noticed that
your slides mention the 7 main types of regression. One of the central purposes of
this new Commons Statistics regression component is to design an architecture that can support
these different types by providing a good base on which other developers can add regression
types beyond the OLS and GLS already available in math3.

I am currently designing with this purpose in mind, using OLS as a starting point and EJML for
matrix operations (instead of math3.linear). The plan is to have OLS, GLS and logistic regression
done by around the end of August, and to add other regression types in the future, hopefully with
other developers.
The updating regressions, such as the SimpleRegression you've used, will likely stay as they are
for now. Do you have any suggestions for them?
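As a reminder of what "updating" means for a regression like SimpleRegression, here is a minimal sketch. The class `UpdatingSimpleRegression` and its methods are hypothetical (this is not the math3 class or its exact formulas): each (x, y) pair is folded into running means and centered sums using Welford-style updates, so the observations themselves never need to be stored.

```java
/**
 * Hypothetical sketch of an "updating" simple linear regression, similar in
 * spirit to math3's SimpleRegression: each (x, y) pair is folded into running
 * statistics, so the observations themselves are never stored.
 */
final class UpdatingSimpleRegression {
    private long n;        // number of observations seen so far
    private double xBar;   // running mean of x
    private double yBar;   // running mean of y
    private double sxx;    // running sum of (x - xBar)^2
    private double sxy;    // running sum of (x - xBar)(y - yBar)

    /** Adds one observation using Welford-style incremental updates. */
    void addData(double x, double y) {
        n++;
        double dx = x - xBar;       // deviation from the previous mean of x
        xBar += dx / n;
        yBar += (y - yBar) / n;
        // Previous-mean deviation times updated-mean deviation keeps the sums exact
        sxx += dx * (x - xBar);
        sxy += dx * (y - yBar);
    }

    long getN() {
        return n;
    }

    /** Least-squares slope, or NaN if n < 2 or all x values are equal. */
    double getSlope() {
        return (n < 2 || sxx == 0.0) ? Double.NaN : sxy / sxx;
    }

    /** Least-squares intercept, consistent with getSlope(). */
    double getIntercept() {
        return yBar - getSlope() * xBar;
    }
}
```

Because only five numbers are kept, memory use stays constant however many observations are added; the trade-off is that this form only handles a single regressor.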

I also wanted to take this opportunity to ask you, as a user:
1. What would make your life easier?
2. What features should definitely be kept?
a. Do you value the current data input interface (calling newSampleData() directly on the
OLS class)?
b. Or would you prefer one of the other approaches mentioned, which would be needed if reusing
the same loaded data across different types of regression is important?
3. What features should be improved?
a. Is the current running time sufficient for you, or is it restrictive in
any way? (Hopefully EJML helps a bit in that regard; we may run benchmarks once
OLS is done.)
4. Any suggestions or requests for specific features?
a. Perhaps a summary printout under a RegressionResults interface?
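To make item 4a concrete, one possible shape for a summary printout is a small formatter that any results type could delegate to. This is only a sketch: `ResultsSummary` and `format` are hypothetical names, not an existing Commons API, and it assumes the results expose the coefficients (intercept first) and an R-squared value.

```java
import java.util.Locale;

/**
 * Hypothetical formatter for a regression summary printout. It needs only the
 * fitted coefficients (intercept first) and R-squared, so any results type
 * exposing those values could delegate to it.
 */
final class ResultsSummary {
    private ResultsSummary() {
    }

    static String format(double[] betas, double rSquared) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT, "%-12s %14s%n", "term", "estimate"));
        for (int i = 0; i < betas.length; i++) {
            // Label the first coefficient as the intercept, the rest as x1, x2, ...
            String term = (i == 0) ? "(intercept)" : "x" + i;
            sb.append(String.format(Locale.ROOT, "%-12s %14.6f%n", term, betas[i]));
        }
        sb.append(String.format(Locale.ROOT, "%-12s %14.6f%n", "R-squared", rSquared));
        return sb.toString();
    }
}
```

Keeping the formatting in one helper (or in a default method on a results interface) would let OLS, GLS and logistic results share the same printout.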

Thank you for your time, I appreciate any input you can give me.

Cheers,
-Ben Nguyen

From: Paul King
Sent: Friday, July 19, 2019 6:26 AM
To: Commons Developers List
Subject: Re: [statistics] Proposed OLS grammar

There are about 10 files using classes from the math3.stat package in
the examples I mentioned. I have stayed away from math4 while it's
still a snapshot.

Repo: https://github.com/paulk-asert/groovy-data-science

Slides: https://speakerdeck.com/paulk/groovy-data-science

Most of the examples are in the subprojects/HousePrices project, with a
few others just using StatUtils.

It's not my full-time day job to be using those classes but I'd be
keen to have those examples working nicely.

Cheers, Paul.

On Fri, Jul 19, 2019 at 9:11 PM Gilles Sadowski <gilleseran@gmail.com> wrote:
>
> Hi.
>
> Your experience as a user of "Commons Math" would be most useful
> to help us craft a better (or, at least, no worse) design for "Commons
> Statistics".
> Would you share pointers to actual use-cases?
>
> Thanks,
> Gilles
>
> 2019-07-19 7:03 UTC+02:00, Paul King <paul.king.asert@gmail.com>:
> > Cool. I'd be keen to try out the API, when you are ready, in my
> > "Apache Groovy for data science" examples which currently use the
> > commons math3 classes.
> >
> > Cheers, Paul.
> >
> > On Fri, Jul 19, 2019 at 9:51 AM Gilles Sadowski <gilleseran@gmail.com>
> > wrote:
> >>
> >> Hi.
> >>
> >> On Fri, Jul 19, 2019 at 01:45, Paul King <paul.king.asert@gmail.com>
> >> wrote:
> >> >
> >> > How does this relate to the OLS classes in commons math?
> >> > https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/regression/OLSMultipleLinearRegression.html
> >>
> >> The new "Commons Statistics" component is intended to replace the
> >> functionality currently defined in the package
> >> "org.apache.commons.math4.stat" of "Commons Math".
> >>
> >> Regards,
> >> Gilles
> >>
> >> > On Fri, Jul 19, 2019 at 8:50 AM Eric Barnhill <ericbarnhill@gmail.com>
> >> > wrote:
> >> > >
> >> > > In our meeting today, I suggested the following grammar to aim for in
> >> > > the OLS module under development. If you see anything you'd prefer to
> >> > > change, let's establish it now; if anyone doesn't like it later, it's
> >> > > on me.
> >> > >
> >> > > RegressionData data = RegressionDataLoader.of(y, x); // double[] y, double[][] x
> >> > > Regression ols = new OLSRegression();
> >> > > RegressionResults results = ols.regress(data);
> >> > > double[] betas = results.getBetas();
> >> > >
> >> > > where:
> >> > > RegressionData is an interface.
> >> > > RegressionDataLoader is a factory class, and of() is a (possibly
> >> > > overloaded) static method.
> >> > > Regression is an interface, implemented by OLSRegression.
> >> > > RegressionResults is an interface; the specific class returned is
> >> > > OLSResults, which implements it.
> >> > > betas are the intercept and slopes of the regression model.
> >> > >
> >> > > I think this preserves abstraction at the desired levels, since in the
> >> > > future we will want flexibility in the regression type, in possible
> >> > > state parameters set on the regression object, and in the contents and
> >> > > format of the results, while not taking on any unnecessary
> >> > > abstractions.
> >> > >
> >> > > Eric
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
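Eric's proposed grammar can be sketched in plain Java roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the Commons Statistics implementation: the interface and class names come from the proposal, but every method body, the `getY()`/`getX()` accessors, and the choice to solve the normal equations with Gaussian elimination (instead of EJML or a QR decomposition) are assumptions made for illustration. It treats `y` as the response vector and `x` as the n-by-p matrix of regressors, and prepends an intercept column so that `getBetas()` returns the intercept followed by the slopes.

```java
/** Data container (assumed accessors): y is the response, x the n-by-p regressors. */
interface RegressionData {
    double[] getY();
    double[][] getX();
}

/** Factory with a static of(...) method, as in the proposed grammar. */
final class RegressionDataLoader {
    private RegressionDataLoader() {
    }

    static RegressionData of(double[] y, double[][] x) {
        if (y.length == 0 || y.length != x.length) {
            throw new IllegalArgumentException("y and x must have the same, non-zero number of rows");
        }
        // Defensive copies so later changes to the caller's arrays do not leak in
        final double[] yCopy = y.clone();
        final double[][] xCopy = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            xCopy[i] = x[i].clone();
        }
        return new RegressionData() {
            public double[] getY() { return yCopy; }
            public double[][] getX() { return xCopy; }
        };
    }
}

interface RegressionResults {
    double[] getBetas(); // intercept first, then one slope per column of x
}

interface Regression {
    RegressionResults regress(RegressionData data);
}

final class OLSResults implements RegressionResults {
    private final double[] betas;

    OLSResults(double[] betas) { this.betas = betas; }

    public double[] getBetas() { return betas.clone(); }
}

/** OLS via the normal equations (X^T X) b = X^T y, solved by Gaussian elimination. */
final class OLSRegression implements Regression {
    public RegressionResults regress(RegressionData data) {
        double[] y = data.getY();
        double[][] x = data.getX();
        int p = x[0].length + 1;              // +1 for the intercept column
        double[][] a = new double[p][p + 1];  // augmented matrix [X^T X | X^T y]
        for (int i = 0; i < y.length; i++) {
            double[] row = new double[p];
            row[0] = 1.0;                     // intercept term
            System.arraycopy(x[i], 0, row, 1, p - 1);
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    a[j][k] += row[j] * row[k];
                }
                a[j][p] += row[j] * y[i];
            }
        }
        // Forward elimination with partial pivoting
        for (int c = 0; c < p; c++) {
            int piv = c;
            for (int r = c + 1; r < p; r++) {
                if (Math.abs(a[r][c]) > Math.abs(a[piv][c])) piv = r;
            }
            double[] tmp = a[c]; a[c] = a[piv]; a[piv] = tmp;
            if (a[c][c] == 0.0) throw new ArithmeticException("X^T X is singular");
            for (int r = c + 1; r < p; r++) {
                double f = a[r][c] / a[c][c];
                for (int k = c; k <= p; k++) a[r][k] -= f * a[c][k];
            }
        }
        // Back substitution
        double[] b = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int k = r + 1; k < p; k++) s -= a[r][k] * b[k];
            b[r] = s / a[r][r];
        }
        return new OLSResults(b);
    }
}
```

Because `ols` is declared as the `Regression` interface in the grammar, a later GLS or logistic implementation could be substituted without changing the data-loading or results code. Forming X^T X squares the condition number of the problem, so a production version would more likely use a QR-based solver.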
