commons-dev mailing list archives

From Eric Barnhill
Subject Re: Google Summer of Code 2019 Mentor Registration
Date Tue, 12 Mar 2019 16:13:51 GMT
What I have now found, doing a bit of background research for this, is that
there is a well-developed pure Java machine learning library called WEKA. It
seems to have good institutional support and to be well maintained. As I had
in mind, the syntax is pretty intuitive and similar in style to scikit-learn.
There is a nice tutorial using it which illustrates this. I don't know what I
would want to do differently,
that Weka hasn't already done, other than its targeting of Java 8. So I
think it would probably be re-inventing the wheel to try to get something
similar started here.

I will re-focus my mind on trying to get some momentum for the stats
functions, which is what I had in mind last summer. I do think if healthy
momentum can build for stats functions, there is a natural fit for a fair
amount of machine learning to be incorporated including our own mothballed
clustering and neural net libraries.


On Mon, Mar 11, 2019 at 5:28 PM Bruno P. Kinoshita wrote:

> Sounds like an interesting idea, Eric. I wonder if we would get some
> dogfooding through projects like Apache OpenNLP (one that I know uses ML in
> Java).
> Cheers
> Bruno
>
> On Tuesday, 12 March 2019, 1:24:24 pm NZDT, Eric Barnhill wrote:
> On Sat, Mar 9, 2019 at 4:56 PM Gilles Sadowski wrote:
> > Hi Eric.
> >
> > On Fri, 8 Mar 2019 at 22:22, Eric Barnhill wrote:
> > >
> > > I am definitely willing to mentor development of the stats libraries
> > > as I was last year. Now that I work more in data science I am happy to
> > > also mentor the ML library
> >
> > What are you referring to?
> >
> Commons-math had a machine learning library. Now that I look it over it is
> really a bit emaciated. Still, I think there is an opportunity here to get
> some components up to date that could be pretty widely used, rethinking the
> structure and grammar of the library to echo Python's highly successful
> scikit-learn and Keras libraries.
> There are a lot of young people who are interested in getting into data
> science, so we might get a good candidate or two looking to distinguish
> themselves. Also Java is such an important language in data science and
> engineering, even if a lot of the ML model building to date is in R and
> Python, so it is a great language for someone entering ML to know.
> > You have to register as a mentor. :-)
> >
> Sent.
> >
> > Then, read and follow the guidelines:
> >
> >
> > What should be done ASAP is tag existing, or new issues,
> > with the appropriate label so that tasks will appear here:
> >
> Will do tomorrow, hopefully it is not too late.
